Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization


Taeyoon Kwon∗1 Dongwook Choi∗1 Hyojun Kim1 Sunghwan Kim1
Seungjun Moon1 Beong-woo Kwak1 Kuan-Hao Huang2 Jinyoung Yeo1


1Yonsei University    2Texas A&M University



Motivation: Why Personalized Embodied Agents?

[Figure: Motivation for personalized embodied agents.]

Embodied agents empowered by large language models (LLMs) have recently demonstrated remarkable success in executing object rearrangement tasks in household environments. The primary objective of embodied agents is to assist users while interacting with the physical world. But do such tasks truly reflect the challenges of providing meaningful assistance to users?

To provide personalized assistance, embodied agents must understand the unique semantics that users assign to the physical world (e.g., favorite cup, breakfast routine) by leveraging prior interaction histories to interpret dynamic, real-world instructions. Therefore, understanding embodied agents' memory utilization capabilities is crucial for developing personalized embodied agents.

Research Overview: Investigating Challenges and Solutions

We construct Memento, an end-to-end two-stage personalized embodied agent evaluation framework comprising both single-memory and joint-memory tasks to evaluate task performance and quantify how memory utilization affects performance on personalized assistance tasks.

Research Questions: We address the following three key questions:

  • RQ1: Can current LLM-powered embodied agents effectively utilize memory to perform personalized assistance tasks?
  • RQ2: What are the key factors that negatively affect embodied agents' memory utilization capabilities?
  • RQ3: How can we design memory architectures to better support personalized assistance tasks?

Memento: Personalized Embodied Agent Evaluation Framework

Memento framework overview

Overview of Memento.

Two-stage Evaluation Process

(1) Memory Acquisition Stage: Agents perform conventional object rearrangement tasks with instructions that contain sufficient personalized knowledge. These episodes are defined as ϵ_acq = (S, I_acq, g) where ϕ(I_acq) → g, meaning agents can directly infer the goal from instructions. During this stage, agents establish a reference performance baseline while accumulating episodic memory h_acq from their interactions.

(2) Memory Utilization Stage: Agents perform identical tasks but use underspecified instructions, ϵ_util = (S, I_util, g). These tasks can only be completed when agents recall and apply relevant personalized knowledge, requiring the extended grounding function ϕ(I_util, h_acq) → g.

Comparing performance between the two stages quantifies the agent's memory utilization capability.
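The stage comparison can be sketched as follows. This is a minimal illustration, assuming each stage yields a per-episode success flag. The record type `StageResult` and the helper names are hypothetical, and the gap is taken as the absolute difference in success rate (percentage points), which the description above does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StageResult:
    """Outcome of one episode in one stage (hypothetical record format)."""
    episode_id: str
    success: bool


def success_rate(results: List[StageResult]) -> float:
    """Percentage of successful episodes, in [0, 100]."""
    if not results:
        raise ValueError("no results to aggregate")
    return 100.0 * sum(r.success for r in results) / len(results)


def utilization_gap(acquisition: List[StageResult],
                    utilization: List[StageResult]) -> float:
    """Success-rate drop from the acquisition stage to the utilization stage.

    Assumption: the drop is measured in percentage points.
    """
    return success_rate(acquisition) - success_rate(utilization)


# Toy data, not results from the paper.
acq = [StageResult("e1", True), StageResult("e2", True),
       StageResult("e3", True), StageResult("e4", False)]
util = [StageResult("e1", True), StageResult("e2", False),
        StageResult("e3", False), StageResult("e4", False)]

gap = utilization_gap(acq, util)
print(f"acquisition={success_rate(acq):.1f}%  "
      f"utilization={success_rate(util):.1f}%  gap={gap:.1f} points")
```

A larger gap means the agent loses more performance once the personalized knowledge has to come from memory rather than from the instruction.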

Task Types

(1) Single-memory tasks: Require information from one episodic memory. These tasks test the agent's ability to retrieve and apply knowledge from a single previous interaction.

(2) Joint-memory tasks: Necessitate synthesis from two distinct memories, formulated as ϵ_util^joint = (S, I_util^joint, [g_i; g_j]), where i and j denote the corresponding reference episodes from the acquisition stage. These tasks evaluate the agent's capability to integrate information from multiple past experiences.
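As a rough sketch of the joint-memory formulation, the snippet below builds a joint episode by concatenating the goals of two reference episodes. The `Episode` type, the string encoding of goals, and the requirement that both episodes share a scene are assumptions made for illustration, not the framework's actual data format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Episode:
    """Hypothetical episode record: scene S, instruction I, goal conditions g."""
    scene: str
    instruction: str
    goals: Tuple[str, ...]


def make_joint_episode(ep_i: Episode, ep_j: Episode,
                       joint_instruction: str) -> Episode:
    """Combine two reference episodes into one joint-memory episode.

    The joint goal is the concatenation [g_i; g_j]. Assumption: both
    reference episodes take place in the same scene S.
    """
    if ep_i.scene != ep_j.scene:
        raise ValueError("reference episodes must share a scene")
    return Episode(scene=ep_i.scene,
                   instruction=joint_instruction,
                   goals=ep_i.goals + ep_j.goals)


# Toy reference episodes (illustrative content only).
ep_i = Episode("house_01", "Put the red mug, my favorite cup, on the table.",
               ("on(red_mug, table)",))
ep_j = Episode("house_01", "Move the laptop to the desk for my remote work setup.",
               ("on(laptop, desk)",))

joint = make_joint_episode(
    ep_i, ep_j,
    "Bring my favorite cup to the table and prepare my remote work setup.")
print(joint.goals)
```

The joint instruction stays underspecified, so the agent can only satisfy both goal sets by recalling both reference episodes.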

We conduct experiments without memory usage to validate our framework design, confirming that agents struggle significantly with underspecified instructions when memory is not available.

Personalized Knowledge Categories

We categorize personalized knowledge into two types to evaluate different reasoning challenges:

(1) Object Semantics: Tests agents' ability to identify objects based on personal meaning (e.g., my cup, my favorite running gear).

(2) User Patterns: Assesses agents' capacity to reconstruct goals using behavioral patterns (e.g., my remote work setup, my cozy dinner atmosphere).

(RQ1) Evaluating Memory Utilization Across Embodied Agents

LLM-powered Embodied Agents Struggle with Understanding Personalized Knowledge


Main results table showing model performance

Model performance across memory acquisition and utilization stage in Memento.

LLM-powered embodied agents struggle to utilize personalized knowledge effectively. Even GPT-4o shows a 30.5% drop in success rate for joint-memory tasks, with all models experiencing over 20% performance decline. The increased planning cycles and simulation steps indicate misinterpretation of instructions and inefficient exploration.


Current Embodied Agents Can Effectively Recall Object Semantics But Struggle to Comprehend User Patterns


Personalized knowledge type analysis results

The results of personalized knowledge type based analysis (single-memory).

Agents perform well on object semantics tasks but struggle significantly with user patterns, indicating limitations in understanding sequential behavioral information.


Parametric Commonsense Knowledge over Non-parametric Personalized Knowledge


Case study analysis of agent behavior

The case study results of Memento.

Agents tend to rely on commonsense knowledge rather than personalized memory, leading to inconsistent performance when commonsense doesn't align with user preferences.

(RQ2) Scenario-based Analysis for Memory Utilization Bottlenecks


Information Overload Degrades Memory Utilization Performance


The results of top-k retrieval based analysis (single-memory).

As the number of retrieved memories increases, performance degrades across all models, indicating that agents struggle with information overload and increasingly rely on commonsense knowledge rather than personalized memories.
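To make the top-k setting concrete, here is a toy retriever that ranks stored memories against the current instruction and returns the k best matches. The text above does not specify the retriever, so the bag-of-words cosine similarity is only a stand-in, and `retrieve_top_k` and the sample memories are hypothetical.

```python
import math
from collections import Counter
from typing import List


def _bow(text: str) -> Counter:
    """Lowercased whitespace tokens as a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: str, memories: List[str], k: int) -> List[str]:
    """Return the k memories most similar to the query, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    q = _bow(query)
    ranked = sorted(memories, key=lambda m: cosine(q, _bow(m)), reverse=True)
    return ranked[:k]


# Toy episodic memories (illustrative content only).
memories = [
    "put the red mug on the kitchen table it is my favorite cup",
    "move the laptop to the desk for my remote work setup",
    "place the candle on the dining table for my cozy dinner atmosphere",
]

top1 = retrieve_top_k("bring my favorite cup", memories, k=1)
top3 = retrieve_top_k("bring my favorite cup", memories, k=3)
print(top1)
```

With k=1 only the relevant memory reaches the agent. With k=3 the same memory is still present, but it now competes with two unrelated episodes in the context, which is the overload condition analyzed above.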


Agents Fail to Coordinate Multiple Memories


The results of top-k retrieval based analysis (single-memory).

Joint-memory tasks show significantly larger performance drops than single-memory tasks, with GPT-4o declining by 30.5%. Agents struggle to coordinate multiple memories, highlighting the need for improved memory architectures.

(RQ3) Exploring Memory Design for Personalized Embodied Agents


User Profile Memory: A Hierarchical Knowledge Graph Approach

Building upon our finding that episodic memory is essential, we devise a hierarchical knowledge graph-based user profile memory module that manages personalized knowledge independently to provide cleaner, more accessible information to the agent.


User Profile Memory Structure

Illustration of user profile memory structure.

Design Challenge: Organizing knowledge relationships effectively is crucial. Sequential knowledge such as user patterns requires a sophisticated memory structure that prevents information loss and can adapt as the knowledge evolves.

Solution: We construct personalized knowledge as a hierarchical knowledge graph with three levels:

  • Top level: Users
  • Middle level: Knowledge types (object semantics and user patterns)
  • Bottom level: Specific elements (objects, patterns, locations)

The graph connects these levels using hierarchical edges for structural relationships and temporal edges for sequential ordering within user patterns, enabling systematic representation and easy updates of personalized knowledge.
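The three-level structure can be sketched as a small in-memory graph. This is an illustrative approximation, not the module described above: the class `UserProfileGraph`, its method names, and the tuple-based node keys are all hypothetical, and pattern steps are stored as child nodes of each pattern so that temporal edges have something to connect.

```python
from collections import defaultdict
from typing import List, Optional


class UserProfileGraph:
    """Toy user profile memory: user -> knowledge type -> elements."""

    def __init__(self) -> None:
        self.children = defaultdict(list)  # hierarchical edges: parent -> children
        self.next_step = {}                # temporal edges: step -> following step
        self.labels = {}                   # node -> stored content

    def _link(self, parent, child) -> None:
        if child not in self.children[parent]:
            self.children[parent].append(child)

    def add_object_semantics(self, user: str, alias: str, obj: str) -> None:
        """Record that `alias` (e.g. "my cup") refers to `obj` for this user."""
        type_node = (user, "object_semantics")
        node = (user, "object_semantics", alias)
        self._link(user, type_node)
        self._link(type_node, node)
        self.labels[node] = obj            # overwriting updates the knowledge

    def add_user_pattern(self, user: str, name: str, steps: List[str]) -> None:
        """Store an ordered pattern, replacing any earlier version."""
        type_node = (user, "user_patterns")
        pattern = (user, "user_patterns", name)
        self._link(user, type_node)
        self._link(type_node, pattern)
        for old in self.children.get(pattern, []):   # drop the stale version
            self.next_step.pop(old, None)
            self.labels.pop(old, None)
        nodes = [pattern + (i,) for i in range(len(steps))]
        self.children[pattern] = nodes
        for node, text in zip(nodes, steps):
            self.labels[node] = text
        for a, b in zip(nodes, nodes[1:]):
            self.next_step[a] = b                    # temporal edge a -> b

    def resolve_object(self, user: str, alias: str) -> Optional[str]:
        return self.labels.get((user, "object_semantics", alias))

    def pattern_steps(self, user: str, name: str) -> List[str]:
        """Walk the temporal edges to recover the steps in order."""
        nodes = self.children.get((user, "user_patterns", name), [])
        ordered, node = [], nodes[0] if nodes else None
        while node is not None:
            ordered.append(self.labels[node])
            node = self.next_step.get(node)
        return ordered


graph = UserProfileGraph()
graph.add_object_semantics("alice", "my cup", "red mug")
graph.add_user_pattern("alice", "breakfast routine",
                       ["place bowl on table", "place spoon next to bowl"])
print(graph.resolve_object("alice", "my cup"))
print(graph.pattern_steps("alice", "breakfast routine"))
```

Keeping temporal edges separate from hierarchical ones means a pattern can be rewritten step by step without touching the user or knowledge-type nodes above it.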


Performance Results

User Profile Memory Performance

Agent performance with user profile memory across single- and joint-memory tasks.

Key Finding: User profile memory significantly enhances embodied agents' memory utilization capability. Agents achieve substantial performance improvements across all models for both single-memory and joint-memory tasks, with especially notable gains on user pattern tasks where sequential structure previously posed challenges.

BibTeX

@article{kwon2025embodied,
  title={Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance},
  author={Kwon, Taeyoon and Choi, Dongwook and Kim, Sunghwan and Kim, Hyojun and Moon, Seungjun and Kwak, Beong-woo and Huang, Kuan-Hao and Yeo, Jinyoung},
  journal={arXiv preprint arXiv:2505.16348},
  year={2025}
}