We propose the Decentralized Adaptive Knowledge Graph Memory and Structured Communication System (DAMCS) and evaluate it in a novel Multi-agent Crafter environment (MAC). Our approach is built on three key components: a hierarchical Adaptive Knowledge Graph Memory System (A-KGMS), a Structured Communication System (S-CS), and the MAC environment itself.
DAMCS integrates A-KGMS and S-CS to improve long-term collaboration, reducing redundant actions and enhancing role allocation in cooperative tasks. Evaluations using MAC show that DAMCS outperforms baselines, cutting task completion steps by up to 74% compared to single-agent scenarios. Our framework builds upon recent advancements in LLM-powered agents, such as Generative Agents, to enhance decentralized multi-agent cooperation. By enabling agents to autonomously plan, coordinate, and optimize communication, DAMCS aims to advance scalable, decentralized LLM-powered multi-agent systems for real-world applications.
Developing intelligent agents for long-term cooperation in dynamic open-world scenarios is a major challenge in multi-agent systems. Traditional Multi-agent Reinforcement Learning (MARL) frameworks, such as centralized training with decentralized execution (CTDE), struggle with scalability and flexibility. They require centralized long-term planning, which is difficult without custom reward functions, and they face challenges in processing multi-modal data. CTDE approaches also assume fixed cooperation strategies, making them impractical in dynamic environments where agents need to adapt and plan independently. To address decentralized multi-agent cooperation, we propose the Decentralized Adaptive Knowledge Graph Memory and Structured Communication System (DAMCS) in a novel Multi-agent Crafter environment. Our generative agents, powered by Large Language Models (LLMs), scale better than traditional MARL agents because they leverage external knowledge and natural language for long-term planning and reasoning. Instead of fully sharing information from all past experiences, DAMCS introduces a multi-modal memory system organized as a hierarchical knowledge graph and a structured communication protocol to optimize agent cooperation. This allows agents to reason over past interactions and share relevant information efficiently. Experiments on novel multi-agent open-world tasks show that DAMCS outperforms both MARL and LLM baselines in task efficiency and collaboration. Compared to the single-agent scenario, the two-agent scenario achieves the same goal with 63% fewer steps and the six-agent scenario with 74% fewer steps, highlighting the importance of adaptive memory and structured communication in achieving long-term goals.
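To make this pipeline concrete, below is a minimal Python sketch of one decision cycle for a DAMCS-style agent: new observations and teammates' structured messages update a local memory, and an LLM call produces the next action. All names and data structures here are illustrative assumptions, not the authors' implementation.

# Minimal sketch of one decision cycle for a DAMCS-style agent (names and
# structure are illustrative assumptions, not the authors' implementation).
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str                                    # long-term goal, e.g. "collect diamond"
    memory: dict = field(default_factory=dict)   # stands in for the A-KGMS knowledge graph
    inbox: list = field(default_factory=list)    # structured messages from other agents


def decision_cycle(agent: AgentState, observation: dict, llm) -> str:
    """One decentralized step: update memory, read messages, plan with the LLM, act."""
    # 1. Fold the new observation into the agent's own memory.
    agent.memory.update(observation)

    # 2. Incorporate structured messages (progress, requests) from teammates.
    for message in agent.inbox:
        agent.memory[f"peer:{message['sender']}"] = message["progress"]
    agent.inbox.clear()

    # 3. Ask the LLM for the next action given goal, memory, and peers' progress.
    prompt = (
        f"Goal: {agent.goal}\n"
        f"Known facts: {agent.memory}\n"
        "Choose the next low-level action."
    )
    return llm(prompt)  # llm is any text-completion callable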
The figures show the training performance of reinforcement learning (RL) baselines. The first presents a single agent trained with Proximal Policy Optimization (PPO), while the second illustrates a multi-agent setup trained with Multi-Agent Deep Deterministic Policy Gradient (MADDPG). In both cases the agents initially improve their rewards but soon plateau, because further progress requires learning advanced skills in a hierarchical order. The RL agents struggle to acquire these skills efficiently, leading to slow and unstable learning in this complex environment.
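For reference, here is a hedged sketch of a single-agent PPO baseline of this kind, assuming the public crafter package and stable-baselines3; the multi-agent MAC environment and the MADDPG baseline are not reproduced here.

# Hedged sketch of the single-agent PPO baseline, assuming the public `crafter`
# package and stable-baselines3 (an older gym-compatible release; newer
# gymnasium-based releases may need a compatibility wrapper).
import crafter
from stable_baselines3 import PPO

env = crafter.Env()                      # single-agent Crafter, pixel observations
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)   # reward rises early, then plateaus once
                                         # hierarchical skills become necessary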
Two agents with communication complete tasks faster than two agents without communication, which progress at roughly the same speed as a single agent. A basic agent without our memory system is slower than agents equipped with it.
Six agents with communication complete tasks faster than six agents without communication. They are also faster than two agents with communication.
While each agent independently controls its own behavior and maintains its own memory, the Structured Communication System ensures they remain aware of others’ progress, enabling timely and adaptive cooperation.
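As an illustration, a possible message schema for such structured communication is sketched below; the field names are assumptions and may differ from the paper's exact protocol.

# Illustrative message schema for the Structured Communication System (field
# names are assumptions; the paper's exact schema may differ).
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredMessage:
    sender: int                  # agent id of the sender
    recipient: Optional[int]     # None = broadcast to all teammates
    progress: str                # e.g. "crafted wood pickaxe"
    need: Optional[str] = None   # resource or help being requested, e.g. "stone"
    offer: Optional[str] = None  # resource the sender can share, e.g. "extra wood"


# Example: Agent 1 tells Agent 0 about its progress and offers spare wood.
msg = StructuredMessage(sender=1, recipient=0,
                        progress="collected 4 wood", offer="wood")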
Agent 0, responsible for tool crafting, follows a sequential memory structure, reflecting hierarchical goal progression. Agent 1, tasked with assisting Agent 0, develops clustered memories centered on crafting and resource gathering that serve Agent 0's needs. Similarly, Agent 2 supports Agent 1, with memory clusters focused on cooperative material collection and crafting tasks. These agents dynamically adjust their strategies based on shared information in a decentralized manner.
Agents 3 and 4, focused on resource sharing, exhibit simpler, less interconnected memory structures since their role is primarily to collect and distribute materials rather than craft tools. Agent 5, which monitors the overall team’s progress, integrates information from all agents and determines when to transition toward diamond collection. The S-CS plays a crucial role in shaping these memory patterns.
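To illustrate how such role-specific memories could be represented, here is a minimal sketch of a hierarchical knowledge-graph memory built with networkx; the node and edge labels are illustrative and do not reproduce the A-KGMS schema.

# Minimal sketch of a hierarchical knowledge-graph memory for one agent,
# using networkx; labels are illustrative, not the A-KGMS schema.
import networkx as nx

memory = nx.DiGraph()

# Hierarchical goal progression for Agent 0 (tool crafting): each node is a
# milestone, each edge a prerequisite relation learned from experience.
memory.add_edge("collect wood", "craft table", relation="enables")
memory.add_edge("craft table", "craft wood pickaxe", relation="enables")
memory.add_edge("craft wood pickaxe", "collect stone", relation="enables")
memory.add_edge("collect stone", "craft stone pickaxe", relation="enables")

# Information shared by teammates via S-CS attaches to the same graph, so
# planning can reason over both the agent's experience and peers' progress.
memory.add_edge("Agent 1", "collect wood", relation="is working on")

# A planner would query predecessors to find the next unmet prerequisite.
print(list(memory.predecessors("craft stone pickaxe")))  # -> ['collect stone']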
@misc{yang2025damcs,
      title={LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning},
      author={Hanqing Yang and Jingdi Chen and Marie Siew and Tania Lorido-Botran and Carlee Joe-Wong},
      year={2025},
      eprint={2502.05453},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.05453},
}