We propose the Decentralized Adaptive Knowledge Graph Memory and Structured Communication System (DAMCS) and evaluate it in a novel Multi-agent Crafter environment (MAC). Our approach is built on three key components: a hierarchical Adaptive Knowledge Graph Memory System (A-KGMS), a Structured Communication System (S-CS), and the MAC environment itself.
DAMCS integrates A-KGMS and S-CS to improve long-term collaboration, reducing redundant actions and enhancing role allocation in cooperative tasks. Evaluations using MAC show that DAMCS outperforms baselines, cutting task completion steps by up to 74% compared to single-agent scenarios. Our framework builds upon recent advancements in LLM-powered agents, such as Generative Agents, to enhance decentralized multi-agent cooperation. By enabling agents to autonomously plan, coordinate, and optimize communication, DAMCS aims to advance scalable, decentralized LLM-powered multi-agent systems for real-world applications.
Developing intelligent agents for long-term cooperation in dynamic open-world scenarios is a major challenge in multi-agent systems. Traditional Multi-agent Reinforcement Learning (MARL) frameworks, such as centralized training with decentralized execution (CTDE), struggle with scalability and flexibility. They require centralized long-term planning, which is difficult without custom reward functions, and they face challenges in processing multi-modal data. CTDE approaches also assume fixed cooperation strategies, making them impractical in dynamic environments where agents need to adapt and plan independently. To address decentralized multi-agent cooperation, we propose the Decentralized Adaptive Knowledge Graph Memory and Structured Communication System (DAMCS) in a novel Multi-agent Crafter environment. Our generative agents, powered by Large Language Models (LLMs), scale better than traditional MARL agents because they leverage external knowledge and natural language for long-term planning and reasoning. Instead of fully sharing information from all past experiences, DAMCS introduces a multi-modal memory system organized as a hierarchical knowledge graph and a structured communication protocol to optimize agent cooperation. This allows agents to reason over past interactions and share relevant information efficiently. Experiments on novel multi-agent open-world tasks show that DAMCS outperforms both MARL and LLM baselines in task efficiency and collaboration. Compared to the single-agent scenario, the two-agent scenario achieves the same goal with 63% fewer steps and the six-agent scenario with 74% fewer steps, highlighting the importance of adaptive memory and structured communication in achieving long-term goals.
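To make this pipeline concrete, below is a minimal Python sketch of one decision cycle for a DAMCS-style agent: new observations and teammates' structured messages update a local memory, and an LLM call produces the next action. All names and data structures here are illustrative assumptions, not the authors' implementation.

# Minimal sketch of one decision cycle for a DAMCS-style agent (names and
# structure are illustrative assumptions, not the authors' implementation).
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str                                    # long-term goal, e.g. "collect diamond"
    memory: dict = field(default_factory=dict)   # stands in for the A-KGMS knowledge graph
    inbox: list = field(default_factory=list)    # structured messages from other agents


def decision_cycle(agent: AgentState, observation: dict, llm) -> str:
    """One decentralized step: update memory, read messages, plan with the LLM, act."""
    # 1. Fold the new observation into the agent's own memory.
    agent.memory.update(observation)

    # 2. Incorporate structured messages (progress, requests) from teammates.
    for message in agent.inbox:
        agent.memory[f"peer:{message['sender']}"] = message["progress"]
    agent.inbox.clear()

    # 3. Ask the LLM for the next action given goal, memory, and peers' progress.
    prompt = (
        f"Goal: {agent.goal}\n"
        f"Known facts: {agent.memory}\n"
        "Choose the next low-level action."
    )
    return llm(prompt)  # llm is any text-completion callable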
The figures show the training performance of reinforcement learning (RL) baselines. The first presents a single agent trained with Proximal Policy Optimization (PPO), while the second illustrates a multi-agent setup trained with Multi-Agent Deep Deterministic Policy Gradient (MADDPG). In both cases the agents initially improve their rewards but soon plateau, because further progress requires learning advanced skills in a hierarchical order. The RL agents struggle to acquire these skills efficiently, leading to slow and unstable learning in this complex environment.
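For reference, here is a hedged sketch of a single-agent PPO baseline of this kind, assuming the public crafter package and stable-baselines3; the multi-agent MAC environment and the MADDPG baseline are not reproduced here.

# Hedged sketch of the single-agent PPO baseline, assuming the public `crafter`
# package and stable-baselines3 (an older gym-compatible release; newer
# gymnasium-based releases may need a compatibility wrapper).
import crafter
from stable_baselines3 import PPO

env = crafter.Env()                      # single-agent Crafter, pixel observations
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)   # reward rises early, then plateaus once
                                         # hierarchical skills become necessary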
Two agents with communication complete tasks faster than two agents without communication, which progress at roughly the same speed as a single agent. A basic agent without our memory system is slower than agents equipped with it.
Six agents with communication complete tasks faster than six agents without communication. They are also faster than two agents with communication.
While each agent independently controls its own behavior and maintains its own memory, the Structured Communication System ensures they remain aware of others’ progress, enabling timely and adaptive cooperation.
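As an illustration, a possible message schema for such structured communication is sketched below; the field names are assumptions and may differ from the paper's exact protocol.

# Illustrative message schema for the Structured Communication System (field
# names are assumptions; the paper's exact schema may differ).
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredMessage:
    sender: int                  # agent id of the sender
    recipient: Optional[int]     # None = broadcast to all teammates
    progress: str                # e.g. "crafted wood pickaxe"
    need: Optional[str] = None   # resource or help being requested, e.g. "stone"
    offer: Optional[str] = None  # resource the sender can share, e.g. "extra wood"


# Example: Agent 1 tells Agent 0 about its progress and offers spare wood.
msg = StructuredMessage(sender=1, recipient=0,
                        progress="collected 4 wood", offer="wood")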
Agent 0, responsible for tool crafting, follows a sequential memory structure, reflecting hierarchical goal progression. Agent 1, tasked with assisting Agent 0, develops clustered memories centered on crafting and resource gathering that serve Agent 0's needs. Similarly, Agent 2 supports Agent 1, with memory clusters focused on cooperative material collection and crafting tasks. These agents dynamically adjust their strategies based on shared information in a decentralized manner.
Agents 3 and 4, focused on resource sharing, exhibit simpler, less interconnected memory structures since their role is primarily to collect and distribute materials rather than craft tools. Agent 5, which monitors the overall team’s progress, integrates information from all agents and determines when to transition toward diamond collection. The S-CS plays a crucial role in shaping these memory patterns.
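To illustrate how such role-specific memories could be represented, here is a minimal sketch of a hierarchical knowledge-graph memory built with networkx; the node and edge labels are illustrative and do not reproduce the A-KGMS schema.

# Minimal sketch of a hierarchical knowledge-graph memory for one agent,
# using networkx; labels are illustrative, not the A-KGMS schema.
import networkx as nx

memory = nx.DiGraph()

# Hierarchical goal progression for Agent 0 (tool crafting): each node is a
# milestone, each edge a prerequisite relation learned from experience.
memory.add_edge("collect wood", "craft table", relation="enables")
memory.add_edge("craft table", "craft wood pickaxe", relation="enables")
memory.add_edge("craft wood pickaxe", "collect stone", relation="enables")
memory.add_edge("collect stone", "craft stone pickaxe", relation="enables")

# Information shared by teammates via S-CS attaches to the same graph, so
# planning can reason over both the agent's experience and peers' progress.
memory.add_edge("Agent 1", "collect wood", relation="is working on")

# A planner would query predecessors to find the next unmet prerequisite.
print(list(memory.predecessors("craft stone pickaxe")))  # -> ['collect stone']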
@misc{yang2025damcs,
      title={LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning},
      author={Hanqing Yang and Jingdi Chen and Marie Siew and Tania Lorido-Botran and Carlee Joe-Wong},
      year={2025},
      eprint={2502.05453},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.05453},
}