CUBE: Collaborative Multi‑Agent Block‑Pushing Environment for Collective Planning with LLM Agents

Hanqing Yang*1, Narjes Nourzad*†2, Shiyu Chen1, Carlee Joe-Wong1

1Carnegie Mellon University
2University of Southern California
NeurIPS 2025 Workshop: Scaling Environments for Agents (SEA)

*Equal Contribution    †Work done during an internship at Carnegie Mellon University
CUBE Teaser

CUBE is a grid‑world where teams coordinate to push weighted blocks into a goal zone. A single scaling parameter n jointly determines team size, block weights, and grid size, creating a transparent curriculum from small to large‑scale cooperation. Each panel shows a snapshot of a cooperative block-pushing scenario at increasing scales (n = 2 to 256), under a simple always-move-right agent policy.

What is CUBE?

CUBE is a lightweight, portable, and scalable multi-agent environment that unifies symbolic reasoning with embodied interaction. It exposes both primitive grid actions (UP, DOWN, LEFT, RIGHT, STAY) and a symbolic action library (e.g., MOVETOBLOCK, RENDEZVOUS, PUSH, WAIT), enabling LLM-based planners, RL agents, and hybrid methods to operate within the same task.
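
To make this dual interface concrete, below is a minimal sketch of how primitive and symbolic actions might coexist for a single agent. The class and function names (Primitive, Symbolic, expand_symbolic) and the greedy expansion are illustrative assumptions, not CUBE's actual API.

from enum import Enum, auto
from typing import List, Tuple

class Primitive(Enum):
    """Low-level grid actions (the five listed above)."""
    UP = auto()
    DOWN = auto()
    LEFT = auto()
    RIGHT = auto()
    STAY = auto()

class Symbolic(Enum):
    """High-level actions an LLM planner can emit; each expands to primitives."""
    MOVETOBLOCK = auto()
    RENDEZVOUS = auto()
    PUSH = auto()
    WAIT = auto()

def expand_symbolic(action: Symbolic,
                    agent: Tuple[int, int],
                    target: Tuple[int, int]) -> List[Primitive]:
    """Hypothetical expansion of one symbolic action into primitive steps:
    a greedy Manhattan walk toward the target (assumes UP increases y)."""
    if action is Symbolic.WAIT:
        return [Primitive.STAY]
    (ax, ay), (tx, ty) = agent, target
    steps: List[Primitive] = []
    while (ax, ay) != (tx, ty):
        if ax < tx:
            ax, step = ax + 1, Primitive.RIGHT
        elif ax > tx:
            ax, step = ax - 1, Primitive.LEFT
        elif ay < ty:
            ay, step = ay + 1, Primitive.UP
        else:
            ay, step = ay - 1, Primitive.DOWN
        steps.append(step)
    return steps

# e.g. MOVETOBLOCK from (0, 0) toward a block face at (2, 1)
print(expand_symbolic(Symbolic.MOVETOBLOCK, (0, 0), (2, 1)))

An RL policy can act directly in the Primitive space, while an LLM planner reasons over the Symbolic vocabulary and lets the expansion handle low-level navigation; both end up driving the same embodied task.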

  • Embodied multi-agent cooperation: Agents collaborate in a 2D grid to push weighted blocks into a goal zone, subject to embodied constraints such as collisions, congestion, and block-chains. These spatial dependencies make cooperation both necessary and measurable, as heavier blocks demand greater synchronization and coordination.
  • Lightweight, portable, and scalable: Implemented in Python with Numba acceleration, CUBE runs efficiently with hundreds of agents on a single CPU core, supporting rapid experimentation and large-scale benchmarking (a sketch of this style of jitted update loop follows the overview diagram below).
  • Dual-layer design: CUBE couples a symbolic layer and a vector-based layer across observation, action, and feedback channels. This architecture integrates symbolic planning with low-level state representations, enabling scalable and interpretable approaches to multi-agent coordination.
  • Fair benchmarking: A single parameter, n, determines agent count, grid size, and block distribution, yielding comparable cooperative difficulty for the same n and progressively harder coordination as n increases. This creates a transparent, scalable curriculum for evaluating multi-agent cooperation.
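
As a concrete illustration of this single-parameter curriculum, the sketch below derives a full scenario from n. The specific scaling rules used here (grid side length, block count, weight range) are assumptions for illustration only, not the formulas CUBE actually uses.

import math
import random
from dataclasses import dataclass
from typing import List

@dataclass
class ScenarioConfig:
    n_agents: int
    grid_size: int            # side length of the square grid
    block_weights: List[int]  # one entry per block

def make_scenario(n: int, seed: int = 0) -> ScenarioConfig:
    # All scaling rules below are illustrative assumptions, not CUBE's.
    rng = random.Random(seed)
    grid_size = max(8, int(4 * math.sqrt(n)))   # grid grows sublinearly with n
    n_blocks = max(1, n // 2)                   # roughly one block per two agents
    # Heavier blocks appear at larger n, forcing multi-agent pushes.
    block_weights = [rng.randint(1, max(2, n // 4)) for _ in range(n_blocks)]
    return ScenarioConfig(n_agents=n, grid_size=grid_size, block_weights=block_weights)

# A transparent curriculum: the same call spans small to large-scale cooperation.
for n in (2, 8, 32, 256):
    cfg = make_scenario(n)
    print(n, cfg.grid_size, len(cfg.block_weights), max(cfg.block_weights))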
CUBE overview and scaling diagram
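
On the implementation side, running hundreds of agents on one CPU core comes down to keeping the per-step update in compiled code. The kernel below is a hedged sketch of that kind of Numba-jitted inner loop (one move-and-collision pass over all agents); it is illustrative only and not CUBE's actual step function.

import numpy as np
from numba import njit

# Primitive action deltas: STAY, UP, DOWN, LEFT, RIGHT (assumed ordering).
DELTAS = np.array([[0, 0], [0, 1], [0, -1], [-1, 0], [1, 0]], dtype=np.int64)

@njit
def step_agents(positions, actions, grid_size, occupied):
    # positions: (num_agents, 2) int64 array; actions: (num_agents,) int64 array.
    # occupied: (grid_size, grid_size) boolean occupancy grid, updated in place.
    for i in range(positions.shape[0]):
        dx = DELTAS[actions[i], 0]
        dy = DELTAS[actions[i], 1]
        nx = positions[i, 0] + dx
        ny = positions[i, 1] + dy
        # Bounds and collision checks keep moves legal; blocked moves are skipped.
        if 0 <= nx < grid_size and 0 <= ny < grid_size and not occupied[nx, ny]:
            occupied[positions[i, 0], positions[i, 1]] = False
            positions[i, 0] = nx
            positions[i, 1] = ny
            occupied[nx, ny] = True

# Example: 256 agents taking random actions on a 64x64 grid
# (spawn collisions ignored for brevity).
rng = np.random.default_rng(0)
pos = rng.integers(0, 64, size=(256, 2))
occ = np.zeros((64, 64), dtype=np.bool_)
for x, y in pos:
    occ[x, y] = True
step_agents(pos, rng.integers(0, 5, size=256), 64, occ)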

Abstract

We introduce CUBE (Collaborative Multi-Agent Block-Pushing Environment), a lightweight yet expressive testbed for studying embodied cooperation in multi-agent systems. While traditional reinforcement learning benchmarks emphasize low-level action spaces and scalar rewards, and symbolic planning domains emphasize logical reasoning under deterministic transitions, neither alone captures the combination of embodiment, uncertainty, and symbolic structure needed to evaluate emerging embodied LLM-based agents. CUBE addresses this gap by wrapping primitive block-pushing actions into a symbolic action vocabulary, enabling interpretable and compositional cooperation strategies. It also provides a library of symbolic concepts for customized feedback at both per-agent and collective levels. These features allow the same environment to support reinforcement learning agents, LLM-based agents, and hybrid architectures. For fair comparison across experiments, a single parameter n specifies the number of agents, the grid size, and the block weights, creating a transparent curriculum that scales difficulty and cooperation demands. CUBE thus offers a flexible platform for scalable evaluation of algorithms that integrate symbolic reasoning with embodied multi-agent interaction.

Chains in the Environment and Failure Cases Due to Agent/Environment Dynamics

In CUBE, blocks can form chains—composite structures that require coordinated force from multiple agents to move. A pushing line forms when agents align along a block’s face, each contributing unit force in the push direction. Motion succeeds only if the total applied force meets or exceeds the block’s weight and the frontmost destination cell is free. The first row of figures illustrates these geometric outcomes: when agents align, the chain advances; when misaligned or obstructed, it fails.

The second row shows failure cases due to agent and environment dynamics. Even when geometry permits motion, local heuristics or asynchronous policies may cause deadlock, blocking, or oscillation. Agents can obstruct one another’s approach or become trapped by congestion, revealing how spatial and temporal dependencies shape cooperative success and failure in embodied settings.
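
The push rule itself is compact enough to state in code. The helper below is a minimal sketch of the check described above: the total force of aligned pushers must meet the block's (or chain's) weight and the frontmost destination cell must be free. The function and argument names are ours for illustration, not CUBE's.

from typing import Set, Tuple

Cell = Tuple[int, int]

def push_succeeds(aligned_pushers: int, block_weight: int,
                  destination: Cell, occupied: Set[Cell]) -> bool:
    # Each aligned agent contributes one unit of force in the push direction.
    total_force = aligned_pushers
    # The block or chain advances only if force covers the weight and the
    # frontmost destination cell is not occupied by an agent or another block.
    return total_force >= block_weight and destination not in occupied

# Three aligned agents can move a weight-3 block into a free cell...
assert push_succeeds(3, 3, (5, 4), occupied={(2, 2)})
# ...but two agents cannot, and neither can three if the cell is occupied.
assert not push_succeeds(2, 3, (5, 4), occupied=set())
assert not push_succeeds(3, 3, (5, 4), occupied={(5, 4)})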

Successful pushing chain with agents aligned

Successful chain: Aligned agents form a stable pushing line, transferring sufficient force to move the composite block.

Geometric failure case

Geometric failure: Misalignment or blocked destination cell prevents the chain from advancing.

Dynamics failure case 1

Failure I: Agents cannot stage on the target face due to congestion—no free cells for coordinated pushing.

Dynamics failure case 2

Failure II: Agents block one another or oscillate, preventing stable cooperation despite a feasible plan.

Task Performance

LLM agents continue to complete progressively heavier blocks as n increases, whereas heuristic baselines perform well on small layouts but struggle as coordination demands grow.

Completed blocks vs number of agents
Average steps vs n by model

Average steps vs. n (capped at 200 steps).

Runtime vs n by model

Runtime vs. n by model.

Mean time per step vs agents

Mean time per step vs agents.

Process memory usage vs agents

Memory usage vs agents.

CPU utilization vs agents

CPU utilization vs agents.

Action Runtime Profiling

Runtime vs agents by action type

Mean runtime per action vs agents.

Action runtime heatmap

Heatmap of action runtimes (log‑scale).

Video Presentation

BibTeX

@inproceedings{yang2025cube,
  title     = {CUBE: Collaborative Multi-Agent Block-Pushing Environment for Collective Planning with LLM Agents},
  author    = {Hanqing Yang and Narjes Nourzad and Shiyu Chen and Carlee Joe-Wong},
  booktitle = {NeurIPS 2025 Workshop on Scaling Environments for Agents (SEA)},
  year      = {2025},
  url       = {https://happyeureka.github.io/cube}
}