CUBE: Collaborative Multi-Agent Block-Pushing Environment for LLM Agents

* Equal contribution    Work done during an internship at Carnegie Mellon University

CUBE is a scalable environment where agent count, grid size, and block distribution can be varied to control task complexity. A single parameter n defines configurations, with larger values creating harder settings. This provides a clear curriculum from simple to large-scale coordination, making CUBE a flexible platform for evaluating algorithms that combine symbolic reasoning and embodied multi-agent interaction.
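As a rough illustration of how one parameter can jointly determine a configuration, the sketch below derives agent count, grid size, and block weights from n. The class name, function name, and the specific scaling rules are hypothetical placeholders chosen for illustration; they are not CUBE's actual API or its actual scaling formulas.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CubeConfig:
    """Hypothetical container for a configuration derived from n."""
    num_agents: int
    grid_size: int
    block_weights: Tuple[int, ...]


def make_config(n: int) -> CubeConfig:
    """Derive a full configuration from the single parameter n.

    The rules below are assumptions for illustration only:
    n agents, a square grid whose side grows with n, and one
    block of each weight from 1 to n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return CubeConfig(
        num_agents=n,
        grid_size=max(8, 2 * n),               # assumed: side length grows with n
        block_weights=tuple(range(1, n + 1)),  # assumed: heavier blocks appear as n grows
    )


# A small curriculum from easy to hard settings.
curriculum = [make_config(n) for n in (2, 4, 8)]
```

Under these assumed rules, increasing n adds agents, enlarges the grid, and introduces heavier blocks at once, which is the kind of single-knob curriculum the description above refers to.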

What is CUBE?

CUBE is a lightweight, embodied grid world for studying cooperative multi-agent behavior with both RL and LLM agents. It combines primitive block-pushing dynamics with a symbolic action vocabulary and a library of symbolic concepts, enabling interpretable planning, synchronized pushing, and customized feedback at per-agent and team levels.

  • Dual interface: primitive actions for RL pipelines and symbolic actions/observations for LLM planners.
  • Single scaling parameter n: jointly controls agent count, block weights, and grid size, yielding a transparent difficulty curriculum.
  • Symbolic concepts: reusable functions (e.g., alignment, quorum, progress) to design evaluation and feedback without changing core mechanics.
  • Efficient & portable: native Python with Numba; scales to hundreds of agents on a single CPU core.
CUBE overview: scaling mechanism and example layout (see diagrams on p.2 of the paper).
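To make the idea of symbolic concepts more concrete, here is a minimal sketch of two of the concepts named above, quorum and progress, written as plain Python functions. The function names, signatures, and the rule that a block of weight w needs at least w agents pushing in the same direction are assumptions made for this example, not CUBE's documented interface.

```python
from typing import Dict, Tuple

Direction = Tuple[int, int]  # unit step on the grid, e.g. (1, 0)
Position = Tuple[int, int]


def quorum_reached(pushes: Dict[str, Direction],
                   direction: Direction,
                   weight: int) -> bool:
    """Return True if enough agents push the block the same way.

    Assumed rule: a block of weight w moves only when at least
    w agents push it in `direction` during the same step.
    """
    aligned = sum(1 for d in pushes.values() if d == direction)
    return aligned >= weight


def progress(start: Position, current: Position, goal: Position) -> float:
    """Fraction of the start-to-goal Manhattan distance already covered."""
    def manhattan(a: Position, b: Position) -> int:
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    total = manhattan(start, goal)
    if total == 0:
        return 1.0  # block started on its goal
    remaining = manhattan(current, goal)
    return max(0.0, 1.0 - remaining / total)


# Two of three agents push right; a weight-2 block would move, a weight-3 one would not.
pushes = {"agent_0": (1, 0), "agent_1": (1, 0), "agent_2": (0, 1)}
can_move_light = quorum_reached(pushes, (1, 0), weight=2)
can_move_heavy = quorum_reached(pushes, (1, 0), weight=3)
```

Because functions like these only read state, they could be combined into per-agent or team-level feedback without touching the environment's core mechanics.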

Abstract

We introduce CUBE, a cooperative block-pushing testbed that blends embodied dynamics with symbolic structure. Primitive pushes are wrapped into interpretable symbolic actions and paired with symbolic concepts for per-agent and team-level feedback. CUBE’s single parameter n scales agents, block weights, and grid size, creating a transparent curriculum from minimal to large-scale coordination. The environment supports both RL and LLM agents and runs efficiently on commodity CPUs.

Baselines

A simple heuristic planner reliably completes blocks but can deadlock under congestion. Naive zero-shot LLM agents (e.g., gpt-4o, gpt-4o-mini) can produce executable symbolic plans, yet they become less stable and less efficient as coordination scales (Figures 6–7).

Takeaway: symbolic structure helps LLM agents act, but robust cooperation benefits from synchronized actions and congestion-aware strategies.

Baseline comparisons: steps and runtime vs. number of agents (n).

Key Features

  • Embodied + symbolic: primitive pushes wrapped in symbolic actions for interpretable plans.
  • Transparent scaling: one parameter n controls agents, block weights, and grid size.
  • Custom feedback: concept library enables per-agent/team metrics and progress signals.
  • Plug-and-play: works with RL and LLM agents; CPU-friendly with Numba acceleration.
CUBE grid and blocks
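The "embodied + symbolic" feature can be pictured as a thin layer that expands one symbolic action into a sequence of primitive steps. The sketch below does this for a hypothetical move_to action; the primitive action names, the coordinate convention (y grows downward), and the obstacle-free x-then-y path are all assumptions for illustration and may not match CUBE's real action set.

```python
from typing import List, Tuple

Position = Tuple[int, int]

# Hypothetical primitive action names keyed by unit step (dx, dy);
# y is assumed to increase downward.
PRIMITIVES = {(0, -1): "UP", (0, 1): "DOWN", (-1, 0): "LEFT", (1, 0): "RIGHT"}


def expand_move_to(start: Position, target: Position) -> List[str]:
    """Expand a symbolic 'move_to(target)' into primitive unit moves.

    Walks along x first, then along y, and ignores obstacles and
    other agents, so it is only a sketch of the wrapping idea.
    """
    x, y = start
    tx, ty = target
    steps: List[str] = []
    dx = 1 if tx > x else -1
    steps += [PRIMITIVES[(dx, 0)]] * abs(tx - x)
    dy = 1 if ty > y else -1
    steps += [PRIMITIVES[(0, dy)]] * abs(ty - y)
    return steps


# One symbolic action from an LLM planner becomes three primitive steps.
plan = expand_move_to((0, 0), (2, 1))
```

An RL pipeline could act directly on primitives like these, while an LLM planner would only need to emit the symbolic action and leave the expansion to the wrapper.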

BibTeX

@inproceedings{yang2025cube,
  title={CUBE: Collaborative Multi-Agent Block-Pushing Environment for LLM Agents},
  author={Yang, Hanqing and Nourzad, Narjes and Chen, Shiyu and Joe-Wong, Carlee},
  booktitle={NeurIPS 2025 Workshop on Scaling Environments for Agents (SEA)},
  note={Project page: https://happyeureka.github.io/cube}
}