New Research Improves LLM Spatial Reasoning Through Dimensional Decomposition

Researchers have developed a neuro-symbolic pipeline that improves large language models’ accuracy on spatial construction tasks by having the model plan only the horizontal layout while a deterministic component handles vertical placement.

According to a paper published on arXiv, the approach uses “2.5-D decomposition,” in which the LLM plans in a two-dimensional horizontal plane while a deterministic executor computes vertical placement from column occupancy. This method “eliminates an entire class of errors” that LLMs typically make when generating three-dimensional block placements.
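The division of labor described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `execute_plan`, the `(x, y, color)` move format, and the unit-cube stacking rule are all assumptions made for illustration. The planner (standing in for the LLM) emits only horizontal coordinates, and the executor derives each block's height from how many blocks already occupy that column.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical formats, assumed for this sketch only.
Move = Tuple[int, int, str]         # (x, y, color): what the planner emits
Block = Tuple[int, int, int, str]   # (x, y, z, color): what the executor produces


def execute_plan(plan: List[Move]) -> List[Block]:
    """Assign each planned block the lowest free height in its column."""
    heights: Dict[Tuple[int, int], int] = defaultdict(int)  # column -> blocks stacked so far
    placed: List[Block] = []
    for x, y, color in plan:
        z = heights[(x, y)]           # gravity fixes z: the block rests on top of the column
        placed.append((x, y, z, color))
        heights[(x, y)] = z + 1       # the column is now one block taller
    return placed


if __name__ == "__main__":
    plan = [(0, 0, "red"), (0, 0, "blue"), (1, 0, "green"), (0, 0, "red")]
    for block in execute_plan(plan):
        print(block)
```

Because the planner never outputs a height in this sketch, it cannot place a block in mid-air or inside another block; those outcomes are ruled out by construction rather than corrected afterward.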

On the 160-round Build What I Mean benchmark, GPT-4o-mini using this pipeline achieved 94.6% mean structural accuracy across 12 independent runs, according to the paper. That exceeded both GPT-4o at 90.3% and the best competing system at 76.3%. A controlled ablation study attributed 50.7 percentage points of the accuracy improvement to 2.5-D decomposition.

The pipeline also demonstrated practical viability on edge hardware. According to the research, Nemotron-3 120B running locally on an NVIDIA Jetson AGX Thor reached 94.5%, essentially matching the cloud result, “with no prompt modifications.”

The paper states the underlying principle of “removing deterministic dimensions from the LLM’s output space” applies to any construction or assembly task where physical constraints like gravity fix one or more degrees of freedom. A transfer experiment on 500 IGLU collaborative building tasks confirmed the approach generalizes beyond the primary benchmark.