Studies Reveal Limitations of Large Language Models in Math and Contract Generation
Two recent studies underscore significant limitations in large language models’ reasoning capabilities, despite their impressive text generation abilities.
According to The New York Times, mathematicians are finding that large language models “struggle to solve research-level math questions,” and that properly assessing the models’ performance in this domain requires human experts. The research focuses specifically on advanced mathematical problems beyond basic computation.
Separately, a paper published on arXiv (arXiv:2602.09384v1) titled “Contractual Deepfakes: Can Large Language Models Generate Contracts?” examines LLMs’ ability to create legal contracts. According to the abstract, “LLMs do not understand the meaning of words, have no sense of context and cannot reason.” The researchers note that despite LLMs’ “unprecedented ability to generate text,” their output “constitutes an approximation of statistically dominant word patterns” rather than genuine comprehension.
These findings highlight a growing recognition in the research community that while LLMs excel at pattern matching and text generation, they face fundamental challenges when tasks require deeper reasoning, contextual understanding, or domain-specific expertise—whether in advanced mathematics or legal document creation.