Three New Research Papers Address LLM Quantization and Text-to-SQL Challenges

Recent arXiv papers introduce methods for efficient LLM deployment through quantization techniques and improved SQL generation from natural language.

Three recent papers on arXiv tackle key challenges in deploying and utilizing large language models more efficiently.

Sherry: Hardware-Efficient Ternary Quantization

According to arXiv paper 2601.07892v1, researchers propose “Sherry,” a method that combines 1.25-bit ternary quantization with fine-grained sparsification. The paper addresses the challenge of deploying LLMs on resource-constrained edge devices, noting that “prohibitive memory and computational requirements” increasingly hinder such deployment. The approach constrains weights to the ternary set {-1, 0, +1}; the abstract provided is truncated, so further details are not available.
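
Ternary quantization in general constrains each weight to one of three levels multiplied by a shared scale. The sketch below is a generic illustration, not Sherry’s actual algorithm: the absmean-style scale, the zeroing threshold, and the sparsification step are all assumptions made for the example.

```python
import torch

def ternarize(weights: torch.Tensor, sparsity_ratio: float = 0.0) -> torch.Tensor:
    """Generic ternary quantizer: maps weights to {-1, 0, +1} times a scale.

    Illustrative only; the scale rule, threshold, and sparsification step are
    assumptions, not the method described in the Sherry paper.
    """
    scale = weights.abs().mean()       # per-tensor scale (assumed)
    threshold = 0.5 * scale            # magnitudes below this become 0
    q = torch.zeros_like(weights)
    q[weights > threshold] = 1.0
    q[weights < -threshold] = -1.0
    if sparsity_ratio > 0:
        # Hypothetical fine-grained sparsification: additionally zero the
        # smallest-magnitude fraction of weights.
        k = int(sparsity_ratio * weights.numel())
        if k > 0:
            cutoff = weights.abs().flatten().kthvalue(k).values
            q[weights.abs() <= cutoff] = 0.0
    return q * scale
```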

Arctic-Text2SQL-R1: Enhanced SQL Generation

A replacement cross-listing (arXiv:2505.20315v2) introduces “Arctic-Text2SQL-R1,” which focuses on translating natural-language questions into SQL queries. The paper describes the task as “a longstanding challenge at the intersection of natural language understanding and structured data access.” While acknowledging that large language models have “significantly improved fluency in SQL generation,” the abstract cuts off before detailing the paper’s specific contribution.
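
For readers unfamiliar with the task, text-to-SQL pairs a natural-language question and a database schema with the SQL query that answers it. The snippet below is a purely illustrative example of that input/output format; the schema, question, and query are invented and do not come from the paper or its benchmarks.

```python
# Hypothetical text-to-SQL example: natural-language question plus schema in,
# executable SQL out. None of these strings come from Arctic-Text2SQL-R1.
schema = "CREATE TABLE orders (id INT, customer TEXT, total REAL, placed_at DATE);"
question = "What was the total revenue from orders placed in 2024?"

# A text-to-SQL model would be expected to emit something like:
expected_sql = (
    "SELECT SUM(total) FROM orders "
    "WHERE placed_at BETWEEN '2024-01-01' AND '2024-12-31';"
)
print(expected_sql)
```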

Sliced-Wasserstein Distribution Alignment

Paper 2601.07878v1 proposes using Sliced-Wasserstein Distribution Alignment Loss for ultra-low-bit quantization. The research is motivated by the “steep and often hidden economic and environmental costs” of LLMs due to “resource usage inefficiency during deployment.” The method aims to improve “energy and memory efficiency through representing models” with reduced precision, though implementation details are not provided in the truncated abstract.
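
The sliced-Wasserstein distance itself is a standard construction: project two sample sets onto random one-dimensional directions, sort each projection, and average the distances between the sorted values. The sketch below shows that general computation as a loss between two weight (or activation) distributions; it is an illustration of the technique named in the title, not the paper’s actual alignment loss.

```python
import torch

def sliced_wasserstein_loss(x: torch.Tensor, y: torch.Tensor,
                            num_projections: int = 64, p: int = 2) -> torch.Tensor:
    """Generic sliced-Wasserstein distance between sample sets x, y of shape (N, D).

    Illustrative only: how the paper applies such a loss to ultra-low-bit
    quantization is not described in the truncated abstract.
    """
    d = x.shape[1]
    # Random unit vectors defining the 1-D slices.
    directions = torch.randn(d, num_projections, device=x.device)
    directions = directions / directions.norm(dim=0, keepdim=True)
    # Project both sets and sort each slice; sorting solves 1-D optimal transport.
    x_proj = (x @ directions).sort(dim=0).values
    y_proj = (y @ directions).sort(dim=0).values
    return (x_proj - y_proj).abs().pow(p).mean()
```

One plausible use, purely as an assumption here, would be to minimize such a loss between the outputs of a full-precision layer and its low-bit counterpart during calibration.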