Two New arXiv Papers Address Speech Processing Unification and LLM Quantization

Two recent papers on arXiv tackle different aspects of AI model efficiency and integration.

Unified Speech Processing

In arXiv:2601.10770v1, researchers propose using autoregressive transformers to unify speech recognition, speech synthesis, and voice conversion. The paper notes that “traditional speech systems typically rely on separate, task-specific models for text-to-speech (TTS), automatic speech recognition (ASR), and voice conversion (VC),” and that the resulting fragmented pipelines limit scalability, efficiency, and cross-task performance. The work aims to address these limitations with a unified architecture.
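The abstract does not describe how the tasks share one model. The sketch below is therefore not the paper's method. It shows one common way an autoregressive model can handle several tasks: each task becomes a single token sequence that starts with a task tag, followed by the conditioning input, a separator, and the output to be generated. Everything in it is hypothetical and chosen for illustration: the tag and separator tokens, the `Example` type, the `serialize` and `loss_mask` helpers, and the strings like `"a12"` that stand in for discrete audio codec tokens.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical special tokens (illustrative only, not from the paper).
TASK_TOKENS = {"asr": "<asr>", "tts": "<tts>", "vc": "<vc>"}
SEP = "<sep>"  # boundary between conditioning input and generated output
EOS = "<eos>"  # end of the generated output


@dataclass
class Example:
    task: str           # "asr", "tts", or "vc"
    source: List[str]   # conditioning tokens (text or discrete audio codes)
    target: List[str]   # tokens the model must generate autoregressively


def serialize(ex: Example) -> List[str]:
    """Flatten one example into a single sequence for a decoder-only model."""
    if ex.task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {ex.task}")
    return [TASK_TOKENS[ex.task], *ex.source, SEP, *ex.target, EOS]


def loss_mask(seq: List[str]) -> List[int]:
    """1 for positions after <sep> (the part the model learns to predict), else 0."""
    sep_idx = seq.index(SEP)
    return [0] * (sep_idx + 1) + [1] * (len(seq) - sep_idx - 1)


if __name__ == "__main__":
    # "a.." / "b.." strings stand in for audio codec tokens of two speakers.
    examples = [
        Example("asr", ["a12", "a7", "a33"], ["h", "i"]),         # audio -> text
        Example("tts", ["h", "i"], ["a12", "a7", "a33"]),         # text -> audio
        Example("vc", ["a12", "a7", "a33"], ["b4", "b9", "b2"]),  # audio -> audio
    ]
    for ex in examples:
        seq = serialize(ex)
        print(seq, loss_mask(seq))
```

Because all three tasks share one sequence format, a single next-token model can be trained on them jointly. The task tag tells the model which kind of output to produce.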

Quantization for Large Language Models

A second paper (arXiv:2601.11200v1) introduces FAQ (Family-Aware Quantization), a method for improving post-training quantization of large language models. According to the abstract, while “post-training quantization (PTQ) provides an efficient numerical compression scheme for deploying large language models (LLMs) on resource-constrained devices,” current approaches face challenges with “the representativeness and universality of calibration data.” The FAQ method aims to mitigate quantization error by regenerating calibration data.
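The abstract does not spell out how FAQ regenerates calibration data, so the sketch below does not implement FAQ. It only illustrates why the representativeness of calibration data matters in PTQ. In PTQ, a quantization range is estimated from calibration samples. If those samples cover a narrower range than the values seen at deployment, the out-of-range values get clipped and the error grows.

The sketch makes several assumptions: symmetric per-tensor int8 quantization, a max-absolute-value range estimate, and synthetic Gaussian data. The function names are hypothetical.

```python
import random
from typing import List


def calibrate_scale(calib: List[float], num_bits: int = 8) -> float:
    """Symmetric scale from the largest absolute value seen during calibration."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(x) for x in calib)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(x: float, scale: float, num_bits: int = 8) -> int:
    """Round to the integer grid; values outside the calibrated range are clipped."""
    qmax = 2 ** (num_bits - 1) - 1
    q = round(x / scale)
    return max(-qmax, min(qmax, q))


def dequantize(q: int, scale: float) -> float:
    return q * scale


def quant_mse(data: List[float], scale: float) -> float:
    """Mean squared error between original and quantize->dequantize values."""
    return sum((x - dequantize(quantize(x, scale), scale)) ** 2 for x in data) / len(data)


rng = random.Random(0)
# Synthetic "deployment" activations drawn from N(0, 1).
deploy = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# Representative calibration set: same distribution as deployment.
calib_representative = [rng.gauss(0.0, 1.0) for _ in range(512)]
# Unrepresentative calibration set: much narrower spread than deployment.
calib_narrow = [rng.gauss(0.0, 0.1) for _ in range(512)]

mse_representative = quant_mse(deploy, calibrate_scale(calib_representative))
mse_narrow = quant_mse(deploy, calibrate_scale(calib_narrow))

print(f"MSE, representative calibration: {mse_representative:.6f}")
print(f"MSE, narrow calibration:         {mse_narrow:.6f}")
```

With the narrow calibration set, most deployment values fall outside the estimated range and are clipped, so the error is far larger than with representative data. Calibration-data methods such as the one the FAQ abstract describes aim to reduce this kind of mismatch.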

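Both papers target efficiency from different directions. The speech paper replaces separate task-specific models with a unified architecture. FAQ aims to reduce the quantization error incurred when compressing LLMs for resource-constrained devices.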