arxiv:2510.27688

Continuous Autoregressive Language Models

Published on Oct 31, 2025 · Submitted by Chenze Shao on Nov 3, 2025
#3 Paper of the day

Abstract

Continuous Autoregressive Language Models (CALM) improve language model efficiency by predicting continuous vectors instead of discrete tokens, reducing computational cost while maintaining performance.

The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design axis for LLM scaling: increasing the semantic bandwidth of each generative step. To this end, we introduce Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, from which the original tokens can be reconstructed with over 99.9% accuracy. This allows us to model language as a sequence of continuous vectors instead of discrete tokens, which reduces the number of generative steps by a factor of K. The paradigm shift necessitates a new modeling toolkit; therefore, we develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling in the continuous domain. Experiments show that CALM significantly improves the performance-compute trade-off, achieving the performance of strong discrete baselines at a significantly lower computational cost. More importantly, these findings establish next-vector prediction as a powerful and scalable pathway towards ultra-efficient language models. Code: https://github.com/shaochenze/calm. Project: https://shaochenze.github.io/blog/2025/CALM.
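The step-count reduction described in the abstract can be illustrated with a minimal sketch. It only shows how grouping tokens into chunks of K shortens the sequence the model must generate; it is not the CALM autoencoder or the authors' code. The function names, the padding scheme, and `pad_id` are hypothetical and chosen for illustration.

```python
from math import ceil


def chunk_tokens(token_ids, k, pad_id=0):
    """Split a token sequence into consecutive chunks of k tokens.

    The last chunk is right-padded with pad_id (an assumption made here
    for illustration; the paper's handling of partial chunks may differ).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    chunks = []
    for start in range(0, len(token_ids), k):
        chunk = token_ids[start:start + k]
        chunk = chunk + [pad_id] * (k - len(chunk))
        chunks.append(chunk)
    return chunks


def generative_steps(num_tokens, k):
    """Number of autoregressive steps when each step emits one chunk of k tokens."""
    return ceil(num_tokens / k)


tokens = list(range(1, 11))          # 10 hypothetical token ids
chunks = chunk_tokens(tokens, k=4)   # each chunk would map to one continuous vector
steps_discrete = generative_steps(len(tokens), k=1)
steps_chunked = generative_steps(len(tokens), k=4)
```

In this sketch each chunk stands in for one continuous vector, so a 10-token sequence needs 3 generative steps at K = 4 instead of 10 at K = 1.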

Community

Paper submitter

From discrete next-token prediction to continuous next-vector prediction
Code: https://github.com/shaochenze/calm
Blog: https://shaochenze.github.io/blog/2025/CALM

this is like a large concept model - it performs well because tokens are often chunked through its vicinity.
