kimi-k2.6-eagle3-mla

Eagle3 MTP draft model with MLA (Multi-head Latent Attention) for accelerating inference of Kimi-K2.6 through speculative decoding.

This draft is fine-tuned from, and anchored to, the official lightseekorg/kimi-k2.6-eagle3-mla initialization. It targets acceptance at downstream (multi-hop) draft positions while preserving the first-hop gain, and it is evaluated by runtime accept length on a frozen full-context held-out set.

Fine-tune setup

Init: lightseekorg/kimi-k2.6-eagle3-mla (official MLA weights)
Objective: Eagle3 distillation + multi-step TTT supervision (ttt_steps=4, ttt_step_loss_decay=1.0, off-policy downstream tokens)
Anti-over-specialization: L2-SP weight-space anchor toward the init (penalize trainable-param drift; lambda=1e-4)
Optimizer: lr 2e-5, cosine schedule
Checkpoint: best by held-out validation loss on the K2.6 ruler (step 95400; val_loss 5.490, the global minimum of the v3 run)
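
As a rough illustration of how the objective terms above could combine, here is a minimal sketch in plain Python. It is not the training code: flat lists of floats stand in for tensors, the geometric form of the per-step weights, the unnormalized sum, and the exact scaling of the L2-SP term are all assumptions, and the function names are hypothetical.

```python
def ttt_step_weights(ttt_steps, decay):
    """Per-step loss weights, assuming weight decay**k for TTT step k."""
    return [decay ** k for k in range(ttt_steps)]


def l2sp_penalty(params, init_params, lam):
    """L2-SP anchor: lam times the squared distance from the init weights."""
    return lam * sum((p - p0) ** 2 for p, p0 in zip(params, init_params))


def total_loss(step_losses, params, init_params,
               ttt_step_loss_decay=1.0, lam=1e-4):
    """Weighted sum of per-step distillation losses plus the L2-SP anchor."""
    weights = ttt_step_weights(len(step_losses), ttt_step_loss_decay)
    distill = sum(w * loss for w, loss in zip(weights, step_losses))
    return distill + l2sp_penalty(params, init_params, lam)
```

With ttt_step_loss_decay=1.0 every one of the four TTT steps carries the same weight, so downstream positions count as much as the first hop. The anchor term is zero when the trainable parameters equal the init and grows with their drift.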

Performance

Primary metric is accept_length: the average number of tokens accepted per speculation step with num_speculative_tokens=3 (higher is better). Evaluated with vLLM 0.20.0 on 8x H200, TP=8, max-model-len 32768, greedy decoding.

On a frozen K2.6 full-context held-out judge set (914 prompts):

Model	accept_length
lightseekorg/kimi-k2.6-eagle3-mla (official init)	2.285
this model	2.308

This draft improves accept_length over the official init by 0.023 tokens per speculation step (about 1%) on the K2.6 held-out distribution.
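
To make the comparison concrete, the sketch below averages per-step accepted-token counts and computes the gain between two drafts. The helper names are hypothetical, and the per-step counts are assumed to follow whatever counting convention the serving stack reports. Only the two accept_length values are taken from the table above.

```python
def mean_accept_length(accepted_per_step):
    """Average number of accepted tokens over all speculation steps."""
    if not accepted_per_step:
        raise ValueError("need at least one speculation step")
    return sum(accepted_per_step) / len(accepted_per_step)


def relative_gain(candidate, baseline):
    """Fractional improvement of candidate over baseline."""
    return (candidate - baseline) / baseline


init_accept_length = 2.285   # official init, from the table above
tuned_accept_length = 2.308  # this model, from the table above

absolute = tuned_accept_length - init_accept_length
relative = relative_gain(tuned_accept_length, init_accept_length)
print(f"absolute gain: {absolute:.3f} tokens/step")
print(f"relative gain: {relative:.2%}")
```

The same two helpers can be pointed at counts collected from your own traffic to compare this draft against the official init.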

Note on distribution shift

This checkpoint is selected by validation loss on the K2.6 teacher distribution. In cross-version testing against real Kimi-K2.7-Code production traffic, the official lightseek init currently shows a higher accept length than this fine-tune, which indicates that the K2.6 fine-tune over-specializes to its training distribution. If your serving traffic differs substantially from long multi-turn K2.6 dialogues, benchmark both this draft and the lightseek init on your own traffic before choosing. (The L2-SP anchor above is intended to mitigate this; tuning it against real-traffic accept length is ongoing.)

Usage

Serve with vLLM as the speculative draft for Kimi-K2.6, with num_speculative_tokens=3 in the speculative-config.
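
One way to assemble such a launch command is sketched below. It only builds and prints the command and does not start a server. The draft repository id k-l-lambda/kimi-k2.6-eagle3-mla, the "eagle3" method name, and the reuse of the evaluation settings (TP=8, max-model-len 32768) are assumptions; check them against the vLLM version you run, which may also need extra flags such as --trust-remote-code.

```python
import json
import shlex


def build_vllm_serve_command(target, draft, num_speculative_tokens=3,
                             tensor_parallel_size=8, max_model_len=32768):
    """Return the argv list for `vllm serve` with a speculative draft model."""
    speculative_config = {
        "method": "eagle3",  # assumed method name for Eagle3 drafts
        "model": draft,
        "num_speculative_tokens": num_speculative_tokens,
    }
    return [
        "vllm", "serve", target,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--max-model-len", str(max_model_len),
        "--speculative-config", json.dumps(speculative_config),
    ]


cmd = build_vllm_serve_command(
    "moonshotai/Kimi-K2.6",
    "k-l-lambda/kimi-k2.6-eagle3-mla",  # assumed repository id of this draft
)
print(shlex.join(cmd))  # shell-quoted command, ready to paste into a terminal
```

Building the speculative-config with json.dumps avoids hand-quoting the JSON string on the command line.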

Safetensors

Model size: 3B params
Tensor type: BF16

Model tree for k-l-lambda/kimi-k2.6-eagle3-mla

Base model: moonshotai/Kimi-K2.6