Kimi-K2.7-Code-DFlash

DFlash draft GGUFs for speculative decoding with Kimi K2.7 Code in Oxidize.

This is a full DFlash initialization, not a baseinit draft. The Kimi K2.7 MLA attention weights are folded into the DFlash Q/K/V/O projections, and the MLP weights are taken from the shared experts of source layers [1, 12, 24, 35, 47, 58].

Files

Kimi-K2.7-Code-DFlash-full-F32.gguf: F32 draft GGUF
Kimi-K2.7-Code-DFlash-full-Q4_0.gguf: Q4_0 quantized draft GGUF
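
As a rough guide to download and memory size, the sketch below estimates the tensor-data bytes for each format from the roughly 2B parameters reported for this model. It relies on ggml's Q4_0 layout, where each block of 32 weights is stored as an fp16 scale plus 16 bytes of packed 4-bit values (18 bytes, or 4.5 bits per weight). It assumes every weight is quantized. In practice some tensors, such as norms, usually stay in F32, and GGUF metadata adds overhead, so real file sizes will differ. The function names are illustrative only.

```python
# ggml Q4_0 layout: 32 weights per block, stored as an fp16 scale
# followed by 16 bytes holding the 32 packed 4-bit values.
QK4_0 = 32
Q4_0_BLOCK_BYTES = 2 + QK4_0 // 2  # 18 bytes per block


def f32_bytes(n_weights: int) -> int:
    # F32 stores every weight in 4 bytes.
    return 4 * n_weights


def q4_0_bytes(n_weights: int) -> int:
    # Round up to whole blocks (assumption: every weight is quantized).
    n_blocks = -(-n_weights // QK4_0)
    return n_blocks * Q4_0_BLOCK_BYTES


if __name__ == "__main__":
    n = 2_000_000_000  # approximate parameter count reported for this model
    print(f"F32:  {f32_bytes(n) / 1e9:.2f} GB")
    print(f"Q4_0: {q4_0_bytes(n) / 1e9:.2f} GB")
```

Under these assumptions the F32 draft needs about 8 GB for its weights and the Q4_0 draft about 1.1 GB, roughly a 7x reduction.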

Intended use

Use these files as DFlash draft models alongside a compatible Kimi K2.7 Code target model in Oxidize speculative decoding workflows.
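
In speculative decoding, the draft proposes a block of tokens and the target model verifies them in a single forward pass. The sketch below shows a common greedy acceptance rule for that verification step. It is a generic illustration, not Oxidize's implementation, and the function name `accept_greedy` and its inputs are hypothetical. It assumes greedy (argmax) decoding on the target side; sampling-based verification uses a different, probabilistic rule.

```python
def accept_greedy(draft: list[int], target_argmax: list[int]) -> list[int]:
    """Return the tokens committed after one verify step (greedy decoding).

    draft: the k tokens proposed by the draft model.
    target_argmax: k + 1 tokens; entry i is the target model's greedy
    choice at the position of draft[i], and the last entry is its
    choice after the whole drafted block.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    accepted: list[int] = []
    for proposed, expected in zip(draft, target_argmax):
        if proposed != expected:
            # First mismatch: keep the target's token instead and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
    # Every drafted token matched, so the target adds one bonus token.
    accepted.append(target_argmax[-1])
    return accepted
```

For example, `accept_greedy([5, 7, 9], [5, 7, 2, 4])` commits `[5, 7, 2]`: the first two drafted tokens match, and the target's own choice replaces the third. When every drafted token matches, one verify step commits k + 1 tokens, which is where the speedup over plain decoding comes from.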

Notes

Architecture: dflash-draft
Hidden size: 7168
Draft layers: 6
Attention heads: 64
KV heads: 64
Intermediate size: 2048
Target layer IDs: [1, 12, 24, 35, 47, 58]
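
The values above can be mirrored in a small config object to make the relationships between them explicit. This is a hypothetical sketch: the class and field names are illustrative and are not the GGUF metadata keys Oxidize reads. Two points follow directly from the Notes. With 64 KV heads for 64 attention heads, each query head has its own KV head (plain multi-head attention, not grouped-query attention). There are also six target layer IDs for six draft layers. Pairing draft layer i with the i-th listed target layer is an assumption, not something this card states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFlashDraftConfig:
    """Hypothetical mirror of the Notes values (not the GGUF key names)."""

    hidden_size: int = 7168
    draft_layers: int = 6
    attention_heads: int = 64
    kv_heads: int = 64
    intermediate_size: int = 2048
    target_layer_ids: tuple[int, ...] = (1, 12, 24, 35, 47, 58)

    def __post_init__(self) -> None:
        # Assumption: one target layer ID per draft layer, as in this card.
        if len(self.target_layer_ids) != self.draft_layers:
            raise ValueError("expected one target layer ID per draft layer")
        if list(self.target_layer_ids) != sorted(set(self.target_layer_ids)):
            raise ValueError("target layer IDs must be unique and ascending")
        if self.attention_heads % self.kv_heads != 0:
            raise ValueError("attention heads must be a multiple of KV heads")

    @property
    def kv_group_size(self) -> int:
        # Query heads sharing each KV head; 1 means plain multi-head attention.
        return self.attention_heads // self.kv_heads

    def source_layer(self, draft_layer: int) -> int:
        # Assumption: draft layer i draws from the i-th listed target layer.
        return self.target_layer_ids[draft_layer]
```

With the defaults, `DFlashDraftConfig().kv_group_size` is 1, and `source_layer(5)` returns 58 under the pairing assumption.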

Model details

Format: GGUF
Model size: 2B params
Precision: 4-bit (Q4_0) and 32-bit (F32)
Base model: moonshotai/Kimi-K2.7-Code
Repository: freakyskittle/Kimi-K2.7-Code-Dflash