Kimi-K2.7-Code-DFlash

DFlash draft GGUFs for speculative decoding with Kimi K2.7 Code in Oxidize.

This is a full DFlash initialization, not a baseinit draft. The draft folds real Kimi K2.7 MLA attention weights into DFlash Q/K/V/O projections and uses real shared-expert MLP weights from source layers [1, 12, 24, 35, 47, 58].

Files

  • Kimi-K2.7-Code-DFlash-full-F32.gguf: F32 draft GGUF
  • Kimi-K2.7-Code-DFlash-full-Q4_0.gguf: Q4_0 quantized draft GGUF
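
A quick sanity check after downloading either file is to read the fixed-size GGUF header. The sketch below assumes a little-endian GGUF file of version 2 or later, where the header is the 4-byte magic `GGUF`, a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. The helper name `read_gguf_header` is hypothetical and is not part of Oxidize.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes a little-endian GGUF v2+ layout; this is an illustrative helper,
    not an Oxidize API.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

For example, `read_gguf_header("Kimi-K2.7-Code-DFlash-full-Q4_0.gguf")` should return three integers if the download is intact, and raise `ValueError` if the file is truncated or is not a GGUF file.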

Intended use

Use these files as DFlash draft models alongside a compatible Kimi K2.7 Code target model in Oxidize speculative decoding workflows.
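
For readers new to speculative decoding, the following is a minimal sketch of one generic greedy draft-and-verify step. It does not show DFlash internals or any Oxidize API: `draft_propose` and `target_next` are hypothetical callables standing in for the draft and target models, and the target is called once per token here for clarity, whereas a real engine would verify the whole block in one forward pass.

```python
def speculative_step(prefix, draft_propose, target_next, block_size):
    """Run one greedy draft-and-verify step and return the new tokens.

    prefix        -- list of token IDs generated so far
    draft_propose -- hypothetical callable: (prefix, n) -> list of n draft tokens
    target_next   -- hypothetical callable: (tokens) -> the target's next token
    """
    proposed = draft_propose(prefix, block_size)
    accepted = []
    for token in proposed:
        expected = target_next(prefix + accepted)
        if token != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)
    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted
```

Under this greedy scheme the output is identical to decoding with the target alone; the draft only changes how many tokens are produced per verification step.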

Notes

  • Architecture: dflash-draft
  • Hidden size: 7168
  • Draft layers: 6
  • Attention heads: 64
  • KV heads: 64
  • Intermediate size: 2048
  • Target layer IDs: [1, 12, 24, 35, 47, 58]
  • Model size: 2B params
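
The values listed under Notes can be collected into a small configuration object for scripting or sanity checks. This is a sketch only: the `DraftConfig` class and its field names are hypothetical and do not correspond to GGUF metadata keys, and the in-order pairing of draft layers with target layer IDs is an assumption rather than something this card states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftConfig:
    """Hypothetical container for the values listed under Notes."""
    architecture: str
    hidden_size: int
    draft_layers: int
    attention_heads: int
    kv_heads: int
    intermediate_size: int
    target_layer_ids: tuple

    def layer_map(self):
        """Map draft layer index -> target layer ID (assumed to be in order)."""
        if len(self.target_layer_ids) != self.draft_layers:
            raise ValueError("expected one target layer ID per draft layer")
        return dict(enumerate(self.target_layer_ids))


CONFIG = DraftConfig(
    architecture="dflash-draft",
    hidden_size=7168,
    draft_layers=6,
    attention_heads=64,
    kv_heads=64,  # equal to attention_heads, so no KV heads are shared
    intermediate_size=2048,
    target_layer_ids=(1, 12, 24, 35, 47, 58),
)
```

Calling `CONFIG.layer_map()` confirms that there is exactly one target layer ID for each of the 6 draft layers.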