Kimi-K2.7-Code-DFlash
DFlash draft GGUFs for speculative decoding with Kimi K2.7 Code in Oxidize.
This is a full DFlash initialization, not a baseinit draft. The draft folds real Kimi K2.7 MLA attention weights into DFlash Q/K/V/O projections and uses real shared-expert MLP weights from source layers [1, 12, 24, 35, 47, 58].
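The six source layer IDs match what you get by spacing six points evenly between layer 1 and layer 58 and rounding to the nearest integer. The sketch below only checks that observation. Whether the IDs were actually chosen this way is an assumption, and `evenly_spaced_layers` is a hypothetical helper, not part of Oxidize or any DFlash tooling.

```python
from typing import List


def evenly_spaced_layers(first: int, last: int, count: int) -> List[int]:
    """Return `count` layer indices spread evenly over [first, last].

    Hypothetical helper for illustration only: each point is
    first + i * step, rounded to the nearest integer.
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (last - first) / (count - 1)
    return [round(first + i * step) for i in range(count)]


if __name__ == "__main__":
    # IDs listed in this model card.
    listed = [1, 12, 24, 35, 47, 58]
    print(evenly_spaced_layers(1, 58, 6) == listed)
```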
Files
- Kimi-K2.7-Code-DFlash-full-F32.gguf: F32 draft GGUF
- Kimi-K2.7-Code-DFlash-full-Q4_0.gguf: Q4_0 quantized draft GGUF
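For a rough sense of the size difference between the two files, the sketch below compares per-tensor storage for F32 and Q4_0. It uses the standard GGML Q4_0 layout: blocks of 32 weights, each stored as one 16-bit scale plus 32 four-bit values (18 bytes per block, or 4.5 bits per weight). The example tensor shape (hidden size × intermediate size) is an illustrative assumption, and a real GGUF usually keeps some tensors at higher precision, so the whole-file ratio will differ.

```python
Q4_0_BLOCK_WEIGHTS = 32          # weights per Q4_0 block
Q4_0_BLOCK_BYTES = 2 + 32 // 2   # fp16 scale + 32 packed 4-bit values = 18


def q4_0_bits_per_weight() -> float:
    """Effective bits per weight for Q4_0, including the per-block scale."""
    return Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS


def tensor_bytes(n_weights: int, fmt: str) -> int:
    """Storage in bytes for a tensor of n_weights in the given format."""
    if fmt == "F32":
        return n_weights * 4
    if fmt == "Q4_0":
        if n_weights % Q4_0_BLOCK_WEIGHTS != 0:
            raise ValueError("Q4_0 needs a multiple of 32 weights")
        return n_weights // Q4_0_BLOCK_WEIGHTS * Q4_0_BLOCK_BYTES
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    # Illustrative shape only: hidden size x intermediate size from the Notes.
    n = 7168 * 2048
    print("F32 bytes: ", tensor_bytes(n, "F32"))
    print("Q4_0 bytes:", tensor_bytes(n, "Q4_0"))
    print("Q4_0 bits per weight:", q4_0_bits_per_weight())
```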
Intended use
Use these files as DFlash draft models alongside a compatible Kimi K2.7 Code target model in Oxidize speculative decoding workflows.
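As background on how a draft model is used, here is a minimal sketch of the greedy verification step in generic speculative decoding. The draft proposes a run of tokens, the target scores them in one pass, and the longest agreeing prefix is kept. This is not Oxidize's API or DFlash's exact acceptance rule. `accept_draft` and the token values are hypothetical, and sampling-based acceptance is not covered.

```python
from typing import List


def accept_draft(draft_tokens: List[int], target_tokens: List[int]) -> List[int]:
    """Greedy verification for one speculative decoding step.

    draft_tokens:  k tokens proposed by the draft model.
    target_tokens: k + 1 tokens; entry i is the target model's greedy choice
                   at position i given the context plus draft_tokens[:i].
    Returns the tokens to append to the context.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")

    accepted: List[int] = []
    for proposed, verified in zip(draft_tokens, target_tokens):
        if proposed != verified:
            # First disagreement: take the target's token and stop.
            accepted.append(verified)
            return accepted
        accepted.append(proposed)

    # Every draft token matched, so the target's extra token comes for free.
    accepted.append(target_tokens[-1])
    return accepted


if __name__ == "__main__":
    print(accept_draft([5, 6, 7], [5, 6, 9, 1]))
    print(accept_draft([5, 6], [5, 6, 8]))
```

Each step appends at least one token chosen by the target, so with greedy decoding the output is identical to running the target model alone. The draft only changes how many tokens are produced per target pass.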
Notes
- Architecture: dflash-draft
- Hidden size: 7168
- Draft layers: 6
- Attention heads: 64
- KV heads: 64
- Intermediate size: 2048
- Target layer IDs: [1, 12, 24, 35, 47, 58]
Base model
- moonshotai/Kimi-K2.7-Code (this repository: freakyskittle/Kimi-K2.7-Code-Dflash)