DiffusionGemma 26B-A4B-it — MXFP4 (Osaurus / vMLX)

Native MLX MXFP4 quantization of google/diffusiongemma-26B-A4B-it, a block-diffusion language model (not autoregressive): text is generated in 256-token canvases that are refined by iterative denoising. The architecture is a 30-layer Gemma-4-style mixture of experts (MoE) with 128 experts and top-8 routing, for 26B total and ~4B active parameters.
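To make "128 experts, top-8" concrete, here is a minimal sketch of generic top-k softmax routing for a single token. It assumes the Gemma-4-style router follows this textbook form (pick the 8 highest router logits, softmax over just those); the real router's normalization and scaling may differ, and the function name `route` is illustrative. Because only the selected experts run for each token, only a fraction of the 26B parameters (~4B) is active per token.

```python
import math
import random

NUM_EXPERTS = 128  # experts per MoE layer, from the model card
TOP_K = 8          # experts activated per token, from the model card


def route(router_logits, k=TOP_K):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Generic top-k softmax routing (an assumption, not the confirmed
    Gemma-4 router): weights are a softmax over the selected logits only.
    """
    # Indices of the k largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts (subtract max for stability).
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    weights = route(logits)
    # Eight expert indices, and weights that sum to 1.
    print(sorted(weights), round(sum(weights.values()), 6))
```

Taking the softmax over the selected logits gives the same weights as a full softmax renormalized over the top 8, so either formulation works for a sketch like this.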

Runs natively in Osaurus on Apple Silicon via the vmlx-swift block-diffusion engine.

Quantization

Attention + routed MoE experts: MXFP4 (group 32)
Dense MLP + router projections: MXFP8 (group 32)
Embeddings, norms, self-conditioning, vision tower: fp16 passthrough
15 shards, ~15 GB on disk, peak runtime RSS ≈ 12.7 GB (M5 Max)
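The sketch below illustrates what "MXFP4 (group 32)" means, following the OCP Microscaling (MX) format: each group of 32 values shares one 8-bit power-of-two scale, and each value is stored as a 4-bit E2M1 float. It is illustrative only. MLX's actual kernels, rounding, and packing of codes into U32 words are not shown, and the helper names are hypothetical. Ties are broken toward the smaller magnitude here, a simplification of the spec's round-half-to-even.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (OCP MX spec).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2    # exponent of the largest E2M1 value (6 = 1.5 * 2**2)
GROUP_SIZE = 32  # group size used for this checkpoint


def quantize_block(block):
    """Quantize one group to (shared scale exponent, FP4 element values)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared E8M0 scale: 2 ** (floor(log2(amax)) - emax of the element format).
    scale_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** scale_exp
    elems = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_VALUES[-1])  # saturate at 6
        # Nearest representable magnitude; min() keeps the first (smaller)
        # candidate on ties, a simplification of round-half-to-even.
        q = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        elems.append(math.copysign(q, x))
    return scale_exp, elems


def dequantize(scale_exp, elems):
    """Reconstruct approximate values from the shared scale and elements."""
    return [e * 2.0 ** scale_exp for e in elems]


def bits_per_weight(element_bits, group_size=GROUP_SIZE, scale_bits=8):
    """Storage cost per weight, including the amortized shared scale."""
    return element_bits + scale_bits / group_size


if __name__ == "__main__":
    print(quantize_block([0.7, 12.0]))  # scale 2**1, elements [0.5, 6.0]
    print(bits_per_weight(4), bits_per_weight(8))  # MXFP4, MXFP8
```

At 4.25 bits per weight, storing all 26B parameters in MXFP4 would take about 26e9 × 4.25 / 8 ≈ 13.8 GB. Since the fp16 and MXFP8 tensors cost more per weight, this is only a rough floor, roughly consistent with the ~15 GB on disk.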

Capabilities

Capability	Status
Text generation	✅ block diffusion (~37 tok/s at the bundle default of 48 steps, ~74 tok/s at 16 steps; M5 Max)
Vision (single/multi-image)	✅ Gemma-4 unified vision tower, 280 soft tokens per image
Tool calling	✅ Gemma-4 format `<|tool_call>call:name{...}<tool_call|>`
Reasoning channel	✅ harmony `<|channel>thought…<channel|>`
Audio	❌ not in this checkpoint (no audio_config)
Video	❌ not supported (no `video_token_id` in the config)
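Osaurus parses these formats itself. For a client that receives raw decoded text, though, the markers in the table above can be split out roughly as follows. This is a sketch under two assumptions: the marker strings appear verbatim in the decoded output, and the argument body inside `{...}` is returned unparsed because its grammar is not documented here. `split_output` and the regex names are hypothetical.

```python
import re

# Marker strings taken from the capability table; assumed to survive decoding.
THOUGHT_RE = re.compile(r"<\|channel>thought(.*?)<channel\|>", re.DOTALL)
TOOL_RE = re.compile(
    r"<\|tool_call>call:([A-Za-z_][\w.\-]*)(\{.*?\})<tool_call\|>", re.DOTALL
)


def split_output(text):
    """Split raw model text into (thoughts, tool_calls, visible_content).

    tool_calls is a list of (name, raw_argument_string) pairs; arguments
    are left as raw text rather than parsed.
    """
    thoughts = [t.strip() for t in THOUGHT_RE.findall(text)]
    calls = TOOL_RE.findall(text)
    content = TOOL_RE.sub("", THOUGHT_RE.sub("", text)).strip()
    return thoughts, calls, content


if __name__ == "__main__":
    raw = ("<|channel>thought Need weather.<channel|>Sure."
           "<|tool_call>call:get_weather{city:Paris}<tool_call|>")
    print(split_output(raw))
```

The non-greedy `\{.*?\}` keeps extending until it reaches a literal `}<tool_call|>`, so nested braces inside the argument body do not end the match early.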

Generation contract

All diffusion sampling parameters live in generation_config.json and are honored by the runtime: max_denoising_steps=48, entropy bound 0.1, temperature schedule 0.8→0.4, stability 1, confidence 0.005, eos_token_id=[1, 106, 50], pad 0. Request-level (wire) temperature and top_p are ignored by design, because the denoising schedule is owned by the bundle. The speed/quality trade-off is instead controlled by the denoising-step budget: Osaurus exposes it as a server setting whose default of 16 steps runs about 2× faster than the bundle default of 48 and has been verified to produce coherent output. Below 12 steps, quality degrades.
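The toy sketch below shows how a fixed step budget can turn into a commit schedule on one canvas. It is not the vmlx-swift engine's algorithm. It assumes a linear temperature ramp between the 0.8 and 0.4 endpoints (the card gives only the endpoints), a hypothetical `score_fn` standing in for the model, and plain confidence-ranked unmasking. It ignores the bundle's entropy bound, stability, and confidence parameters, whose exact semantics are not documented here.

```python
import math
import random

MASK = -1  # hypothetical placeholder for a not-yet-committed position


def temperature_at(step, num_steps, t_start=0.8, t_end=0.4):
    """Linearly interpolate the temperature (assumed schedule shape)."""
    if num_steps <= 1:
        return t_end
    return t_start + (t_end - t_start) * (step / (num_steps - 1))


def denoise_canvas(score_fn, canvas_len=256, num_steps=16, seed=0):
    """Toy confidence-ranked unmasking of one canvas.

    score_fn(canvas, temperature) -> list of (token, confidence), one per
    position. Each step commits the most confident masked positions, sized
    so the canvas is fully committed after num_steps steps.
    """
    rng = random.Random(seed)
    canvas = [MASK] * canvas_len
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(canvas) if tok == MASK]
        if not masked:
            break
        n_commit = math.ceil(len(masked) / (num_steps - step))
        preds = score_fn(canvas, temperature_at(step, num_steps))
        rng.shuffle(masked)  # random tie-breaking among equal confidences
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:n_commit]:
            canvas[i] = preds[i][0]
    return canvas
```

With `canvas_len=256`, a 16-step budget commits exactly 16 positions per step, while the bundle default of 48 commits 5 or 6. Committing more positions per model call is where the speed-up of a smaller budget comes from, and also why very small budgets eventually hurt quality.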

The chat template (chat_template.jinja) ships in this repo, including tool-call and thinking-channel rendering.

Known behavior

Very terse prompts under greedy denoising can occasionally converge to an empty canvas, where EOS is the first token. This is inherent to the reference sampling algorithm, which initializes the canvas randomly; retry or rephrase the prompt.
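A client-side retry can be sketched as follows: cut the canvas at the first EOS id from generation_config.json, drop padding, and regenerate if nothing is left. `generate(prompt, seed)` is a hypothetical callable, and whether your serving path lets you vary a seed is an assumption. Otherwise, simply repeat the request or rephrase the prompt.

```python
EOS_TOKEN_IDS = {1, 106, 50}  # eos_token_id from generation_config.json
PAD_TOKEN_ID = 0              # pad id from generation_config.json


def trim_canvas(tokens, eos_ids=EOS_TOKEN_IDS, pad_id=PAD_TOKEN_ID):
    """Keep tokens before the first EOS id, dropping padding."""
    out = []
    for tok in tokens:
        if tok in eos_ids:
            break
        if tok != pad_id:
            out.append(tok)
    return out


def generate_with_retry(generate, prompt, max_attempts=3):
    """Re-run a hypothetical generate(prompt, seed) while the result is empty."""
    for seed in range(max_attempts):
        out = trim_canvas(generate(prompt, seed))
        if out:
            return out
    return []  # every attempt produced an EOS-first canvas
```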


Safetensors (MLX, quantized)

Model size: 5B params as reported by the Hub (packed U32 quantized tensors are counted as single parameters; the model itself has 26B)
Tensor types: F16, U32

Model tree for OsaurusAI/diffusiongemma-26B-A4B-it-MXFP4

Base model: google/diffusiongemma-26B-A4B-it