
DiffusionGemma 26B-A4B-it — MXFP4 (Osaurus / vMLX)

Native MLX MXFP4 quantization of google/diffusiongemma-26B-A4B-it, a block-diffusion language model (NOT autoregressive): text is generated as 256-token canvases that are refined by iterative denoising. The architecture is a 30-layer Gemma-4-style MoE with 128 experts (top-8 routing), 26B total / ~4B active parameters.

Runs natively in Osaurus on Apple Silicon via the vmlx-swift block-diffusion engine.

Quantization

  • Attention + routed MoE experts: MXFP4 (group 32)
  • Dense MLP + router projections: MXFP8 (group 32)
  • Embeddings, norms, self-conditioning, vision tower: fp16 passthrough
  • 15 shards, ~15 GB on disk, peak runtime RSS ≈ 12.7 GB (M5 Max)
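
To make the group-32 layout above concrete, here is a minimal sketch of MXFP4-style block quantization in plain Python. It assumes the OCP Microscaling convention (one shared 8-bit power-of-two scale per group of 32, FP4 E2M1 elements); the function names are hypothetical, and the MLX kernels used for this checkpoint may round or pick scales differently.

```python
import math

# Magnitudes representable by an FP4 (E2M1) element.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_group(values):
    """Quantize one group to (shared exponent, signed FP4 grid values)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0.0] * len(values)
    # Shared exponent: floor(log2(amax)) minus the largest FP4 exponent
    # (2, because 6.0 = 1.5 * 2**2). This is the assumed scale rule.
    exp = math.floor(math.log2(amax)) - 2
    scale = 2.0 ** exp
    codes = []
    for v in values:
        mag = min(abs(v) / scale, 6.0)  # saturate at the largest FP4 value
        nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
        codes.append(math.copysign(nearest, v))
    return exp, codes

def dequantize_group(exp, codes):
    """Rebuild approximate weights from the shared exponent and FP4 values."""
    return [c * 2.0 ** exp for c in codes]

def bits_per_weight(element_bits, group_size=32, scale_bits=8):
    """Effective storage cost once the shared scale is amortized."""
    return element_bits + scale_bits / group_size

print(bits_per_weight(4))  # MXFP4: 4.25
print(bits_per_weight(8))  # MXFP8: 8.25
```

Under these assumptions the shared scale adds a quarter of a bit per weight, so the MXFP4 tensors cost about 4.25 bits per weight and the MXFP8 tensors about 8.25.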

Capabilities

  • Text generation: ✅ block diffusion (~37 tok/s at the bundle default of 48 steps, ~74 tok/s at 16 steps, M5 Max)
  • Vision (single/multi image): ✅ Gemma-4 unified vision tower, 280 soft tokens/image
  • Tool calling: ✅ Gemma-4 format <|tool_call>call:name{...}<tool_call|>
  • Reasoning channel: ✅ harmony <|channel>thought…<channel|>
  • Audio: ❌ not in this checkpoint (no audio_config)
  • Video: ❌ not in this checkpoint (no video_token_id)
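
As an illustration of the two text markers listed above, the following sketch pulls tool calls and thought spans out of raw model output with regular expressions. The helper names are hypothetical, the payload between the braces is returned as an unparsed string because its syntax is not specified here, and the sample argument text is made up for the demonstration.

```python
import re

# Matches <|tool_call>call:NAME{...}<tool_call|>; the brace payload is kept raw.
TOOL_CALL_RE = re.compile(
    r"<\|tool_call>call:([^{\s]+)(\{.*?\})<tool_call\|>", re.DOTALL
)
# Matches <|channel>thought ... <channel|>.
THOUGHT_RE = re.compile(r"<\|channel>thought(.*?)<channel\|>", re.DOTALL)

def extract_tool_calls(text):
    """Return (tool name, raw payload) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in TOOL_CALL_RE.finditer(text)]

def split_reasoning(text):
    """Return (list of thought spans, text with thought spans removed)."""
    thoughts = [m.group(1).strip() for m in THOUGHT_RE.finditer(text)]
    visible = THOUGHT_RE.sub("", text).strip()
    return thoughts, visible

# Made-up sample output; the argument syntax is a placeholder.
sample = (
    "<|channel>thought need the forecast<channel|>"
    '<|tool_call>call:get_weather{city:"Paris"}<tool_call|>'
)
thoughts, visible = split_reasoning(sample)
calls = extract_tool_calls(visible)
```

In practice the shipped chat template and the runtime's own parser are the authority on these formats; this only shows where the delimiters sit.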

Generation contract

All diffusion sampling parameters live in generation_config.json and are honored by the runtime: max_denoising_steps=48, entropy bound 0.1, temperature schedule 0.8→0.4, stability 1, confidence 0.005, eos_token_id=[1, 106, 50], pad token 0. The temperature and top_p values sent over the wire are ignored by design, because the denoising schedule is owned by the bundle. The speed/quality trade-off is controlled by the denoising-step budget, which Osaurus exposes as a server setting. Its default of 16 steps is about 2× faster than the bundle default of 48 and has been verified to produce coherent output; quality degrades below 12 steps.
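
The sketch below shows one way a client could reason about these settings. It reads only the two keys named above (max_denoising_steps and eos_token_id), and everything else is an assumption made for illustration: the helper names, the clamping of a server-side step budget to the bundle maximum, and the linear shape of the 0.8→0.4 temperature schedule. The actual runtime may behave differently.

```python
import json

def load_bundle_defaults(config_text):
    """Read the step budget and EOS ids from generation_config.json text."""
    cfg = json.loads(config_text)
    return cfg["max_denoising_steps"], cfg["eos_token_id"]

def resolve_steps(server_steps, bundle_max):
    """Clamp a server-side step budget to [1, bundle_max] (assumed policy)."""
    return max(1, min(server_steps, bundle_max))

def temperature_at(step, total_steps, start=0.8, end=0.4):
    """Temperature for a 0-based step, assuming linear interpolation."""
    if total_steps <= 1:
        return start
    frac = step / (total_steps - 1)
    return start + (end - start) * frac

# Inline stand-in for the file contents, using values quoted in this card.
config_text = '{"max_denoising_steps": 48, "eos_token_id": [1, 106, 50]}'
bundle_max, eos_ids = load_bundle_defaults(config_text)
steps = resolve_steps(16, bundle_max)
schedule = [temperature_at(i, steps) for i in range(steps)]
```

With a budget of 16 steps, this assumed schedule starts at 0.8 on the first step and reaches 0.4 on the last.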

The chat template (chat_template.jinja) ships in this repo, including tool-call and thinking-channel rendering.

Known behavior

Very terse prompts under greedy denoising can occasionally converge to an empty (EOS-first) canvas. This is inherent to the reference sampling algorithm with random canvas initialization; retry or rephrase the prompt.
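
A client-side retry for the empty-canvas case might look like the sketch below. The `generate` argument is a hypothetical callable standing in for whatever client call returns the model's text, and the sketch assumes each call starts from a fresh random canvas.

```python
def generate_with_retry(generate, prompt, max_attempts=3):
    """Call generate(prompt) until it returns non-empty text.

    Returns an empty string if every attempt yields an empty canvas,
    so the caller can decide whether to rephrase the prompt.
    """
    for _ in range(max_attempts):
        text = generate(prompt)
        if text.strip():
            return text
    return ""

# Stand-in generator that yields one empty canvas, then a real answer.
_outputs = iter(["", "Hello!"])
reply = generate_with_retry(lambda p: next(_outputs), "hi")
```

If retries keep coming back empty, a slightly longer or more specific prompt is the other remedy mentioned above.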
