Gemma-4-12B-it AEON Abliterated — K=4 GGUF Quants

This repository contains official GGUF quantizations of AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-BF16.

The base model is an abliteration of google/gemma-4-12B-it. It uses a custom K=4 multi-direction, norm-preserving biprojection, which extends standard biprojection recipes with a K-dimensional orthonormal basis drawn from the K layers with the highest signal-to-noise ratio (SNR). This workflow preserves generative quality, reduces WikiText perplexity (PPL) drift relative to K=1 methods, and removes standard refusal patterns.
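
To make the idea of a K-dimensional, norm-preserving projection more concrete, here is a minimal pure-Python sketch. It is not the author's recipe: the function names are hypothetical, the direction vectors are assumed to be given (per-layer SNR scoring and the biprojection step itself are not reproduced), and rescaling each ablated vector back to its original length is only one plausible reading of "norm-preserving". A real implementation would operate on weight tensors rather than Python lists.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def orthonormalize(directions, tol=1e-10):
    """Modified Gram-Schmidt: turn K raw direction vectors into an orthonormal basis."""
    basis = []
    for d in directions:
        v = list(d)
        for q in basis:
            c = dot(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        n = norm(v)
        if n > tol:  # skip directions already spanned by earlier ones
            basis.append([vi / n for vi in v])
    return basis

def ablate_norm_preserving(w, basis, tol=1e-10):
    """Remove the span of `basis` from vector w, then rescale to w's original norm.

    Assumption (not from the model card): w is one column of a weight matrix
    that writes into the residual stream, so it lives in the same space as
    the direction vectors.
    """
    original = norm(w)
    v = list(w)
    for q in basis:
        c = dot(v, q)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    remaining = norm(v)
    if remaining <= tol:  # w lay entirely inside the ablated subspace
        return v
    scale = original / remaining
    return [vi * scale for vi in v]
```

Because the final rescaling multiplies the whole vector by a scalar, the result stays orthogonal to every basis direction while keeping the original magnitude. With K=1 the basis holds a single direction, which corresponds to the single-direction case the card compares against.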

Provided Quantization Tiers

| File Name | Quant Type | Size | VRAM / Hardware Recommendation |
|---|---|---|---|
| Gemma-4-12B-it-AEON-Abliterated-Q3_K_M.gguf | 3-bit | 6.09 GB | Low-VRAM setups / CPU+GPU split |
| Gemma-4-12B-it-AEON-Abliterated-Q4_K_M.gguf | 4-bit | 7.38 GB | Recommended balance; fits entirely on a single T4 / RTX 3060 / RTX 4060 |
| Gemma-4-12B-it-AEON-Abliterated-Q5_K_M.gguf | 5-bit | 8.55 GB | Low quality degradation; very tight on 8 GB VRAM (may need partial CPU offload) |
| Gemma-4-12B-it-AEON-Abliterated-Q6_K.gguf | 6-bit | 9.79 GB | High fidelity; well suited to 12 GB+ VRAM |
| Gemma-4-12B-it-AEON-Abliterated-Q8_0.gguf | 8-bit | 12.70 GB | Near-identical to BF16 precision; fits comfortably in 16 GB VRAM |
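
As a rough aid for choosing a tier, the sketch below picks the largest file from the table that fits a given VRAM budget. The file sizes are copied from the table; the helper name and the `overhead_gb` default (headroom for KV cache and runtime buffers) are hypothetical placeholders, not measured values, and should be raised for long contexts.

```python
# File sizes in GB, as listed in the table above.
QUANT_SIZES_GB = {
    "Q3_K_M": 6.09,
    "Q4_K_M": 7.38,
    "Q5_K_M": 8.55,
    "Q6_K": 9.79,
    "Q8_0": 12.70,
}

def largest_fitting_quant(vram_gb, overhead_gb=0.5):
    """Return the largest quant whose file size plus overhead fits, or None.

    overhead_gb is an assumed allowance for KV cache and buffers; it is a
    placeholder and grows with context length.
    """
    budget = vram_gb - overhead_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None  # nothing fits fully; consider a CPU+GPU split
    return max(fitting)[1]
```

For example, `largest_fitting_quant(8.0)` compares each file against a 7.5 GB budget. When the function returns `None`, a partial offload of the smallest quant is the remaining option.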

Deployment & Usage

1. llama.cpp CLI (llama-cli)

To run inference with full GPU acceleration, use a llama.cpp build compiled with the CUDA backend and offload all layers to VRAM:

llama-cli \
  --hf-repo Abiray/Gemma-4-12B-it-AEON-Abliterated-K4-GGUF \
  --hf-file Gemma-4-12B-it-AEON-Abliterated-Q4_K_M.gguf \
  -ngl 99 \
  -c 4096 \
  -p "<start_of_turn>user\nYour prompt here<end_of_turn>\n<start_of_turn>model\n"
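
The same load can be sketched from Python with llama-cpp-python. This is a minimal example under stated assumptions rather than a tested recipe for this model: it assumes llama-cpp-python was installed with CUDA support (for example via `CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python`), that `huggingface-hub` is available for the download, and that the repository id below matches where the files are hosted. The helper names are hypothetical; the prompt format simply mirrors the turn markers used in the command above.

```python
# Assumed repository id and file name; adjust if your copy lives elsewhere.
REPO_ID = "Abiray/Gemma-4-12B-it-AEON-Abliterated-K4-GGUF"
FILENAME = "Gemma-4-12B-it-AEON-Abliterated-Q4_K_M.gguf"

def build_prompt(user_message):
    """Wrap one user message in the turn markers used by the CLI example."""
    return (
        f"<start_of_turn>user\n{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

def generate(user_message, max_tokens=256):
    """Download the GGUF file if needed, load it on the GPU, and complete one turn."""
    # Imported lazily so the prompt helper works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=FILENAME,
        n_gpu_layers=-1,  # in llama-cpp-python, -1 offloads all layers
        n_ctx=4096,       # same context size as the CLI example
    )
    result = llm(
        build_prompt(user_message),
        max_tokens=max_tokens,
        stop=["<end_of_turn>"],
    )
    return result["choices"][0]["text"]
```

Calling `generate("Your prompt here")` fetches the Q4_K_M file on first use and returns the model's reply as a string. Swapping `FILENAME` for another tier from the table changes only the download and the VRAM footprint.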