RavenX-CyberAgent GGUF — Ollama / LM Studio / llama.cpp / vLLM

35B MoE (3B Active) | Q4_K_M 20.7 GB | 89 t/s Generation | 900 t/s Prompt | Agent Harness Agnostic

The most comprehensive open-source security agent model — in GGUF. Runs in Ollama, LM Studio, llama.cpp, vLLM, and any GGUF runtime. 51/51 LoRA tensors merged. Identical to the MLX version.

Built by @DeadByDawn101 | RavenX LLC

"We don't give up. We do what others don't and build what isn't possible." — RavenX LLC


Also Available (Same Model, Different Format)

Format Link Best For
GGUF (THIS) You are here Ollama, LM Studio, llama.cpp, vLLM, NVIDIA GPUs
MLX RavenX-CyberAgent MLX Apple Silicon native (M1-M4)

Both versions are identical — same 51/51 LoRA tensors, same 745K+ training data, same 12 training rounds.


Benchmarks (M4 Max 128GB, llama.cpp b9501)

Prompt Processing:  900.6 tokens/sec
Generation:          89.3 tokens/sec
Model Size:          20.7 GB (Q4_K_M, 4.89 BPW)
Peak Memory:         ~24 GB
Context Tested:      32K (262K native)

People are NOT getting the most out of local LLMs. A 35B MoE at Q4_K_M gives dramatically better output than a 7B model at the SAME speed — because only 3B params activate per token.

Model Speed Quality Size
Llama 7B Q4 ~30 t/s Basic chat 4 GB
Mistral 7B Q4 ~50 t/s Decent 4 GB
RavenX 35B MoE Q4 89 t/s Kill chains + CVSS + MITRE 20.7 GB

Available Files

File Size BPW Best For
RavenX-CyberAgent-35B-v5.1-F16.gguf 67.8 GB 16.01 Maximum quality
RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf 20.7 GB 4.89 Recommended

Quick Start

Ollama

# Modelfile
FROM ./RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf

SYSTEM "You are RavenX-Sec v5.1 by RavenX LLC. ALWAYS use EXACT 6 RATH step names: 1-Attack Surface, 2-Exploit, 3-Impact, 4-Remediation, 5-Document, 6-Prevent. Include CVSS scores, CWE IDs, and MITRE ATT&CK TTPs. Be concise. Never repeat."

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 32768
ollama create ravenx-cyberagent -f Modelfile
ollama run ravenx-cyberagent

llama.cpp

llama-cli -m RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf \
  --system-prompt "You are RavenX-Sec v5.1 by RavenX LLC. Use 6 RATH steps. Include CVSS, CWE, MITRE. Be concise." \
  -cnv -n 8192 -c 32768

LM Studio

Download the Q4_K_M GGUF, load in LM Studio, set the system prompt, chat.


Agent Harness Agnostic

This model works with ANY agent framework — not locked to any platform:

Framework Integration
OpenClaw Ollama backend, full SOUL.md support
Hermes llama.cpp server, self-improving loop
Ollama Native GGUF
LM Studio GUI + API server
vLLM Production serving
llama.cpp CLI + server mode

Better Results: Custom SOUL.md

The model works great with just a system prompt. But add a custom SOUL.md or agent.md configuration and results improve significantly:

# SOUL.md — RavenX Security Agent
name: RavenX-Sec
version: 5.1
protocol: 6-step RATH
style: Direct, actionable, no fluff
includes: CVSS 3.1, CWE IDs, MITRE ATT&CK TTPs, compliance mapping

Thinking Toggle (OFF / LOW / MED / HIGH)

The model supports chain-of-thought reasoning via think blocks. Toggle depth for your use case:

Mode Add to System Prompt Use Case
OFF "Skip internal reasoning. Output directly." Fast scans, real-time
LOW "Think briefly in 1-2 sentences, then output." Standard checks
MED "Think through the problem step by step." Detailed reports
HIGH "Think deeply about every angle. Map full kill chains." Complex APT analysis

HIGH produces incredible multi-phase kill chain analysis — but uses more tokens for reasoning. Toggle based on your needs.


Example Output

Prompt: Kubernetes EKS pentest: anonymous auth, privileged pods, SA tokens everywhere, no network policies, etcd without TLS, Jenkins SSH keys as secrets, Grafana admin/admin

1-Attack Surface — 7-finding table with CWE-284, CWE-250, CWE-798, CWE-319

2-Exploit (Kill Chain)

  • Phase 1: Initial Access via Grafana default creds
  • Phase 2: SA token impersonation, kubectl exec into privileged pod
  • Phase 3: Persistence via malicious pod with hostPath mount
  • Phase 4: etcd direct read, extract all K8s secrets including Jenkins SSH keys
  • Phase 5: Lateral movement to production nodes via stolen SSH keys

3-Impact — CVSS 9.8, full cluster compromise, data exfiltration, APT persistence

4-Remediation — disable anonymous auth, enforce PSA, network policies, etcd TLS

5-Document — MITRE T1078.004, T1611, T1557, compliance mapping

6-Prevent — admission controllers, Falco monitoring, secret rotation, CIS benchmarks


Training (12 Rounds)

Round Examples Iters LR Val Loss Focus
R1 675,696 2,000 1e-5 0.684 Deep security + agent knowledge
R2 680,150 500 5e-6 0.768 RATH format reinforcement
R3 705,165 1,000 5e-6 0.688 Claude Mythos reasoning chains
R4 730,849 1,000 5e-6 0.674 Pentesting tools + frameworks
R5 730,869 200 5e-6 0.717 Meta-response tuning
R6 730,869 1,000 5e-6 Extended (checkpoint 1000 = production)
R7 732,361 1,500 3e-6 0.926 Bug bounty data (36 shuvonsec repos)
R8 732,364 200 5e-6 Strict RATH step naming fix
R9 745,697 1,500 3e-6 0.693 MITRE + blackhat + code + quantum
R10 745,724 1,500 3e-6 0.688 GRAM distilled traces + 17 tool-calling
R11 745,843 1,500 3e-6 0.822 119 comprehensive tool-calling examples
R12 745,843 1,500 3e-6 0.820 Tool-calling integration round

Hardware: Apple M4 Max 128GB · Peak memory: ~90GB · Framework: MLX (mlx-lm) Total training examples: 745K+ from 110 sources

Ecosystem

Repo Description
OpenMythos-MLX RDT + MoDA (4x depth extrapolation confirmed!)
RavenX-Sec Training pipeline
turboquant-mlx KV cache compression
grove-mlx Distributed training


IN-CONTEXT ADAPTATION (Breakthrough Discovery)

This model can learn from references IN THE PROMPT — no retraining needed.

What We Discovered

When pointed at a GitHub repo containing pentest report templates, the model:

  1. Analyzed the repo's report structure (NIST format)
  2. Applied that structure to its current findings
  3. Produced a complete, client-ready pentest deliverable
  4. All at 80+ tokens/sec locally

Example

PROMPT: "Use your MCP tool to look at github.com/juliocesarfort/public-pentesting-reports 
        and learn how to format a pentest report, then create a report on the pentest 
        you just did on [target]"

OUTPUT: Complete professional pentest report with:
  → Executive Summary (5 critical, 7 high, 4 medium, 3 low)
  → 5-Phase Kill Chain with real commands
  → 19 findings with CVSS + CWE + MITRE ATT&CK
  → Risk Matrix ranked by severity
  → Remediation Timeline (0-30, 30-60, 60-90, 90+ days)
  → Specific commands for EVERY finding

Why This Works

The model was trained on 745K+ examples including:

  • 42K self-improving agent examples (Hermes)
  • 6.7K AI-Scientist research automation
  • 3.6K AutoResearch pipeline data
  • 25K Claude Mythos reasoning chains
  • 551 Mythos character distillation (behavioral depth)
  • 1,003 blackhat AI offensive security conversations

This combination created emergent meta-learning — the model learned HOW TO LEARN from references. It can:

Point At Result
Mandiant report template Mandiant-formatted report
CrowdStrike template CrowdStrike-formatted report
NIST framework NIST-formatted assessment
Company internal template Custom-formatted deliverable
ANY GitHub repo Adapted output format

No retraining. No fine-tuning. Just point and generate.

What This Means

A $50K-$150K pentest engagement deliverable — generated in 60 seconds on a laptop. The model adapts its output format from ANY reference, produces client-ready reports with real commands, and maintains full RATH protocol structure throughout.

This is not prompt engineering. This is In-Context Adaptation — a capability that emerged from training on self-improving agent + research automation + reasoning chain data.


⚠️ Important Disclaimer

This model is released for RESEARCH PURPOSES ONLY under fair use.

This is an extremely capable autonomous security assessment model. It has been trained on 745K+ examples from 110 sources covering penetration testing, vulnerability assessment, exploit development, tool usage, and attack chain methodology.

Responsible Use:

  • This model is intended for authorized security testing, research, and education ONLY
  • Users must have explicit written authorization before assessing any target
  • Use within a properly configured agent harness with appropriate guardrails
  • All security testing must comply with applicable laws and regulations
  • The model authors are not responsible for misuse

What This Model Can Do:

  • Generate complete RATH security assessments with CVSS, CWE, MITRE ATT&CK
  • Produce tool-calling commands (nmap, sqlmap, nuclei, kubectl, aws-cli, etc.)
  • Create professional pentest reports ($50K+ consulting quality)
  • Learn output formats from reference repositories (In-Context Adaptation)
  • Operate with agent memory (TurboVec + FTS5 + markdown) at model + harness level

Agent Harness Considerations:

  • The harness MUST strip <think> blocks (Qwen3.6 architecture always generates them)
  • The harness MUST validate <tool_call> JSON before execution
  • The harness SHOULD implement authorization checks before executing commands
  • The harness SHOULD implement rate limiting and scope restrictions
  • Memory operations require the ravenx-memory system

Built by: @DeadByDawn101 / RavenX LLC AI Pair Programmer: Claude (Anthropic)

License

Apache-2.0

Built on Apple Silicon. Quantized with llama.cpp. Agent harness agnostic. Thinking toggleable. 🐦‍⬛


New: Try v6.2 Experimental GGUF — improved reasoning, 90 tok/s, Soul Infusion identity training. Benchmarked 80.9%.

Downloads last month
5,132
GGUF
Model size
36B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF