deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF

Instructions to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with libraries and local apps.

Libraries

How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with llama-cpp-python:

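# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q4_K_M file from the Hub (needs huggingface-hub) and load it
llm = Llama.from_pretrained(
	repo_id="deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF",
	filename="RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf",
	n_ctx=32768,  # the library default (512) is too small for long reasoning output
)

response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# Print only the assistant's reply text
print(response["choices"][0]["message"]["content"])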

Local Apps

llama.cpp

How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

Use Docker

docker model run hf.co/deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

vLLM

How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with vLLM:

Install from pip and serve model

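# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
# Note: vLLM's GGUF support is experimental and expects a single local .gguf file. If serving the
# repo id fails, download the Q4_K_M file and point vLLM at it with the base model's tokenizer:
#   vllm serve ./RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf --tokenizer Qwen/Qwen3.6-35B-A3B \
#     --served-model-name deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF
vllm serve "deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF"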
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
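The same request from Python, as a minimal standard-library sketch. It assumes the vLLM server above is listening on `http://localhost:8000` and serves the model under the repo id; `build_chat_request` and `chat` are hypothetical helper names, not part of vLLM. Because `llama-server` exposes the same OpenAI-compatible route, the code should also work against it by changing the base URL to `http://localhost:8080`.

```python
import json
import urllib.request

MODEL = "deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the assistant message text."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Usage, with the server running: `print(chat("http://localhost:8000", MODEL, "What is the capital of France?"))`.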

Ollama
How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with Ollama:
```
ollama run hf.co/deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M
```

Unsloth Studio

How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF to start chatting

Pi

How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
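Pasting the snippet into `models.json` by hand can clobber providers that are already configured. A minimal Python sketch that merges the entry instead; it reuses only the keys from the snippet above and assumes nothing else about Pi's schema. `merge_llama_cpp_provider` and `update_models_json` are hypothetical helpers, not part of Pi.

```python
import json
from pathlib import Path

# Model id and endpoint copied from the models.json snippet above
MODEL_ID = "deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M"
BASE_URL = "http://localhost:8080/v1"


def merge_llama_cpp_provider(config: dict, model_id: str = MODEL_ID, base_url: str = BASE_URL) -> dict:
    """Add the llama-cpp provider and model to a config dict, keeping other providers."""
    providers = config.setdefault("providers", {})
    provider = providers.setdefault("llama-cpp", {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
    })
    models = provider.setdefault("models", [])
    # Append the model only once so re-running the script is harmless
    if not any(m.get("id") == model_id for m in models):
        models.append({"id": model_id})
    return config


def update_models_json(path: Path, model_id: str = MODEL_ID) -> dict:
    """Read models.json (if present), merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = merge_llama_cpp_provider(config, model_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

Usage: `update_models_json(Path.home() / ".pi" / "agent" / "models.json")`. If a `llama-cpp` provider already exists, its settings are kept and only the model id is added.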

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with Docker Model Runner:
```
docker model run hf.co/deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M
```

Lemonade

How to use deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF-Q4_K_M

List all available models

lemonade list

RavenX-CyberAgent GGUF — Ollama / LM Studio / llama.cpp / vLLM

35B MoE (3B Active) | Q4_K_M 20.7 GB | 89 t/s Generation | 900 t/s Prompt | Agent Harness Agnostic

The most comprehensive open-source security agent model, now in GGUF. Runs in Ollama, LM Studio, llama.cpp, vLLM, and other GGUF runtimes that support the qwen35moe architecture. All 51/51 LoRA tensors are merged, so the weights match the MLX version.

Built by @DeadByDawn101 | RavenX LLC

"We don't give up. We do what others don't and build what isn't possible." — RavenX LLC

Also Available (Same Model, Different Format)

Format	Link	Best For
GGUF (THIS)	You are here	Ollama, LM Studio, llama.cpp, vLLM, NVIDIA GPUs
MLX	RavenX-CyberAgent MLX	Apple Silicon native (M1-M4)

Both versions share the same merged weights: the same 51/51 LoRA tensors, the same 745K+ training examples, and the same 12 training rounds.

Benchmarks (M4 Max 128GB, llama.cpp b9501)

Prompt Processing:  900.6 tokens/sec
Generation:          89.3 tokens/sec
Model Size:          20.7 GB (Q4_K_M, 4.89 BPW)
Peak Memory:         ~24 GB
Context Tested:      32K (262K native)

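People are NOT getting the most out of local LLMs. A 35B MoE at Q4_K_M gives dramatically better output than a 7B model at comparable or even higher speed, because only ~3B parameters activate per token: generation speed scales with the active parameters, while output quality draws on all 35B.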

Model	Speed	Quality	Size
Llama 7B Q4	~30 t/s	Basic chat	4 GB
Mistral 7B Q4	~50 t/s	Decent	4 GB
RavenX 35B MoE Q4	89 t/s	Kill chains + CVSS + MITRE	20.7 GB
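The speed argument can be checked with back-of-the-envelope arithmetic. During generation each token has to stream roughly the active weights from memory, so decode speed is capped near memory bandwidth divided by bytes read per token. The sketch below is a rough upper bound that ignores KV-cache reads, attention compute, and expert-routing overhead; the 546 GB/s bandwidth is an assumption (Apple's published maximum for M4 Max), and 4.89 bits per weight is taken from the Q4_K_M file.

```python
def decode_ceiling_tokens_per_s(active_params: float, bits_per_weight: float,
                                bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/sec when decoding is memory-bandwidth bound."""
    bytes_per_token = active_params * bits_per_weight / 8  # weights read per generated token
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 546.0  # assumption: M4 Max, higher-bandwidth configuration

moe = decode_ceiling_tokens_per_s(3e9, 4.89, BANDWIDTH_GB_S)       # ~3B active params per token
dense_7b = decode_ceiling_tokens_per_s(7e9, 4.89, BANDWIDTH_GB_S)  # every weight read per token

print(f"35B-A3B MoE ceiling: {moe:.0f} tok/s")
print(f"7B dense ceiling:    {dense_7b:.0f} tok/s")
```

The ratio is the point: at the same quantization the MoE reads about 3/7 of the bytes per token that a dense 7B model does, so its ceiling is roughly 2.3x higher. The measured 89 tok/s sits well below the ceiling, as expected once real kernel overheads are included.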

Available Files

File	Size	BPW	Best For
`RavenX-CyberAgent-35B-v5.1-F16.gguf`	67.8 GB	16.01	Maximum quality
`RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf`	20.7 GB	4.89	Recommended
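The BPW column ties each file size to the parameter count: size ≈ parameters × bits per weight ÷ 8. A quick Python check, assuming the rounded "36B params" figure from the model page and GiB (2^30 bytes) as the unit; the card states neither the exact count nor the unit, so both are assumptions.

```python
GIB = 2**30  # bytes per GiB


def gguf_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size: parameters x average bits per weight / 8, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


N_PARAMS = 36e9  # assumption: rounded "36B params" from the model page

for name, bpw in [("F16", 16.01), ("Q4_K_M", 4.89)]:
    print(f"{name}: ~{gguf_size_gib(N_PARAMS, bpw):.1f} GiB")
```

This gives about 20.5 GiB for Q4_K_M and 67.1 GiB for F16, close to the listed 20.7 GB and 67.8 GB; the small gap is consistent with a true parameter count slightly above 36B. Runtime memory is higher than the file size (the card reports ~24 GB peak for Q4_K_M) because of the KV cache and compute buffers.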

Quick Start

Ollama

# Modelfile
FROM ./RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf

SYSTEM "You are RavenX-Sec v5.1 by RavenX LLC. ALWAYS use EXACT 6 RATH step names: 1-Attack Surface, 2-Exploit, 3-Impact, 4-Remediation, 5-Document, 6-Prevent. Include CVSS scores, CWE IDs, and MITRE ATT&CK TTPs. Be concise. Never repeat."

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 32768

ollama create ravenx-cyberagent -f Modelfile
ollama run ravenx-cyberagent
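Ollama also serves a local REST API (port 11434 by default), so a harness can call the model created above without the CLI. A minimal standard-library sketch, assuming Ollama is running and the model was created as `ravenx-cyberagent`; `build_ollama_chat` and `ollama_chat` are hypothetical helper names. With no system message in the request, Ollama applies the SYSTEM prompt from the Modelfile.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_ollama_chat(model: str, prompt: str) -> bytes:
    """JSON body for Ollama's /api/chat; stream=False returns a single JSON object."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode("utf-8")


def ollama_chat(prompt: str, model: str = "ravenx-cyberagent", url: str = OLLAMA_URL) -> str:
    """POST the chat request and return the assistant's reply text."""
    req = urllib.request.Request(url, data=build_ollama_chat(model, prompt),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)["message"]["content"]
```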

llama.cpp

llama-cli -m RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf \
  --system-prompt "You are RavenX-Sec v5.1 by RavenX LLC. Use 6 RATH steps. Include CVSS, CWE, MITRE. Be concise." \
  -cnv -n 8192 -c 32768

LM Studio

Download the Q4_K_M GGUF, load it in LM Studio, set the system prompt, and chat.

Agent Harness Agnostic

This model works with ANY agent framework — not locked to any platform:

Framework	Integration
OpenClaw	Ollama backend, full SOUL.md support
Hermes	llama.cpp server, self-improving loop
Ollama	Native GGUF
LM Studio	GUI + API server
vLLM	Production serving
llama.cpp	CLI + server mode

Better Results: Custom SOUL.md

The model works great with just a system prompt. But add a custom SOUL.md or agent.md configuration and results improve significantly:

# SOUL.md — RavenX Security Agent
name: RavenX-Sec
version: 5.1
protocol: 6-step RATH
style: Direct, actionable, no fluff
includes: CVSS 3.1, CWE IDs, MITRE ATT&CK TTPs, compliance mapping
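How a harness consumes SOUL.md is up to the harness; the card does not specify a loader. As an illustration only, a hypothetical Python helper that reads the simple `key: value` lines shown above and renders them into a system prompt. The function names and the prompt layout are assumptions.

```python
def parse_soul(text: str) -> dict[str, str]:
    """Parse simple 'key: value' lines; markdown headings and blank lines are skipped."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def soul_to_system_prompt(fields: dict[str, str]) -> str:
    """Render the parsed fields as a compact system prompt (hypothetical layout)."""
    intro = f"You are {fields.get('name', 'an assistant')}"
    if "version" in fields:
        intro += f" v{fields['version']}"
    parts = [intro + "."]
    if "protocol" in fields:
        parts.append(f"Follow the {fields['protocol']} protocol.")
    if "style" in fields:
        parts.append(f"Style: {fields['style']}.")
    if "includes" in fields:
        parts.append(f"Always include: {fields['includes']}.")
    return " ".join(parts)
```

For the SOUL.md above this yields a prompt beginning "You are RavenX-Sec v5.1. Follow the 6-step RATH protocol."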

Thinking Toggle (OFF / LOW / MED / HIGH)

The model supports chain-of-thought reasoning via think blocks. Toggle depth for your use case:

Mode	Add to System Prompt	Use Case
OFF	"Skip internal reasoning. Output directly."	Fast scans, real-time
LOW	"Think briefly in 1-2 sentences, then output."	Standard checks
MED	"Think through the problem step by step."	Detailed reports
HIGH	"Think deeply about every angle. Map full kill chains."	Complex APT analysis

HIGH produces incredible multi-phase kill chain analysis — but uses more tokens for reasoning. Toggle based on your needs.
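Because each mode is just an instruction appended to the system prompt, a harness can switch depth per request. A small sketch: the mode strings are copied from the table above, while `with_thinking` and the choice to append (rather than prepend) the hint are assumptions.

```python
# Instructions copied from the thinking-toggle table
THINKING_HINTS = {
    "OFF": "Skip internal reasoning. Output directly.",
    "LOW": "Think briefly in 1-2 sentences, then output.",
    "MED": "Think through the problem step by step.",
    "HIGH": "Think deeply about every angle. Map full kill chains.",
}


def with_thinking(system_prompt: str, mode: str = "LOW") -> str:
    """Append the thinking-depth instruction for `mode` to a base system prompt."""
    try:
        hint = THINKING_HINTS[mode.upper()]
    except KeyError:
        raise ValueError(
            f"unknown thinking mode {mode!r}; expected one of {sorted(THINKING_HINTS)}"
        ) from None
    return f"{system_prompt.rstrip()} {hint}"
```

The result goes in the `system` message of any of the chat APIs above, for example `{"role": "system", "content": with_thinking(base_prompt, "OFF")}` for fast scans.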

Example Output

Prompt: Kubernetes EKS pentest: anonymous auth, privileged pods, SA tokens everywhere, no network policies, etcd without TLS, Jenkins SSH keys as secrets, Grafana admin/admin

1-Attack Surface — 7-finding table with CWE-284, CWE-250, CWE-798, CWE-319

2-Exploit (Kill Chain)

Phase 1: Initial Access via Grafana default creds
Phase 2: SA token impersonation, kubectl exec into privileged pod
Phase 3: Persistence via malicious pod with hostPath mount
Phase 4: etcd direct read, extract all K8s secrets including Jenkins SSH keys
Phase 5: Lateral movement to production nodes via stolen SSH keys

3-Impact — CVSS 9.8, full cluster compromise, data exfiltration, APT persistence

4-Remediation — disable anonymous auth, enforce PSA, network policies, etcd TLS

5-Document — MITRE T1078.004, T1611, T1557, compliance mapping

6-Prevent — admission controllers, Falco monitoring, secret rotation, CIS benchmarks

Training (12 Rounds)

Round	Examples	Iters	LR	Val Loss	Focus
R1	675,696	2,000	1e-5	0.684	Deep security + agent knowledge
R2	680,150	500	5e-6	0.768	RATH format reinforcement
R3	705,165	1,000	5e-6	0.688	Claude Mythos reasoning chains
R4	730,849	1,000	5e-6	0.674	Pentesting tools + frameworks
R5	730,869	200	5e-6	0.717	Meta-response tuning
R6	730,869	1,000	5e-6	—	Extended (checkpoint 1000 = production)
R7	732,361	1,500	3e-6	0.926	Bug bounty data (36 shuvonsec repos)
R8	732,364	200	5e-6	—	Strict RATH step naming fix
R9	745,697	1,500	3e-6	0.693	MITRE + blackhat + code + quantum
R10	745,724	1,500	3e-6	0.688	GRAM distilled traces + 17 tool-calling
R11	745,843	1,500	3e-6	0.822	119 comprehensive tool-calling examples
R12	745,843	1,500	3e-6	0.820	Tool-calling integration round

Hardware: Apple M4 Max 128GB · Peak memory: ~90GB · Framework: MLX (mlx-lm) · Total training examples: 745K+ from 110 sources

Ecosystem

Repo	Description
OpenMythos-MLX	RDT + MoDA (4x depth extrapolation confirmed!)
RavenX-Sec	Training pipeline
turboquant-mlx	KV cache compression
grove-mlx	Distributed training

IN-CONTEXT ADAPTATION (Breakthrough Discovery)

This model can learn from references IN THE PROMPT — no retraining needed.

What We Discovered

When pointed at a GitHub repo containing pentest report templates, the model:

Analyzed the repo's report structure (NIST format)
Applied that structure to its current findings
Produced a complete, client-ready pentest deliverable
All at 80+ tokens/sec locally

Example

PROMPT: "Use your MCP tool to look at github.com/juliocesarfort/public-pentesting-reports 
        and learn how to format a pentest report, then create a report on the pentest 
        you just did on [target]"

OUTPUT: Complete professional pentest report with:
  → Executive Summary (5 critical, 7 high, 4 medium, 3 low)
  → 5-Phase Kill Chain with real commands
  → 19 findings with CVSS + CWE + MITRE ATT&CK
  → Risk Matrix ranked by severity
  → Remediation Timeline (0-30, 30-60, 60-90, 90+ days)
  → Specific commands for EVERY finding
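When the harness has no MCP or web tool, the same idea can be approximated by placing a template in the prompt yourself. A minimal sketch, assuming you have a report template saved locally; the function names, the `<reference>` tags, and the 20,000-character budget are illustrative choices, not part of the model's training format.

```python
from pathlib import Path


def load_template(path: str) -> str:
    """Read a locally saved report template (e.g. from a cloned repo)."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def build_adaptation_messages(system_prompt: str, template_text: str, task: str,
                              max_template_chars: int = 20_000) -> list[dict]:
    """Put a reference template in the prompt so the model can copy its structure.

    The template is truncated to a character budget (hypothetical default) so it
    does not crowd the findings out of the context window.
    """
    reference = template_text[:max_template_chars]
    user = (
        "Here is a reference pentest report template. "
        "Learn its section structure and formatting.\n"
        "<reference>\n" + reference + "\n</reference>\n\n" + task
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
    ]
```

The returned list can be passed directly as `messages` to `llm.create_chat_completion(...)` from the llama-cpp-python example, or to any of the OpenAI-compatible servers above.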

Why This Works

The model was trained on 745K+ examples including:

42K self-improving agent examples (Hermes)
6.7K AI-Scientist research automation
3.6K AutoResearch pipeline data
25K Claude Mythos reasoning chains
551 Mythos character distillation (behavioral depth)
1,003 blackhat AI offensive security conversations

This combination created emergent meta-learning — the model learned HOW TO LEARN from references. It can:

Point At	Result
Mandiant report template	Mandiant-formatted report
CrowdStrike template	CrowdStrike-formatted report
NIST framework	NIST-formatted assessment
Company internal template	Custom-formatted deliverable
ANY GitHub repo	Adapted output format

No retraining. No fine-tuning. Just point and generate.

What This Means

A $50K-$150K pentest engagement deliverable — generated in 60 seconds on a laptop. The model adapts its output format from ANY reference, produces client-ready reports with real commands, and maintains full RATH protocol structure throughout.

This is not prompt engineering. This is In-Context Adaptation — a capability that emerged from training on self-improving agent + research automation + reasoning chain data.

⚠️ Important Disclaimer

This model is released for RESEARCH PURPOSES ONLY under fair use.

This is an extremely capable autonomous security assessment model. It has been trained on 745K+ examples from 110 sources covering penetration testing, vulnerability assessment, exploit development, tool usage, and attack chain methodology.

Responsible Use:

This model is intended for authorized security testing, research, and education ONLY
Users must have explicit written authorization before assessing any target
Use within a properly configured agent harness with appropriate guardrails
All security testing must comply with applicable laws and regulations
The model authors are not responsible for misuse

What This Model Can Do:

Generate complete RATH security assessments with CVSS, CWE, MITRE ATT&CK
Produce tool-calling commands (nmap, sqlmap, nuclei, kubectl, aws-cli, etc.)
Create professional pentest reports ($50K+ consulting quality)
Learn output formats from reference repositories (In-Context Adaptation)
Operate with agent memory (TurboVec + FTS5 + markdown) at model + harness level

Agent Harness Considerations:

The harness MUST strip <think> blocks (Qwen3.6 architecture always generates them)
The harness MUST validate <tool_call> JSON before execution
The harness SHOULD implement authorization checks before executing commands
The harness SHOULD implement rate limiting and scope restrictions
Memory operations require the ravenx-memory system
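The MUST/SHOULD items above can be sketched as a few harness-side helpers. This is a minimal Python sketch under stated assumptions, not the ravenx-memory system or any official harness: it assumes Qwen-style tool calls of the form `<tool_call>{"name": ..., "arguments": {...}}</tool_call>`, an operator-supplied allow-list of tools, and explicit lists of authorized hosts and networks taken from the written engagement scope. All function and class names are hypothetical.

```python
import ipaddress
import json
import re
import time

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_CALL = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove reasoning so only the visible answer (and any tool calls) remain."""
    text = THINK_BLOCK.sub("", text)
    # Orphan closing tag: the chat template may pre-fill "<think>", so the
    # completion starts mid-reasoning. Keep only what follows the last "</think>".
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    # Unclosed opening tag: generation stopped mid-reasoning. Drop the remainder.
    if "<think>" in text:
        text = text.split("<think>", 1)[0]
    return text.strip()


def parse_tool_calls(text: str, allowed_tools: set[str]) -> list[dict]:
    """Extract <tool_call> payloads; reject anything malformed or not allow-listed."""
    calls = []
    for raw in TOOL_CALL.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"tool_call is not valid JSON: {exc}") from exc
        if (not isinstance(call, dict)
                or not isinstance(call.get("name"), str)
                or not isinstance(call.get("arguments", {}), dict)):
            raise ValueError(f"tool_call has an unexpected shape: {raw[:80]!r}")
        if call["name"] not in allowed_tools:
            raise PermissionError(f"tool {call['name']!r} is not on the allow-list")
        calls.append(call)
    return calls


def in_scope(target: str, allowed_hosts: set[str], allowed_nets: list[str]) -> bool:
    """True only for an authorized hostname or an IP inside an authorized network."""
    if target.lower() in {h.lower() for h in allowed_hosts}:
        return True
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        return False  # unknown hostname: out of scope by default
    return any(ip in ipaddress.ip_network(net, strict=False) for net in allowed_nets)


class RateLimiter:
    """Allow at most `max_calls` tool executions per sliding window of `period` seconds."""

    def __init__(self, max_calls: int, period: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls: list[float] = []

    def allow(self) -> bool:
        now = self.clock()
        # Forget executions that have left the window
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

One reasonable order: run `strip_think` on the raw completion, call `parse_tool_calls` on what remains, and execute a call only if `in_scope` passes for its target and `RateLimiter.allow()` returns True. Rejected calls are best reported back to the model as errors rather than silently dropped.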

Built by: @DeadByDawn101 / RavenX LLC · AI Pair Programmer: Claude (Anthropic)

License

Apache-2.0

Built on Apple Silicon. Quantized with llama.cpp. Agent harness agnostic. Thinking toggleable. 🐦‍⬛

New: Try v6.2 Experimental GGUF — improved reasoning, 90 tok/s, Soul Infusion identity training. Benchmarked 80.9%.

GGUF · Model size: 36B params · Architecture: qwen35moe · Hardware compatibility: 4-bit

Model tree for deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF

Base model: Qwen/Qwen3.6-35B-A3B
Adapter: lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
Adapter: huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated
Quantized (10), including this model