
Libraries

How to use kromeurus/L3.1-Siithamo-v0.4-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kromeurus/L3.1-Siithamo-v0.4-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kromeurus/L3.1-Siithamo-v0.4-8B")
model = AutoModelForCausalLM.from_pretrained("kromeurus/L3.1-Siithamo-v0.4-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use kromeurus/L3.1-Siithamo-v0.4-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kromeurus/L3.1-Siithamo-v0.4-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kromeurus/L3.1-Siithamo-v0.4-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/kromeurus/L3.1-Siithamo-v0.4-8B

SGLang

How to use kromeurus/L3.1-Siithamo-v0.4-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kromeurus/L3.1-Siithamo-v0.4-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kromeurus/L3.1-Siithamo-v0.4-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kromeurus/L3.1-Siithamo-v0.4-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kromeurus/L3.1-Siithamo-v0.4-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use kromeurus/L3.1-Siithamo-v0.4-8B with Docker Model Runner:
```
docker model run hf.co/kromeurus/L3.1-Siithamo-v0.4-8B
```

HUZZAH, a model that's actually good! It just took seven tries. Fixed spatial understanding and literacy, toned down a little of the clingy instruct following, and made it more geared toward turn-based RP.

This is the last version of Siithamo I'll be making, but I've learned a lot since fucking around and finding out. For an 'updated' merge with the same instruct and recall abilities but better detail without the spammy nature, check this guy out.

Quants

A few GGUFs by me.

Details & Recommended Settings

Sticks to instructs well, with dynamic writing, roleplay-focused generations, and more solid intelligence. Less rambly, though it still outputs a fair bit of text. Has near-perfect recall up to 32K. Be clear and explicit with model instructs, including the intended format (asterisks, quotes, etc.).

Yaps a ton and adds a lot of flowery, dramatic flair to outputs, so it's not great for subtle, nuanced RP.

Rec. Settings:

Template: L3
Temperature: 1.3
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
Dyn Temp: 0.9-1.05 at 0.1
Smooth Sampl: 0.18
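
For anyone unsure what the Min P value above actually does, here is a minimal sketch of the usual Min P rule: drop every token whose probability is below `min_p` times the probability of the most likely token, then renormalize. The function name and the example probabilities are made up for illustration, and real backends apply this alongside the other samplers in an order that varies by app, so treat it as a rough picture rather than what any particular backend runs.

```python
def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * (top probability), then renormalize.

    `probs` is a list of token probabilities that already sums to 1.
    This is an illustrative sketch, not code from any inference backend.
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Hypothetical next-token distribution over five candidate tokens.
example_probs = [0.60, 0.25, 0.10, 0.04, 0.01]
filtered = min_p_filter(example_probs, min_p=0.1)

# With min_p=0.1 the cutoff is 0.06, so the two least likely tokens are removed.
print(filtered)
```

A higher Min P cuts more of the tail, which is what lets a high temperature like 1.3 stay coherent.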

Rec. Model Instructs:

### Instruction:
{character} continues the text of a never ending slow-burn role-play.
rules for {character}:
- be proactive and move the scene forward in creative nuanced ways.
- write actions in the third-person past-tense.
- avoid speaking on {user}'s behalf.
- employ evocative, sensory, and verbose vocabulary to colorfully portray the scene using essentialism, haecceity, or quiddity.
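
Most frontends fill in the `{character}` and `{user}` placeholders for you. If you are calling the model from a script instead, a sketch like the one below does the substitution by hand. The helper name, the example names, and the choice to send the instructs as a system message are all assumptions for illustration, and the template is shortened to the lines that contain placeholders.

```python
# Shortened copy of the recommended instructs; the remaining rules would be
# appended in the same way. Placeholders match the ones used in the card.
INSTRUCT_TEMPLATE = (
    "### Instruction:\n"
    "{character} continues the text of a never ending slow-burn role-play.\n"
    "rules for {character}:\n"
    "- avoid speaking on {user}'s behalf.\n"
)


def build_messages(character, user, first_user_turn):
    """Fill the placeholders and return a chat-style message list.

    Putting the instructs in the system role is an assumption, not something
    the card specifies.
    """
    system_prompt = INSTRUCT_TEMPLATE.format(character=character, user=user)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": first_user_turn},
    ]


# Hypothetical names, purely for demonstration.
messages = build_messages("Mira", "Alex", "The tavern door creaks open.")
print(messages[0]["content"])
```

The resulting list has the same shape as the `messages` used in the Transformers and curl examples, so it can be passed to either.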

Merge Theory

This sucked. Replaced RP Hermes with Edgerunners Lyraea again and swapped Niitama for L3.1 Niitama.

Config

slices:
- sources:
  - layer_range: [0, 16]
    model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- sources:
  - layer_range: [16, 32]
    model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
out_dtype: bfloat16
name: formax.ext
---
models: 
  - model: formax.ext
    parameters:
      weight: 1.1
base_model: ArliAI/Llama-3.1-8B-ArliAI-Formax-v1.0
parameters:
  normalize: false
  int8_mask: true
merge_method: dare_linear
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: formaxext.3.1
---
models: 
  - model: Sao10K/L3.1-8B-Niitama-v1.1
    parameters:
      weight: 0.5
  - model: Sao10K/L3-8B-Stheno-v3.3-32K
    parameters:
      weight: 0.6
base_model: tohur/natsumura-storytelling-rp-1.0-llama-3.1-8b
parameters:
  normalize: false
  int8_mask: true
merge_method: dare_linear
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: siith.3.1
---
models:
    - model: siith.3.1
    - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siithamol3.1
---
models: 
  - model: siithamol3.1
    parameters:
      weight: [0.5, 0.8, 0.8, 0.9, 1]
      density: 0.9
      gamma: 0.01
  - model: formaxext.3.1
    parameters:
      weight: [0.5, 0.2, 0.2, 0.1, 0]
      density: 0.9
      gamma: 0.01
base_model: siithamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: breadcrumbs_ties
dtype: float32
out_dtype: bfloat16
name: siithamov3
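
The list-valued `weight` entries in the last step are gradients: a short list of anchor values that gets stretched across the layers of the model. The sketch below shows how such a list could map to a per-layer weight, assuming evenly spaced anchors with linear interpolation in between and 32 layers (matching the `layer_range` values above). It is my reading of how the gradient syntax behaves, not a copy of the merge tool's code, and it assumes the second anchor of the first list is meant to be 0.8.

```python
def gradient_value(anchors, layer_idx, num_layers):
    """Linearly interpolate a gradient list at a given layer index.

    Anchors are assumed to be evenly spaced from the first to the last layer.
    Illustrative only; the helper name is hypothetical.
    """
    t = layer_idx / (num_layers - 1)      # position in [0, 1]
    scaled = t * (len(anchors) - 1)       # position in anchor space
    lo = int(scaled)                      # anchor at or below this position
    hi = min(lo + 1, len(anchors) - 1)    # next anchor, clamped at the end
    frac = scaled - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


NUM_LAYERS = 32
SIITHAMO_WEIGHTS = [0.5, 0.8, 0.8, 0.9, 1]   # siithamol3.1
FORMAX_WEIGHTS = [0.5, 0.2, 0.2, 0.1, 0]     # formaxext.3.1

for layer in (0, 8, 16, 24, 31):
    a = gradient_value(SIITHAMO_WEIGHTS, layer, NUM_LAYERS)
    b = gradient_value(FORMAX_WEIGHTS, layer, NUM_LAYERS)
    print(f"layer {layer:2d}: siithamol3.1={a:.3f}  formaxext.3.1={b:.3f}")
```

Under these assumptions the two weights add up to 1 at every layer, with the Formax side fading out toward the final layers.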