HUZZAH, a model that's actually good! Just took seven tries. Fixed spatial understanding and literacy, toned down a little of the clingy instruct following, and made it more turn-based-RP forward.
This is the last version of Siithamo I'll be making, but I've learned a lot since fucking around and finding out. For an 'updated' merge with the same instruct and recall abilities but better detail without the spammy nature, check this guy out.
Quants
A few GGUFs by me.
Details & Recommended Settings
Sticks to instructs well, with dynamic writing, roleplay-focused generations, and more solid intelligence. Less rambly, though it still outputs a bit of text. Has near-perfect recall up to 32K. Be clear and explicit with model instructs, including the intended format (asterisks, quotes, etc.).
Yaps a ton and adds a lot of flowery and dramatic flair to outputs, so it's not great for subtle, nuanced RP.
Rec. Settings:
Template: L3
Temperature: 1.3
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
Dyn Temp: 0.9-1.05 at 0.1
Smooth Sampl: 0.18
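For anyone unsure what the Temperature and Min P values above actually do, here is a minimal sketch. It assumes the common definition of min-p (keep only tokens whose probability is at least `min_p` times the top token's probability) and assumes the filter runs after temperature scaling; real backends may order samplers differently. The function names and toy logits are made up for illustration, and dynamic temperature, smooth sampling, and repeat penalty are not covered.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_token(logits, temperature=1.3, min_p=0.1, rng=random):
    """Pick one token index using temperature scaling followed by min-p filtering."""
    probs = min_p_filter(softmax(logits, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy logits for a five-token vocabulary (made-up numbers, not model output).
toy_logits = [4.0, 3.5, 2.0, -2.0, -5.0]
filtered = min_p_filter(softmax(toy_logits, temperature=1.3), min_p=0.1)
print([round(p, 3) for p in filtered])
print(sample_token(toy_logits))
```

A higher temperature flattens the distribution, and Min P then trims the unlikely tail, which is why a fairly hot 1.3 can still stay coherent.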
Rec. Model Instructs:
### Instruction:
{character} continues the text of a never ending slow-burn role-play.
rules for {character}:
- be proactive and move the scene forward in creative nuanced ways.
- write actions in the third-person past-tense.
- avoid speaking on {user}'s behalf.
- employ evocative, sensory, and verbose vocabulary to colorfully portray the scene using essentialism, haecceity, or quiddity.
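If your frontend doesn't fill in `{character}` and `{user}` for you, a small sketch like the one below can do it with plain string substitution. The helper name `build_instruct` and the example names "Aria" and "Sam" are hypothetical; most roleplay frontends have their own placeholder macros, so treat this as an illustration only.

```python
# The recommended instruct block, with {character} and {user} left as placeholders.
INSTRUCT_TEMPLATE = """### Instruction:
{character} continues the text of a never ending slow-burn role-play.
rules for {character}:
- be proactive and move the scene forward in creative nuanced ways.
- write actions in the third-person past-tense.
- avoid speaking on {user}'s behalf.
- employ evocative, sensory, and verbose vocabulary to colorfully portray the scene using essentialism, haecceity, or quiddity."""


def build_instruct(character: str, user: str) -> str:
    """Fill the placeholders with concrete names (hypothetical helper)."""
    return INSTRUCT_TEMPLATE.format(character=character, user=user)


# Example names are made up for illustration.
prompt = build_instruct(character="Aria", user="Sam")
print(prompt)
```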
Merge Theory
This sucked. Swapped RP Hermes back out for Edgerunners Lyraea and replaced Niitama with L3.1 Niitama.
Config
slices:
  - sources:
      - layer_range: [0, 16]
        model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
  - sources:
      - layer_range: [16, 32]
        model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
out_dtype: bfloat16
name: formax.ext
---
models:
  - model: formax.ext
    parameters:
      weight: 1.1
base_model: ArliAI/Llama-3.1-8B-ArliAI-Formax-v1.0
parameters:
  normalize: false
  int8_mask: true
merge_method: dare_linear
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: formaxext.3.1
---
models:
  - model: Sao10K/L3.1-8B-Niitama-v1.1
    parameters:
      weight: 0.5
  - model: Sao10K/L3-8B-Stheno-v3.3-32K
    parameters:
      weight: 0.6
base_model: tohur/natsumura-storytelling-rp-1.0-llama-3.1-8b
parameters:
  normalize: false
  int8_mask: true
merge_method: dare_linear
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: siith.3.1
---
models:
  - model: siith.3.1
  - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siithamol3.1
---
models:
  - model: siithamol3.1
    parameters:
      weight: [0.5, 0.8, 0.8, 0.9, 1]
      density: 0.9
      gamma: 0.01
  - model: formaxext.3.1
    parameters:
      weight: [0.5, 0.2, 0.2, 0.1, 0]
      density: 0.9
      gamma: 0.01
base_model: siithamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: breadcrumbs_ties
dtype: float32
out_dtype: bfloat16
name: siithamov3
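
The list-valued `weight` entries in the last step are per-layer gradients. Below is a rough sketch of how such a list might spread over the 32 layers, assuming plain linear interpolation between the anchor values (my reading of how mergekit treats gradients, not checked against its source) and reading the first list as [0.5, 0.8, 0.8, 0.9, 1]. The function name `expand_gradient` is made up for illustration.

```python
def expand_gradient(anchors, num_layers=32):
    """Linearly interpolate a short list of anchor values across num_layers layers."""
    if len(anchors) == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer along the anchor list, from 0 to len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)  # index of the anchor to the left
        frac = pos - lo  # how far we are toward the next anchor
        values.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return values


# Anchor lists taken from the final merge step of the config.
siithamo_w = expand_gradient([0.5, 0.8, 0.8, 0.9, 1])
formax_w = expand_gradient([0.5, 0.2, 0.2, 0.1, 0])

# Print the two weights at the first, a middle, and the last layer.
for layer in (0, 15, 31):
    print(layer, round(siithamo_w[layer], 3), round(formax_w[layer], 3))
```

Under that assumption the two weights add up to 1 at every layer: the early layers are an even split, and the Formax side fades to nothing by the last layer.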