Pretraining dataset: karpathy/fineweb-edu-100b-shuffle
This is the checkpoint from Andrej Karpathy's nanochat, a full-stack project for building an LLM from scratch.
Install transformers from this specific branch:
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
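If you want to confirm the branch build is active, printing the installed version is a quick sanity check (development builds typically carry a `.dev0` suffix, though the exact string may vary):
python -c "import transformers; print(transformers.__version__)"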
Then, you can run this inference snippet:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id="nanochat-students/d20-chat-transformers"
max_new_tokens=64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16).to(device)
model.eval()
conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]
# return_dict=True is required so that **inputs can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )
# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs.input_ids.shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
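By default (unless the checkpoint's generation config says otherwise) this decodes greedily. If you want more varied output, generate() also accepts sampling arguments; the values below are illustrative, not tuned for this checkpoint:
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,    # enable sampling instead of greedy decoding
        temperature=0.8,   # illustrative value
        top_k=50,          # illustrative value
    )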
You can also serve the model with vLLM (this likewise requires the branch install above):
vllm serve nanochat-students/nanochat-d20 --enforce-eager
And then you can call the model like so:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nanochat-students/nanochat-d20", "prompt": "What is the capital of France?", "max_tokens": 7, "temperature": 0}'
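Since vLLM exposes an OpenAI-compatible API, you can also query the server from Python. A minimal sketch, assuming the openai package is installed (pip install openai; the api_key value is a placeholder, since vLLM does not check it unless configured to):
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="nanochat-students/nanochat-d20",
    prompt="What is the capital of France?",
    max_tokens=7,
    temperature=0,
)
print(completion.choices[0].text)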
Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio