Moshi: a speech-text foundation model for real-time dialogue
Paper: arXiv 2410.00037
How to use mlx-community/mimi-encoder-mlx with MLX:
# Download the model from the Hub
# (the quotes keep zsh from treating the brackets as a glob pattern)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir mimi-encoder-mlx mlx-community/mimi-encoder-mlx
The encoder half of Kyutai's Mimi neural audio codec,
converted to MLX format for native inference on Apple Silicon and consumed by the
xocialize/mimi-encoder-mlx-swift Swift
port. Refer to the original model card for full details.
Output: [16, T] codebook-index grid at 12.5 Hz (1 semantic + 15 acoustic codebooks)
encoder.safetensors: the MLX encoder weights (fp32), extracted/converted from kyutai/mimi.

import MimiCodecEncoder
let encoder = MimiEncoder(config: .qwen3TTS12Hz)
try encoder.loadWeights(from: encoderWeightsURL) // encoder.safetensors
let codes = encoder.encode(audio: audioArray) // [16, T]
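
To make the [16, T] shape concrete, here is a minimal sketch of how T might relate to clip length and how the grid could be split into its semantic and acoustic parts. It is not taken from the Swift port: the 24 kHz input rate, the rounding up of a partial final frame, the semantic codebook sitting in row 0, and the helper names `expectedFrameCount` and `splitCodebooks` are all assumptions for illustration, and plain Swift arrays stand in for whatever array type `encode` actually returns.

```swift
import Foundation

// From the model card: 12.5 frames per second, 16 codebooks per frame.
let frameRate = 12.5
// ASSUMPTION (not stated in the card): 24 kHz mono input audio.
let assumedSampleRate = 24_000.0

/// Hypothetical helper: number of code frames T for a clip of `sampleCount`
/// samples, assuming a partial final frame is rounded up.
func expectedFrameCount(sampleCount: Int) -> Int {
    let samplesPerFrame = assumedSampleRate / frameRate // 1920 samples per frame
    return Int((Double(sampleCount) / samplesPerFrame).rounded(.up))
}

/// Hypothetical helper: splits a [16][T] grid into the semantic row and the
/// 15 acoustic rows. ASSUMPTION: the semantic codebook is row 0.
func splitCodebooks(_ grid: [[Int32]]) -> (semantic: [Int32], acoustic: [[Int32]]) {
    precondition(grid.count == 16, "expected 16 codebook rows")
    return (grid[0], Array(grid[1...]))
}

// A 10-second clip at the assumed 24 kHz rate.
let t = expectedFrameCount(sampleCount: 240_000)
// Placeholder grid of zeros standing in for real encoder output.
let dummy = [[Int32]](repeating: [Int32](repeating: 0, count: t), count: 16)
let parts = splitCodebooks(dummy)
print(t, parts.semantic.count, parts.acoustic.count) // prints "125 125 15"
```

Under these assumptions, each second of audio yields 12.5 frames of 16 indices, or 200 codebook indices per second.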
License: CC-BY-4.0 (Kyutai), permissive with attribution required. This is a derivative (encoder-only,
format-converted) of kyutai/mimi; attribution to Kyutai is retained.
Base model: kyutai/mimi