pearl-ai/Gemma-4-31B-it-pearl

Pearl Gemma 4 instruction-tuned checkpoint for Pearl inference and mining. Like our other Pearl-certified models, it is intended to run with the Pearl vLLM mining plugin so that inference can participate in Pearl mining (Proof-of-Useful-Work, where mining runs alongside useful compute). Layout and runtime fields are in config.json.
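
To inspect those fields without pulling the full weights, you can fetch config.json straight from the Hub (a minimal sketch using the standard Hugging Face resolve URL; python3 here is only for pretty-printing):

# Download only config.json (no weights) and pretty-print it
curl -sL https://huggingface.co/pearl-ai/Gemma-4-31B-it-pearl/resolve/main/config.json \
  | python3 -m json.tool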

Benchmarks

Results from our evaluation runs. Original is the unmodified Gemma 4 31B instruction model; Pearl is this checkpoint (pearl-ai/Gemma-4-31B-it-pearl).

Model      GPQA     MMLU     HumanEval (pass@1)   MGSM
Original   77.27%   90.93%   94.70%               88.62%
Pearl      77.37%   90.56%   94.15%               89.09%

Pearl mining (vLLM plugin)

Pearl mining means serving through the Pearl miner stack: a pearld node (RPC), pearl-gateway, and the vLLM miner build that loads the Pearl plugin (NoisyGEMM / gateway integration). That stack ties matrix work from inference to the chain’s useful-work mining flow. Details and layout are in the miner README.

Typical flow:

  1. Run pearld with RPC enabled.
  2. Start the Pearl miner / vLLM image or workspace (plugin-enabled vLLM).
  3. Point the server at this model; gateway + miner components handle mining-side integration.

High-level prerequisites (same family as other Pearl model cards):

  • Python 3.12, uv, CUDA + NVIDIA GPU (see the miner docs for supported GPU architectures, e.g. sm90-class hardware)
  • Rust toolchain (for Pearl miner build paths)
  • A running pearld with RPC credentials for the gateway
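
Before building anything, a quick sanity check of that toolchain can save time (a minimal sketch; all commands are standard):

# Confirm the prerequisites above are visible on this machine
python3 --version   # expect 3.12.x
uv --version
cargo --version     # Rust toolchain used by the Pearl miner build paths
nvidia-smi          # confirm the NVIDIA driver and GPU are present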

Docker (recommended for mining)

From the Pearl repository root, build and run the miner image (substitute RPC values); use this model id in place of the example model:

docker buildx build -t vllm_miner . -f miner/vllm-miner/Dockerfile
docker run --rm -it --gpus all \
  -p 8000:8000 -p 8337:8337 -p 8339:8339 \
  -e PEARLD_RPC_URL=<PEARLD_URL> \
  -e PEARLD_RPC_USER=<RPC_USER> \
  -e PEARLD_RPC_PASSWORD=<RPC_PASSWORD> \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --shm-size 8g \
  vllm_miner:latest \
  pearl-ai/Gemma-4-31B-it-pearl \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager \
  --language-model-only
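
Once the container is up, the OpenAI-compatible endpoint can be checked from the host (a minimal check; assumes the default port mapping above):

# The returned model id should match this checkpoint
curl -s http://localhost:8000/v1/models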

For inference-only (no mining), or when you have the plugin installed in your own uv environment, you can use plain vLLM as below.

Inference with vLLM

Serve from the Hugging Face Hub id (or from a local directory containing this snapshot):

uv run vllm serve pearl-ai/Gemma-4-31B-it-pearl \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager \
  --language-model-only

The same command as a single line:

uv run vllm serve pearl-ai/Gemma-4-31B-it-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager --language-model-only
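
The server exposes the OpenAI-compatible API, so a basic request looks like this (a sketch; adjust host and port if you changed the flags above):

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "pearl-ai/Gemma-4-31B-it-pearl",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'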

From a local checkout of this repo (e.g. after git clone or copying files), point vllm serve at the directory path instead of the Hub id.
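
For example, assuming the snapshot was cloned to ./Gemma-4-31B-it-pearl (a hypothetical local path), the same flags apply:

uv run vllm serve ./Gemma-4-31B-it-pearl \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager \
  --language-model-only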

Model details

  • Architecture: Gemma4ForConditionalGeneration (model_type: gemma4)
  • Parameters: ~31B
  • Tensor types: BF16, F8_E4M3, I8

License

Use and redistribution are subject to the Gemma license terms from Google; this repository is a Pearl distribution of weights derived from that ecosystem.

Limitations

Models can produce incorrect or unsafe outputs. Validate in your environment before production use.
