arxiv:2606.06574

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Published on Jun 4

Submitted by Ziyue Li on Jun 15

University of Maryland College Park


Authors:

Tianyi Zhou

Abstract

Pretrained language models can execute layers dynamically through flexible program-of-layers strategies that improve accuracy while reducing computational overhead compared to standard fixed-depth inference.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Large language models (LLMs) perform inference through a fixed-depth, fixed-order, non-recurrent execution of all layers. We show that training-free, flexible, dynamic programs of layers (PoLar) exist widely: pretrained layers can be packed into modules and then skipped or looped to form a customized program for each input. For most inputs, substantially shorter programs achieve the same or better accuracy, and incorrect predictions of the original LLM can be corrected by alternative programs with fewer layers. These observations indicate that inference admits multiple valid latent computations beyond the standard forward pass. To realize PoLar efficiently in practice, we propose a lightweight PoLar prediction network that learns to generate execution programs which dynamically skip or repeat pretrained layers for each input. Experiments on mathematical reasoning benchmarks demonstrate that PoLar consistently improves accuracy over standard inference and prior dynamic-depth methods, often while executing fewer layers, and that these gains persist under out-of-distribution evaluation. Our results suggest that fixed-depth execution captures only a narrow subset of an LLM's latent reasoning capacity.


Community

Litzy0619

Paper submitter about 12 hours ago

This paper asks whether LLM inference really needs to follow the same fixed layer order for every input.

We introduce PoLar, a program-of-layers framework that treats frozen transformer layers as reusable functions. Instead of always executing all layers in the default order, PoLar learns input-specific execution programs that can skip, keep, or repeat layer segments without modifying the pretrained LLM.
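To make the skip/keep/repeat idea concrete, here is a minimal sketch, not the paper's implementation. It assumes a program can be written as one repeat count per layer segment (0 = skip, 1 = keep, k > 1 = loop k times). That encoding, the names `run_program` and `executed_layers`, and the toy affine "layers" are all hypothetical stand-ins for frozen transformer segments.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_program(layers: Sequence[Layer], program: Sequence[int], hidden: Vector) -> Vector:
    """Apply frozen layers in their original order with per-layer repeat counts.

    program[i] == 0 skips layer i, 1 runs it once, k > 1 loops it k times.
    (Hypothetical encoding chosen for illustration.)
    """
    if len(program) != len(layers):
        raise ValueError("program needs exactly one entry per layer")
    for layer, repeats in zip(layers, program):
        for _ in range(repeats):
            hidden = layer(hidden)
    return hidden


def executed_layers(program: Sequence[int]) -> int:
    """Number of layer applications a program performs."""
    return sum(program)


def make_toy_layer(scale: float, shift: float) -> Layer:
    """Toy stand-in for a frozen layer: an elementwise affine map."""
    def layer(h: Vector) -> Vector:
        return [scale * x + shift for x in h]
    return layer


toy_layers = [make_toy_layer(1.0, 1.0), make_toy_layer(2.0, 0.0), make_toy_layer(1.0, -3.0)]

standard = [1, 1, 1]  # the usual forward pass: every layer once
custom = [1, 2, 0]    # keep layer 0, loop layer 1 twice, skip layer 2

print(run_program(toy_layers, standard, [1.0]))  # [1.0]
print(run_program(toy_layers, custom, [1.0]))    # [8.0]
```

The layers themselves are never modified; only the execution schedule changes, which is what lets the same frozen weights yield different computations per input.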

A key finding is that fixed-depth inference is only one path through a richer latent computation space. Many inputs admit alternative valid programs: 75.5% of inputs the original model already answers correctly have shorter valid programs, and 36.2% of inputs it originally answers wrong admit shorter programs that correct the prediction.

PoLar replaces expensive per-input search with a lightweight predictor that directly outputs layer programs, improving accuracy over standard inference and prior dynamic-depth methods while adding only ~0.8% inference overhead.
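As a rough illustration of "directly outputs layer programs", the sketch below scores three actions per segment with a linear head over an input feature vector and picks the highest-scoring one. This is an assumption for illustration only: the comment does not describe the predictor's architecture, and `predict_program`, the `ACTIONS` tuple, and the weight layout are hypothetical.

```python
from typing import List, Sequence

ACTIONS = ("skip", "keep", "repeat")  # hypothetical action set per segment


def predict_program(
    features: Sequence[float],
    weights: Sequence[Sequence[Sequence[float]]],
) -> List[str]:
    """Pick one action per segment from linear scores.

    weights[s][a] is the weight vector for segment s and action a
    (an assumed layout, not taken from the paper).
    """
    program = []
    for segment_weights in weights:
        scores = [
            sum(w * x for w, x in zip(action_weights, features))
            for action_weights in segment_weights
        ]
        best = max(range(len(ACTIONS)), key=scores.__getitem__)
        program.append(ACTIONS[best])
    return program


toy_features = [1.0, 0.0]
toy_weights = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # segment 0 favors "keep"
    [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]],  # segment 1 favors "skip"
    [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]],  # segment 2 favors "repeat"
]

print(predict_program(toy_features, toy_weights))  # ['keep', 'skip', 'repeat']
```

A single forward pass through a small head like this is cheap compared with searching over programs per input, which is the trade-off the comment describes.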


