Task="text2text-generation" and model="google/flan-t5-(base or large)" fails to generate testcases from description

hi,
This isn’t a Python / Transformers install problem. Your generation is being hard-capped by max_length, which defaults to just 20 generated tokens, so the model is starved of room to actually emit 3 full test cases. Use max_new_tokens (an explicit output-token budget) instead.
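
You can check that default yourself; transformers’ stock GenerationConfig caps max_length at 20:

from transformers import GenerationConfig

# The library-wide default generation length cap.
print(GenerationConfig().max_length)  # -> 20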

Here’s the same approach, fixed:
from transformers import pipeline

generator = pipeline(
    task="text2text-generation",
    model="google/flan-t5-large",
)

# prompt = your original test-case prompt from the question
result = generator(
    prompt,
    max_new_tokens=350,       # output budget (this is the key)
    num_beams=4,              # beam search helps with structured outputs
    do_sample=False,          # deterministic decoding
    no_repeat_ngram_size=3,   # damps verbatim repetition
    repetition_penalty=1.1,
)

print(result[0]["generated_text"])

If it still “summarises” instead of following your template, don’t fight it with instructions alone: give it one example test case in the prompt (few-shot), as in the sketch below. FLAN-T5 often behaves much better with a single concrete example than with pure instructions.
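
Something like this, reusing the generator above. The feature description and template here are placeholders I made up, not from your prompt, so substitute your own:

# Few-shot sketch: one worked example before the real request.
# NOTE: the feature text and template below are illustrative placeholders,
# not from the original question. Swap in your own description and format.
prompt = """Write test cases for the feature described. Follow the template exactly.

Example feature: The login form rejects empty passwords.
Example test case:
Title: Reject empty password
Steps: 1. Open the login form. 2. Enter a valid username, leave the password blank. 3. Submit.
Expected: An error message is shown and the user is not logged in.

Feature: <your feature description here>
Test cases:"""

result = generator(prompt, max_new_tokens=350, num_beams=4, do_sample=False)
print(result[0]["generated_text"])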

Which model is best for test-case generation?

Bluntly: encoder–decoder T5 models are OK, but modern instruction-tuned chat LLMs usually do this task better, especially for rigid templates.

My practical picks on Hugging Face right now:

  • Qwen2.5 Instruct family (good instruction-following; choose size based on your hardware).
  • If you specifically want “testing / code adjacent” behaviour, CodeT5+ is a strong code-oriented option, but it’s more naturally aimed at code tasks than formatted QA test descriptions (quick loading sketch below).
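
If you do want to poke at CodeT5+, here’s a minimal sketch assuming the small seq2seq checkpoint Salesforce/codet5p-220m (the billion-parameter variants need trust_remote_code=True). Keep in mind it isn’t instruction-tuned, so expect code-completion behaviour rather than templated QA text:

from transformers import pipeline

# Sketch only: Salesforce/codet5p-220m is the small seq2seq CodeT5+
# checkpoint. It is not instruction-tuned, so it completes code rather
# than following a test-case template.
code_gen = pipeline(
    task="text2text-generation",
    model="Salesforce/codet5p-220m",
)

out = code_gen("def is_palindrome(s):", max_new_tokens=64)
print(out[0]["generated_text"])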

If you want to try Qwen quickly (chat/instruct style), use text-generation and the model’s chat template:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "Qwen/Qwen2.5-7B-Instruct"  # smaller: Qwen/Qwen2.5-1.5B-Instruct
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

gen = pipeline("text-generation", model=model, tokenizer=tok)

messages = [
    {"role": "system", "content": "You generate software QA test cases in the exact requested template."},
    {"role": "user", "content": prompt},
]

text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# return_full_text=False drops the echoed prompt from the output. Don't pass
# temperature alongside do_sample=False; it's ignored and triggers a warning.
out = gen(text, max_new_tokens=450, do_sample=False, return_full_text=False)
print(out[0]["generated_text"])

The key takeaway: switch from max_length to max_new_tokens for FLAN-T5, and if you want consistently structured multi-test-case output, use an instruct chat model (Qwen2.5 / Llama / Mistral class) plus its chat template.

hope this helps, Liam