Task="text2text-generation" and model="google/flan-t5-(base or large)" fails to generate testcases from description

hi,
This isn’t a Python / Transformers install problem. Your generation is being hard-capped by max_length, which defaults to just 20 generated tokens, so the model is starved of room to actually emit 3 full test cases. Use max_new_tokens (an explicit output-token budget) instead.
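
You can check that default yourself; transformers’ stock GenerationConfig caps max_length at 20:

from transformers import GenerationConfig

# The library-wide default generation length cap.
print(GenerationConfig().max_length)  # -> 20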

Here’s the same approach, fixed:
from transformers import pipeline

generator = pipeline(
    task="text2text-generation",
    model="google/flan-t5-large",
)

# prompt = your original test-case prompt from the question
result = generator(
    prompt,
    max_new_tokens=350,       # output budget (this is the key)
    num_beams=4,              # beam search helps with structured outputs
    do_sample=False,          # deterministic decoding
    no_repeat_ngram_size=3,   # damps verbatim repetition
    repetition_penalty=1.1,
)

print(result[0]["generated_text"])

If it still “summarises” instead of following your template, don’t fight it with instructions alone: give it one example test case in the prompt (few-shot), as in the sketch below. FLAN-T5 often behaves much better with a single concrete example than with pure instructions.
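
Something like this, reusing the generator above. The feature description and template here are placeholders I made up, not from your prompt, so substitute your own:

# Few-shot sketch: one worked example before the real request.
# NOTE: the feature text and template below are illustrative placeholders,
# not from the original question. Swap in your own description and format.
prompt = """Write test cases for the feature described. Follow the template exactly.

Example feature: The login form rejects empty passwords.
Example test case:
Title: Reject empty password
Steps: 1. Open the login form. 2. Enter a valid username, leave the password blank. 3. Submit.
Expected: An error message is shown and the user is not logged in.

Feature: <your feature description here>
Test cases:"""

result = generator(prompt, max_new_tokens=350, num_beams=4, do_sample=False)
print(result[0]["generated_text"])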

Which model is best for test-case generation?

Bluntly: encoder–decoder T5 models are OK, but modern instruction-tuned chat LLMs usually do this task better, especially for rigid templates.

My practical picks on Hugging Face right now:

  • Qwen2.5 Instruct family (good instruction-following; choose size based on your hardware).
  • If you specifically want “testing / code adjacent” behaviour, CodeT5+ is a strong code-oriented option, but it’s more naturally aimed at code tasks than formatted QA test descriptions (quick loading sketch below).
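
If you do want to poke at CodeT5+, here’s a minimal sketch assuming the small seq2seq checkpoint Salesforce/codet5p-220m (the billion-parameter variants need trust_remote_code=True). Keep in mind it isn’t instruction-tuned, so expect code-completion behaviour rather than templated QA text:

from transformers import pipeline

# Sketch only: Salesforce/codet5p-220m is the small seq2seq CodeT5+
# checkpoint. It is not instruction-tuned, so it completes code rather
# than following a test-case template.
code_gen = pipeline(
    task="text2text-generation",
    model="Salesforce/codet5p-220m",
)

out = code_gen("def is_palindrome(s):", max_new_tokens=64)
print(out[0]["generated_text"])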

If you want to try Qwen quickly (chat/instruct style), use text-generation and the model’s chat template:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "Qwen/Qwen2.5-7B-Instruct"  # smaller: Qwen/Qwen2.5-1.5B-Instruct
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

gen = pipeline("text-generation", model=model, tokenizer=tok)

messages = [
    {"role": "system", "content": "You generate software QA test cases in the exact requested template."},
    {"role": "user", "content": prompt},
]

text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# return_full_text=False drops the echoed prompt from the output. Don't pass
# temperature alongside do_sample=False; it's ignored and triggers a warning.
out = gen(text, max_new_tokens=450, do_sample=False, return_full_text=False)
print(out[0]["generated_text"])

The key takeaway: switch from max_length to max_new_tokens for FLAN-T5, and if you want consistently structured multi-test-case output, use an instruct chat model (Qwen2.5 / Llama / Mistral class) plus its chat template.

hope this helps, Liam