Explicitly disable bf16 for some layers

BoltzmachineQ · June 16, 2025, 7:36pm

I am using Hugging Face's Trainer with the `--bf16` flag and DeepSpeed enabled. However, I want to force float32 for one specific layer. How can I do that?

John6666 · June 17, 2025, 3:18am

Hmm… Mixed precision training?
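If the goal is just to run one layer's math in float32, here is a minimal sketch in plain PyTorch. It is an assumption on my side, not something from the Trainer or DeepSpeed docs, and I have not tested it with Trainer + DeepSpeed. `FP32Linear` is a made-up name. It upcasts the input and the parameters inside `forward`, so the computation happens in float32 even if the stored weights are bf16. It does not change the dtype in which the weights are stored.

```python
import torch
import torch.nn.functional as F
from torch import nn


class FP32Linear(nn.Linear):
    """Hypothetical drop-in for nn.Linear that always computes in float32."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Upcast parameters on the fly; they may be stored in bf16.
        weight = self.weight.float()
        bias = None if self.bias is None else self.bias.float()
        # Turn autocast off locally so the matmul is not downcast again.
        with torch.autocast(device_type=x.device.type, enabled=False):
            out = F.linear(x.float(), weight, bias)
        # Return the caller's dtype so the surrounding bf16 layers are unaffected.
        return out.to(x.dtype)


if __name__ == "__main__":
    layer = FP32Linear(4, 2).to(torch.bfloat16)   # weights stored in bf16
    x = torch.randn(3, 4).to(torch.bfloat16)
    y = layer(x)
    print(y.dtype, tuple(y.shape))
```

This only covers `nn.Linear`. Other layer types would need their own upcasting `forward`, and the extra casts cost some memory and speed for that layer.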

github.com/huggingface/transformers

Trainer only saves model in FP16 when using mixed precision together with DeepSpeed

opened 02:19AM - 08 Feb 24 UTC

closed 08:04AM - 05 Apr 24 UTC

andstor

Normally when saving a model using the Trainer class, the `dtype` of the saved model is (and should be) the same as the original model. This is also true when using mixed precision, and when using DeepSpeed. However, when using mixed precision **together** with DeepSpeed, the output is in float16 no matter the model input `dtype`.

The Trainer class has custom handling for DeepSpeed, depending on the ZeRO stage: https://github.com/huggingface/transformers/blob/5f9685576149fb45a61d0dcec9a260930df0a49a/src/transformers/trainer.py#L2914-L2928 as does Accelerate: https://github.com/huggingface/accelerate/blob/06b138d84537ffb2d1d404f2f198a0446e8d7ec3/src/accelerate/accelerator.py#L3042-L3056

For ZeRO stage <=2, DeepSpeed holds the model weights in the `state_dict`. With mixed precision training, these are always in float16. With full precision training, they have the same dtype as the original model.

For ZeRO stage 3, the `state_dict` contains just placeholders since the model weights are partitioned. By setting `stage3_gather_16bit_weights_on_model_save=true`, DeepSpeed consolidates the weights. When training with mixed precision, float16 is always produced. When training in full precision, despite the name, it follows the dtype of the original model. If `stage3_gather_16bit_weights_on_model_save=false`, Trainer saves a full checkpoint instead, and the DeepSpeed `zero_to_fp32.py` script can be used to recover weights in float32.

Currently, the only way to get a float32 model out of the Trainer class when it applies mixed precision along with DeepSpeed ZeRO stage <=2 is to manually save a checkpoint and then use some weight recovery method afterwards. Is this due to a limitation of the DeepSpeed API, or could this be handled in the Trainer class (or preferably in Accelerate)?

At least, maybe a flag could be available to either save the float16 weights or a checkpoint at the end of training (kind of how stage 3 with `stage3_gather_16bit_weights_on_model_save=true` is handled)?

### Who can help

@pacman100, @muellerzr
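For reference, the settings discussed in this issue map onto a DeepSpeed config roughly like the sketch below. It is written as a Python dict and serialized to JSON. The values are illustrative assumptions (bf16 mixed precision, ZeRO stage 3), not a recommended setup, and the name `ds_config` is arbitrary.

```python
import json

# Illustrative DeepSpeed config covering only the options named in the issue.
ds_config = {
    # Mixed precision in bf16; the issue text itself talks about float16.
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # True: DeepSpeed consolidates the partitioned weights on save.
        # False: a full checkpoint is saved instead and zero_to_fp32.py
        # can be used afterwards to recover float32 weights.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

# Serialize to JSON text, e.g. to write out as a config file.
ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```

The `deepspeed` argument of `TrainingArguments` accepts either a path to such a JSON file or the dict itself.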

John6666 · June 17, 2025, 4:47am

Or similar to this issue…?
