Subject: Hosted Inference returning 404 for multiple models (need assistance)

Hi Hugging Face Support,

I can access the Hub metadata (whoami and model_info succeed) but hosted inference calls return 404 for multiple models from my environment.

Details:

  • HF username: Hirtheesh
  • Environment: Windows, venv at C:\study\echoverse\venv
  • huggingface-hub version: (my local version)
  • Models tested and results:
    • google/flan-t5-large → 404 Not Found (x-request-id: Root=1-68ca4f88-087b50c61af9d0812349d41b)
    • sshleifer/tiny-gpt2 → 404 Not Found (tested just now)
  • Token: I verified my token is valid (whoami works). The token is set in the process environment for these tests.
  • Raw request diagnostics: a POST to the model's hosted inference endpoint returns 404, with response headers including x-inference-provider: hf-inference and X-Cache: Error from cloudfront.
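For reproducibility, this is roughly how I am capturing these diagnostics. The endpoint URL pattern and header names below are my assumptions based on the public hf-inference API, not anything confirmed by support; a minimal sketch:

```python
import os
import requests

# Assumed public endpoint pattern for hosted inference (hf-inference).
API_BASE = "https://api-inference.huggingface.co/models"


def inference_url(model_id: str) -> str:
    """Build the hosted-inference URL for a model id."""
    return f"{API_BASE}/{model_id}"


def diagnostic_headers(headers: dict) -> dict:
    """Pull out the response headers worth attaching to a support ticket."""
    wanted = ("x-request-id", "x-inference-provider", "x-cache")
    return {k: v for k, v in headers.items() if k.lower() in wanted}


if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    # Token is read from the process environment, as in the tests above.
    resp = requests.post(
        inference_url("google/flan-t5-large"),
        headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
        json={"inputs": "Hello"},
        timeout=30,
    )
    print(resp.status_code, diagnostic_headers(resp.headers))
```

Running this against both models above is what produced the 404s and request IDs listed.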

Could you confirm whether hosted Inference is enabled for my account/region and whether these models are available for hosted inference? If you need additional request IDs or headers, tell me what to capture and I’ll provide them.

Thanks,
Hirtheesh

The Inference API has been revamped into Inference Providers, and the set of deployed models has changed significantly. flan-t5-large does not appear on the list of currently deployed models and is therefore unavailable.
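To check programmatically whether a given model is still deployed, you can probe its endpoint and interpret the status code. A sketch under my own assumptions (the endpoint URL pattern and the reading of 404 as "not deployed" are inferred from this thread, not an official availability API):

```python
import requests

# Assumed hf-inference endpoint pattern.
API_BASE = "https://api-inference.huggingface.co/models"


def interpret_status(status_code: int) -> str:
    """Map an HTTP status from the inference endpoint to a rough verdict."""
    if status_code == 200:
        return "deployed"
    if status_code == 503:
        return "deployed (loading)"  # model exists but is cold-starting
    if status_code == 404:
        return "not deployed"
    return f"unclear (HTTP {status_code})"


def probe(model_id: str, token: str) -> str:
    """Send a tiny request and classify the model's availability."""
    resp = requests.post(
        f"{API_BASE}/{model_id}",
        headers={"Authorization": f"Bearer {token}"},
        json={"inputs": "ping"},
        timeout=30,
    )
    return interpret_status(resp.status_code)
```

A 503 with a loading message is worth distinguishing from a 404: the former means the model exists but is warming up, the latter that it is not served at all.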

What model can I use instead of it?

It depends on your use case and budget. The free tier only covers up to $0.01 worth of inference per month…
There seem to be several T5 models available.
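Once you have picked a candidate, swapping it in is just a matter of changing the model id in the request. A sketch, with the caveat that google/flan-t5-small is purely a hypothetical example id here; verify that whichever T5 variant you choose actually appears on the deployed-models list first:

```python
import os
import requests

# Assumed hf-inference endpoint pattern.
API_BASE = "https://api-inference.huggingface.co/models"


def build_payload(prompt: str, max_new_tokens: int = 64) -> dict:
    """Common text2text payload shape used with T5-style models."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def run(model_id: str, prompt: str, token: str) -> requests.Response:
    """POST a prompt to the hosted inference endpoint for model_id."""
    return requests.post(
        f"{API_BASE}/{model_id}",
        headers={"Authorization": f"Bearer {token}"},
        json=build_payload(prompt),
        timeout=30,
    )


if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    # Hypothetical replacement id; check deployment status before relying on it.
    resp = run("google/flan-t5-small", "Translate to German: Hello", os.environ["HF_TOKEN"])
    print(resp.status_code, resp.json() if resp.ok else resp.text)
```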