Vicuna error on SageMaker

Hi folks,

I have been trying to deploy TheBloke/vicuna-7B-1.1-HF to SageMaker but with no luck. I have not had problems with other models like bloom-3b. I used the following code to deploy:

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'TheBloke/vicuna-7B-1.1-HF',
	'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.26.0',
	pytorch_version='1.13.1',
	py_version='py39',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.g4dn.2xlarge' # ec2 instance type
)

predictor.predict({
	"inputs": "Can you please let us know more details about your ",
})

However, I am getting the following error when predicting:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}

Any help is appreciated.
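A note for anyone reading the error above: the `\u0027` escapes are just JSON-encoded single quotes, so the actual message is `'llama'`. That is how a Python `KeyError` for the key `llama` prints, which suggests (an assumption, since the full traceback is not shown) that the Transformers version in the container does not recognize the `llama` model type. A minimal sketch that decodes the response body, with the closing quote balanced:

```python
import json

# Body of the 400 response quoted in the post (raw string keeps the \u escapes).
raw_body = r'''{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}'''

error = json.loads(raw_body)
print(error["message"])

# A KeyError stringifies as the repr of its key, giving the same text.
print(str(KeyError("llama")))
```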

Can you try the new LLM container? See the blog post "Introducing the Hugging Face LLM Inference Container for Amazon SageMaker".
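A rough sketch of what deploying with the LLM container could look like, following the pattern in that blog post. The helper names `build_llm_env` and `deploy_llm`, the token limits, the instance type, and the health check timeout are illustrative assumptions, not values confirmed in this thread; adjust them to the model and account.

```python
def build_llm_env(model_id, num_gpus=1, max_input_length=1024, max_total_tokens=2048):
    """Environment variables for the LLM container; all values must be strings.

    The numeric defaults are assumptions chosen for illustration.
    """
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": str(num_gpus),
        "MAX_INPUT_LENGTH": str(max_input_length),
        "MAX_TOTAL_TOKENS": str(max_total_tokens),
    }


def deploy_llm(role, model_id="TheBloke/vicuna-7B-1.1-HF", instance_type="ml.g5.2xlarge"):
    """Deploy model_id with the Hugging Face LLM image instead of a transformers_version."""
    # Imported lazily so build_llm_env can be used without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Look up the LLM container image for the current region.
    image_uri = get_huggingface_llm_image_uri("huggingface")

    model = HuggingFaceModel(role=role, image_uri=image_uri, env=build_llm_env(model_id))
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,  # assumed GPU instance, not from the thread
        container_startup_health_check_timeout=300,  # allow time to load the weights
    )


env = build_llm_env("TheBloke/vicuna-7B-1.1-HF")
print(env)
```

Calling `deploy_llm(role)` with the execution role from the original snippet returns a predictor that can be used with `predictor.predict({"inputs": "..."})` as before.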

Thank you! It works with this new container.

@philschmid

I am using the following code:

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

sess = sagemaker.Session()

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
	sagemaker_session_bucket = sess.default_bucket()

try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

hub = {
	'HF_MODEL_ID':'model id',
	'HF_TASK':'text-classification',
	'HF_TOKEN':'token'
}

huggingface_model = HuggingFaceModel(
	transformers_version='4.45.2',
	pytorch_version='2.2.0',
	py_version='py310',
	env=hub,
	role=role,
)

predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge', # ec2 instance type
	endpoint_name='classifier'
)

predictor.predict({
	"inputs": "I like you. I love you",
})

But it gives me the following error:
ValueError: Unsupported huggingface version: 4.45.2. You may need to upgrade your SDK version (pip install -U sagemaker) for newer huggingface versions. Supported huggingface version(s): 4.6.1, 4.10.2, 4.11.0, 4.12.3, 4.17.0, 4.26.0, 4.28.1, 4.37.0, 4.6, 4.10, 4.11, 4.12, 4.17, 4.26, 4.28, 4.37.

I am deploying from a SageMaker Jupyter notebook with sagemaker SDK version 2.23.2.
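The error message itself lists what the installed SDK accepts, so the options are to upgrade the SDK as it suggests (`pip install -U sagemaker`) or to pass one of the listed versions. As a small sketch, the list can be pulled out of the message and the newest entry picked; the helper names `supported_versions` and `newest` are made up for illustration, and the matching `pytorch_version` and `py_version` for the chosen release still need to be looked up separately.

```python
import re


def supported_versions(message):
    """Return the versions listed after 'Supported huggingface version(s):'."""
    _, _, tail = message.partition("Supported huggingface version(s):")
    return re.findall(r"\d+(?:\.\d+)+", tail)


def newest(versions):
    """Pick the highest version, comparing components as integers, not strings."""
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


# The ValueError text from the post above.
message = (
    "Unsupported huggingface version: 4.45.2. You may need to upgrade your SDK "
    "version (pip install -U sagemaker) for newer huggingface versions. "
    "Supported huggingface version(s): 4.6.1, 4.10.2, 4.11.0, 4.12.3, 4.17.0, "
    "4.26.0, 4.28.1, 4.37.0, 4.6, 4.10, 4.11, 4.12, 4.17, 4.26, 4.28, 4.37."
)

versions = supported_versions(message)
print(newest(versions))
```

With the message above, the newest version the installed SDK knows about is 4.37.0, so 4.45.2 is rejected before any deployment starts.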