Vertex AI RAG Engine supported models
Vertex AI RAG Engine supports the VPC-SC security control. Data residency, CMEK, and AXT security controls aren't supported.
This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support Vertex AI RAG Engine.
Gemini models
The following table lists the Gemini models and their versions that support Vertex AI RAG Engine:
Self-deployed models
Vertex AI RAG Engine supports all models in Model Garden.
Use Vertex AI RAG Engine with your self-deployed open model endpoints.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- ENDPOINT_ID: Your endpoint ID.
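The samples on this page that pass a retrieval tool assume that a rag_retrieval_tool has already been built from your corpus. A minimal sketch of that setup, assuming the vertexai.preview.rag module and an existing corpus (module paths and parameters may differ across SDK versions):
# Build a retrieval tool from an existing RAG corpus (sketch; the module
# path and parameters are assumptions that may vary by SDK version)
from vertexai.preview import rag
from vertexai.generative_models import Tool

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
                )
            ],
            similarity_top_k=3,  # number of contexts to retrieve per query
        ),
    )
)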
# Create a model instance with your self-deployed open model endpoint
from vertexai.generative_models import GenerativeModel

rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool],
)
Models with managed APIs on Vertex AI
Models with managed APIs on Vertex AI, such as Llama 3.1, also support Vertex AI RAG Engine.
The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, publishers/meta/models/llama-3.1-405b-instruct-maas, is found in the model card.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.
# Create a model instance with the Llama 3.1 MaaS endpoint
from vertexai.generative_models import GenerativeModel

rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405b-instruct-maas",
    tools=[RAG_RETRIEVAL_TOOL],  # tools expects a list, even for one tool
)
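As with any GenerativeModel instance, you can then call generate_content to get a grounded response; a minimal sketch with a hypothetical prompt:
# Query the model; the retrieval tool grounds the answer in your corpus
response = rag_model.generate_content("What do my documents say about pricing?")
print(response.text)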
The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- MODEL_ID: The LLM model for content generation. For example, meta/llama-3.1-405b-instruct-maas.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_ID: The ID of the RAG corpus resource.
- ROLE: The role of the message author, for example, user.
- CONTENT: The message text sent to the LLM, such as your INPUT_PROMPT.
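The sample assumes an OpenAI client pointed at the Vertex AI OpenAI-compatible endpoint. A minimal sketch of that client setup, assuming Application Default Credentials and the v1beta1 endpoints/openapi base URL (verify both against the current documentation):
# Configure an OpenAI client against the Vertex AI OpenAI-compatible
# endpoint; the base URL and credential flow here are assumptions
import google.auth
import google.auth.transport.requests
import openai

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
    base_url=(
        "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
    ),
    api_key=credentials.token,  # short-lived OAuth token, not a static API key
)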
# Generate a response with the Llama 3.1 MaaS endpoint
response = client.chat.completions.create(
    model="MODEL_ID",
    # The keys must be the literal strings "role" and "content";
    # ROLE and CONTENT are the placeholder values to replace
    messages=[{"role": "ROLE", "content": "CONTENT"}],
    # The outer extra_body is the OpenAI SDK parameter; the inner
    # "extra_body" field carries the Vertex-specific RAG configuration
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {
                        "rag_corpus": "RAG_CORPUS_ID"
                    },
                    "similarity_top_k": 10
                }
            }
        }
    },
)
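The grounded answer is then available on the standard ChatCompletions response object:
# Print the model's grounded answer
print(response.choices[0].message.content)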