Get image descriptions using visual captioning

Note: The Gemini API can generate descriptions from multiple image inputs, while Imagen processes only one image per input.

Visual captioning generates a relevant description for an image. You can use that description for a variety of purposes:

  • Get more detailed metadata about images for storing and searching.
  • Generate automated captioning to support accessibility use cases.
  • Receive quick descriptions of products and visual assets.

Image source: Santhosh Kumar on Unsplash (cropped)

Caption (short-form): a blue shirt with white polka dots is hanging on a hook
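
To make the request side of captioning concrete, the following is a minimal sketch of how a client might assemble a JSON request body for one image. The field names (`instances`, `image.bytesBase64Encoded`, `parameters.sampleCount`, `parameters.language`) and the helper name `build_caption_request` are assumptions for illustration, not taken from this page; verify them against the API reference before you rely on them.

```python
import base64
import json


def build_caption_request(image_bytes: bytes, language: str = "en",
                          sample_count: int = 1) -> str:
    """Return a JSON request body for a single-image captioning call.

    ASSUMPTION: the schema below (instances / image.bytesBase64Encoded /
    parameters.sampleCount / parameters.language) is illustrative only.
    """
    # Images are sent as base64 text so they can travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = {
        # Exactly one image per request, matching the one-image limit for Imagen.
        "instances": [{"image": {"bytesBase64Encoded": encoded}}],
        "parameters": {"sampleCount": sample_count, "language": language},
    }
    return json.dumps(body)
```

You would typically read the image from disk, for example `build_caption_request(open("shirt.jpg", "rb").read())`, and send the resulting string as the body of an authenticated POST request.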

Supported languages

Visual captioning is available in the following languages:

  • English (en)
  • French (fr)
  • German (de)
  • Italian (it)
  • Spanish (es)
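
Because only the five languages above are listed, it can help to validate the language code on the client before sending a request. The sketch below is one way to do that; the helper name `normalize_caption_language` is hypothetical, and the set of codes is copied from the list above.

```python
# Language codes copied from the supported-languages list above.
SUPPORTED_CAPTION_LANGUAGES = {
    "en": "English",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "es": "Spanish",
}


def normalize_caption_language(code: str) -> str:
    """Return a lowercase, trimmed language code, or raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_CAPTION_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_CAPTION_LANGUAGES))
        raise ValueError(
            f"Unsupported caption language {code!r}; expected one of: {supported}"
        )
    return normalized
```

Failing early on the client avoids spending part of the per-minute request quota on calls that use an unsupported language.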

Performance and limitations

The following limits apply when you use this model:

| Limit | Value |
| --- | --- |
| Maximum number of API requests (short-form) per minute per project | 500 |
| Maximum number of tokens returned in response (short-form) | 64 tokens |
| Maximum number of tokens accepted in request (VQA short-form only) | 80 tokens |
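
To stay under the requests-per-minute limit, a client can throttle itself before calling the API. The following is a minimal sliding-window sketch, not an official client feature; the class name `SlidingWindowLimiter` is hypothetical. Because the quota applies per project, a limiter inside a single process is only an approximation when several processes share the same project.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` recorded calls in any `period_s` window."""

    def __init__(self, max_calls: int = 500, period_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period_s = period_s
        self._clock = clock  # injectable so the limiter can be tested
        self._sent: Deque[float] = deque()  # timestamps of recent calls

    def seconds_until_allowed(self) -> float:
        """Return 0.0 if a call may go out now, else the wait in seconds."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.period_s:
            self._sent.popleft()
        if len(self._sent) < self.max_calls:
            return 0.0
        # The window is full; wait until the oldest call expires.
        return self.period_s - (now - self._sent[0])

    def record(self) -> None:
        """Record that a request was just sent."""
        self._sent.append(self._clock())
```

Before each request, call `seconds_until_allowed()`, sleep for the returned duration if it is nonzero, and then call `record()`.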

The following service latency estimates apply when you use this model. These values are meant to be illustrative and are not a promise of service:

| Latency | Value |
| --- | --- |
| API requests (short-form) | 1.5 seconds |

Locations

A location is a region you can specify in a request to control where data is stored at rest. For a list of available regions, see Generative AI on Vertex AI locations.
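
As a rough illustration of specifying a location in a request, the sketch below builds a regional prediction URL. The hostname pattern, the URL path, and the default model ID `imagetext` are assumptions that do not appear on this page, and the helper name `regional_predict_url` is hypothetical; confirm the exact endpoint in the API reference.

```python
def regional_predict_url(project_id: str, location: str,
                         model: str = "imagetext") -> str:
    """Build a regional predict URL for a publisher model.

    ASSUMPTION: the "<location>-aiplatform.googleapis.com" host, the path
    layout, and the default model ID are illustrative, not confirmed here.
    """
    return (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{model}:predict"
    )
```

In this sketch the same `location` value appears in both the hostname and the resource path, so passing it once keeps the two in sync.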

Responsible AI safety filtering

The image captioning and Visual Question Answering (VQA) model doesn't support user-configurable safety filters. However, Imagen's overall safety filtering is applied to the following data:

  • User input
  • Model output

As a result, your output may differ from the sample output if Imagen applies these safety filters. Consider the following examples.
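
Since the filters are not configurable, client code may want to tell a safety block apart from other failures. The sketch below assumes an error body with `error.code` and `error.message` fields, as in the filtered-input sample in this section. Matching on the word "blocked" is a heuristic chosen for illustration, not a documented contract, and the helper name `safety_block_message` is hypothetical.

```python
import json
from typing import Optional


def safety_block_message(response_text: str) -> Optional[str]:
    """Return the error message if the body looks like a safety block, else None."""
    try:
        body = json.loads(response_text)
    except json.JSONDecodeError:
        return None  # not JSON, so not a recognizable error body
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        return None
    message = str(error.get("message", ""))
    # HEURISTIC: a 400 error whose message mentions "blocked".
    if error.get("code") == 400 and "blocked" in message.lower():
        return message
    return None
```

A caller can log the returned message and skip retries, because resending the same filtered input is unlikely to succeed.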

Filtered input

If the input is filtered, the response is similar to the following:

```json
{ "error": { "code": 400, "message": "Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please