Contextualized Chunk Embeddings

Model Choices

Voyage currently provides the following contextualized chunk embedding models:

| Model | Per Chunk Context Window | Context Length (tokens) | Embedding Dimension | Description |
| --- | --- | --- | --- | --- |
| voyage-context-4 (In Preview) | 32,000 | 120,000* | 1024 (default), 256, 512, 2048 | Contextualized chunk embeddings optimized for general-purpose and multilingual retrieval quality. |
| voyage-context-3 | 32,000 | 120,000* | 1024 (default), 256, 512, 2048 | Contextualized chunk embeddings optimized for general-purpose and multilingual retrieval quality. |

To learn more, see the blog post.


Python API

Voyage contextualized chunk embeddings are accessible in Python through the voyageai package. Install the package, set up your API key, and use voyageai.Client.contextualized_embed() to vectorize your inputs.

voyageai.Client.contextualized_embed(
    inputs: Union[List[List[str]], List[str]],  # see below for when to pass one or the other
    model: str,
    input_type: Optional[str] = None,
    output_dimension: Optional[int] = None,
    output_dtype: Optional[str] = "float",
    enable_auto_chunking: Optional[bool] = False,
    chunk_size: Optional[int] = 512,
    chunk_overlap: Optional[int] = 0,
    chunk_fn: Optional[Callable[[str], List[str]]] = None,
)

Parameters

  • inputs (List[List[str]] or List[str]) - The input texts to be vectorized.
  • model (str) - Name of the model. Recommended options: voyage-context-4.
  • input_type (str, optional, defaults to None) - Type of the input text.
    • Options: None, query, document.
    • When input_type is None, the model directly converts the inputs into numerical vectors. For retrieval and search, we recommend setting input_type to query or document. In those cases, Voyage automatically prepends a prompt before vectorizing the input. Embeddings generated with and without input_type are compatible.
  • output_dimension (int, optional, defaults to None) - The number of dimensions for resulting embeddings.
    • Options: 2048, 1024 (default), 512, and 256.
  • output_dtype (str, optional, defaults to float) - The data type for returned embeddings.
    • Options: float, int8, uint8, binary, ubinary. See Flexible Dimensions and Quantization for details.
      • float: Each embedding is a list of 32-bit floating-point numbers.
      • int8 and uint8: Each embedding is a list of 8-bit integers.
      • binary and ubinary: Each embedding is a list of 8-bit integers representing bit-packed single-bit values. The returned list length is 1/8 of output_dimension. binary uses offset binary.
  • enable_auto_chunking (bool, optional, defaults to False) - Whether to automatically chunk each input document on the backend. When True, inputs must be a flat List[str] of full-document strings, and input_type must be document.
  • chunk_size (int, optional) - Target chunk size in tokens when enable_auto_chunking=True. If omitted, the server resolves it to 512. chunk_size must not exceed 32K tokens.
  • chunk_overlap (int, optional, defaults to 0) - Chunk overlap in tokens, used to improve context across chunks. Valid only when enable_auto_chunking=True, and must be smaller than chunk_size.
  • Note on chunk_size and chunk_overlap:
    • The values passed for chunk_size and chunk_overlap are upper bounds. The actual chunk size and overlap can be lower than the values passed, but never higher.
    • Overlapping tokens are billed in the same way as input tokens.
  • chunk_fn (Callable[[str], List[str]], optional, defaults to None) - A custom client-side chunking function. If provided, it is applied locally to each input string before the request is sent. For convenience, voyageai.default_chunk_fn is available. Use chunk_fn for client-side chunking only; it cannot be combined with enable_auto_chunking=True.
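
To make the bit-packed output types concrete, here is a minimal sketch of expanding a ubinary or binary embedding back into individual bits. It rests on two assumptions that are not stated above and should be checked against Flexible Dimensions and Quantization: bits are packed most-significant-bit first, and "offset binary" means the signed value equals the unsigned byte minus 128. unpack_bits is a hypothetical helper, not part of the voyageai package, and the sample values are made up.

```python
from typing import List


def unpack_bits(packed: List[int], signed: bool) -> List[int]:
    """Expand a bit-packed embedding into a list of 0/1 values.

    signed=True  treats the input as `binary` (int8, assumed offset by -128).
    signed=False treats the input as `ubinary` (uint8).
    """
    bits: List[int] = []
    for value in packed:
        # Undo the assumed offset so both dtypes become an unsigned byte.
        byte = value + 128 if signed else value
        if not 0 <= byte <= 255:
            raise ValueError(f"value {value} is out of range for this dtype")
        # Assumption: the most significant bit comes first.
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


# Hypothetical 2-byte ubinary embedding, i.e. output_dimension would be 16.
ubinary_embedding = [0b10110000, 0b00000001]
# The same bits expressed as `binary` under the offset assumption.
binary_embedding = [b - 128 for b in ubinary_embedding]

bits = unpack_bits(ubinary_embedding, signed=False)
```

Under these assumptions both dtypes carry the same bits, so unpack_bits(binary_embedding, signed=True) yields the same list, and the unpacked length is 8 times the packed length.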

Returns

  • A ContextualizedEmbeddingsObject, containing the following attributes:
    • results (List[ContextualizedEmbeddingsResult]) - One result per query or document.
      • embeddings (List[List[float]] or List[List[int]]) - Embeddings corresponding to a query, a document, or chunks from the same document. For document chunks, embeddings are ordered to match chunk order.
      • chunk_texts (List[str]) - Chunk text returned by the Python SDK for chunked document results. If you provide a client-side chunk_fn, these correspond to the chunks produced by that function. When enable_auto_chunking=True, they correspond to the backend-generated chunks.
      • index (int) - The index of the query or document in the input list.
    • total_tokens (int) - The total number of tokens in the input texts.

Example: See our quickstart below.


REST API

Voyage contextualized chunk embeddings can be accessed by calling the endpoint POST https://api.voyageai.com/v1/contextualizedembeddings. See the Contextualized Chunk Embeddings API Reference for the specification.

Example

curl -X POST https://api.voyageai.com/v1/contextualizedembeddings \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -H "content-type: application/json" \
  -d '
  {
    "inputs": [
      "This is the SEC filing on Leafy Inc.\u0027s Q2 2024 performance.\nThe company\u0027s revenue increased by 15% compared to the previous quarter.",
      "This is the SEC filing on Elephant Ltd.\u0027s Q2 2024 performance.\nThe company\u0027s revenue decreased by 2% compared to the previous quarter."
    ],
    "input_type": "document",
    "model": "voyage-context-4",
    "enable_auto_chunking": true,
    "chunk_size": 512,
    "chunk_overlap": 0
  }'
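
For readers who prefer Python without the voyageai package, the same request can be assembled with the standard library. This is a minimal sketch that mirrors the curl call above; build_request is a hypothetical helper, the placeholder documents are made up, and the send step is left commented out so the snippet runs without network access or a real key.

```python
import json
import os
import urllib.request

API_URL = "https://api.voyageai.com/v1/contextualizedembeddings"


def build_request(documents, api_key, chunk_size=512, chunk_overlap=0):
    """Build (but do not send) an auto-chunking request for full documents."""
    payload = {
        "inputs": documents,
        "input_type": "document",
        "model": "voyage-context-4",
        "enable_auto_chunking": True,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    ["First full document text.", "Second full document text."],
    api_key=os.environ.get("VOYAGE_API_KEY", "YOUR_API_KEY"),
)

# To actually send the request:
# with urllib.request.urlopen(req) as response:
#     body = json.load(response)
```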

Response Shape

{
  "data": [
    {
      "data": [
        { "embedding": [...], "index": 0, "text": "chunk text here" },
        { "embedding": [...], "index": 1, "text": "..." }
      ],
      "index": 0
    }
  ],
  "model": "voyage-context-4",
  "usage": { "total_tokens": 100 },
  "chunker_version": "1.0.0"
}
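
The nested data arrays in this shape can be flattened into one row per chunk, which is usually the form a vector store expects. The sketch below assumes only the fields shown in the response shape above; flatten_response is a hypothetical helper, and the sample response uses made-up two-dimensional embeddings.

```python
from typing import Any, Dict, List


def flatten_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one row per chunk, keeping document and chunk positions."""
    rows = []
    for document in response["data"]:      # outer list: one item per input
        for chunk in document["data"]:     # inner list: one item per chunk
            rows.append(
                {
                    "document_index": document["index"],
                    "chunk_index": chunk["index"],
                    "text": chunk.get("text"),  # tolerate a missing text field
                    "embedding": chunk["embedding"],
                }
            )
    return rows


# Hypothetical response body; the embedding values are placeholders.
sample_response = {
    "data": [
        {
            "data": [
                {"embedding": [0.1, 0.2], "index": 0, "text": "first chunk"},
                {"embedding": [0.3, 0.4], "index": 1, "text": "second chunk"},
            ],
            "index": 0,
        }
    ],
    "model": "voyage-context-4",
    "usage": {"total_tokens": 100},
}

rows = flatten_response(sample_response)
```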

Inputs Validation


| Use Case | inputs shape | input_type | enable_auto_chunking | chunk_fn | Notes |
| --- | --- | --- | --- | --- | --- |
| Embed pre-chunked documents | List[List[str]] | document | False or omitted | Omitted | Each inner list contains one document's chunks |
| Client-side chunking and embedding | List[List[str]] | document | False or omitted | Provided | Common pattern: one full document string per inner list |
| Auto chunking and embedding | List[str] | document | True | Omitted | chunk_size and chunk_overlap are valid only in this case |
| Embed queries | List[str] or List[List[str]] | query | False or omitted | Omitted | If nested, each inner list should contain a single query |
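
The table above can be read as a decision procedure. As a rough illustration, the sketch below maps a set of arguments to the matching use case and raises ValueError for combinations the table does not list. resolve_use_case is a hypothetical client-side helper, not part of the voyageai package, and the server remains the authority on what is accepted.

```python
from typing import Callable, List, Optional, Union


def resolve_use_case(
    inputs: Union[List[str], List[List[str]]],
    input_type: Optional[str],
    enable_auto_chunking: bool = False,
    chunk_fn: Optional[Callable[[str], List[str]]] = None,
) -> str:
    """Return the use case from the table, or raise ValueError."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    flat = all(isinstance(item, str) for item in inputs)
    nested = all(isinstance(item, list) for item in inputs)
    if not (flat or nested):
        raise ValueError("inputs must be List[str] or List[List[str]]")

    if enable_auto_chunking:
        if chunk_fn is not None:
            raise ValueError("chunk_fn cannot be combined with enable_auto_chunking=True")
        if input_type != "document":
            raise ValueError('enable_auto_chunking=True requires input_type="document"')
        if not flat:
            raise ValueError("auto chunking expects a flat List[str] of full documents")
        return "auto chunking and embedding"

    if input_type == "query":
        if chunk_fn is not None:
            raise ValueError("chunk_fn is omitted when embedding queries")
        if nested and any(len(item) != 1 for item in inputs):
            raise ValueError("each inner list should contain a single query")
        return "embed queries"

    if input_type == "document":
        if flat:
            raise ValueError("List[str] document inputs require enable_auto_chunking=True")
        if chunk_fn is not None:
            return "client-side chunking and embedding"
        return "embed pre-chunked documents"

    raise ValueError("combination not listed in the table")


use_case = resolve_use_case([["chunk one", "chunk two"]], input_type="document")
```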

TypeScript Library

Voyage contextualized chunk embeddings are accessible in TypeScript through the Voyage TypeScript Library, which exposes all the functionality of our contextualized chunk embeddings endpoint (see Contextualized Chunk Embeddings API Reference).


Quickstart

This quickstart walks through each supported use case for voyage-context-4, with a working example snippet for each one.

Auto Chunking and Embedding

Use this when you want Voyage to split each full document into chunks for you.

import voyageai

vo = voyageai.Client()

documents = [
    "This is the SEC filing on Leafy Inc.'s Q2 2024 performance.\nThe company's revenue increased by 15% compared to the previous quarter.",
    "This is the SEC filing on Elephant Ltd.'s Q2 2024 performance.\nThe company's revenue decreased by 2% compared to the previous quarter.",
]

result = vo.contextualized_embed(
    model="voyage-context-4",
    inputs=documents,
    input_type="document",
    enable_auto_chunking=True,
    chunk_size=512,
    chunk_overlap=0,
)

Embed Pre-Chunked Documents

Use this when you have already split each document into chunks on the client side.

import voyageai

vo = voyageai.Client()

inputs = [
    [
        "This is the SEC filing on Leafy Inc.'s Q2 2024 performance.",
        "The company's revenue increased by 15% compared to the previous quarter.",
    ],
    [
        "This is the SEC filing on Elephant Ltd.'s Q2 2024 performance.",
        "The company's revenue decreased by 2% compared to the previous quarter.",
    ],
]

result = vo.contextualized_embed(
    model="voyage-context-4",
    inputs=inputs,
    input_type="document",
)

Each inner list is embedded as a group, so each chunk is encoded in the context of the other chunks from the same document.

Client-Side Chunking and Embedding

Use this when you have full documents and want to handle chunking locally in the Python SDK rather than in the backend.

import voyageai

vo = voyageai.Client()

inputs = [
    [
        "This is the SEC filing on Leafy Inc.'s Q2 2024 performance.\nThe company's revenue increased by 15% compared to the previous quarter.",
    ],
    [
        "This is the SEC filing on Elephant Ltd.'s Q2 2024 performance.\nThe company's revenue decreased by 2% compared to the previous quarter.",
    ],
]

result = vo.contextualized_embed(
    model="voyage-context-4",
    inputs=inputs,
    input_type="document",
    chunk_fn=voyageai.default_chunk_fn,
)

chunk_fn is applied locally to each input string before the request is sent.
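
Any function that takes a string and returns a list of strings satisfies the chunk_fn signature. As a minimal sketch, the hypothetical newline_chunk_fn below splits a document on newlines and drops empty pieces; it is an illustration, not the behavior of voyageai.default_chunk_fn.

```python
from typing import List


def newline_chunk_fn(text: str) -> List[str]:
    """Split a document into chunks at newlines, skipping blank pieces."""
    return [piece.strip() for piece in text.split("\n") if piece.strip()]


document = (
    "This is the SEC filing on Leafy Inc.'s Q2 2024 performance.\n"
    "The company's revenue increased by 15% compared to the previous quarter."
)

chunks = newline_chunk_fn(document)
```

Passing chunk_fn=newline_chunk_fn in place of voyageai.default_chunk_fn in the call above would apply this splitter to each input string before the request is sent.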

Embedding Queries

Use this when your inputs are search queries.

import voyageai

vo = voyageai.Client()

result = vo.contextualized_embed(
    model="voyage-context-4",
    inputs=[
        "What was the revenue growth for Leafy Inc. in Q2 2024?",
        "What changed in Greenery Corp. between Q1 and Q2 2024?",
    ],
    input_type="query",
)

The following query input shape is also valid and is treated equivalently:

result = vo.contextualized_embed(
    model="voyage-context-4",
    inputs=[
        ["What was the revenue growth for Leafy Inc. in Q2 2024?"],
        ["What changed in Greenery Corp. between Q1 and Q2 2024?"],
    ],
    input_type="query",
)

Returned Chunk Text

The response contains contextualized embedding results for each query or document. For document chunks, embeddings are ordered to match chunk order. In the Python SDK, returned chunk text is available through chunk_texts on the result object. In the REST API, returned chunk text is available as text on each embedding item.

If you provide a client-side chunking function, the returned chunk text corresponds to the chunks produced by that function. When enable_auto_chunking=True, the response also includes the backend-generated chunk text for each returned embedding so you can inspect and store it.
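
Because chunk_texts and embeddings are ordered the same way, they can be zipped together and scored against a query embedding. The sketch below ranks chunks by cosine similarity. It uses stand-in objects with made-up two-dimensional embeddings instead of a live API call; with a real response you would pass result.results. rank_chunks and cosine are hypothetical helpers.

```python
import math
from types import SimpleNamespace
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_embedding, document_results) -> List[Tuple[float, int, str]]:
    """Return (score, document index, chunk text), highest score first."""
    scored = []
    for result in document_results:
        # chunk_texts and embeddings share the same chunk order.
        for text, embedding in zip(result.chunk_texts, result.embeddings):
            scored.append((cosine(query_embedding, embedding), result.index, text))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Stand-ins for the per-document results; all values are placeholders.
document_results = [
    SimpleNamespace(
        index=0,
        chunk_texts=["Leafy revenue up 15%.", "Leafy filing intro."],
        embeddings=[[0.9, 0.1], [0.2, 0.8]],
    ),
    SimpleNamespace(
        index=1,
        chunk_texts=["Elephant revenue down 2%."],
        embeddings=[[0.6, 0.6]],
    ),
]
query_embedding = [1.0, 0.0]

ranked = rank_chunks(query_embedding, document_results)
```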

Input Constraints

The following constraints apply to a request:

  • The inputs list must not contain more than 1,000 items.
  • The total number of tokens across all inputs must not exceed 120K.
  • The total number of chunks across all inputs must not exceed 16K.
  • chunk_size and chunk_overlap require enable_auto_chunking=True.
  • chunk_overlap must be smaller than chunk_size.
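
Several of these constraints can be checked on the client before a request is sent. The sketch below is one way to do that, with caveats: it reads "16K" as 16,000 (an assumption), it skips the token limit because counting tokens needs a tokenizer, and it can only count chunks when inputs are already nested lists. check_request_limits is a hypothetical helper, not part of the voyageai package.

```python
from typing import List, Optional, Union

MAX_INPUTS = 1_000
MAX_TOTAL_CHUNKS = 16_000  # Assumption: "16K" is read as 16,000.


def check_request_limits(
    inputs: Union[List[str], List[List[str]]],
    chunk_size: Optional[int] = None,
    chunk_overlap: Optional[int] = None,
    enable_auto_chunking: bool = False,
) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if len(inputs) > MAX_INPUTS:
        problems.append(f"{len(inputs)} inputs exceeds the limit of {MAX_INPUTS}")

    # Chunk counts are only known client-side for nested (pre-chunked) inputs.
    total_chunks = sum(len(item) for item in inputs if isinstance(item, list))
    if total_chunks > MAX_TOTAL_CHUNKS:
        problems.append(f"{total_chunks} chunks exceeds the limit of {MAX_TOTAL_CHUNKS}")

    if (chunk_size is not None or chunk_overlap is not None) and not enable_auto_chunking:
        problems.append("chunk_size and chunk_overlap require enable_auto_chunking=True")

    if chunk_size is not None and chunk_overlap is not None and chunk_overlap >= chunk_size:
        problems.append("chunk_overlap must be smaller than chunk_size")

    return problems


problems = check_request_limits(
    ["full document text"],
    chunk_size=512,
    chunk_overlap=512,
    enable_auto_chunking=True,
)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.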

Common Invalid Parameter Combinations

  • List[str] document inputs with enable_auto_chunking=False are invalid.
  • enable_auto_chunking=True requires input_type="document".
  • Do not use chunk_fn together with enable_auto_chunking=True.

Tutorial

For a full tutorial on using contextualized chunk embeddings, see Contextualized Chunk Embeddings: Combining Local Detail with Global Context. The Jupyter Notebook for this tutorial is available on GitHub in the GenAI Showcase repository.