Contextualized chunk embedding models

POST https://api.voyageai.com/v1/contextualizedembeddings

The Voyage contextualized chunk embedding endpoint accepts document chunks—in addition to queries and full documents—and returns a response containing contextualized chunk vector embeddings. These contextualized chunk embeddings capture not only the local details within each chunk but also global, coarse-grained metadata from the entire document.

Body Params

inputs

array of arrays of strings

required

A list of lists, where each inner list contains a query, a document, or document chunks to be vectorized.

Each inner list in inputs represents a set of text elements that will be embedded together. Each element is encoded not only from its own content but also with context from the other elements in the same list.

 inputs = [["text_1_1", "text_1_2", ..., "text_1_n"],
          ["text_2_1", "text_2_2", ..., "text_2_m"]]

Document Chunks. Most commonly, each inner list contains chunks from a single document, ordered by their position in the document. In this case:

 inputs = [["doc_1_chunk_1", "doc_1_chunk_2", ..., "doc_1_chunk_n"],
          ["doc_2_chunk_1", "doc_2_chunk_2", ..., "doc_2_chunk_m"]]

Each chunk is encoded in context with the others from the same document, resulting in more context-aware embeddings. We recommend that supplied chunks not have any overlap.
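As a minimal sketch of how such a request might be assembled, the snippet below builds (but does not send) a POST request for two chunked documents using only the Python standard library. The helper name build_request, the placeholder API key, and the bearer-token Authorization header are assumptions for illustration and are not specified in this section.

```python
import json
import urllib.request

API_URL = "https://api.voyageai.com/v1/contextualizedembeddings"


def build_request(inputs, model="voyage-context-3", input_type="document",
                  api_key="YOUR_API_KEY"):
    """Build (but do not send) a POST request for the endpoint.

    `build_request` and `api_key` are hypothetical names for this sketch.
    """
    payload = {"inputs": inputs, "model": model, "input_type": input_type}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API uses bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# One inner list per document; chunks are ordered and non-overlapping.
inputs = [
    ["doc_1_chunk_1", "doc_1_chunk_2"],
    ["doc_2_chunk_1", "doc_2_chunk_2", "doc_2_chunk_3"],
]
request = build_request(inputs)
# To send it: urllib.request.urlopen(request), then parse the JSON body.
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.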

Context-Agnostic Behavior for Queries and Documents. If there is one element per inner list, each text is embedded independently—similar to standard (context-agnostic) embeddings:

 inputs = [["query_1"], ["query_2"], ..., ["query_k"]]
 inputs = [["doc_1"], ["doc_2"], ..., ["doc_k"]]

Therefore, if the inputs are queries, each inner list should contain a single query (i.e., a length of one), as shown above, and the input_type should be set to query.
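The single-element rule for queries can be sketched as a small payload helper. The function names here are hypothetical; only the inputs, model, and input_type fields come from this reference.

```python
def as_single_element_inputs(texts):
    """Wrap each text in its own inner list so it is embedded independently."""
    return [[text] for text in texts]


def build_query_payload(queries, model="voyage-context-3"):
    """Return a request body for queries: one query per inner list."""
    return {
        "inputs": as_single_element_inputs(queries),
        "model": model,
        "input_type": "query",
    }


payload = build_query_payload(["query_1", "query_2"])
```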

The following constraints apply to the inputs list:

The list must not contain more than 1,000 inputs.
The total number of tokens across all inputs must not exceed 120K.
The total number of chunks across all inputs must not exceed 16K.
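These limits can be checked on the client before sending a request. The sketch below is one possible pre-flight check; it assumes that "120K" and "16K" mean 120,000 and 16,000, and it takes an optional, caller-supplied count_tokens function because token counts depend on the model's tokenizer, which is not described here.

```python
MAX_INPUTS = 1_000
MAX_TOTAL_TOKENS = 120_000  # assumption: "120K" read as 120,000
MAX_TOTAL_CHUNKS = 16_000   # assumption: "16K" read as 16,000


def check_input_limits(inputs, count_tokens=None):
    """Return a list of human-readable limit violations (empty if none).

    `count_tokens` is a hypothetical callable mapping a string to its token
    count; the token limit is skipped when it is not provided.
    """
    problems = []
    if len(inputs) > MAX_INPUTS:
        problems.append(f"{len(inputs)} inputs exceeds {MAX_INPUTS}")
    total_chunks = sum(len(inner) for inner in inputs)
    if total_chunks > MAX_TOTAL_CHUNKS:
        problems.append(f"{total_chunks} chunks exceeds {MAX_TOTAL_CHUNKS}")
    if count_tokens is not None:
        total_tokens = sum(count_tokens(t) for inner in inputs for t in inner)
        if total_tokens > MAX_TOTAL_TOKENS:
            problems.append(f"{total_tokens} tokens exceeds {MAX_TOTAL_TOKENS}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit at once.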


model

string

required

Name of the model. Recommended options: voyage-context-3.

input_type

string | null

enum

Defaults to null

Type of the input text. Defaults to null. Other options: query, document.

When input_type is null, the embedding model directly converts the inputs into numerical vectors. For retrieval/search purposes, where a "query" is used to search for relevant information among a collection of data referred to as "documents," we recommend specifying whether your inputs are intended as queries or documents by setting input_type to query or document, respectively. In these cases, Voyage automatically prepends a prompt to your inputs before vectorizing them, creating vectors more tailored for retrieval/search tasks. Embeddings generated with and without the input_type argument are compatible.
For transparency, the following prompts are prepended to your input.

For query, the prompt is "Represent the query for retrieving supporting documents: ".
For document, the prompt is "Represent the document for retrieval: ".


output_dimension

integer | null

Defaults to null

The number of dimensions for resulting output embeddings. Defaults to null. voyage-context-3 supports the following output_dimension values: 2048, 1024 (default), 512, and 256. If set to null, the model uses the default value of 1024.
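A client might validate this parameter before sending a request. The following is a small sketch with a hypothetical helper name; the supported values and the default are the ones listed above for voyage-context-3.

```python
SUPPORTED_DIMENSIONS = (2048, 1024, 512, 256)
DEFAULT_DIMENSION = 1024


def resolve_output_dimension(output_dimension=None):
    """Return the effective dimension, treating None (null) as the default."""
    if output_dimension is None:
        return DEFAULT_DIMENSION
    if output_dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"output_dimension must be one of {SUPPORTED_DIMENSIONS} or None"
        )
    return output_dimension
```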

output_dtype

string

enum

Defaults to float

The data type for the embeddings to be returned. Defaults to float. Other options: int8, uint8, binary, ubinary. Please see our guide for more details about output data types.

float: Each returned embedding is a list of 32-bit (4-byte) single-precision floating-point numbers. This is the default and provides the highest precision / retrieval accuracy.
int8 and uint8: Each returned embedding is a list of 8-bit (1-byte) integers ranging from -128 to 127 and 0 to 255, respectively.
binary and ubinary: Each returned embedding is a list of 8-bit integers that represent bit-packed, quantized single-bit embedding values: int8 for binary and uint8 for ubinary. The length of the returned list of integers is 1/8 of output_dimension (which is the actual dimension of the embedding). The binary type uses the offset binary method. Please refer to our guide for details on offset binary and binary embeddings.
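To illustrate the size arithmetic and the bit packing described above, here is a rough sketch in plain Python. The helper names are hypothetical. Two points are assumptions rather than statements from this reference: that offset binary means the signed value equals the unsigned byte minus 128, and that bits are packed most-significant-bit first (the default order of numpy.packbits).

```python
def bytes_per_embedding(output_dimension, output_dtype):
    """Size in bytes of one embedding for a given dimension and data type."""
    if output_dtype == "float":
        return 4 * output_dimension          # 32-bit floats
    if output_dtype in ("int8", "uint8"):
        return output_dimension              # one byte per dimension
    if output_dtype in ("binary", "ubinary"):
        return output_dimension // 8         # one bit per dimension
    raise ValueError(f"unknown output_dtype: {output_dtype}")


def unpack_bits(packed, signed):
    """Expand bit-packed integers into a list of 0/1 values.

    Use signed=True for `binary` (int8) and signed=False for `ubinary`.
    """
    bits = []
    for value in packed:
        # Assumption: offset binary stores int8 = uint8 - 128.
        byte = value + 128 if signed else value
        if not 0 <= byte <= 255:
            raise ValueError(f"value out of range: {value}")
        # Assumption: most significant bit comes first.
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits
```

With a 1024-dimensional embedding, the returned list for binary or ubinary holds 128 integers, and unpacking it yields 1024 bits again.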


encoding_format

string | null

enum

Defaults to null

Format in which the embeddings are encoded. Defaults to null. Other options: base64.

If null, each embedding is an array of floating-point numbers when output_dtype is set to float, and an array of integers for all other values of output_dtype (int8, uint8, binary, and ubinary). See output_dtype for more details.
If base64, the embeddings are represented as a Base64-encoded NumPy array of:

Floating-point numbers (numpy.float32) for output_dtype set to float.
Signed integers (numpy.int8) for output_dtype set to int8 or binary.
Unsigned integers (numpy.uint8) for output_dtype set to uint8 or ubinary.
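A Base64-encoded embedding could be decoded along the following lines. This sketch uses only the standard library rather than NumPy, and it assumes the string holds the raw little-endian bytes of the array (not a serialized .npy file); both the helper name and that byte layout are assumptions, so the result should be verified against a non-encoded response.

```python
import base64
import struct

# struct format code and item size for each output_dtype
_FORMATS = {
    "float": ("f", 4),    # numpy.float32
    "int8": ("b", 1),     # numpy.int8
    "binary": ("b", 1),   # numpy.int8
    "uint8": ("B", 1),    # numpy.uint8
    "ubinary": ("B", 1),  # numpy.uint8
}


def decode_base64_embedding(encoded, output_dtype="float"):
    """Decode a Base64 string into a list of numbers for the given dtype."""
    code, size = _FORMATS[output_dtype]
    raw = base64.b64decode(encoded)
    if len(raw) % size != 0:
        raise ValueError("byte length is not a multiple of the item size")
    count = len(raw) // size
    # Assumption: values are stored in little-endian byte order.
    return list(struct.unpack(f"<{count}{code}", raw))
```

With NumPy available, the equivalent under the same assumptions would be numpy.frombuffer(base64.b64decode(encoded), dtype=numpy.float32), with the dtype chosen per the list above.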


Responses

5XX

Server Error

This indicates our servers are experiencing high traffic or having an unexpected issue. Please see our Error Codes guide.

200

Success.

4XX

Client Error

This indicates an issue with the request format or frequency. Please see our Error Codes guide.
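A client could group status codes into these response classes and decide when a retry is worthwhile. The sketch below is one possible policy, not a documented one: it assumes that request-frequency errors use the standard HTTP 429 status, and the backoff schedule is illustrative. The Error Codes guide remains the authority on specific codes.

```python
def classify_status(status):
    """Map an HTTP status code to the response classes listed above."""
    if 200 <= status < 300:
        return "success"
    if 400 <= status < 500:
        return "client_error"
    if 500 <= status < 600:
        return "server_error"
    return "other"


def is_retryable(status):
    """Retry on server errors and, by assumption, on 429 (rate limiting)."""
    return status == 429 or classify_status(status) == "server_error"


def backoff_delays(attempts, base_seconds=1.0):
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ..."""
    return [base_seconds * 2 ** i for i in range(attempts)]
```

Other 4XX responses point to a problem with the request itself, so resending the same request unchanged is unlikely to help.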