General

Why do Voyage embeddings have superior quality?

Embedding models, much like generative models, rely on powerful neural network architectures (often transformer-based) to capture and compress semantic context. And, much like generative models, they are incredibly hard to train. We are a team of leading AI researchers with 5+ years of experience training embedding models, and we get every component right, from model architecture and data collection to the choice of loss functions and optimizers. Please see our blog post for more details.


Model

What embedding models are available, and which one should I use?

For general-purpose embedding, our default recommendation is voyage-3 for quality and voyage-3-lite for latency and low cost. For retrieval, please use the input_type parameter to specify whether the text is a query or a document, which adds instructions on the backend. A minimal embedding call is sketched after the list below.

If your application is in a domain addressed by one of our domain-specific embedding models, we recommend using that model. Specifically:

  • voyage-law-2 is recommended for retrieval tasks in the legal domain.
  • voyage-code-2 is recommended for code-related tasks and programming documentation.
  • voyage-finance-2 is recommended for finance-related tasks.
  • voyage-multilingual-2 is recommended for multilingual tasks.
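
As a rough sketch, here is what a call looks like with the voyageai Python client; the Client and embed calls below follow the package's documented usage, but double-check against your installed version. Switching models is just a matter of changing the model argument:

```python
import voyageai

# The client reads the VOYAGE_API_KEY environment variable by default.
vo = voyageai.Client()

# Swap the model argument for voyage-3-lite, voyage-law-2, voyage-code-2,
# voyage-finance-2, or voyage-multilingual-2 as appropriate.
result = vo.embed(
    ["Contract clauses on indemnification."],
    model="voyage-3",
    input_type="document",
)
print(len(result.embeddings[0]))  # dimensionality of the first embedding
```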

Which similarity function should I use?

You can use Voyage embeddings with dot-product similarity, cosine similarity, or Euclidean distance. An explanation of embedding similarity can be found here.

Voyage AI embeddings are normalized to length 1, which means that:

  • Cosine similarity is equivalent to dot-product similarity, while the latter can be computed more quickly.
  • Cosine similarity and Euclidean distance result in identical rankings.
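
A quick numpy sketch verifies both facts, with random unit-length vectors standing in for Voyage embeddings:

```python
import numpy as np

# Two random unit vectors stand in for Voyage embeddings (already length 1).
q = np.random.randn(1024); q /= np.linalg.norm(q)
d = np.random.randn(1024); d /= np.linalg.norm(d)

dot = q @ d
cosine = dot / (np.linalg.norm(q) * np.linalg.norm(d))
assert np.isclose(dot, cosine)  # identical for unit vectors

# For unit vectors, squared Euclidean distance equals 2 - 2 * dot, a
# decreasing function of the dot product, so both produce the same rankings.
assert np.isclose(np.sum((q - d) ** 2), 2 - 2 * dot)
```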

What is the relationship between characters, words, and tokens?

Please see this page.

When and how should I use the input_type parameter?

For all retrieval tasks and use cases (e.g., RAG), we recommend that the input_type parameter be used to specify whether the input text is a query or a document. Do not omit input_type or set input_type=None. Specifying whether the input text is a query or a document produces dense vector representations better suited to retrieval, which improves retrieval quality.

When using the input_type parameter, special prompts are prepended to the input text prior to embedding. Specifically:

Prompts associated with input_type

  • For a query, the prompt is "Represent the query for retrieving supporting documents: ".
  • For a document, the prompt is "Represent the document for retrieval: ".

Example

  • When input_type="query", a query like "When is Apple's conference call scheduled?" becomes "Represent the query for retrieving supporting documents: When is Apple's conference call scheduled?"
  • When input_type="document", a document like "Apple's conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET." becomes "Represent the document for retrieval: Apple's conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET."

voyage-large-2-instruct, as the name suggests, is trained to be responsive to additional instructions that are prepended to the input text. For classification, clustering, or other MTEB subtasks, please use the instructions here.

What is the total number of tokens for the rerankers?

We define the total number of tokens as (the number of query tokens × the number of documents) + the sum of the number of tokens in all documents. This total cannot exceed 300K. If you are latency-sensitive, we recommend using rerank-2-lite and no more than 200K total tokens per request.
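
A sketch of checking the budget before a rerank call, assuming the voyageai Python client; the count_tokens helper and rerank signature below follow the package's documented usage, so verify them against your installed version:

```python
import voyageai

vo = voyageai.Client()

query = "When is Apple's conference call scheduled?"
documents = [
    "Apple's fourth fiscal quarter call is scheduled for November 2, 2023.",
    "Bananas are rich in potassium.",
]

# total = (query tokens × number of documents) + tokens across all documents
query_tokens = vo.count_tokens([query], model="rerank-2-lite")
doc_tokens = vo.count_tokens(documents, model="rerank-2-lite")
total = query_tokens * len(documents) + doc_tokens
assert total <= 300_000, "request exceeds the 300K-token reranker limit"

reranking = vo.rerank(query, documents, model="rerank-2-lite", top_k=1)
print(reranking.results[0].document)
```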


Usage

How do I get the Voyage API key?

When you create an account, we instantly generate an API key for you. Once signed in, you can access your API key by clicking the "Create new API key" button in the dashboard.

What are the rate limits for the Voyage API?

Please see the rate limit guide.

How can I retrieve nearest text quickly if I have a large corpus?

To efficiently retrieve the nearest texts from a sizable corpus, you can use a vector database; for a corpus that fits in memory, a brute-force search is often sufficient (see the sketch after the list below). Here are some common choices:

  • Pinecone, a fully managed vector database
  • Zilliz, a vector database for enterprise
  • Chroma, an open-source embeddings store
  • Elasticsearch, a popular search/analytics engine and vector database
  • Milvus, a vector database built for scalable similarity search
  • Qdrant, a vector search engine
  • Weaviate, an open-source, AI-native vector database
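
For the in-memory case, a minimal brute-force alternative with numpy, with embeddings assumed to come from vo.embed as in the examples above:

```python
import numpy as np

def top_k_nearest(query_emb, corpus_embs, k=5):
    """Return the indices of the k documents most similar to the query.

    corpus_embs: (N, d) array of unit-length document embeddings.
    query_emb: (d,) unit-length query embedding.
    """
    scores = corpus_embs @ query_emb  # dot product = cosine similarity here
    return np.argsort(-scores)[:k]
```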

Pricing

When will I receive the bill?

The first 50 million tokens are free for every account, and subsequent usage is priced on a per-token basis.

You can add payment methods to your account in the dashboard. We bill monthly; you can expect a credit card charge around the 2nd of each month for the previous month's usage.


Others

Is fine-tuning available?

Currently we offer fine-tuned embeddings through subscription. Please email Tengyu Ma (CEO) at [email protected] if you are interested.

How do I contact Voyage?

Please email us at [email protected] for inquiries and customer support.

How do I get updates from Voyage?

Follow us on Twitter and/or LinkedIn for more updates!

To subscribe to our newsletter, feel free to send us an email at [email protected].