Discussions


Traditional Chinese support?

Hi, does `voyage-multilingual-2` support Traditional Chinese? Are there any performance metrics available for reference?

Where can we find benchmark results for multilingual performance of the language models?

We're trying to create a vector store using Voyage AI embeddings for French text. I saw one blog post vaguely mention that the rerank-1 model handles multilingual text. Where can we find more detailed information on the multilingual performance of different models? Is there a Voyage AI embedding model, rather than a reranker, that performs well on French text?

Unable to use voyage-large-2-instruct embeddings in pgvector (for cosine distance)

`table.embedding <=> embeddings(voyage-large-2-instruct)`
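For context, here is a minimal sketch of how a cosine-distance query like the one above might be issued from Python. pgvector's `<=>` operator computes cosine distance, and it needs an actual vector value on the right-hand side rather than a model name. The sketch assumes the embedding has already been obtained as a list of floats (for example from the Voyage API); the table name `items`, the column names, and both helper functions are hypothetical.

```python
from typing import Sequence


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format a list of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_cosine_query(table: str = "items", column: str = "embedding", limit: int = 5) -> str:
    # Hypothetical table/column names; never interpolate untrusted input here.
    # The %s placeholder is filled in by the database driver at execute time.
    return (
        f"SELECT id, {column} <=> %s::vector AS cosine_distance "
        f"FROM {table} ORDER BY cosine_distance LIMIT {int(limit)}"
    )
```

With a driver such as psycopg, this could be run as `cur.execute(build_cosine_query(), (to_pgvector_literal(query_embedding),))`. The `vector` column's declared dimension has to match the dimension of the embeddings the model returns.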

Get billing and use data via API

I would like to get cost information (API calls, accumulated cost, etc.) via the API so that I can include it in my Grafana panels.

When will you support JavaScript/TypeScript in your quicklaunch/API?

Thank you!

Do you have a playground/workbench?

Perhaps it's on Hugging Face? I'm looking for a way to experiment with different rolling windows and retrieval query schemes, compare performance, etc. If you have this, it would greatly differentiate your product in my book, from a usability perspective. Thank you!

Which languages does `voyage-law-2` support? Does it support Russian?


Compressors

I would love to see a compressor model offered, like microsoft/llmlingua-2, that we could use for both prompts and RAG results.

Asymmetric Embeddings Perform Worse for Code Search

I'm running an internal benchmark and Voyage has been amazing, about 5% better than OpenAI Ada v3. I was just wondering: has the code model also been instruction fine-tuned? I'm finding that if I add the document flag, the overall quality is equal or worse.
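For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of an A/B check that runs the same retrieval benchmark with and without input types. It assumes "the document flag" refers to the `input_type` argument (`"query"` / `"document"`) of the embedding call. The `embed(texts, input_type)` callable is a hypothetical wrapper around whichever client is in use, and `relevant[i]` is assumed to be the index of the correct document for query `i`.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
# Hypothetical wrapper signature: embed(texts, input_type) -> list of vectors.
EmbedFn = Callable[[List[str], Optional[str]], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def recall_at_1(embed: EmbedFn, queries: List[str], docs: List[str],
                relevant: List[int], asymmetric: bool) -> float:
    """Fraction of queries whose top-scoring document is the relevant one."""
    # With asymmetric=False, both sides are embedded without an input type.
    q_vecs = embed(queries, "query" if asymmetric else None)
    d_vecs = embed(docs, "document" if asymmetric else None)
    hits = 0
    for q_vec, target in zip(q_vecs, relevant):
        best = max(range(len(d_vecs)), key=lambda j: cosine(q_vec, d_vecs[j]))
        hits += int(best == target)
    return hits / len(queries)
```

Calling `recall_at_1(...)` once with `asymmetric=True` and once with `asymmetric=False` on the same data gives two numbers that can be compared directly.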

VoyageAI embeddings seem to be very similar for dissimilar documents

I've been experimenting with VoyageAI embeddings for a project where we use cosine similarity as a first step in matching semantic equivalence of documents. I've noticed that, compared to other embedding models I've tried such as OpenAI and Bedrock, the cosine similarities produced by VoyageAI embeddings fall in a much more compressed range.

As an example, the docs in the Quick start tutorial <https://docs.voyageai.com/docs/quickstart-tutorial> have very similar cosines even though the docs are all quite different. Not sure if I'm doing something wrong, but I ran the reranker code for that example too, and the reranked relevance scores match what is shown on that page. The cosines I get for this query and these documents are shown below.

```python
query = "When is Apple's conference call scheduled?"
documents = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.",
    "Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature."
]
```

> VoyageAI voyage-2

```python
array([0.57205128, 0.5865394 , 0.62985496, 0.56841758, 0.84377816, 0.56752833])
```

> OpenAI text-embedding-3-small

```python
array([-0.00529196, 0.02914636, 0.14654271, -0.02232341, 0.78637504, -0.00315503])
```

Obviously they are all relative, but it feels weird. Is this just the nature of the VoyageAI embeddings, or am I possibly doing something wrong?
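To make the "they are all relative" point concrete, here is a small sketch that compares the two score arrays quoted above by rank instead of by raw value, and then min-max rescales each one to [0, 1]. It uses only the numbers from this post. The helper names are hypothetical, and min-max rescaling is just one illustrative way to put the two ranges side by side, not a recommended practice from either vendor.

```python
# Cosine similarities copied from the post above (one per document, in order).
voyage_2 = [0.57205128, 0.5865394, 0.62985496, 0.56841758, 0.84377816, 0.56752833]
openai_3_small = [-0.00529196, 0.02914636, 0.14654271, -0.02232341, 0.78637504, -0.00315503]


def ranking(scores):
    """Document indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def min_max(scores):
    """Linearly rescale scores so the lowest is 0.0 and the highest is 1.0."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


print(ranking(voyage_2))        # [4, 2, 1, 0, 3, 5]
print(ranking(openai_3_small))  # [4, 2, 1, 5, 0, 3]
print([round(s, 3) for s in min_max(voyage_2)])
print([round(s, 3) for s in min_max(openai_3_small)])
```

Both models rank the Apple document first and agree on the top three; they differ only in how they order the three lowest-scoring documents.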