Discussions

Unable to use voyage-large-2-instruct embeddings in pgvector (for cosine distance)

`table.embedding <=> embeddings(voyage-large-2-instruct)`
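
For context, here's a rough sketch of what I'm trying to do (the `items` table, column names, and DSN are placeholders, and I'm assuming the `embedding` column was created with the dimensionality that voyage-large-2-instruct returns): compute the query embedding client-side with the voyageai package, then pass it to pgvector's `<=>` cosine-distance operator.

```python
# Sketch only: "items"/"embedding" are placeholder names, and the DSN is fake.
import psycopg2
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
query_vec = vo.embed(
    ["when is the conference call?"],
    model="voyage-large-2-instruct",
    input_type="query",
).embeddings[0]

conn = psycopg2.connect("dbname=mydb")
with conn.cursor() as cur:
    # pgvector accepts a bracketed literal like '[0.1, 0.2, ...]' cast to ::vector
    cur.execute(
        "SELECT id, embedding <=> %s::vector AS cosine_distance "
        "FROM items ORDER BY cosine_distance LIMIT 5",
        (str(query_vec),),
    )
    print(cur.fetchall())
```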

Get billing and usage data via API

I would like to get cost info (API calls, accumulated cost, etc.) via the API so that it can be included in my Grafana panels.

When will you support JS/TypeScript in your quicklaunch/API?

Thank you!

Do you have a playground/workbench?

Perhaps it's on Hugging Face? I'm looking for a way to experiment with different rolling windows and retrieval query schemes, compare performance, etc. If you have this, it would greatly differentiate your product in my book, from a usability perspective. Thank you!

Which languages does `voyage-law-2` support? Does it support Russian?

Compressors

I would love to see an offering of a compressor model like microsoft/llmlingua-2 that we could use for both prompts and RAG results.

Asymmetric Embeddings Perform Worse for Code Search

I'm running an internal benchmark and Voyage has been amazing, about 5% better than OpenAI Ada v3. I was just wondering: has the code model also been instruction fine-tuned? I'm finding that if I add the document flag, the overall quality is equal or worse.
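
For reference, this is roughly how I'm toggling the flag (a sketch with toy snippets, using the voyageai Python client and what I believe is the code model, voyage-code-2):

```python
# Sketch: compare symmetric vs. asymmetric (query/document) embedding calls.
import voyageai

vo = voyageai.Client()
snippets = ["def add(a, b): return a + b", "class Stack:\n    ..."]
question = "function that sums two numbers"

# Symmetric: no input_type on either side
sym_docs = vo.embed(snippets, model="voyage-code-2").embeddings
sym_query = vo.embed([question], model="voyage-code-2").embeddings[0]

# Asymmetric: tag the corpus as documents and the search string as a query
asym_docs = vo.embed(snippets, model="voyage-code-2", input_type="document").embeddings
asym_query = vo.embed([question], model="voyage-code-2", input_type="query").embeddings[0]
```

The symmetric variant is the one that scores better on my benchmark, which is why I'm asking.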

VoyageAI Embeddings seem to be very similar for dissimilar documents

I've been experimenting with VoyageAI embeddings for a project where we use cosine similarity as a first step in matching the semantic equivalence of documents. I've noticed that, compared to other embedding models I've tried (OpenAI and Bedrock), the embeddings, and hence the cosine similarities, generated by VoyageAI fall in a much more compressed range. As an example, the docs in the quickstart tutorial (<https://docs.voyageai.com/docs/quickstart-tutorial>) have very similar cosines even though the docs are all quite different. I'm not sure if I'm doing something wrong, but I ran the reranker code for that example too, and the reranked relevance scores match what is shown on that page. The cosines I get for this query and these documents are shown below.

```python
query = "When is Apple's conference call scheduled?"
documents = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.",
    "Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature.",
]
```

> VoyageAI voyage-2

```python
array([0.57205128, 0.5865394 , 0.62985496, 0.56841758, 0.84377816, 0.56752833])
```

> OpenAI text-embedding-3-small

```python
array([-0.00529196, 0.02914636, 0.14654271, -0.02232341, 0.78637504, -0.00315503])
```

Obviously the scores are all relative, but it feels weird. Is this just the nature of the VoyageAI embeddings, or am I possibly doing something wrong?
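
In case it matters, this is a sketch of how I computed the Voyage cosines above (the embeddings appear to be unit-normalized already, but I normalize explicitly to be safe):

```python
# Sketch: embed the query and documents from the snippet above, then
# compute cosine similarities as normalized dot products.
import numpy as np
import voyageai

vo = voyageai.Client()
doc_embs = np.array(
    vo.embed(documents, model="voyage-2", input_type="document").embeddings
)
query_emb = np.array(
    vo.embed([query], model="voyage-2", input_type="query").embeddings[0]
)

doc_embs /= np.linalg.norm(doc_embs, axis=1, keepdims=True)
query_emb /= np.linalg.norm(query_emb)
print(doc_embs @ query_emb)  # one cosine per document
```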

Number of parameters for voyage-2

Hello, I am doing a project for school and am trying to compare model sizes based on parameter counts. Would you be able to tell me how many parameters this model has?

Retrieval performance for various European languages

OpenAI's new embedding models seem to work pretty well across a number of European languages (French, Spanish, Italian, etc.). I am thinking of switching from OpenAI to Voyage for embeddings. Have your models been trained on text data in a number of languages? If so, do you have any performance benchmarks for, say, French vs. English? Thanks!