
Add embeddings storage and search #44

Open
dylanmcreynolds wants to merge 2 commits into main from embeddings

@dylanmcreynolds
Contributor

Many workflows want the ability to produce, store and search on embeddings.

This PR tries to support that workflow by using the pgvector extension for Postgres for storage and search. This is a little experimental: frameworks like FAISS provide much more sophisticated math for creating and searching embeddings. But doing it in pgvector first gives us something to work with and evaluate.

This PR adds a REST endpoint for storing and retrieving embeddings (because GraphQL requires JSON for data). To be honest, for ease, the embeddings are currently in JSON too, but in the future we can come up with a more efficient serialization strategy (probably involving numpy).
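
To make the serialization trade-off concrete, here is a minimal sketch comparing a JSON payload with a packed float32 payload. It is not the code in this PR: the function names are hypothetical, and it uses the standard-library `array` module as a stand-in for numpy so it runs without extra dependencies.

```python
import json
from array import array


def embedding_to_json(vec):
    """Serialize an embedding as JSON text (roughly what the endpoint does today)."""
    return json.dumps(vec)


def embedding_to_bytes(vec):
    """Pack an embedding as raw float32 values in native byte order (hypothetical format)."""
    return array("f", vec).tobytes()


def embedding_from_bytes(buf):
    """Unpack a raw float32 buffer back into a list of Python floats."""
    out = array("f")
    out.frombytes(buf)
    return out.tolist()


# These values are exactly representable in float32, so the round trip is lossless.
vec = [0.125, -0.5, 0.75, 1.0]
packed = embedding_to_bytes(vec)
restored = embedding_from_bytes(packed)
```

With numpy, the same idea would be `np.asarray(vec, dtype=np.float32).tobytes()` and `np.frombuffer(buf, dtype=np.float32)`. Note that float32 packing loses precision for values that are not exactly representable, and a real wire format would also need to pin the byte order.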

This also adds search for entities based on embeddings. This part is added to GraphQL.
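
As a rough illustration of what an embedding search computes, the sketch below ranks stored entities by cosine distance to a query vector in plain Python. pgvector exposes cosine distance through its `<=>` operator, but which metric this PR actually uses is an assumption here, and the function and entity names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|). Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_entities(query, stored, limit=2):
    """Brute-force search: return the ids of the `limit` closest stored embeddings."""
    ranked = sorted(stored.items(), key=lambda item: cosine_distance(query, item[1]))
    return [entity_id for entity_id, _ in ranked[:limit]]


# Hypothetical entity ids mapped to 2-dimensional embeddings.
stored = {
    "entity-a": [1.0, 0.0],
    "entity-b": [0.0, 1.0],
    "entity-c": [0.9, 0.1],
}
closest = nearest_entities([1.0, 0.0], stored)
```

The database does the same ranking server-side, and can use an index to avoid scanning every row.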

There is also an embedding_models table where we can store information about which model each embedding came from, and match that to a URL in a service like MLflow.
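
The relationship between embeddings and their source model could look something like the sketch below. Everything except the `embedding_models` table name is an assumption: the column names, the `embeddings` table, and the URL are hypothetical, and an in-memory SQLite database stands in for Postgres so the example runs on its own.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE embedding_models (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        registry_url TEXT
    );
    CREATE TABLE embeddings (
        entity_id TEXT NOT NULL,
        model_id INTEGER NOT NULL REFERENCES embedding_models(id),
        vector_json TEXT NOT NULL
    );
    """
)
conn.execute(
    "INSERT INTO embedding_models (id, name, registry_url) VALUES (?, ?, ?)",
    (1, "example-encoder", "https://mlflow.example.org/models/example-encoder"),
)
conn.execute(
    "INSERT INTO embeddings (entity_id, model_id, vector_json) VALUES (?, ?, ?)",
    ("entity-a", 1, "[0.1, 0.2]"),
)


def model_url_for(conn, entity_id):
    """Return the registry URL of the model that produced an entity's embedding, or None."""
    row = conn.execute(
        "SELECT m.registry_url FROM embeddings e "
        "JOIN embedding_models m ON m.id = e.model_id "
        "WHERE e.entity_id = ?",
        (entity_id,),
    ).fetchone()
    return row[0] if row else None
```

Keeping the model reference on each embedding row makes it possible to restrict a search to vectors produced by the same model, since distances between vectors from different models are not meaningful.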
