Support sparsevec in weighted vector search function#328
Conversation
Benchmarks
|
863cdc9 to
5e89295
Compare
5e89295 to
b689c2d
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅ 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
PR Summary
This pull request adds support for sparsevec in weighted vector search functions, focusing on enhancing the functionality and flexibility of vector operations in the Lantern project. Here are the key points:
-
Introduced new overloads for weighted_vector_search functions to handle combinations of vector and sparsevec types in
sql/lantern.sqlandsql/updates/0.3.1--0.3.2.sql. -
Added a new
small_world_sparsevec.sqlfile with sample data for testing sparse vector functionality alongside dense vectors. -
Updated
test/sql/weighted_search.sqlto include conditional tests for sparsevec, ensuring compatibility with systems that may not have the type available. -
Modified
docker/Dockerfile.devto upgrade pgvector to version 0.6.1 and simplify the development environment setup process. -
Enhanced input validation, error handling, and vector type casting in the weighted vector search functions to improve robustness and flexibility.
6 file(s) reviewed, 24 comment(s)
Edit PR Review Bot Settings
| RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared | ||
|
|
||
| RUN cd /root/postgresql-15.5/contrib && make install -j | ||
| RUN cd /root/postgresql-15.5/contrib && make install |
There was a problem hiding this comment.
info: Removed '-j' flag from make install. This might slow down the build process.
| && mkdir build \ | ||
| && cd build \ | ||
| && cmake -DCMAKE_BUILD_TYPE=Debug .. \ | ||
| && cmake .. \ |
There was a problem hiding this comment.
info: Removed '-DCMAKE_BUILD_TYPE=Debug'. This changes the build type from Debug to default (usually Release). Ensure this is intentional and won't affect development workflows.
|
|
||
| # Install perf | ||
| RUN sudo apt update && sudo apt install -y linux-tools-common linux-tools-generic linux-tools-`uname -r` | ||
| RUN sudo apt update && sudo apt install -y linux-tools-common linux-tools-generic |
There was a problem hiding this comment.
info: Removed 'linux-tools-uname -r'. This might cause issues if specific kernel version tools are needed.
docker/Dockerfile.dev
Outdated
| @@ -1,5 +1,5 @@ | |||
| ARG VERSION=15 | |||
| ARG PGVECTOR_VERSION=0.5.1 | |||
| ARG PGVECTOR_VERSION=0.6.1 | |||
There was a problem hiding this comment.
info: Upgraded pgvector from 0.5.1 to 0.6.1. Verify compatibility with the rest of the system.
| @@ -667,6 +667,7 @@ CREATE OR REPLACE FUNCTION _lantern_internal.maybe_setup_weighted_vector_search( | |||
| $weighted_vector_search$ | |||
| DECLARE | |||
| pgvector_exists boolean; | |||
There was a problem hiding this comment.
info: Added check for pgvector sparsevec type existence
| ('000', TRUE, '[0,0,0]', '{}/3'), | ||
| ('001', TRUE, '[0,0,1]', '{3:1}/3'), | ||
| ('010', FALSE, '[0,1,0]' , '{2:1}/3'), | ||
| ('011', TRUE, '[0,1,1]', '{2:1,3:1}/3'), | ||
| ('100', FALSE, '[1,0,0]', '{1:1}/3'), | ||
| ('101', FALSE, '[1,0,1]', '{1:1,3:1}/3'), | ||
| ('110', FALSE, '[1,1,0]', '{1:1,2:1}/3'), | ||
| ('111', TRUE, '[1,1,1]', '{1:1,2:1,3:1}/3'); No newline at end of file |
There was a problem hiding this comment.
info: Inserted test data covering all 3-bit combinations, with corresponding sparse vectors
test/sql/weighted_search.sql
Outdated
| -- Check if the type 'sparsevec' exists and store the result in a variable | ||
| SELECT EXISTS ( | ||
| SELECT 1 | ||
| FROM pg_type | ||
| WHERE typname = 'sparsevec' | ||
| ) AS exists_sparsevec \gset |
There was a problem hiding this comment.
info: Added check for 'sparsevec' type existence
test/sql/weighted_search.sql
Outdated
| \if :exists_sparsevec | ||
| \echo 'The sparsevec type exists. Running commands...' | ||
| \ir utils/small_world_sparsevec.sql | ||
| SELECT '{1:0.4,2:0.3,3:0.2}/3' AS s3 \gset | ||
| SELECT '[-0.5,-0.1,-0.3]' AS v3 \gset | ||
| SELECT | ||
| id, | ||
| 0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as dist | ||
| FROM lantern.weighted_vector_search(CAST(NULL as "small_world"), exact => false, ef => 5, | ||
| w1=> 0.9, col1=>'s'::text, vec1=>:'s3'::sparsevec, | ||
| w2=> 0.1, col2=>'v'::text, vec2=>:'v3'::vector | ||
| ); | ||
| \else | ||
| \echo 'The sparsevec type does not exist. Skipping commands...' | ||
| \endif |
There was a problem hiding this comment.
info: Introduced conditional execution for sparsevec tests
test/sql/weighted_search.sql
Outdated
| SELECT | ||
| id, | ||
| 0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as dist | ||
| FROM lantern.weighted_vector_search(CAST(NULL as "small_world"), exact => false, ef => 5, | ||
| w1=> 0.9, col1=>'s'::text, vec1=>:'s3'::sparsevec, | ||
| w2=> 0.1, col2=>'v'::text, vec2=>:'v3'::vector | ||
| ); |
There was a problem hiding this comment.
info: New test case for weighted vector search with sparsevec and vector types
There was a problem hiding this comment.
Should we do something like this?
round(cast(0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as numeric), 2) as distOn arm processors the high precision floats seems to slightly differ from x86 ones which makes tests to fail
test/sql/weighted_search.sql
Outdated
| SELECT '{1:0.4,2:0.3,3:0.2}/3' AS s3 \gset | ||
| SELECT '[-0.5,-0.1,-0.3]' AS v3 \gset |
There was a problem hiding this comment.
style: Consider using more descriptive variable names than 's3' and 'v3'
test/sql/weighted_search.sql
Outdated
| ); | ||
|
|
||
| -- Check if the type 'sparsevec' exists and store the result in a variable | ||
| SELECT EXISTS ( |
There was a problem hiding this comment.
could we follow the current pattern and not do this in the test itself?
Maybe, see how we filter out pgvector or lantern extras tests when relevant extensions are not installed and do something similar here
| RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared | ||
|
|
||
| RUN cd /root/postgresql-15.5/contrib && make install -j | ||
| RUN cd /root/postgresql-15.5/contrib && make install |
There was a problem hiding this comment.
440.5 /usr/bin/install -c -m 755 postgres_fdw.so '/usr/local/pgsql/lib/postgres_fdw.so'
440.5 /usr/bin/install -c -m 644 ./postgres_fdw.control '/usr/local/pgsql/share/extension/'
440.6 /usr/bin/install -c -m 644 ./postgres_fdw--1.0.sql ./postgres_fdw--1.0--1.1.sql '/usr/local/pgsql/share/extension/'
440.6 make[1]: Leaving directory '/root/postgresql-15.5/contrib/postgres_fdw'
------
Dockerfile.dev:34
--------------------
32 | RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared
33 |
34 | >>> RUN cd /root/postgresql-15.5/contrib && make install -j
35 |
36 | # allow non-root users to install in the container to make it easier to run update-tests
--------------------
ERROR: failed to solve: process "/bin/sh -c cd /root/postgresql-15.5/contrib && make install -j" did not complete successfully: exit code: 2
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/ojyc3i59rx9nve3a7qeqpdid7
diqi@Dis-Laptop lantern %
The error I get with -j
| && mkdir build \ | ||
| && cd build \ | ||
| && cmake -DCMAKE_BUILD_TYPE=Debug .. \ | ||
| && cmake .. \ |
There was a problem hiding this comment.
why change this?
The dockerfile is meant for development and I often use GDB from inside it, so having a build with symbols is often good.
Though, I usually attach a folder from host and rebuild the DB for debugging so not a big deal but would still like to know why the change.
There was a problem hiding this comment.
The build did not succeed for me without it
4c52324 to
6d6c72d
Compare
6d6c72d to
17bc1f5
Compare
No description provided.