Commit 94c8c54

Added ColBERT example [skip ci]
1 parent 7fbb252 commit 94c8c54

3 files changed, 50 additions and 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ Or check out some examples:
 - [Hybrid search](https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search_rrf.py) with SentenceTransformers (Reciprocal Rank Fusion)
 - [Hybrid search](https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search.py) with SentenceTransformers (cross-encoder)
 - [Sparse search](https://github.com/pgvector/pgvector-python/blob/master/examples/sparse_search.py) with Transformers
+- [Late interaction search](https://github.com/pgvector/pgvector-python/blob/master/examples/colbert_exact.py) with ColBERT
 - [Image search](https://github.com/pgvector/pgvector-python/blob/master/examples/pytorch_image_search.py) with PyTorch
 - [Image search](https://github.com/pgvector/pgvector-python/blob/master/examples/hash_image_search.py) with perceptual hashing
 - [Morgan fingerprints](https://github.com/pgvector/pgvector-python/blob/master/examples/morgan_fingerprints.py) with RDKit

examples/colbert_exact.py

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+from colbert.infra import ColBERTConfig
+from colbert.modeling.checkpoint import Checkpoint
+import numpy as np
+from pgvector.psycopg import register_vector
+import psycopg
+
+conn = psycopg.connect(dbname='pgvector_example', autocommit=True)
+
+conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
+register_vector(conn)
+
+conn.execute('DROP TABLE IF EXISTS documents')
+conn.execute('CREATE TABLE documents (id bigserial PRIMARY KEY, content text, embeddings vector(128)[])')
+conn.execute("""
+CREATE OR REPLACE FUNCTION max_sim(document vector[], query vector[]) RETURNS double precision AS $$
+    WITH queries AS (
+        SELECT row_number() OVER () AS query_number, * FROM (SELECT unnest(query) AS query)
+    ),
+    documents AS (
+        SELECT unnest(document) AS document
+    ),
+    similarities AS (
+        SELECT query_number, 1 - (document <=> query) AS similarity FROM queries CROSS JOIN documents
+    ),
+    max_similarities AS (
+        SELECT MAX(similarity) AS max_similarity FROM similarities GROUP BY query_number
+    )
+    SELECT SUM(max_similarity) FROM max_similarities
+$$ LANGUAGE SQL
+""")
+
+checkpoint = Checkpoint('colbert-ir/colbertv2.0', colbert_config=ColBERTConfig())
+
+input = [
+    'The dog is barking',
+    'The cat is purring',
+    'The bear is growling'
+]
+doc_embeddings = checkpoint.docFromText(input)
+for content, embeddings in zip(input, doc_embeddings):
+    embeddings = [e.numpy() for e in embeddings if e.count_nonzero() > 0]
+    conn.execute('INSERT INTO documents (content, embeddings) VALUES (%s, %s)', (content, embeddings))
+
+query = 'puppy'
+query_embeddings = [e.numpy() for e in checkpoint.queryFromText([query], bsize=1)[0]]
+result = conn.execute('SELECT content, max_sim(embeddings, %s) AS max_sim FROM documents ORDER BY max_sim DESC LIMIT 5', (query_embeddings,)).fetchall()
+for row in result:
+    print(row)
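
For readers who want to sanity-check the scoring outside the database, the `max_sim` SQL function in this file can be mirrored in NumPy. This is a minimal sketch, not part of the commit: it assumes `<=>` is pgvector's cosine distance (so `1 - distance` is cosine similarity), that no embedding is the zero vector (the example filters those out before inserting), and it uses made-up 2-dimensional vectors in place of real ColBERT output.

```python
import numpy as np


def max_sim(document: np.ndarray, query: np.ndarray) -> float:
    """Sum, over query tokens, of the best cosine similarity to any document token.

    Both arguments are 2-D arrays with one row per token embedding.
    Rows must be non-zero, otherwise the normalization divides by zero.
    """
    # Normalize rows so that dot products equal cosine similarities
    doc = document / np.linalg.norm(document, axis=1, keepdims=True)
    qry = query / np.linalg.norm(query, axis=1, keepdims=True)
    # similarities[i, j] = cosine similarity of query token i and document token j
    similarities = qry @ doc.T
    # Best-matching document token per query token, then sum over query tokens
    return float(similarities.max(axis=1).sum())


# Toy "token embeddings" (hypothetical values, not ColBERT output)
document = np.array([[1.0, 0.0], [0.0, 2.0]])
query = np.array([[3.0, 0.0], [1.0, 1.0]])
score = max_sim(document, query)
print(score)
```

The matrix product plays the role of the `CROSS JOIN`, `max(axis=1)` corresponds to `MAX(similarity) ... GROUP BY query_number`, and the final `sum()` corresponds to `SUM(max_similarity)`.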

examples/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 cohere
+colbert-ai
 datasets
 gensim
 imagehash
