Initial implementation of pgvector and Oracle 26ai vector search demo
Three FastAPI backends comparing PostgreSQL/pgvector and Oracle 26ai for semantic image search using CLIP embeddings: Python-side embedding for both databases, plus Oracle in-database embedding via VECTOR_EMBEDDING(CLIP_TXT). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,3 @@
.env
__pycache__/
photos/
@@ -0,0 +1,466 @@
|
||||
# Vector Image Search — PostgreSQL/pgvector vs Oracle 26ai
|
||||
|
||||
A comparative demo that vectorizes JPEG photos using the CLIP neural network model
|
||||
and stores the embeddings in two different databases: **PostgreSQL with pgvector**
|
||||
and **Oracle AI Database 26ai**. Users search the photo collection by typing
|
||||
plain-text keywords such as "trees" or "water" and receive results ranked by
|
||||
semantic similarity.
|
||||
|
||||
Three backends are implemented, demonstrating two fundamental approaches to vector
|
||||
embedding:
|
||||
|
||||
| Backend | Port | Embedding location | Model |
|---|---|---|---|
| PostgreSQL + pgvector | 8000 | Python (external) | sentence-transformers CLIP |
| Oracle 26ai (Python embedding) | 8001 | Python (external) | sentence-transformers CLIP |
| Oracle 26ai (in-database embedding) | 8002 | Inside Oracle SQL | CLIP_TXT ONNX model loaded into Oracle |
|
||||
|
||||
The key architectural difference: in the third backend, the text query is embedded
|
||||
**inside a SQL statement** using Oracle's `VECTOR_EMBEDDING()` function — no Python
|
||||
ML library is loaded or called at search time.
|
||||
|
||||
---
|
||||
|
||||
## Architecture overview
|
||||
|
||||
```
|
||||
115 JPEG photos
|
||||
│
|
||||
▼
|
||||
┌───────────────────────────────┐
|
||||
│ CLIP model (clip-ViT-B-32) │
|
||||
│ sentence-transformers lib │
|
||||
│ → 512-dimensional float vec │
|
||||
└──────────────┬────────────────┘
|
||||
│
|
||||
┌──────────────┴──────────────┐
|
||||
│ │
|
||||
▼ ▼
|
||||
┌──────────────────────┐ ┌──────────────────────┐ ┌───────────────────────┐
|
||||
│ PostgreSQL 16 │ │ Oracle 26ai │ │ Oracle 26ai │
|
||||
│ + pgvector 0.6.0 │ │ (version 23.26.1) │ │ (version 23.26.1) │
|
||||
│ database: │ │ PDB: FREEPDB1 │ │ PDB: FREEPDB1 │
|
||||
│ vectors_demo │ │ user: vectors_user │ │ schema: VECTOR │
|
||||
│ HNSW index │ │ HNSW index │ │ HNSW not needed │
|
||||
└────────┬─────────────┘ └──────────┬───────────┘ └──────────┬────────────┘
|
||||
│ │ │
|
||||
▼ ▼ │
|
||||
Python CLIP encode Python CLIP encode Text stays in Oracle SQL
|
||||
(search query) (search query) VECTOR_EMBEDDING(CLIP_TXT
|
||||
USING :q AS data)
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌──────────────┐ ┌──────────────┐ ┌──────────────────┐
|
||||
│ FastAPI │ │ FastAPI │ │ FastAPI │
|
||||
│ main.py │ │ main_oracle │ │ main_oracle_ │
|
||||
│ port 8000 │ │ port 8001 │ │ indb.py │
|
||||
└──────┬───────┘ └──────┬───────┘ │ port 8002 │
|
||||
│ │ └────────┬─────────┘
|
||||
▼ ▼ ▼
|
||||
frontend/index.html frontend/index.html frontend/index_indb.html
|
||||
(badge: pgvector) (badge: Oracle 26ai) (badge: Oracle In-DB)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Project structure
|
||||
|
||||
```
|
||||
pgvector-demo/
|
||||
├── backend/
|
||||
│ ├── .env # PostgreSQL credentials, photo path
|
||||
│ ├── db.py # PostgreSQL connection factory
|
||||
│ ├── embedder.py # CLIP model wrapper
|
||||
│ ├── index_images.py # One-time indexing script
|
||||
│ └── main.py # FastAPI app (port 8000)
|
||||
└── frontend/
|
||||
└── index.html # Search UI
|
||||
|
||||
oravector-demo/
|
||||
├── backend/
|
||||
│ ├── .env # Oracle credentials, photo path
|
||||
│ ├── db_oracle.py # Oracle connection factory (vectors_user)
|
||||
│ ├── embedder.py # CLIP model wrapper (identical to pgvector)
|
||||
│ ├── index_images_oracle.py # One-time indexing script (Python embedding)
|
||||
│ ├── main_oracle.py # FastAPI app — Python embedding (port 8001)
|
||||
│ └── main_oracle_indb.py # FastAPI app — in-database embedding (port 8002)
|
||||
└── frontend/
|
||||
├── index.html # Search UI (Oracle 26ai, Python embedding)
|
||||
└── index_indb.html # Search UI (Oracle 26ai, in-database embedding)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## System components installed
|
||||
|
||||
### Operating system packages
|
||||
|
||||
| Package | Version | Purpose |
|
||||
|---|---|---|
|
||||
| PostgreSQL | 16.13 (Ubuntu) | Relational database |
|
||||
| postgresql-16-pgvector | 0.6.0 | Vector data type and indexes for PostgreSQL |
|
||||
| Python | 3.12.3 | Runtime for all backend code |
|
||||
| Podman | — | Container runtime for Oracle 26ai |
|
||||
|
||||
**PostgreSQL pgvector installation:**
|
||||
```bash
|
||||
sudo apt install postgresql-16-pgvector
|
||||
```
|
||||
|
||||
**pgvector extension activation** (requires superuser, run once per database):
|
||||
```bash
|
||||
sudo -u postgres psql -d vectors_demo -c "CREATE EXTENSION vector;"
|
||||
```
|
||||
|
||||
### Oracle 26ai (Podman container)
|
||||
|
||||
| Property | Value |
|
||||
|---|---|
|
||||
| Product | Oracle AI Database 26ai Free |
|
||||
| Version | 23.26.1.0.0 |
|
||||
| Container name | `oracle.free` |
|
||||
| Host port | 37611 (mapped to 1521 inside container) |
|
||||
| Pluggable Database | FREEPDB1 |
|
||||
| Schema user | `vectors_user` |
|
||||
|
||||
**Oracle vector memory** — the HNSW index is held entirely in the SGA's Vector
|
||||
Memory Area. This must be configured before the database starts:
|
||||
|
||||
```sql
|
||||
-- Connect as SYSDBA to service FREE (CDB root)
|
||||
ALTER SYSTEM SET vector_memory_size = 512M SCOPE=SPFILE;
|
||||
```
|
||||
|
||||
Then restart Oracle inside the container:
|
||||
```bash
|
||||
podman exec oracle.free bash -c "sqlplus -s / as sysdba <<'EOF'
|
||||
SHUTDOWN ABORT;
|
||||
EXIT;
|
||||
EOF"
|
||||
|
||||
podman exec oracle.free bash -c "sqlplus -s / as sysdba <<'EOF'
|
||||
STARTUP;
|
||||
EXIT;
|
||||
EOF"
|
||||
```
|
||||
|
||||
After restart, the SGA confirms: `Vector Memory Area: 536870912 bytes (512 MB)`.
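
As a quick sanity check on those figures, the arithmetic can be reproduced in a few lines. This is only a back-of-the-envelope sketch: it counts the raw FLOAT32 payload of the vectors and ignores the HNSW graph and per-row overhead that Oracle adds on top. The helper name `raw_vector_bytes` is hypothetical.

```python
# Back-of-the-envelope sizing; ignores HNSW graph and per-row overhead.
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw payload of num_vectors FLOAT32 vectors (hypothetical helper)."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


vector_memory_size = 512 * 1024 * 1024  # 512M, as set via ALTER SYSTEM
payload = raw_vector_bytes(115, 512)    # the 115 indexed photos

print(vector_memory_size)  # 536870912, matching the SGA report
print(payload)             # 235520 bytes, i.e. 230 KiB
```

The raw vectors of this demo occupy a tiny fraction of the configured area, so 512 MB leaves ample headroom for a much larger collection.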
|
||||
|
||||
### Python packages
|
||||
|
||||
| Package | Version | Used by | Purpose |
|
||||
|---|---|---|---|
|
||||
| `sentence-transformers` | 5.3.0 | both | CLIP model loading and inference |
|
||||
| `torch` | 2.11.0 | both | Neural network runtime for CLIP |
|
||||
| `Pillow` | 10.2.0 | both | JPEG loading and colour conversion |
|
||||
| `fastapi` | 0.135.2 | both | REST API framework |
|
||||
| `uvicorn` | 0.42.0 | both | ASGI server |
|
||||
| `python-dotenv` | 1.0.1 | both | `.env` file support |
|
||||
| `psycopg2-binary` | 2.9.11 | pgvector only | PostgreSQL driver |
|
||||
| `oracledb` | 3.4.2 | Oracle only | Oracle driver (thin mode, no client libs needed) |
|
||||
|
||||
**Install all packages:**
|
||||
```bash
|
||||
pip3 install fastapi uvicorn psycopg2-binary oracledb sentence-transformers \
|
||||
Pillow python-dotenv --break-system-packages
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Vectorization
|
||||
|
||||
### Model: CLIP (clip-ViT-B-32)
|
||||
|
||||
CLIP (Contrastive Language–Image Pretraining) is a neural network model developed
|
||||
by OpenAI. It was trained on hundreds of millions of image–text pairs and maps both
|
||||
images and text into the **same 512-dimensional vector space**. This enables
|
||||
searching images by plain-text query without any manual labelling or tagging.
|
||||
|
||||
| Property | Value |
|
||||
|---|---|
|
||||
| Architecture | Vision Transformer ViT-B/32 |
|
||||
| Output dimension | 512 floats |
|
||||
| Similarity metric | Cosine similarity |
|
||||
| Weights source | Hugging Face Hub: `sentence-transformers/clip-ViT-B-32` |
|
||||
| Downloaded to | `~/.cache/huggingface/hub/` on first run |
|
||||
|
||||
**Why cosine similarity?** CLIP vectors have varying magnitudes. Cosine similarity
|
||||
normalises for magnitude and measures only the direction — the angle between two
|
||||
vectors — which reliably captures semantic relatedness regardless of vector scale.
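
The magnitude-independence described above can be checked with a minimal pure-Python sketch. The function name and the three-element toy vectors are illustrative assumptions; real CLIP vectors have 512 elements and the databases compute this natively.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (both must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in v]  # same direction, ten times the magnitude
orthogonal = [3.0, 0.0, -1.0]   # dot product with v is 0

print(round(cosine_similarity(v, scaled), 6))      # 1.0
print(round(cosine_similarity(v, orthogonal), 6))  # 0.0
```

Scaling a vector leaves the score unchanged, which is why the varying magnitudes of CLIP embeddings do not distort the ranking.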
|
||||
|
||||
The `embedder.py` module is identical in both projects. It lazily loads the model
|
||||
on first call and exposes two functions:
|
||||
|
||||
| Function | Input | Output |
|
||||
|---|---|---|
|
||||
| `embed_image(path)` | Filesystem path to a JPEG | `list[float]` — 512 values |
|
||||
| `embed_text(text)` | Plain-text query string | `list[float]` — 512 values |
|
||||
|
||||
At search time, the text query is embedded into the same vector space as the photos.
|
||||
The database then finds the photos whose vectors point in the most similar direction.
|
||||
|
||||
---
|
||||
|
||||
## Database schemas
|
||||
|
||||
### PostgreSQL + pgvector
|
||||
|
||||
```sql
|
||||
-- database: vectors_demo (PostgreSQL 16)
|
||||
CREATE EXTENSION vector; -- pgvector 0.6.0
|
||||
|
||||
CREATE TABLE images (
|
||||
id SERIAL PRIMARY KEY,
|
||||
filename TEXT NOT NULL UNIQUE,
|
||||
filepath TEXT NOT NULL,
|
||||
embedding vector(512) -- pgvector type, 512 dimensions
|
||||
);
|
||||
|
||||
CREATE INDEX images_embedding_idx
|
||||
ON images USING hnsw (embedding vector_cosine_ops);
|
||||
```
|
||||
|
||||
### Oracle 26ai
|
||||
|
||||
```sql
|
||||
-- PDB: FREEPDB1, user: vectors_user
|
||||
|
||||
CREATE TABLE images (
|
||||
id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
|
||||
filename VARCHAR2(255) NOT NULL UNIQUE,
|
||||
filepath VARCHAR2(1000) NOT NULL,
|
||||
embedding VECTOR(512, FLOAT32) -- native Oracle type, typed at definition
|
||||
);
|
||||
|
||||
CREATE VECTOR INDEX images_embedding_idx
|
||||
ON images(embedding)
|
||||
ORGANIZATION INMEMORY NEIGHBOR GRAPH -- HNSW (in-memory)
|
||||
WITH DISTANCE COSINE
|
||||
WITH TARGET ACCURACY 95
|
||||
PARAMETERS (type HNSW, neighbors 32, efconstruction 200);
|
||||
```
|
||||
|
||||
**Key schema differences:**
|
||||
|
||||
| Aspect | PostgreSQL/pgvector | Oracle 26ai |
|
||||
|---|---|---|
|
||||
| Extension needed | `CREATE EXTENSION vector` | Built-in, no extension |
|
||||
| Vector column | `vector(512)` — dimension only | `VECTOR(512, FLOAT32)` — dimension + element type |
|
||||
| Primary key | `SERIAL` (auto-increment) | `NUMBER GENERATED ALWAYS AS IDENTITY` |
|
||||
| Text columns | `TEXT` (unlimited) | `VARCHAR2(n)` (length required) |
|
||||
| HNSW syntax | `USING hnsw (col vector_cosine_ops)` | `ORGANIZATION INMEMORY NEIGHBOR GRAPH` |
|
||||
| IVF syntax | `USING ivfflat (col vector_cosine_ops)` | `ORGANIZATION NEIGHBOR PARTITIONS` |
|
||||
| Accuracy target | Implicit (set via index params) | `WITH TARGET ACCURACY 95` (explicit %) |
|
||||
| Memory prereq | None | `vector_memory_size > 0` in SGA |
|
||||
|
||||
---
|
||||
|
||||
## Backend modules
|
||||
|
||||
### Connection factories
|
||||
|
||||
**`db.py` (PostgreSQL):**
|
||||
Reads `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD` from `.env` and
|
||||
returns a `psycopg2` connection.
|
||||
|
||||
**`db_oracle.py` (Oracle):**
|
||||
Reads `ORA_HOST`, `ORA_PORT`, `ORA_SERVICE`, `ORA_USER`, `ORA_PASSWORD` from `.env`
|
||||
and returns an `oracledb` connection. The DSN is assembled as `host:port/service`.
|
||||
Runs in **thin mode** — no Oracle Instant Client installation is required on the host.
|
||||
|
||||
---
|
||||
|
||||
### Indexing scripts
|
||||
|
||||
Both scripts are idempotent: they check for existing rows and skip already-indexed
|
||||
photos. Each photo is committed individually so a crash does not lose prior work.
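
The check-then-insert pattern can be sketched in isolation. To keep the example runnable without a database server, it uses an in-memory SQLite table as a stand-in for the `images` table and leaves out the embedding column; `index_files` is a hypothetical helper, not a function from either script.

```python
import sqlite3


def index_files(conn, filenames):
    """Insert each filename once; return how many rows were newly added."""
    cur = conn.cursor()
    added = 0
    for name in filenames:
        cur.execute("SELECT 1 FROM images WHERE filename = ?", (name,))
        if cur.fetchone():
            continue  # already indexed, skip
        cur.execute("INSERT INTO images (filename) VALUES (?)", (name,))
        conn.commit()  # commit per photo so a crash keeps prior work
        added += 1
    return added


conn = sqlite3.connect(":memory:")  # SQLite stand-in, not pgvector or Oracle
conn.execute("CREATE TABLE images (filename TEXT NOT NULL UNIQUE)")

first = index_files(conn, ["a.jpg", "b.jpg"])
second = index_files(conn, ["a.jpg", "b.jpg", "c.jpg"])
print(first, second)  # 2 1
```

Running the function a second time only adds the new file, which is the behaviour that makes re-running the indexing scripts safe.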
|
||||
|
||||
| | `index_images.py` | `index_images_oracle.py` |
|
||||
|---|---|---|
|
||||
| Run command | `python3 index_images.py` | `python3 index_images_oracle.py` |
|
||||
| Vector bind | Python `list` passed directly | `array.array("f", embedding)` required |
|
||||
| Bind style | `%s` placeholders (psycopg2) | `:1`, `:2`, `:3` positional (oracledb) |
|
||||
| Runtime (115 photos, CPU) | **26 seconds** | **16 seconds** |
|
||||
|
||||
**Why `array.array` for Oracle?**
|
||||
The `python-oracledb` driver does not accept a plain Python list for a `VECTOR`
|
||||
column. The data must be a Python `array.array` with typecode `"f"` (32-bit float),
|
||||
matching the `FLOAT32` declaration in the Oracle column type.
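
A short standard-library sketch of that conversion, using a three-value stand-in for the 512-value CLIP vector (the variable names are illustrative):

```python
import array

embedding = [0.25, -1.5, 3.0]      # stand-in for a 512-value CLIP vector
vec = array.array("f", embedding)  # typecode "f" = C float (32-bit)

print(vec.typecode, vec.itemsize)  # f 4
print(vec.tolist() == embedding)   # True: these values are exact in float32

# Values that are not exactly representable are rounded to 32-bit precision.
rounded = array.array("f", [0.1])[0]
print(rounded == 0.1)              # False
```

The last line shows that the conversion narrows Python's 64-bit floats to 32 bits, which is the precision the `FLOAT32` column stores anyway.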
|
||||
|
||||
---
|
||||
|
||||
### FastAPI applications
|
||||
|
||||
All three apps expose the same endpoints on different ports:
|
||||
|
||||
| Endpoint | Description |
|
||||
|---|---|
|
||||
| `GET /search?q=<text>&limit=<n>` | Embed query, run nearest-neighbour search, return ranked results |
|
||||
| `GET /stats` | Return count of indexed photos |
|
||||
| `GET /photos/<filename>` | Serve original JPEG from the photos directory |
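
A client-side sketch of how a `/search` URL for these endpoints could be assembled with the standard library. The helper `build_search_url` is hypothetical and only builds the string; it does not send the request.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, limit: int = 12) -> str:
    """Return the /search URL with q and limit safely URL-encoded."""
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


url = build_search_url("http://localhost:8001", "night street", 5)
print(url)  # http://localhost:8001/search?q=night+street&limit=5
```

Swapping the base URL for port 8000 or 8002 targets the other two backends, since the endpoint shape is the same.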
|
||||
|
||||
**Search query comparison:**
|
||||
|
||||
PostgreSQL (`main.py`, port 8000):
|
||||
```sql
|
||||
SELECT filename, 1 - (embedding <=> $1::vector) AS score
|
||||
FROM images
|
||||
ORDER BY embedding <=> $1::vector
|
||||
LIMIT $2
|
||||
```
|
||||
|
||||
Oracle 26ai (`main_oracle.py`, port 8001):
|
||||
```sql
|
||||
SELECT filename,
|
||||
1 - VECTOR_DISTANCE(embedding, :vec, COSINE) AS score
|
||||
FROM images
|
||||
ORDER BY VECTOR_DISTANCE(embedding, :vec, COSINE)
|
||||
FETCH FIRST :lim ROWS ONLY
|
||||
```
|
||||
|
||||
**Key query differences:**
|
||||
|
||||
| Aspect | PostgreSQL/pgvector | Oracle 26ai |
|---|---|---|
| Distance operator | `<=>` (cosine distance operator) | `VECTOR_DISTANCE(col, val, COSINE)` |
| Cast required | `$1::vector` (explicit cast) | No cast, column type is enforced |
| Top-N clause | `LIMIT n` | `FETCH FIRST n ROWS ONLY` |
| Bind style | `%s` or `%(name)s` placeholders in psycopg2 (written as `$1`, `$2` in the SQL above) | `:name` named binds (dict) |
| Repeated param | Named `%(name)s` can appear multiple times; positional `%s` needs the value passed once per occurrence | Same `:name` can appear multiple times; positional `:1` cannot be reused |
| Score formula | `1 - (embedding <=> val)` | `1 - VECTOR_DISTANCE(...)` |
|
||||
|
||||
In both cases `1 − distance` converts cosine distance (0 = identical) into a
|
||||
similarity score (1.0 = identical), displayed as a percentage in the frontend.
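
For the PostgreSQL side, the following is a hedged sketch of how the query could be parameterised with psycopg2's named placeholders so the vector is passed only once. It is not taken from `main.py`, which is not shown here; `to_pgvector_literal` and `SEARCH_SQL` are hypothetical names, and the helper relies on pgvector's text input format `[x,y,z]`.

```python
def to_pgvector_literal(vec: list[float]) -> str:
    """Format a list of floats in pgvector's text form, e.g. [0.5,-1.0,2.0]."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


# psycopg2 named placeholders: %(vec)s may appear more than once.
SEARCH_SQL = """
SELECT filename, 1 - (embedding <=> %(vec)s::vector) AS score
FROM images
ORDER BY embedding <=> %(vec)s::vector
LIMIT %(lim)s
"""

params = {"vec": to_pgvector_literal([0.5, -1.0, 2.0]), "lim": 12}
print(params["vec"])  # [0.5,-1.0,2.0]
# With a live connection this would run as: cur.execute(SEARCH_SQL, params)
```

Named placeholders mirror Oracle's `:vec` bind, which keeps the two search functions structurally similar.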
|
||||
|
||||
---
|
||||
|
||||
## Frontend
|
||||
|
||||
Each frontend is a single HTML file with no build step; open it directly in a
browser. The files share the same layout and differ in details such as the badge
label and the API base URL.
|
||||
|
||||
| | pgvector frontend | Oracle 26ai frontend |
|
||||
|---|---|---|
|
||||
| File | `pgvector-demo/frontend/index.html` | `oravector-demo/frontend/index.html` |
|
||||
| Badge label | pgvector | Oracle 26ai |
|
||||
| API base URL | `http://localhost:8000` | `http://localhost:8001` |
|
||||
|
||||
Features: search box, Enter-key support, suggestion chips (trees, water, people,
|
||||
buildings, sky, street, night, cars), result grid with thumbnails and similarity
|
||||
scores in percent.
|
||||
|
||||
---
|
||||
|
||||
## Running the applications
|
||||
|
||||
**Start PostgreSQL backend** (Python embedding):
|
||||
```bash
|
||||
cd pgvector-demo/backend
|
||||
uvicorn main:app --host 0.0.0.0 --port 8000
|
||||
```
|
||||
|
||||
**Start Oracle backend — Python embedding:**
|
||||
```bash
|
||||
cd oravector-demo/backend
|
||||
uvicorn main_oracle:app --host 0.0.0.0 --port 8001
|
||||
```
|
||||
|
||||
**Start Oracle backend — in-database embedding:**
|
||||
```bash
|
||||
cd oravector-demo/backend
|
||||
uvicorn main_oracle_indb:app --host 0.0.0.0 --port 8002
|
||||
```
|
||||
|
||||
Open the matching `frontend/index.html` (ports 8000/8001) or
|
||||
`frontend/index_indb.html` (port 8002) in a browser. All three can run
|
||||
simultaneously.
|
||||
|
||||
**Re-index after adding photos:**
|
||||
```bash
|
||||
# PostgreSQL
|
||||
cd pgvector-demo/backend && python3 index_images.py
|
||||
|
||||
# Oracle (Python embedding)
|
||||
cd oravector-demo/backend && python3 index_images_oracle.py
|
||||
|
||||
# Oracle in-database: re-indexing is done in SQL directly
|
||||
# (the VECTOR schema's FOTO_VEKTOR table is managed by Oracle)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Oracle in-database embedding
|
||||
|
||||
The `VECTOR` schema, its ONNX models, and the `FOTO_VEKTOR` table were manually
|
||||
set up by the administrator — they are **not** part of a standard Oracle 26ai
|
||||
installation. The setup involved:
|
||||
|
||||
1. Creating a `VECTOR` database user
|
||||
2. Exporting CLIP (ViT-B/32) to ONNX format and loading the models via
|
||||
`DBMS_VECTOR.LOAD_ONNX_MODEL`
|
||||
3. Creating and populating the `FOTO_VEKTOR` table with images and their vectors
|
||||
|
||||
The resulting models and table are:
|
||||
|
||||
| Object | Type | Input | Output | Purpose |
|
||||
|---|---|---|---|---|
|
||||
| `VECTOR.CLIP_TXT` | ONNX model | `VARCHAR2` text | `VECTOR(512)` | Embed text queries |
|
||||
| `VECTOR.CLIP_IMG` | ONNX model | `BLOB` image | `VECTOR(512)` | Embed image data |
|
||||
| `VECTOR.FOTO_VEKTOR` | Table | — | — | Stores filenames, image BLOBs, and vectors |
|
||||
|
||||
These are called with the `VECTOR_EMBEDDING()` SQL function. The table
|
||||
`VECTOR.FOTO_VEKTOR` stores images as BLOBs alongside their CLIP_IMG-computed
|
||||
embeddings.
|
||||
|
||||
**The complete in-database search query:**
|
||||
```sql
|
||||
SELECT filename,
|
||||
1 - VECTOR_DISTANCE(
|
||||
foto_vek,
|
||||
VECTOR_EMBEDDING(CLIP_TXT USING :q AS data),
|
||||
COSINE
|
||||
) AS score
|
||||
FROM VECTOR.FOTO_VEKTOR
|
||||
ORDER BY VECTOR_DISTANCE(
|
||||
foto_vek,
|
||||
VECTOR_EMBEDDING(CLIP_TXT USING :q AS data),
|
||||
COSINE
|
||||
)
|
||||
FETCH FIRST 12 ROWS ONLY
|
||||
```
|
||||
|
||||
The Python FastAPI backend (`main_oracle_indb.py`) passes only the raw text string
|
||||
to Oracle via a bind variable `:q`. Oracle tokenizes the text, runs the CLIP_TXT
|
||||
ONNX model internally, produces the 512-dim vector, and performs the similarity
|
||||
search — all within one SQL statement. No Python ML library is involved at
|
||||
query time.
|
||||
|
||||
**Why the CLIP text model needed a modified ONNX export:**
Oracle's `DBMS_VECTOR.LOAD_ONNX_MODEL` requires the model's ONNX graph to use
`input_ids` in a single `Gather` node (embedding lookup only). CLIP's standard
export also feeds `input_ids` into an `ArgMax` node for EOS-token pooling, which
Oracle's validator rejects. The manually loaded CLIP_TXT model in the `VECTOR`
schema uses CLS-token pooling (position 0) instead, which produces a simpler
graph that Oracle accepts. The two pooling variants are not equivalent: the
cosine similarity between their embeddings is ~0.70, so scores from the
in-database backend are not directly comparable with those from the
Python-embedding backends.
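
The difference between the two pooling strategies can be illustrated with a toy sketch. This is not the exported ONNX graph: the hidden states are made-up two-element vectors, the token ids are illustrative, and `eos_pool` / `cls_pool` are hypothetical names. It assumes, as in CLIP's tokenizer, that the EOS token has the highest id in the sequence, which is why an arg-max over `input_ids` locates it.

```python
def eos_pool(hidden_states, input_ids):
    """Pick the hidden state at the first position holding the highest token id."""
    eos_position = max(range(len(input_ids)), key=input_ids.__getitem__)
    return hidden_states[eos_position]


def cls_pool(hidden_states, input_ids):
    """Pick the hidden state at position 0; input_ids is not consulted."""
    return hidden_states[0]


# Toy sequence: BOS, one word token, EOS, padding (ids are illustrative).
input_ids = [49406, 1234, 49407, 49407]
hidden_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

print(eos_pool(hidden_states, input_ids))  # [0.5, 0.6]
print(cls_pool(hidden_states, input_ids))  # [0.1, 0.2]
```

Only `eos_pool` reads `input_ids` a second time, which corresponds to the extra `ArgMax` use that the validator rejects; `cls_pool` needs no such lookup.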
|
||||
|
||||
---
|
||||
|
||||
## Performance comparison
|
||||
|
||||
Measured on this installation (CPU only, no GPU):
|
||||
|
||||
| Metric | PostgreSQL + pgvector | Oracle 26ai (Python embed) | Oracle 26ai (in-DB embed) |
|---|---|---|---|
| Photos indexed | 115 | 115 | 116 (manually indexed) |
| Indexing time | 26 seconds | 16 seconds | Not measured (indexed separately by admin) |
| Index type | HNSW (on disk) | HNSW (in-memory) | None: full table scan (116 rows) |
| Dedicated vector memory | None | 512 MB SGA | 512 MB SGA |
| Python CLIP at query time | Yes | Yes | **No** |
| Embedding location | Python process | Python process | Inside Oracle SQL |
| `VECTOR_EMBEDDING()` used | No | No | **Yes** |
|
||||
|
||||
Note: indexing time for backends 1 and 2 is dominated by CLIP inference (CPU),
|
||||
not database write speed. The in-database backend uses the manually loaded CLIP
|
||||
models in the `VECTOR` schema; their indexing time is not measured here as it
|
||||
was performed separately by the administrator.
|
||||
BIN
Binary file not shown.
@@ -0,0 +1,19 @@
import os
import oracledb
from dotenv import load_dotenv

load_dotenv()

def get_connection():
    return oracledb.connect(
        user=os.getenv("ORA_USER"),
        password=os.getenv("ORA_PASSWORD"),
        dsn=f"{os.getenv('ORA_HOST')}:{os.getenv('ORA_PORT')}/{os.getenv('ORA_SERVICE')}",
    )

def get_connection_indb():
    return oracledb.connect(
        user=os.getenv("ORA_USER_INDB"),
        password=os.getenv("ORA_PASSWORD_INDB"),
        dsn=f"{os.getenv('ORA_HOST')}:{os.getenv('ORA_PORT')}/{os.getenv('ORA_SERVICE')}",
    )
@@ -0,0 +1,17 @@
from sentence_transformers import SentenceTransformer
from PIL import Image

_model = None

def _get_model():
    global _model
    if _model is None:
        _model = SentenceTransformer("clip-ViT-B-32")
    return _model

def embed_image(path: str) -> list[float]:
    img = Image.open(path).convert("RGB")
    return _get_model().encode(img).tolist()

def embed_text(text: str) -> list[float]:
    return _get_model().encode(text).tolist()
@@ -0,0 +1,66 @@
import os
import array
from dotenv import load_dotenv
from db_oracle import get_connection
from embedder import embed_image

load_dotenv()

PHOTOS_DIR = os.getenv("PHOTOS_DIR")

CREATE_TABLE = """
CREATE TABLE images (
    id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    filename VARCHAR2(255) NOT NULL UNIQUE,
    filepath VARCHAR2(1000) NOT NULL,
    embedding VECTOR(512, FLOAT32)
)
"""

CREATE_INDEX = """
CREATE VECTOR INDEX images_embedding_idx
ON images(embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
WITH DISTANCE COSINE
WITH TARGET ACCURACY 95
PARAMETERS (type HNSW, neighbors 32, efconstruction 200)
"""

INSERT = "INSERT INTO images (filename, filepath, embedding) VALUES (:1, :2, :3)"

def table_exists(cur):
    cur.execute("SELECT COUNT(*) FROM user_tables WHERE table_name = 'IMAGES'")
    return cur.fetchone()[0] > 0

def main():
    conn = get_connection()
    cur = conn.cursor()

    if not table_exists(cur):
        cur.execute(CREATE_TABLE)
        cur.execute(CREATE_INDEX)
        conn.commit()
        print("Table and index created.")
    else:
        print("Table already exists, skipping creation.")

    files = [f for f in os.listdir(PHOTOS_DIR) if f.lower().endswith((".jpg", ".jpeg"))]
    print(f"Found {len(files)} photos in {PHOTOS_DIR}")

    for i, filename in enumerate(files, 1):
        filepath = os.path.join(PHOTOS_DIR, filename)
        cur.execute("SELECT 1 FROM images WHERE filename = :1", (filename,))
        if cur.fetchone():
            print(f"[{i}/{len(files)}] Skipping {filename} (already indexed)")
            continue
        embedding = array.array("f", embed_image(filepath))
        cur.execute(INSERT, (filename, filepath, embedding))
        conn.commit()
        print(f"[{i}/{len(files)}] Indexed {filename}")

    cur.close()
    conn.close()
    print("Done.")

if __name__ == "__main__":
    main()
@@ -0,0 +1,49 @@
import os
import array
from fastapi import FastAPI, Query
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse
from dotenv import load_dotenv
from db_oracle import get_connection
from embedder import embed_text

load_dotenv()

PHOTOS_DIR = os.getenv("PHOTOS_DIR")

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

@app.get("/search")
def search(q: str = Query(...), limit: int = Query(12)):
    vec = array.array("f", embed_text(q))
    conn = get_connection()
    cur = conn.cursor()
    cur.execute(
        """
        SELECT filename, 1 - VECTOR_DISTANCE(embedding, :vec, COSINE) AS score
        FROM images
        ORDER BY VECTOR_DISTANCE(embedding, :vec, COSINE)
        FETCH FIRST :lim ROWS ONLY
        """,
        {"vec": vec, "lim": limit},
    )
    rows = cur.fetchall()
    cur.close()
    conn.close()
    return [{"filename": r[0], "score": round(r[1], 4)} for r in rows]

@app.get("/stats")
def stats():
    conn = get_connection()
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM images")
    count = cur.fetchone()[0]
    cur.close()
    conn.close()
    return {"count": count}

@app.get("/photos/{filename}")
def get_photo(filename: str):
    path = os.path.join(PHOTOS_DIR, filename)
    return FileResponse(path, media_type="image/jpeg")
@@ -0,0 +1,55 @@
import os
from fastapi import FastAPI, Query
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse
from dotenv import load_dotenv
from db_oracle import get_connection_indb

load_dotenv()

PHOTOS_DIR = os.getenv("PHOTOS_DIR")

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

@app.get("/search")
def search(q: str = Query(...), limit: int = Query(12)):
    conn = get_connection_indb()
    cur = conn.cursor()
    cur.execute(
        """
        SELECT filename,
               1 - VECTOR_DISTANCE(
                   foto_vek,
                   VECTOR_EMBEDDING(CLIP_TXT USING :q AS data),
                   COSINE
               ) AS score
        FROM VECTOR.FOTO_VEKTOR
        ORDER BY VECTOR_DISTANCE(
            foto_vek,
            VECTOR_EMBEDDING(CLIP_TXT USING :q AS data),
            COSINE
        )
        FETCH FIRST :lim ROWS ONLY
        """,
        {"q": q, "lim": limit},
    )
    rows = cur.fetchall()
    cur.close()
    conn.close()
    return [{"filename": r[0], "score": round(r[1], 4)} for r in rows]

@app.get("/stats")
def stats():
    conn = get_connection_indb()
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM VECTOR.FOTO_VEKTOR")
    count = cur.fetchone()[0]
    cur.close()
    conn.close()
    return {"count": count}

@app.get("/photos/{filename}")
def get_photo(filename: str):
    path = os.path.join(PHOTOS_DIR, filename)
    return FileResponse(path, media_type="image/jpeg")
@@ -0,0 +1,179 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Vector Image Search — Oracle 26ai</title>
|
||||
<style>
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body { font-family: system-ui, sans-serif; background: #f5f5f5; color: #222; }
|
||||
|
||||
header {
|
||||
background: #c74634;
|
||||
color: white;
|
||||
padding: 1.2rem 2rem;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 1rem;
|
||||
}
|
||||
header h1 { font-size: 1.4rem; font-weight: 600; }
|
||||
.badge {
|
||||
background: white;
|
||||
color: #c74634;
|
||||
font-size: 0.75rem;
|
||||
font-weight: 700;
|
||||
padding: 0.2rem 0.6rem;
|
||||
border-radius: 999px;
|
||||
}
|
||||
|
||||
.search-area {
|
||||
max-width: 700px;
|
||||
margin: 2rem auto 1rem;
|
||||
padding: 0 1rem;
|
||||
}
|
||||
.search-row {
|
||||
display: flex;
|
||||
gap: 0.5rem;
|
||||
}
|
||||
input[type="text"] {
|
||||
flex: 1;
|
||||
padding: 0.7rem 1rem;
|
||||
font-size: 1rem;
|
||||
border: 1px solid #ccc;
|
||||
border-radius: 6px;
|
||||
}
|
||||
button.search-btn {
|
||||
padding: 0.7rem 1.4rem;
|
||||
background: #c74634;
|
||||
color: white;
|
||||
border: none;
|
||||
border-radius: 6px;
|
||||
font-size: 1rem;
|
||||
cursor: pointer;
|
||||
}
|
||||
button.search-btn:hover { background: #a83929; }
|
||||
|
||||
.chips {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.4rem;
|
||||
margin-top: 0.8rem;
|
||||
}
|
||||
.chip {
|
||||
padding: 0.3rem 0.8rem;
|
||||
background: white;
|
||||
border: 1px solid #ccc;
|
||||
border-radius: 999px;
|
||||
font-size: 0.85rem;
|
||||
cursor: pointer;
|
||||
}
|
||||
.chip:hover { background: #fcecea; border-color: #c74634; }
|
||||
|
||||
.stats { text-align: center; color: #666; font-size: 0.85rem; margin-bottom: 1rem; }
|
||||
|
||||
.grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(180px, 1fr));
|
||||
gap: 1rem;
|
||||
max-width: 1200px;
|
||||
margin: 0 auto;
|
||||
padding: 0 1rem 2rem;
|
||||
}
|
||||
.card {
|
||||
background: white;
|
||||
border-radius: 8px;
|
||||
overflow: hidden;
|
||||
box-shadow: 0 1px 4px rgba(0,0,0,0.1);
|
||||
}
|
||||
.card img {
|
||||
width: 100%;
|
||||
height: 140px;
|
||||
object-fit: cover;
|
||||
display: block;
|
||||
}
|
||||
.card-info {
|
||||
padding: 0.5rem 0.7rem;
|
||||
font-size: 0.8rem;
|
||||
}
|
||||
.card-info .score {
|
||||
font-weight: 700;
|
||||
color: #c74634;
|
||||
}
|
||||
.card-info .name {
|
||||
color: #555;
|
||||
white-space: nowrap;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
}
|
||||
.empty { text-align: center; color: #999; margin-top: 3rem; font-size: 1rem; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>Vector Image Search</h1>
|
||||
<span class="badge">Oracle 26ai</span>
|
||||
</header>
|
||||
|
||||
<div class="search-area">
|
||||
<div class="search-row">
|
||||
<input id="query" type="text" placeholder="Search photos, e.g. trees, water, night…" />
|
||||
<button class="search-btn" onclick="doSearch()">Search</button>
|
||||
</div>
|
||||
<div class="chips">
|
||||
<span class="chip" onclick="setQuery('trees')">trees</span>
|
||||
<span class="chip" onclick="setQuery('water')">water</span>
|
||||
<span class="chip" onclick="setQuery('people')">people</span>
|
||||
<span class="chip" onclick="setQuery('buildings')">buildings</span>
|
||||
<span class="chip" onclick="setQuery('sky')">sky</span>
|
||||
<span class="chip" onclick="setQuery('street')">street</span>
|
||||
<span class="chip" onclick="setQuery('night')">night</span>
|
||||
<span class="chip" onclick="setQuery('cars')">cars</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p class="stats" id="stats"></p>
|
||||
<div class="grid" id="grid"><p class="empty">Enter a search term above.</p></div>
|
||||
|
||||
<script>
|
||||
const API = "http://localhost:8001";
|
||||
|
||||
fetch(`${API}/stats`)
|
||||
.then(r => r.json())
|
||||
.then(d => document.getElementById("stats").textContent = `${d.count} photos indexed`);
|
||||
|
||||
document.getElementById("query").addEventListener("keydown", e => {
|
||||
if (e.key === "Enter") doSearch();
|
||||
});
|
||||
|
||||
function setQuery(text) {
|
||||
document.getElementById("query").value = text;
|
||||
doSearch();
|
||||
}
|
||||
|
||||
function doSearch() {
|
||||
const q = document.getElementById("query").value.trim();
|
||||
if (!q) return;
|
||||
fetch(`${API}/search?q=${encodeURIComponent(q)}&limit=12`)
|
||||
.then(r => r.json())
|
||||
.then(renderResults);
|
||||
}
|
||||
|
||||
function renderResults(results) {
|
||||
const grid = document.getElementById("grid");
|
||||
if (!results.length) {
|
||||
grid.innerHTML = '<p class="empty">No results found.</p>';
|
||||
return;
|
||||
}
|
||||
grid.innerHTML = results.map(r => `
|
||||
<div class="card">
|
||||
<img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${r.filename}" loading="lazy" />
|
||||
<div class="card-info">
|
||||
<div class="score">${(r.score * 100).toFixed(1)}% match</div>
|
||||
<div class="name">${r.filename}</div>
|
||||
</div>
|
||||
</div>
|
||||
`).join("");
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
@@ -0,0 +1,179 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Vector Image Search — Oracle In-DB</title>
|
||||
<style>
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body { font-family: system-ui, sans-serif; background: #f5f5f5; color: #222; }
|
||||
|
||||
header {
|
||||
background: #7b5ea7;
|
||||
color: white;
|
||||
padding: 1.2rem 2rem;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 1rem;
|
||||
}
|
||||
header h1 { font-size: 1.4rem; font-weight: 600; }
|
||||
.badge {
|
||||
background: white;
|
||||
color: #7b5ea7;
|
||||
font-size: 0.75rem;
|
||||
font-weight: 700;
|
||||
padding: 0.2rem 0.6rem;
|
||||
border-radius: 999px;
|
||||
}
|
||||
|
||||
.search-area {
|
||||
max-width: 700px;
|
||||
margin: 2rem auto 1rem;
|
||||
padding: 0 1rem;
|
||||
}
|
||||
.search-row {
|
||||
display: flex;
|
||||
gap: 0.5rem;
|
||||
}
|
||||
input[type="text"] {
|
||||
flex: 1;
|
||||
padding: 0.7rem 1rem;
|
||||
font-size: 1rem;
|
||||
border: 1px solid #ccc;
|
||||
border-radius: 6px;
|
||||
}
|
||||
button.search-btn {
|
||||
padding: 0.7rem 1.4rem;
|
||||
background: #7b5ea7;
|
||||
color: white;
|
||||
border: none;
|
||||
border-radius: 6px;
|
||||
font-size: 1rem;
|
||||
cursor: pointer;
|
||||
}
|
||||
button.search-btn:hover { background: #664e8d; }
|
||||
|
||||
.chips {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.4rem;
|
||||
margin-top: 0.8rem;
|
||||
}
|
||||
.chip {
|
||||
padding: 0.3rem 0.8rem;
|
||||
background: white;
|
||||
border: 1px solid #ccc;
|
||||
border-radius: 999px;
|
||||
font-size: 0.85rem;
|
||||
cursor: pointer;
|
||||
}
|
||||
.chip:hover { background: #f3f0f8; border-color: #7b5ea7; }
|
||||
|
||||
.stats { text-align: center; color: #666; font-size: 0.85rem; margin-bottom: 1rem; }
|
||||
|
||||
.grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(180px, 1fr));
|
||||
gap: 1rem;
|
||||
max-width: 1200px;
|
||||
margin: 0 auto;
|
||||
padding: 0 1rem 2rem;
|
||||
}
|
||||
.card {
|
||||
background: white;
|
||||
border-radius: 8px;
|
||||
overflow: hidden;
|
||||
box-shadow: 0 1px 4px rgba(0,0,0,0.1);
|
||||
}
|
||||
.card img {
|
||||
width: 100%;
|
||||
height: 140px;
|
||||
object-fit: cover;
|
||||
display: block;
|
||||
}
|
||||
.card-info {
|
||||
padding: 0.5rem 0.7rem;
|
||||
font-size: 0.8rem;
|
||||
}
|
||||
.card-info .score {
|
||||
font-weight: 700;
|
||||
color: #7b5ea7;
|
||||
}
|
||||
.card-info .name {
|
||||
color: #555;
|
||||
white-space: nowrap;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
}
|
||||
.empty { text-align: center; color: #999; margin-top: 3rem; font-size: 1rem; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>Vector Image Search</h1>
|
||||
<span class="badge">Oracle In-DB</span>
|
||||
</header>
|
||||
|
||||
<div class="search-area">
|
||||
<div class="search-row">
|
||||
<input id="query" type="text" placeholder="Search photos, e.g. trees, water, night…" />
|
||||
<button class="search-btn" onclick="doSearch()">Search</button>
|
||||
</div>
|
||||
<div class="chips">
|
||||
<span class="chip" onclick="setQuery('trees')">trees</span>
|
||||
<span class="chip" onclick="setQuery('water')">water</span>
|
||||
<span class="chip" onclick="setQuery('people')">people</span>
|
||||
<span class="chip" onclick="setQuery('buildings')">buildings</span>
|
||||
<span class="chip" onclick="setQuery('sky')">sky</span>
|
||||
<span class="chip" onclick="setQuery('street')">street</span>
|
||||
<span class="chip" onclick="setQuery('night')">night</span>
|
||||
<span class="chip" onclick="setQuery('cars')">cars</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p class="stats" id="stats"></p>
|
||||
<div class="grid" id="grid"><p class="empty">Enter a search term above.</p></div>
|
||||
|
||||
<script>
|
||||
const API = "http://localhost:8002";
|
||||
|
||||
fetch(`${API}/stats`)
|
||||
.then(r => r.json())
|
||||
.then(d => document.getElementById("stats").textContent = `${d.count} photos indexed`);
|
||||
|
||||
document.getElementById("query").addEventListener("keydown", e => {
|
||||
if (e.key === "Enter") doSearch();
|
||||
});
|
||||
|
||||
function setQuery(text) {
|
||||
document.getElementById("query").value = text;
|
||||
doSearch();
|
||||
}
|
||||
|
||||
function doSearch() {
|
||||
const q = document.getElementById("query").value.trim();
|
||||
if (!q) return;
|
||||
fetch(`${API}/search?q=${encodeURIComponent(q)}&limit=12`)
|
||||
.then(r => r.json())
|
||||
.then(renderResults);
|
||||
}
|
||||
|
||||
// Escape text before interpolating it into HTML so a filename cannot inject markup.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, c => (
    { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[c]
  ));
}

function renderResults(results) {
  const grid = document.getElementById("grid");
  if (!results.length) {
    grid.innerHTML = '<p class="empty">No results found.</p>';
    return;
  }
  grid.innerHTML = results.map(r => {
    const name = escapeHtml(r.filename);
    return `
      <div class="card">
        <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${name}" loading="lazy" />
        <div class="card-info">
          <div class="score">${(r.score * 100).toFixed(1)}% match</div>
          <div class="name">${name}</div>
        </div>
      </div>
    `;
  }).join("");
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
@@ -0,0 +1,14 @@
|
||||
import os
import psycopg2
from dotenv import load_dotenv

load_dotenv()

def get_connection():
    return psycopg2.connect(
        host=os.getenv("DB_HOST"),
        port=os.getenv("DB_PORT"),
        dbname=os.getenv("DB_NAME"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
    )
|
||||
@@ -0,0 +1,17 @@
|
||||
from sentence_transformers import SentenceTransformer
from PIL import Image

# The CLIP model is loaded lazily on first use and then cached for the process.
_model = None

def _get_model():
    global _model
    if _model is None:
        _model = SentenceTransformer("clip-ViT-B-32")
    return _model

def embed_image(path: str) -> list[float]:
    # Convert to RGB so every image reaches the model with three channels.
    img = Image.open(path).convert("RGB")
    return _get_model().encode(img).tolist()

def embed_text(text: str) -> list[float]:
    return _get_model().encode(text).tolist()
|
||||
@@ -0,0 +1,56 @@
|
||||
import os
from dotenv import load_dotenv
from db import get_connection
from embedder import embed_image

load_dotenv()

PHOTOS_DIR = os.getenv("PHOTOS_DIR")

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS images (
    id SERIAL PRIMARY KEY,
    filename TEXT NOT NULL UNIQUE,
    filepath TEXT NOT NULL,
    embedding vector(512)
);
"""

CREATE_INDEX = """
CREATE INDEX IF NOT EXISTS images_embedding_idx
    ON images USING hnsw (embedding vector_cosine_ops);
"""

INSERT = """
INSERT INTO images (filename, filepath, embedding)
VALUES (%s, %s, %s)
ON CONFLICT (filename) DO NOTHING;
"""

def main():
    # Fail early: os.listdir(None) would silently scan the current directory.
    if not PHOTOS_DIR:
        raise SystemExit("PHOTOS_DIR is not set; define it in .env")

    conn = get_connection()
    cur = conn.cursor()
    cur.execute(CREATE_TABLE)
    cur.execute(CREATE_INDEX)
    conn.commit()

    files = [f for f in os.listdir(PHOTOS_DIR) if f.lower().endswith((".jpg", ".jpeg"))]
    print(f"Found {len(files)} photos in {PHOTOS_DIR}")

    for i, filename in enumerate(files, 1):
        filepath = os.path.join(PHOTOS_DIR, filename)
        # Check first so already-indexed photos skip the expensive embedding step.
        cur.execute("SELECT 1 FROM images WHERE filename = %s", (filename,))
        if cur.fetchone():
            print(f"[{i}/{len(files)}] Skipping {filename} (already indexed)")
            continue
        embedding = embed_image(filepath)
        cur.execute(INSERT, (filename, filepath, embedding))
        conn.commit()
        print(f"[{i}/{len(files)}] Indexed {filename}")

    cur.close()
    conn.close()
    print("Done.")

if __name__ == "__main__":
    main()
|
||||
@@ -0,0 +1,48 @@
|
||||
import os
from fastapi import FastAPI, HTTPException, Query
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse
from dotenv import load_dotenv
from db import get_connection
from embedder import embed_text

load_dotenv()

PHOTOS_DIR = os.getenv("PHOTOS_DIR")

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

@app.get("/search")
def search(q: str = Query(...), limit: int = Query(12, ge=1)):
    vec = embed_text(q)
    conn = get_connection()
    try:
        cur = conn.cursor()
        # <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
        cur.execute(
            """
            SELECT filename, 1 - (embedding <=> %s::vector) AS score
            FROM images
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (vec, vec, limit),
        )
        rows = cur.fetchall()
        cur.close()
    finally:
        # Close the connection even if the query raises.
        conn.close()
    return [{"filename": r[0], "score": round(r[1], 4)} for r in rows]

@app.get("/stats")
def stats():
    conn = get_connection()
    try:
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM images")
        count = cur.fetchone()[0]
        cur.close()
    finally:
        conn.close()
    return {"count": count}

@app.get("/photos/{filename}")
def get_photo(filename: str):
    # Serve only plain filenames from PHOTOS_DIR; missing files return 404, not a 500.
    path = os.path.join(PHOTOS_DIR, os.path.basename(filename))
    if not os.path.isfile(path):
        raise HTTPException(status_code=404, detail="Photo not found")
    return FileResponse(path, media_type="image/jpeg")
|
||||
@@ -0,0 +1,179 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Vector Image Search — pgvector</title>
|
||||
<style>
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body { font-family: system-ui, sans-serif; background: #f5f5f5; color: #222; }
|
||||
|
||||
header {
|
||||
background: #1a56db;
|
||||
color: white;
|
||||
padding: 1.2rem 2rem;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 1rem;
|
||||
}
|
||||
header h1 { font-size: 1.4rem; font-weight: 600; }
|
||||
.badge {
|
||||
background: white;
|
||||
color: #1a56db;
|
||||
font-size: 0.75rem;
|
||||
font-weight: 700;
|
||||
padding: 0.2rem 0.6rem;
|
||||
border-radius: 999px;
|
||||
}
|
||||
|
||||
.search-area {
|
||||
max-width: 700px;
|
||||
margin: 2rem auto 1rem;
|
||||
padding: 0 1rem;
|
||||
}
|
||||
.search-row {
|
||||
display: flex;
|
||||
gap: 0.5rem;
|
||||
}
|
||||
input[type="text"] {
|
||||
flex: 1;
|
||||
padding: 0.7rem 1rem;
|
||||
font-size: 1rem;
|
||||
border: 1px solid #ccc;
|
||||
border-radius: 6px;
|
||||
}
|
||||
button.search-btn {
|
||||
padding: 0.7rem 1.4rem;
|
||||
background: #1a56db;
|
||||
color: white;
|
||||
border: none;
|
||||
border-radius: 6px;
|
||||
font-size: 1rem;
|
||||
cursor: pointer;
|
||||
}
|
||||
button.search-btn:hover { background: #1648c0; }
|
||||
|
||||
.chips {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.4rem;
|
||||
margin-top: 0.8rem;
|
||||
}
|
||||
.chip {
|
||||
padding: 0.3rem 0.8rem;
|
||||
background: white;
|
||||
border: 1px solid #ccc;
|
||||
border-radius: 999px;
|
||||
font-size: 0.85rem;
|
||||
cursor: pointer;
|
||||
}
|
||||
.chip:hover { background: #e8eef9; border-color: #1a56db; }
|
||||
|
||||
.stats { text-align: center; color: #666; font-size: 0.85rem; margin-bottom: 1rem; }
|
||||
|
||||
.grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(180px, 1fr));
|
||||
gap: 1rem;
|
||||
max-width: 1200px;
|
||||
margin: 0 auto;
|
||||
padding: 0 1rem 2rem;
|
||||
}
|
||||
.card {
|
||||
background: white;
|
||||
border-radius: 8px;
|
||||
overflow: hidden;
|
||||
box-shadow: 0 1px 4px rgba(0,0,0,0.1);
|
||||
}
|
||||
.card img {
|
||||
width: 100%;
|
||||
height: 140px;
|
||||
object-fit: cover;
|
||||
display: block;
|
||||
}
|
||||
.card-info {
|
||||
padding: 0.5rem 0.7rem;
|
||||
font-size: 0.8rem;
|
||||
}
|
||||
.card-info .score {
|
||||
font-weight: 700;
|
||||
color: #1a56db;
|
||||
}
|
||||
.card-info .name {
|
||||
color: #555;
|
||||
white-space: nowrap;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
}
|
||||
.empty { text-align: center; color: #999; margin-top: 3rem; font-size: 1rem; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>Vector Image Search</h1>
|
||||
<span class="badge">pgvector</span>
|
||||
</header>
|
||||
|
||||
<div class="search-area">
|
||||
<div class="search-row">
|
||||
<input id="query" type="text" placeholder="Search photos, e.g. trees, water, night…" />
|
||||
<button class="search-btn" onclick="doSearch()">Search</button>
|
||||
</div>
|
||||
<div class="chips">
|
||||
<span class="chip" onclick="setQuery('trees')">trees</span>
|
||||
<span class="chip" onclick="setQuery('water')">water</span>
|
||||
<span class="chip" onclick="setQuery('people')">people</span>
|
||||
<span class="chip" onclick="setQuery('buildings')">buildings</span>
|
||||
<span class="chip" onclick="setQuery('sky')">sky</span>
|
||||
<span class="chip" onclick="setQuery('street')">street</span>
|
||||
<span class="chip" onclick="setQuery('night')">night</span>
|
||||
<span class="chip" onclick="setQuery('cars')">cars</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p class="stats" id="stats"></p>
|
||||
<div class="grid" id="grid"><p class="empty">Enter a search term above.</p></div>
|
||||
|
||||
<script>
|
||||
const API = "http://localhost:8000";
|
||||
|
||||
fetch(`${API}/stats`)
|
||||
.then(r => r.json())
|
||||
.then(d => document.getElementById("stats").textContent = `${d.count} photos indexed`);
|
||||
|
||||
document.getElementById("query").addEventListener("keydown", e => {
|
||||
if (e.key === "Enter") doSearch();
|
||||
});
|
||||
|
||||
function setQuery(text) {
|
||||
document.getElementById("query").value = text;
|
||||
doSearch();
|
||||
}
|
||||
|
||||
function doSearch() {
|
||||
const q = document.getElementById("query").value.trim();
|
||||
if (!q) return;
|
||||
fetch(`${API}/search?q=${encodeURIComponent(q)}&limit=12`)
|
||||
.then(r => r.json())
|
||||
.then(renderResults);
|
||||
}
|
||||
|
||||
// Escape text before interpolating it into HTML so a filename cannot inject markup.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, c => (
    { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[c]
  ));
}

function renderResults(results) {
  const grid = document.getElementById("grid");
  if (!results.length) {
    grid.innerHTML = '<p class="empty">No results found.</p>';
    return;
  }
  grid.innerHTML = results.map(r => {
    const name = escapeHtml(r.filename);
    return `
      <div class="card">
        <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${name}" loading="lazy" />
        <div class="card-info">
          <div class="score">${(r.score * 100).toFixed(1)}% match</div>
          <div class="name">${name}</div>
        </div>
      </div>
    `;
  }).join("");
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||