Update README with all recent changes

- Project structure: add index_images_indb.py
- Architecture: fix schema names (VECTORS_USER/VECTOR), HNSW for all three
- Database schemas: separate sections for VECTORS_USER and VECTOR, photo storage differences
- Indexing scripts: three-way comparison table, measured avg times (12.1s/12.1s/13.6s)
- ORA-24816 workaround documented
- Performance comparison: real benchmark numbers, HNSW for in-DB, photo storage row
- Oracle in-DB section: HNSW index creation, index_images_indb.py for population
- Re-index section: add index_images_indb.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add in-DB indexing script, benchmark results, schema names in presentation
2026-05-20 11:17:27 +02:00 · 2026-05-20 10:42:13 +02:00 · 2026-05-19 15:19:37 +02:00 · 2026-05-19 15:15:52 +02:00 · 2026-05-19 14:56:28 +02:00 · 2026-05-19 14:39:40 +02:00
16 changed files with 1185 additions and 223 deletions
@@ -1,3 +1,7 @@
 .env
 __pycache__/
 photos/
 .~lock.*
 present.sh
 benchmark.sh
 diagrams/
@@ -40,8 +40,8 @@ ML library is loaded or called at search time.
  │  PostgreSQL 18       │    │  Oracle 26ai         │    │  Oracle 26ai          │
  │  + pgvector 0.8.2    │    │  (version 23.26.1)   │    │  (version 23.26.1)    │
  │  database:           │    │  PDB: FREEPDB1       │    │  PDB: FREEPDB1        │
-  │  vectors_demo        │    │  user: vectors_user  │    │  schema: VECTOR       │
+  │  vectors_demo        │    │  schema: VECTORS_USER│    │  schema: VECTOR       │
-  │  HNSW index          │    │  HNSW index          │    │  HNSW not needed      │
+  │  HNSW index          │    │  HNSW index          │    │  HNSW index           │
  └────────┬─────────────┘    └──────────┬───────────┘    └──────────┬────────────┘
           │                             │                            │
           ▼                             ▼                            │
@@ -88,7 +88,8 @@ vector-search-demo/
    │   ├── .env                     # Oracle credentials, photo path
    │   ├── db_oracle.py             # Oracle connection factory
    │   ├── embedder.py              # CLIP model wrapper (identical to pgvector)
-    │   ├── index_images_oracle.py   # One-time indexing script (Python embedding)
+    │   ├── index_images_oracle.py   # One-time indexing script (Python embedding, VECTORS_USER)
    │   ├── index_images_indb.py     # One-time indexing script (in-DB embedding, VECTOR schema)
    │   ├── main_oracle.py           # FastAPI app — Python embedding (port 8001)
    │   └── main_oracle_indb.py      # FastAPI app — in-database embedding (port 8002)
    └── frontend/
@@ -130,7 +131,7 @@ The `pgvector/pgvector:pg18` image includes pgvector pre-installed. See the
 | Container name | `oracle.free` |
 | Host port | 37611 (mapped to 1521 inside container) |
 | Pluggable Database | FREEPDB1 |
-| Schema users | `vectors_user`, `VECTOR` |
+| Schema users | `VECTORS_USER`, `VECTOR` |
 **Oracle vector memory** — the HNSW index is held entirely in the SGA's Vector
 Memory Area. This is already configured:
@@ -215,10 +216,11 @@ CREATE INDEX images_embedding_idx
    ON images USING hnsw (embedding vector_cosine_ops);
 ```
-### Oracle 26ai
+### Oracle 26ai — schema VECTORS_USER (Python embedding backend)
 ```sql
-- PDB: FREEPDB1, user: vectors_user
+-- PDB: FREEPDB1, schema: VECTORS_USER
 -- Photos stored as file paths on the app server filesystem
 CREATE TABLE images (
    id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
@@ -235,6 +237,36 @@ CREATE VECTOR INDEX images_embedding_idx
    PARAMETERS (type HNSW, neighbors 32, efconstruction 200);
 ```
 ### Oracle 26ai — schema VECTOR (in-database embedding backend)
 ```sql
 -- PDB: FREEPDB1, schema: VECTOR
 -- Photos stored as BLOBs inside Oracle — no filesystem access at query time
 CREATE TABLE foto_vektor (
    id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    filename  VARCHAR2(100),
    foto      BLOB,                  -- full JPEG stored in Oracle
    foto_vek  VECTOR                 -- embedding computed by CLIP_IMG ONNX model
 );
 CREATE VECTOR INDEX foto_vektor_idx
    ON foto_vektor(foto_vek)
    ORGANIZATION INMEMORY NEIGHBOR GRAPH
    WITH DISTANCE COSINE
    WITH TARGET ACCURACY 95
    PARAMETERS (type HNSW, neighbors 32, efconstruction 200);
 ```
 **Key difference between the two Oracle schemas:**
 | Aspect | VECTORS_USER | VECTOR |
 |---|---|---|
 | Photo storage | File path (filesystem) | BLOB (inside Oracle) |
 | Embedding at index time | Python CLIP | Oracle `VECTOR_EMBEDDING(CLIP_IMG)` |
 | Embedding at query time | Python CLIP | Oracle `VECTOR_EMBEDDING(CLIP_TXT)` |
 | Indexed by | `index_images_oracle.py` | `index_images_indb.py` |
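
A search against the `VECTOR` schema might be assembled as in the sketch below. It only builds the SQL text and bind dictionary, so it runs without a database. The helper name `build_indb_search`, the bind name `:q`, and the `FETCH APPROX FIRST` clause are assumptions for illustration; the query actually used by `main_oracle_indb.py` is not shown in this diff.

```python
def build_indb_search(query_text: str, k: int = 10):
    """Return (sql, binds) for a text-to-image search embedded inside Oracle.

    Hypothetical helper: table and column names follow the foto_vektor
    schema above, everything else is an assumption.
    """
    distance = (
        "VECTOR_DISTANCE(foto_vek, "
        "VECTOR_EMBEDDING(CLIP_TXT USING :q AS data), COSINE)"
    )
    sql = (
        f"SELECT filename, 1 - {distance} AS score "
        "FROM foto_vektor "
        f"ORDER BY {distance} "
        f"FETCH APPROX FIRST {int(k)} ROWS ONLY"
    )
    # Only the raw query text travels to Oracle; no Python CLIP is involved.
    return sql, {"q": query_text}


sql, binds = build_indb_search("trees", k=5)
print(sql)
print(binds)
```

With the `VECTORS_USER` schema, the same query would instead bind a precomputed vector in place of the `VECTOR_EMBEDDING(...)` expression.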
 **Key schema differences:**
 | Aspect | PostgreSQL/pgvector | Oracle 26ai |
@@ -268,21 +300,29 @@ Runs in **thin mode** — no Oracle Instant Client installation is required on t
 ### Indexing scripts
-Both scripts are idempotent: they check for existing rows and skip already-indexed
+All three scripts are idempotent: they check for existing rows and skip already-indexed
 photos. Each photo is committed individually so a crash does not lose prior work.
-| | `index_images.py` | `index_images_oracle.py` |
+| | `index_images.py` | `index_images_oracle.py` | `index_images_indb.py` |
-|---|---|---|
+|---|---|---|---|
-| Run command | `python3 index_images.py` | `python3 index_images_oracle.py` |
+| Schema | PostgreSQL `vectors_demo` | Oracle `VECTORS_USER` | Oracle `VECTOR` |
-| Vector bind | Python `list` passed directly | `array.array("f", embedding)` required |
+| Run command | `python3 index_images.py` | `python3 index_images_oracle.py` | `python3 index_images_indb.py` |
-| Bind style | `%s` placeholders (psycopg2) | `:1`, `:2`, `:3` positional (oracledb) |
+| Photo data sent | File path | File path | Full JPEG as BLOB |
-| Runtime (116 photos, CPU) | ~26 seconds | ~16 seconds |
+| Embedding | Python CLIP | Python CLIP | Oracle `VECTOR_EMBEDDING(CLIP_IMG)` |
 | Vector bind | Python `list` | `array.array("f", ...)` | Computed inside Oracle |
 | Avg runtime (3 runs, CPU) | **12.1 s** | **12.1 s** | **13.6 s** |
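
The skip-existing, commit-per-photo behaviour can be sketched as follows. This is a minimal illustration that uses the standard-library `sqlite3` module as a stand-in for the real databases; the function `index_photos`, the two-column `images` table, and the `fake_embed` callback are hypothetical and not taken from the actual scripts.

```python
import sqlite3
from typing import Callable, Iterable


def index_photos(conn: sqlite3.Connection,
                 filenames: Iterable[str],
                 embed: Callable[[str], str]) -> int:
    """Insert only photos that are not indexed yet; return how many were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images (filename TEXT PRIMARY KEY, embedding TEXT)"
    )
    existing = {row[0] for row in conn.execute("SELECT filename FROM images")}
    added = 0
    for name in filenames:
        if name in existing:
            continue  # already indexed: skipping makes a re-run harmless
        conn.execute(
            "INSERT INTO images (filename, embedding) VALUES (?, ?)",
            (name, embed(name)),
        )
        conn.commit()  # commit per photo so a crash keeps earlier rows
        existing.add(name)
        added += 1
    return added


def fake_embed(name: str) -> str:
    """Placeholder for CLIP inference (assumption: returns a serialised vector)."""
    return "[0.0, 0.0]"


conn = sqlite3.connect(":memory:")
first_run = index_photos(conn, ["a.jpg", "b.jpg"], fake_embed)
second_run = index_photos(conn, ["a.jpg", "b.jpg", "c.jpg"], fake_embed)
print(first_run, second_run)
```

The second call only adds the one new file, which is the property that makes re-running an indexing script safe.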
-**Why `array.array` for Oracle?**
+**Why `array.array` for `index_images_oracle.py`?**
 The `python-oracledb` driver does not accept a plain Python list for a `VECTOR`
 column. The data must be a Python `array.array` with typecode `"f"` (32-bit float),
 matching the `FLOAT32` declaration in the Oracle column type.
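
A minimal sketch of that conversion, runnable without a database. The helper name `to_oracle_vector` is hypothetical, and the commented-out `INSERT` uses assumed column names because the full `images` definition is not visible in this diff.

```python
import array


def to_oracle_vector(embedding):
    """Convert a sequence of floats into a 32-bit float array for a VECTOR bind."""
    return array.array("f", embedding)


vec = to_oracle_vector([0.12, -0.87, 0.44])
print(vec.typecode, len(vec))

# Hypothetical usage with python-oracledb (needs a live connection;
# the column names are assumptions):
# cursor.execute(
#     "INSERT INTO images (filename, embedding) VALUES (:1, :2)",
#     ["photo.jpg", vec],
# )
```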
 **Why two SQL statements in `index_images_indb.py`?**
 Oracle raises `ORA-24816` if a BLOB bind variable appears before another bind in the
 same `VALUES` clause. The script works around this by inserting the BLOB first, then
 updating the vector in a second statement — letting Oracle read the stored BLOB to
 compute the embedding internally.
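
The two-statement pattern might look like the sketch below. The SQL strings follow the `foto_vektor` schema and the `VECTOR_EMBEDDING(CLIP_IMG USING ... AS data)` form shown elsewhere in this README. The function `index_one_photo`, the lookup by `filename` (which assumes filenames are unique), and the `RecordingCursor` stand-in are illustrative assumptions, not the actual contents of `index_images_indb.py`.

```python
INSERT_BLOB_SQL = "INSERT INTO foto_vektor (filename, foto) VALUES (:1, :2)"
UPDATE_VECTOR_SQL = (
    "UPDATE foto_vektor "
    "SET foto_vek = VECTOR_EMBEDDING(CLIP_IMG USING foto AS data) "
    "WHERE filename = :1"
)


def index_one_photo(cursor, filename, jpeg_bytes):
    """Store the BLOB first, then let Oracle embed the stored BLOB."""
    # Statement 1: the BLOB is the last bind, nothing follows it.
    cursor.execute(INSERT_BLOB_SQL, [filename, jpeg_bytes])
    # Statement 2: no BLOB bind at all; Oracle reads the stored column.
    cursor.execute(UPDATE_VECTOR_SQL, [filename])


class RecordingCursor:
    """Stand-in for a database cursor that only records the calls."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params):
        self.calls.append((sql, list(params)))


cur = RecordingCursor()
index_one_photo(cur, "photo.jpg", b"\xff\xd8 fake jpeg bytes")
for sql, params in cur.calls:
    print(sql.split()[0], len(params))
```

With a real `oracledb` cursor, a `commit()` after the second statement would match the commit-per-photo behaviour described above.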
 ---
 ### FastAPI applications
@@ -343,7 +383,8 @@ Three single-file HTML frontends, each served by its own backend at `/ui/`:
 Features: search box, Enter-key support, suggestion chips (trees, water, people,
 buildings, sky, street, night, cars), result grid with thumbnails and similarity
-scores in percent.
+scores in percent. Click any photo to view it full size in a lightbox overlay;
 close with a click anywhere or `Escape`.
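
How a percent score relates to cosine distance can be sketched in a few lines. This is an illustration only: the backends compute the distance inside the database, and the function names `cosine_similarity` and `score_percent` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_percent(a, b):
    """Similarity score in percent: (1 - cosine distance) * 100."""
    distance = 1.0 - cosine_similarity(a, b)
    return round((1.0 - distance) * 100.0, 1)


same = score_percent([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = score_percent([1.0, 0.0], [0.0, 1.0])  # unrelated direction
print(same, orthogonal)
```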
 ---
@@ -469,16 +510,22 @@ podman cp oravector-demo/sql/setup_vector_schema.sql oracle.free:/tmp/
 podman exec oracle.free bash -c "sqlplus -s / as sysdba @/tmp/setup_vector_schema.sql"
 ```
-**Populate `FOTO_VEKTOR`** with images and their vectors (run as VECTOR user in SQL):
+**Add HNSW index** (after the table is created):
-```sql
+```bash
-- Example: insert one photo with its CLIP_IMG embedding
+podman exec oracle.free bash -c "sqlplus -s 'vector/Vektor@localhost:1521/FREEPDB1' <<'EOF'
-INSERT INTO vector.foto_vektor (filename, foto, foto_vek)
+CREATE VECTOR INDEX foto_vektor_idx
-VALUES (
+    ON VECTOR.FOTO_VEKTOR(foto_vek)
-    'photo.jpg',
+    ORGANIZATION INMEMORY NEIGHBOR GRAPH
-    TO_BLOB(BFILENAME('VEC_DUMP', 'photo.jpg')),
+    WITH DISTANCE COSINE WITH TARGET ACCURACY 95
-    VECTOR_EMBEDDING(CLIP_IMG USING TO_BLOB(BFILENAME('VEC_DUMP', 'photo.jpg')) AS data)
+    PARAMETERS (type HNSW, neighbors 32, efconstruction 200);
-);
+EXIT;
-COMMIT;
+EOF"
 ```
 **Populate `FOTO_VEKTOR`** using the indexing script (reads JPEGs from `PHOTOS_DIR`,
 sends them as BLOBs to Oracle, which computes embeddings via `VECTOR_EMBEDDING(CLIP_IMG)`):
 ```bash
 cd oravector-demo/backend && python3 index_images_indb.py
 ```
 ---
@@ -518,11 +565,11 @@ cd oravector-demo/backend && uvicorn main_oracle_indb:app --host 0.0.0.0 --port
 # PostgreSQL
 cd pgvector-demo/backend && python3 index_images.py
-# Oracle (Python embedding)
+# Oracle VECTORS_USER (Python embedding)
 cd oravector-demo/backend && python3 index_images_oracle.py
-# Oracle in-database: re-indexing is done in SQL directly
+# Oracle VECTOR (in-database embedding)
-# (the VECTOR schema's FOTO_VEKTOR table is managed by Oracle)
+cd oravector-demo/backend && python3 index_images_indb.py
 ```
 ---
@@ -536,14 +583,15 @@ installation. The setup involved:
 1. Creating a `VECTOR` database user
 2. Exporting CLIP (ViT-B/32) to ONNX format and loading the models via
   `DBMS_VECTOR.LOAD_ONNX_MODEL`
-3. Creating and populating the `FOTO_VEKTOR` table with images and their vectors
+3. Creating the `FOTO_VEKTOR` table and HNSW index
 4. Populating `FOTO_VEKTOR` using `index_images_indb.py`
 The resulting models and table are:
 | Object | Type | Input | Output | Purpose |
 |---|---|---|---|---|
-| `VECTOR.CLIP_TXT` | ONNX model | `VARCHAR2` text | `VECTOR(512)` | Embed text queries |
+| `VECTOR.CLIP_TXT` | ONNX model | `VARCHAR2` text | `VECTOR(512)` | Embed text queries at search time |
-| `VECTOR.CLIP_IMG` | ONNX model | `BLOB` image | `VECTOR(512)` | Embed image data |
+| `VECTOR.CLIP_IMG` | ONNX model | `BLOB` image | `VECTOR(512)` | Embed images at index time |
 | `VECTOR.FOTO_VEKTOR` | Table | — | — | Stores filenames, image BLOBs, and vectors |
 These are called with the `VECTOR_EMBEDDING()` SQL function. The table
@@ -590,15 +638,41 @@ Measured on this installation (CPU only, no GPU):
 | Metric | PostgreSQL + pgvector | Oracle 26ai (Python embed) | Oracle 26ai (in-DB embed) |
 |---|---|---|---|
-| Photos indexed | 116 | 116 | 116 (manually indexed) |
+| Photos indexed | 116 | 116 | 116 |
-| Indexing time | ~26 seconds | ~16 seconds | 0 (indexed separately by admin) |
+| Avg indexing time (3 runs, CPU) | **12.1 s** | **12.1 s** | **13.6 s** |
-| Index type | HNSW (on disk) | HNSW (in-memory) | Full table scan (116 rows) |
+| Index type | HNSW (on disk) | HNSW (in-memory) | HNSW (in-memory) |
 | Memory required | None | 512 MB SGA | 512 MB SGA |
 | Photo storage | File path (filesystem) | File path (filesystem) | BLOB (in Oracle) |
 | Python CLIP at query time | Yes | Yes | **No** |
-| Embedding location | Python process | Python process | Inside Oracle SQL |
+| Embedding at index time | Python CLIP | Python CLIP | Oracle `VECTOR_EMBEDDING(CLIP_IMG)` |
 | Embedding at query time | Python CLIP | Python CLIP | Oracle `VECTOR_EMBEDDING(CLIP_TXT)` |
 | `VECTOR_EMBEDDING()` used | No | No | **Yes** |
 | Oracle schema | — | `VECTORS_USER` | `VECTOR` |
-Note: indexing time for backends 1 and 2 is dominated by CLIP inference (CPU),
+Note: indexing time is dominated by CLIP inference for backends 1 and 2 (CPU, no GPU).
-not database write speed. The in-database backend uses the manually loaded CLIP
+Backend 3 is slightly slower, most likely because each photo is transferred as a full JPEG BLOB
-models in the `VECTOR` schema; their indexing time is not measured here as it
+to Oracle over the network before Oracle computes the embedding internally.
-was performed separately by the administrator.
+
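
An average over several runs, like the "3 runs" figures in the table, can be taken with a small timing harness such as the one below. How the numbers above were actually measured is not shown in this diff (the `benchmark.sh` helper is gitignored), so `average_runtime` and the dummy workload are assumptions for illustration.

```python
import statistics
import time


def average_runtime(task, runs=3):
    """Run task() several times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def dummy_indexing_run():
    """Placeholder workload; a real run would call one of the indexing scripts."""
    sum(i * i for i in range(10_000))


avg = average_runtime(dummy_indexing_run, runs=3)
print(f"avg runtime: {avg:.4f} s")
```

Because the indexing scripts skip already-indexed photos, a meaningful measurement would presumably need the target table emptied between runs.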
 ---
 ## Presentation
 The presentation `Vektoren in der Datenbank.pptx` is generated by `make_presentation.py`:
 ```bash
 python3 make_presentation.py
 ```
 **Start the slideshow directly** (skips the LibreOffice UI):
 ```bash
 libreoffice --impress --show "Vektoren in der Datenbank.pptx"
 ```
 Or use the local helper script (gitignored):
 ```bash
 ./present.sh
 ```
 Press `Esc` to exit the presentation.
@@ -0,0 +1,833 @@
 """
 Generates "Vektoren in der Datenbank.pptx" — a LibreOffice-compatible presentation.
 Run from the project root:  python3 make_presentation.py
 """
 from pptx import Presentation
 from pptx.util import Inches, Pt, Emu
 from pptx.dml.color import RGBColor
 from pptx.enum.text import PP_ALIGN
 from pptx.oxml.ns import qn
 from pptx.oxml import parse_xml
 from lxml import etree
 import os
 import numpy as np
 import matplotlib
 matplotlib.use("Agg")
 import matplotlib.pyplot as plt
 import matplotlib.patches as mpatches
 _A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
 def OxmlElement(tag):
    """Build a bare DrawingML element from a prefixed tag such as "a:buNone"."""
    local = tag.split(":")[1]
    return etree.fromstring(f'<a:{local} xmlns:a="{_A_NS}"/>')
 # ── Diagram generation (matplotlib → PNG → embedded in slide) ────────────────
 DIAG_BG   = "#1e1e2e"
 DIAG_GRID = "#313244"
 DIAG_AXIS = "#6c7086"
 def _fig(w, h):
    fig, ax = plt.subplots(figsize=(w, h))
    fig.patch.set_facecolor(DIAG_BG)
    ax.set_facecolor(DIAG_BG)
    return fig, ax
 def _save(fig, name):
    path = os.path.join("diagrams", name)
    fig.savefig(path, dpi=150, bbox_inches="tight", facecolor=DIAG_BG)
    plt.close(fig)
    return path
 def diagram_s3_vectors():
    """Slide 3: 2-D vector space with Hund / Katze / Auto."""
    fig, ax = _fig(5, 5)
    ax.set_xlim(-1.3, 1.3)
    ax.set_ylim(-1.3, 1.3)
    ax.set_aspect("equal")
    ax.grid(True, color=DIAG_GRID, linewidth=0.5, alpha=0.6)
    ax.axhline(0, color=DIAG_AXIS, linewidth=1)
    ax.axvline(0, color=DIAG_AXIS, linewidth=1)
    ax.set_xticks([]); ax.set_yticks([])
    for sp in ax.spines.values(): sp.set_visible(False)
    ax.text(1.27,  0.05, "x₁", color=DIAG_AXIS, fontsize=12)
    ax.text( 0.05, 1.27, "x₂", color=DIAG_AXIS, fontsize=12)
    vecs = [
        ((0.91,  0.12), "#89b4fa", "Hund"),
        ((0.87,  0.18), "#74c7ec", "Katze"),
        ((-0.30,  0.90), "#f38ba8", "Auto"),
    ]
    for (vx, vy), color, label in vecs:
        ax.annotate("", xy=(vx, vy), xytext=(0, 0),
                    arrowprops=dict(arrowstyle="->", color=color, lw=2.5))
        ox, oy = 0.10, 0.07
        ax.text(vx + ox * np.sign(vx or 1),
                vy + oy * np.sign(vy or 1),
                label, color=color, fontsize=13, fontweight="bold")
    # Small arc: Hund ↔ Katze
    a1 = np.degrees(np.arctan2(0.12, 0.91))
    a2 = np.degrees(np.arctan2(0.18, 0.87))
    ax.add_patch(mpatches.Arc((0, 0), 0.32, 0.32, angle=0,
                               theta1=min(a1, a2), theta2=max(a1, a2),
                               color="#a6e3a1", lw=2))
    ax.text(0.22, -0.10, "klein", color="#a6e3a1", fontsize=10, ha="center")
    # Large arc: Hund ↔ Auto
    a3 = np.degrees(np.arctan2(0.90, -0.30))
    ax.add_patch(mpatches.Arc((0, 0), 0.52, 0.52, angle=0,
                               theta1=a1, theta2=a3,
                               color="#fab387", lw=2))
    ax.text(-0.10, 0.34, "groß", color="#fab387", fontsize=10)
    plt.tight_layout(pad=0.3)
    return _save(fig, "s3_vectors.png")
 def diagram_s4_flow():
    """Slide 4: Semantic search pipeline as a flow diagram."""
    fig, ax = _fig(12, 1.9)        # flat figure — matches slide aspect ratio
    ax.set_xlim(0, 12); ax.set_ylim(0, 1.9)
    ax.axis("off")
    steps = [
        (1.2,  'Text-Anfrage\n"Bäume"',  "#89b4fa"),
        (3.6,  "CLIP-Modell",            "#cba6f7"),
        (6.0,  "Vektor  512 floats",     "#74c7ec"),
        (8.4,  "Datenbank k-NN",         "#f38ba8"),
        (10.8, "Ergebnisse\nnach Score", "#a6e3a1"),
    ]
    for x, label, color in steps:
        box = mpatches.FancyBboxPatch((x - 1.05, 0.22), 2.1, 1.4,
                                       boxstyle="round,pad=0.1",
                                       facecolor="#313244", edgecolor=color, linewidth=2)
        ax.add_patch(box)
        ax.text(x, 0.92, label, ha="center", va="center",
                color=color, fontsize=13, fontweight="bold", multialignment="center",
                fontfamily="sans-serif")
    for i in range(len(steps) - 1):
        x1 = steps[i][0]   + 1.05
        x2 = steps[i+1][0] - 1.05
        ax.annotate("", xy=(x2, 0.92), xytext=(x1, 0.92),
                    arrowprops=dict(arrowstyle="->", color=DIAG_AXIS, lw=2.5))
    plt.tight_layout(pad=0.15)
    return _save(fig, "s4_flow.png")
 def diagram_s6_cosine():
    """Slide 6: Two vectors with the cosine angle between them."""
    fig, ax = _fig(5, 4.5)
    ax.set_xlim(-0.2, 1.35); ax.set_ylim(-0.15, 1.35)
    ax.set_aspect("equal")
    ax.axis("off")
    vA = np.array([1.1,  0.25])   # image vector
    vB = np.array([0.55, 1.0 ])   # text vector
    for v, color, label, lpos in [
        (vA, "#89b4fa", "Bild-Vektor",         (1.12,  0.18)),
        (vB, "#cba6f7", 'Text-Vektor\n"Bäume"', (0.56,  1.07)),
    ]:
        ax.annotate("", xy=v, xytext=(0, 0),
                    arrowprops=dict(arrowstyle="->", color=color, lw=3))
        ax.text(*lpos, label, color=color, fontsize=12,
                fontweight="bold", ha="center", multialignment="center")
    # Angle arc
    a1 = np.degrees(np.arctan2(vA[1], vA[0]))
    a2 = np.degrees(np.arctan2(vB[1], vB[0]))
    ax.add_patch(mpatches.Arc((0, 0), 0.45, 0.45, angle=0,
                               theta1=a1, theta2=a2,
                               color="#a6e3a1", lw=2.5))
    mid_angle = np.radians((a1 + a2) / 2)
    ax.text(0.28 * np.cos(mid_angle), 0.28 * np.sin(mid_angle),
            "θ", color="#a6e3a1", fontsize=16, fontweight="bold",
            ha="center", va="center")
    # Origin dot
    ax.plot(0, 0, "o", color=DIAG_AXIS, markersize=6)
    # Formula
    ax.text(0.58, -0.12,
            "Ähnlichkeit = cos(θ)",
            color="#cdd6f4", fontsize=11, ha="center",
            fontfamily="monospace")
    plt.tight_layout(pad=0.3)
    return _save(fig, "s6_cosine.png")
 def diagram_architecture():
    """Architecture slide: 3 columns showing app server, database, and where CLIP runs."""
    CLIP_CLR = "#a6e3a1"
    # (x, db_name, color, port, clip_app, clip_db, db_tech, vec_embed_fn)
    COLS = [
        (2.3,  "PostgreSQL 18",              "#89b4fa", "Port 8000", True,  False, "pgvector 0.8.2\nHNSW (Disk)",  None),
        (6.65, "Oracle 26ai\nVECTORS_USER",  "#f38ba8", "Port 8001", True,  False, "HNSW (SGA)",                   None),
        (11.0, "Oracle 26ai\nVECTOR",        "#cba6f7", "Port 8002", False, True,  "HNSW (SGA)",                   "VECTOR_EMBEDDING()"),
    ]
    fig, ax = _fig(13.5, 6.5)
    ax.set_xlim(0, 13.5); ax.set_ylim(-0.8, 6.0)
    ax.axis("off")
    for x, db_name, color, port, clip_app, clip_db, db_tech, vec_fn in COLS:
        # ── Column title + port
        ax.text(x, 5.78, port, ha="center", color=color, fontsize=13, fontweight="bold")
        # ── App server box
        ax.add_patch(mpatches.FancyBboxPatch(
            (x-1.7, 3.7), 3.4, 1.85,
            boxstyle="round,pad=0.1", facecolor="#28293d", edgecolor=color, lw=2))
        ax.text(x, 5.38, "App-Server  (FastAPI)", ha="center",
                color=color, fontsize=11, fontweight="bold")
        if clip_app:
            ax.add_patch(mpatches.FancyBboxPatch(
                (x-1.2, 3.78), 2.4, 0.82,
                boxstyle="round,pad=0.08", facecolor="#1e1e2e", edgecolor=CLIP_CLR, lw=2))
            ax.text(x, 4.19, "CLIP-Modell\n(sentence-transformers)",
                    ha="center", va="center", color=CLIP_CLR, fontsize=9.5, fontweight="bold",
                    multialignment="center")
        else:
            ax.add_patch(mpatches.FancyBboxPatch(
                (x-1.2, 3.78), 2.4, 0.82,
                boxstyle="round,pad=0.08", facecolor="#1e1e2e", edgecolor=DIAG_AXIS, lw=1,
                linestyle="dashed"))
            ax.text(x, 4.19, "kein CLIP",
                    ha="center", va="center", color=DIAG_AXIS, fontsize=10, style="italic")
        # ── Arrow + what is sent
        ax.annotate("", xy=(x, 3.05), xytext=(x, 3.65),
                    arrowprops=dict(arrowstyle="->", color=DIAG_AXIS, lw=2))
        arrow_lbl = "Vektor (512 floats)" if clip_app else "Text-String"
        ax.text(x, 3.35, arrow_lbl, ha="center", va="center",
                color=DIAG_AXIS, fontsize=9, style="italic")
        # ── Database box
        db_h = 2.8 if clip_db else 1.9
        ax.add_patch(mpatches.FancyBboxPatch(
            (x-1.7, 0.15), 3.4, db_h,
            boxstyle="round,pad=0.1", facecolor="#28293d", edgecolor=color, lw=2))
        if clip_db:
            # CLIP ONNX box inside DB
            ax.add_patch(mpatches.FancyBboxPatch(
                (x-1.2, 0.25), 2.4, 0.82,
                boxstyle="round,pad=0.08", facecolor="#1e1e2e", edgecolor=CLIP_CLR, lw=2))
            ax.text(x, 0.66, "CLIP-Modell\n(ONNX, in Oracle)",
                    ha="center", va="center", color=CLIP_CLR, fontsize=9.5, fontweight="bold",
                    multialignment="center")
            # VECTOR_EMBEDDING() label
            ax.text(x, 1.22, vec_fn,
                    ha="center", color="#fab387", fontsize=10, fontweight="bold",
                    fontfamily="monospace")
            # DB name
            ax.text(x, 1.65, db_name, ha="center", color=color,
                    fontsize=11, fontweight="bold")
            ax.text(x, 2.35, db_tech, ha="center", color=DIAG_AXIS,
                    fontsize=9, multialignment="center")
        else:
            ax.text(x, 1.5, db_name, ha="center", color=color,
                    fontsize=11, fontweight="bold")
            ax.text(x, 0.72, db_tech, ha="center", color=DIAG_AXIS,
                    fontsize=9, multialignment="center")
    # ── Vertical separators
    for xsep in [4.5, 8.85]:
        ax.plot([xsep, xsep], [0.05, 5.9], color=DIAG_GRID, lw=1, linestyle="--")
    # ── Caption — separated from boxes, applies to all three columns
    ax.plot([0.3, 13.2], [-0.18, -0.18], color=DIAG_GRID, lw=1)
    ax.text(6.75, -0.5, "116 Street Fotos  ·  CLIP ViT-B/32  ·  512-dimensionale Vektoren",
            ha="center", va="center", color="#cdd6f4", fontsize=13, style="italic")
    plt.tight_layout(pad=0.2)
    return _save(fig, "architecture.png")
 # Generate diagrams up front
 os.makedirs("diagrams", exist_ok=True)
 DIAG_S3   = diagram_s3_vectors()
 DIAG_S4   = diagram_s4_flow()
 DIAG_S6   = diagram_s6_cosine()
 DIAG_ARCH = diagram_architecture()
 import copy
 # ── Colour palette (dark theme) ──────────────────────────────────────────────
 BG          = RGBColor(0x1e, 0x1e, 0x2e)   # slide background
 TITLE_CLR   = RGBColor(0xcb, 0xd3, 0xff)   # slide titles
 BODY_CLR    = RGBColor(0xcd, 0xd6, 0xf4)   # body text
 DIM_CLR     = RGBColor(0x6c, 0x70, 0x86)   # dimmed / captions
 ACCENT_PG   = RGBColor(0x89, 0xb4, 0xfa)   # pgvector blue
 ACCENT_ORA  = RGBColor(0xf3, 0x8b, 0xa8)   # Oracle red/pink
 ACCENT_IDB  = RGBColor(0xcb, 0xa6, 0xf7)   # in-DB purple
 ACCENT_GRN  = RGBColor(0xa6, 0xe3, 0xa1)   # green for highlights
 CODE_BG     = RGBColor(0x31, 0x32, 0x44)   # code block background
 CODE_CLR    = RGBColor(0xa6, 0xe3, 0xa1)   # code text
 W = Inches(13.33)   # widescreen 16:9
 H = Inches(7.5)
 FONT = "Roboto"
 prs = Presentation()
 prs.slide_width  = W
 prs.slide_height = H
 blank_layout = prs.slide_layouts[6]   # completely blank
 LOGO_PATH   = "/home/dierk/Bilder/Logo/Logo DLC Final.png"
 CONFERENCE  = "Quest Data Minds Konferenz"
 EVENT_DATE  = "28. Mai 2026"
 EVENT_CITY  = "Köln"
 _slide_num = [0]   # mutable counter so nested calls can increment it
 def add_slide(logo=True, footer=True):
    slide = prs.slides.add_slide(blank_layout)
    bg = slide.background
    fill = bg.fill
    fill.solid()
    fill.fore_color.rgb = BG
    if logo:
        slide.shapes.add_picture(LOGO_PATH,
            Inches(11.6), Inches(7.0), Inches(1.6), Inches(0.42))
    if footer:
        _slide_num[0] += 1
        # thin separator line
        sep = slide.shapes.add_shape(1, Inches(0.3), Inches(6.95), Inches(11.1), Pt(1))
        sep.fill.solid()
        sep.fill.fore_color.rgb = DIM_CLR
        sep.line.fill.background()
        # left: conference info
        txb(slide, f"{CONFERENCE}  ·  {EVENT_CITY}, {EVENT_DATE}",
            Inches(0.3), Inches(7.02), Inches(9.5), Inches(0.35),
            size=11, color=DIM_CLR)
        # right: page number (before logo)
        txb(slide, str(_slide_num[0]),
            Inches(10.9), Inches(7.02), Inches(0.6), Inches(0.35),
            size=11, color=DIM_CLR, align=PP_ALIGN.RIGHT)
    return slide
 def txb(slide, text, x, y, w, h,
        size=24, bold=False, color=BODY_CLR,
        align=PP_ALIGN.LEFT, italic=False):
    box = slide.shapes.add_textbox(x, y, w, h)
    tf  = box.text_frame
    tf.word_wrap = True
    p = tf.paragraphs[0]
    p.alignment = align
    run = p.add_run()
    run.text = text
    run.font.size   = Pt(size)
    run.font.bold   = bold
    run.font.italic = italic
    run.font.color.rgb = color
    run.font.name   = FONT
    return box
 def title_slide_layout(slide, title, subtitle=None):
    txb(slide, title,
        Inches(1), Inches(2.8), Inches(11.33), Inches(1.2),
        size=48, bold=True, color=TITLE_CLR, align=PP_ALIGN.CENTER)
    if subtitle:
        txb(slide, subtitle,
            Inches(1), Inches(4.1), Inches(11.33), Inches(0.8),
            size=24, color=DIM_CLR, align=PP_ALIGN.CENTER)
 def section_header(slide, title, accent=ACCENT_PG):
    """Full-width coloured bar at top, then title."""
    bar = slide.shapes.add_shape(
        1,  # MSO_SHAPE.RECTANGLE (auto-shape type id 1)
        Inches(0), Inches(0), W, Inches(0.12)
    )
    bar.fill.solid()
    bar.fill.fore_color.rgb = accent
    bar.line.fill.background()
    txb(slide, title,
        Inches(0.5), Inches(0.2), Inches(12.33), Inches(0.8),
        size=32, bold=True, color=TITLE_CLR)
 def bullet_box(slide, items, x, y, w, h, size=20, color=BODY_CLR, indent=False):
    box = slide.shapes.add_textbox(x, y, w, h)
    tf  = box.text_frame
    tf.word_wrap = True
    first = True
    for item in items:
        if first:
            p = tf.paragraphs[0]
            first = False
        else:
            p = tf.add_paragraph()
        p.space_before = Pt(4)
        run = p.add_run()
        run.text = ("    " if indent else "") + item
        run.font.size  = Pt(size)
        run.font.color.rgb = color
        run.font.name  = FONT
 def code_box(slide, code, x, y, w, h, size=13):
    # Background rectangle (no text)
    bg = slide.shapes.add_shape(1, x, y, w, h)
    bg.fill.solid()
    bg.fill.fore_color.rgb = CODE_BG
    bg.line.color.rgb = RGBColor(0x58, 0x5b, 0x70)
    bg.text_frame.text = ""
    # Text box on top — textboxes have predictable left-aligned defaults
    pad = Pt(7)
    tb = slide.shapes.add_textbox(x + pad, y + pad, w - pad * 2, h - pad * 2)
    tf = tb.text_frame
    tf.word_wrap = False
    tf.margin_left   = Pt(0)
    tf.margin_right  = Pt(0)
    tf.margin_top    = Pt(0)
    tf.margin_bottom = Pt(0)
    first = True
    for line in code.strip().split("\n"):
        if first:
            p = tf.paragraphs[0]
            first = False
        else:
            p = tf.add_paragraph()
        p.alignment    = PP_ALIGN.LEFT
        p.space_before = Pt(0)
        p.space_after  = Pt(0)
        # Explicitly zero out left margin, hanging indent, and remove any bullet
        pPr = p._p.get_or_add_pPr()
        pPr.set("marL", "0")
        pPr.set("indent", "0")
        for tag in ("a:buClr","a:buClrTx","a:buFont","a:buFontTx","a:buChar","a:buAutoNum","a:buNone"):
            for el in pPr.findall(qn(tag)):
                pPr.remove(el)
        pPr.append(OxmlElement("a:buNone"))
        run = p.add_run()
        run.text = line
        run.font.size  = Pt(size)
        run.font.color.rgb = CODE_CLR
        run.font.name  = "Courier New"
 def divider(slide, y, color=DIM_CLR):
    line = slide.shapes.add_shape(1, Inches(0.5), y, Inches(12.33), Pt(1))
    line.fill.solid()
    line.fill.fore_color.rgb = color
    line.line.fill.background()
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 1 — Titelfolie
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide(logo=False, footer=False)  # title slide: custom layout
 title_slide_layout(s,
    "Vektoren in der Datenbank",
    "Semantische Bildsuche mit PostgreSQL/pgvector und Oracle 26ai")
 # Conference details
 txb(s, CONFERENCE,
    Inches(1), Inches(5.0), Inches(11.33), Inches(0.5),
    size=20, bold=True, color=ACCENT_PG, align=PP_ALIGN.CENTER)
 txb(s, f"{EVENT_DATE}  ·  {EVENT_CITY}",
    Inches(1), Inches(5.5), Inches(11.33), Inches(0.45),
    size=18, color=DIM_CLR, align=PP_ALIGN.CENTER)
 # Larger centred logo
 s.shapes.add_picture(LOGO_PATH, Inches(4.67), Inches(6.1), Inches(4.0), Inches(1.06))
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 2 — Agenda
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Agenda", ACCENT_PG)
 bullet_box(s, [
    "01  Was ist ein Vektor?",
    "02  Semantische Suche — jenseits von Schlüsselwörtern",
    "03  Das CLIP-Modell",
    "04  Ähnlichkeit messen: Cosinus-Distanz",
    "05  PostgreSQL + pgvector",
    "06  Oracle 26ai — nativer Vektor-Support",
    "07  Oracle 26ai — Embedding in der Datenbank",
    "08  Architektur der Demo",
    "09  Demo",
    "10  Vergleich & Fazit",
 ], Inches(1.5), Inches(1.3), Inches(10), Inches(5.5), size=20)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 3 — Was ist ein Vektor?
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Was ist ein Vektor?", ACCENT_PG)
 bullet_box(s, [
    "▸  Ein Vektor ist eine geordnete Liste von Zahlen: [0.12, -0.87, 0.44, …]",
    "▸  Jede Zahl beschreibt eine Dimension im semantischen Raum",
    "▸  Moderne KI-Modelle erzeugen Vektoren mit 512 bis 1536 Dimensionen",
    "▸  Ähnliche Inhalte → ähnliche Vektoren → kleiner Abstand im Raum",
    "▸  Texte, Bilder, Audio — alles lässt sich in denselben Vektorraum einbetten",
 ], Inches(0.8), Inches(1.3), Inches(7.2), Inches(4), size=20)
 # 2-D vector diagram on the right
 s.shapes.add_picture(DIAG_S3, Inches(7.8), Inches(1.1), Inches(5.3), Inches(5.3))
 txb(s, "Vektoren machen Ähnlichkeit berechenbar.",
    Inches(0.8), Inches(5.8), Inches(6.8), Inches(0.7),
    size=22, bold=True, color=ACCENT_GRN)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 4 — Semantische Suche
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Semantische Suche — jenseits von Schlüsselwörtern", ACCENT_PG)
 bullet_box(s, [
    "Klassische Suche:    \"trees\" findet nur Dokumente mit dem Wort \"trees\"",
    "",
    "Semantische Suche:  \"trees\" findet Bilder von Wäldern, Parks, Natur —",
    "                             ohne dass das Wort irgendwo steht",
 ], Inches(0.8), Inches(1.3), Inches(11.5), Inches(2.2), size=20)
 divider(s, Inches(3.7))
 bullet_box(s, [
    "▸  Text-Anfrage wird in denselben Vektorraum eingebettet wie die Bilder",
    "▸  Datenbankabfrage: finde die k nächsten Nachbarn (k-NN)",
    "▸  Ergebnis: Bilder nach semantischer Ähnlichkeit gerankt",
    "▸  Kein manuelles Tagging, keine Metadaten nötig",
 ], Inches(0.8), Inches(3.9), Inches(11.5), Inches(1.1), size=20)
 # Flow diagram
 s.shapes.add_picture(DIAG_S4, Inches(0.5), Inches(5.1), Inches(12.3), Inches(1.75))
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 5 — CLIP-Modell
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Das CLIP-Modell (OpenAI)", ACCENT_IDB)
 bullet_box(s, [
    "CLIP = Contrastive Language–Image Pretraining",
    "▸  Trainiert auf hunderten Millionen Bild-Text-Paaren",
    "▸  Bildet sowohl Bilder als auch Text in denselben 512-dimensionalen Raum ab",
    "▸  Modell: clip-ViT-B-32  (Vision Transformer, Patch-Größe 32×32)",
    "▸  Quell-Gewichte: Hugging Face Hub (sentence-transformers/clip-ViT-B-32)",
 ], Inches(0.8), Inches(1.3), Inches(7.5), Inches(3.2), size=20)
 code_box(s,
    'from sentence_transformers import (\n    SentenceTransformer)\n\nmodel = SentenceTransformer(\n    "clip-ViT-B-32")\n\n# Bild einbetten\nvec = model.encode(image)\n# → 512 floats\n\n# Text einbetten\nvec = model.encode("Bäume")\n# → 512 floats, gleicher Raum!',
    Inches(8.8), Inches(1.3), Inches(4.3), Inches(3.8), size=11)
 txb(s, "Bild-Vektor und Text-Vektor zeigen in dieselbe Richtung,\nwenn Bild und Text inhaltlich übereinstimmen.",
    Inches(0.8), Inches(5.0), Inches(11.5), Inches(1.0),
    size=18, italic=True, color=ACCENT_IDB)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 6 — Cosinus-Distanz
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Ähnlichkeit messen: Cosinus-Distanz", ACCENT_PG)
 bullet_box(s, [
    "▸  CLIP-Vektoren haben unterschiedliche Beträge — daher kein euklidischer Abstand",
    "▸  Cosinus-Distanz misst nur den Winkel zwischen zwei Vektoren",
    "▸  Cosinus-Distanz = 0   →  identisch",
    "▸  Cosinus-Distanz = 1   →  völlig unähnlich",
    "▸  Ähnlichkeitswert = 1 − Distanz  →  1.0 = perfekte Übereinstimmung",
 ], Inches(0.8), Inches(1.3), Inches(7.5), Inches(3.5), size=20)
 # Cosine diagram on the right
 s.shapes.add_picture(DIAG_S6, Inches(8.0), Inches(1.1), Inches(5.1), Inches(3.7))
 code_box(s,
    "-- PostgreSQL\n1 - (embedding <=> query_vec)\n\n-- Oracle 26ai\n1 - VECTOR_DISTANCE(embedding, query_vec, COSINE)",
    Inches(0.8), Inches(5.0), Inches(6.0), Inches(1.85), size=13)
 txb(s, "In der Demo:\nScore 28 % = schwach\nScore 75 % = stark",
    Inches(7.0), Inches(5.0), Inches(5.0), Inches(1.85),
    size=18, color=ACCENT_GRN)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 7 — PostgreSQL + pgvector: Voraussetzungen
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "PostgreSQL + pgvector", ACCENT_PG)
 txb(s, "Was wird benötigt?", Inches(0.8), Inches(1.3), Inches(11), Inches(0.5),
    size=22, bold=True, color=ACCENT_PG)
 bullet_box(s, [
    "▸  PostgreSQL (ab Version 13)",
    "▸  pgvector-Extension  —  docker image: pgvector/pgvector:pg18",
    "▸  Extension aktivieren:  CREATE EXTENSION vector;",
    "▸  Python-Paket:  psycopg2-binary",
    "▸  KI-Bibliothek:  sentence-transformers  (auf dem Anwendungsserver)",
 ], Inches(0.8), Inches(1.9), Inches(11.5), Inches(2.5), size=20)
 divider(s, Inches(4.6))
 txb(s, "Schema & Index", Inches(0.8), Inches(4.5), Inches(11), Inches(0.5),
    size=22, bold=True, color=ACCENT_PG)
 code_box(s,
    "CREATE TABLE images (\n    id        SERIAL PRIMARY KEY,\n    filename  TEXT NOT NULL UNIQUE,\n    embedding vector(512)          -- pgvector-Typ\n);\n\nCREATE INDEX ON images USING hnsw (embedding vector_cosine_ops);",
    Inches(0.8), Inches(5.0), Inches(7.5), Inches(1.85), size=13)
 bullet_box(s, [
    "HNSW = Hierarchical Navigable Small World",
    "Approximativer k-NN Index",
    "Sehr schnell bei der Suche",
 ], Inches(8.8), Inches(5.0), Inches(4.3), Inches(1.85), size=18, color=DIM_CLR)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 8 — PostgreSQL: Suchanfrage
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "PostgreSQL: Suchanfrage", ACCENT_PG)
 bullet_box(s, [
    "1.  Text-Anfrage mit CLIP in Python in einen Vektor umwandeln",
    "2.  Vektor an die SQL-Abfrage übergeben",
    "3.  PostgreSQL findet die ähnlichsten Bilder via HNSW-Index",
 ], Inches(0.8), Inches(1.3), Inches(11.5), Inches(1.5), size=20)
 code_box(s,
    "# Python\nvec = model.encode(\"Bäume\")       # → 512 floats\n\n# SQL\nSELECT filename,\n       1 - (embedding <=> %s::vector) AS score\nFROM   images\nORDER  BY embedding <=> %s::vector\nLIMIT  12;",
    Inches(0.8), Inches(3.0), Inches(7.5), Inches(3.5), size=16)
 bullet_box(s, [
    "<=>  Cosinus-Distanz-Operator",
    "(pgvector-spezifisch)",
    "",
    "%s::vector  expliziter Cast",
    "erforderlich",
    "",
    "LIMIT statt FETCH FIRST",
 ], Inches(9.0), Inches(3.0), Inches(4.0), Inches(3.5), size=18, color=DIM_CLR)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 9 — Oracle 26ai: Nativer Support
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Oracle 26ai — nativer Vektor-Support", ACCENT_ORA)
 txb(s, "Was wird benötigt?", Inches(0.8), Inches(1.3), Inches(11), Inches(0.5),
    size=22, bold=True, color=ACCENT_ORA)
 bullet_box(s, [
    "▸  Oracle AI Database 26ai Free (oder Enterprise)",
    "▸  Keine Extension nötig — Vektoren sind eingebaut",
    "▸  Vector Memory Area im SGA konfigurieren (für HNSW-Index)",
    "▸  Python-Paket: oracledb  (Thin Mode — kein Oracle Client nötig)",
    "▸  KI-Bibliothek: sentence-transformers  (auf dem Anwendungsserver)",
 ], Inches(0.8), Inches(1.9), Inches(11.5), Inches(2.2), size=20)
 divider(s, Inches(4.2))
 txb(s, "Schema & Index", Inches(0.8), Inches(4.3), Inches(11), Inches(0.45),
    size=20, bold=True, color=ACCENT_ORA)
 code_box(s,
    "CREATE TABLE images (\n    id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n    filename  VARCHAR2(255) NOT NULL UNIQUE,\n    embedding VECTOR(512, FLOAT32)     -- Typ + Dimension\n);\nCREATE VECTOR INDEX images_idx ON images(embedding)\n    ORGANIZATION INMEMORY NEIGHBOR GRAPH\n    WITH DISTANCE COSINE WITH TARGET ACCURACY 95;",
    Inches(0.8), Inches(4.8), Inches(8.5), Inches(2.0), size=11)
 bullet_box(s, [
    "HNSW im SGA",
    "(Vector Memory Area)",
    "512 MB konfiguriert",
 ], Inches(9.8), Inches(4.8), Inches(3.3), Inches(2.0), size=17, color=DIM_CLR)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 10 — Oracle: Unterschiede zu pgvector
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Oracle vs. pgvector — Schema-Unterschiede", ACCENT_ORA)
 rows = [
    ("Extension",       "CREATE EXTENSION vector",        "Eingebaut, keine Extension"),
    ("Vektor-Spalte",   "vector(512)  — nur Dimension",   "VECTOR(512, FLOAT32)  — Dim + Typ"),
    ("Primary Key",     "SERIAL",                          "NUMBER GENERATED ALWAYS AS IDENTITY"),
    ("Text-Spalte",     "TEXT  (unbegrenzt)",              "VARCHAR2(n)  — Länge erforderlich"),
    ("HNSW-Syntax",     "USING hnsw (...ops)",             "ORGANIZATION INMEMORY NEIGHBOR GRAPH"),
    ("Genauigkeit",     "Implizit via Index-Parameter",   "WITH TARGET ACCURACY 95  (explizit)"),
    ("Speicher",        "Kein Sonder-Speicher nötig",     "vector_memory_size im SGA"),
    ("Abstand-Op",      "<=>  (Operator)",                 "VECTOR_DISTANCE(col, vec, COSINE)"),
    ("Top-N",           "LIMIT n",                         "FETCH FIRST n ROWS ONLY"),
 ]
 # Column header row
 y = Inches(1.3)
 hdr_bg = s.shapes.add_shape(1, Inches(0.3), y, Inches(12.7), Inches(0.55))
 hdr_bg.fill.solid()
 hdr_bg.fill.fore_color.rgb = RGBColor(0x18, 0x18, 0x28)
 hdr_bg.line.fill.background()
 txb(s, "Aspekt",               Inches(0.4), y + Pt(6), Inches(2.2), Inches(0.5), size=14, bold=True, color=BODY_CLR)
 txb(s, "PostgreSQL + pgvector",Inches(2.7), y + Pt(6), Inches(4.8), Inches(0.5), size=14, bold=True, color=ACCENT_PG)
 txb(s, "Oracle 26ai",          Inches(7.6), y + Pt(6), Inches(5.4), Inches(0.5), size=14, bold=True, color=ACCENT_ORA)
 y += Inches(0.56)
 for i, (aspect, pg, ora) in enumerate(rows):
    bg_color = RGBColor(0x28, 0x29, 0x3d) if i % 2 == 0 else RGBColor(0x24, 0x25, 0x38)
    row_bg = s.shapes.add_shape(1, Inches(0.3), y, Inches(12.7), Inches(0.52))
    row_bg.fill.solid()
    row_bg.fill.fore_color.rgb = bg_color
    row_bg.line.fill.background()
    txb(s, aspect, Inches(0.4), y + Pt(5), Inches(2.2), Inches(0.48), size=13, bold=True, color=DIM_CLR)
    txb(s, pg,     Inches(2.7), y + Pt(5), Inches(4.8), Inches(0.48), size=13, color=ACCENT_PG)
    txb(s, ora,    Inches(7.6), y + Pt(5), Inches(5.4), Inches(0.48), size=13, color=ACCENT_ORA)
    y += Inches(0.53)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 11 — Oracle In-Database Embedding
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Oracle 26ai — Embedding in der Datenbank", ACCENT_IDB)
 bullet_box(s, [
    "▸  Oracle kann ONNX-Modelle direkt in die Datenbank laden",
    "     (ONNX = Open Neural Network Exchange)",
    "▸  VECTOR_EMBEDDING() ruft das Modell innerhalb einer SQL-Abfrage auf",
    "▸  Kein Python, keine KI-Bibliothek auf dem Anwendungsserver zur Laufzeit",
    "▸  Der Text-String ist der einzige Parameter aus Python",
    "▸  Schema: VECTOR  —  Tabelle: FOTO_VEKTOR  —  Bilder als BLOB gespeichert",
    "▸  HNSW-Index auf FOTO_VEKTOR (wie in Schema VECTORS_USER)",
 ], Inches(0.8), Inches(1.3), Inches(11.5), Inches(3.0), size=19)
 code_box(s,
    "-- Gesamte Logik in einem SQL-Statement\nSELECT filename,\n       1 - VECTOR_DISTANCE(\n               foto_vek,\n               VECTOR_EMBEDDING(CLIP_TXT USING :q AS data),\n               COSINE\n           ) AS score\nFROM   VECTOR.FOTO_VEKTOR\nORDER  BY VECTOR_DISTANCE(\n             foto_vek,\n             VECTOR_EMBEDDING(CLIP_TXT USING :q AS data), COSINE)\nFETCH  FIRST 12 ROWS ONLY;",
    Inches(0.8), Inches(3.6), Inches(7.5), Inches(3.3), size=13)
 bullet_box(s, [
    ":q  = reiner Text aus Python",
    "",
    "Oracle übernimmt:",
    "  • Tokenisierung",
    "  • ONNX-Inferenz",
    "  • Vektorsuche",
    "",
    "→ Architektur vereinfacht sich",
 ], Inches(9.0), Inches(3.6), Inches(4.0), Inches(3.4), size=18, color=DIM_CLR)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 12 — ONNX in Oracle: Besonderheit
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "ONNX in Oracle: Was zu beachten ist", ACCENT_IDB)
 bullet_box(s, [
    "Oracles ONNX-Validator stellt strenge Anforderungen an den Modell-Graphen:",
    "",
    "▸  input_ids darf nur in einem einzigen Gather-Knoten verwendet werden",
    "▸  Standard-CLIP-Export verwendet input_ids auch in ArgMax  →  wird abgelehnt",
    "",
    "Lösung: CLIP_TXT mit CLS-Token-Pooling (Position 0) statt EOS-Token-Pooling",
    "▸  Einfacherer ONNX-Graph, den Oracle akzeptiert",
    "▸  Cosinus-Ähnlichkeit zwischen EOS- und CLS-Variante: ~0,70",
    "▸  Modell muss beim Export entsprechend angepasst werden",
 ], Inches(0.8), Inches(1.3), Inches(11.5), Inches(3.8), size=19)
 code_box(s,
    "-- Modell laden (einmalig durch Administrator)\nEXEC DBMS_VECTOR.LOAD_ONNX_MODEL(\n    'VEC_DUMP', 'clip_txt.onnx', 'CLIP_TXT',\n    JSON('{\"function\":\"embedding\",\"embeddingOutput\":\"output\",\n          \"input\":{\"input\":[\"DATA\"]}}'));",
    Inches(0.8), Inches(5.2), Inches(11.5), Inches(1.6), size=13)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 13 — Architektur: Wo wird CLIP berechnet?
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Architektur der Demo", ACCENT_GRN)
 s.shapes.add_picture(DIAG_ARCH, Inches(0.3), Inches(1.1), Inches(12.73), Inches(5.7))
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 14 — Demo-Hinweis
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Demo", ACCENT_GRN)
 for url, label, color, y in [
    ("http://localhost:8000/ui/", "pgvector (blau)",    ACCENT_PG,  Inches(2.2)),
    ("http://localhost:8001/ui/", "Oracle 26ai (rot)",  ACCENT_ORA, Inches(3.5)),
    ("http://localhost:8002/ui/", "Oracle In-DB (lila)",ACCENT_IDB, Inches(4.8)),
 ]:
    txb(s, url,   Inches(1.5), y,               Inches(6),    Inches(0.5), size=22, bold=True, color=color)
    txb(s, label, Inches(7.8), y + Inches(0.05), Inches(4.5), Inches(0.5), size=20, color=DIM_CLR)
 txb(s, "Suchbegriffe zum Ausprobieren:",
    Inches(1.5), Inches(5.9), Inches(10), Inches(0.5), size=18, color=BODY_CLR)
 txb(s, "Bäume  ·  Wasser  ·  Menschen  ·  Gebäude  ·  Himmel  ·  Nacht  ·  Autos",
    Inches(1.5), Inches(6.3), Inches(10), Inches(0.6), size=20, bold=True, color=ACCENT_GRN)
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 15 — Vergleich
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Vergleich", ACCENT_PG)
 rows = [
    ("Merkmal",                "PostgreSQL + pgvector",       "Oracle · VECTORS_USER",      "Oracle · VECTOR"),
    ("Fotos indiziert",        "116",                          "116",                         "116"),
    ("Indizierungszeit",       "Ø 12,1 Sek.  (3 Läufe)",      "Ø 12,1 Sek.  (3 Läufe)",     "Ø 13,6 Sek.  (3 Läufe)"),
    ("Index-Typ",              "HNSW (auf Disk)",              "HNSW (im Speicher)",          "HNSW (im Speicher)"),
    ("RAM-Bedarf",             "Keiner",                       "512 MB SGA",                  "512 MB SGA"),
    ("CLIP zur Laufzeit",      "Ja (Python)",                  "Ja (Python)",                 "Nein"),
    ("Embedding-Ort",          "Python-Prozess",               "Python-Prozess",              "In der Datenbank"),
    ("VECTOR_EMBEDDING()",     "—",                            "—",                           "Ja"),
    ("Extension nötig",        "CREATE EXTENSION vector",      "Nein",                        "Nein"),
 ]
 y = Inches(1.3)
 header = True
 for i, row in enumerate(rows):
    bg_color = RGBColor(0x18, 0x18, 0x28) if header else (RGBColor(0x28, 0x29, 0x3d) if i % 2 == 0 else RGBColor(0x24, 0x25, 0x38))
    row_bg = s.shapes.add_shape(1, Inches(0.3), y, Inches(12.7), Inches(0.52))
    row_bg.fill.solid()
    row_bg.fill.fore_color.rgb = bg_color
    row_bg.line.fill.background()
    colors = [DIM_CLR, ACCENT_PG, ACCENT_ORA, ACCENT_IDB] if header else [BODY_CLR, ACCENT_PG, ACCENT_ORA, ACCENT_IDB]
    widths = [2.5, 3.0, 3.1, 3.1]
    xs = [0.4, 2.9, 6.0, 9.15]
    for cell, col, w, x in zip(row, colors, widths, xs):
        txb(s, cell, Inches(x), y + Pt(4), Inches(w), Inches(0.48),
            size=13, bold=header, color=col)
    y += Inches(0.53)
    header = False
 # ════════════════════════════════════════════════════════════════════════════
 # Slide 16 — Fazit
 # ════════════════════════════════════════════════════════════════════════════
 s = add_slide()
 section_header(s, "Fazit", ACCENT_GRN)
 bullet_box(s, [
    "▸  Beide Datenbanken unterstützen Vektorsuche produktionsreif",
    "▸  pgvector: einfach, leichtgewichtig, kein zusätzlicher Speicher nötig",
    "▸  Oracle 26ai: vollständig integriert, kein Extension-Management",
    "▸  Oracle In-DB Embedding: Architektur ohne ML-Laufzeit im App-Server",
    "▸  CLIP ermöglicht Bildersuche per Freitext — ohne Tagging oder Metadaten",
    "▸  HNSW liefert schnelle approximative k-NN-Suche in beiden Datenbanken",
 ], Inches(0.8), Inches(1.3), Inches(11.5), Inches(3.5), size=21)
 divider(s, Inches(5.1))
 txb(s, "Quellcode & Dokumentation",
    Inches(0.8), Inches(5.2), Inches(11), Inches(0.5),
    size=20, bold=True, color=BODY_CLR)
 txb(s, "https://gitea.dl-cons.de/dierk/vector-search-demo",
    Inches(0.8), Inches(5.7), Inches(11), Inches(0.5),
    size=20, color=ACCENT_PG)
 txb(s, "Programmierung und Folien unterstützt durch Claude (Anthropic)",
    Inches(0.8), Inches(6.55), Inches(11.33), Inches(0.35),
    size=13, italic=True, color=DIM_CLR, align=PP_ALIGN.CENTER)
 # ════════════════════════════════════════════════════════════════════════════
 # Save
 # ════════════════════════════════════════════════════════════════════════════
 OUT = "Vektoren in der Datenbank.pptx"
 prs.save(OUT)
 print(f"Saved: {OUT}  ({len(prs.slides)} slides)")
@@ -4,14 +4,19 @@ from PIL import Image
 _model = None
 def _get_model():
    # Lazy load: the CLIP model is ~600 MB and takes several seconds to initialise.
    # Loading on first call avoids the cost at import time and during indexing warmup.
    global _model
    if _model is None:
        _model = SentenceTransformer("clip-ViT-B-32")
    return _model
 def embed_image(path: str) -> list[float]:
    # CLIP requires RGB — some JPEGs are stored as CMYK or grayscale.
    img = Image.open(path).convert("RGB")
    return _get_model().encode(img).tolist()
 def embed_text(text: str) -> list[float]:
    # Text and images share the same 512-dimensional vector space in CLIP,
    # so the returned vector is directly comparable to image embeddings.
    return _get_model().encode(text).tolist()
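Review note: the comment above says text and image vectors are directly comparable. A minimal sketch of what that comparison measures, assuming plain Python lists instead of real CLIP output. `cosine_similarity` and `euclidean_distance` are hypothetical helpers, not functions from this repository. The point is that scaling a vector leaves the cosine similarity unchanged while the Euclidean distance grows, which is why the demo scores by angle.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (hypothetical helper, not in the repo)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance, shown only for contrast."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


v = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in v]  # same direction, ten times the length

same_direction = cosine_similarity(v, scaled)    # angle is zero
length_gap = euclidean_distance(v, scaled)       # large despite equal direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The similarity shown in the UI corresponds to `1 - cosine distance`, so a value near 1 means the two vectors point almost the same way regardless of their lengths.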
@@ -0,0 +1,49 @@
 import os
 import time
 from dotenv import load_dotenv
 from db_oracle import get_connection_indb
 load_dotenv()
 PHOTOS_DIR = os.getenv("PHOTOS_DIR")
 def main():
    conn = get_connection_indb()
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM VECTOR.FOTO_VEKTOR")
    print(f"Rows before: {cur.fetchone()[0]}")
    files = [f for f in os.listdir(PHOTOS_DIR) if f.lower().endswith((".jpg", ".jpeg"))]
    print(f"Found {len(files)} photos in {PHOTOS_DIR}")
    start = time.time()
    for i, filename in enumerate(files, 1):
        filepath = os.path.join(PHOTOS_DIR, filename)
        cur.execute("SELECT 1 FROM VECTOR.FOTO_VEKTOR WHERE filename = :1", (filename,))
        if cur.fetchone():
            print(f"[{i}/{len(files)}] Skipping {filename} (already indexed)")
            continue
        with open(filepath, "rb") as f:
            blob_data = f.read()
        # ORA-24816: Oracle cannot bind the same BLOB as both column value and
        # VECTOR_EMBEDDING() input in one statement. Insert the BLOB first, then
        # let Oracle compute the embedding from the stored data in a second step.
        cur.execute(
            "INSERT INTO VECTOR.FOTO_VEKTOR (filename, foto) VALUES (:1, :2)",
            (filename, blob_data),
        )
        cur.execute(
            """UPDATE VECTOR.FOTO_VEKTOR
               SET foto_vek = VECTOR_EMBEDDING(CLIP_IMG USING foto AS data)
               WHERE filename = :1""",
            (filename,),
        )
        conn.commit()
        print(f"[{i}/{len(files)}] Indexed {filename}")
    elapsed = time.time() - start
    cur.close()
    conn.close()
    print(f"Done in {elapsed:.1f} seconds.")
 if __name__ == "__main__":
    main()
@@ -1,5 +1,6 @@
 import os
 import array
 import time
 from dotenv import load_dotenv
 from db_oracle import get_connection
 from embedder import embed_image
@@ -47,12 +48,14 @@ def main():
    files = [f for f in os.listdir(PHOTOS_DIR) if f.lower().endswith((".jpg", ".jpeg"))]
    print(f"Found {len(files)} photos in {PHOTOS_DIR}")
    start = time.time()
    for i, filename in enumerate(files, 1):
        filepath = os.path.join(PHOTOS_DIR, filename)
        cur.execute("SELECT 1 FROM images WHERE filename = :1", (filename,))
        if cur.fetchone():
            print(f"[{i}/{len(files)}] Skipping {filename} (already indexed)")
            continue
        # oracledb requires array.array("f") for VECTOR(512, FLOAT32) — plain list is rejected.
        embedding = array.array("f", embed_image(filepath))
        cur.execute(INSERT, (filename, filepath, embedding))
        conn.commit()
@@ -60,7 +63,7 @@ def main():
    cur.close()
    conn.close()
-    print("Done.")
+    print(f"Done in {time.time() - start:.1f} seconds.")
 if __name__ == "__main__":
    main()
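Review note: a small sketch of what the `array.array("f", ...)` conversion in this script does, using only the standard library and no database connection. The three values are made-up stand-ins for a 512-element embedding. It illustrates that the buffer holds 32-bit floats, so values that are not exactly representable get rounded slightly compared with Python's 64-bit floats.

```python
import array

values = [0.1, 0.25, -1.0]          # stand-in for a 512-element embedding
buf = array.array("f", values)      # typed buffer of C single-precision floats

typecode = buf.typecode             # "f" marks single precision
bytes_per_item = buf.itemsize       # size of one element in bytes
round_trip = buf.tolist()           # back to Python floats (now float32-rounded)

# 0.25 and -1.0 are exactly representable in float32; 0.1 is not.
rounding_error = abs(round_trip[0] - 0.1)
```

The rounding is far below the precision that matters for ranking images, so storing the embeddings as `FLOAT32` should not change the result order in practice.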
@@ -20,6 +20,8 @@ app.mount("/ui", StaticFiles(directory=os.path.abspath(FRONTEND_DIR), html=True)
@app.get("/search")
 def search(q: str = Query(...), limit: int = Query(12)):
    # oracledb rejects a plain Python list for a VECTOR column.
    # array.array("f") produces a typed 32-bit float buffer that matches VECTOR(512, FLOAT32).
    vec = array.array("f", embed_text(q))
    conn = get_connection()
    cur = conn.cursor()
@@ -1,3 +1,5 @@
 # No embedder import — text embedding happens inside Oracle via VECTOR_EMBEDDING(CLIP_TXT).
 # The only value Python passes to the database is the raw query string (:q).
 import os
 from fastapi import FastAPI, Query
 from fastapi.middleware.cors import CORSMiddleware
@@ -106,6 +106,37 @@
      text-overflow: ellipsis;
    }
    .empty { text-align: center; color: #999; margin-top: 3rem; font-size: 1rem; }
    .card img { cursor: pointer; }
    .card img:hover { opacity: 0.85; }
    .lightbox {
      display: none;
      position: fixed;
      inset: 0;
      background: rgba(0,0,0,0.85);
      z-index: 100;
      align-items: center;
      justify-content: center;
      flex-direction: column;
      gap: 0.8rem;
    }
    .lightbox.open { display: flex; }
    .lightbox img {
      max-width: 90vw;
      max-height: 80vh;
      object-fit: contain;
      border-radius: 4px;
      box-shadow: 0 4px 32px rgba(0,0,0,0.6);
    }
    .lightbox-info { color: white; font-size: 0.95rem; text-align: center; }
    .lightbox-info .lb-score { color: #cba6f7; font-weight: 700; }
    .lightbox-close {
      position: fixed;
      top: 1rem; right: 1.2rem;
      color: white; font-size: 2rem;
      cursor: pointer; line-height: 1;
    }
  </style>
 </head>
 <body>
@@ -134,6 +165,14 @@
  <p class="stats" id="stats"></p>
  <div class="grid" id="grid"><p class="empty">Enter a search term above.</p></div>
  <div class="lightbox" id="lightbox" onclick="closeLightbox()">
    <span class="lightbox-close" onclick="closeLightbox()">✕</span>
    <img id="lb-img" src="" alt="" />
    <div class="lightbox-info">
      <span id="lb-name"></span> &nbsp;·&nbsp; <span class="lb-score" id="lb-score"></span>
    </div>
  </div>
  <script>
    const API = "http://localhost:8002";
@@ -166,7 +205,8 @@
      }
      grid.innerHTML = results.map(r => `
        <div class="card">
-          <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${r.filename}" loading="lazy" />
+          <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${r.filename}" loading="lazy"
               onclick="openLightbox('${encodeURIComponent(r.filename)}','${r.filename}','${(r.score*100).toFixed(1)}%')" />
          <div class="card-info">
            <div class="score">${(r.score * 100).toFixed(1)}% match</div>
            <div class="name">${r.filename}</div>
@@ -174,6 +214,19 @@
        </div>
      `).join("");
    }
    function openLightbox(encoded, name, score) {
      document.getElementById("lb-img").src = `${API}/photos/${encoded}`;
      document.getElementById("lb-name").textContent = name;
      document.getElementById("lb-score").textContent = score + " match";
      document.getElementById("lightbox").classList.add("open");
    }
    function closeLightbox() {
      document.getElementById("lightbox").classList.remove("open");
    }
    document.addEventListener("keydown", e => { if (e.key === "Escape") closeLightbox(); });
  </script>
 </body>
 </html>
@@ -106,6 +106,37 @@
      text-overflow: ellipsis;
    }
    .empty { text-align: center; color: #999; margin-top: 3rem; font-size: 1rem; }
    .card img { cursor: pointer; }
    .card img:hover { opacity: 0.85; }
    .lightbox {
      display: none;
      position: fixed;
      inset: 0;
      background: rgba(0,0,0,0.85);
      z-index: 100;
      align-items: center;
      justify-content: center;
      flex-direction: column;
      gap: 0.8rem;
    }
    .lightbox.open { display: flex; }
    .lightbox img {
      max-width: 90vw;
      max-height: 80vh;
      object-fit: contain;
      border-radius: 4px;
      box-shadow: 0 4px 32px rgba(0,0,0,0.6);
    }
    .lightbox-info { color: white; font-size: 0.95rem; text-align: center; }
    .lightbox-info .lb-score { color: #f38ba8; font-weight: 700; }
    .lightbox-close {
      position: fixed;
      top: 1rem; right: 1.2rem;
      color: white; font-size: 2rem;
      cursor: pointer; line-height: 1;
    }
  </style>
 </head>
 <body>
@@ -134,6 +165,14 @@
  <p class="stats" id="stats"></p>
  <div class="grid" id="grid"><p class="empty">Enter a search term above.</p></div>
  <div class="lightbox" id="lightbox" onclick="closeLightbox()">
    <span class="lightbox-close" onclick="closeLightbox()">✕</span>
    <img id="lb-img" src="" alt="" />
    <div class="lightbox-info">
      <span id="lb-name"></span> &nbsp;·&nbsp; <span class="lb-score" id="lb-score"></span>
    </div>
  </div>
  <script>
    const API = "http://localhost:8001";
@@ -166,7 +205,8 @@
      }
      grid.innerHTML = results.map(r => `
        <div class="card">
-          <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${r.filename}" loading="lazy" />
+          <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${r.filename}" loading="lazy"
               onclick="openLightbox('${encodeURIComponent(r.filename)}','${r.filename}','${(r.score*100).toFixed(1)}%')" />
          <div class="card-info">
            <div class="score">${(r.score * 100).toFixed(1)}% match</div>
            <div class="name">${r.filename}</div>
@@ -174,6 +214,19 @@
        </div>
      `).join("");
    }
    function openLightbox(encoded, name, score) {
      document.getElementById("lb-img").src = `${API}/photos/${encoded}`;
      document.getElementById("lb-name").textContent = name;
      document.getElementById("lb-score").textContent = score + " match";
      document.getElementById("lightbox").classList.add("open");
    }
    function closeLightbox() {
      document.getElementById("lightbox").classList.remove("open");
    }
    document.addEventListener("keydown", e => { if (e.key === "Escape") closeLightbox(); });
  </script>
 </body>
 </html>
@@ -1,179 +0,0 @@
 <!DOCTYPE html>
 <html lang="en">
 <head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Vector Image Search — Oracle In-DB</title>
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body { font-family: system-ui, sans-serif; background: #f5f5f5; color: #222; }
    header {
      background: #7b5ea7;
      color: white;
      padding: 1.2rem 2rem;
      display: flex;
      align-items: center;
      gap: 1rem;
    }
    header h1 { font-size: 1.4rem; font-weight: 600; }
    .badge {
      background: white;
      color: #7b5ea7;
      font-size: 0.75rem;
      font-weight: 700;
      padding: 0.2rem 0.6rem;
      border-radius: 999px;
    }
    .search-area {
      max-width: 700px;
      margin: 2rem auto 1rem;
      padding: 0 1rem;
    }
    .search-row {
      display: flex;
      gap: 0.5rem;
    }
    input[type="text"] {
      flex: 1;
      padding: 0.7rem 1rem;
      font-size: 1rem;
      border: 1px solid #ccc;
      border-radius: 6px;
    }
    button.search-btn {
      padding: 0.7rem 1.4rem;
      background: #7b5ea7;
      color: white;
      border: none;
      border-radius: 6px;
      font-size: 1rem;
      cursor: pointer;
    }
    button.search-btn:hover { background: #664e8d; }
    .chips {
      display: flex;
      flex-wrap: wrap;
      gap: 0.4rem;
      margin-top: 0.8rem;
    }
    .chip {
      padding: 0.3rem 0.8rem;
      background: white;
      border: 1px solid #ccc;
      border-radius: 999px;
      font-size: 0.85rem;
      cursor: pointer;
    }
    .chip:hover { background: #f3f0f8; border-color: #7b5ea7; }
    .stats { text-align: center; color: #666; font-size: 0.85rem; margin-bottom: 1rem; }
    .grid {
      display: grid;
      grid-template-columns: repeat(auto-fill, minmax(180px, 1fr));
      gap: 1rem;
      max-width: 1200px;
      margin: 0 auto;
      padding: 0 1rem 2rem;
    }
    .card {
      background: white;
      border-radius: 8px;
      overflow: hidden;
      box-shadow: 0 1px 4px rgba(0,0,0,0.1);
    }
    .card img {
      width: 100%;
      height: 140px;
      object-fit: cover;
      display: block;
    }
    .card-info {
      padding: 0.5rem 0.7rem;
      font-size: 0.8rem;
    }
    .card-info .score {
      font-weight: 700;
      color: #7b5ea7;
    }
    .card-info .name {
      color: #555;
      white-space: nowrap;
      overflow: hidden;
      text-overflow: ellipsis;
    }
    .empty { text-align: center; color: #999; margin-top: 3rem; font-size: 1rem; }
  </style>
 </head>
 <body>
  <header>
    <h1>Vector Image Search</h1>
    <span class="badge">Oracle In-DB</span>
  </header>
  <div class="search-area">
    <div class="search-row">
      <input id="query" type="text" placeholder="Search photos, e.g. trees, water, night…" />
      <button class="search-btn" onclick="doSearch()">Search</button>
    </div>
    <div class="chips">
      <span class="chip" onclick="setQuery('trees')">trees</span>
      <span class="chip" onclick="setQuery('water')">water</span>
      <span class="chip" onclick="setQuery('people')">people</span>
      <span class="chip" onclick="setQuery('buildings')">buildings</span>
      <span class="chip" onclick="setQuery('sky')">sky</span>
      <span class="chip" onclick="setQuery('street')">street</span>
      <span class="chip" onclick="setQuery('night')">night</span>
      <span class="chip" onclick="setQuery('cars')">cars</span>
    </div>
  </div>
  <p class="stats" id="stats"></p>
  <div class="grid" id="grid"><p class="empty">Enter a search term above.</p></div>
  <script>
    const API = "http://localhost:8002";
    fetch(`${API}/stats`)
      .then(r => r.json())
      .then(d => document.getElementById("stats").textContent = `${d.count} photos indexed`);
    document.getElementById("query").addEventListener("keydown", e => {
      if (e.key === "Enter") doSearch();
    });
    function setQuery(text) {
      document.getElementById("query").value = text;
      doSearch();
    }
    function doSearch() {
      const q = document.getElementById("query").value.trim();
      if (!q) return;
      fetch(`${API}/search?q=${encodeURIComponent(q)}&limit=12`)
        .then(r => r.json())
        .then(renderResults);
    }
    function renderResults(results) {
      const grid = document.getElementById("grid");
      if (!results.length) {
        grid.innerHTML = '<p class="empty">No results found.</p>';
        return;
      }
      grid.innerHTML = results.map(r => `
        <div class="card">
          <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${r.filename}" loading="lazy" />
          <div class="card-info">
            <div class="score">${(r.score * 100).toFixed(1)}% match</div>
            <div class="name">${r.filename}</div>
          </div>
        </div>
      `).join("");
    }
  </script>
 </body>
 </html>
@@ -4,14 +4,19 @@ from PIL import Image
 _model = None
 def _get_model():
    # Lazy load: the CLIP model is ~600 MB and takes several seconds to initialise.
    # Loading on first call avoids the cost at import time and during indexing warmup.
    global _model
    if _model is None:
        _model = SentenceTransformer("clip-ViT-B-32")
    return _model
 def embed_image(path: str) -> list[float]:
    # CLIP requires RGB — some JPEGs are stored as CMYK or grayscale.
    img = Image.open(path).convert("RGB")
    return _get_model().encode(img).tolist()
 def embed_text(text: str) -> list[float]:
    # Text and images share the same 512-dimensional vector space in CLIP,
    # so the returned vector is directly comparable to image embeddings.
    return _get_model().encode(text).tolist()
@@ -1,4 +1,5 @@
 import os
 import time
 from dotenv import load_dotenv
 from db import get_connection
 from embedder import embed_image
@@ -37,6 +38,7 @@ def main():
    files = [f for f in os.listdir(PHOTOS_DIR) if f.lower().endswith((".jpg", ".jpeg"))]
    print(f"Found {len(files)} photos in {PHOTOS_DIR}")
    start = time.time()
    for i, filename in enumerate(files, 1):
        filepath = os.path.join(PHOTOS_DIR, filename)
        cur.execute("SELECT 1 FROM images WHERE filename = %s", (filename,))
@@ -50,7 +52,7 @@ def main():
    cur.close()
    conn.close()
-    print("Done.")
+    print(f"Done in {time.time() - start:.1f} seconds.")
 if __name__ == "__main__":
    main()
@@ -29,6 +29,9 @@ def search(q: str = Query(...), limit: int = Query(12)):
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        # vec appears twice: once for ORDER BY (uses HNSW index), once for the score column.
        # ::vector cast is required — psycopg2 passes the list as text without it.
        # 1 - distance converts cosine distance (0=identical) to similarity (1=identical).
        (vec, vec, limit),
    )
    rows = cur.fetchall()
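
The score formula in the comment above can be checked by hand. The following is a sketch in plain Python, assuming (as the comment states) that the `<=>` operator returns cosine distance, i.e. 1 minus cosine similarity. The helper names `cosine_similarity` and `cosine_distance` are illustrative and not part of the project.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Distance as assumed for `<=>`: 0 for same direction, 1 for orthogonal."""
    return 1.0 - cosine_similarity(a, b)


# Parallel vectors point the same way, so distance is 0 and the score is 1.
dist_same = cosine_distance([1.0, 2.0], [2.0, 4.0])
# Orthogonal vectors have distance 1, so the score is 0.
dist_orth = cosine_distance([1.0, 0.0], [0.0, 1.0])

score_same = 1.0 - dist_same
score_orth = 1.0 - dist_orth
```

The frontend then multiplies this score by 100 to display it as a percentage match.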
@@ -106,6 +106,37 @@
      text-overflow: ellipsis;
    }
    .empty { text-align: center; color: #999; margin-top: 3rem; font-size: 1rem; }
    .card img { cursor: pointer; }
    .card img:hover { opacity: 0.85; }
    .lightbox {
      display: none;
      position: fixed;
      inset: 0;
      background: rgba(0,0,0,0.85);
      z-index: 100;
      align-items: center;
      justify-content: center;
      flex-direction: column;
      gap: 0.8rem;
    }
    .lightbox.open { display: flex; }
    .lightbox img {
      max-width: 90vw;
      max-height: 80vh;
      object-fit: contain;
      border-radius: 4px;
      box-shadow: 0 4px 32px rgba(0,0,0,0.6);
    }
    .lightbox-info { color: white; font-size: 0.95rem; text-align: center; }
    .lightbox-info .lb-score { color: #89b4fa; font-weight: 700; }
    .lightbox-close {
      position: fixed;
      top: 1rem; right: 1.2rem;
      color: white; font-size: 2rem;
      cursor: pointer; line-height: 1;
    }
  </style>
 </head>
 <body>
@@ -134,6 +165,14 @@
  <p class="stats" id="stats"></p>
  <div class="grid" id="grid"><p class="empty">Enter a search term above.</p></div>
  <div class="lightbox" id="lightbox" onclick="closeLightbox()">
    <span class="lightbox-close" onclick="closeLightbox()">✕</span>
    <img id="lb-img" src="" alt="" />
    <div class="lightbox-info">
      <span id="lb-name"></span> &nbsp;·&nbsp; <span class="lb-score" id="lb-score"></span>
    </div>
  </div>
  <script>
    const API = "http://localhost:8000";
@@ -166,7 +205,8 @@
      }
      grid.innerHTML = results.map(r => `
        <div class="card">
-          <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${r.filename}" loading="lazy" />
+          <img src="${API}/photos/${encodeURIComponent(r.filename)}" alt="${r.filename}" loading="lazy"
               onclick="openLightbox('${encodeURIComponent(r.filename)}','${r.filename}','${(r.score*100).toFixed(1)}%')" />
          <div class="card-info">
            <div class="score">${(r.score * 100).toFixed(1)}% match</div>
            <div class="name">${r.filename}</div>
@@ -174,6 +214,19 @@
        </div>
      `).join("");
    }
    function openLightbox(encoded, name, score) {
      document.getElementById("lb-img").src = `${API}/photos/${encoded}`;
      document.getElementById("lb-name").textContent = name;
      document.getElementById("lb-score").textContent = score + " match";
      document.getElementById("lightbox").classList.add("open");
    }
    function closeLightbox() {
      document.getElementById("lightbox").classList.remove("open");
    }
    document.addEventListener("keydown", e => { if (e.key === "Escape") closeLightbox(); });
  </script>
 </body>
 </html>
Author	SHA1	Message	Date
dierk	9116533f03	Update README with all recent changes - Project structure: add index_images_indb.py - Architecture: fix schema names (VECTORS_USER/VECTOR), HNSW for all three - Database schemas: separate sections for VECTORS_USER and VECTOR, photo storage differences - Indexing scripts: three-way comparison table, measured avg times (12.1s/12.1s/13.6s) - ORA-24816 workaround documented - Performance comparison: real benchmark numbers, HNSW for in-DB, photo storage row - Oracle in-DB section: HNSW index creation, index_images_indb.py for population - Re-index section: add index_images_indb.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 11:17:27 +02:00
dierk	3ef43019be	Add in-DB indexing script, benchmark results, schema names in presentation - index_images_indb.py: new script indexing via VECTOR_EMBEDDING(CLIP_IMG) using a two-step INSERT+UPDATE to work around ORA-24816 - index_images_oracle.py / index_images.py: add timing output - Presentation: schema names VECTORS_USER/VECTOR in diagram and comparison, ONNX expansion, HNSW index note on slide 11, indexing times updated from 3-run benchmark (avg: PG 12.1s, Ora 12.1s, InDB 13.6s) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 10:42:13 +02:00
dierk	e70d422c69	Document lightbox feature in README Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 15:19:37 +02:00
dierk	26ce44e186	Add lightbox to all three frontends — click photo to view full size Click any result image to open it in a dark overlay. Click anywhere or press Escape to close. Score colour matches each frontend's accent colour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 15:15:52 +02:00
dierk	048309da8a	Add present.sh and document LibreOffice --show flag in README present.sh launches the slideshow directly without opening the Impress UI. The script is gitignored as a local convenience helper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 14:56:28 +02:00
dierk	1c5e00d8e4	Add targeted comments explaining non-obvious behaviour - embedder.py: lazy model load rationale, RGB conversion, shared vector space - main.py: why vec appears twice, ::vector cast, 1-distance score formula - main_oracle.py: why array.array("f") is required instead of plain list - main_oracle_indb.py: no embedder import — embedding done inside Oracle SQL - index_images_oracle.py: same array.array requirement on indexing path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 14:39:40 +02:00
dierk	70da90c238	Add LibreOffice lock file pattern to .gitignore Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 14:18:28 +02:00
dierk	c893c92235	Remove redundant index_indb.html — superseded by frontend/indb/index.html Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 14:17:25 +02:00
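
The commit log above notes that `main_oracle.py` and `index_images_oracle.py` pass embeddings as `array.array("f")` instead of a plain list. A minimal sketch of that conversion, using only the standard library, is shown below. The `to_float32_array` helper and the sample values are hypothetical, and the actual bind call to Oracle is left out because it does not appear in this diff.

```python
import array


def to_float32_array(vec: list[float]) -> array.array:
    """Pack a Python list of floats into a typed array of 32-bit floats."""
    return array.array("f", vec)


# Sample values chosen to be exactly representable as 32-bit floats.
embedding = [0.25, -0.5, 1.0]
bound = to_float32_array(embedding)
```

The typed array carries an explicit element type (`"f"`), which a plain list does not; that is presumably what lets the driver map the value to a vector column.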