Trend-Scout AI: Database Schema Migration Plan
Document Version: 1.0
Date: 2026-03-30
Status: Draft for Review
1. Executive Summary
This document outlines a comprehensive migration plan to address critical performance and data architecture issues in the Trend-Scout AI vector storage layer. The migration transforms a flat, unnormalized ChromaDB collection into a normalized, multi-collection architecture with proper indexing, pagination, and query optimization.
Current Pain Points
| Issue | Current Behavior | Impact |
|---|---|---|
| get_latest | Fetches ALL items via collection.get(), sorts in Python | O(n) memory + sort |
| get_top_ranked | Fetches ALL items via collection.get(), sorts in Python | O(n) memory + sort |
| get_stats | Iterates over ALL items to count categories | O(n) iteration |
| anomalies_detected | Stored as comma-joined string "A1,A2,A3" | Type corruption, no index |
| Single collection | No normalization, mixed data types | No efficient filtering |
| No pagination | No offset-based pagination support | Cannot handle large datasets |
Expected Outcomes
| Metric | Current | Target | Improvement |
|---|---|---|---|
| get_latest (1000 items) | ~500ms, O(n) | ~10ms, O(log n) | 50x faster |
| get_top_ranked (1000 items) | ~500ms, O(n log n) | ~10ms, O(k log k) | 50x faster |
| get_stats (1000 items) | ~200ms | ~1ms | 200x faster |
| Memory usage | O(n) full scan | O(1) or O(k) | ~99% reduction |
2. Target Architecture
2.1 Multi-Collection Design
┌─────────────────────────────────────────────────────────────────────┐
│ ChromaDB Instance │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ │
│ │ news_items │ │ category_index │ │
│ │ (main collection) │ │ (lookup table) │ │
│ ├──────────────────────┤ ├──────────────────────┤ │
│ │ id (uuid5 from url) │◄───┤ category_id │ │
│ │ content_text (vec) │ │ name │ │
│ │ title │ │ item_count │ │
│ │ url │ │ created_at │ │
│ │ source │ │ updated_at │ │
│ │ timestamp │ └──────────────────────┘ │
│ │ relevance_score │ │
│ │ summary_ru │ ┌──────────────────────┐ │
│ │ category_id (FK) │───►│ anomaly_types │ │
│ │ anomalies[] │ │ (normalized) │ │
│ └──────────────────────┘ ├──────────────────────┤ │
│ │ anomaly_id │ │
│ │ name │ │
│ │ description │ │
│ └──────────────────────┘ │
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ │
│ │ news_anomalies │ │ stats_cache │ │
│ │ (junction table) │ │ (materialized) │ │
│ ├──────────────────────┤ ├──────────────────────┤ │
│ │ news_id (FK) │◄───┤ key │ │
│ │ anomaly_id (FK) │ │ total_count │ │
│ │ detected_at │ │ category_counts (JSON)│ │
│ └──────────────────────┘ │ source_counts (JSON) │ │
│ │ last_updated │ │
│ └──────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
2.2 SQLite Shadow Database (for FTS and relational queries)
┌─────────────────────────────────────────────────────────────────────┐
│ SQLite Database │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ │
│ │ news_items_rel │ │ categories │ │
│ │ (normalized relay) │ ├──────────────────────┤ │
│ ├──────────────────────┤ │ id (PK) │ │
│ │ id (uuid, PK) │ │ name │ │
│ │ title │ │ item_count │ │
│ │ url (UNIQUE) │ │ embedding_id (FK) │ │
│ │ source │ └──────────────────────┘ │
│ │ timestamp │ │
│ │ relevance_score │ ┌──────────────────────┐ │
│ │ summary_ru │ │ anomalies │ │
│ │ category_id (FK) │ ├──────────────────────┤ │
│ │ content_text │ │ id (PK) │ │
│ └──────────────────────┘ │ name │ │
│ │ description │ │
│ ┌──────────────────────┐ └──────────────────────┘ │
│ │ news_anomalies_rel │ │
│ ├──────────────────────┤ ┌──────────────────────┐ │
│ │ news_id (FK) │ │ stats_snapshot │ │
│ │ anomaly_id (FK) │ ├──────────────────────┤ │
│ │ detected_at │ │ total_count │ │
│ └──────────────────────┘ │ category_json │ │
│ │ last_updated │ │
│ ┌──────────────────────┐ └──────────────────────┘ │
│ │ news_fts │ │
│ │ (FTS5 virtual) │ │
│ ├──────────────────────┤ ┌──────────────────────┐ │
│ │ news_id (FK) │ │ crawl_history │ │
│ │ title_tokens │ ├──────────────────────┤ │
│ │ content_tokens │ │ id (PK) │ │
│ │ summary_tokens │ │ crawler_name │ │
│ └──────────────────────┘ │ items_fetched │ │
│ │ items_new │ │
│ │ started_at │ │
│ │ completed_at │ │
│ └──────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
2.3 New DTOs
# src/processor/dto.py (evolved)
from typing import Dict, List, Optional
from pydantic import BaseModel, Field
from datetime import datetime
from enum import Enum
class AnomalyType(str, Enum):
"""Normalized anomaly types - not a comma-joined string."""
WEBGPU = "WebGPU"
NPU_ACCELERATION = "NPU acceleration"
EDGE_AI = "Edge AI"
# ... extensible
class EnrichedNewsItemDTO(BaseModel):
"""Extended DTO with proper normalized relationships."""
title: str
url: str
content_text: str
source: str
timestamp: datetime
relevance_score: int = Field(ge=0, le=10)
summary_ru: str
category: str # Category name, resolved from category_id
anomalies_detected: List[AnomalyType] # Now a proper list
anomaly_ids: Optional[List[str]] = None # Internal use
class CategoryStats(BaseModel):
"""Statistics per category."""
category: str
count: int
avg_relevance: float
class StorageStats(BaseModel):
"""Complete storage statistics."""
total_count: int
categories: List[CategoryStats]
sources: Dict[str, int]
anomaly_types: Dict[str, int]
last_updated: datetime
2.4 Evolved Interface Design
# src/storage/base.py
from abc import ABC, abstractmethod
from typing import List, Optional, AsyncIterator, Dict, Any
from datetime import datetime
from src.processor.dto import EnrichedNewsItemDTO, StorageStats, CategoryStats
class IStoreCommand(ABC):
"""Write operations - SRP: only handles writes."""
@abstractmethod
async def store(self, item: EnrichedNewsItemDTO) -> str:
"""Store an item. Returns the generated ID."""
pass
@abstractmethod
async def store_batch(self, items: List[EnrichedNewsItemDTO]) -> List[str]:
"""Batch store items. Returns generated IDs."""
pass
@abstractmethod
async def update(self, item_id: str, item: EnrichedNewsItemDTO) -> None:
"""Update an existing item."""
pass
@abstractmethod
async def delete(self, item_id: str) -> None:
"""Delete an item by ID."""
pass
class IStoreQuery(ABC):
"""Read operations - SRP: only handles queries."""
@abstractmethod
async def get_by_id(self, item_id: str) -> Optional[EnrichedNewsItemDTO]:
pass
@abstractmethod
async def exists(self, url: str) -> bool:
pass
@abstractmethod
async def get_latest(
self,
limit: int = 10,
category: Optional[str] = None,
offset: int = 0
) -> List[EnrichedNewsItemDTO]:
"""Paginated retrieval by timestamp. Uses index, not full scan."""
pass
@abstractmethod
async def get_top_ranked(
self,
limit: int = 10,
category: Optional[str] = None,
offset: int = 0
) -> List[EnrichedNewsItemDTO]:
"""Paginated retrieval by relevance_score. Uses index, not full scan."""
pass
@abstractmethod
async def get_stats(self, use_cache: bool = True) -> StorageStats:
"""Returns cached stats or computes on-demand."""
pass
@abstractmethod
async def search_hybrid(
self,
query: str,
limit: int = 5,
category: Optional[str] = None,
threshold: Optional[float] = None
) -> List[EnrichedNewsItemDTO]:
"""Keyword + semantic hybrid search with RRF fusion."""
pass
@abstractmethod
async def search_stream(
self,
query: str,
limit: int = 5,
category: Optional[str] = None
) -> AsyncIterator[EnrichedNewsItemDTO]:
"""Streaming search for large result sets."""
pass
class IVectorStore(IStoreCommand, IStoreQuery):
"""Combined interface preserving backward compatibility."""
pass
class IAdminOperations(ABC):
"""Administrative operations - separate from core CRUD."""
@abstractmethod
async def rebuild_stats_cache(self) -> StorageStats:
"""Force rebuild of statistics cache."""
pass
@abstractmethod
async def vacuum(self) -> None:
"""Optimize storage."""
pass
@abstractmethod
async def get_health(self) -> Dict[str, Any]:
"""Health check for monitoring."""
pass
3. Migration Strategy
3.1 Phase 0: Preparation (Week 1)
Objective: Establish infrastructure for zero-downtime migration.
3.1.1 Create Feature Flags System
# src/config/feature_flags.py
from enum import Flag, auto
class FeatureFlags(Flag):
"""Feature flags for phased migration."""
NORMALIZED_STORAGE = auto()
SQLITE_SHADOW_DB = auto()
FTS_ENABLED = auto()
STATS_CACHE = auto()
PAGINATION = auto()
HYBRID_SEARCH = auto()
# Global config
FEATURE_FLAGS = FeatureFlags.NORMALIZED_STORAGE | FeatureFlags.STATS_CACHE
def is_enabled(flag: FeatureFlags) -> bool:
return flag in FEATURE_FLAGS
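The flag checks above rely on `Flag` bit arithmetic; a minimal standalone sketch (mirroring the definitions above, outside the real config module) shows how a combined flag set behaves:

```python
# Mirrors the FeatureFlags definition above; standalone for illustration only.
from enum import Flag, auto

class FeatureFlags(Flag):
    NORMALIZED_STORAGE = auto()
    SQLITE_SHADOW_DB = auto()
    FTS_ENABLED = auto()
    STATS_CACHE = auto()
    PAGINATION = auto()
    HYBRID_SEARCH = auto()

# Phase 1 configuration: only normalization and stats caching enabled
FEATURE_FLAGS = FeatureFlags.NORMALIZED_STORAGE | FeatureFlags.STATS_CACHE

def is_enabled(flag: FeatureFlags) -> bool:
    # Membership test: True only if every bit of `flag` is present in the set
    return flag in FEATURE_FLAGS
```

Because `in` on a `Flag` is a subset test, `is_enabled` also works for combined flags, e.g. `FeatureFlags.NORMALIZED_STORAGE | FeatureFlags.STATS_CACHE` reads as "both enabled".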
3.1.2 Implement Shadow Write (Dual-Write Pattern)
# src/storage/dual_writer.py
class DualWriter:
"""
Writes to both legacy and new storage simultaneously.
Enables gradual migration with no data loss.
"""
def __init__(
self,
legacy_store: ChromaStore,
normalized_store: 'NormalizedChromaStore',
sqlite_store: 'SQLiteStore'
):
self.legacy = legacy_store
self.normalized = normalized_store
self.sqlite = sqlite_store
async def store(self, item: EnrichedNewsItemDTO) -> str:
# Write to both systems
legacy_id = await self.legacy.store(item)
normalized_id = await self.normalized.store(item)
sqlite_id = await self.sqlite.store_relational(item)
# Return normalized ID as source of truth
return normalized_id
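The `DualWriter` above does not say what happens when the second write fails. One option, sketched here against in-memory stub stores (the `SafeDualWriter` name and the compensating `delete` call are assumptions, not existing code), is to undo the legacy write so the two systems never diverge:

```python
# Sketch of a failure-handling dual writer; stores are in-memory stubs.
import asyncio

class StubStore:
    def __init__(self, fail: bool = False):
        self.items, self.fail = {}, fail
    async def store(self, item: dict) -> str:
        if self.fail:
            raise RuntimeError("write failed")
        self.items[item["url"]] = item
        return item["url"]
    async def delete(self, item_id: str) -> None:
        self.items.pop(item_id, None)

class SafeDualWriter:
    def __init__(self, legacy: StubStore, normalized: StubStore):
        self.legacy, self.normalized = legacy, normalized
    async def store(self, item: dict) -> str:
        legacy_id = await self.legacy.store(item)
        try:
            return await self.normalized.store(item)  # source of truth
        except Exception:
            # Compensate: roll back the legacy write so stores stay in sync
            await self.legacy.delete(legacy_id)
            raise

legacy, normalized = StubStore(), StubStore(fail=True)
writer = SafeDualWriter(legacy, normalized)
try:
    asyncio.run(writer.store({"url": "https://example.com/x"}))
except RuntimeError:
    pass
# After the normalized write failed, the legacy write was rolled back too
```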
3.1.3 Create Migration Test Suite
# tests/migrations/test_normalized_storage.py
import pytest
from datetime import datetime, timezone
from src.processor.dto import EnrichedNewsItemDTO
from src.storage.normalized_chroma_store import NormalizedChromaStore
@pytest.mark.asyncio
async def test_migration_preserves_data_integrity():
"""
Test that normalized storage preserves all data from legacy format.
Round-trip: Legacy DTO -> Normalized Storage -> Legacy DTO
"""
original = EnrichedNewsItemDTO(
title="Test",
url="https://example.com/test",
content_text="Content",
source="TestSource",
timestamp=datetime.now(timezone.utc),
relevance_score=8,
summary_ru="Резюме",
anomalies_detected=["WebGPU", "Edge AI"],
category="Tech"
)
store = NormalizedChromaStore(...)
item_id = await store.store(original)
retrieved = await store.get_by_id(item_id)
assert retrieved.title == original.title
assert retrieved.url == original.url
assert retrieved.relevance_score == original.relevance_score
assert retrieved.category == original.category
assert set(retrieved.anomalies_detected) == set(original.anomalies_detected)
@pytest.mark.asyncio
async def test_anomaly_normalization():
"""
Test that anomalies are properly normalized to enum values.
"""
dto = EnrichedNewsItemDTO(
...,
anomalies_detected=["webgpu", "NPU acceleration", "Edge AI"]
)
store = NormalizedChromaStore(...)
item_id = await store.store(dto)
retrieved = await store.get_by_id(item_id)
# Should be normalized to canonical forms
assert AnomalyType.WEBGPU in retrieved.anomalies_detected
assert AnomalyType.NPU_ACCELERATION in retrieved.anomalies_detected
assert AnomalyType.EDGE_AI in retrieved.anomalies_detected
3.2 Phase 1: Normalized Storage (Week 2)
Objective: Introduce normalized anomaly storage without breaking existing queries.
3.2.1 New Normalized ChromaStore
# src/storage/normalized_chroma_store.py
import asyncio
import uuid
from typing import List
from src.processor.anomaly_types import AnomalyType
from src.processor.dto import EnrichedNewsItemDTO
from src.storage.base import IVectorStore
class NormalizedChromaStore(IVectorStore):
"""
ChromaDB storage with normalized anomaly types.
Maintains backward compatibility via IVectorStore interface.
"""
# Collections
NEWS_COLLECTION = "news_items"
ANOMALY_COLLECTION = "anomaly_types"
JUNCTION_COLLECTION = "news_anomalies"
async def store(self, item: EnrichedNewsItemDTO) -> str:
"""Store with normalized anomaly handling."""
anomaly_ids = await self._ensure_anomalies(item.anomalies_detected)
# Store main item
doc_id = str(uuid.uuid5(uuid.NAMESPACE_URL, item.url))
metadata = {
"title": item.title,
"url": item.url,
"source": item.source,
"timestamp": item.timestamp.isoformat(),
"relevance_score": item.relevance_score,
"summary_ru": item.summary_ru,
"category": item.category,
# NO MORE COMMA-JOINED STRING
}
# Store junction records for anomalies
for anomaly_id in anomaly_ids:
await self._store_junction(doc_id, anomaly_id)
# Vector storage (without anomalies in metadata)
await asyncio.to_thread(
self.collection.upsert,
ids=[doc_id],
documents=[item.content_text],
metadatas=[metadata]
)
return doc_id
async def _ensure_anomalies(
self,
anomalies: List[str]
) -> List[str]:
"""Ensure anomaly types exist, return IDs."""
anomaly_ids = []
for anomaly in anomalies:
# Normalize to canonical form
normalized = AnomalyType.from_string(anomaly)
anomaly_id = str(uuid.uuid5(uuid.NAMESPACE_URL, normalized.value))
# Upsert anomaly type
await asyncio.to_thread(
self.anomaly_collection.upsert,
ids=[anomaly_id],
documents=[normalized.value],
metadatas=[{
"name": normalized.value,
"description": normalized.description
}]
)
anomaly_ids.append(anomaly_id)
return anomaly_ids
3.2.2 AnomalyType Enum
# src/processor/anomaly_types.py
from enum import Enum
class AnomalyType(str, Enum):
"""Canonical anomaly types with metadata."""
WEBGPU = "WebGPU"
NPU_ACCELERATION = "NPU acceleration"
EDGE_AI = "Edge AI"
QUANTUM_COMPUTING = "Quantum computing"
NEUROMORPHIC = "Neuromorphic computing"
SPATIAL_COMPUTING = "Spatial computing"
UNKNOWN = "Unknown"
@classmethod
def from_string(cls, value: str) -> "AnomalyType":
"""Fuzzy matching from raw string."""
normalized = value.strip().lower()
for member in cls:
if member.value.lower() == normalized:
return member
# Try partial match
for member in cls:
if normalized in member.value.lower():
return member
return cls.UNKNOWN
@property
def description(self) -> str:
"""Human-readable description."""
descriptions = {
"WebGPU": "WebGPU graphics API or GPU compute",
"NPU acceleration": "Neural Processing Unit hardware",
"Edge AI": "Edge computing with AI",
# ...
}
return descriptions.get(self.value, "")
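The two-pass matching (exact, then substring) can be exercised on raw LLM output; this is a compact standalone copy of the class above, trimmed to three members for illustration:

```python
# Standalone copy of the fuzzy-matching logic above, trimmed for brevity.
from enum import Enum

class AnomalyType(str, Enum):
    WEBGPU = "WebGPU"
    NPU_ACCELERATION = "NPU acceleration"
    EDGE_AI = "Edge AI"
    UNKNOWN = "Unknown"

    @classmethod
    def from_string(cls, value: str) -> "AnomalyType":
        normalized = value.strip().lower()
        for member in cls:
            if member.value.lower() == normalized:
                return member  # exact match after trimming/lowercasing
        for member in cls:
            if normalized in member.value.lower():
                return member  # partial match, e.g. "npu" -> NPU_ACCELERATION
        return cls.UNKNOWN

detected = [AnomalyType.from_string(s) for s in ["  webgpu ", "npu", "blockchain"]]
```

Note the order matters: the exact pass must run first, otherwise a short input like "npu" could partial-match an unintended member.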
3.3 Phase 2: Indexed Queries (Week 3)
Objective: Eliminate full-collection scans for get_latest, get_top_ranked, get_stats.
3.3.1 SQLite Shadow Database
# src/storage/sqlite_store.py
import sqlite3
from pathlib import Path
from typing import List, Optional, Dict, Any
from contextlib import contextmanager
from datetime import datetime
import json
class SQLiteStore:
"""
SQLite shadow database for relational queries and FTS.
Maintains sync with ChromaDB news_items collection.
"""
def __init__(self, db_path: Path):
self.db_path = db_path
self._init_schema()
def _init_schema(self):
"""Initialize SQLite schema with proper indexes."""
with self._get_connection() as conn:
conn.executescript("""
-- Main relational table
CREATE TABLE IF NOT EXISTS news_items (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
url TEXT UNIQUE NOT NULL,
source TEXT NOT NULL,
timestamp INTEGER NOT NULL, -- Unix epoch
relevance_score INTEGER NOT NULL,
summary_ru TEXT,
category TEXT NOT NULL,
content_text TEXT,
created_at INTEGER DEFAULT (unixepoch())
);
-- Indexes for fast queries
CREATE INDEX IF NOT EXISTS idx_news_timestamp
ON news_items(timestamp DESC);
CREATE INDEX IF NOT EXISTS idx_news_relevance
ON news_items(relevance_score DESC);
CREATE INDEX IF NOT EXISTS idx_news_category
ON news_items(category);
CREATE INDEX IF NOT EXISTS idx_news_source
ON news_items(source);
-- FTS5 virtual table for full-text search
CREATE VIRTUAL TABLE IF NOT EXISTS news_fts USING fts5(
id,
title,
content_text,
summary_ru,
content='news_items',
content_rowid='rowid'
);
-- Triggers to keep FTS in sync
CREATE TRIGGER IF NOT EXISTS news_fts_insert AFTER INSERT ON news_items
BEGIN
INSERT INTO news_fts(rowid, id, title, content_text, summary_ru)
VALUES (NEW.rowid, NEW.id, NEW.title, NEW.content_text, NEW.summary_ru);
END;
CREATE TRIGGER IF NOT EXISTS news_fts_delete AFTER DELETE ON news_items
BEGIN
INSERT INTO news_fts(news_fts, rowid, id, title, content_text, summary_ru)
VALUES ('delete', OLD.rowid, OLD.id, OLD.title, OLD.content_text, OLD.summary_ru);
END;
CREATE TRIGGER IF NOT EXISTS news_fts_update AFTER UPDATE ON news_items
BEGIN
INSERT INTO news_fts(news_fts, rowid, id, title, content_text, summary_ru)
VALUES ('delete', OLD.rowid, OLD.id, OLD.title, OLD.content_text, OLD.summary_ru);
INSERT INTO news_fts(rowid, id, title, content_text, summary_ru)
VALUES (NEW.rowid, NEW.id, NEW.title, NEW.content_text, NEW.summary_ru);
END;
-- Stats cache table
CREATE TABLE IF NOT EXISTS stats_cache (
key TEXT PRIMARY KEY DEFAULT 'main',
total_count INTEGER DEFAULT 0,
category_counts TEXT DEFAULT '{}', -- JSON
source_counts TEXT DEFAULT '{}', -- JSON
anomaly_counts TEXT DEFAULT '{}', -- JSON
last_updated INTEGER
);
-- Anomaly types table
CREATE TABLE IF NOT EXISTS anomaly_types (
id TEXT PRIMARY KEY,
name TEXT UNIQUE NOT NULL,
description TEXT
);
-- Junction table for news-anomaly relationships
CREATE TABLE IF NOT EXISTS news_anomalies (
news_id TEXT NOT NULL,
anomaly_id TEXT NOT NULL,
detected_at INTEGER DEFAULT (unixepoch()),
PRIMARY KEY (news_id, anomaly_id),
FOREIGN KEY (news_id) REFERENCES news_items(id),
FOREIGN KEY (anomaly_id) REFERENCES anomaly_types(id)
);
CREATE INDEX IF NOT EXISTS idx_news_anomalies_news
ON news_anomalies(news_id);
CREATE INDEX IF NOT EXISTS idx_news_anomalies_anomaly
ON news_anomalies(anomaly_id);
-- Crawl history for monitoring
CREATE TABLE IF NOT EXISTS crawl_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
crawler_name TEXT NOT NULL,
items_fetched INTEGER DEFAULT 0,
items_new INTEGER DEFAULT 0,
started_at INTEGER,
completed_at INTEGER,
error_message TEXT
);
""")
@contextmanager
def _get_connection(self):
"""Context manager for SQLite connections. This is a sync generator used with plain `with`, so it needs contextlib.contextmanager, not asynccontextmanager."""
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
try:
yield conn
finally:
conn.close()
# --- Fast Indexed Queries ---
async def get_latest_indexed(
self,
limit: int = 10,
category: Optional[str] = None,
offset: int = 0
) -> List[Dict[str, Any]]:
"""Get latest items using SQLite index - O(log n + k)."""
with self._get_connection() as conn:
if category:
cursor = conn.execute("""
SELECT * FROM news_items
WHERE category = ?
ORDER BY timestamp DESC
LIMIT ? OFFSET ?
""", (category, limit, offset))
else:
cursor = conn.execute("""
SELECT * FROM news_items
ORDER BY timestamp DESC
LIMIT ? OFFSET ?
""", (limit, offset))
return [dict(row) for row in cursor.fetchall()]
async def get_top_ranked_indexed(
self,
limit: int = 10,
category: Optional[str] = None,
offset: int = 0
) -> List[Dict[str, Any]]:
"""Get top ranked items using SQLite index - O(log n + k)."""
with self._get_connection() as conn:
if category:
cursor = conn.execute("""
SELECT * FROM news_items
WHERE category = ?
ORDER BY relevance_score DESC, timestamp DESC
LIMIT ? OFFSET ?
""", (category, limit, offset))
else:
cursor = conn.execute("""
SELECT * FROM news_items
ORDER BY relevance_score DESC, timestamp DESC
LIMIT ? OFFSET ?
""", (limit, offset))
return [dict(row) for row in cursor.fetchall()]
async def get_stats_fast(self) -> Dict[str, Any]:
"""
Get stats from cache or fall back to SQL aggregation.
O(1) if cached; otherwise one indexed GROUP BY pass in SQLite.
"""
with self._get_connection() as conn:
# Try cache first
cursor = conn.execute("SELECT * FROM stats_cache WHERE key = 'main'")
row = cursor.fetchone()
if row:
return {
"total_count": row["total_count"],
"category_counts": json.loads(row["category_counts"]),
"source_counts": json.loads(row["source_counts"]),
"anomaly_counts": json.loads(row["anomaly_counts"]),
"last_updated": datetime.fromtimestamp(row["last_updated"])
}
# Fall back to indexed aggregation (still O(n) but faster than Python)
cursor = conn.execute("""
SELECT
category,
COUNT(*) as cat_count
FROM news_items
GROUP BY category
""")
category_counts = {row["category"]: row["cat_count"] for row in cursor.fetchall()}
cursor = conn.execute("""
SELECT source, COUNT(*) as count
FROM news_items
GROUP BY source
""")
source_counts = {row["source"]: row["count"] for row in cursor.fetchall()}
cursor = conn.execute("""
SELECT a.name, COUNT(*) as count
FROM news_anomalies na
JOIN anomaly_types a ON na.anomaly_id = a.id
GROUP BY a.name
""")
anomaly_counts = {row["name"]: row["count"] for row in cursor.fetchall()}
return {
"total_count": sum(category_counts.values()),
"category_counts": category_counts,
"source_counts": source_counts,
"anomaly_counts": anomaly_counts,
"last_updated": datetime.now()
}
async def search_fts(
self,
query: str,
limit: int = 10,
category: Optional[str] = None
) -> List[str]:
"""Full-text search using SQLite FTS5 - O(log n)."""
with self._get_connection() as conn:
# Escape FTS5 special characters
fts_query = query.replace('"', '""')
if category:
cursor = conn.execute("""
SELECT n.id FROM news_items n
JOIN news_fts fts ON n.id = fts.id
WHERE news_fts MATCH ?
AND n.category = ?
LIMIT ?
""", (f'"{fts_query}"', category, limit))
else:
cursor = conn.execute("""
SELECT id FROM news_fts
WHERE news_fts MATCH ?
LIMIT ?
""", (f'"{fts_query}"', limit))
return [row["id"] for row in cursor.fetchall()]
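The schema above leans on two SQLite features worth sanity-checking end to end: connection-level pragmas (WAL journaling and FK enforcement, which the plan assumes) and the FTS5 external-content table kept current by triggers. A minimal round-trip against a scratch database file (WAL needs a real file, not `:memory:`; the schema here is a trimmed subset of the real one):

```python
# Trimmed round-trip of the pragma + FTS5-trigger design from the schema above.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "scout.db")
conn = sqlite3.connect(path)
conn.row_factory = sqlite3.Row
# Connection-level settings the plan assumes: WAL journaling, FK enforcement
assert conn.execute("PRAGMA journal_mode=WAL").fetchone()[0] == "wal"
conn.execute("PRAGMA foreign_keys=ON")
conn.executescript("""
CREATE TABLE news_items (
    id TEXT PRIMARY KEY, title TEXT NOT NULL, content_text TEXT
);
CREATE VIRTUAL TABLE news_fts USING fts5(
    id, title, content_text, content='news_items'
);
CREATE TRIGGER news_fts_insert AFTER INSERT ON news_items BEGIN
    INSERT INTO news_fts(rowid, id, title, content_text)
    VALUES (NEW.rowid, NEW.id, NEW.title, NEW.content_text);
END;
""")
conn.execute(
    "INSERT INTO news_items VALUES ('n1', 'WebGPU ships', 'GPU compute in the browser')"
)
conn.commit()
# The insert trigger populated the FTS index, so MATCH finds the row
rows = conn.execute(
    "SELECT id FROM news_fts WHERE news_fts MATCH ?", ('"webgpu"',)
).fetchall()
```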
3.3.2 Update ChromaStore to Use SQLite Index
# src/storage/chroma_store.py (updated)
class ChromaStore(IVectorStore):
"""
Updated ChromaStore that delegates indexed queries to SQLite.
Maintains backward compatibility.
"""
def __init__(
self,
client: ClientAPI,
collection_name: str = "news_collection",
sqlite_store: Optional[SQLiteStore] = None
):
self.client = client
self.collection = self.client.get_or_create_collection(name=collection_name)
self.sqlite = sqlite_store # New: SQLite shadow for indexed queries
async def get_latest(
self,
limit: int = 10,
category: Optional[str] = None,
offset: int = 0 # NEW: Pagination support
) -> List[EnrichedNewsItemDTO]:
"""Get latest using SQLite index - O(log n + k)."""
if self.sqlite: # Fast path: the SQLite index handles LIMIT/OFFSET pagination directly
rows = await self.sqlite.get_latest_indexed(limit, category, offset)
# Reconstruct DTOs from SQLite rows
return [self._row_to_dto(row) for row in rows]
# Fallback to legacy behavior with pagination
return await self._get_latest_legacy(limit, category, offset)
async def _get_latest_legacy(
self,
limit: int,
category: Optional[str],
offset: int
) -> List[EnrichedNewsItemDTO]:
"""Legacy implementation with pagination."""
where: Any = {"category": category} if category else None
results = await asyncio.to_thread(
self.collection.get,
include=["metadatas", "documents"],
where=where
)
# ... existing sorting logic ...
# Apply offset/limit after sorting
return items[offset:offset + limit]
async def get_stats(self) -> Dict[str, Any]:
"""Get stats - O(1) from cache."""
if self.sqlite:
return await self.sqlite.get_stats_fast()
return await self._get_stats_legacy()
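The fast path above calls a `_row_to_dto` helper that is not shown. A hypothetical sketch of that mapping (the function name and shape are assumptions): the SQLite row stores `timestamp` as a unix epoch per the schema, and anomalies come from a separate `news_anomalies` join rather than the row itself.

```python
# Hypothetical sketch of the _row_to_dto mapping referenced above; it produces
# the keyword arguments an EnrichedNewsItemDTO would be built from.
from datetime import datetime, timezone

def row_to_dto_kwargs(row: dict, anomalies: list) -> dict:
    """Map a SQLite news_items row back to DTO constructor kwargs."""
    return {
        "title": row["title"],
        "url": row["url"],
        "content_text": row.get("content_text") or "",
        "source": row["source"],
        # Schema stores unix epoch; DTOs carry timezone-aware datetimes
        "timestamp": datetime.fromtimestamp(row["timestamp"], tz=timezone.utc),
        "relevance_score": row["relevance_score"],
        "summary_ru": row["summary_ru"],
        "category": row["category"],
        "anomalies_detected": anomalies,  # from the news_anomalies join
    }

kwargs = row_to_dto_kwargs(
    {"title": "t", "url": "https://example.com/t", "content_text": None,
     "source": "s", "timestamp": 1700000000, "relevance_score": 7,
     "summary_ru": "Summary", "category": "Tech"},
    anomalies=["WebGPU"],
)
```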
3.4 Phase 3: Hybrid Search with RRF (Week 4)
Objective: Implement hybrid keyword + semantic search using RRF fusion.
# src/storage/hybrid_search.py
from typing import Dict, List, Optional, Tuple
from datetime import datetime, timezone
from src.processor.dto import EnrichedNewsItemDTO
class HybridSearchStrategy:
"""
Reciprocal Rank Fusion for combining keyword and semantic search.
Implements RRF: Score = Σ 1/(k + rank_i)
"""
def __init__(self, sqlite_store: 'SQLiteStore', chroma_store: 'ChromaStore', k: int = 60):
self.sqlite = sqlite_store # FTS keyword backend
self.chroma = chroma_store # Semantic vector backend
self.k = k # RRF smoothing factor
def fuse(
self,
keyword_results: List[Tuple[str, float]], # (id, score)
semantic_results: List[Tuple[str, float]], # (id, score)
) -> List[Tuple[str, float]]:
"""
Fuse results using RRF.
Higher fused score = better.
"""
fused_scores: Dict[str, float] = {}
# Keyword results get higher initial weight
for rank, (doc_id, score) in enumerate(keyword_results):
rrf_score = 1 / (self.k + rank)
fused_scores[doc_id] = fused_scores.get(doc_id, 0) + rrf_score * 2.0
# Semantic results
for rank, (doc_id, score) in enumerate(semantic_results):
rrf_score = 1 / (self.k + rank)
fused_scores[doc_id] = fused_scores.get(doc_id, 0) + rrf_score * 1.0
# Sort by fused score descending
sorted_results = sorted(
fused_scores.items(),
key=lambda x: x[1],
reverse=True
)
return sorted_results
async def search(
self,
query: str,
limit: int = 5,
category: Optional[str] = None,
threshold: Optional[float] = None
) -> List[EnrichedNewsItemDTO]:
"""
Execute hybrid search:
1. SQLite FTS for exact keyword matches
2. ChromaDB for semantic similarity
3. RRF fusion
"""
# Phase 1: Keyword search (FTS)
fts_ids = await self.sqlite.search_fts(query, limit * 2, category)
# Phase 2: Semantic search
semantic_results = await self.chroma.search(
query, limit=limit * 2, category=category, threshold=threshold
)
# Phase 3: RRF Fusion
keyword_scores = [(doc_id, 1.0) for doc_id in fts_ids] # FTS gives rank order only; assign a uniform score
semantic_scores = [
(self._get_doc_id(item), self._compute_adaptive_score(item))
for item in semantic_results
]
fused = self.fuse(keyword_scores, semantic_scores)
# Retrieve full DTOs in fused order
final_results = []
for doc_id, _ in fused[:limit]:
dto = await self.chroma.get_by_id(doc_id)
if dto:
final_results.append(dto)
return final_results
def _compute_adaptive_score(self, item: EnrichedNewsItemDTO) -> float:
"""
Compute adaptive score combining relevance and recency.
More recent items with same relevance get slight boost.
"""
# Normalize relevance to 0-1
relevance_norm = item.relevance_score / 10.0
# Days since publication (exponential decay)
days_old = (datetime.now(timezone.utc) - item.timestamp).days # timezone-aware, matching stored timestamps
recency_decay = 0.95 ** days_old
return relevance_norm * 0.7 + recency_decay * 0.3
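To make the fusion concrete, here is a standalone replay of the RRF formula above (same weights: 2.0 for keyword, 1.0 for semantic) on two toy ranked lists. With k = 60, a document appearing in both lists accumulates both reciprocal ranks and outranks a document that appears only once:

```python
# Standalone replay of the RRF fusion above on two toy ranked lists.
def rrf_fuse(keyword, semantic, k=60):
    fused = {}
    for rank, (doc_id, _score) in enumerate(keyword):
        fused[doc_id] = fused.get(doc_id, 0) + 2.0 / (k + rank)  # keyword weight 2.0
    for rank, (doc_id, _score) in enumerate(semantic):
        fused[doc_id] = fused.get(doc_id, 0) + 1.0 / (k + rank)  # semantic weight 1.0
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

keyword = [("a", 1.0), ("b", 1.0)]      # FTS hits, rank order only
semantic = [("b", 0.92), ("c", 0.85)]   # vector hits with similarity scores
ranking = rrf_fuse(keyword, semantic)
# "b" appears in both lists, so its reciprocal ranks sum:
#   b = 2/61 + 1/60 ≈ 0.0494 > a = 2/60 ≈ 0.0333 > c = 1/61 ≈ 0.0164
```

Note that RRF uses only rank positions; the raw scores are ignored, which is exactly what makes fusing incomparable score scales (FTS vs. cosine similarity) safe.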
3.5 Phase 4: Stats Cache with Invalidation (Week 5)
Objective: Eliminate O(n) stats computation via incremental updates.
# src/storage/stats_cache.py
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List
import asyncio
import json
from src.processor.dto import EnrichedNewsItemDTO
@dataclass
class StatsCache:
"""Thread-safe statistics cache with incremental updates."""
total_count: int = 0
category_counts: Dict[str, int] = field(default_factory=dict)
source_counts: Dict[str, int] = field(default_factory=dict)
anomaly_counts: Dict[str, int] = field(default_factory=dict)
last_updated: datetime = field(default_factory=datetime.now)
_lock: asyncio.Lock = field(default_factory=asyncio.Lock)
async def invalidate(self):
"""Invalidate cache - marks for recomputation."""
async with self._lock:
self.total_count = -1 # Sentinel value
self.category_counts = {}
self.source_counts = {}
self.anomaly_counts = {}
async def apply_delta(self, delta: "StatsDelta"):
"""
Apply incremental update without full recomputation.
O(1) operation.
"""
async with self._lock:
self.total_count += delta.total_delta
for cat, count in delta.category_deltas.items():
self.category_counts[cat] = self.category_counts.get(cat, 0) + count
for source, count in delta.source_deltas.items():
self.source_counts[source] = self.source_counts.get(source, 0) + count
for anomaly, count in delta.anomaly_deltas.items():
self.anomaly_counts[anomaly] = self.anomaly_counts.get(anomaly, 0) + count
self.last_updated = datetime.now()
def is_valid(self) -> bool:
"""Check if cache is populated."""
return self.total_count >= 0
@dataclass
class StatsDelta:
"""Delta for incremental stats update."""
total_delta: int = 0
category_deltas: Dict[str, int] = field(default_factory=dict)
source_deltas: Dict[str, int] = field(default_factory=dict)
anomaly_deltas: Dict[str, int] = field(default_factory=dict)
@classmethod
def from_item_add(cls, item: EnrichedNewsItemDTO) -> "StatsDelta":
"""Create delta for adding an item."""
return cls(
total_delta=1,
category_deltas={item.category: 1},
source_deltas={item.source: 1},
anomaly_deltas={a.value: 1 for a in item.anomalies_detected}
)
# Integration with storage
class CachedVectorStore(IVectorStore):
"""VectorStore with incremental stats cache."""
def __init__(self, ...):
self.stats_cache = StatsCache()
# ...
async def store(self, item: EnrichedNewsItemDTO) -> str:
# Store item
item_id = await self._store_impl(item)
# Update cache incrementally
delta = StatsDelta.from_item_add(item)
await self.stats_cache.apply_delta(delta)
return item_id
async def get_stats(self) -> Dict[str, Any]:
"""Get stats - O(1) from cache if valid."""
if self.stats_cache.is_valid():
return {
"total_count": self.stats_cache.total_count,
"category_counts": self.stats_cache.category_counts,
"source_counts": self.stats_cache.source_counts,
"anomaly_counts": self.stats_cache.anomaly_counts,
"last_updated": self.stats_cache.last_updated
}
# Fall back to full recomputation (still via SQLite)
return await self._recompute_stats()
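The core of the O(1) claim is that each store applies a delta under a lock instead of rescanning storage. A compact standalone version of that path (names here are illustrative, not the real classes):

```python
# Compact standalone version of the incremental-stats path above.
import asyncio
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class MiniStatsCache:
    total_count: int = 0
    category_counts: Dict[str, int] = field(default_factory=dict)
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    async def apply_delta(self, total_delta: int, category_deltas: Dict[str, int]):
        async with self._lock:  # serialize concurrent writers
            self.total_count += total_delta
            for cat, n in category_deltas.items():
                self.category_counts[cat] = self.category_counts.get(cat, 0) + n

async def main() -> MiniStatsCache:
    cache = MiniStatsCache()
    # Each stored item contributes one O(1) delta; no full scan ever runs
    await cache.apply_delta(1, {"Tech": 1})
    await cache.apply_delta(1, {"Tech": 1})
    return cache

cache = asyncio.run(main())
```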
4. Backward Compatibility Strategy
4.1 Interface Compatibility
The IVectorStore interface remains unchanged. New methods have default implementations:
class IVectorStore(IStoreCommand, IStoreQuery):
"""Backward-compatible interface."""
async def get_latest(
self,
limit: int = 10,
category: Optional[str] = None,
offset: int = 0 # NEW: Optional pagination
) -> List[EnrichedNewsItemDTO]:
"""
Default implementation maintains legacy behavior.
Override in implementation for optimized path.
"""
return await self._default_get_latest(limit, category)
async def get_top_ranked(
self,
limit: int = 10,
category: Optional[str] = None,
offset: int = 0 # NEW: Optional pagination
) -> List[EnrichedNewsItemDTO]:
"""Default implementation maintains legacy behavior."""
return await self._default_get_top_ranked(limit, category)
4.2 DTO Compatibility Layer
# src/storage/compat.py
from typing import Any, Dict, List, Union
from src.processor.anomaly_types import AnomalyType
from src.processor.dto import EnrichedNewsItemDTO
class DTOConverter:
"""
Converts between legacy and normalized DTO formats.
Ensures smooth transition.
"""
@staticmethod
def normalize_anomalies(
anomalies: Union[List[str], str]
) -> List[AnomalyType]:
"""
Handle both legacy comma-joined strings and new list format.
"""
if isinstance(anomalies, str):
# Legacy format: "WebGPU,NPU acceleration"
return [AnomalyType.from_string(a) for a in anomalies.split(",") if a]
return [AnomalyType.from_string(a) if isinstance(a, str) else a for a in anomalies]
@staticmethod
def to_legacy_dict(dto: EnrichedNewsItemDTO) -> Dict[str, Any]:
"""
Convert normalized DTO to legacy format for API compatibility.
"""
return {
**dto.model_dump(),
# Legacy comma-joined format for API consumers
"anomalies_detected": ",".join(a.value if isinstance(a, AnomalyType) else str(a)
for a in dto.anomalies_detected)
}
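A standalone demo of the compatibility rule above: the same helper accepts both the legacy comma-joined string and the new list form and yields identical normalized output (the `AnomalyType` here is a trimmed mirror of the enum defined earlier in this plan):

```python
# Standalone demo of normalize_anomalies with a trimmed AnomalyType mirror.
from enum import Enum
from typing import List, Union

class AnomalyType(str, Enum):
    WEBGPU = "WebGPU"
    EDGE_AI = "Edge AI"
    UNKNOWN = "Unknown"

    @classmethod
    def from_string(cls, value: str) -> "AnomalyType":
        v = value.strip().lower()
        return next((m for m in cls if m.value.lower() == v), cls.UNKNOWN)

def normalize_anomalies(anomalies: Union[List[str], str]) -> List[AnomalyType]:
    if isinstance(anomalies, str):  # legacy comma-joined metadata
        return [AnomalyType.from_string(a) for a in anomalies.split(",") if a]
    return [AnomalyType.from_string(a) if isinstance(a, str) else a for a in anomalies]

legacy = normalize_anomalies("WebGPU,Edge AI")   # old clients
modern = normalize_anomalies(["WebGPU", "Edge AI"])  # new clients
```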
4.3 Dual-Mode Storage
# src/storage/adaptive_store.py
class AdaptiveVectorStore(IVectorStore):
"""
Storage adapter that switches between legacy and normalized
based on feature flags.
"""
def __init__(
self,
legacy_store: ChromaStore,
normalized_store: NormalizedChromaStore,
feature_flags: FeatureFlags
):
self.legacy = legacy_store
self.normalized = normalized_store
self.flags = feature_flags
async def get_latest(
self,
limit: int = 10,
category: Optional[str] = None,
offset: int = 0
) -> List[EnrichedNewsItemDTO]:
if is_enabled(FeatureFlags.NORMALIZED_STORAGE):
return await self.normalized.get_latest(limit, category, offset)
return await self.legacy.get_latest(limit, category)
# Similar delegation for all methods...
5. Risk Mitigation
5.1 Rollback Plan
┌─────────────────────────────────────────────────────────────────┐
│ ROLLBACK DECISION TREE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1 Complete │
│ ├── Feature Flag: NORMALIZED_STORAGE = ON │
│ ├── Dual-write active │
│ └── Can rollback: YES (disable flag, use legacy) │
│ │
│ Phase 2 Complete │
│ ├── SQLite indexes active │
│ ├── ChromaDB still primary │
│ └── Can rollback: YES (disable flag, rebuild legacy indexes) │
│ │
│ Phase 3 Complete │
│ ├── Hybrid search active │
│ ├── FTS data populated │
│ └── Can rollback: YES (disable flag, purge FTS triggers) │
│ │
│ Phase 4 Complete │
│ ├── Stats cache active │
│ └── Can rollback: YES (invalidate cache, use legacy) │
│ │
│ ⚠️ CRITICAL: After Phase 4, data divergence makes │
│ clean rollback impossible. Requires data sync tool. │
│ │
└─────────────────────────────────────────────────────────────────┘
5.2 Data Validation Strategy
# tests/migration/test_data_integrity.py
import pytest
from datetime import datetime, timezone

from src.processor.dto import EnrichedNewsItemDTO
from src.storage.validators import DataIntegrityValidator


class TestDataIntegrity:
    """
    Comprehensive data integrity tests for the migration.
    Run after each phase to validate. Assumes a `storage` fixture
    (provided by conftest) that yields the store under test.
    """

    @pytest.fixture
    def validator(self):
        return DataIntegrityValidator()

    @pytest.mark.asyncio
    async def test_anomaly_roundtrip(self, validator, storage):
        """Test anomaly normalization roundtrip."""
        original_anomalies = ["WebGPU", "NPU acceleration", "Edge AI"]
        dto = EnrichedNewsItemDTO(
            ...,  # remaining required fields elided
            anomalies_detected=original_anomalies
        )
        # Store
        item_id = await storage.store(dto)
        # Retrieve
        retrieved = await storage.get_by_id(item_id)
        # Validate normalized form
        validator.assert_anomalies_match(
            retrieved.anomalies_detected,
            original_anomalies
        )

    @pytest.mark.asyncio
    async def test_legacy_compat_mode(self, validator):
        """
        Test that legacy comma-joined strings still work.
        Simulates an old client sending comma-joined anomalies.
        """
        # Simulate legacy input
        legacy_metadata = {
            "title": "Test",
            "url": "https://example.com/legacy",
            "content_text": "Content",
            "source": "LegacySource",
            "timestamp": "2024-01-01T00:00:00",
            "relevance_score": 5,
            "summary_ru": "Резюме",
            "category": "Tech",
            "anomalies_detected": "WebGPU,Edge AI"  # Legacy format
        }
        # Should be automatically normalized
        dto = DTOConverter.legacy_metadata_to_dto(legacy_metadata)
        validator.assert_valid_dto(dto)

    @pytest.mark.asyncio
    async def test_stats_consistency(self, validator, storage):
        """
        Test that stats from the cache match the actual data.
        Validates incremental update correctness.
        """
        # Add items
        await storage.store(item1)
        await storage.store(item2)
        # Get cached stats
        cached_stats = await storage.get_stats(use_cache=True)
        # Get actual counts
        actual_stats = await storage.get_stats(use_cache=False)
        validator.assert_stats_match(cached_stats, actual_stats)

    @pytest.mark.asyncio
    async def test_pagination_consistency(self, validator, storage):
        """
        Test that pagination doesn't skip or duplicate items.
        """
        items = [create_test_item(i) for i in range(100)]
        for item in items:
            await storage.store(item)
        # Get first page
        page1 = await storage.get_latest(limit=10, offset=0)
        # Get second page
        page2 = await storage.get_latest(limit=10, offset=10)
        # Validate no overlap
        validator.assert_no_overlap(page1, page2)
        # Validate total count
        total = await storage.get_stats()
        assert sum(len(p) for p in [page1, page2]) <= total["total_count"]
5.3 Monitoring and Alerts
# src/monitoring/migration_health.py
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Any


@dataclass
class MigrationHealthCheck:
    """Health checks for migration progress."""
    phase: int
    checks: Dict[str, bool]

    def is_healthy(self) -> bool:
        return all(self.checks.values())

    def to_report(self) -> Dict[str, Any]:
        return {
            "phase": self.phase,
            "healthy": self.is_healthy(),
            "checks": self.checks,
            "timestamp": datetime.now().isoformat()
        }


class MigrationMonitor:
    """Monitor migration health and emit alerts."""

    async def check_phase1_health(self) -> MigrationHealthCheck:
        """Phase 1 health checks."""
        checks = {
            "dual_write_active": await self._check_dual_write(),
            "anomaly_normalized": await self._check_anomaly_normalization(),
            "no_data_loss": await self._check_data_integrity(),
            "backward_compat": await self._check_compat_mode(),
        }
        return MigrationHealthCheck(phase=1, checks=checks)

    async def _check_data_integrity(self) -> bool:
        """
        Sample 100 items and verify the normalized and legacy copies match.
        """
        legacy_store = ChromaStore(...)
        normalized_store = NormalizedChromaStore(...)
        sample_ids = await legacy_store._sample_ids(100)
        for item_id in sample_ids:
            legacy_item = await legacy_store.get_by_id(item_id)
            normalized_item = await normalized_store.get_by_id(item_id)
            if not self._items_match(legacy_item, normalized_item):
                return False
        return True

    def _items_match(
        self,
        legacy: EnrichedNewsItemDTO,
        normalized: EnrichedNewsItemDTO
    ) -> bool:
        """Verify legacy and normalized items are semantically equal."""
        # Normalize the legacy anomalies first so enums are compared
        # to enums rather than to raw strings.
        return (
            legacy.title == normalized.title and
            legacy.url == normalized.url and
            legacy.relevance_score == normalized.relevance_score and
            set(normalized.anomalies_detected)
            == set(DTOConverter.normalize_anomalies(legacy.anomalies_detected))
        )
6. Performance Targets
6.1 Benchmarks
| Operation | Current | Phase 1 | Phase 2 | Phase 3 | Phase 4 |
|---|---|---|---|---|---|
| `get_latest` (10 items) | 500ms | 100ms | 10ms | 10ms | 10ms |
| `get_latest` (100 items) | 800ms | 200ms | 20ms | 20ms | 20ms |
| `get_top_ranked` (10) | 500ms | 100ms | 10ms | 10ms | 10ms |
| `get_stats` | 200ms | 50ms | 20ms | 20ms | 1ms |
| `search` (keyword) | 50ms | 50ms | 50ms | 10ms | 10ms |
| `search` (semantic) | 100ms | 100ms | 100ms | 50ms | 50ms |
| `search` (hybrid) | N/A | N/A | N/A | 60ms | 30ms |
| `store` (single) | 10ms | 15ms | 15ms | 15ms | 15ms |
| Memory (1000 items) | O(n) | O(n) | O(1) | O(1) | O(1) |
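Targets like these should come from a repeatable harness rather than one-off timings. A minimal sketch of such a harness (the two compared operations are stand-ins, not the real store calls):

```python
import statistics
import time


def benchmark(fn, *args, runs: int = 20, warmup: int = 3) -> float:
    """Return the median latency of fn(*args) in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm caches / JIT-free but realistic steady state
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    # Median is robust against scheduler noise, unlike the mean
    return statistics.median(samples)


# Stand-in comparison: a linear scan (O(n)) vs an indexed lookup (O(1))
data = list(range(10_000))
index = {v: v for v in data}
scan_ms = benchmark(lambda: 9_999 in data)
lookup_ms = benchmark(lambda: 9_999 in index)
assert lookup_ms <= scan_ms  # indexed access should never be slower
```

Using the median over many runs is what makes before/after comparisons across phases meaningful.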
6.2 Scaling Expectations
┌─────────────────────────────────────────────────────────────────┐
│ PERFORMANCE vs DATASET SIZE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ get_latest (limit=10) │
│ │
│ Current: O(n) │
│ ├─ 100 items: 50ms │
│ ├─ 1,000 items: 500ms │
│ ├─ 10,000 items: 5,000ms (5s) ⚠️ │
│ └─ 100,000 items: 50,000ms (50s) 🚫 │
│ │
│ Optimized (SQLite indexed): O(log n + k) │
│ ├─ 100 items: 5ms │
│ ├─ 1,000 items: 7ms │
│ ├─ 10,000 items: 10ms │
│ ├─ 100,000 items: 15ms │
│ └─ 1,000,000 items: 25ms │
│ │
│ Target: Support 1M+ items with <50ms latency │
│ │
└─────────────────────────────────────────────────────────────────┘
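The O(log n + k) claim is checkable with SQLite's `EXPLAIN QUERY PLAN`: if the plan names the timestamp index, the `ORDER BY ... LIMIT` query avoids both a full scan and a temporary sort B-tree. A sketch against a trimmed-down version of the proposed `news_items` schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news_items (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        relevance_score INTEGER NOT NULL
    );
    -- The index that serves get_latest without sorting
    CREATE INDEX idx_news_timestamp ON news_items(timestamp DESC);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM news_items ORDER BY timestamp DESC LIMIT 10"
).fetchall()

# The last column of each plan row is the human-readable detail;
# it should mention the index rather than 'USE TEMP B-TREE FOR ORDER BY'
assert any("idx_news_timestamp" in row[3] for row in plan)
```

The same check, run in CI, is a cheap guard against a future schema change silently reintroducing full scans in the hot path.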
7. Implementation Phases
Phase 0: Infrastructure (Week 1)
Dependencies: None
Owner: Senior Architecture Engineer
- Create feature flags system
- Implement `DualWriter` class
- Create migration test suite
- Set up SQLite database initialization
- Create `AnomalyType` enum
- Implement `DTOConverter` for backward compat
Deliverables:
- `src/config/feature_flags.py`
- `src/storage/dual_writer.py`
- `src/processor/anomaly_types.py`
- `src/storage/sqlite_store.py` (schema only)
- `tests/migrations/`
Exit Criteria:
- All phase 0 tests pass
- Feature flags system operational
- Dual-write writes to both stores
Phase 1: Normalized Storage (Week 2)
Dependencies: Phase 0
Owner: Data Engineer
- Implement `NormalizedChromaStore`
- Create anomaly junction collection
- Implement anomaly normalization
- Update `IVectorStore` interface with new signatures
- Create `AdaptiveVectorStore` for feature-flag switching
- Write integration tests
Deliverables:
- `src/storage/normalized_chroma_store.py`
- `src/storage/adaptive_store.py`
- Updated `IVectorStore` interface
Exit Criteria:
- Normalized storage stores/retrieves correctly
- Anomaly enum normalization works
- Dual-write to legacy + normalized active
- All tests pass with both stores
Phase 2: Indexed Queries (Week 3)
Dependencies: Phase 1
Owner: Backend Architect
- Complete SQLite schema with indexes
- Implement FTS5 virtual table and triggers
- Implement `get_latest_indexed()` using SQLite
- Implement `get_top_ranked_indexed()` using SQLite
- Update `ChromaStore` to delegate to SQLite
- Implement pagination (offset/limit)
Deliverables:
- Complete `SQLiteStore` implementation
- `ChromaStore` updated with indexed queries
- Pagination support in interface
Exit Criteria:
- `get_latest` uses SQLite index (verify with EXPLAIN)
- `get_top_ranked` uses SQLite index
- Pagination works correctly
- No full-collection scans in hot path
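One plausible shape for the Phase 2 FTS5 work: an external-content FTS table kept in sync by triggers, so `news_items` stays the single source of truth. The column set here is illustrative; the real schema belongs in `sqlite_store.py`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news_items (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        content_text TEXT NOT NULL
    );
    -- External-content FTS table: stores only the index, not the text
    CREATE VIRTUAL TABLE news_fts USING fts5(
        title, content_text,
        content='news_items', content_rowid='id'
    );
    -- Triggers keep the FTS index in lockstep with the base table
    CREATE TRIGGER news_ai AFTER INSERT ON news_items BEGIN
        INSERT INTO news_fts(rowid, title, content_text)
        VALUES (new.id, new.title, new.content_text);
    END;
    CREATE TRIGGER news_ad AFTER DELETE ON news_items BEGIN
        INSERT INTO news_fts(news_fts, rowid, title, content_text)
        VALUES ('delete', old.id, old.title, old.content_text);
    END;
""")

conn.execute(
    "INSERT INTO news_items VALUES (1, 'WebGPU ships', 'Chrome enables WebGPU')"
)
hits = conn.execute(
    "SELECT rowid FROM news_fts WHERE news_fts MATCH 'WebGPU'"
).fetchall()
assert hits == [(1,)]
```

The external-content pattern avoids duplicating the article bodies and makes the Phase 3 rollback ("purge FTS triggers") a matter of dropping three objects.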
Phase 3: Hybrid Search (Week 4)
Dependencies: Phase 2
Owner: Backend Architect
- Implement `HybridSearchStrategy` with RRF
- Integrate SQLite FTS with ChromaDB semantic search
- Add adaptive scoring (relevance + recency)
- Implement `search_stream()` for large result sets
- Performance-test hybrid vs legacy search
Deliverables:
- `src/storage/hybrid_search.py`
- Updated `ChromaStore.search()`
Exit Criteria:
- Hybrid search returns better results than pure semantic search
- Keyword matches appear first
- RRF fusion working correctly
- Threshold filtering applies to semantic results
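Reciprocal Rank Fusion, the merging step named above, scores each document by summing `1 / (k + rank)` over the keyword and semantic result lists, so documents present in both lists bubble to the top. A minimal sketch (`k = 60` is the conventional constant from the RRF literature; the document IDs are placeholders):

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists into one, best-first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank); low ranks dominate
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. FTS5 order
semantic_hits = ["doc1", "doc9", "doc3"]  # e.g. ChromaDB order
fused = rrf_fuse([keyword_hits, semantic_hits])
# doc1 and doc3 appear in both lists, so they outrank single-list hits
assert fused[:2] == ["doc1", "doc3"]
```

Because RRF only consumes ranks, it needs no score calibration between FTS5's BM25 values and ChromaDB's cosine distances, which is why it suits this hybrid setup.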
Phase 4: Stats Cache (Week 5)
Dependencies: Phase 3
Owner: Data Engineer
- Implement `StatsCache` class with incremental updates
- Integrate cache with `AdaptiveVectorStore`
- Implement cache invalidation triggers
- Add cache warming on startup
- Performance-test `get_stats()`
Deliverables:
- `src/storage/stats_cache.py`
- Updated storage layer with cache
Exit Criteria:
- `get_stats` returns in <5ms (cached)
- Cache invalidates correctly on store/delete
- Cache rebuilds correctly on demand
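The incremental idea behind `StatsCache`: adjust counters on every store/delete so `get_stats` never rescans the collection. A minimal sketch with illustrative field names:

```python
from collections import Counter


class StatsCache:
    def __init__(self):
        self.total_count = 0
        self.by_category = Counter()

    def on_store(self, category: str) -> None:
        # Called from the write path after a successful store
        self.total_count += 1
        self.by_category[category] += 1

    def on_delete(self, category: str) -> None:
        self.total_count -= 1
        self.by_category[category] -= 1

    def snapshot(self) -> dict:
        # O(1) in the number of stored items; this is what
        # get_stats(use_cache=True) would return
        return {
            "total_count": self.total_count,
            "by_category": dict(self.by_category),
        }


cache = StatsCache()
for cat in ["Tech", "Tech", "AI"]:
    cache.on_store(cat)
cache.on_delete("AI")
assert cache.snapshot() == {"total_count": 2, "by_category": {"Tech": 2, "AI": 0}}
```

The `test_stats_consistency` check in section 5.2 is the safety net: comparing the cached snapshot against a full recount catches any write path that forgot to call the hooks.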
Phase 5: Validation & Cutover (Week 6)
Dependencies: All previous phases
Owner: QA Engineer
- Run full migration test suite
- Performance benchmark comparison
- Load test with simulated traffic
- Validate all Telegram bot commands work
- Generate migration completion report
- Disable legacy code paths (optional)
Deliverables:
- Migration completion report
- Performance benchmark report
- Legacy code removal (if approved)
Exit Criteria:
- All tests pass
- Performance targets met
- No regression in bot functionality
- Stakeholder sign-off
8. Open Questions
8.1 Data Migration
| Question | Impact | Priority |
|---|---|---|
| Should we migrate existing comma-joined anomalies to normalized junction records? | Data integrity | HIGH |
| What is the expected dataset size at migration time? | Performance planning | HIGH |
| Can we have downtime for the migration, or must it be zero-downtime? | Rollback strategy | HIGH |
| Should we keep legacy ChromaDB collection or archive it? | Storage costs | MEDIUM |
8.2 Architecture
| Question | Impact | Priority |
|---|---|---|
| Do we want to eventually migrate away from ChromaDB to a more scalable solution (Qdrant, Weaviate)? | Future planning | MEDIUM |
| Should anomalies be stored in ChromaDB or only in SQLite? | Consistency model | MEDIUM |
| Do we need multi-tenancy support for multiple Telegram channels? | Future feature | LOW |
8.3 Operations
| Question | Impact | Priority |
|---|---|---|
| What is the acceptable cache staleness for `get_stats`? | Consistency vs performance | HIGH |
| Should we implement TTL for old items? | Storage management | MEDIUM |
| Do we need backup/restore procedures for SQLite? | Disaster recovery | HIGH |
8.4 Testing
| Question | Impact | Priority |
|---|---|---|
| Should we implement chaos testing for dual-write failure modes? | Reliability | MEDIUM |
| What is the acceptable test coverage threshold? | Quality | HIGH |
| Do we need integration tests with real ChromaDB instance? | Confidence | HIGH |
9. Appendix
A. File Structure After Migration
src/
├── config/
│ ├── __init__.py
│ ├── feature_flags.py # NEW
│ └── settings.py
├── storage/
│ ├── __init__.py
│ ├── base.py # UPDATED: Evolved interface
│ ├── chroma_store.py # UPDATED: Delegates to SQLite
│ ├── normalized_chroma_store.py # NEW
│ ├── sqlite_store.py # NEW
│ ├── hybrid_search.py # NEW
│ ├── stats_cache.py # NEW
│ ├── adaptive_store.py # NEW
│ ├── dual_writer.py # NEW
│ ├── compat.py # NEW
│ └── migrations/ # NEW
│ ├── __init__.py
│ ├── v1_normalize_anomalies.py
│ └── v2_add_indexes.py
├── processor/
│ ├── __init__.py
│ ├── dto.py # UPDATED: New DTOs
│ ├── anomaly_types.py # NEW
│ └── ...
├── crawlers/
│ ├── __init__.py
│ └── ...
├── bot/
│ ├── __init__.py
│ ├── handlers.py # UPDATED: Use new pagination
│ └── ...
└── main.py # UPDATED: Initialize new stores
B. ChromaDB Version Consideration
Note: The current implementation uses ChromaDB's `upsert` with metadata. ChromaDB has limitations:
- No native sorting by metadata fields
- No native pagination (offset/limit)
- `where` clause filtering is limited
The SQLite shadow database addresses these limitations while keeping ChromaDB for vector operations.
C. Monitoring Queries for Production
-- Check SQLite index usage
EXPLAIN QUERY PLAN
SELECT * FROM news_items
ORDER BY timestamp DESC
LIMIT 10;
-- Check FTS index
EXPLAIN QUERY PLAN
SELECT * FROM news_fts
WHERE news_fts MATCH 'WebGPU';
-- Check cache freshness
SELECT key, total_count, last_updated
FROM stats_cache;
-- Check anomaly distribution
SELECT a.name, COUNT(*) as count
FROM news_anomalies na
JOIN anomaly_types a ON na.anomaly_id = a.id
GROUP BY a.name
ORDER BY count DESC;
10. Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Senior Architecture Engineer | |||
| Project Manager | |||
| QA Lead | |||
| DevOps Lead |
Document generated for team implementation review.