- Add ADR 001 for Hybrid Search Architecture
- Implement Phase 1 (Exact Match) and Phase 2 (Semantic Fallback) in ChromaStore
- Wrap blocking ChromaDB calls in asyncio.to_thread
- Update IVectorStore interface to support category filtering and thresholds
- Add comprehensive tests for hybrid search logic
ADR 001: Architecture Design for Enhanced Semantic & Hybrid Search
1. Context and Problem Statement
The "Trend-Scout AI" bot currently uses a basic synchronous implementation of ChromaDB for both categorical retrieval (/latest) and free-text queries (/search). Three major issues have severely degraded the user experience:
- Incorrect categories in /latest: The system performs a dense vector search using the requested category name (e.g., "AI") rather than a deterministic exact match. This returns semantically related news regardless of its actual assigned category, yielding false positives.
- Poor semantic matches in /search:
  - The default English-centric embedding model (e.g., all-MiniLM-L6-v2) handles Russian summaries and specialized technical acronyms poorly.
  - Pure vector search ignores exact keyword matches, which frustrates users searching for specific entities (e.g., "OpenAI o1" or specific version numbers).
- Blocking I/O operations: The ChromaStore executes blocking synchronous operations within async def wrappers, potentially starving the asyncio event loop and violating the asynchronous data flow requirements.
2. Decision Drivers
- Accuracy & Relevance: Strict categorization and high recall for exact keywords + conceptual similarity.
- Multilingual Support: Strong performance on both English source texts and Russian summaries.
- Performance & Concurrency: Fully non-blocking (async) operations.
- Adherence to SOLID: Maintain strict interface boundaries, dependency inversion, and the existing Data Transfer Objects (DTOs).
- Alignment with Agent Architecture: Ensure the Vector Storage Agent focuses strictly on storage/retrieval coordination without leaking AI processing duties.
3. Proposed Architecture
3.1. Asynchronous Data Flow (I/O)
- Decision: Migrate the local ChromaDB calls to run in a thread pool executor. Alternatively, if ChromaDB is hosted as a standalone server, use chromadb.AsyncHttpClient.
- Implementation: Encapsulate blocking calls such as self.collection.upsert() and self.collection.query() inside asyncio.to_thread() to prevent them from blocking the Telegram bot's main event loop.
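A minimal sketch of the thread-offloading pattern follows. BlockingCollection is a hypothetical stand-in for a chromadb Collection (so the example runs without ChromaDB installed), and AsyncCollectionAdapter is an illustrative name, not an existing class. The heartbeat task shows that the event loop keeps serving other work while upsert() blocks a worker thread.

```python
import asyncio
import time
from typing import Any, Dict, List, Tuple


class BlockingCollection:
    """Hypothetical stand-in for a chromadb Collection whose methods block the calling thread."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def upsert(self, ids: List[str], documents: List[str]) -> None:
        time.sleep(0.2)  # simulate blocking disk / embedding work
        self._docs.update(zip(ids, documents))

    def get(self, ids: List[str]) -> Dict[str, Any]:
        time.sleep(0.2)
        return {"ids": ids, "documents": [self._docs.get(i) for i in ids]}


class AsyncCollectionAdapter:
    """Awaitable wrappers; each blocking call runs in asyncio's default thread pool."""

    def __init__(self, collection: BlockingCollection) -> None:
        self.collection = collection

    async def upsert(self, ids: List[str], documents: List[str]) -> None:
        await asyncio.to_thread(self.collection.upsert, ids=ids, documents=documents)

    async def get(self, ids: List[str]) -> Dict[str, Any]:
        return await asyncio.to_thread(self.collection.get, ids=ids)


async def demo() -> Tuple[int, List[Any]]:
    store = AsyncCollectionAdapter(BlockingCollection())
    ticks = 0

    async def heartbeat() -> None:
        # Stands in for other bot updates that must keep being served.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    await store.upsert(["n1"], ["OpenAI o1 released"])
    ticks_during_upsert = ticks  # non-zero only if the loop was not blocked
    hb.cancel()
    result = await store.get(["n1"])
    return ticks_during_upsert, result["documents"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the upsert were called directly inside the coroutine instead, the heartbeat would not get a chance to tick until the 0.2 s sleep had finished.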
3.2. Interface Segregation (ISP) for Storage
The current IVectorStore interface conflates generic vector searching, exact categorical retrieval, and database administration.
- Action: Segregate the interfaces to adhere to ISP.
- Refactored Interfaces:
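```python
from __future__ import annotations  # DTO annotations are not evaluated at import time

from abc import ABC, abstractmethod
from typing import List, Optional

# EnrichedNewsItemDTO is the project's existing DTO (its import path is not shown in this ADR).


class IStoreCommand(ABC):
    """Write side: persistence only."""

    @abstractmethod
    async def store(self, item: EnrichedNewsItemDTO) -> None: ...


class IStoreQuery(ABC):
    """Read side: retrieval used by the bot handlers."""

    @abstractmethod
    async def search_hybrid(self, query: str, limit: int = 5) -> List[EnrichedNewsItemDTO]: ...

    @abstractmethod
    async def get_latest_by_category(
        self, category: Optional[str], limit: int = 10
    ) -> List[EnrichedNewsItemDTO]: ...

    @abstractmethod
    async def get_top_ranked(self, limit: int = 10) -> List[EnrichedNewsItemDTO]: ...
```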
3.3. Strict Metadata Filtering for /latest
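- Mechanism: The /latest command must bypass vector similarity search entirely. Instead, it will use ChromaDB's .get() method with a strict where metadata filter: where={"category": {"$eq": category}}.
- Sorting Architecture: Because ChromaDB does not natively support sorting results by a metadata field (such as timestamp), the get_latest_by_category method will over-fetch (e.g., up to 100 items matching the metadata filter) and perform a fast, deterministic in-memory sort by timestamp in descending order before slicing to the requested limit. Since .get() offers no ordering option, a plain limit is not guaranteed to capture the newest items; bounding the window with a numeric timestamp condition (e.g., {"timestamp": {"$gte": cutoff}}, combined with the category filter via $and) keeps the newest items inside it.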
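The over-fetch-then-sort approach could look roughly like this. LatestItem and FakeCollection are hypothetical stand-ins (the latter returns the same dict shape as chromadb's Collection.get()), and the window of 100 is the example value from the text. The get() call itself uses only where, limit and include, which are existing get() parameters.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

OVERFETCH = 100  # example over-fetch window ("up to 100")


@dataclass
class LatestItem:
    """Hypothetical, trimmed-down stand-in for EnrichedNewsItemDTO."""
    id: str
    title: str
    category: str
    timestamp: float


async def get_latest_by_category(
    collection: Any, category: Optional[str], limit: int = 10
) -> List[LatestItem]:
    # Exact metadata match; no query embedding or similarity ranking is involved.
    where = {"category": {"$eq": category}} if category else None
    res: Dict[str, Any] = await asyncio.to_thread(
        collection.get, where=where, limit=OVERFETCH, include=["documents", "metadatas"]
    )
    items = [
        LatestItem(id=i, title=doc, category=meta["category"], timestamp=float(meta["timestamp"]))
        for i, doc, meta in zip(res["ids"], res["documents"], res["metadatas"])
    ]
    # ChromaDB cannot order by metadata, so sort newest-first in memory, then slice.
    items.sort(key=lambda it: it.timestamp, reverse=True)
    return items[:limit]


class FakeCollection:
    """Test double mimicking the dict shape returned by Collection.get()."""

    def __init__(self, rows: List[Tuple[str, str, Dict[str, Any]]]) -> None:
        self.rows = rows  # (id, document, metadata)

    def get(self, where=None, limit=None, include=None) -> Dict[str, Any]:
        wanted = where["category"]["$eq"] if where else None
        hits = [r for r in self.rows if wanted is None or r[2]["category"] == wanted][:limit]
        return {
            "ids": [r[0] for r in hits],
            "documents": [r[1] for r in hits],
            "metadatas": [r[2] for r in hits],
        }
```

Passing category=None returns the newest items across all categories, matching the Optional[str] parameter in IStoreQuery.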
3.4. Hybrid Search Architecture (Keyword + Vector)
- Mechanism: Implement a hybrid search strategy using Reciprocal Rank Fusion (RRF).
- Sparse Retrieval (Keyword): Integrate a lightweight keyword index alongside ChromaDB. Given the bot's scale, SQLite FTS5 (Full-Text Search) is the optimal choice: it provides persistent, fast token matching without the operational overhead of Elasticsearch.
- Dense Retrieval (Vector): ChromaDB semantic search.
- Fusion Strategy:
  - The new HybridSearchStrategy issues queries to both the SQLite FTS index and ChromaDB concurrently using asyncio.gather.
  - The results are fused using the RRF formula: Score = 1 / (k + rank_sparse) + 1 / (k + rank_dense), where k is typically 60 and ranks are 1-based. A document absent from one result list receives no contribution from that list.
  - The combined list of DTOs is sorted by the fused score and returned.
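The fusion step reduces to a few lines. This sketch assumes each retriever is an async callable returning document IDs best-first; the names rrf_fuse, search_hybrid and Retriever are illustrative, and mapping the fused IDs back to EnrichedNewsItemDTO objects is omitted.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Sequence

RRF_K = 60  # the commonly used constant mentioned above

# Hypothetical retriever shape: (query, how_many) -> IDs ordered best-first.
Retriever = Callable[[str, int], Awaitable[List[str]]]


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = RRF_K) -> List[str]:
    """Reciprocal Rank Fusion over ranked ID lists (ranks are 1-based)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # missing from a list => no contribution
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


async def search_hybrid(
    query: str, sparse: Retriever, dense: Retriever, limit: int = 5, fetch: int = 20
) -> List[str]:
    # Query the keyword index and the vector index concurrently.
    sparse_ids, dense_ids = await asyncio.gather(sparse(query, fetch), dense(query, fetch))
    return rrf_fuse([sparse_ids, dense_ids])[:limit]
```

Because RRF consumes only ranks, BM25 scores from FTS5 and distances from ChromaDB never need to be calibrated against each other. Fetching more candidates per retriever than the final limit gives documents that rank moderately well in both lists a chance to surface.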
3.5. Embedding Model Evaluation & Upgrade
- Decision: Replace the default ChromaDB embedding function with a dedicated, explicitly configured multilingual model.
- Recommendation: Use intfloat/multilingual-e5-small (for lightweight CPU environments) or sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. Both provide strong English-Russian cross-lingual semantic alignment. The E5 family expects inputs prefixed with "query: " or "passage: ", so the integration must apply these prefixes consistently.
- Integration (DIP): Apply the Dependency Inversion Principle by injecting the embedding function (or an IEmbeddingProvider interface) into the ChromaStore constructor. This allows A/B testing of embedding models without touching the core storage logic (switching models still requires re-embedding the stored documents, since vectors from different models are not comparable).
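A sketch of the injection point, assuming the provider is a small synchronous interface. IEmbeddingProvider follows the name used above, but its methods (embed_passages, embed_query) are hypothetical, and this ChromaStore is a simplified illustration rather than the project's class. E5Provider relies on the sentence-transformers SentenceTransformer.encode API and applies the E5 prefixes; the store passes vectors explicitly through the embeddings and query_embeddings parameters of upsert() and query().

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class IEmbeddingProvider(ABC):
    """Hypothetical port: concrete embedding models plug in behind it."""

    @abstractmethod
    def embed_passages(self, texts: List[str]) -> List[List[float]]: ...

    @abstractmethod
    def embed_query(self, text: str) -> List[float]: ...


class E5Provider(IEmbeddingProvider):
    """Adapter for intfloat/multilingual-e5-small via sentence-transformers."""

    def __init__(self, model_name: str = "intfloat/multilingual-e5-small") -> None:
        from sentence_transformers import SentenceTransformer  # optional dependency, imported lazily
        self._model = SentenceTransformer(model_name)

    def embed_passages(self, texts: List[str]) -> List[List[float]]:
        prefixed = [f"passage: {t}" for t in texts]  # E5 convention for indexed text
        return self._model.encode(prefixed, normalize_embeddings=True).tolist()

    def embed_query(self, text: str) -> List[float]:
        return self._model.encode([f"query: {text}"], normalize_embeddings=True)[0].tolist()


class ChromaStore:
    """Simplified store: depends on the abstraction, not on a concrete model."""

    def __init__(self, collection: Any, embedder: IEmbeddingProvider) -> None:
        self.collection = collection  # a chromadb Collection (or a test double)
        self.embedder = embedder      # swap models without editing this class

    async def add(self, ids: List[str], docs: List[str]) -> None:
        vectors = await asyncio.to_thread(self.embedder.embed_passages, docs)
        await asyncio.to_thread(self.collection.upsert, ids=ids, embeddings=vectors, documents=docs)

    async def query(self, text: str, n: int = 5) -> Dict[str, Any]:
        vector = await asyncio.to_thread(self.embedder.embed_query, text)
        return await asyncio.to_thread(self.collection.query, query_embeddings=[vector], n_results=n)
```

For an A/B test, two ChromaStore instances can point at separate collections, each built with a different provider.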
4. Application to the Agent Architecture
- Vector Storage Agent (Database): This agent's responsibility shifts from "pure vector storage" to "hybrid storage management." It coordinates the ChromaStore (dense) and SQLiteStore (sparse) implementations.
- AI Processor Agent: To maintain the Single Responsibility Principle (SRP), embedding generation can be shifted from the storage layer to the AI Processor Agent. The AI Processor generates the vector using an Ollama-hosted embedding model and attaches it directly to the EnrichedNewsItemDTO. The Storage Agent simply stores the pre-calculated vector, drastically reducing the dependency weight of the storage module.
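If embeddings move to the AI Processor, the split could look like the sketch below. Assumptions: Ollama runs at its default local address and exposes the /api/embed endpoint; the model name is hypothetical, since no model is named here; EnrichedNewsItem is a trimmed stand-in for EnrichedNewsItemDTO with an added embedding field; PrecomputedVectorStore is an illustrative name.

```python
import asyncio
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, List, Optional

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address
EMBED_MODEL = "nomic-embed-text"       # hypothetical model choice


@dataclass
class EnrichedNewsItem:
    """Hypothetical stand-in for EnrichedNewsItemDTO, carrying its own vector."""
    id: str
    summary: str
    category: str
    timestamp: float
    embedding: Optional[List[float]] = None


def ollama_embed(text: str, model: str = EMBED_MODEL, base_url: str = OLLAMA_URL) -> List[float]:
    """AI Processor side: request one vector from Ollama (blocking; wrap in to_thread)."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/embed", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["embeddings"][0]


class PrecomputedVectorStore:
    """Storage side: persists the vector it is given and never loads a model."""

    def __init__(self, collection: Any) -> None:
        self.collection = collection  # a chromadb Collection (or a test double)

    async def store(self, item: EnrichedNewsItem) -> None:
        if item.embedding is None:
            raise ValueError(f"item {item.id} has no pre-calculated embedding")
        await asyncio.to_thread(
            self.collection.upsert,
            ids=[item.id],
            embeddings=[item.embedding],  # explicit vectors bypass the collection's embedding function
            documents=[item.summary],
            metadatas=[{"category": item.category, "timestamp": item.timestamp}],
        )
```

One consequence of this design: /search must embed the query with the same Ollama model and pass it as query_embeddings, because the collection's default embedding function would produce vectors in a different space.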
5. Next Steps for Implementation
- Add sqlite3 FTS5 table initialization to the project scaffolding.
- Refactor src/storage/base.py to segregate IStoreQuery and IStoreCommand.
- Update ChromaStore to accept pre-calculated embeddings and use asyncio.to_thread.
- Implement the RRF sorting algorithm in a new search_hybrid pipeline.
- Update src/bot/handlers.py to route /latest through get_latest_by_category.
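For the first step, a possible FTS5 scaffold using only the standard-library sqlite3 module follows. It assumes the Python build links an SQLite compiled with FTS5, as most current builds are; the table name news_fts and the helper names are hypothetical. User input is quoted token by token so characters such as '-' are not parsed as FTS5 query syntax.

```python
import sqlite3
from typing import Iterable, List, Tuple

# news_id is stored but not tokenized; it is the key shared with the ChromaDB record.
# unicode61 is a Unicode-aware tokenizer, so Cyrillic summaries are split into words too.
SCHEMA = """
CREATE VIRTUAL TABLE IF NOT EXISTS news_fts USING fts5(
    news_id UNINDEXED,
    title,
    summary,
    tokenize = 'unicode61'
)
"""


def init_fts(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def index_items(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, str]]) -> None:
    with conn:  # commits on success
        conn.executemany(
            "INSERT INTO news_fts (news_id, title, summary) VALUES (?, ?, ?)", rows
        )


def to_fts_query(user_query: str) -> str:
    # Quote each token as an FTS5 string; adjacent strings are implicitly ANDed.
    return " ".join('"' + tok.replace('"', '""') + '"' for tok in user_query.split())


def keyword_search(conn: sqlite3.Connection, user_query: str, limit: int = 20) -> List[str]:
    match = to_fts_query(user_query)
    if not match:
        return []
    # "rank" defaults to bm25(), where smaller is better, so ascending order is best-first.
    cur = conn.execute(
        "SELECT news_id FROM news_fts WHERE news_fts MATCH ? ORDER BY rank LIMIT ?",
        (match, limit),
    )
    return [row[0] for row in cur]
```

The ID list returned by keyword_search is exactly the best-first shape an RRF step needs as its sparse input.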