Trend-Scout AI

Trend-Scout AI is an intelligent Telegram bot for automated monitoring, analysis, and summarization of technological trends. It was developed to support R&D activities (specifically at the LG Electronics R&D Lab in St. Petersburg) by continuously scanning news sources, journals, and conference announcements for emerging technologies, competitive benchmarks, and scientific breakthroughs.

🚀 Key Features

  • Automated Multi-Source Crawling: Monitors RSS feeds, scientific journals (Nature, Science), IT conferences (CES, CVPR), and corporate newsrooms using Playwright and Scrapy.
  • AI-Powered Analysis: Utilizes LLMs (via Ollama API) to evaluate the relevance of news articles based on specific R&D landscapes (e.g., WebOS, Chromium, Edge AI).
  • Russian Summarization: Automatically generates concise summaries in Russian for quick review.
  • Anomaly Detection: Alerts users when there is a significant surge in mentions of specific technologies (e.g., "WebGPU", "NPU acceleration").
  • Semantic Search: Employs a vector database (ChromaDB) to allow searching for trends and news by meaning rather than just keywords.
  • Telegram Interface: Simple and effective interaction via Telegram for receiving alerts and querying the latest trends.
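To make the anomaly-detection feature concrete, here is a minimal sketch of the underlying idea: flag a technology term when today's mention count far exceeds its recent average. The threshold (mean + 2 × stdev) and the function name are illustrative assumptions, not the project's actual heuristic.

```python
# Hedged sketch of surge detection over daily mention counts.
from statistics import mean, stdev

def is_surge(history: list[int], today: int, k: float = 2.0) -> bool:
    """Return True if today's mentions exceed mean + k * stdev of history."""
    if len(history) < 2:
        return False  # not enough history to estimate variance
    return today > mean(history) + k * stdev(history)

# e.g. "NPU acceleration" mentioned 2-4 times daily, then 15 today:
print(is_surge([2, 3, 2, 4, 3], 15))  # True
```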

🏗 Architecture

The project follows a modular, agent-based architecture designed around SOLID principles and asynchronous I/O:

  1. Crawler Agent: Responsible for fetching and parsing data from various sources into standardized DTOs.
  2. AI Processor Agent: Enriches data by scoring relevance, summarizing content, and detecting technological anomalies using LLMs.
  3. Vector Storage Agent: Manages persistent storage and semantic retrieval using ChromaDB.
  4. Telegram Bot Agent: Handles user interaction, command processing (/start, /latest, /help), and notification delivery.
  5. Orchestrator: Coordinates the flow between crawling, processing, and storage in periodic background iterations.

🛠 Tech Stack

  • Language: Python 3.12+
  • Frameworks: aiogram (Telegram Bot), playwright (Web Crawling), pydantic (Data Validation)
  • Database: ChromaDB (Vector Store)
  • AI/LLM: Ollama (local or cloud models)
  • Testing: pytest, pytest-asyncio
  • Environment: Docker-ready, .env for configuration

📋 Prerequisites

  • Python 3.12 or higher
  • Ollama installed and running (for AI processing)
  • Playwright browsers installed (playwright install chromium)

⚙️ Installation & Setup

  1. Clone the repository:

    git clone https://github.com/your-repo/trend-scout-ai.git
    cd trend-scout-ai
    
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    playwright install chromium
    
  4. Configure environment variables: Create a .env file in the root directory:

    TELEGRAM_BOT_TOKEN=your_bot_token_here
    TELEGRAM_CHAT_ID=your_chat_id_here
    OLLAMA_API_URL=http://localhost:11434/api/generate
    CHROMA_DB_PATH=./chroma_db
    
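One way the application could consume these variables is a small fail-fast loader like the sketch below (the `load_config` helper is illustrative; the real project may read settings differently, e.g. via pydantic):

```python
# Hedged sketch: read required env vars, fail fast on anything missing.
import os

def load_config() -> dict[str, str]:
    required = ["TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID"]
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {missing}")
    return {
        "bot_token": os.environ["TELEGRAM_BOT_TOKEN"],
        "chat_id": os.environ["TELEGRAM_CHAT_ID"],
        "ollama_url": os.getenv("OLLAMA_API_URL",
                                "http://localhost:11434/api/generate"),
        "chroma_path": os.getenv("CHROMA_DB_PATH", "./chroma_db"),
    }
```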

🏃 Usage

Start the Bot and Background Crawler

To run the full system (bot + periodic crawler):

    python -m src.main

Run Manual Update

To trigger a manual crawl and update of the vector store:

    python update_chroma_store.py
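Internally, the background crawler amounts to a periodic asyncio loop. The sketch below shows the shape of such a loop; the interval and function names are illustrative assumptions, not the Orchestrator's real API.

```python
# Rough sketch of a crawl-process-store loop run in the background.
import asyncio

CRAWL_INTERVAL_SECONDS = 3600  # e.g. one crawl cycle per hour (assumed)

async def run_periodic_crawl(crawl_once, interval: float = CRAWL_INTERVAL_SECONDS):
    """Run one crawl iteration, then sleep, forever."""
    while True:
        await crawl_once()          # fetch, score, summarize, store
        await asyncio.sleep(interval)
```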

🧪 Testing

The project maintains high test coverage and follows TDD principles.

Run all tests:

    pytest

Run specific test categories:

    pytest tests/crawlers/
    pytest tests/processor/
    pytest tests/storage/
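Because most of the code is asynchronous, tests use pytest-asyncio. A hedged sketch of what such a test might look like (the test name and the stubbed scorer are invented for illustration; see tests/ for the real suite):

```python
# Illustrative async unit test; the fake scorer stands in for the LLM call.
import pytest

@pytest.mark.asyncio
async def test_relevance_score_is_bounded():
    async def fake_score(text: str) -> float:
        return 0.42
    score = await fake_score("WebGPU lands in Chromium")
    assert 0.0 <= score <= 1.0
```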

📂 Project Structure

  • src/: Core application logic.
    • bot/: Telegram bot handlers and setup.
    • crawlers/: Web scraping modules and factory.
    • processor/: LLM integration and prompt logic.
    • storage/: Vector database operations.
    • orchestrator/: Main service coordination.
  • tests/: Comprehensive test suite.
  • docs/: Architecture Decision Records (ADR) and methodology.
  • chroma_db/: Persistent vector storage (local).
  • requirements.txt: Python dependencies.