Trend-Scout AI

Trend-Scout AI is an intelligent Telegram bot for automated monitoring, analysis, and summarization of technological trends. It was developed to support R&D activities (specifically at the LG Electronics R&D Lab in St. Petersburg) by continuously scanning news sources, journals, and conference announcements for emerging technologies, competitive benchmarks, and scientific breakthroughs.

🚀 Key Features

  • Automated Multi-Source Crawling: Monitors RSS feeds, scientific journals (Nature, Science), IT conferences (CES, CVPR), and corporate newsrooms using Playwright and Scrapy.
  • AI-Powered Analysis: Utilizes LLMs (via Ollama API) to evaluate the relevance of news articles based on specific R&D landscapes (e.g., WebOS, Chromium, Edge AI).
  • Russian Summarization: Automatically generates concise summaries in Russian for quick review.
  • Anomaly Detection: Alerts users when there is a significant surge in mentions of specific technologies (e.g., "WebGPU", "NPU acceleration").
  • Semantic Search: Employs a vector database (ChromaDB) to allow searching for trends and news by meaning rather than just keywords.
  • Telegram Interface: Simple and effective interaction via Telegram for receiving alerts and querying the latest trends.
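To make the anomaly-detection feature concrete, here is a minimal sketch of the underlying idea: flag a technology term when today's mention count far exceeds its recent average. The threshold (mean + 2 × stdev) and the function name are illustrative assumptions, not the project's actual heuristic.

```python
# Hedged sketch of surge detection over daily mention counts.
from statistics import mean, stdev

def is_surge(history: list[int], today: int, k: float = 2.0) -> bool:
    """Return True if today's mentions exceed mean + k * stdev of history."""
    if len(history) < 2:
        return False  # not enough history to estimate variance
    return today > mean(history) + k * stdev(history)

# e.g. "NPU acceleration" mentioned 2-4 times daily, then 15 today:
print(is_surge([2, 3, 2, 4, 3], 15))  # True
```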

🏗 Architecture

The project follows a modular, agent-based architecture designed around SOLID principles and asynchronous I/O:

  1. Crawler Agent: Responsible for fetching and parsing data from various sources into standardized DTOs.
  2. AI Processor Agent: Enriches data by scoring relevance, summarizing content, and detecting technological anomalies using LLMs.
  3. Vector Storage Agent: Manages persistent storage and semantic retrieval using ChromaDB.
  4. Telegram Bot Agent: Handles user interaction, command processing (/start, /latest, /help), and notification delivery.
  5. Orchestrator: Coordinates the flow between crawling, processing, and storage in periodic background iterations.

🛠 Tech Stack

  • Language: Python 3.12+
  • Frameworks: aiogram (Telegram Bot), playwright (Web Crawling), pydantic (Data Validation)
  • Database: ChromaDB (Vector Store)
  • AI/LLM: Ollama (local or cloud models)
  • Testing: pytest, pytest-asyncio
  • Environment: Docker-ready, .env for configuration

📋 Prerequisites

  • Python 3.12 or higher
  • Ollama installed and running (for AI processing)
  • Playwright browsers installed (playwright install chromium)

⚙️ Installation & Setup

  1. Clone the repository:

    git clone https://github.com/your-repo/trend-scout-ai.git
    cd trend-scout-ai
    
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    playwright install chromium
    
  4. Configure environment variables: Create a .env file in the root directory:

    TELEGRAM_BOT_TOKEN=your_bot_token_here
    TELEGRAM_CHAT_ID=your_chat_id_here
    OLLAMA_API_URL=http://localhost:11434/api/generate
    CHROMA_DB_PATH=./chroma_db
    
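One way the application could consume these variables is a small fail-fast loader like the sketch below (the `load_config` helper is illustrative; the real project may read settings differently, e.g. via pydantic):

```python
# Hedged sketch: read required env vars, fail fast on anything missing.
import os

def load_config() -> dict[str, str]:
    required = ["TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID"]
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {missing}")
    return {
        "bot_token": os.environ["TELEGRAM_BOT_TOKEN"],
        "chat_id": os.environ["TELEGRAM_CHAT_ID"],
        "ollama_url": os.getenv("OLLAMA_API_URL",
                                "http://localhost:11434/api/generate"),
        "chroma_path": os.getenv("CHROMA_DB_PATH", "./chroma_db"),
    }
```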

🏃 Usage

Start the Bot and Background Crawler

To run the full system (bot + periodic crawler):

    python -m src.main

Run Manual Update

To trigger a manual crawl and update of the vector store:

    python update_chroma_store.py
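Internally, the background crawler amounts to a periodic asyncio loop. The sketch below shows the shape of such a loop; the interval and function names are illustrative assumptions, not the Orchestrator's real API.

```python
# Rough sketch of a crawl-process-store loop run in the background.
import asyncio

CRAWL_INTERVAL_SECONDS = 3600  # e.g. one crawl cycle per hour (assumed)

async def run_periodic_crawl(crawl_once, interval: float = CRAWL_INTERVAL_SECONDS):
    """Run one crawl iteration, then sleep, forever."""
    while True:
        await crawl_once()          # fetch, score, summarize, store
        await asyncio.sleep(interval)
```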

🧪 Testing

The project maintains high test coverage and follows TDD principles.

Run all tests:

    pytest

Run specific test categories:

    pytest tests/crawlers/
    pytest tests/processor/
    pytest tests/storage/
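Because most of the code is asynchronous, tests use pytest-asyncio. A hedged sketch of what such a test might look like (the test name and the stubbed scorer are invented for illustration; see tests/ for the real suite):

```python
# Illustrative async unit test; the fake scorer stands in for the LLM call.
import pytest

@pytest.mark.asyncio
async def test_relevance_score_is_bounded():
    async def fake_score(text: str) -> float:
        return 0.42
    score = await fake_score("WebGPU lands in Chromium")
    assert 0.0 <= score <= 1.0
```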

📂 Project Structure

  • src/: Core application logic.
    • bot/: Telegram bot handlers and setup.
    • crawlers/: Web scraping modules and factory.
    • processor/: LLM integration and prompt logic.
    • storage/: Vector database operations.
    • orchestrator/: Main service coordination.
  • tests/: Comprehensive test suite.
  • docs/: Architecture Decision Records (ADR) and methodology.
  • chroma_db/: Persistent vector storage (local).
  • requirements.txt: Python dependencies.