# Trend-Scout AI

**Trend-Scout AI** is a Telegram bot that automatically monitors, analyzes, and summarizes technology trends. It was developed to support R&D activities (specifically within the context of the LG Electronics R&D Lab in St. Petersburg) by tracking emerging technologies, competitive benchmarks, and scientific breakthroughs.

## 🚀 Key Features

- **Automated Multi-Source Crawling:** Monitors RSS feeds, scientific journals (Nature, Science), IT conferences (CES, CVPR), and corporate newsrooms using Playwright and Scrapy.
- **AI-Powered Analysis:** Utilizes LLMs (via Ollama API) to evaluate the relevance of news articles based on specific R&D landscapes (e.g., WebOS, Chromium, Edge AI).
- **Russian Summarization:** Automatically generates concise summaries in Russian for quick review.
- **Anomaly Detection:** Alerts users when there is a significant surge in mentions of specific technologies (e.g., "WebGPU", "NPU acceleration").
- **Semantic Search:** Employs a vector database (ChromaDB) to allow searching for trends and news by meaning rather than just keywords.
- **Telegram Interface:** Simple and effective interaction via Telegram for receiving alerts and querying the latest trends.

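The anomaly detection feature can be pictured as comparing the latest mention count of a term against its recent history. The sketch below is only an illustration under stated assumptions, not the project's actual implementation: the function name `detect_surges`, the z-score threshold, the `min_mentions` cutoff, and the sample counts are all hypothetical.

```python
from statistics import mean, pstdev


def detect_surges(
    history: dict[str, list[int]],
    threshold: float = 3.0,
    min_mentions: int = 5,
) -> list[str]:
    """Return terms whose latest count is far above their historical baseline.

    `history` maps a term to per-period mention counts, oldest first;
    the last element is the current period. All names are hypothetical.
    """
    surging = []
    for term, counts in history.items():
        if len(counts) < 3:
            continue  # not enough history to form a baseline
        *baseline, latest = counts
        if latest < min_mentions:
            continue  # ignore noise from rarely mentioned terms
        avg = mean(baseline)
        spread = pstdev(baseline) or 1.0  # fall back to 1.0 on a flat history
        if (latest - avg) / spread >= threshold:
            surging.append(term)
    return surging


# Made-up sample data for illustration only.
mentions = {
    "WebGPU": [2, 3, 2, 3, 14],
    "NPU acceleration": [5, 6, 5, 6, 6],
}
print(detect_surges(mentions))
```

A z-score against a short baseline is one simple choice; a production system might prefer a rolling window or a seasonal model if mention counts vary by weekday.
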
## 🏗 Architecture

The project follows a modular, agent-based architecture designed around SOLID principles and asynchronous I/O:

1. **Crawler Agent:** Responsible for fetching and parsing data from various sources into standardized DTOs.
2. **AI Processor Agent:** Enriches data by scoring relevance, summarizing content, and detecting technological anomalies using LLMs.
3. **Vector Storage Agent:** Manages persistent storage and semantic retrieval using ChromaDB.
4. **Telegram Bot Agent:** Handles user interaction, command processing (`/start`, `/latest`, `/help`), and notification delivery.
5. **Orchestrator:** Coordinates the flow between crawling, processing, and storage in periodic background iterations.

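One orchestrator iteration, as described above, could look roughly like the following. This is a minimal sketch using only the standard library: the `Article` DTO, the stub agent classes, and `run_iteration` are hypothetical stand-ins, and keyword matching replaces the real LLM scoring so the example runs without Ollama or ChromaDB.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Article:
    """Standardized DTO passed between agents (fields are assumptions)."""
    url: str
    title: str
    text: str
    relevance: float = 0.0
    summary: str = ""


class StubCrawler:
    async def fetch(self) -> list[Article]:
        # A real crawler would fetch and parse RSS feeds or web pages here.
        return [
            Article("https://example.com/a", "NPU acceleration on edge devices", "body"),
            Article("https://example.com/b", "Celebrity gossip roundup", "body"),
        ]


class StubProcessor:
    KEYWORDS = ("npu", "edge ai", "webgpu")

    async def enrich(self, article: Article) -> Article:
        # Placeholder for the LLM call: keyword matching stands in for scoring.
        hit = any(k in article.title.lower() for k in self.KEYWORDS)
        article.relevance = 1.0 if hit else 0.0
        article.summary = article.title
        return article


class InMemoryStorage:
    def __init__(self) -> None:
        self.items: dict[str, Article] = {}

    async def save(self, article: Article) -> None:
        self.items[article.url] = article  # keyed by URL to deduplicate


async def run_iteration(crawler, processor, storage, min_relevance: float = 0.5) -> int:
    """Run one crawl -> process -> store pass and return the number stored."""
    articles = await crawler.fetch()
    # Enrich all articles concurrently, since each call is I/O-bound.
    enriched = await asyncio.gather(*(processor.enrich(a) for a in articles))
    relevant = [a for a in enriched if a.relevance >= min_relevance]
    for article in relevant:
        await storage.save(article)
    return len(relevant)


storage = InMemoryStorage()
stored = asyncio.run(run_iteration(StubCrawler(), StubProcessor(), storage))
print(stored, list(storage.items))
```

Because `run_iteration` only depends on the `fetch`, `enrich`, and `save` methods, each agent can be swapped for a real implementation or a test double without touching the coordination logic.
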
## 🛠 Tech Stack

- **Language:** Python 3.12+
- **Frameworks:** `aiogram` (Telegram Bot), `playwright` (Web Crawling), `pydantic` (Data Validation)
- **Database:** `ChromaDB` (Vector Store)
- **AI/LLM:** `Ollama` (local or cloud models)
- **Testing:** `pytest`, `pytest-asyncio`
- **Environment:** Docker-ready, `.env` for configuration

## 📋 Prerequisites

- Python 3.12 or higher
- [Ollama](https://ollama.ai/) installed and running (for AI processing)
- Playwright browsers installed (`playwright install chromium`)

## ⚙️ Installation & Setup

1. **Clone the repository:**
   ```bash
   git clone https://github.com/your-repo/trend-scout-ai.git
   cd trend-scout-ai
   ```

2. **Create and activate a virtual environment:**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   playwright install chromium
   ```

4. **Configure environment variables:**
   Create a `.env` file in the root directory:
   ```env
   TELEGRAM_BOT_TOKEN=your_bot_token_here
   TELEGRAM_CHAT_ID=your_chat_id_here
   OLLAMA_API_URL=http://localhost:11434/api/generate
   CHROMA_DB_PATH=./chroma_db
   ```

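To show how these variables might be consumed, here is a small standard-library sketch. It assumes the values are already present in the process environment (for example, exported by Docker or a dotenv loader); the `Settings` class, `load_settings`, `build_generate_request`, and the `llama3` model name are hypothetical and not taken from the project. The request body uses the `model`, `prompt`, and `stream` fields accepted by Ollama's `/api/generate` endpoint, and the request is only prepared, never sent.

```python
import json
import os
import urllib.request
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    telegram_bot_token: str
    telegram_chat_id: str
    ollama_api_url: str
    chroma_db_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing secrets."""
    required = ("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required variables: {', '.join(missing)}")
    return Settings(
        telegram_bot_token=env["TELEGRAM_BOT_TOKEN"],
        telegram_chat_id=env["TELEGRAM_CHAT_ID"],
        # Defaults mirror the example .env values; treating them as optional is an assumption.
        ollama_api_url=env.get("OLLAMA_API_URL", "http://localhost:11434/api/generate"),
        chroma_db_path=env.get("CHROMA_DB_PATH", "./chroma_db"),
    )


def build_generate_request(
    settings: Settings, prompt: str, model: str = "llama3"
) -> urllib.request.Request:
    """Prepare (but do not send) a non-streaming request to Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        settings.ollama_api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example with placeholder values instead of real secrets.
settings = load_settings({"TELEGRAM_BOT_TOKEN": "token", "TELEGRAM_CHAT_ID": "1"})
request = build_generate_request(settings, "Summarize this abstract in Russian.")
print(request.full_url, request.get_method())
```

Passing the environment as a mapping keeps `load_settings` easy to test without mutating `os.environ`.
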
## 🏃 Usage

### Start the Bot and Background Crawler
To run the full system (bot + periodic crawler):
```bash
python -m src.main
```

### Run Manual Update
To trigger a manual crawl and update of the vector store:
```bash
python update_chroma_store.py
```

## 🧪 Testing

The project is developed following TDD principles and maintains high test coverage.

Run all tests:
```bash
pytest
```

Run specific test categories:
```bash
pytest tests/crawlers/
pytest tests/processor/
pytest tests/storage/
```

## 📂 Project Structure

- `src/`: Core application logic.
  - `bot/`: Telegram bot handlers and setup.
  - `crawlers/`: Web scraping modules and factory.
  - `processor/`: LLM integration and prompt logic.
  - `storage/`: Vector database operations.
  - `orchestrator/`: Main service coordination.
- `tests/`: Comprehensive test suite.
- `docs/`: Architecture Decision Records (ADR) and methodology.
- `chroma_db/`: Persistent vector storage (local).
- `requirements.txt`: Python dependencies.