10 Commits

Author SHA1 Message Date
217037f72e feat(crawlers): convert multiple sources from Playwright to Static/RSS
- Added `StaticCrawler` for generic aiohttp+BS4 parsing.
- Added `SkolkovoCrawler` for specialized Next.js parsing of sk.ru.
- Converted ICRA 2025, RSF, CES 2025, and Telegram Addmeto to `static`.
- Converted Horizon Europe to `rss` using its native feed.
- Updated `CrawlerFactory` to support new crawler types.
- Validated changes with unit tests.
2026-03-15 21:21:14 +03:00
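
As a rough illustration of the `rss` conversion mentioned in this commit, here is a minimal sketch of reading items from an RSS 2.0 feed with the standard library. The sample feed, the `parse_rss_items` helper, and the output field names are assumptions for illustration; the project's actual RSS crawler and the real Horizon Europe feed are not shown in this log.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed (not the real Horizon Europe feed).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First call</title>
      <link>https://example.org/calls/1</link>
      <pubDate>Sun, 15 Mar 2026 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second call</title>
      <link>https://example.org/calls/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str) -> list[dict]:
    """Return one dict per <item>, with title, link, and optional pubDate."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # findtext returns None when the element is missing.
            "published": item.findtext("pubDate"),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

Reading a feed this way needs no browser, which is presumably the motivation for moving sources off Playwright where a feed exists.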
a363ca41cf feat(crawlers): implement specialized CppConf crawler and AI analysis
- Added CppConfCrawler using aiohttp and regex to parse Next.js JSON data, skipping the Playwright bottleneck.
- Added C++ specific prompts to OllamaProvider for trend analysis (identifying C++26, memory safety, coroutines).
- Created offline pytest fixtures and TDD unit tests for the parser.
- Created end-to-end pipeline test mapping Crawler -> AI Processor -> Vector DB.
2026-03-15 20:34:39 +03:00
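
The regex-over-Next.js approach described in this commit might look roughly like the sketch below. It assumes the page embeds its data in a `<script id="__NEXT_DATA__">` tag, as Next.js Pages Router sites commonly do; the commit does not say which mechanism the target site uses. The sample HTML, the `talks` key, and the helper names are hypothetical.

```python
import json
import re

# Assumption: the page embeds its props as JSON in a __NEXT_DATA__ script tag.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)

# Hypothetical page body; the real site's structure is not shown in the log.
SAMPLE_HTML = """<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"talks": [{"title": "Coroutines in practice"}]}}}
</script>
</body></html>"""


def extract_next_data(html: str) -> dict:
    """Locate the embedded JSON payload and decode it."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script found")
    return json.loads(match.group(1))


def talk_titles(html: str) -> list[str]:
    """Pull talk titles out of the (hypothetical) pageProps.talks list."""
    data = extract_next_data(html)
    talks = data.get("props", {}).get("pageProps", {}).get("talks", [])
    return [talk["title"] for talk in talks]


if __name__ == "__main__":
    print(talk_titles(SAMPLE_HTML))
```

Because the function takes an HTML string, it can be tested against saved pages offline, in the spirit of the pytest fixtures this commit mentions; the HTTP fetch (aiohttp in the commit) stays outside the parser.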
a0eeba0918 Enhance /hottest command with optional limit 2026-03-15 01:34:33 +03:00
9fdb4b35cd Implement 'Top Ranked' feature and expand Habr sources 2026-03-15 01:32:25 +03:00
019d9161de Update crawler selectors and add comprehensive tests 2026-03-15 00:48:27 +03:00
87af585e1b Refactor crawlers configuration and add new sources
- Move hard-coded crawlers from main.py to crawlers.yml
- Use CrawlerFactory to load configuration
- Add 9 new sources: C++ Russia, ICRA 2025, Technoprom, INNOPROM, Hannover Messe, RSF, Skolkovo, Horizon Europe, Addmeto
- Update task list
2026-03-15 00:45:04 +03:00
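
A config-driven factory of the kind this commit describes could be sketched as follows. This is not the project's `CrawlerFactory`: the registry, the `from_config` method, the crawler classes, and the config keys (`name`, `type`, `url`) are all assumptions, and a plain dict stands in for the parsed contents of `crawlers.yml`.

```python
class Crawler:
    """Hypothetical base class; real crawlers would also implement fetching."""

    def __init__(self, name: str, url: str) -> None:
        self.name = name
        self.url = url


class StaticCrawler(Crawler):
    pass


class RSSCrawler(Crawler):
    pass


class PlaywrightCrawler(Crawler):
    pass


class CrawlerFactory:
    # Maps the "type" string from the config to a crawler class.
    _registry = {
        "static": StaticCrawler,
        "rss": RSSCrawler,
        "playwright": PlaywrightCrawler,
    }

    @classmethod
    def from_config(cls, config: dict) -> list[Crawler]:
        """Build one crawler per config entry, failing loudly on unknown types."""
        crawlers = []
        for entry in config.get("crawlers", []):
            kind = entry["type"]
            if kind not in cls._registry:
                raise ValueError(f"unknown crawler type: {kind!r}")
            crawlers.append(cls._registry[kind](name=entry["name"], url=entry["url"]))
        return crawlers


# Stand-in for the parsed crawlers.yml; the URLs are placeholders.
CONFIG = {
    "crawlers": [
        {"name": "Horizon Europe", "type": "rss", "url": "https://example.org/feed.xml"},
        {"name": "ICRA 2025", "type": "static", "url": "https://example.org/icra"},
    ]
}

if __name__ == "__main__":
    for crawler in CrawlerFactory.from_config(CONFIG):
        print(type(crawler).__name__, crawler.name)
```

With this layout, adding a source is a config change, and adding a crawler type is one new class plus one registry entry.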
9c31977e98 [feat] playwright crawler
:Release Notes:
-

:Detailed Notes:
-

:Testing Performed:
-

:QA Notes:
as always AI generated

:Issues Addressed:
-
2026-03-14 20:13:53 +03:00
4bf7cb4331 [perf] stabilization of previous release 2026-03-13 13:23:30 +03:00
9c8e4c7345 [tg] stats/search features
Processed data is not written back to the user
2026-03-13 12:50:49 +03:00
5f093075f7 [ai] mvp generated by gemini 2026-03-13 11:48:37 +03:00