Here we develop and share a Web Extraction Suite designed to transform the chaotic web into clean, structured data for AI, data analysis, and modern software development.
- article-extractor: The core engine for turning messy HTML into structured JSON.
- feed-extractor: A fast, low-overhead parser for RSS, Atom, and JSON feeds.
- oembed-extractor: Lightweight utility for social media metadata extraction.
Deploy them individually or in combination to power dynamic news platforms, automate content marketing pipelines, or curate high-quality datasets for NLP and AI research.
Have a feature request or run into an issue? We welcome your feedback! Please open an issue to help us improve the ecosystem.