This project provides a parser designed for extracting news articles from the New York Times. It automates article retrieval, filtering, and data extraction.
The parser scrapes article details such as title, publication date, and description directly from web elements on the New York Times website.
The project includes a UrlConstructor class that assists in dynamically creating and modifying URLs for searching articles on the New York Times.
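As a rough illustration of what building and modifying a search URL involves, here is a minimal sketch using only the Python standard library. It is not the project's UrlConstructor: the function names, the base URL, and the `query` and `sort` parameter names are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed base URL and parameter names; the project's UrlConstructor may differ.
BASE_URL = "https://www.nytimes.com/search"


def build_search_url(base_url=BASE_URL, **params):
    """Return base_url with the given parameters encoded as a query string."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{query}" if query else base_url


def update_search_url(url, **params):
    """Add or replace query parameters; a value of None removes the parameter."""
    parts = urlsplit(url)
    current = dict(parse_qsl(parts.query))
    for key, value in params.items():
        if value is None:
            current.pop(key, None)
        else:
            current[key] = str(value)
    return urlunsplit(parts._replace(query=urlencode(current)))


url = build_search_url(query="inflation rate", sort="newest")
oldest_first = update_search_url(url, sort="oldest")
```

Keeping construction and modification as pure functions over strings makes the URL logic easy to test without opening a browser or sending a request.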
The parser can block and unblock the paywall so that scraping is not interrupted.
You can filter articles by criteria such as date and keywords, though the date filter is still being improved.
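The filtering idea can be sketched as follows. This is an assumption-laden example, not the project's code: the `Article` dataclass, its field names, and the `filter_articles` function are hypothetical, and keyword matching here is a simple case-insensitive substring test on the title and description.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # Hypothetical record type; the project's own data structure may differ.
    title: str
    published: date
    description: str = ""


def filter_articles(articles, keywords=(), start=None, end=None):
    """Keep articles inside [start, end] that mention at least one keyword."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for article in articles:
        if start is not None and article.published < start:
            continue
        if end is not None and article.published > end:
            continue
        text = f"{article.title} {article.description}".lower()
        # An empty keyword list means "no keyword restriction".
        if lowered and not any(k in text for k in lowered):
            continue
        kept.append(article)
    return kept


sample = [
    Article("Fed Holds Rates Steady", date(2023, 6, 14), "Inflation remains a concern."),
    Article("A Summer Recipe Guide", date(2023, 7, 1)),
    Article("Inflation Cools in Europe", date(2022, 12, 2)),
]
recent_inflation = filter_articles(sample, keywords=["inflation"], start=date(2023, 1, 1))
```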
The parser can also retrieve the full text of an article, providing context beyond headlines and summaries.
The parser automatically detects mentions of currencies such as USD, EURO, and GBP in the article text.
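One plausible way to detect currency mentions is with regular expressions, sketched below. The patterns, the `CURRENCY_PATTERNS` table, and the `detect_currencies` function are assumptions for illustration; the project may use different rules or cover more currencies.

```python
import re

# Assumed patterns: currency codes, common words, and symbols.
CURRENCY_PATTERNS = {
    "USD": re.compile(r"\bUSD\b|\bdollars?\b|\$", re.IGNORECASE),
    "EURO": re.compile(r"\bEUR\b|\beuros?\b|€", re.IGNORECASE),
    "GBP": re.compile(r"\bGBP\b|\bpounds? sterling\b|£", re.IGNORECASE),
}


def detect_currencies(text):
    """Return the labels of all currencies mentioned at least once in text."""
    return [label for label, pattern in CURRENCY_PATTERNS.items() if pattern.search(text)]


found = detect_currencies("The deal was worth $5 million, or about 4.6 million euros.")
```

A bare `$` also matches other dollar currencies (for example Canadian or Australian dollars), so a stricter implementation would need surrounding context to disambiguate.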
After parsing, the data can be exported in either CSV or Excel format for further analysis.
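The CSV side of the export can be sketched with the standard library alone. The `write_articles_csv` function and the column names are hypothetical; Excel output would need a third-party library such as openpyxl or pandas and is not shown here.

```python
import csv
import io


def write_articles_csv(rows, stream, fieldnames=("title", "date", "description")):
    """Write a list of article dicts to an open text stream as CSV."""
    writer = csv.DictWriter(stream, fieldnames=list(fieldnames), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample rows, written to an in-memory buffer for demonstration.
rows = [
    {"title": "Fed Holds Rates Steady", "date": "2023-06-14", "description": "Inflation, again."},
    {"title": "A Summer Recipe Guide", "date": "2023-07-01"},
]
buffer = io.StringIO()
write_articles_csv(rows, buffer)
csv_lines = buffer.getvalue().splitlines()
```

To write to disk instead, open the target with `open("articles.csv", "w", newline="", encoding="utf-8")` and pass the file object as `stream`; `newline=""` lets the `csv` module control line endings.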