AIScrape is a Python tool for scraping and analyzing web content using OpenAI's GPT models. By default it uses GPT-3.5-turbo to identify the main content of a webpage and extract it.
The AIScrape class in the aiscrape.py file is initialized with an optional model parameter, uses an AI model to determine where the meaningful content on a webpage starts and ends, and prints the extracted content. This makes it useful for pulling the clean main content out of a webpage while skipping the navigation and other extraneous material typically found in HTML pages.
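The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in aiscrape.py: the names `extract_main_content` and `locate` are hypothetical, and the `locate` callable stands in for the model call, which is assumed here to return the first and last lines of the main content.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the model call: given the page's text lines,
# it returns the (start, end) lines that bound the main content.
Locator = Callable[[Sequence[str]], Tuple[str, str]]


def extract_main_content(lines: Sequence[str], locate: Locator) -> List[str]:
    """Return the lines from the start marker to the end marker, inclusive."""
    start_line, end_line = locate(lines)
    start = lines.index(start_line)
    # Look for the end marker at or after the start marker.
    end = lines.index(end_line, start)
    return list(lines[start:end + 1])


# Demo with a fake locator, so no API call is made.
page = ["Home | About", "Article title", "Body text.", "Footer links"]
content = extract_main_content(page, lambda _: ("Article title", "Body text."))
print("\n".join(content))
```

Passing the locator in as a function keeps the slicing logic separate from the model call, so it can be exercised without an API key.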
Before you can use AIScrape, make sure you have Python installed, along with the requests and bs4 (BeautifulSoup) libraries. If you haven't already set up an OpenAI API key, you will need to do that as well.
- Clone this repository:
  git clone https://github.com/brosenberg/aiscrape
  cd aiscrape
- Install dependencies:
  pip install -r requirements
- Set up your OpenAI API key:
  - Obtain an API key from OpenAI.
  - Export the API key in your shell or add it to your environment variables:
    export OPENAI_API_KEY='Your-OpenAI-API-Key-Here'
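To confirm that the key is visible to Python before running the tool, a small check like the one below may help. It is only a sketch: `get_api_key` is a hypothetical helper, and how aiscrape.py itself reads the key is not shown here. Only the variable name OPENAI_API_KEY comes from the setup steps above.

```python
import os


def get_api_key(env=os.environ) -> str:
    """Return the OpenAI API key from the environment, or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running AIScrape."
        )
    return key
```

Calling `get_api_key()` in a Python shell started from the same terminal session shows whether the export took effect.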
To run AIScrape, you need to pass a URL as a command-line argument:
python aiscrape.py https://example.com
- API key issues:
  - Ensure that your OPENAI_API_KEY is set correctly in your environment variables.
  - Check that the API key is active and has the necessary permissions.
- Connection errors:
  - Verify that the URL is correct and accessible.
  - Ensure your network connection is stable and allows HTTP requests to external sites.
- Dependency errors:
  - Make sure all required Python packages are installed.
  - Check for compatibility issues between package versions.
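Some of the checks above can be automated before a run. The sketch below uses only the standard library; the helper names `looks_like_http_url` and `missing_modules` are hypothetical and not part of AIScrape. It checks the shape of the URL and whether the required modules can be imported, but not whether the site is reachable or whether package versions are compatible.

```python
from importlib.util import find_spec
from typing import Iterable, List
from urllib.parse import urlparse


def looks_like_http_url(url: str) -> bool:
    """True if the URL has an http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def missing_modules(names: Iterable[str] = ("requests", "bs4")) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]
```

For example, `looks_like_http_url("example.com")` is false because the scheme is missing, which is a common cause of connection errors when a URL is typed by hand.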