AIScrape

AIScrape is a Python tool for scraping and analyzing web content using OpenAI's GPT models. By default it uses the gpt-3.5-turbo model to identify the main content of a webpage and extract it.

Description

The AIScrape class in aiscrape.py accepts an optional model parameter, uses an AI model to determine where the meaningful content of a webpage starts and ends, and prints the extracted content. This is useful for pulling the clean main content out of a webpage while skipping the navigation and other extraneous material typically found in HTML pages.
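
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in aiscrape.py: the function names (split_lines, build_prompt, extract_main_content), the prompt wording, and the "start, end" reply format are all assumptions. The model call is passed in as a plain callable so the sketch runs without an API key.

```python
from typing import Callable, List


def split_lines(text: str) -> List[str]:
    """Return the non-empty lines of the page text, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_prompt(lines: List[str]) -> str:
    """Number each line so the model can answer with line indices (assumed format)."""
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(lines))
    return (
        "Below are the numbered lines of a webpage. Reply with two integers "
        "separated by a comma: the first and last line of the main content.\n\n"
        + numbered
    )


def extract_main_content(text: str, ask_model: Callable[[str], str]) -> str:
    """Ask the model where the main content starts and ends, then slice it out."""
    lines = split_lines(text)
    if not lines:
        return ""
    reply = ask_model(build_prompt(lines))
    start, end = (int(part) for part in reply.split(","))
    # Clamp to valid indices in case the model answers out of range.
    start = max(0, start)
    end = min(len(lines) - 1, end)
    return "\n".join(lines[start : end + 1])


# Stand-in for a real model call (hypothetical): always answers "lines 2 to 3".
def fake_model(prompt: str) -> str:
    return "2, 3"


page = "Home\nAbout\nArticle title\nArticle body\nFooter"
print(extract_main_content(page, fake_model))
```

In a real run, ask_model would wrap a chat-completion request to the configured model, and the page text would come from requests and BeautifulSoup rather than a hard-coded string.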

Installation

AIScrape requires Python along with the requests and bs4 (BeautifulSoup) libraries. You also need an OpenAI API key; if you have not set one up yet, do that first.

  1. Clone this Repository:

    git clone https://github.com/brosenberg/aiscrape
    cd aiscrape
  2. Install Dependencies:

    pip install -r requirements
  3. Set up OpenAI API Key:

    • Obtain an API key from OpenAI.
    • Export the API key in your shell or add it to your environment variables:
      export OPENAI_API_KEY='Your-OpenAI-API-Key-Here'
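
To confirm the key is visible to Python before running the tool, a small check along these lines may help. The helper name require_api_key is hypothetical and not part of AIScrape; the only thing taken from the steps above is the OPENAI_API_KEY variable name.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from the environment or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell before running."
        )
    return key
```

Calling require_api_key() with no arguments reads the real environment; passing a dict instead makes the check easy to try out without touching your shell.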

Usage

To run AIScrape, pass a URL as a command-line argument:

python aiscrape.py https://example.com

Troubleshooting

  1. API Key Issues:

    • Ensure that your OPENAI_API_KEY is set correctly in your environment variables.
    • Check if the API key is active and has the necessary permissions.
  2. Connection Errors:

    • Verify that the URL is correct and accessible.
    • Ensure your network connection is stable and allows HTTP requests to external sites.
  3. Dependency Errors:

    • Make sure all required Python packages are installed.
    • Check for compatibility issues between package versions.
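
Some of the checks above can be automated before a run. The following is a sketch under stated assumptions: the helper names (missing_packages, looks_like_http_url, preflight) are hypothetical, and the default package list simply mirrors the libraries named in the Installation section. It only inspects the local environment and the shape of the URL; it does not contact the site or the OpenAI API.

```python
import importlib.util
from typing import Iterable, List
from urllib.parse import urlparse


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def looks_like_http_url(url: str) -> bool:
    """Check for an http(s) scheme and a host; does not contact the URL."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def preflight(url: str, packages: Iterable[str] = ("requests", "bs4")) -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = [f"missing package: {name}" for name in missing_packages(packages)]
    if not looks_like_http_url(url):
        problems.append(f"not an http(s) URL: {url}")
    return problems
```

For example, preflight("example.com") reports the missing scheme, which is a common cause of connection errors when a URL is typed without https://.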
