noxels / openai-scraper

This is a template repository for building a web scraper with OpenAI support. The repository provides a basic project structure with TypeScript and Puppeteer pre-configured, as well as OpenAI's GPT-3 API integration. With this template, you can easily build a scraper that uses machine learning to analyze and extract insights from the scraped data.

Dockerfile 23.22% TypeScript 76.78%
chatgpt data-science davinci openai puppeteer scraper web-scraper

openai-scraper's Introduction

Advanced OpenAI TypeScript Puppeteer Web Scraper with MySQL Integration

This TypeScript Puppeteer web scraper template integrates Puppeteer with a MySQL database and several Puppeteer plugins. It is set up for both development and production environments and goes beyond basic scraping with scheduled runs, headless browser operation, and error handling that captures screenshots for debugging. It is intended for developers who want a robust, scalable web scraping setup.

Features

  • Puppeteer Plugins Integration: Includes plugins like puppeteer-extra-plugin-anonymize-ua, puppeteer-extra-plugin-adblocker, puppeteer-extra-plugin-recaptcha, and puppeteer-extra-plugin-stealth for enhanced scraping capabilities.
  • Automated Scheduling: Utilizes node-cron for scheduling scraping tasks, customizable for different intervals.
  • Environment-Specific Configuration: Leverages .env files for differentiating between development and production environments.
  • MySQL Database Integration: Features integration with MySQL using a connection pool for efficient data handling.
  • Error Handling and Debugging: Advanced error handling with screenshot capabilities for debugging, along with options to open devtools and slow down Puppeteer operations for detailed inspection.
  • Automated Deployment: Includes a docker-compose file for automated deployment of the scraper. This will automatically build the scraper, a MySQL database, and a phpMyAdmin instance for database management.
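
The error-handling feature above (a screenshot taken when something fails) can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper name withScreenshotOnError, the ScreenshotCapable interface, and the file-name pattern are all assumptions. The interface only mirrors the part of Puppeteer's page.screenshot() call that the sketch needs, so the helper can be tried with a fake page.

```typescript
// Minimal structural type: only the part of Puppeteer's Page this sketch uses.
interface ScreenshotCapable {
  screenshot(options: { path: string; fullPage?: boolean }): Promise<unknown>;
}

// Hypothetical helper (not from the repository): runs `task` and, if it
// throws, saves a screenshot for debugging before re-throwing the error.
async function withScreenshotOnError<T>(
  page: ScreenshotCapable,
  label: string,
  task: () => Promise<T>,
): Promise<T> {
  try {
    return await task();
  } catch (error) {
    // Assumed naming scheme: error-<label>-<timestamp>.png in the working directory.
    const path = `error-${label}-${Date.now()}.png`;
    try {
      await page.screenshot({ path, fullPage: true });
    } catch {
      // If the screenshot itself fails (for example, the browser has
      // crashed), ignore that failure and keep the original error.
    }
    throw error;
  }
}
```

A scraping step could then be wrapped as withScreenshotOnError(page, "login", () => doLogin(page)), where doLogin stands for any step of your own. Depending on the Puppeteer version, the parameter type may need adjusting to accept a real Page.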

Getting Started

Prerequisites

  • Node.js installed on your system
  • A MySQL database already set up
  • Yarn or npm for dependency management

Installation

  1. Clone the repository or use the "Use this template" button on GitHub.

  2. Install the dependencies:

    yarn install
    # or
    npm install

Configuration

  1. Create the three env files .env, database.env, and phpmyadmin.env in the root directory.
  2. Add the necessary environment variables (as declared in the template.*.env files) to these env files, or set them as environment variables.
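
As a rough sketch of how such variables might be turned into a typed configuration that differs between development and production: every variable name below (NODE_ENV, DEVTOOLS, SLOW_MO, CRON_SCHEDULE, DB_HOST, DB_USER, DB_PASSWORD, DB_NAME) is an assumption made for illustration, so use the names declared in the template.*.env files instead. The function takes the environment as a parameter so it can be tried without any real env files.

```typescript
// All variable names here are hypothetical; the real ones are declared
// in the repository's template.*.env files.
interface ScraperConfig {
  isProduction: boolean;
  headless: boolean;      // run the browser without a window
  devtools: boolean;      // open devtools for inspection
  slowMo: number;         // delay in ms between Puppeteer operations
  cronExpression: string; // schedule that could be handed to node-cron
  db: { host: string; user: string; password: string; database: string };
}

type Env = Record<string, string | undefined>;

// Fail early with a clear message when a required variable is missing.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): ScraperConfig {
  const isProduction = env["NODE_ENV"] === "production";
  const slowMo = Number(env["SLOW_MO"] ?? "0");
  return {
    isProduction,
    // Assumed policy: headless in production, visible browser in development.
    headless: isProduction,
    devtools: !isProduction && env["DEVTOOLS"] === "true",
    slowMo: isProduction || isNaN(slowMo) ? 0 : slowMo,
    // Assumed default: once per hour, at minute 0.
    cronExpression: env["CRON_SCHEDULE"] ?? "0 * * * *",
    db: {
      host: requireVar(env, "DB_HOST"),
      user: requireVar(env, "DB_USER"),
      password: requireVar(env, "DB_PASSWORD"),
      database: requireVar(env, "DB_NAME"),
    },
  };
}
```

In a Node.js process this would typically be called as loadConfig(process.env) once the env files have been loaded, however the project does that.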

Local Usage

  • Compile the scraper:

    npm run compile
    # or
    npm run dev-compile # for continuous compilation

  • Run the scraper:

    yarn start
    # or
    npm start

Docker Usage

  • Build and start the scraper, the MySQL database, and the phpMyAdmin instance:

    docker-compose up

    Make sure to add the necessary environment variables to the database.env and phpmyadmin.env files.

TypeScript and Puppeteer Integration

  • TypeScript Support: Written in TypeScript for type safety and easier code management.
  • Puppeteer: Control headless Chrome or Chromium for web page navigation, interaction, and data extraction.

Customizing the Scraper

You can modify the scrape function in the scraper.ts file to add your custom scraping logic and interact with the MySQL database.
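
A custom scrape function might look roughly like the sketch below. The signature is an assumption, and the real scrape function in scraper.ts may take different arguments. PageLike and PoolLike are hypothetical stand-ins that mirror only the Puppeteer page methods (goto, title) and the mysql2 pool method (execute) used here, and the pages table with url and title columns is invented for the example.

```typescript
// Structural stand-ins for the parts of a Puppeteer Page and a mysql2
// promise pool that this sketch relies on.
interface PageLike {
  goto(url: string): Promise<unknown>;
  title(): Promise<string>;
}

interface PoolLike {
  execute(sql: string, values: unknown[]): Promise<unknown>;
}

// Hypothetical signature; adapt it to the scrape function in scraper.ts.
async function scrape(page: PageLike, pool: PoolLike, url: string): Promise<string> {
  // Navigate to the target page and read its title.
  await page.goto(url);
  const title = (await page.title()).trim();

  // The `?` placeholders keep scraped text out of the SQL string itself.
  // `pages` is a hypothetical table with `url` and `title` columns.
  await pool.execute("INSERT INTO pages (url, title) VALUES (?, ?)", [url, title]);

  return title;
}
```

Keeping the page and the pool as parameters makes the function easy to exercise with fakes before pointing it at a real browser and database.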

Contributing

Contributions are welcome! If you have suggestions for improvement or encounter any issues, feel free to open an issue or submit a pull request.


This template provides a solid foundation for building sophisticated web scrapers with TypeScript and Puppeteer, optimized for both development and production use. Enjoy your scraping journey!
