job-pulse-pure-scrape's Introduction

JobPulse.fyi Open Source Web Scraper

JobPulse.fyi tracks software engineering and product manager openings for students. This repository is part of the JobPulse.fyi project and scrapes job information from company websites using Google's API.

Features

  • Job search: Given a query and a website, the scraper searches for job listings that match the query.
  • Data extraction: The scraper visits each job listing page and extracts relevant data, such as job title, years of experience, company, application link, location, and job description.
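As a rough illustration of the job search step, the sketch below builds a site-restricted search request. It assumes the scraper talks to Google's Custom Search JSON API (suggested by the `GOOGLE_API_KEY` and `CX_KEY` settings, but not confirmed by this README); `build_search_url` is a hypothetical helper, not a function from this repository.

```python
from urllib.parse import urlencode

# Public endpoint of Google's Custom Search JSON API.
# Assumption: this README does not say which Google API the project calls.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query: str, site: str, api_key: str, cx: str, start: int = 1) -> str:
    """Return a search URL restricted to one site (hypothetical helper)."""
    params = {
        "key": api_key,               # Google API key
        "cx": cx,                     # search engine ID (CX_KEY)
        "q": f"{query} site:{site}",  # restrict results to one website
        "start": start,               # 1-based index of the first result
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values; no network request is made here.
    print(build_search_url("software engineer intern", "example.com", "KEY", "CX"))
```

The returned URL could then be fetched with `requests.get`, and each result link visited to extract the fields listed above.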

Getting Started

Prerequisites

  • Python 3.7 or above
  • Packages: BeautifulSoup, selenium, pytz, requests
  • Google API key
  • OpenAI API key

Installation

  1. Clone this repository:

  2. Install the required packages:

    pip install -r requirements.txt
  3. Get a Google API Key:

  4. Get an OpenAI API Key:

    • Follow the steps from OpenAI to get an API key.
  5. Set the environment variables:

    • Copy the .env.example file to a new file named .env and fill in the appropriate keys:

      GOOGLE_API_KEY=your_google_api_key
      CX_KEY=your_cx_key
      OPENAI_KEY=your_openai_key
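Before running the scraper, it can help to confirm that all three keys are actually set. The sketch below is one way to do that with the standard library only; `parse_env_text`, `load_env_file`, and `missing_keys` are hypothetical helpers, and the project itself may load `.env` differently.

```python
from pathlib import Path

# Key names taken from the .env example above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "CX_KEY", "OPENAI_KEY")


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict:
    """Read and parse an env file from disk."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(load_env_file())` from the repository root returns an empty list when every key has a value.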

Usage

  1. Modify the query and site variables in the main function as per your requirements.

  2. Run the code:

    python3 src/main.py --run_pure

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Please feel free to contact us if you have any questions about the project.

Join us on Discord: Discord Link

Happy Coding!

This README is subject to updates; please stay tuned for any changes.

jobPosting schema class

Mandatory:

  • apply_link: str
  • company: str
  • date_added: str
  • title: str

Optional:

  • description: str
  • location: str
  • category: str, e.g. "Software Engineer"
  • title_correct_by_gpt: bool, e.g. True
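The schema above could be expressed as a Python dataclass along the following lines. This is a minimal sketch, not the repository's actual class: the name `JobPosting`, the `None` defaults, and the `posting_to_dict` helper are assumptions, and the `category` and `title_correct_by_gpt` entries are read as example values of type `str` and `bool`.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JobPosting:
    # Mandatory fields: omitting any of these raises TypeError.
    apply_link: str
    company: str
    date_added: str
    title: str
    # Optional fields: default to None when not provided (an assumption).
    description: Optional[str] = None
    location: Optional[str] = None
    category: Optional[str] = None               # e.g. "Software Engineer"
    title_correct_by_gpt: Optional[bool] = None  # e.g. True


def posting_to_dict(posting: JobPosting) -> dict:
    """Convert a posting to a dict, leaving out optional fields that are unset."""
    return {key: value for key, value in asdict(posting).items() if value is not None}
```

For example, `posting_to_dict(JobPosting(apply_link="https://example.com/apply", company="Example", date_added="2023-01-01", title="SWE Intern"))` contains only the four mandatory keys.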
