mo-cmyk / goscrape

A universal scraping tool to acquire CS:GO demofiles from professional esports events provided by hltv.org

License: MIT License

Python 100.00%
counter-strike csgo hltv python webscraping

goscrape's Introduction

GoScrape: Universal hltv.org demofile scraper

GoScrape is a small open-source project I created to make it easy to bulk download demofiles for the FPS CS:GO from the popular CS:GO fan site hltv.org.

Installation in Python - PyPI release

GoScrape is on PyPI, so you can install it with pip.

  pip install goscrape

TL;DR

GoScrape consists of two main commands.

| command | description |
|---|---|
| events | Used in the first step to create a JSON lookup file containing structured information about CS:GO esports events in a given timeframe and, if specified, links to the associated matches and demofiles. |
| fetch | Builds on top of the events command and bulk downloads the demofiles listed in its JSON output. Alternatively, a single event id can be specified to download only the demofiles for that event. |

Getting Started

Events 🎮

| argument | datatype | description | notes |
|---|---|---|---|
| STARTDATE | string | The start date from which event data should be gathered, formatted as 'YYYY-MM-DD' | required |
| ENDDATE | string | The end date up to which event data should be gathered, formatted as 'YYYY-MM-DD' | required |
| STORAGEPATH | string | The directory or filepath where the resulting JSON should be stored | optional (default is cwd) |
| MATCHES | boolean | Whether match information and demofile URLs should be scraped as well. This flag is required if the resulting JSON file should be used for the fetch command | optional (True if present) |
| EVENT TYPE | enum | Which type of event data should be pulled (Online, Lan ...) | optional (default is online) |
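
Since STARTDATE and ENDDATE are passed as 'YYYY-MM-DD' strings, it can help to check them before starting a long scrape. The following is a minimal sketch of such a check, not GoScrape's own validation; the function name `parse_range` is hypothetical.

```python
from datetime import date, datetime


def parse_range(start: str, end: str) -> tuple[date, date]:
    """Parse two 'YYYY-MM-DD' strings and check that start is not after end.

    Hypothetical helper for illustration; not part of GoScrape.
    """
    fmt = "%Y-%m-%d"
    # strptime raises ValueError when a string cannot be parsed with this format.
    start_day = datetime.strptime(start, fmt).date()
    end_day = datetime.strptime(end, fmt).date()
    if start_day > end_day:
        raise ValueError(f"start date {start} is after end date {end}")
    return start_day, end_day


if __name__ == "__main__":
    first, last = parse_range("2022-04-20", "2022-04-21")
    print(first.isoformat(), "to", last.isoformat())
```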

The objects in the resulting JSON are keyed by their event id and look something like this:

{
  "6475": {
    "event_data": {
      "entity": "event",
      "event_id": "6475",
      "event_url": "https://www.hltv.org/events/6475/iem-dallas-2022-oceania-open-qualifier-2",
      "event_name_encoded": "iem-dallas-2022-oceania-open-qualifier-2",
      "event_name_full": "IEM Dallas 2022 Oceania Open Qualifier 2",
      "nr_of_teams": "8+",
      "prize": "Other",
      "event_type": "Online",
      "location": "Oceania (Online)",
      "event_start": "2022-04-20",
      "event_end": "2022-04-21"
    },
    "matches": [
      {
        "entity": "match",
        "teams": ["Paradox", "Aftershock"],
        "date_time": "2022-04-21 10:00:00",
        "match_url": "https://www.hltv.org//matches/2355881/paradox-vs-aftershock-iem-dallas-2022-oceania-open-qualifier-2",
        "demo_id": "71497",
        "demo_url": "https://www.hltv.org/download/demo/71497"
      }
    ]
  }
}
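
To work with the lookup file outside of GoScrape, the structure shown above can be flattened into one row per match. This is a minimal sketch that assumes the field names from the sample object; `list_demos` is a hypothetical helper, and the inlined `SAMPLE` is a trimmed copy of the sample so the snippet runs on its own.

```python
import json

# Trimmed copy of the sample object, inlined so the sketch is self-contained.
SAMPLE = json.loads("""
{
  "6475": {
    "event_data": {
      "event_id": "6475",
      "event_name_full": "IEM Dallas 2022 Oceania Open Qualifier 2"
    },
    "matches": [
      {
        "teams": ["Paradox", "Aftershock"],
        "demo_id": "71497",
        "demo_url": "https://www.hltv.org/download/demo/71497"
      }
    ]
  }
}
""")


def list_demos(lookup: dict) -> list[dict]:
    """Return one row per match with the event name, teams and demo URL."""
    rows = []
    for event_id, event in lookup.items():
        name = event["event_data"]["event_name_full"]
        # Assumption: "matches" may be absent when match data was not scraped.
        for match in event.get("matches", []):
            rows.append({
                "event_id": event_id,
                "event_name": name,
                "teams": " vs ".join(match["teams"]),
                "demo_url": match.get("demo_url"),
            })
    return rows


for row in list_demos(SAMPLE):
    print(row["event_name"], "|", row["teams"], "|", row["demo_url"])
```

For a real lookup file, `SAMPLE` would be replaced by the result of `json.load` on the file written by the events command.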

Fetch 💾

| argument | datatype | description | notes |
|---|---|---|---|
| EVENT ID | string or int | The id of the single event whose demofiles should be downloaded | required; EVENT ID and LOOKUP FILE are mutually exclusive, so only one can be used |
| LOOKUP FILE | string | The filepath of the lookup file generated by the events command that should be used for demo downloading | required; EVENT ID and LOOKUP FILE are mutually exclusive, so only one can be used |
| STORAGEPATH | string | The directory to which the demofiles should be written | optional (default is cwd) |
| MULTIPROCESSING | boolean | Whether multiprocessing should be utilized to speed up downloading | optional (True if present) |
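
The idea behind the MULTIPROCESSING option, spreading downloads across several workers, can be sketched as follows. This is not GoScrape's implementation: it uses a thread pool from `concurrent.futures` rather than processes, and `download_all` and `fake_fetch` are hypothetical names. `fake_fetch` only writes a placeholder file and never touches the network.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def download_all(
    demo_urls: Iterable[str],
    storage_path: str,
    fetch_one: Callable[[str, Path], Path],
    workers: int = 4,
) -> list[Path]:
    """Run fetch_one(url, target_dir) for every URL on a pool of workers."""
    target_dir = Path(storage_path)
    target_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(lambda url: fetch_one(url, target_dir), demo_urls))


def fake_fetch(url: str, target_dir: Path) -> Path:
    """Hypothetical stand-in for a real download; performs no network access."""
    demo_id = url.rstrip("/").rsplit("/", 1)[-1]
    path = target_dir / f"{demo_id}.txt"
    path.write_text(url)
    return path


if __name__ == "__main__":
    urls = ["https://www.hltv.org/download/demo/71497"]
    with tempfile.TemporaryDirectory() as tmp:
        for written in download_all(urls, tmp, fake_fetch):
            print(written.name)
```

Passing the download function in as a parameter keeps the pool logic independent of how a single file is actually fetched.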

Disclaimer

Neither this tool nor I have any affiliation with HLTV. I originally built this CLI to help me download demos for scientific research purposes, and I made it publicly available because I thought it might benefit others as well. If you download a lot of demos, the tool automatically adds a sleep time to avoid a temporary Cloudflare ban.
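
The kind of pause described above can be sketched as a small generator that sleeps after every batch of items. The batch size and pause length here are made-up values, not the ones GoScrape uses, and `throttled` is a hypothetical name.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    batch_size: int = 10,          # assumption: illustrative value only
    pause_seconds: float = 30.0,   # assumption: illustrative value only
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one by one, pausing after every `batch_size` items."""
    for count, item in enumerate(items, start=1):
        yield item
        # The pause runs when the consumer asks for the next item.
        if count % batch_size == 0:
            sleep(pause_seconds)


if __name__ == "__main__":
    demo_ids = ["71497", "71498", "71499"]
    for demo_id in throttled(demo_ids, batch_size=2, pause_seconds=0.1):
        print("would download demo", demo_id)
```

The `sleep` parameter exists so the pause can be replaced in tests without actually waiting.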

Changelog

Version 0.1.3 (2022.09.22)

  • Resolved an issue where the package failed to gather the file name of the fetched demo file

Version 0.1.2 (2022.05.30)

  • Bug fixes and improvements

Version 0.1.1 (2022.04.29)

  • Bug fixes on multiprocessed downloading

Version 0.1.0 (2022.04.24)

  • Initial release

Contributing

Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Issues

If you experience any issues, please message me or raise an issue on GitHub.
