auto-motor-sport-laptimes

Description

This is a website crawler that uses Geb to read data from the Auto motor und sport website and save it in a local MongoDB database. It uses only Chrome for crawling. The project is based on the Example Geb and Gradle project.

Gathered data

The crawler gathers data from Supertests. These tests are conducted on the Nürburgring Nordschleife ("North Loop") and on the Hockenheimring short course. The gathered data contains:

  • Car data - make, model, production years, gearbox type, layout (AWD, FWD, RWD), weight, engine power, engine torque
  • Test info - driver, test date, url
  • Test results - Nordschleife laptime, Hockenheim laptime, acceleration times 0-100 km/h and 0-200 km/h

A Supertest contains more data, but currently only the data mentioned above is crawled. Information about tyres and some test dates was added manually. If an article had no information about tyres, I checked other tests from Auto motor und sport or looked for articles on the internet.
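To make the shape of one gathered record easier to picture, here is a minimal sketch in Python. It is only an illustration, not the project's Groovy code: the class name, field names, and the choice of strings for laptimes are assumptions, and the values used with it should be treated as placeholders rather than real test results.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Drivetrain layouts named in the list above.
LAYOUTS = {"AWD", "FWD", "RWD"}


@dataclass
class SupertestRecord:
    """Hypothetical shape of one crawled Supertest (field names are assumptions)."""

    # Car data
    make: str
    model: str
    production_years: str
    gearbox: str
    layout: str
    # Test info
    driver: str
    url: str
    test_date: Optional[str] = None
    # Car data that an article may not state
    weight: Optional[float] = None
    engine_power: Optional[float] = None
    engine_torque: Optional[float] = None
    # Test results; laptimes are kept as the text found in the article
    nordschleife_laptime: Optional[str] = None
    hockenheim_laptime: Optional[str] = None
    acceleration_0_100: Optional[float] = None
    acceleration_0_200: Optional[float] = None
    # Added manually, not crawled
    tyres: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject layouts other than the three listed above.
        if self.layout not in LAYOUTS:
            raise ValueError(f"unknown layout: {self.layout!r}")

    def to_document(self) -> dict:
        """Return a plain dict that could be stored as one database document."""
        return asdict(self)
```

Fields that a test does not provide stay as None, which matches the fact that some Supertests have no track data at all.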

Usage

The crawler has these scripts:

  • importLinks - goes through the paged list of supertests and saves the URLs
  • verifyLinks - goes through the saved URLs and checks which supertests have test results (some supertests cover SUVs/off-road cars and contain no track data; some sports car tests have the relevant data only in the article text rather than in the test results, see Problems below)
  • readTestData - goes through the verified links, checks once more that the data exists, and reads it
  • readMissingTestData - goes through the tests that were marked as not having relevant data; the missing data is added manually. These tests were picked manually and are listed in Problems below
  • addMissingData - adds missing data from JSON (tyres, tyre spec, source of the tyre information, test date according to the test title)
  • generateWikiTable - generates a table for the wiki

Run them in this order:

gradle importLinks
gradle verifyLinks
gradle readTestData
gradle readMissingTestData
gradle addMissingData
gradle generateWikiTable
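If you prefer to run the whole sequence in one go, the order above can be scripted. The following is a minimal sketch in Python, assuming gradle is on the PATH and that a failed task returns a non-zero exit code; the function names are hypothetical and not part of the project.

```python
import subprocess
from typing import Callable, List, Sequence

# Task names in the order given above.
TASKS = [
    "importLinks",
    "verifyLinks",
    "readTestData",
    "readMissingTestData",
    "addMissingData",
    "generateWikiTable",
]


def run_gradle_task(task: str) -> int:
    """Run a single Gradle task and return its exit code."""
    return subprocess.run(["gradle", task]).returncode


def run_pipeline(
    tasks: Sequence[str] = TASKS,
    runner: Callable[[str], int] = run_gradle_task,
) -> List[str]:
    """Run tasks in order, stop at the first failure, return the completed ones."""
    completed: List[str] = []
    for task in tasks:
        if runner(task) != 0:
            break  # later tasks depend on earlier ones, so do not continue
        completed.append(task)
    return completed
```

Calling run_pipeline() with no arguments invokes gradle once per task. The runner parameter exists only so that the ordering logic can be tried out without Gradle installed.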

The importLinks script inserts links to new tests into the links collection. verifyLinks only updates existing data. readTestData inserts new data and can be restarted if it fails; it continues from where it stopped. To read the data from scratch, you must drop the collections named results and links.
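One way such a restartable read step can work is to skip every link that already has a stored result. The sketch below models the links and results collections with plain Python containers. It is an assumption about the general technique, not the project's actual implementation, and the field names url and verified as well as the fetch callback are hypothetical.

```python
from typing import Callable, Dict, List


def read_test_data(
    links: List[dict],
    results: Dict[str, dict],
    fetch: Callable[[str], dict],
) -> int:
    """Read data for each verified link that has no stored result yet.

    Returns the number of results inserted during this run.
    """
    inserted = 0
    for link in links:
        url = link["url"]
        # Skip unverified links and links that were read in an earlier run.
        if not link.get("verified") or url in results:
            continue
        # fetch may raise; results stored so far are kept for the next run.
        results[url] = fetch(url)
        inserted += 1
    return inserted
```

In this model, emptying results makes the next run read everything again, which mirrors the effect of dropping the results collection.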

Why gather this data

Data from Supertests, especially the Nordschleife laptimes, is quite reliable. Most of the tests are performed by one driver, Horst von Saurma, and the cars are in factory specification, including tyres and equipment. Automakers, by contrast, have their 'own definition' of factory/street-legal specification. These tests are published in sport auto magazine.

Why this technology stack

I wanted to become more familiar with Geb and to use MongoDB, which I hadn't had a chance to use before.

Problems

These test results were added manually:

These tests were added manually (whole test):

These tests don't have any test results:

Other issues

  • The Porsche 918 entry lists power from the combustion engine only
  • The Gradle tasks have some steps in common and could be refactored
  • No tyre types (semi, UHP, etc.)

Additional links with laptimes

Contributors

jtolkanowicz
