
jakob-bach / ds-lab-2021


The supervisor repo for the "Data Science Laboratory Course" at the Karlsruhe Institute of Technology (KIT), summer term 2021.

Home Page: https://dbis.ipd.kit.edu/english/3031_3044.php

License: MIT License

Python 100.00%
machine-learning teaching-materials data-science


Data Science Laboratory Course 2021

This is the supervisor repo of the "Data Science Laboratory Course" at KIT in 2021. Students worked on two subtasks: the Data Mining Cup 2021 (Task 1) and the verification of auction process models (Task 2).

The repo provides files for preparing the datasets, basic data exploration, course-internal splitting and scoring, as well as demo submissions.

Setup

We use Python 3.9.2. We recommend setting up a virtual environment to install the dependencies, e.g., with virtualenv:

python -m virtualenv -p <path/to/right/python/executable> <path/to/env/destination>

or with conda (which we used, version 4.10.0):

conda create --name ds-lab-2021 python=3.9.2

Next, install the dependencies with

python -m pip install -r requirements.txt

If you make changes to the environment and you want to persist them, run

python -m pip freeze > requirements.txt

We installed spyder-kernels into the environment, so you should be able to use the environment in the Spyder IDE (provided the versions of spyder-kernels and Spyder are compatible).

Task 1: Data Mining Cup 2021 (Task_1_DMC_2021/)

Preparation

Download the DMC task from the website. Place the three CSVs in a folder called data in the folder Task_1_DMC_2021.

Exploration

explore_data.py supports very basic interactive exploration (e.g., in an IDE).

Scoring

  • check_submission_validity.py checks whether submission files have the right format.
  • check_submission_identity.py checks whether identically-named submission files have the same content (= checks reproducibility).
  • prepare_manual_scoring.py prepares input files and output files for manual course-internal scoring of randomly sampled recommendations.
  • evaluate_manual_scoring.py reads output files of manual scoring and combines them.
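The reproducibility check described above can be sketched as follows. This is a minimal illustration, not the code of check_submission_identity.py: it assumes that each run's submissions are CSV files in their own directory, and the function names file_digest and find_mismatches are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_mismatches(directories: list[Path]) -> dict[str, list[Path]]:
    """Map each file name that occurs with differing content to its paths.

    Assumption: every directory holds the submissions of one run, and
    files with the same name are supposed to be byte-identical.
    """
    paths_by_name = defaultdict(list)
    for directory in directories:
        for path in sorted(directory.glob("*.csv")):
            paths_by_name[path.name].append(path)
    return {
        name: paths
        for name, paths in paths_by_name.items()
        if len({file_digest(path) for path in paths}) > 1
    }
```

An empty result means that all identically-named files match. Comparing digests of raw bytes is strict: files that differ only in line endings count as different.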

Demo Submissions

We provide two simple demo submission scripts that produce solutions observing the DMC submission format:

  • recommend_global_favorites.py: Ignores the item to evaluate and recommends globally popular items.
  • recommend_cooccurring_favorites.py: Recommends the items that are most popular in sessions containing the item to evaluate.
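The two baseline ideas can be sketched in a few lines. This is an illustration under assumptions, not the code of the demo scripts: sessions are represented as plain lists of item IDs (the real scripts read the DMC CSVs), popularity is measured as the number of sessions containing an item, and ties are broken by item ID.

```python
from collections import Counter


def top_items(counts: Counter, n: int) -> list:
    """Return the n most frequent items, breaking ties by item ID."""
    return sorted(counts, key=lambda item: (-counts[item], item))[:n]


def recommend_global_favorites(sessions: list, n: int = 3) -> list:
    """Recommend the items that occur in the most sessions overall."""
    counts = Counter(item for items in sessions for item in set(items))
    return top_items(counts, n)


def recommend_cooccurring_favorites(sessions: list, query_item, n: int = 3) -> list:
    """Recommend the items that most often share a session with query_item."""
    counts = Counter()
    for items in sessions:
        unique_items = set(items)
        if query_item in unique_items:
            counts.update(unique_items - {query_item})
    return top_items(counts, n)


# Hypothetical toy data: each inner list is one session.
sessions = [[1, 2, 3], [1, 2], [2, 3], [3, 4], [1, 2, 4]]
print(recommend_global_favorites(sessions, n=2))
print(recommend_cooccurring_favorites(sessions, query_item=1, n=2))
```

In this sketch, an item that appears in no session gets an empty recommendation list from the co-occurrence approach; a practical script would need some fallback for that case.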

Distributed Submission

For the DMC submission, we combine multiple submissions that were created by the teams' pipelines. To this end, we let the participants manually decide which of the submissions to use for each item. As making this decision for the full test set would require a lot of effort, we distribute the manual work over all participants.

  • prepare_distributed_solution.py prepares input files and output files for the manual selection process.
  • combine_distributed_solution.py reads output files of manual selection as well as submission files and combines them.
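One way to picture the distributed selection is the sketch below. It is not the logic of the two scripts above: the round-robin split, the fixed seed, and the names assign_items and combine_choices are assumptions made for illustration.

```python
import random


def assign_items(item_ids, participants, seed=25):
    """Shuffle the items and deal them out round-robin to participants."""
    shuffled = list(item_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    assignment = {participant: [] for participant in participants}
    for position, item in enumerate(shuffled):
        assignment[participants[position % len(participants)]].append(item)
    return assignment


def combine_choices(choices, submissions):
    """Build the final solution from per-item choices.

    choices: item ID -> name of the submission chosen for that item
    submissions: submission name -> {item ID -> recommended items}
    """
    return {item: submissions[name][item] for item, name in choices.items()}


# Hypothetical toy data.
assignment = assign_items(range(10), ["ann", "bob", "cem"])
submissions = {
    "team_a": {1: [7, 8, 9], 2: [4, 5, 6]},
    "team_b": {1: [1, 2, 3], 2: [3, 2, 1]},
}
final = combine_choices({1: "team_b", 2: "team_a"}, submissions)
```

Dealing out the shuffled items round-robin keeps the workload per participant within one item of each other.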

Task 2: Verification of Auction Process Models (Task_2_Auction_Verification/)

For some background on the scenario, you can read

Ordoni, E., Mülle, J., & Böhm, K. (2020). Verification of Data-Value-Aware Processes and a Case Study on Spectrum Auctions.

Our datasets are based on the verification of process models mimicking the German 4G Spectrum Auction, featuring six products and four bidders. We have two datasets, which have nearly the same columns but differ in the domain of their data values. That is, one dataset has a higher range of prices than the other one and therefore has more data objects. Also, the interpretation of the formula (= property) to be verified differs slightly between the two datasets. In the smaller dataset (auction_verification), each price is encoded as a separate data value. In the larger dataset (auction_verification_large), prices are encoded in binary, which results in longer formulas.

Preparation

Obtain the raw small dataset Process4.csv and the raw partitions of the large dataset result[0-6].csv. Place them in a folder called data in the folder Task_2_Auction_Verification. Run prepare_data.py to create student-friendly, pre-processed versions of the datasets.
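Before pre-processing, the partitions of the large dataset have to be read as one table. A minimal sketch of that step, using only the standard library, could look like this. It is not taken from prepare_data.py: it assumes comma-separated files that all share the same header row, and the function name read_partitions is hypothetical.

```python
import csv
from pathlib import Path


def read_partitions(data_dir, pattern="result[0-6].csv"):
    """Read all partition files matching pattern into one list of row dicts.

    Assumptions: comma-separated files whose header rows are identical.
    """
    rows = []
    header = None
    for path in sorted(Path(data_dir).glob(pattern)):
        with path.open(newline="") as file:
            reader = csv.DictReader(file)
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                raise ValueError(f"{path.name} has a different header")
            rows.extend(reader)
    return rows
```

Checking the header of each partition guards against silently mixing files with different columns.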

Exploration

explore_data.py supports basic interactive exploration (e.g., in an IDE) and predictions.
