stefan-grafberger

followers: 46 following: 90 repos: 23 gists: 1

Name: Stefan Grafberger

Type: User

Company: University of Amsterdam

Bio: I am a Ph.D. student at BIFOLD & TU Berlin, conducting research at the intersection of data management and machine learning.

Twitter: SGrafberger

Location: Amsterdam

Blog: https://stefan-grafberger.com

Stefan Grafberger's Projects

csvmatch

🔎 Finds fuzzy matches between CSV spreadsheets

dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

jenga

Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.

latex-make-action

Action for compiling LaTeX with make

learnedcardinalities

Code and workloads from the Learned Cardinalities paper (https://arxiv.org/abs/1809.00677)

ml-pipeline-datasets

Some datasets for ML pipelines that I want to use for some experiments

mlinspect

Inspect ML Pipelines in Python in the form of a DAG

mlinspect-cidr

Inspect ML Pipelines in Python in the form of a DAG (CIDR Submission version)

mlinspect-demo

mlinspect-exploratory-user-study

The files for an initial exploratory user study. It provides the foundation for a larger user study in future work.

mlwhatif

Data-Centric What-If Analysis for Native Machine Learning Pipelines

noworkflow

Supporting infrastructure to run scientific experiments without a scientific workflow management system.

pgbm

Probabilistic Gradient Boosting Machines

plantestic

shadow-pipeline-experiments

st-cytoscape

A fork to add dagre layout support

streamdq

StreamDQ is a library built on top of Apache Flink for defining "unit tests for data", which measure data quality in large data streams.
