
hauxlee / hybrid-error-data-cleaning-technology-for-human-in-the-loop


The framework includes detection and repair methods for two common data quality problems: missing points and outliers. It lets human experts continuously adjust the dynamic reasonable value range of the data, so that outliers are detected and repaired more accurately. The aim is to reduce labor costs while improving data cleaning quality.

Python 100.00%

hybrid-error-data-cleaning-technology-for-human-in-the-loop's Introduction

I. Project Overview

This study introduces a hybrid human-in-the-loop framework for cleaning erroneous data. The framework comprises methods for detecting and repairing two common data quality issues: missing points and outliers. The algorithm first assesses whether it can correct an anomaly automatically. If the automatic repair does not meet the confidence threshold, the algorithm consults human experts, involves them in the repair, and records their opinions. This approach lets human experts continuously adjust the dynamic reasonable value range of the data, so that anomalies are detected and repaired more accurately. The aim is to improve data cleaning quality while reducing manual labor costs.

The experiments measure the classification accuracy of the proposed cleaning framework under different ratios of noisy data, and its regression accuracy after changes in the reasonable value environment. They also examine how classification accuracy and regression prediction accuracy change when multiple variables change. The results show that the framework has good application value in improving data quality when the variables meet the requirements.
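The decision flow described above can be sketched as follows. This is a minimal illustration, not the repository's code: `RepairLog`, `repair_value`, the two callbacks, and the threshold of 0.8 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# All names here are hypothetical; they are not taken from the repository.


@dataclass
class RepairLog:
    """Stores the opinions given by the human expert."""
    entries: List[Tuple[int, float, float]] = field(default_factory=list)

    def record(self, index: int, old_value: float, new_value: float) -> None:
        self.entries.append((index, old_value, new_value))


def repair_value(
    index: int,
    value: float,
    auto_repair: Callable[[int, float], Tuple[float, float]],
    ask_expert: Callable[[int, float, float], float],
    log: RepairLog,
    threshold: float = 0.8,  # assumed value, for illustration only
) -> float:
    """Repair one anomalous value, falling back to the expert when unsure."""
    candidate, confidence = auto_repair(index, value)
    if confidence >= threshold:
        return candidate  # the automatic repair is trusted
    expert_value = ask_expert(index, value, candidate)
    log.record(index, value, expert_value)  # keep the expert's opinion
    return expert_value


# Stand-in callbacks so the sketch runs without a model or a real user.
def fake_auto_repair(index: int, value: float) -> Tuple[float, float]:
    confidence = 0.9 if index < 2 else 0.4
    return 20.0, confidence


def fake_ask_expert(index: int, value: float, candidate: float) -> float:
    return 25.0


log = RepairLog()
auto_result = repair_value(0, 999.0, fake_auto_repair, fake_ask_expert, log)
expert_result = repair_value(5, 999.0, fake_auto_repair, fake_ask_expert, log)
print(auto_result, expert_result, log.entries)
```

In this sketch only the low-confidence case reaches the expert, which is how the framework is meant to reduce manual work; the recorded entries could later be used to adjust the reasonable value range.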

II. Execution Environment

| Software | Version |
| --- | --- |
| Windows | Windows 10 |
| Python | Python 3.7 |

III. Software Setup

Execution Software: PyCharm

Configuration:

Python packages: pandas, keras, tensorflow, scikit-learn (imported as sklearn), numpy

IV. Program Introduction

| Module | Module Description |
| --- | --- |
| main | Run this module to start the entire cleaning framework |
| data | Data operations |
| MissingDataOperation | Check and repair missing points |
| OutlierDataOperation | Check and repair outliers |
| AutomaticCorrectData | Machine learning-based automatic repair |
| UserOperation | User-provided repair suggestions |
| RecordUserOperation | Record user opinion logs |
| Experiment1_MakeData | Generate data for Experiment 1 |
| Experiment1_TestData | Test results of Experiment 1 |
| Experiment2_MakeData | Generate data for Experiment 2 |
| Experiment2_TestData | Test results of Experiment 2 |
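To make the two detection tasks concrete, here is a small sketch of how missing points and out-of-range values might be located, and how an expert-adjusted value range changes the result. The functions `find_missing` and `find_outliers` and the sample readings are hypothetical; the repository's MissingDataOperation and OutlierDataOperation modules may work differently.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only.


def find_missing(values: List[Optional[float]]) -> List[int]:
    """Return the positions whose value is None or NaN."""
    return [
        i for i, v in enumerate(values)
        if v is None or (isinstance(v, float) and math.isnan(v))
    ]


def find_outliers(
    values: List[Optional[float]], valid_range: Tuple[float, float]
) -> List[int]:
    """Return the positions of present values that fall outside valid_range."""
    low, high = valid_range
    missing = set(find_missing(values))
    return [
        i for i, v in enumerate(values)
        if i not in missing and not (low <= v <= high)
    ]


readings = [21.5, None, 22.0, 95.0, float("nan"), -40.0]

missing_idx = find_missing(readings)
outliers_before = find_outliers(readings, (0.0, 50.0))
# Suppose an expert widens the reasonable value range.
outliers_after = find_outliers(readings, (-50.0, 50.0))

print(missing_idx, outliers_before, outliers_after)
```

With the wider range, the reading of -40.0 is no longer flagged, which illustrates why a range that experts can adjust over time affects which points are treated as outliers.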

V. Usage Instructions

Run main.py to start the entire cleaning framework.

Run Experiment1_MakeData.py to generate data for Experiment 1.

Run Experiment1_TestData.py to test the results of Experiment 1.

Run Experiment2_MakeData.py to generate data for Experiment 2.

Run Experiment2_TestData.py to test the results of Experiment 2.
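For each experiment, the data generation script is run before the test script. A small driver along these lines could run both steps in order. It is a sketch under assumptions: the helper names are hypothetical, and it assumes the scripts take no command-line arguments and are run from the repository root.

```python
import subprocess
import sys
from typing import Dict, List

# Script names come from the usage instructions above;
# the helpers themselves are hypothetical.
EXPERIMENT_SCRIPTS: Dict[int, List[str]] = {
    1: ["Experiment1_MakeData.py", "Experiment1_TestData.py"],
    2: ["Experiment2_MakeData.py", "Experiment2_TestData.py"],
}


def build_commands(experiment: int) -> List[List[str]]:
    """Build one command per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in EXPERIMENT_SCRIPTS[experiment]]


def run_experiment(experiment: int) -> None:
    """Run data generation, then testing; stop if either step fails."""
    for command in build_commands(experiment):
        subprocess.run(command, check=True)


# Show the planned order without executing anything.
planned = [command[1] for command in build_commands(1)]
print(planned)
```

Calling `run_experiment(1)` or `run_experiment(2)` from the repository root would then execute the corresponding pair of scripts.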
