ai-se / best-practice-se-text-mining

Best-Practice-SE-text-mining

For BIGDSE ’16

Notes

Understand the relevance of SE processes for ML/Big Data
There has been a constant push to use ML in SE. But what about SE for ML?
We'd like to explore what SE can teach ML.
Big data and ML practitioners have a variety of tools at their disposal; as these grow in size, a validation team is required.
The Mythical Man-Month: 1/2 of the time is spent on testing.
Coding takes only 1/6 of the entire time.
Industrial data mining has taught us the significance of the goal of a given task.
Key takeaway: Your goals are not my goals.

DM at LN

The problem is indeed unique. There is no dearth of data, but labeling data is quite expensive.
Emulating real-world data is hard; forums such as Stack Exchange can be used to address this issue.
TAR is primarily a binary classification task.
Using StackEx data at site-level granularity produces a satisfactory analogy to the real problem at hand.
Binary classification of this sort is vastly different from other techniques. This enables us to take shortcuts.
These lessons are by no means general; we only endeavor to highlight the challenges in industrial data mining.
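
One way to read the site-level analogy above is as a labeling rule: posts from one chosen site count as "relevant" and posts from every other site as "not relevant", which yields a binary task similar in shape to TAR. The sketch below illustrates that reading only. The function name `label_by_site`, the choice of `tex` as the target site, and the toy post texts are all assumptions for illustration, not the project's actual pipeline.

```python
from typing import List, Tuple


def label_by_site(posts: List[Tuple[str, str]], target_site: str) -> List[Tuple[str, int]]:
    """Convert (site, text) pairs into (text, label) pairs.

    Label is 1 when the post comes from target_site, otherwise 0.
    (Hypothetical helper; the labeling rule itself is an assumption.)
    """
    return [(text, 1 if site == target_site else 0) for site, text in posts]


# Toy, made-up posts; only the site names come from the data set list.
posts = [
    ("tex", "How do I align equations?"),
    ("physics", "Why is the sky blue?"),
    ("tex", "Custom page margins"),
    ("scifi", "Name of a short story about robots"),
]

labeled = label_by_site(posts, target_site="tex")
labels = [label for _, label in labeled]
```

Under this rule, labels come for free from site membership, which sidesteps the expensive manual labeling mentioned above.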

Structure: feel free to modify this

Abstract

Introduction

Motivations and background
Description of Data
Related works

Technology Assisted Review

My goals are not your goals.
My data isn't your data
Describe Prec/Recall and their importance
StackEx data
Prevalence
Sampling - Stratified sampling, Unequal Sampling
Big data sometimes isn't
Challenges in EDISC: See p13 of the reference.
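
For the precision/recall and prevalence items in the list above, a minimal sketch of the standard definitions for a binary task may help when drafting that section. The function names and the toy label vectors are illustrative assumptions, not taken from the project's code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 means relevant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted-relevant items that are truly relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly relevant items that were retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def prevalence(y_true: Sequence[int]) -> float:
    """Fraction of all items that are truly relevant."""
    return sum(y_true) / len(y_true) if len(y_true) else 0.0


# Toy example: 2 relevant documents out of 10.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
p = precision(tp, fp)
r = recall(tp, fn)
prev = prevalence(y_true)
```

When prevalence is low, accuracy alone is misleading: in this toy data, predicting "not relevant" for everything would be 80% accurate while retrieving none of the relevant documents, which is why precision and recall matter here.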

Experiments

Best Decision
All other decisions compared with the best one

Discussions

some words to justify the best decision
lessons learnt from this project
The role of validation teams:
- Mining large industrial data has significant lessons that can be learnt from SE practices
validity threats

Conclusion

best-practice-se-text-mining's Issues

Experiments: Comparison of results - the validation part

Discussion1: some words to justify the best decision

Method

Discussion2: lessons learnt

Discussion3: validity threats

Abstract and Conclusion

Introduction3: related works

Camera Ready Updates

Section 3
Reviewer 2
Reviewer 3

Introduction2: Data Description

Describe why we are using StackExchange data and what it is like.
Especially for the SExx data sets: how we get them and the characteristics of the data sets.

Data sets that will be used in this paper:

tex drupal academia apple gamedev rpg english electronics physics scifi SE0 SE1 SE2 SE3 SE4 SE5 SE6 SE7 SE8 SE9 SE10 SE11 SE12 SE13 SE14

Introduction1: motivations, backgrounds...
