

Best-Practice-SE-text-mining

For BIGDSE ’16

Notes

  • Understand the relevance of SE processes for ML/Big Data

  • There has been a constant push for using ML in SE. But what about SE for ML?

  • We'd like to explore what SE can teach ML

  • Big data and ML practitioners have a variety of tools at their disposal; as these grow in size, a validation team becomes necessary.

  • The Mythical Man-Month: 1/2 of the time is spent on testing.

  • Coding takes only 1/6 of the total time.

  • Industrial data mining has taught us the significance of the goal of a given task.

  • Key takeaway: Your goals are not my goals.

DM at LN

  • The problem is indeed unique. There is no dearth of data, but labeling data is quite expensive.
  • Emulating real-world data is hard; forums such as Stack Exchange can be used to address this issue.
  • TAR (Technology Assisted Review) is primarily a binary classification task.
  • Using StackEx data at site-level granularity produces a satisfactory analogy to the real problem at hand.
  • Binary classification of this sort is vastly different from other techniques. This enables us to take shortcuts.
  • These lessons are by no means general; we only endeavor to highlight the challenges in industrial data mining.
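
As a rough illustration of the site-level analogy above, the sketch below assumes that "site-level granularity" means treating every post from one chosen Stack Exchange site as relevant and posts from all other sites as not relevant. That reading, the toy posts, and the helper names `label_by_site` and `prevalence` are assumptions made for illustration, not part of the project.

```python
# Hypothetical toy posts as (site, text) pairs; site names are illustrative only.
posts = [
    ("cooking", "How long should I rest a steak?"),
    ("cooking", "Which flour works best for sourdough?"),
    ("physics", "Why is the sky blue?"),
    ("travel", "What are the visa rules for a layover?"),
    ("physics", "What is a geodesic?"),
]


def label_by_site(posts, target_site):
    """Label a post 1 (relevant) if it comes from target_site, else 0."""
    return [(text, 1 if site == target_site else 0) for site, text in posts]


def prevalence(labeled):
    """Fraction of documents that carry the relevant label."""
    return sum(label for _, label in labeled) / len(labeled)


labeled = label_by_site(posts, "cooking")
print(prevalence(labeled))
```

Choosing a different target site changes the prevalence of the relevant class, which gives a cheap way to emulate review tasks of varying difficulty without paying for manual labels.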

Structure: feel free to modify this

Abstract

Introduction

  • Motivations and background
  • Description of Data
  • Related works

Technology Assisted Review

  1. My goals are not your goals.

  2. My data isn't your data

  3. Describe Prec/Recall and their importance

  4. StackEx data

  5. Prevalence

  6. Sampling - Stratified sampling, Unequal Sampling

  7. Big data sometimes isn't

  8. Challenges in EDISC: See p13 of the reference.
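
The precision/recall and sampling items in the outline above can be sketched in a few lines of plain Python. This is a minimal sketch, not the project's implementation: the function names `precision_recall` and `sample_by_class`, the per-class `fractions` parameter, and the toy labels are all assumptions. Stratified sampling here means drawing the same fraction from every class; unequal sampling means giving each class its own fraction, for example keeping every rare relevant document.

```python
import random
from collections import defaultdict


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 marks the relevant class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sample_by_class(labels, fractions, seed=0):
    """Sample indices per class; fractions maps each class label to its sampling rate."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label, indices in by_class.items():
        k = min(len(indices), round(len(indices) * fractions[label]))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy data with low prevalence: 2 relevant documents out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(precision_recall(y_true, y_pred))

# Stratified: the same rate for both classes preserves the prevalence.
stratified = sample_by_class(y_true, {0: 0.5, 1: 0.5})
# Unequal: keep every relevant document, thin out the non-relevant ones.
unequal = sample_by_class(y_true, {0: 0.25, 1: 1.0})
print(stratified, unequal)
```

With low prevalence, a sample drawn at one uniform rate may contain very few relevant documents, which is one reason to consider unequal rates when labels are expensive.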

Experiments

  • Best Decision
  • All other decisions compared with the best one

Discussion

  • some words to justify the best decision
  • lessons learnt from this project
  • The role of validation teams:
    • Mining large industrial data offers significant lessons that can be learnt from SE practices
  • validity threats

Conclusion
