ml-smores / fast


Feature Aware Student knowledge Tracing Toolkit. Implements HMMs with features for modeling student performance

Home Page: http://ml-smores.github.io/fast/

License: GNU General Public License v2.0

Java 99.32% Ruby 0.08% Python 0.60%

fast's Introduction

FAST: Feature-Aware Student knowledge Tracing

This is the repository of FAST, an efficient toolkit for modeling time-changing student performance ([González-Brenes, Huang, and Brusilovsky, 2014](http://educationaldatamining.org/EDM2014/uploads/procs2014/long%20papers/84_EDM-2014-Full.pdf)). FAST is an alternative to the [BNT-SM toolkit](http://www.cs.cmu.edu/~listen/BNT-SM/), a toolkit that requires the researcher to design a different Bayes Net for each feature set they want to prototype. The FAST toolkit is up to 300x faster than BNT-SM, and much simpler to use.

We presented the model at the 7th International Conference on Educational Data Mining (2014) (see the [slides](http://www.cs.cmu.edu/~joseg/files/fast_presentation.pdf)), where it was selected as one of the top 5 paper submissions.

Technical Details

FAST learns a separate set of parameters for each skill using an HMM with Features ([Berg-Kirkpatrick et al, 2010](http://www.cs.berkeley.edu/~tberg/papers/naaclhlt2010.pdf)).

Running FAST

Quick Start

  1. Download the latest release [here](https://github.com/ml-smores/fast/releases).
  2. Decompress the file. It includes sample data for getting you started quickly.
  3. Open a terminal, `cd` to the directory that contains the fast-2.1.1-final.jar file, and run:
    ```
    java -jar fast-2.1.1-final.jar ++data/IRT_exp/FAST+IRT1.conf
    ```

Congratulations! You just trained a student model (with IRT features) using state-of-the-art technology.

Please see the Wiki for more information.

Please cite our work (and provide the link https://github.com/ml-smores/fast) if you use our tool in your published papers: González-Brenes, J. P., Huang, Y., & Brusilovsky, P. (2014). General features in knowledge tracing: applications to multiple subskills, temporal item response theory, and expert knowledge. In Proc. 7th Int. Conf. on Educational Data Mining (pp. 84-91).

Contact us

We would love to hear your feedback. Please [email us](mailto:[email protected])!

Thanks, Yun, Jose, and Peter

fast's Issues

Bug in basicModelName

The two variables "basicModelName" and "modelName" are not self-explanatory.

Moreover, the modelName field requires the string "FAST" or "KT", which is not documented in the wiki.

Firecracker needs (from Elliott)

@e-bartsch

  1. how to make sure knowledge inference doesn't decrease with correct response (output guess+slip<1 for all records)
  2. the update is insensitive
  3. initializing by reasonable bounds (now random)
  4. constrain the search
  5. what's the impact of features for updating (e.g., item)?
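For item 1, the check can be sketched as follows. This is a minimal illustration under the standard two-state knowledge tracing update, not FAST's own code; the class and method names are hypothetical. With features, guess and slip may differ per record, so the check would be applied to each record's values.

```java
// Hypothetical sketch; names and numbers are illustrative, not taken from FAST.
final class KnowledgeUpdateSketch {

    /** Posterior P(known | response) for a two-state model with guess and slip. */
    static double posterior(double pKnown, double guess, double slip, boolean correct) {
        double likeKnown = correct ? 1.0 - slip : slip;      // P(response | known)
        double likeUnknown = correct ? guess : 1.0 - guess;  // P(response | not known)
        double numerator = pKnown * likeKnown;
        return numerator / (numerator + (1.0 - pKnown) * likeUnknown);
    }

    /** True when a correct response cannot lower the knowledge estimate. */
    static boolean isNonDegenerate(double guess, double slip) {
        return guess + slip < 1.0;
    }

    public static void main(String[] args) {
        double prior = 0.4;
        // guess + slip < 1: the estimate rises after a correct answer.
        System.out.println(posterior(prior, 0.2, 0.1, true) > prior);
        // guess + slip > 1: the estimate falls after a correct answer.
        System.out.println(posterior(prior, 0.7, 0.6, true) < prior);
    }
}
```

The posterior after a correct answer is at least the prior exactly when 1 - slip >= guess, which is why the condition is stated as guess + slip < 1.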

Some conveniences for users

-- remove "++" for specifying ".conf" files
-- explain running multiple files better
-- explain the naming conventions better
-- give specific error types for users to debug the input files.
-- outcome should only be "correct|incorrect"
-- in the input header, a column name shouldn't have spaces between words.

Problematic feature

The problem came from one feature (number of turns) that was not binary but increased over multiple turns (1, 2, 3, etc.). I haven't figured out why yet, but when I turned this into a binary feature (new turn = 0, retry = 1), FAST worked. I think it has something to do with the ordering of the data, but I am still working on it.

[12/22] output student and KC id for matching the test set

  1. I did a join of the input sequence and the prediction file for one of the test sets (attached), and I think something is wrong. The actual label fields are not consistent with the outcome field (maybe I am misunderstanding this, or maybe the prediction file is sorted in some way). Can you look into this?

-- Rohit

[12/22] EvaluationGeneral should allow testXX or testXXX in addition to testX

  1. In EvaluationGeneral.java:1268
    String foldIDStr = fileName.substring(pos - 1, pos);
    assumes the number of folds will be a single digit.
    Changed it to
    String foldIDStr = fileName.substring(fileName.indexOf(standardFilePrefix) + standardFilePrefix.length()).replace(suffix, "");

With this fix, I was able to run a 40-fold (leave one student out) KT experiment (10 KCs). Output file is attached.
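The effect of the change can be reproduced in isolation. The sketch below mirrors the replacement expression; the class name, the "test" prefix, and the ".txt" suffix are placeholders, since the actual values of standardFilePrefix and suffix are not shown here. The guard for a missing prefix is an addition that is not part of the fix above.

```java
// Hypothetical stand-alone version of the fold ID extraction.
final class FoldIdSketch {

    /** Returns the text between the first occurrence of prefix and the suffix. */
    static String foldId(String fileName, String prefix, String suffix) {
        int pos = fileName.indexOf(prefix);
        if (pos < 0) {
            // Without this guard, indexOf's -1 would silently shift the start index.
            throw new IllegalArgumentException("prefix not found in " + fileName);
        }
        return fileName.substring(pos + prefix.length()).replace(suffix, "");
    }

    public static void main(String[] args) {
        // "test" and ".txt" are placeholder values, not FAST's actual settings.
        System.out.println(foldId("test7.txt", "test", ".txt"));
        System.out.println(foldId("test39.txt", "test", ".txt"));
    }
}
```

Unlike `substring(pos - 1, pos)`, this keeps every character between the prefix and the suffix, so fold IDs of any length survive.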

-- Rohit

[12/22] print out knowledge state

Is there a way to print out (if it isn’t already printed) the “probability of being in the known state of the skill”? I am thinking it would be useful to correlate this with independent measures of knowledge we have.

The question about measuring other probabilities is still open. I suppose we can add an additional List return value to the doPredict function in Predict.java:111 and implement a calculator for it in the testsequence loop in that function. I am not very familiar with the HMM implementation you are using. If you could make this modification (or let me know what I can change), that would help.
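As a starting point for discussion, here is a rough sketch of the quantity being asked for: the filtered probability of each hidden state after every observation, computed with a normalized forward pass. It is independent of FAST's HMM classes; the class name, method signature, and all numbers are assumptions made for illustration. The per-step likelihoods are passed in already evaluated, because with features they can change from one record to the next.

```java
// Hypothetical sketch of forward filtering; not the code in Predict.java.
final class KnownStateSketch {

    /**
     * @param initial       initial[s] = P(state s) before the first observation
     * @param transition    transition[from][to] = P(to | from)
     * @param obsLikelihood obsLikelihood[t][s] = P(observation at step t | state s)
     * @return posterior[t][s] = P(state s at step t | observations up to step t)
     */
    static double[][] filter(double[] initial, double[][] transition,
                             double[][] obsLikelihood) {
        int steps = obsLikelihood.length;
        int states = initial.length;
        double[][] posterior = new double[steps][states];
        double[] predicted = initial.clone();
        for (int t = 0; t < steps; t++) {
            // Weight the predicted state distribution by the observation likelihood.
            double norm = 0.0;
            for (int s = 0; s < states; s++) {
                posterior[t][s] = predicted[s] * obsLikelihood[t][s];
                norm += posterior[t][s];
            }
            for (int s = 0; s < states; s++) {
                posterior[t][s] /= norm;
            }
            // Propagate through the transition matrix to get the next step's prior.
            double[] next = new double[states];
            for (int from = 0; from < states; from++) {
                for (int to = 0; to < states; to++) {
                    next[to] += posterior[t][from] * transition[from][to];
                }
            }
            predicted = next;
        }
        return posterior;
    }

    public static void main(String[] args) {
        // State 0 = not known, state 1 = known. All numbers are made up.
        double[] initial = {0.6, 0.4};
        double[][] transition = {{0.8, 0.2}, {0.0, 1.0}};
        // Step 0 is a correct answer (guess, 1 - slip); step 1 is incorrect.
        double[][] obsLikelihood = {{0.2, 0.9}, {0.8, 0.1}};
        for (double[] row : filter(initial, transition, obsLikelihood)) {
            System.out.println("P(known) = " + row[1]);
        }
    }
}
```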

-- Rohit

Bug in calculating probability of observation?

In OpdfContextAwareLogistic, the probability is calculated in linear space:

    double logit = 0.0;
    for (int i = 0; i < featureWeights.length; i++) {
        // System.out.println("feature weight:" + featureWeights[i] + ",value:"
        // + featureValues[i]);
        logit += featureWeights[i] * featureValues[i];
    }

Shouldn't this be in loglinear space?
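For reference while discussing this: in a logistic model the weighted sum is the log-odds, and a sigmoid then maps it to a probability. Whether OpdfContextAwareLogistic applies that mapping after the loop is not visible in the excerpt above, so the sketch below only shows the usual construction, with hypothetical class and method names.

```java
// Hypothetical sketch of the usual logistic construction; not FAST's code.
final class LogisticSketch {

    /** Weighted sum of feature values, i.e. the log-odds of the outcome. */
    static double logit(double[] featureWeights, double[] featureValues) {
        if (featureWeights.length != featureValues.length) {
            throw new IllegalArgumentException("weights and values differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < featureWeights.length; i++) {
            sum += featureWeights[i] * featureValues[i];
        }
        return sum;
    }

    /** Sigmoid: maps log-odds on the real line to a probability in (0, 1). */
    static double probability(double logit) {
        return 1.0 / (1.0 + Math.exp(-logit));
    }

    public static void main(String[] args) {
        double[] weights = {0.5, -1.0};
        double[] values = {1.0, 1.0};
        System.out.println(probability(logit(weights, values)));
    }
}
```

Under this construction the sum itself is linear in the features, and the probability becomes log-linear in the odds only after the sigmoid is applied.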

curvature or line search step size underflow

It seems when we have features with collinearity issues, LBFGS will output such warnings.

For example:
-- subskill practice features
-- item practice features (when ranging up to 15)
-- past N action features in a data where students have almost the same activity order.

It seems that setting initialWeightsBounds = 0.1 can practically avoid such problems. Yet this setting will limit the initial value range, so you should be careful if you want to get truly randomized initial values.
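To make the trade-off concrete, here is a small sketch of bounded random initialization. It assumes that initialWeightsBounds = b means each initial weight is drawn uniformly from [-b, b); that reading is an assumption, not something confirmed by FAST's documentation, and the class and method names are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch; the interpretation of the bound is an assumption.
final class InitialWeightsSketch {

    /** Draws n weights uniformly from [-bound, bound). */
    static double[] randomWeights(int n, double bound, Random rng) {
        double[] weights = new double[n];
        for (int i = 0; i < n; i++) {
            // nextDouble() is uniform in [0, 1); rescale to [-bound, bound).
            weights[i] = (2.0 * rng.nextDouble() - 1.0) * bound;
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] weights = randomWeights(5, 0.1, new Random(42));
        for (double w : weights) {
            System.out.println(w);
        }
    }
}
```

A small bound keeps every starting point close to zero, which is the restriction on the initial value range mentioned above.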
