ml-smores / fast Goto Github PK

Feature-Aware Student knowledge Tracing toolkit. Implements HMMs with features for modeling student performance.

Home Page: http://ml-smores.github.io/fast/

License: GNU General Public License v2.0

Java 99.32% Ruby 0.08% Python 0.60%

fast's Introduction

FAST: Feature-Aware Student knowledge Tracing

This is the repository of FAST, an efficient toolkit for modeling time-changing student performance ([González-Brenes, Huang & Brusilovsky, 2014](http://educationaldatamining.org/EDM2014/uploads/procs2014/long%20papers/84_EDM-2014-Full.pdf)). FAST is an alternative to the [BNT-SM toolkit](http://www.cs.cmu.edu/~listen/BNT-SM/), which requires the researcher to design a different Bayes net for each feature set they want to prototype. The FAST toolkit is up to 300x faster than BNT-SM and much simpler to use.

We presented the model at the 7th International Conference on Educational Data Mining (2014) (see [slides](http://www.cs.cmu.edu/~joseg/files/fast_presentation.pdf)), where it was selected as one of the top 5 paper submissions.

Technical Details

FAST learns parameters for each skill using an HMM with features ([Berg-Kirkpatrick et al., 2010](http://www.cs.berkeley.edu/~tberg/papers/naaclhlt2010.pdf)).
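
To illustrate what "HMM with features" means here, the following is a minimal Java sketch under stated assumptions, not FAST's code: in a two-state (unknown/known) knowledge-tracing HMM, the emission probability P(correct | state) is replaced by a logistic function of the record's features, with a separate, hypothetical weight vector per hidden state. The class name, weights, and features are illustrative; the actual model may share weights across states or also featurize the transitions.

```java
/**
 * Minimal sketch (not FAST's code): a featurized emission for a two-state
 * knowledge-tracing HMM, where P(correct | state, features) is a logistic
 * function of a state-specific weight vector.
 */
final class FeaturizedEmission {
    // weightsByState[0]: "unknown" state; weightsByState[1]: "known" state.
    private final double[][] weightsByState;

    FeaturizedEmission(double[][] weightsByState) {
        this.weightsByState = weightsByState;
    }

    /** Returns P(correct | state, features) = sigmoid(w_state . features). */
    double probCorrect(int state, double[] features) {
        double[] w = weightsByState[state];
        if (w.length != features.length) {
            throw new IllegalArgumentException("weight and feature vectors differ in length");
        }
        double logit = 0.0;
        for (int i = 0; i < w.length; i++) {
            logit += w[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-logit));
    }

    public static void main(String[] args) {
        // Hypothetical features: a bias term and one item indicator.
        double[][] weights = {{-1.0, -0.5}, {2.0, 0.5}};
        FeaturizedEmission emission = new FeaturizedEmission(weights);
        double[] record = {1.0, 1.0};
        // Unknown state: P(correct) acts as a per-record guess probability.
        System.out.println("guess    = " + emission.probCorrect(0, record));
        // Known state: P(correct) acts as a per-record 1 - slip.
        System.out.println("1 - slip = " + emission.probCorrect(1, record));
    }
}
```

Because the features vary by record, guess and slip are no longer single per-skill constants but can change from one practice opportunity to the next.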

Running FAST

Quick Start

Download the latest release [here](https://github.com/ml-smores/fast/releases).
Decompress the file. It includes sample data for getting you started quickly.
Open a terminal, change to the directory that contains fast-2.1.1-final.jar (using the cd command), and type:
```
java -jar fast-2.1.1-final.jar ++data/IRT_exp/FAST+IRT1.conf
```

Congratulations! You just trained a student model (with IRT features) using state-of-the-art technology.

Please see the Wiki for more information.

Please cite our work (and provide the link https://github.com/ml-smores/fast) if you use our tool in your published papers: González-Brenes, J. P., Huang, Y., & Brusilovsky, P. (2014). General features in knowledge tracing: applications to multiple subskills, temporal item response theory, and expert knowledge. In Proc. 7th Int. Conf. on Educational Data Mining (pp. 84-91).

Contact us

We would love to hear your feedback. Please [email us](mailto:[email protected])!

Thanks, Yun, Jose, and Peter

fast's People

Stargazers

Watchers

Forkers

summer-liu severinklingler summyfeb12 charlesdguthrie lidhcs bgnkim michaelzhouwang khushsi cywongnorman cutekane yangcen-ann weexp

fast's Issues

Bug in basicModelName

The two variables "basicModelName" and "modelName" are not self-explanatory.

Moreover, the modelName field requires the string "FAST" or "KT", which is not documented in the wiki.
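
One way to make the modelName requirement self-documenting is to parse it into an enum that lists the accepted values when parsing fails. This is a hypothetical sketch: only the values "FAST" and "KT" come from the issue, and it is not how FAST currently reads its configuration.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: parse the modelName setting into an enum so that an
 * unsupported value fails fast with a message naming the accepted values.
 */
enum ModelName {
    FAST, KT;

    static ModelName parse(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException(
                "modelName is missing; expected one of " + Arrays.toString(values()));
        }
        try {
            // Exact, case-sensitive match, mirroring the requirement in the issue.
            return valueOf(raw);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unsupported modelName \"" + raw + "\"; expected one of " + Arrays.toString(values()), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("FAST"));
        try {
            parse("BKT");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```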

Firecracker needs (from Elliott)

@e-bartsch

how to make sure inferred knowledge doesn't decrease after a correct response (output guess + slip < 1 for all records)
the update is insensitive
initialize within reasonable bounds (currently random)
constrain the search
what's the impact of features (e.g., item) on the update?
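
The first item rests on a standard knowledge-tracing identity: after a correct response, the posterior is P(L | correct) = P(L)(1 - slip) / (P(L)(1 - slip) + (1 - P(L)) guess), which is larger than the prior P(L) (for 0 < P(L) < 1) exactly when guess + slip < 1. A minimal Java sketch, assuming per-record guess and slip values are already available (for example, computed from features), flags the records that violate this condition. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: a correct response raises P(known) (before the learning transition)
 * only when guess + slip < 1, so records violating that are flagged.
 */
final class GuessSlipCheck {

    /** Posterior probability of the known state after observing a correct response. */
    static double posteriorAfterCorrect(double pKnown, double guess, double slip) {
        double known = pKnown * (1.0 - slip);
        double unknown = (1.0 - pKnown) * guess;
        return known / (known + unknown);
    }

    /** Returns indices of records where a correct response would not raise P(known). */
    static List<Integer> violatingRecords(double[] guess, double[] slip) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < guess.length; i++) {
            if (guess[i] + slip[i] >= 1.0) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        double[] guess = {0.2, 0.6, 0.3};
        double[] slip = {0.1, 0.5, 0.2};
        System.out.println("records with guess + slip >= 1: " + violatingRecords(guess, slip));
        System.out.println(posteriorAfterCorrect(0.5, 0.2, 0.1)); // above the 0.5 prior
        System.out.println(posteriorAfterCorrect(0.5, 0.6, 0.5)); // below the 0.5 prior
    }
}
```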

set parameters fixed; initialize feature parameters

trace subskills' knowledge levels directly

output predictions in the same order as the test set

Some conveniences for users

-- remove "++" for specifying ".conf" files
-- explain running multiple files better
-- explain the naming conventions better
-- give specific error types for users to debug the input files.
-- outcome should only be "correct|incorrect"
-- input header: column names shouldn't contain spaces between words.
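
The last three items could be enforced by an input check that reports specific, line-numbered errors. This is a hypothetical Java sketch assuming a tab-separated file whose header contains a column named "outcome"; the other column names in the example are illustrative, not FAST's documented input format.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical input checks: header names must be non-empty and contain no
 * whitespace, and every "outcome" value must be "correct" or "incorrect".
 * The tab delimiter is an assumption.
 */
final class InputChecks {

    static List<String> validate(List<String> lines) {
        List<String> errors = new ArrayList<>();
        if (lines.isEmpty()) {
            errors.add("input is empty: expected a header line");
            return errors;
        }
        String[] header = lines.get(0).split("\t", -1);
        int outcomeCol = -1;
        for (int c = 0; c < header.length; c++) {
            if (header[c].isEmpty() || header[c].matches(".*\\s.*")) {
                errors.add("header column " + (c + 1) + " (\"" + header[c] + "\") is empty or contains whitespace");
            }
            if (header[c].equals("outcome")) {
                outcomeCol = c;
            }
        }
        if (outcomeCol < 0) {
            errors.add("header has no \"outcome\" column");
            return errors;
        }
        for (int r = 1; r < lines.size(); r++) {
            String[] cells = lines.get(r).split("\t", -1);
            if (cells.length != header.length) {
                errors.add("line " + (r + 1) + ": expected " + header.length + " columns but found " + cells.length);
                continue;
            }
            String outcome = cells[outcomeCol];
            if (!outcome.equals("correct") && !outcome.equals("incorrect")) {
                errors.add("line " + (r + 1) + ": outcome must be \"correct\" or \"incorrect\" but was \"" + outcome + "\"");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "student\tKC name\toutcome",
            "s1\tk1\tcorrect",
            "s1\tk1\t1");
        validate(lines).forEach(System.out::println);
    }
}
```

Collecting every error instead of stopping at the first one lets users fix an input file in a single pass.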

explain the IRT features better

Problematic feature

The problem came from one feature (number of turns) that was not binary but increases over multiple turns (1, 2, 3, etc.). I haven't figured out why yet, but when I turned it into a binary feature (new turn = 0, retry = 1), FAST worked. I think it has something to do with the ordering of the data, but I am still working on it.
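
The workaround described above amounts to replacing the turn count with an indicator. A minimal Java sketch, assuming turns are numbered from 1 and that any turn after the first counts as a retry; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch: map turn numbers (1, 2, 3, ...) to a binary retry indicator. */
final class TurnFeature {

    static int[] binarizeTurns(int[] turnNumbers) {
        int[] out = new int[turnNumbers.length];
        for (int i = 0; i < turnNumbers.length; i++) {
            if (turnNumbers[i] < 1) {
                throw new IllegalArgumentException("turn numbers are assumed to start at 1: " + turnNumbers[i]);
            }
            out[i] = turnNumbers[i] == 1 ? 0 : 1; // first attempt -> 0, any retry -> 1
        }
        return out;
    }

    public static void main(String[] args) {
        int[] turns = {1, 2, 3, 1, 2};
        System.out.println(Arrays.toString(binarizeTurns(turns)));
    }
}
```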

[12/22] output student and KC id for matching the test set

I joined the input sequence with the prediction file for one of the test sets (attached), and I think something is wrong. The actual label fields are not consistent with the outcome field (maybe I am misunderstanding this, or maybe the prediction file is sorted in some way). Can you look into this?

-- Rohit
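
A quick way to test whether the prediction file follows the test-set order is a row-by-row comparison of the two label columns. This hypothetical Java sketch assumes both columns have already been mapped to the same encoding (for example, "correct"/"incorrect"); any mismatch means a positional join is unsafe.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical check: if the prediction file preserves the test-set order,
 * the actual label in prediction row i equals the outcome in test row i.
 */
final class PredictionAlignment {

    /** Returns the 0-based row indices where the two label columns disagree. */
    static List<Integer> mismatchedRows(List<String> testOutcomes, List<String> predictionLabels) {
        if (testOutcomes.size() != predictionLabels.size()) {
            throw new IllegalArgumentException(
                "row counts differ: " + testOutcomes.size() + " vs " + predictionLabels.size());
        }
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < testOutcomes.size(); i++) {
            if (!testOutcomes.get(i).equals(predictionLabels.get(i))) {
                rows.add(i);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> test = List.of("correct", "incorrect", "correct");
        List<String> predicted = List.of("correct", "correct", "incorrect");
        System.out.println("mismatched rows: " + mismatchedRows(test, predicted));
    }
}
```

Writing the student and KC IDs next to each prediction, as the issue title suggests, would allow a key-based join instead of relying on row order.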

[12/22] EvaluationGeneral should allow testXX or testXXX in addition to testX

In EvaluationGeneral.java:1268,

    String foldIDStr = fileName.substring(pos - 1, pos);

assumes that the number of folds is a single digit. I changed it to

    String foldIDStr = fileName.substring(fileName.indexOf(standardFilePrefix) + standardFilePrefix.length()).replace(suffix, "");

With this fix, I was able to run a 40-fold (leave-one-student-out) KT experiment (10 KCs). The output file is attached.

-- Rohit
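
An alternative to the substring fix is to match the prefix, a run of digits, and the suffix at the end of the file name. This is a Java sketch with hypothetical names; the prefix and suffix are parameters because the real naming convention is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extract a fold ID of any length (test1, test12, test123, ...). */
final class FoldIds {

    static int parseFoldId(String fileName, String prefix, String suffix) {
        // prefix + digits + suffix, anchored at the end of the name.
        Pattern p = Pattern.compile(Pattern.quote(prefix) + "(\\d+)" + Pattern.quote(suffix) + "$");
        Matcher m = p.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("cannot find a fold ID in " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(parseFoldId("test7.txt", "test", ".txt"));
        System.out.println(parseFoldId("test_runs/test40.txt", "test", ".txt"));
    }
}
```

If fileName can include a directory, anchoring the match at the end avoids picking up an earlier occurrence of the prefix (as in "test_runs/"), which `indexOf` would find first.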

Speed up / debugging when running a large number of subskills

[12/22] print out knowledge state

Is there a way to print out (if it isn’t already printed) the “probability of being in the known state of the skill”? I think it would be useful to correlate this with independent measures of knowledge that we have.

The question about measuring other probabilities is still open. I suppose we could add a new List return value to the doPredict function in Predict.java:111 and compute these probabilities inside the test-sequence loop in that function. I'm not very familiar with the HMM implementation you are using. If you could make this modification (or let me know what I can change), that would help.

-- Rohit
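
For standard knowledge tracing with constant parameters, the probability of being in the known state at each step follows from the forward (filtering) recursion: condition on the observed response with Bayes' rule, then apply the learning transition. This Java sketch assumes prior, learn, guess, and slip probabilities and no forgetting; it is not the doPredict code in Predict.java, and a feature-aware model would supply per-record emission probabilities instead of constants.

```java
/** Sketch: P(known) before each response for a two-state knowledge-tracing HMM. */
final class KnowledgeTrace {

    /** Returns, for each step t, P(known at t | responses before t). */
    static double[] knownBeforeEachStep(boolean[] correct, double prior, double learn,
                                        double guess, double slip) {
        double[] out = new double[correct.length];
        double pKnown = prior;
        for (int t = 0; t < correct.length; t++) {
            out[t] = pKnown;
            // Condition on the observed response (Bayes' rule).
            double likeKnown = correct[t] ? 1.0 - slip : slip;
            double likeUnknown = correct[t] ? guess : 1.0 - guess;
            double posterior = pKnown * likeKnown
                / (pKnown * likeKnown + (1.0 - pKnown) * likeUnknown);
            // Transition: an unknown skill becomes known with probability learn.
            pKnown = posterior + (1.0 - posterior) * learn;
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] responses = {false, true, true};
        double[] trace = knownBeforeEachStep(responses, 0.3, 0.2, 0.2, 0.1);
        for (int t = 0; t < trace.length; t++) {
            System.out.printf("step %d: P(known) = %.4f%n", t, trace[t]);
        }
    }
}
```

Returning the estimate before each response matches what is used to predict that response; recording `posterior` instead would give the estimate after seeing it.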

Bug in calculating probability of observation?

In OpdfContextAwareLogistic, the probability is calculated in linear space:

    double logit = 0.0;
    // Linear predictor: weighted sum of the feature values (the logit).
    for (int i = 0; i < featureWeights.length; i++) {
        logit += featureWeights[i] * featureValues[i];
    }

Shouldn't this be in loglinear space?
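
For reference, in a logistic (log-linear) model the weighted sum of feature values is the log-odds, log(p / (1 - p)) = w · f, so accumulating the logit as a plain sum is what such a model computes; the nonlinearity enters only when the logit is mapped to a probability. Whether the rest of OpdfContextAwareLogistic does that mapping correctly is not shown here. This Java sketch with hypothetical names shows the mapping and a numerically stable log-probability, which matters if likelihoods are accumulated in log space.

```java
/** Sketch: logistic probability and a numerically stable log-probability. */
final class LogisticLogSpace {

    /** Weighted sum of feature values: the log-odds of y = 1. */
    static double logit(double[] weights, double[] values) {
        double z = 0.0;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * values[i];
        }
        return z;
    }

    /** P(y = 1) = 1 / (1 + exp(-z)). */
    static double prob(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** log P(y = 1) = -log(1 + exp(-z)), rearranged so exp never overflows. */
    static double logProb(double z) {
        return z >= 0 ? -Math.log1p(Math.exp(-z)) : z - Math.log1p(Math.exp(z));
    }

    public static void main(String[] args) {
        double z = logit(new double[]{0.8, -0.3}, new double[]{1.0, 2.0});
        System.out.println("logit = " + z + ", p = " + prob(z) + ", log p = " + logProb(z));
        System.out.println("log p for z = -1000: " + logProb(-1000.0));
    }
}
```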

Output files should have standard extensions

Output files have nonstandard extensions (".eval", ".out", etc.) even though they are ".csv" or ".tsv" files. We should fix this.

curvature or line search step size underflow

It seems that when features are collinear, L-BFGS outputs such warnings.

For example:
-- subskill practice features
-- item practice features (when ranging up to 15)
-- past N action features in data where students have almost the same activity order.
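
A cheap diagnostic for cases like these is to flag pairs of feature columns whose Pearson correlation is close to plus or minus 1. This is a hypothetical Java sketch; the 0.99 threshold and the feature names are illustrative choices, not FAST settings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: report nearly collinear pairs of feature columns. */
final class CollinearityCheck {

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) {
            return Double.NaN; // a constant column has no defined correlation
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** columns[j] holds the values of feature j across all records. */
    static List<String> nearlyCollinearPairs(String[] names, double[][] columns, double threshold) {
        List<String> pairs = new ArrayList<>();
        for (int a = 0; a < columns.length; a++) {
            for (int b = a + 1; b < columns.length; b++) {
                double r = pearson(columns[a], columns[b]);
                if (!Double.isNaN(r) && Math.abs(r) >= threshold) {
                    pairs.add(names[a] + " ~ " + names[b]);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] names = {"skillPractice", "itemPractice", "bias"};
        double[][] cols = {
            {0, 1, 2, 3, 4},
            {0, 2, 4, 6, 8}, // exactly twice skillPractice
            {1, 1, 1, 1, 1}};
        System.out.println(nearlyCollinearPairs(names, cols, 0.99));
    }
}
```

Pairwise correlation only catches two-column dependencies; collinearity among three or more features (for example, indicators that always sum to another column) needs a rank check of the full feature matrix.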

It seems that setting initialWeightsBounds = 0.1 avoids such problems in practice. However, this setting limits the range of initial values, so be careful if you want truly randomized initial values.
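
A minimal Java sketch of bounded random initialization, drawing each initial feature weight uniformly from [-bound, bound). The parameter mirrors initialWeightsBounds from the note above, but how FAST actually applies that setting is an assumption here; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: uniform random initial weights within [-bound, bound). */
final class WeightInit {

    static double[] initialWeights(int numFeatures, double bound, long seed) {
        if (bound < 0) {
            throw new IllegalArgumentException("bound must be non-negative");
        }
        Random rng = new Random(seed);
        double[] w = new double[numFeatures];
        for (int i = 0; i < numFeatures; i++) {
            // nextDouble() is in [0, 1), so this maps to [-bound, bound).
            w[i] = (2.0 * rng.nextDouble() - 1.0) * bound;
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(initialWeights(4, 0.1, 42L)));
    }
}
```

A small bound keeps every random restart near zero, which per the note above seems to avoid the warnings, at the cost of less diverse starting points; a fixed seed makes runs reproducible.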

for preserving the ordering of the test set
for saving space
