educationaltestingservice / rsmtool
A Python package to facilitate research on building and evaluating automated scoring models.
Home Page: https://rsmtool.readthedocs.io
License: Apache License 2.0
I think `pip install .` should probably fail, or at least tell the user that the dependencies need to be installed separately.
mkvirtualenv rsmtool-tester --python=/usr/bin/python3
cd $checkoutdir
pip install .
... # succeeds quietly
We are currently forcing the objective to be pearson.
Also, display the objective used in the report.
And potentially change the default from pearson to neg_mean_error.
Since we have releases for every new version, there's no point in having a stable branch.
Currently, all built-in models in RSMTool Refactor are written out in an if statement in a single method, train_builtin_model(), in the Modeler class. It would be good to break these out into separate methods to (1) make the code more readable, and (2) make it easier to override these methods if we want to use a different implementation of any model in a separate branch.
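One way the break-up could look is a name-based dispatch: one private trainer method per built-in model, looked up dynamically instead of walked through an if/elif chain. This is only a sketch; the method names and return values here are hypothetical, not the actual rsmtool API.

```python
class Modeler:
    """Sketch: one method per built-in model, dispatched by name."""

    def train_builtin_model(self, model_name, df_train):
        # Look up a dedicated trainer such as _train_LinearRegression()
        # instead of walking a long if/elif chain.
        trainer = getattr(self, "_train_{}".format(model_name), None)
        if trainer is None:
            raise ValueError("Unknown built-in model: {}".format(model_name))
        return trainer(df_train)

    def _train_LinearRegression(self, df_train):
        # A subclass in a separate branch can override just this method
        # without touching the dispatch logic.
        return "LinearRegression trained on {} rows".format(len(df_train))
```

With this shape, adding a new built-in model is just adding a `_train_<name>` method, and overriding one model's implementation no longer requires copying the whole if statement.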
HM/sqrt(HH)
These will only be available if a second human score is provided.
It could be useful to have links in the .html report which would allow you to immediately open the specific .csv files for a given analysis, rather than asking people to remember our sometimes not very transparent naming scheme. I was just trying to show all of this to Jay and it's a bit messy even with the output_csv.md file. The problem, of course, is that if the report is forwarded without the output folder, the links will be lost.
A perhaps better alternative could be to generate a separate .html report which contains only explanations and links to the available .csv files, and store this report under output. Then somebody from IDEAS who ran the tool and wants to play around with the .csv files could quickly locate them by using this "report".
Merge the rsmtool master branch into the rsmtool-refactor branch and resolve inconsistencies. This will entail incorporating all changes associated with the v5.7 release, including upgrading the Python and skll versions and updating the unit test data.
Valid boolean values in JSON files do not include quotation marks surrounding the values, so, if these values are surrounded by quotation marks in RSMTool configuration files, they are not interpreted correctly (and no exception is raised that might help identify the problem). For example, if exclude_zero_scores is set to "false" (rather than false) in an RSMEval configuration file, it will still be interpreted as True and could log a confusing warning to that effect. If we don't change the parser to accept quoted boolean values, then the invalid value should be rejected immediately and a helpful error message displayed.
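The gotcha is easy to demonstrate: a quoted boolean parses as a non-empty string, which is truthy in Python. A minimal validation helper (hypothetical, not rsmtool's actual parser) could reject it up front:

```python
import json

def parse_bool_field(config, field):
    """Reject quoted booleans instead of silently treating them as truthy strings."""
    value = config[field]
    if not isinstance(value, bool):
        raise ValueError(
            "{} must be an unquoted JSON boolean, got {!r}".format(field, value)
        )
    return value

# The quoted value parses as the *string* "false", which is truthy,
# so a naive bool() check silently flips the intended setting:
config = json.loads('{"exclude_zero_scores": "false"}')
assert bool(config["exclude_zero_scores"]) is True
```

An isinstance check like this, run during configuration parsing, would turn the confusing warning into an immediate, self-explanatory error.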
It seems like deprecation warnings should be disabled in testing if you're explicitly expecting them, or the tests should use something else.
But you're also getting a RuntimeWarning that might be meaningful; and if not, it's worth trying to suppress, because it looks dire.
$ nosetests
./home/jeremy/Envs/rsmtool-dev/lib/python3.5/site-packages/numpy/lib/function_base.py:2079: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
.................................................../home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "expID" is deprecated and will be removed in a future release, please use the new field name "experiment_id" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "train.lab" is deprecated and will be removed in a future release, please use the new field name "train_label_column" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "LRmodel" is deprecated and will be removed in a future release, please use the new field name "model" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "test" is deprecated and will be removed in a future release, please use the new field name "test_file" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "test.lab" is deprecated and will be removed in a future release, please use the new field name "test_label_column" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "trim.max" is deprecated and will be removed in a future release, please use the new field name "trim_max" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "train" is deprecated and will be removed in a future release, please use the new field name "train_file" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "trim.min" is deprecated and will be removed in a future release, please use the new field name "trim_min" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "feature" is deprecated and will be removed in a future release, please use the new field name "features" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:205: DeprecationWarning: The model name "empWt" is deprecated and will be removed in a future release, please use the new model name "LinearRegression" instead.
category=DeprecationWarning)
..............................................................................................................................................................
----------------------------------------------------------------------
Ran 210 tests in 83.081s
OK
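If the deprecation warnings are genuinely expected, one option is to capture and assert on them inside the test itself, so they never reach the console. This is a sketch using the standard library's `warnings` module; `emit_deprecation` is a stand-in for the config parser's renamed-field handling, not a real rsmtool function.

```python
import warnings

def emit_deprecation(old, new):
    # Stand-in for the config parser's renamed-field handling.
    warnings.warn(
        'The field name "{}" is deprecated, please use "{}" instead.'.format(old, new),
        DeprecationWarning,
    )
    return new

# Capture the warning inside the test instead of letting it leak
# into the nosetests console output.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = emit_deprecation("expID", "experiment_id")

assert result == "experiment_id"
assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```

This both silences the output and makes the expectation explicit: a future change that stops emitting the warning would fail the test instead of passing quietly.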
For a relatively large train/test set (300K + 100K, 44 features), the generate predictions step takes much longer than any other step.
The delay is in the predict:predict_with_the_model function, before it starts post-processing the predictions.
We'll need to add the to:
Large HTML reports with lots of images can take a very long time to load. It would be helpful to allow users to generate clickable thumbnail images (rather than full-sized images) in the report. The images can be saved in the figures folder.
In addition, it may be useful to produce a zipped version of the experiment.
Below is the relevant context from our output. This might be a jupyter_client issue.
This seems to be intermittent: it only happened once among 7 datasets. This dataset has had rsmeval work on its output without problem in the past.
INFO:rsmtool.rsmeval:Processing predictions
INFO:rsmtool.rsmeval:Saving pre-processed predictions and the metadata to disk
INFO:rsmtool.rsmeval:Running analyses on predictions
INFO:rsmtool.rsmeval:Starting report generation
INFO:rsmtool.report:Merging sections
INFO:rsmtool.report:Exporting HTML
INFO:root:Executing notebook with kernel: python3
INFO:rsmtool.rsmeval:Assuming given system predictions are unscaled and will be used as such.
INFO:rsmtool.rsmeval:Reading predictions: [...]
INFO:rsmtool.rsmeval:Processing predictions
INFO:rsmtool.rsmeval:Saving pre-processed predictions and the metadata to disk
INFO:rsmtool.rsmeval:Running analyses on predictions
INFO:rsmtool.rsmeval:Starting report generation
INFO:rsmtool.report:Merging sections
INFO:rsmtool.report:Exporting HTML
INFO:root:Executing notebook with kernel: python3
Traceback (most recent call last):
[...]
File "/opt/python/rsmtool/lib/python3.4/site-packages/rsmtool/rsmeval.py", line 485, in run_evaluation
context='rsmeval')
File "/opt/python/rsmtool/lib/python3.4/site-packages/rsmtool/report.py", line 513, in create_report
join(reportdir, '{}.html'.format(report_name)))
File "/opt/python/rsmtool/lib/python3.4/site-packages/rsmtool/report.py", line 592, in convert_ipynb_to_html
output, resources = exportHtml.from_filename(notebook_file)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/exporter.py", line 165, in from_filename
return self.from_file(f, resources=resources, **kw)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/exporter.py", line 183, in from_file
return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/html.py", line 65, in from_notebook_node
return super(HTMLExporter, self).from_notebook_node(nb, resources, **kw)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/templateexporter.py", line 200, in from_notebook_node
nb_copy, resources = super(TemplateExporter, self).from_notebook_node(nb, resources, **kw)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/exporter.py", line 130, in from_notebook_node
nb_copy, resources = self._preprocess(nb_copy, resources)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/exporter.py", line 302, in _preprocess
nbc, resc = preprocessor(nbc, resc)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/preprocessors/base.py", line 47, in __call__
return self.preprocess(nb,resources)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/preprocessors/execute.py", line 141, in preprocess
cwd=path)
File "/opt/python/rsmtool/lib/python3.4/site-packages/jupyter_client/manager.py", line 433, in start_new_kernel
kc.wait_for_ready(timeout=startup_timeout)
File "/opt/python/rsmtool/lib/python3.4/site-packages/jupyter_client/blocking/client.py", line 59, in wait_for_ready
raise RuntimeError('Kernel died before replying to kernel_info')
RuntimeError: Kernel died before replying to kernel_info
Change the copyright year in LICENSE and on ReadTheDocs to 2018.
In RSMTool as well as RSMCompare documentation, it should be evaluation_by_group
Ref: http://rsmtool.readthedocs.io/en/latest/usage_rsmtool.html#general-sections-optional
After we filter out flagged responses, we should force length_column to be numeric before checking for missing values or std == 0.
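In pandas this is typically `pd.to_numeric(..., errors='coerce')`; a dependency-free sketch of the same coercion (function name hypothetical) shows why the order matters — unparseable entries only become detectable as "missing" after the coercion:

```python
def coerce_numeric(values):
    """Coerce values to float, mapping unparseable entries to None (NaN-like)."""
    coerced = []
    for v in values:
        try:
            coerced.append(float(v))
        except (TypeError, ValueError):
            coerced.append(None)
    return coerced

# "N/A" and "" only show up as missing *after* coercion,
# so the missing-value / std == 0 checks must run on the coerced column.
lengths = ["312", "287", "N/A", ""]
numeric = coerce_numeric(lengths)
missing = [v is None for v in numeric]
```

Running the missing/std checks on the raw string column would miss these cases entirely, since every string entry looks "present".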
The following functions from API are used in other research scripts:
from rsmtool.analysis import metrics_helper
from rsmtool.report import convert_ipynb_to_html
from rsmtool.preprocess import remove_outliers
It would be good to check whether it's possible to keep these available with a deprecation warning, or to provide a document suggesting how to achieve the same functionality.
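One common pattern for keeping the old import paths alive is a thin shim that forwards to the new location while emitting a DeprecationWarning. This is only a sketch: `_new_metrics_helper` is a hypothetical placeholder for wherever the function actually lands after refactoring.

```python
import warnings

def metrics_helper(*args, **kwargs):
    """Deprecated alias kept so external research scripts don't break."""
    warnings.warn(
        "rsmtool.analysis.metrics_helper has moved; "
        "see the migration notes for the new location.",
        DeprecationWarning,
        stacklevel=2,
    )
    return _new_metrics_helper(*args, **kwargs)

def _new_metrics_helper(scores):
    # Hypothetical placeholder for the relocated implementation.
    return sum(scores) / len(scores)
```

The same shim could cover convert_ipynb_to_html and remove_outliers; `stacklevel=2` makes the warning point at the caller's code rather than the shim itself.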
This can be done once we update the SKLL dependency to v1.5.1, since the issue will be fixed in that release.
The log now prints the following two messages which can probably be combined into a single message:
Reading, normalizing, validating, and processing configuration.
Reading configuration file: /Users/nmadnani/work/rsmtool/tests/data/experiments/lr-with-custom-sections-and-order/lr_with_custom_sections_and_order.json
It would be nice in the subgroup evaluation sections to only show subgroups that have more than X members, where you would define X in the config.
I'm running v 5.1.0 rsmcompare on two rsmeval outputs (generated with v 5.0.1), but the comparison report says that "This information is not available for either of the models.". If I look at the individual rsmeval reports, the information does seem to be there.
In particular, I am looking at the following sections: consistency, score_distributions, evaluation
Is it that this rsmcompare comparison only works with reports generated with this version too? UPDATE: I just regenerated the rsmeval output using the latest version of rsmtool and am still having the same problem.
A few minor nitpicks from pyflakes
$ pyflakes3 rsmtool/*.py
rsmtool/__init__.py:11: 'rsmextra' imported but unused
rsmtool/model.py:508: local variable 'intercept' is assigned to but never used
rsmtool/predict.py:38: local variable 'logger' is assigned to but never used
rsmtool/rsmpredict.py:320: local variable 'logger' is assigned to but never used
$ pep8 -q rsmtool/
rsmtool/analysis.py
rsmtool/create_features.py
rsmtool/input.py
rsmtool/model.py
rsmtool/predict.py
rsmtool/preprocess.py
rsmtool/report.py
rsmtool/rsmcompare.py
rsmtool/rsmeval.py
rsmtool/rsmpredict.py
rsmtool/rsmtool.py
rsmtool/test_utils.py
rsmtool/utils.py
Update the API page to list the static/instance methods that we think are likely to be useful. This is also related to #130.
Currently, RSMTool allows users to pass a flag_column
dictionary, which makes it possible to only use responses with particular values in a given column (see documentation below). This dictionary is used for both the training and test sets.
However, users may wish to remove different flags from the training and test sets, so this feature would allow different dictionaries for the train and test sets, if desired.
A straightforward way to implement this might be to add a flag_column_test parameter to the configuration file.
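A sketch of how the two dictionaries could behave, with the configuration shown as a Python dict (flag_column_test is the proposed field, not an existing one, and the column name and filter function are made up for illustration):

```python
# Hypothetical configuration: flag_column applies to the training set,
# the proposed flag_column_test to the evaluation set.
config = {
    "flag_column": {"ADVISORY": [0]},
    "flag_column_test": {"ADVISORY": [0, 1]},
}

def filter_responses(rows, flag_dict):
    """Keep only rows whose flag columns contain an allowed value."""
    return [
        row for row in rows
        if all(row.get(col) in allowed for col, allowed in flag_dict.items())
    ]

train = [{"ADVISORY": 0}, {"ADVISORY": 1}]
test = [{"ADVISORY": 0}, {"ADVISORY": 1}, {"ADVISORY": 2}]
```

For backward compatibility, flag_column_test could simply default to the value of flag_column when it is not specified.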
While running rsmcompare, the file name became too long and threw an error. It might be better if we could supply an optional report_name parameter in the config file, so that we do not run into the long-file-name issue.
When the config file contains a relative path to a non-existent file, this leads to an error in data.reader instead of a helpful "file not found" message.
File "/Users/aloukina/miniconda3/envs/rsmtool57/bin/rsmtool", line 11, in <module>
load_entry_point('rsmtool', 'console_scripts', 'rsmtool')()
File "/Users/aloukina/Tools/rsmtool/rsmtool/rsmtool/rsmtool.py", line 348, in main
run_experiment(config_file, output_dir)
File "/Users/aloukina/Tools/rsmtool/rsmtool/rsmtool/rsmtool.py", line 116, in run_experiment
data_container = reader.read()
File "/Users/aloukina/Tools/rsmtool/rsmtool/rsmtool/reader.py", line 246, in read
if not exists(set_path):
File "/Users/aloukina/miniconda3/envs/rsmtool57/lib/python3.6/genericpath.py", line 19, in exists
os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
The documentation for the features field needs to (a) give the link to what is 'below' and (b) specify somewhere the format for that field. Currently there is a disconnect between the section on selecting features and the list of fields.
The refactored version of RSMTool includes a Writer
class with a write_experiment_output()
method. This method can optionally take a file_format
argument, which allows output files to be saved as Excel, CSV, or JSON.
Given that we have this functionality, we may want to add a configuration option (and associated checks in the ConfigurationParser
class) that allows users to specify other output formats for their experiments.
I think we can add this in after the refactoring is finished.
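The associated check could look something like the following sketch. The valid set of formats here just mirrors what the issue says write_experiment_output() supports (Excel, CSV, or JSON); the function name and defaults are assumptions, not the actual ConfigurationParser API.

```python
# Assumed set, mirroring what write_experiment_output() reportedly supports.
VALID_FILE_FORMATS = {"csv", "xlsx", "json"}

def validate_file_format(config):
    """Sketch of a ConfigurationParser-style check for an output-format option."""
    file_format = config.get("file_format", "csv").lower()
    if file_format not in VALID_FILE_FORMATS:
        raise ValueError(
            "file_format must be one of {}, got {!r}".format(
                sorted(VALID_FILE_FORMATS), file_format
            )
        )
    return file_format
```

Defaulting to "csv" keeps existing experiments unchanged, while any typo in the new option fails fast with the list of accepted values.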
In (at least) version 5.5.2, it is no longer possible to use read_data_file (used in the generation of reports) with an input file that has an atypical extension, i.e., not one of .tsv, .csv, .xls, or .xlsx. This came up in the case where a skll output predictions file was being used, which has the extension .predictions but is basically a .tsv file, and could be used in the generation of reports in the past. There are obviously ways to get around this, e.g., by renaming the file. I'm not sure what the solution would be if implemented directly in rsmtool without adding some sort of hack for overriding the file extension. Perhaps there could be an ext keyword that has a default value of None and, if specified, its value is used in place of the actual file extension. Or, since this is probably only an issue for text files (rather than Excel files), perhaps a familiar sep keyword argument could be added that, if specified, would override the file extension.
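The ext-keyword variant could be sketched like this; the reader names returned here are placeholders standing in for the actual pandas calls, and the signature is an assumption about how the override might look, not rsmtool's current API.

```python
def read_data_file(filename, ext=None):
    """Sketch: let an explicit `ext` keyword override the real file extension."""
    extension = ext if ext is not None else filename.rsplit(".", 1)[-1].lower()
    # Placeholder reader names standing in for the actual pandas calls.
    readers = {
        "csv": "read_csv",
        "tsv": "read_tsv",
        "xls": "read_excel",
        "xlsx": "read_excel",
    }
    if extension not in readers:
        raise ValueError("Cannot read files with extension .{}".format(extension))
    return readers[extension]
```

A skll `.predictions` file would then be readable via `read_data_file("out.predictions", ext="tsv")` without renaming anything, while the default behavior for known extensions stays exactly as it is.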
In CircleCI, we can run up to 4 containers in parallel. We should think about refactoring the tests so that we can leverage that parallelization. An easy option would be to split test_experiment.py into four different files, one for each separate tool, to make it a little more fine-grained.

Right now, the confusion matrix and the score distribution plots rely entirely on the test set human scores. This means that if there are, say, 6s in the training data but none in the test data, the confusion matrix will only be 5x5 and the score distribution plot will only show 5 bars. Perhaps it would be better to take the union of the human scores from both the training and the test sets?
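The union idea is small enough to sketch directly (the helper name is made up; the real change would live wherever the confusion-matrix labels are computed):

```python
def score_labels(train_scores, test_scores):
    """Use the union of train and test human scores so the confusion
    matrix and distribution plots cover every observed score point."""
    return sorted(set(train_scores) | set(test_scores))

# A 6 that appears only in the training data still gets a row/column:
labels = score_labels([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5])
```

Passing this label set explicitly (e.g. via the `labels` argument that scikit-learn's confusion_matrix accepts) would give a 6x6 matrix even when the test set happens to contain no 6s.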
Our current tests for rsmsummarize
always include scaling which is why one of the bugs in the refactored code was not picked up by the existing tests. We need to fix that.
In the old code, load_rsmtool_output in compare.py called make_summary_stat_df, which in turn computed the medians of various values.
For some unknown reason, we actually call that function for all kinds of things that do not later make it into the report, like partial correlations by group, etc.
When some of the subgroups have a small N and therefore NaNs for either marginal or partial correlations, the old make_summary_stat_df function would print a warning when trying to compute the summary stat.
We can probably just suppress that warning.
The branch contains a test that fails, not because of the warning, but because of something else that I did not have time to investigate.
We should allow features to be a field that can also take a list of feature names, so that people don't have to make a JSON file if all they want is to use the raw features without transforming them. More advanced users who want to play with the signs and transforms can still use the JSON file.
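The dual-input handling could look like this sketch, where a plain list expands to the raw-feature defaults; the exact default keys ("raw" transform, sign of 1) are assumptions about what "use the raw features without transforming them" would map to, not the actual rsmtool schema.

```python
import json

def load_feature_specs(features):
    """Accept either a path to a JSON file or a plain list of feature names."""
    if isinstance(features, list):
        # Assumed defaults: no transform, positive sign.
        return [
            {"feature": name, "transform": "raw", "sign": 1}
            for name in features
        ]
    # Otherwise treat it as a path to the usual feature JSON file.
    with open(features) as f:
        return json.load(f)["features"]
```

With this, `"features": ["grammar", "vocabulary"]` in the config would just work, and the JSON-file path stays untouched for advanced users.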
I have reactivated SublimeLinter and I am seeing a lot of things that seem to violate PEP 8 conventions. We should fix most, if not all, of these issues.
In the text of the DFF notebook we say, "The features are shown after applying transformations (if applicable) and truncation of outliers." Yet looking at the plots, those seem to be raw values.
When generating multiple reports sequentially (automatically), rsmtool seems to hang at the report generation stage: INFO:traitlets:Executing notebook with kernel: python3.
There are no other rsmtool or ipython processes running on the machine at the time.
I tried adding in a sleep command after each report (for 1 minute), but that did not help.
Unfortunately, it's not reproducible. If I run the same sequence multiple times, sometimes it hangs, sometimes it doesn't. And when it does hang it's not necessarily at the same point.
For SKLL learners that can take extra arguments via fixed_parameters, we currently don't have a way to pass those into RSMTool. Perhaps we can add a config option like skll_fixed_parameters?
We already added skll_objective. We don't want to replicate all of SKLL's fields here, though. In the longer term, perhaps a better option would be to have something like skll_args and let that be a dictionary through which the user can pass any SKLL fields?
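A sketch of what the per-field version might look like, with the config shown as a Python dict. skll_fixed_parameters is the proposed (hypothetical) field; the extractor shows how a future catch-all skll_args could later subsume it.

```python
# Hypothetical config: a dictionary passed through to the SKLL learner's
# fixed_parameters, alongside the existing skll_objective field.
config = {
    "model": "RandomForestRegressor",
    "skll_objective": "neg_mean_squared_error",
    "skll_fixed_parameters": {"n_estimators": 500, "max_depth": 4},
}

def extract_skll_args(config):
    """Collect everything destined for SKLL in one place, so a future
    catch-all field (e.g. skll_args) could replace the per-field approach."""
    return {
        "objective": config.get("skll_objective"),
        "fixed_parameters": config.get("skll_fixed_parameters", {}),
    }
```

Keeping the SKLL-bound values behind one extraction point means switching to a single skll_args dictionary later would only change this function, not every call site.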
- pipeline.rst
- stable for explicit links to repository
- feature directory is created
- README.md
.Create a new configuration parameter -- something like zip_directory
-- to zip the experiment directory. This should be triggered automatically with use_thumbnails
, but optional otherwise.
Currently, rsmtool saves scaled/rounded/trimmed scores for the test set predictions only. We should also save them for the training set predictions.
I think at least the jupyter dependencies should be installed separately (perhaps in a notebook extra?).
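In setuptools terms this would be an `extras_require` entry in setup.py, something like the fragment below. The exact package list is an assumption about which dependencies are only needed for report generation.

```python
# Sketch of a setup.py "extras" split: the core install stays lean, and the
# report-generation stack moves behind an optional extra (package list assumed).
extras_require = {
    "notebook": [
        "jupyter",
        "notebook",
        "nbconvert",
        "ipython",
    ],
}
# Users who need report generation would then run:
#     pip install rsmtool[notebook]
```

Core users who only want the scoring pipeline would skip the extra, while the documentation would point report users at the `[notebook]` install.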
Currently they are fixed at +0.49998 and -0.49998 which are reasonable, but it would be nice to be able to override them in certain situations.