educationaltestingservice / rsmtool
A Python package to facilitate research on building and evaluating automated scoring models.
Home Page: https://rsmtool.readthedocs.io
License: Apache License 2.0
I think `pip install .` should probably fail, or at least tell the user that the dependencies need to be installed separately.
mkvirtualenv rsmtool-tester --python=/usr/bin/python3
cd $checkoutdir
pip install .
... # succeeds quietly
We are currently forcing the objective to be pearson.
Also, display the objective used in the report.
And potentially change the default from pearson to neg_mean_error.
Since we have releases for every new version, there's no point in having a stable branch.
Currently, all built-in models in RSMTool Refactor are written out in an if statement in a single method, train_builtin_model(), in the Modeler class. It would be good to break these out into separate methods to (1) make the code more readable, and (2) make it easier to override these methods if we want to use a different implementation of any model in a separate branch.
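One way the break-up could look is a name-based dispatch: one private trainer method per built-in model, looked up dynamically instead of walked through an if/elif chain. This is only a sketch; the method names and return values here are hypothetical, not the actual rsmtool API.

```python
class Modeler:
    """Sketch: one method per built-in model, dispatched by name."""

    def train_builtin_model(self, model_name, df_train):
        # Look up a dedicated trainer such as _train_LinearRegression()
        # instead of walking a long if/elif chain.
        trainer = getattr(self, "_train_{}".format(model_name), None)
        if trainer is None:
            raise ValueError("Unknown built-in model: {}".format(model_name))
        return trainer(df_train)

    def _train_LinearRegression(self, df_train):
        # A subclass in a separate branch can override just this method
        # without touching the dispatch logic.
        return "LinearRegression trained on {} rows".format(len(df_train))
```

With this shape, adding a new built-in model is just adding a `_train_<name>` method, and overriding one model's implementation no longer requires copying the whole if statement.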
HM/sqrt(HH)
These will only be available if a second human score is provided.
It could be useful to have links in the .html report which would allow you to immediately open the specific .csv files for a given analysis, rather than asking people to remember our sometimes not very transparent naming scheme. I was just trying to show all of this to Jay and it's a bit messy even with the output_csv.md file. The problem, of course, is that if the report is forwarded without the output folder, the links will be lost.
A perhaps better alternative could be to generate a separate .html report which contains only explanations and links to the available .csv files, and store this report under output. Then somebody from IDEAS who ran the tool and wants to play around with the .csv files could quickly locate them by using this "report".
Merge the rsmtool master branch into the rsmtool-refactor branch and resolve inconsistencies. This will entail incorporating all changes associated with the v5.7 release, including upgrading the Python and skll versions and updating the unit test data.
Valid boolean values in JSON files do not include quotation marks surrounding the values, so, if these values are surrounded by quotation marks in RSMTool configuration files, they are not interpreted correctly (and no exception is raised that might help identify the problem). For example, if exclude_zero_scores is set to "false" (rather than false) in an RSMEval configuration file, it will still be interpreted as True and could log a confusing warning to that effect. If we don't change the parser to accept quoted boolean values, then the invalid value should be rejected immediately and a helpful error message displayed.
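The gotcha is easy to demonstrate: a quoted boolean parses as a non-empty string, which is truthy in Python. A minimal validation helper (hypothetical, not rsmtool's actual parser) could reject it up front:

```python
import json

def parse_bool_field(config, field):
    """Reject quoted booleans instead of silently treating them as truthy strings."""
    value = config[field]
    if not isinstance(value, bool):
        raise ValueError(
            "{} must be an unquoted JSON boolean, got {!r}".format(field, value)
        )
    return value

# The quoted value parses as the *string* "false", which is truthy,
# so a naive bool() check silently flips the intended setting:
config = json.loads('{"exclude_zero_scores": "false"}')
assert bool(config["exclude_zero_scores"]) is True
```

An isinstance check like this, run during configuration parsing, would turn the confusing warning into an immediate, self-explanatory error.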
It seems like deprecation warnings should be disabled in testing if you're explicitly expecting them, or the tests should use something else.
But you're also getting a RuntimeWarning that might be meaningful; and if not, it's worth trying to suppress, because it looks dire.
$ nosetests
./home/jeremy/Envs/rsmtool-dev/lib/python3.5/site-packages/numpy/lib/function_base.py:2079: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
.................................................../home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "expID" is deprecated and will be removed in a future release, please use the new field name "experiment_id" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "train.lab" is deprecated and will be removed in a future release, please use the new field name "train_label_column" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "LRmodel" is deprecated and will be removed in a future release, please use the new field name "model" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "test" is deprecated and will be removed in a future release, please use the new field name "test_file" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "test.lab" is deprecated and will be removed in a future release, please use the new field name "test_label_column" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "trim.max" is deprecated and will be removed in a future release, please use the new field name "trim_max" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "train" is deprecated and will be removed in a future release, please use the new field name "train_file" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "trim.min" is deprecated and will be removed in a future release, please use the new field name "trim_min" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:170: DeprecationWarning: The field name "feature" is deprecated and will be removed in a future release, please use the new field name "features" instead.
category=DeprecationWarning)
/home/jeremy/src/rsmtool/rsmtool/input.py:205: DeprecationWarning: The model name "empWt" is deprecated and will be removed in a future release, please use the new model name "LinearRegression" instead.
category=DeprecationWarning)
..............................................................................................................................................................
----------------------------------------------------------------------
Ran 210 tests in 83.081s
OK
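If the deprecation warnings are genuinely expected, one option is to capture and assert on them inside the test itself, so they never reach the console. This is a sketch using the standard library's `warnings` module; `emit_deprecation` is a stand-in for the config parser's renamed-field handling, not a real rsmtool function.

```python
import warnings

def emit_deprecation(old, new):
    # Stand-in for the config parser's renamed-field handling.
    warnings.warn(
        'The field name "{}" is deprecated, please use "{}" instead.'.format(old, new),
        DeprecationWarning,
    )
    return new

# Capture the warning inside the test instead of letting it leak
# into the nosetests console output.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = emit_deprecation("expID", "experiment_id")

assert result == "experiment_id"
assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```

This both silences the output and makes the expectation explicit: a future change that stops emitting the warning would fail the test instead of passing quietly.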
For a relatively large train/test set (300K + 100K, 44 features), the generate predictions step takes much longer than any other step.
The delay is in the predict:predict_with_the_model function, before it starts post-processing the predictions.
We'll need to add the to:
Large HTML reports with lots of images can take a very long time to load. It would be helpful to allow users to generate clickable thumbnail images (rather than full-sized images) in the report. The images can be saved in the figures folder.
In addition, it may be useful to produce a zipped version of the experiment.
Below is the relevant context from our output. This might be a jupyter_client issue.
This seems to be intermittent: it only happened once among 7 datasets. This dataset has had rsmeval work on its output without problem in the past.
INFO:rsmtool.rsmeval:Processing predictions
INFO:rsmtool.rsmeval:Saving pre-processed predictions and the metadata to disk
INFO:rsmtool.rsmeval:Running analyses on predictions
INFO:rsmtool.rsmeval:Starting report generation
INFO:rsmtool.report:Merging sections
INFO:rsmtool.report:Exporting HTML
INFO:root:Executing notebook with kernel: python3
INFO:rsmtool.rsmeval:Assuming given system predictions are unscaled and will be used as such.
INFO:rsmtool.rsmeval:Reading predictions: [...]
INFO:rsmtool.rsmeval:Processing predictions
INFO:rsmtool.rsmeval:Saving pre-processed predictions and the metadata to disk
INFO:rsmtool.rsmeval:Running analyses on predictions
INFO:rsmtool.rsmeval:Starting report generation
INFO:rsmtool.report:Merging sections
INFO:rsmtool.report:Exporting HTML
INFO:root:Executing notebook with kernel: python3
Traceback (most recent call last):
[...]
File "/opt/python/rsmtool/lib/python3.4/site-packages/rsmtool/rsmeval.py", line 485, in run_evaluation
context='rsmeval')
File "/opt/python/rsmtool/lib/python3.4/site-packages/rsmtool/report.py", line 513, in create_report
join(reportdir, '{}.html'.format(report_name)))
File "/opt/python/rsmtool/lib/python3.4/site-packages/rsmtool/report.py", line 592, in convert_ipynb_to_html
output, resources = exportHtml.from_filename(notebook_file)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/exporter.py", line 165, in from_filename
return self.from_file(f, resources=resources, **kw)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/exporter.py", line 183, in from_file
return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/html.py", line 65, in from_notebook_node
return super(HTMLExporter, self).from_notebook_node(nb, resources, **kw)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/templateexporter.py", line 200, in from_notebook_node
nb_copy, resources = super(TemplateExporter, self).from_notebook_node(nb, resources, **kw)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/exporter.py", line 130, in from_notebook_node
nb_copy, resources = self._preprocess(nb_copy, resources)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/exporters/exporter.py", line 302, in _preprocess
nbc, resc = preprocessor(nbc, resc)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/preprocessors/base.py", line 47, in __call__
return self.preprocess(nb,resources)
File "/opt/python/rsmtool/lib/python3.4/site-packages/nbconvert/preprocessors/execute.py", line 141, in preprocess
cwd=path)
File "/opt/python/rsmtool/lib/python3.4/site-packages/jupyter_client/manager.py", line 433, in start_new_kernel
kc.wait_for_ready(timeout=startup_timeout)
File "/opt/python/rsmtool/lib/python3.4/site-packages/jupyter_client/blocking/client.py", line 59, in wait_for_ready
raise RuntimeError('Kernel died before replying to kernel_info')
RuntimeError: Kernel died before replying to kernel_info
Change the copyright year in LICENSE and on ReadTheDocs to 2018.
In RSMTool as well as RSMCompare documentation, it should be evaluation_by_group
Ref: http://rsmtool.readthedocs.io/en/latest/usage_rsmtool.html#general-sections-optional
After we filter out flagged responses, we should force length_column to be numeric before checking for missing values or std == 0.
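In pandas this is typically `pd.to_numeric(..., errors='coerce')`; a dependency-free sketch of the same coercion (function name hypothetical) shows why the order matters — unparseable entries only become detectable as "missing" after the coercion:

```python
def coerce_numeric(values):
    """Coerce values to float, mapping unparseable entries to None (NaN-like)."""
    coerced = []
    for v in values:
        try:
            coerced.append(float(v))
        except (TypeError, ValueError):
            coerced.append(None)
    return coerced

# "N/A" and "" only show up as missing *after* coercion,
# so the missing-value / std == 0 checks must run on the coerced column.
lengths = ["312", "287", "N/A", ""]
numeric = coerce_numeric(lengths)
missing = [v is None for v in numeric]
```

Running the missing/std checks on the raw string column would miss these cases entirely, since every string entry looks "present".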
The following functions from API are used in other research scripts:
from rsmtool.analysis import metrics_helper
from rsmtool.report import convert_ipynb_to_html
from rsmtool.preprocess import remove_outliers
It would be good to check whether it's possible to keep these available with a deprecation warning, or to provide a document suggesting how to achieve the same functionality.
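One common pattern for keeping the old import paths alive is a thin shim that forwards to the new location while emitting a DeprecationWarning. This is only a sketch: `_new_metrics_helper` is a hypothetical placeholder for wherever the function actually lands after refactoring.

```python
import warnings

def metrics_helper(*args, **kwargs):
    """Deprecated alias kept so external research scripts don't break."""
    warnings.warn(
        "rsmtool.analysis.metrics_helper has moved; "
        "see the migration notes for the new location.",
        DeprecationWarning,
        stacklevel=2,
    )
    return _new_metrics_helper(*args, **kwargs)

def _new_metrics_helper(scores):
    # Hypothetical placeholder for the relocated implementation.
    return sum(scores) / len(scores)
```

The same shim could cover convert_ipynb_to_html and remove_outliers; `stacklevel=2` makes the warning point at the caller's code rather than the shim itself.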
This can be done once we update the SKLL dependency to v1.5.1, since the issue will be fixed in that release.
The log now prints the following two messages which can probably be combined into a single message:
Reading, normalizing, validating, and processing configuration.
Reading configuration file: /Users/nmadnani/work/rsmtool/tests/data/experiments/lr-with-custom-sections-and-order/lr_with_custom_sections_and_order.json
It would be nice in the subgroup evaluation sections to only show subgroups that have more than X members, where you would define X in the config.
I'm running v 5.1.0 rsmcompare on two rsmeval outputs (generated with v 5.0.1), but the comparison report says that "This information is not available for either of the models.". If I look at the individual rsmeval reports, the information does seem to be there.
In particular, I am looking at the following sections: consistency, score_distributions, evaluation
Is it that this rsmcompare comparison only works with reports generated with this version too? UPDATE: I just regenerated the rsmeval output using the latest version of rsmtool and am still having the same problem.
A few minor nitpicks from pyflakes
$ pyflakes3 rsmtool/*.py
rsmtool/__init__.py:11: 'rsmextra' imported but unused
rsmtool/model.py:508: local variable 'intercept' is assigned to but never used
rsmtool/predict.py:38: local variable 'logger' is assigned to but never used
rsmtool/rsmpredict.py:320: local variable 'logger' is assigned to but never used
$ pep8 -q rsmtool/
rsmtool/analysis.py
rsmtool/create_features.py
rsmtool/input.py
rsmtool/model.py
rsmtool/predict.py
rsmtool/preprocess.py
rsmtool/report.py
rsmtool/rsmcompare.py
rsmtool/rsmeval.py
rsmtool/rsmpredict.py
rsmtool/rsmtool.py
rsmtool/test_utils.py
rsmtool/utils.py
Update the API page to list the static/instance methods that we think are likely to be useful. This is also related to #130.
Currently, RSMTool allows users to pass a flag_column
dictionary, which makes it possible to only use responses with particular values in a given column (see documentation below). This dictionary is used for both the training and test sets.
However, users may wish to remove different flags from the training and test sets, so this feature would allow different dictionaries for the train and test sets, if desired.
A straightforward way to implement this might be to add a flag_column_test parameter to the configuration file.
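A sketch of how the two dictionaries could behave, with the configuration shown as a Python dict (flag_column_test is the proposed field, not an existing one, and the column name and filter function are made up for illustration):

```python
# Hypothetical configuration: flag_column applies to the training set,
# the proposed flag_column_test to the evaluation set.
config = {
    "flag_column": {"ADVISORY": [0]},
    "flag_column_test": {"ADVISORY": [0, 1]},
}

def filter_responses(rows, flag_dict):
    """Keep only rows whose flag columns contain an allowed value."""
    return [
        row for row in rows
        if all(row.get(col) in allowed for col, allowed in flag_dict.items())
    ]

train = [{"ADVISORY": 0}, {"ADVISORY": 1}]
test = [{"ADVISORY": 0}, {"ADVISORY": 1}, {"ADVISORY": 2}]
```

For backward compatibility, flag_column_test could simply default to the value of flag_column when it is not specified.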
While running rsmcompare, the file name became too long and threw an error. It might be better if we could supply an optional report_name parameter in the config file, so that we do not run into the long-file-name issue.
When the config file contains a relative path to a non-existent file, this leads to an error in data.reader instead of a helpful "file not found" message.
File "/Users/aloukina/miniconda3/envs/rsmtool57/bin/rsmtool", line 11, in <module>
load_entry_point('rsmtool', 'console_scripts', 'rsmtool')()
File "/Users/aloukina/Tools/rsmtool/rsmtool/rsmtool/rsmtool.py", line 348, in main
run_experiment(config_file, output_dir)
File "/Users/aloukina/Tools/rsmtool/rsmtool/rsmtool/rsmtool.py", line 116, in run_experiment
data_container = reader.read()
File "/Users/aloukina/Tools/rsmtool/rsmtool/rsmtool/reader.py", line 246, in read
if not exists(set_path):
File "/Users/aloukina/miniconda3/envs/rsmtool57/lib/python3.6/genericpath.py", line 19, in exists
os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
The documentation for the features field needs to (a) give the link to what is 'below' and (b) specify somewhere the format for that field. Currently there is a disconnect between the section on selecting features and the list of fields.
The refactored version of RSMTool includes a Writer
class with a write_experiment_output()
method. This method can optionally take a file_format
argument, which allows output files to be saved as Excel, CSV, or JSON.
Given that we have this functionality, we may want to add a configuration option (and associated checks in the ConfigurationParser
class) that allows users to specify other output formats for their experiments.
I think we can add this in after the refactoring is finished.
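The associated check could look something like the following sketch. The valid set of formats here just mirrors what the issue says write_experiment_output() supports (Excel, CSV, or JSON); the function name and defaults are assumptions, not the actual ConfigurationParser API.

```python
# Assumed set, mirroring what write_experiment_output() reportedly supports.
VALID_FILE_FORMATS = {"csv", "xlsx", "json"}

def validate_file_format(config):
    """Sketch of a ConfigurationParser-style check for an output-format option."""
    file_format = config.get("file_format", "csv").lower()
    if file_format not in VALID_FILE_FORMATS:
        raise ValueError(
            "file_format must be one of {}, got {!r}".format(
                sorted(VALID_FILE_FORMATS), file_format
            )
        )
    return file_format
```

Defaulting to "csv" keeps existing experiments unchanged, while any typo in the new option fails fast with the list of accepted values.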
In (at least) version 5.5.2, it is no longer possible to use read_data_file (used in the generation of reports) with an input file that has an atypical extension, i.e., not one of .tsv, .csv, .xls, or .xlsx. This came up in the case where a skll output predictions file was being used, which has the extension .predictions but is basically a .tsv file, and could be used in the generation of reports in the past. There are obviously ways to get around this, e.g., by renaming the file. I'm not sure what the solution would be if implemented directly in rsmtool without adding some sort of hack for overriding the file extension. Perhaps there could be an ext keyword that has a default value of None and, if specified, its value is used in place of the actual file extension. Or, since this is probably only an issue for text files (rather than Excel files), perhaps a familiar sep keyword argument could be added that, if specified, would override the file extension.
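The ext-keyword variant could be sketched like this; the reader names returned here are placeholders standing in for the actual pandas calls, and the signature is an assumption about how the override might look, not rsmtool's current API.

```python
def read_data_file(filename, ext=None):
    """Sketch: let an explicit `ext` keyword override the real file extension."""
    extension = ext if ext is not None else filename.rsplit(".", 1)[-1].lower()
    # Placeholder reader names standing in for the actual pandas calls.
    readers = {
        "csv": "read_csv",
        "tsv": "read_tsv",
        "xls": "read_excel",
        "xlsx": "read_excel",
    }
    if extension not in readers:
        raise ValueError("Cannot read files with extension .{}".format(extension))
    return readers[extension]
```

A skll `.predictions` file would then be readable via `read_data_file("out.predictions", ext="tsv")` without renaming anything, while the default behavior for known extensions stays exactly as it is.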
In CircleCI, we can run up to 4 containers in parallel. We should think about refactoring the tests so that we can leverage that parallelization. An easy option would be to split test_experiment.py into four different files, one for each separate tool, to make it a little more fine-grained.

Right now, the confusion matrix and the score distribution plots rely entirely on the test set human scores. This means that if there are, say, 6s in the training data but none in the test data, the confusion matrix will only be 5x5 and the score distribution plot will only show 5 bars. Perhaps it would be better to take the union of the human scores from both the training and the test sets?
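The union idea is small enough to sketch directly (the helper name is made up; the real change would live wherever the confusion-matrix labels are computed):

```python
def score_labels(train_scores, test_scores):
    """Use the union of train and test human scores so the confusion
    matrix and distribution plots cover every observed score point."""
    return sorted(set(train_scores) | set(test_scores))

# A 6 that appears only in the training data still gets a row/column:
labels = score_labels([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5])
```

Passing this label set explicitly (e.g. via the `labels` argument that scikit-learn's confusion_matrix accepts) would give a 6x6 matrix even when the test set happens to contain no 6s.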
Our current tests for rsmsummarize
always include scaling which is why one of the bugs in the refactored code was not picked up by the existing tests. We need to fix that.
In the old code, load_rsmtool_output in compare.py called make_summary_stat_df, which in turn computed the medians of various values.
For some unknown reason, we actually call that function for all kinds of things that do not later make it into the report, like partial correlations by group, etc.
When some of the subgroups have a small N and therefore NaNs for either marginal or partial correlations, the old make_summary_stat_df function would print a warning when trying to compute the summary stat.
We can probably just suppress that warning.
The branch contains a test that fails, not because of the warning, but because of something else that I did not have time to investigate.
We should allow features to be a field that can also take a list of feature names, so that people don't have to make a JSON file if all they want is to use the raw features without transforming them. More advanced users who want to play with the signs and transforms can still use the JSON file.
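The dual-input handling could look like this sketch, where a plain list expands to the raw-feature defaults; the exact default keys ("raw" transform, sign of 1) are assumptions about what "use the raw features without transforming them" would map to, not the actual rsmtool schema.

```python
import json

def load_feature_specs(features):
    """Accept either a path to a JSON file or a plain list of feature names."""
    if isinstance(features, list):
        # Assumed defaults: no transform, positive sign.
        return [
            {"feature": name, "transform": "raw", "sign": 1}
            for name in features
        ]
    # Otherwise treat it as a path to the usual feature JSON file.
    with open(features) as f:
        return json.load(f)["features"]
```

With this, `"features": ["grammar", "vocabulary"]` in the config would just work, and the JSON-file path stays untouched for advanced users.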
I have reactivated SublimeLinter and I am seeing a lot of things that seem to violate PEP 8 conventions. We should fix most, if not all, of these issues.
In the text of the DFF notebook we say, "The features are shown after applying transformations (if applicable) and truncation of outliers." Yet looking at the plots, those seem to be raw values.
When generating multiple reports sequentially (automatically), rsmtool seems to hang at the report generation stage: INFO:traitlets:Executing notebook with kernel: python3.
There are no other rsmtool or ipython processes running on the machine at the time.
I tried adding in a sleep command after each report (for 1 minute), but that did not help.
Unfortunately, it's not reproducible. If I run the same sequence multiple times, sometimes it hangs, sometimes it doesn't. And when it does hang it's not necessarily at the same point.
For SKLL learners that can take extra arguments via fixed_parameters, we currently don't have a way to pass those into RSMTool. Perhaps we can add a config option like skll_fixed_parameters?
We already added skll_objective. We don't want to replicate all of SKLL's fields here, though. In the longer term, perhaps a better option would be to have something like skll_args and let that be a dictionary through which the user can pass any SKLL fields?
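A sketch of what the per-field version might look like, with the config shown as a Python dict. skll_fixed_parameters is the proposed (hypothetical) field; the extractor shows how a future catch-all skll_args could later subsume it.

```python
# Hypothetical config: a dictionary passed through to the SKLL learner's
# fixed_parameters, alongside the existing skll_objective field.
config = {
    "model": "RandomForestRegressor",
    "skll_objective": "neg_mean_squared_error",
    "skll_fixed_parameters": {"n_estimators": 500, "max_depth": 4},
}

def extract_skll_args(config):
    """Collect everything destined for SKLL in one place, so a future
    catch-all field (e.g. skll_args) could replace the per-field approach."""
    return {
        "objective": config.get("skll_objective"),
        "fixed_parameters": config.get("skll_fixed_parameters", {}),
    }
```

Keeping the SKLL-bound values behind one extraction point means switching to a single skll_args dictionary later would only change this function, not every call site.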
- pipeline.rst
- stable for explicit links to repository
- feature directory is created
- README.md
.Create a new configuration parameter -- something like zip_directory
-- to zip the experiment directory. This should be triggered automatically with use_thumbnails
, but optional otherwise.
Currently, rsmtool saves scaled/rounded/trimmed scores for the test set predictions only. We should also save them for the training set predictions.
I think at least the jupyter dependencies should be installed separately (perhaps in a notebook extra?).
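In setuptools terms this would be an `extras_require` entry in setup.py, something like the fragment below. The exact package list is an assumption about which dependencies are only needed for report generation.

```python
# Sketch of a setup.py "extras" split: the core install stays lean, and the
# report-generation stack moves behind an optional extra (package list assumed).
extras_require = {
    "notebook": [
        "jupyter",
        "notebook",
        "nbconvert",
        "ipython",
    ],
}
# Users who need report generation would then run:
#     pip install rsmtool[notebook]
```

Core users who only want the scoring pipeline would skip the extra, while the documentation would point report users at the `[notebook]` install.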
Currently they are fixed at +0.49998 and -0.49998 which are reasonable, but it would be nice to be able to override them in certain situations.