hpclab / rankeval

Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.

Home Page: http://rankeval.isti.cnr.it/

License: Mozilla Public License 2.0

Python 92.84% C++ 5.99% Shell 1.17%
learning-to-rank evaluation-framework evaluation-metrics analysis-framework ensemble-models regression-trees

rankeval's Introduction


RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions

RankEval is an open-source tool for the analysis and evaluation of Learning-to-Rank models based on ensembles of regression trees. The success of ensembles of regression trees has fostered the development of several open-source libraries targeting the efficiency of the learning phase and the effectiveness of the resulting models. However, these libraries offer only very limited help for tuning and evaluating the trained models.

RankEval aims to provide a common ground for several Learning-to-Rank libraries by offering useful and interoperable tools for a comprehensive comparison and in-depth analysis of ranking models. Its target audience is the machine learning (ML) and information retrieval (IR) communities.

RankEval is available under Mozilla Public License 2.0.

The official GitHub repository is: https://github.com/hpclab/rankeval.

For questions/suggestions on how to improve RankEval, send us an email: [email protected]

Features

RankEval provides a common ground between several pre-existing tools and offers services that support the interpretation of differently generated models in a unified environment, enabling an easy, comprehensive comparison and in-depth analysis.

The main functionalities of RankEval can be summarized along five dimensions:

  • effectiveness analysis
  • feature analysis
  • structural analysis
  • topological analysis
  • interoperability among GBRT libraries

Regarding interoperability, RankEval is able to read and process ranking ensembles learned with learning-to-rank libraries such as QuickRank, RankLib, XGBoost, LightGBM, Scikit-Learn, CatBoost, and JForest. This interoperability is implemented through proxy classes that make it possible to interpret the specific format used to represent each ranking ensemble without relying on the codebase of the originating learning-to-rank library. Thus, RankEval has no dependency on the learning-to-rank library of the user's choice.

These functionalities can be applied to several models at the same time, allowing a direct comparison of the resulting analyses. The tool has been written to ensure flexibility, extensibility, and efficiency.
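As a minimal sketch of this interoperability (the import path and format strings below are assumptions based on usage reported in the issues further down; the model files are hypothetical):

from rankeval.model import RTEnsemble

# Load ensembles trained by different libraries through the corresponding proxies.
lgbm_model = RTEnsemble('lgb.model', name="LightGBM model", format="LightGBM")
xgb_model = RTEnsemble('xgb.model', name="XGBoost model", format="XGBoost")

# Both objects expose the same interface (e.g. score()), so they can be
# analysed and compared side by side regardless of the library that trained them.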

Documentation

The official API documentation is available at: http://rankeval.isti.cnr.it/docs/. Soon on ReadTheDocs!

Installation

The library uses OpenMP, so you need a compiler that supports it. If your machine's default compiler is not GNU GCC, change it appropriately before proceeding with the installation:

export CC=gcc-5
export CXX=g++-5

Moreover, RankEval needs the following libraries to be installed before the installation process begins (they are used to compile the low-level code during installation):

  • numpy >= 1.13
  • scipy >= 0.14
  • cython >= 0.25
  • matplotlib >= 2.0.2

Additional dependencies will be installed automatically by setuptools. RankEval can be installed from source by running:

python setup.py install

RankEval can also be easily installed from the Python Package Index (PyPI). In this case you most likely do not need Cython installed locally to compile the low-level code, since pre-built binaries should already be available for your platform.
You can download and install it by running:

pip install rankeval

Alternatively, you can build the library from the latest commit on the master branch of the repository. Below is an example:

pip install git+https://github.com/hpclab/rankeval

Development

If you would like to install the library in development mode, i.e., so that you can edit the source code and see the changes directly without reinstalling after every small change, run the following command, which also installs the libraries required for development (documentation generation and unit tests):

pip install -e .[develop]

Local installation of compiled libraries:

python setup.py build_ext -i

Execution of unit tests:

python setup.py test

or (if you have nose already installed):

nosetests -v

Cite RankEval

If you use RankEval, please cite us!

@inproceedings{rankeval-sigir17,
  author = {Claudio Lucchese and Cristina Ioana Muntean and Franco Maria Nardini and
            Raffaele Perego and Salvatore Trani},
  title = {RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions},
  booktitle = {SIGIR 2017: Proceedings of the 40th International {ACM} {SIGIR}
               Conference on Research and Development in Information Retrieval},
  year = {2017},
  location = {Tokyo, Japan}
}

Credits

- Dataset loader: https://github.com/deronnek/svmlight-loader
- Query id implementation: https://github.com/mblondel/svmlight-loader/pull/6


rankeval's Issues

turn coremltools dependency into a soft dependency?

Hi,

coremltools has a fixed version dependency on six 1.10.0, which is not the latest, and which often causes dependency/version problems. See e.g. apple/coremltools#141

It would be nice to turn this into a soft dependency (meaning we should try to import the module on demand, and give out a friendly error message if the import fails), because coremltools does not provide a core feature, but is merely used to support catboost.

Currently, I am commenting out the dependency completely to be able to install rankeval; however, this is not a sustainable solution...
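A minimal sketch of the suggested soft dependency, assuming the CatBoost proxy is the only place where coremltools is needed (the helper name is illustrative, not existing RankEval code):

def _import_coremltools():
    # Import coremltools only when a CatBoost model is actually loaded,
    # and fail with a clear message instead of a hard install-time dependency.
    try:
        import coremltools
    except ImportError:
        raise ImportError("coremltools is needed only to load CatBoost models; "
                          "install it with: pip install coremltools")
    return coremltools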

inaccurate description/reasoning in LightGBM proxy

http://rankeval.isti.cnr.it/docs/rankeval.model.html
under rankeval.model.proxy_LightGBM module:

... This is required because LtR datasets do not have missing values, but have feature values equals to zero (while LightGBM consider zero valued feature as missing values). ...

I do not think this is correct.

This is what the LightGBM documentation says:

LightGBM enables the missing value handle by default. Disable it by setting use_missing=false.
LightGBM uses NA (NaN) to represent missing values by default. Change it to use zero by setting zero_as_missing=true.
When zero_as_missing=false (default), the unshown values in sparse matrices (and LightSVM) are treated as zeros.

Input proxy for Jforest.

Hi,

Nice demo yesterday. Perhaps you could add support for parsing Jforests model files. An example XML model file is included below. NB: Jforests doesn't use an XML parser to read it, so there is no declaration at the top of the file.

Let me know if you have questions about how to turn the XML format into a tree.

<Ensemble>
    <Tree leaves="7" weight="1.0">
        <SplitFeatures>16 10 6 9 6 9</SplitFeatures>
        <LeftChildren>1 4 -3 -4 -1 -6</LeftChildren>
        <RightChildren>-2 2 3 -5 5 -7</RightChildren>
        <Thresholds>25050 31260 24147 32216 24147 29700</Thresholds>
        <OriginalThresholds>0.7645119941402674 0.9540377220289324 0.7369529390221571 0.9832143075138864 0.7369529390221571 0.9064273942501373</OriginalThresholds>
        <LeafOutputs>-2.0 1.5769671648438965 -0.262839281614885 1.9562399004573359 -2.0 1.9353035413956268 1.5149362903356052</LeafOutputs>
    </Tree>
    <Tree leaves="7" weight="1.0">
        <SplitFeatures>0 0 4 15 4 0</SplitFeatures>
        <LeftChildren>1 3 -3 4 -1 -6</LeftChildren>
        <RightChildren>-2 2 -4 -5 5 -7</RightChildren>
        <Thresholds>32138 28687 31497 0 32358 18957</Thresholds>
        <OriginalThresholds>0.9808337911249466 0.8755112006348044 0.9612708295184033 0.0 0.987548068119392 0.5785570408350119</OriginalThresholds>
        <LeafOutputs>-1.8400881506001467 1.810074728431422 -1.843558296666071 1.8091450294064726 -1.617818578255056 -1.9877378361172633 1.810248328082937</LeafOutputs>
    </Tree>
</Ensemble>
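For reference, a hedged sketch of how such a file could be turned into per-tree arrays using the standard library (the function and field names are illustrative, not RankEval API):

import xml.etree.ElementTree as ET

def parse_jforests_ensemble(path):
    # Parse the <Ensemble> file into a list of dicts, one per <Tree>.
    trees = []
    ensemble = ET.parse(path).getroot()
    for tree in ensemble.findall('Tree'):
        trees.append({
            'weight': float(tree.get('weight')),
            'split_features': [int(v) for v in tree.find('SplitFeatures').text.split()],
            'left_children': [int(v) for v in tree.find('LeftChildren').text.split()],
            'right_children': [int(v) for v in tree.find('RightChildren').text.split()],
            'thresholds': [float(v) for v in tree.find('OriginalThresholds').text.split()],
            'leaf_outputs': [float(v) for v in tree.find('LeafOutputs').text.split()],
        })
    return trees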

validation and test dataset are not used in feature analysis notebook

msn_validation = dataset_container.validation_dataset
msn_test = dataset_container.test_dataset

Is this intentional?
Or were they supposed to be used somewhere?

For example, the notebook currently measures NDCG@10 on the training set:

y_pred = msn_lgbm_lmart_1Ktrees_model.score(msn_train)
print "%s: %.3f" % (ndcg_10, ndcg_10.eval(msn_train, y_pred)[0])

support for categorical features and missing values in LGBM

In loading a model I get:

rankeval_lgb_model = RTEnsemble('lgb.model', name="LightGBM model", format="LightGBM")
[...]
AssertionError: Decision Tree not supported. RankEval does not support categorical features and missing values.

Is there a way to work around this? Or will there be support for LGBM cat features and missing values?

dataset documentation does not match behavior

I have a 92921 line input file in LibSVM/RankLib format, with 2155 query IDs:

> wc -l output.libsvm
92921 output.libsvm
> cut -f 2 -d ' ' output.libsvm | sort | uniq | wc -l
2155
dataset = Dataset.load("output.libsvm")
print(dataset)
print("n_queries: %s" % dataset.n_queries)
print("len(query_ids): %s" % len(dataset.query_ids))

Expected output (according to documentation):

Dataset (92921, 6)
n_queries: 2155
len(query_ids): 92921

Actual output:

Dataset (92921, 6)
n_queries: 2155
len(query_ids): 2156

Documentation snippet in https://github.com/hpclab/rankeval/blob/master/rankeval/dataset/dataset.py I am referring to:

query_ids : numpy 1d array of int
        It is a ndarray of shape(nsamples,)

I think the reason it does not work as expected can be found in the constructor:

if len(query_ids) == X.shape[0]:

I guess this is done for easier by-query access.
However, it does not match the documented behavior.
Either the implementation or the documentation should be changed.

Note also that the logic here does not work in the case n_queries == n_instances.
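For anyone hitting the same confusion, here is a hedged sketch for recovering a per-sample array, assuming query_ids actually stores the n_queries + 1 query boundary offsets (which the output above suggests, but which is not documented):

import numpy as np

offsets = dataset.query_ids                # assumed: n_queries + 1 boundary offsets
sizes = np.diff(offsets)                   # number of samples per query
per_sample_qid = np.repeat(np.arange(dataset.n_queries), sizes)
assert per_sample_qid.shape[0] == dataset.X.shape[0]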

'XGBRanker' object has no attribute 'score'

This stuff just doesn't work! ;-)

I've created an XGBRanker object with the sklearn API and tried to use the rankeval effectiveness analysis. It requires a score() function, which makes sense, but I don't see that XGBRanker has one, and I don't know if sklearn requires one. Thoughts?

from rankeval.analysis.effectiveness import model_performance

model_perf = model_performance(
    datasets=[x_valid], 
    models=[model], 
    metrics=[precision_10, recall_10, ndcg_10])
model_perf.to_dataframe()

with the following output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-707-76c7248fd353> in <module>()
      4     datasets=[x_valid],
      5     models=[model],
----> 6     metrics=[precision_10, recall_10, ndcg_10])
      7 model_perf.to_dataframe()

~/rankeval/rankeval/rankeval/analysis/effectiveness.py in model_performance(datasets, models, metrics, cache)
     54     for idx_dataset, dataset in enumerate(datasets):
     55         for idx_model, model in enumerate(models):
---> 56             y_pred = model.score(dataset, detailed=False, cache=cache)
     57             for idx_metric, metric in enumerate(metrics):
     58                 data[idx_dataset][idx_model][idx_metric] = metric.eval(dataset,


AttributeError: 'XGBRanker' object has no attribute 'score'
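Not an official answer, but model_performance apparently expects a RankEval model exposing score(), not a raw XGBRanker. One likely workaround is to dump the trained booster to a file and load it through RankEval's XGBoost proxy; a hedged sketch (the import path, format string, and dump format are assumptions):

from rankeval.model import RTEnsemble

# Dump the trained XGBRanker to a text model file, then reload it as a proxy.
model.get_booster().dump_model('xgb.model.txt')
rankeval_model = RTEnsemble('xgb.model.txt', name="XGBoost model", format="XGBoost")

model_perf = model_performance(
    datasets=[x_valid],
    models=[rankeval_model],
    metrics=[precision_10, recall_10, ndcg_10])
model_perf.to_dataframe()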

KeyError: 'None of [...] are in the [index]'

I'm a bit lost here. Is there a toy example I can play with?

from rankeval.analysis.effectiveness import model_performance

model_perf = model_performance(
    datasets=[rank_train], 
    models=[rankeval_model], 
    metrics=[precision_5, recall_5, ndcg_5])

model_perf.to_dataframe()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-211-bb7252cc105d> in <module>()
      4     datasets=[rank_train],
      5     models=[rankeval_model],
----> 6     metrics=[ ndcg_5])
      7 
      8 model_perf.to_dataframe()

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/analysis/effectiveness.py in model_performance(datasets, models, metrics, cache)
     57             for idx_metric, metric in enumerate(metrics):
     58                 data[idx_dataset][idx_model][idx_metric] = metric.eval(dataset,
---> 59                                                                        y_pred)[0]
     60 
     61     performance = xr.DataArray(data,

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/metrics/ndcg.py in eval(self, dataset, y_pred)
     91             for rel_id, (qid, q_y, _) in enumerate(
     92                     self.query_iterator(dataset, dataset.y)):
---> 93                 idcg_score[rel_id] = self.dcg.eval_per_query(q_y, q_y)
     94 
     95             self._cache_idcg_score[self._current_dataset] = idcg_score

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/metrics/dcg.py in eval_per_query(self, y, y_pred)
     97             gain = y[idx_y_pred_sorted]
     98         elif self.implementation == "exp":
---> 99             gain = np.exp2(y[idx_y_pred_sorted]) - 1.0
    100 
    101         dcg = (gain / discount).sum()

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    808             key = check_bool_indexer(self.index, key)
    809 
--> 810         return self._get_with(key)
    811 
    812     def _get_with(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in _get_with(self, key)
    840             if key_type == 'integer':
    841                 if self.index.is_integer() or self.index.is_floating():
--> 842                     return self.loc[key]
    843                 else:
    844                     return self._get_values(key)

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1476 
   1477             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1478             return self._getitem_axis(maybe_callable, axis=axis)
   1479 
   1480     def _is_scalar_access(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1899                     raise ValueError('Cannot index with multidimensional key')
   1900 
-> 1901                 return self._getitem_iterable(key, axis=axis)
   1902 
   1903             # nested tuple slicing

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1141             if labels.is_unique and Index(keyarr).is_unique:
   1142                 indexer = ax.get_indexer_for(key)
-> 1143                 self._validate_read_indexer(key, indexer, axis)
   1144 
   1145                 d = {axis: [ax.reindex(keyarr)[0], indexer]}

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1204                 raise KeyError(
   1205                     u"None of [{key}] are in the [{axis}]".format(
-> 1206                         key=key, axis=self.obj._get_axis_name(axis)))
   1207 
   1208             # we skip the warning on Categorical/Interval

KeyError: 'None of [3807    76\n4956    59\n3972    72\n635     73\n3664    20\nName: target, dtype: int64] are in the [index]'


Problem with VERSION file

When installing from Github (either via setup.py, or pip3), I get the following error when trying to import anything from rankeval:

Traceback (most recent call last):
  File "./ranking-eval.py", line 10, in <module>
    from rankeval.analysis.effectiveness import query_class_performance
  File "/usr/local/lib/python3.6/dist-packages/rankeval/__init__.py", line 11, in <module>
    encoding='utf-8').read().strip()
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/rankeval/../VERSION'

Looking at /usr/local/lib/python3.6/dist-packages/, there is no VERSION file (and there should not be, of course). Note that in the source tree, the VERSION file is present in the parent directory of __init__.py.
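Not a maintainer fix, but one possible direction is to ship VERSION inside the package (e.g. via package_data in setup.py) and make the lookup in rankeval/__init__.py fail gracefully; a hedged sketch:

import io
import os

cur_dir = os.path.dirname(os.path.abspath(__file__))
try:
    # Assumes VERSION is shipped next to __init__.py as package data.
    __version__ = io.open(os.path.join(cur_dir, 'VERSION'),
                          encoding='utf-8').read().strip()
except IOError:
    __version__ = 'unknown'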

support for build with Xcode?

Mac OSX Xcode supports OpenMP per this post: https://iscinumpy.gitlab.io/post/omp-on-high-sierra/. I tried it by setting the following env variables:

export CC='clang -Xpreprocessor '
export CXX='clang++ -Xpreprocessor '

and I got pretty far, were it not for this error:

clang -Xpreprocessor -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/paulperry/anaconda3/include -arch x86_64 -I/Users/paulperry/anaconda3/include -arch x86_64 -I./rankeval/analysis -I/Users/paulperry/anaconda3/include/python3.6m -I/Users/paulperry/anaconda3/lib/python3.6/site-packages/numpy/core/include -c ./rankeval/analysis/_efficient_feature_impl.cpp -o build/temp.macosx-10.7-x86_64-3.6/./rankeval/analysis/_efficient_feature_impl.o -fopenmp -O3 -w -std=c++11
    ./rankeval/analysis/_efficient_feature_impl.cpp:90:27: error: no matching constructor for initialization of 'std::vector<TreeNode>'
        std::vector<TreeNode> queue = { root };
                              ^       ~~~~~~~~

This might fix it: https://stackoverflow.com/questions/26144299/compiler-error-when-constructing-a-vector-of-pairs

But before I go mucking with the code I wondered if anyone else has gone down this path and succeeded. Thx

support Python 3

Do you have plans to support Python 3?
Would you be interested in pull requests that move the project closer to supporting Python 3?

No version file /site-packages/rankeval/../VERSION'

Updating to the latest build I get:

!pip install rankeval

Collecting rankeval
  Downloading https://files.pythonhosted.org/packages/83/cb/20aa574ce29312e8a7e2bc79fd1f9ebccebff8015866133073979d99b543/rankeval-0.7.2.tar.gz (8.6MB)
    100% |████████████████████████████████| 8.6MB 4.1MB/s 
Requirement already satisfied: numpy>=1.13 in /opt/conda/lib/python3.6/site-packages (from rankeval) (1.15.2)
Requirement already satisfied: scipy>=0.14.0 in /opt/conda/lib/python3.6/site-packages (from rankeval) (1.1.0)
Requirement already satisfied: six>=1.9.0 in /opt/conda/lib/python3.6/site-packages (from rankeval) (1.11.0)
Requirement already satisfied: pandas>=0.19.1 in /opt/conda/lib/python3.6/site-packages (from rankeval) (0.23.4)
Requirement already satisfied: xarray>=0.9.5 in /opt/conda/lib/python3.6/site-packages (from rankeval) (0.10.9)
Requirement already satisfied: seaborn>=0.8 in /opt/conda/lib/python3.6/site-packages (from rankeval) (0.9.0)
Collecting coremltools>=0.8 (from rankeval)
  Downloading https://files.pythonhosted.org/packages/b9/9d/7ec5a2480c6afce4fcb99de1650b7abfd1457b2ef1de5ce39bf7bee8a8ae/coremltools-2.1.0-cp36-none-manylinux1_x86_64.whl (2.7MB)
    100% |████████████████████████████████| 2.7MB 5.9MB/s 
Requirement already satisfied: matplotlib>=2.0.2 in /opt/conda/lib/python3.6/site-packages (from rankeval) (2.2.3)
Requirement already satisfied: python-dateutil>=2.5.0 in /opt/conda/lib/python3.6/site-packages (from pandas>=0.19.1->rankeval) (2.6.0)
Requirement already satisfied: pytz>=2011k in /opt/conda/lib/python3.6/site-packages (from pandas>=0.19.1->rankeval) (2018.5)
Requirement already satisfied: protobuf>=3.1.0 in /opt/conda/lib/python3.6/site-packages (from coremltools>=0.8->rankeval) (3.6.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib>=2.0.2->rankeval) (2.2.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib>=2.0.2->rankeval) (1.0.1)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.6/site-packages (from matplotlib>=2.0.2->rankeval) (0.10.0)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.6/site-packages (from protobuf>=3.1.0->coremltools>=0.8->rankeval) (39.1.0)
Building wheels for collected packages: rankeval
  Running setup.py bdist_wheel for rankeval ... done
  Stored in directory: /root/.cache/pip/wheels/61/96/a8/6d3b323ae7c815d647e20e949b19437a9198c375afcb9c6d31
Successfully built rankeval
mxnet 1.3.0.post0 has requirement numpy<1.15.0,>=1.8.2, but you'll have numpy 1.15.2 which is incompatible.
kmeans-smote 0.1.0 has requirement imbalanced-learn<0.4,>=0.3.1, but you'll have imbalanced-learn 0.5.0.dev0 which is incompatible.
kmeans-smote 0.1.0 has requirement numpy<1.15,>=1.13, but you'll have numpy 1.15.2 which is incompatible.
fastai 0.7.0 has requirement torch<0.4, but you'll have torch 0.4.1 which is incompatible.
anaconda-client 1.7.2 has requirement python-dateutil>=2.6.1, but you'll have python-dateutil 2.6.0 which is incompatible.
imbalanced-learn 0.5.0.dev0 has requirement scikit-learn>=0.20, but you'll have scikit-learn 0.19.1 which is incompatible.
Installing collected packages: coremltools, rankeval
Successfully installed coremltools-2.1.0 rankeval-0.7.2
You are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

import rankeval

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-122-c66b5899c31b> in <module>
----> 1 import rankeval

/opt/conda/lib/python3.6/site-packages/rankeval/__init__.py in <module>
      9 
     10 __version__ = io.open(os.path.join(cur_dir, '..', 'VERSION'),
---> 11                       encoding='utf-8').read().strip()

FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/lib/python3.6/site-packages/rankeval/../VERSION'

dataset.clear_X(): What is it used for?

Is this method really needed/worth keeping?

It is not used within the code base.
dataset.X = None should be sufficient if one really wants to save memory under specific circumstances, and should leave X for garbage collection (if there are no other references to it).

del self.X modifies something that was passed into the object.
Not sure whether this is really something that you want to do...

XGBoost loader fails when the training prunes out some nodes

Currently, RankEval does not support loading an XGBoost model with holes in the node identifiers. This happens when XGBoost's pruning phase removes some nodes from the final model at training time, leaving out the identifiers associated with those nodes.

E.g., the following tree is missing node identifiers 9 and 10, which were removed at training time (the training log reports 2 pruned nodes for this tree):

booster[0]:
0:[f64<0.00485350005] yes=1,no=2,missing=1
        1:[f133<0.5] yes=3,no=4,missing=3
                3:[f109<22.529314] yes=7,no=8,missing=7
                        7:[f114<-26.4824524] yes=15,no=16,missing=15
                                15:leaf=-0.0236083996
                                16:leaf=-0.0109101823
                        8:[f94<401.155518] yes=17,no=18,missing=17
                                17:leaf=-0.00601068465
                                18:leaf=0.012365385
                4:leaf=0.0293395668
        2:[f133<0.5] yes=5,no=6,missing=5
                5:[f107<11.7844181] yes=11,no=12,missing=11
                        11:[f116<-6.82519722] yes=23,no=24,missing=23
                                23:leaf=-0.00373374182
                                24:leaf=0.0127977673
                        12:[f17<28.2889977] yes=25,no=26,missing=25
                                25:leaf=0.00760455802
                                26:leaf=0.0176214874
                6:[f134<7.5] yes=13,no=14,missing=13
                        13:[f131<69.5] yes=27,no=28,missing=27
                                27:leaf=0.0253543444
                                28:leaf=6.53095121e-05
                        14:leaf=0.0354171321

The solution is to treat node identifiers not as strictly consecutive, but as possibly containing gaps.
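A hedged sketch of that idea, collecting whatever node identifiers actually appear in the text dump and remapping them to dense indices (the function is illustrative, not RankEval code):

import re

_node_re = re.compile(r'^\s*(\d+):')

def dense_node_mapping(dump_lines):
    # Map the (possibly non-consecutive) node ids of one booster to 0..n-1.
    ids = []
    for line in dump_lines:
        m = _node_re.match(line)
        if m:
            ids.append(int(m.group(1)))
    ids.sort()
    return {node_id: idx for idx, node_id in enumerate(ids)}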

feature_importance error

I'm running into an error and summarized it in this toy example:

X = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
y = pd.DataFrame([0,0,1])
g = pd.Series([1,1,2])
dataset = Dataset(X, y, g, name='dataset')
mse = MSE()

feature_analysis = feature_importance(model=X, dataset=dataset, metric=mse)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-be667221664d> in <module>
      5 mse = MSE()
      6 
----> 7 feature_analysis = feature_importance(model=X, dataset=dataset, metric=mse)

~/anaconda3/lib/python3.7/site-packages/rankeval/analysis/feature.py in feature_importance(model, dataset, metric, normalize)
     63 
     64     if isinstance(metric, RMSE) or isinstance(metric, MSE):
---> 65         feature_imp, feature_count = eff_feature_importance(model, dataset)
     66         if isinstance(metric, RMSE):
     67             feature_imp[0] = np.sqrt(feature_imp[0])

~/anaconda3/lib/python3.7/site-packages/rankeval/analysis/_efficient_feature.pyx in rankeval.analysis._efficient_feature.eff_feature_importance()

TypeError: Cannot convert DataFrame to numpy.ndarray
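Not confirmed by the maintainers, but the error suggests the Cython code expects numpy arrays and a trees ensemble rather than pandas objects and a feature matrix. A hedged sketch of the toy example rewritten under that assumption (rankeval_model stands for an RTEnsemble loaded as in the LightGBM issue above):

import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)
y = np.array([0, 0, 1], dtype=np.float32)
qids = np.array([1, 1, 2])
dataset = Dataset(X, y, qids, name='dataset')
mse = MSE()

# The model argument should be a regression-trees ensemble, not the feature matrix.
feature_analysis = feature_importance(model=rankeval_model, dataset=dataset, metric=mse)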
