Hi, I've run into a bit of a use-case that I'm not sure is quite

Support batch-mode queries? about modal HOT 10 CLOSED

modal-python commented on May 27, 2024

Support batch-mode queries?

from modal.

Comments (10)

cosmic-cortex commented on May 27, 2024

Hi there!

You are right: multiple instances can be selected using n_instances in the built-in query strategies, but so far the selection is very simple and only returns the instances with the largest utility values. As a result, the selected instances can be very close to each other in the feature space. I think it is a good idea to implement more sophisticated strategies, as you suggested.

Currently, this is what a query strategy looks like in general.

def custom_query_strategy(classifier, X, a_keyword_argument=42):
    # measure the utility of each instance in the pool
    utility = utility_measure(classifier, X)

    # select the indices of the instances to be queried
    query_idx = select_instances(utility)

    # return the indices and the instances
    return query_idx, X[query_idx]

In the built-in queries, the select_instances() function is multi_argmax(values, n_instances=1) from modAL.utils.selection, which does what I have described earlier. By replacing this with the function described in the Ranked batch-mode paper, it can be included in the current query strategies easily.
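To make the template above concrete, here is a minimal, dependency-free sketch of a strategy that keeps the n_instances highest-utility points. It is not modAL's actual code: top_n_indices only imitates the behaviour described for multi_argmax, and least_confident and toy_predict_proba are hypothetical helpers invented for the illustration.

```python
import heapq


def top_n_indices(utility, n_instances=1):
    """Return the indices of the n_instances largest utility values, best first."""
    return heapq.nlargest(n_instances, range(len(utility)), key=utility.__getitem__)


def least_confident(probabilities):
    """Uncertainty utility: 1 minus the highest class probability per instance."""
    return [1.0 - max(row) for row in probabilities]


def custom_query_strategy(predict_proba, X, n_instances=1):
    # measure the utility of each instance in the pool
    utility = least_confident(predict_proba(X))
    # select the indices of the instances to be queried
    query_idx = top_n_indices(utility, n_instances)
    # return the indices and the instances
    return query_idx, [X[i] for i in query_idx]


def toy_predict_proba(X):
    # Hypothetical stand-in for a fitted classifier: confidence grows with |x|.
    return [[0.5 + min(abs(x[0]), 1.0) / 2, 0.5 - min(abs(x[0]), 1.0) / 2] for x in X]


pool = [[0.9], [0.1], [0.12], [-0.8]]
query_idx, query_instances = custom_query_strategy(toy_predict_proba, pool, n_instances=2)
print(query_idx, query_instances)
```

In this toy pool the two selected points are 0.1 and 0.12, which sit almost on top of each other. That is the weakness of plain top-n selection mentioned above.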

What I would suggest is adding these functions to modAL.utils.selection. If you open a pull request as you suggested, we can work together on integrating these features!

dataframing commented on May 27, 2024

Sounds good! I'll try and get a PR in by tomorrow night (my time, so Wednesday morning your time?). Thanks!

dataframing commented on May 27, 2024

Hey Tivadar, a quick question: as it turns out, I'm having a bit of difficulty integrating my implementation of the ranked batch mode learner with modAL's architecture. Not because I don't understand the individual pieces (I think I understand those...) but I think it's because it's a model and not just a query strategy...does that make sense? E.g. the paper describes being able to access the "model" training data, which doesn't seem accessible from the scope of any particular sampling function within modAL.utils.selection, unless I'm mistaken.

For what it's worth, feel free to check out my implementation in the gist here (the actual implementation is below the notebook). It's not properly commented nor done, but maybe you get an idea of what I mean? The RankedBatchLearner's query method (again, I know it's not aligned with the modAL API/architecture) relies on being able to build off of the initial training set for building the ranked batch.

cosmic-cortex commented on May 27, 2024

Thanks, great work! I'll review your implementation in detail soon.

The problem you mention, that the training data is not accessible from the scope of the functions within modAL.utils.selection, is not necessarily an obstacle. The first argument of every query strategy is the active learner itself, which has access to the training data. Since the selection functions are usually called within the query strategy, you can pass the training data as an argument to the function selecting the instances. I'll try to outline this in detail in code today, after I have finished reading the paper.
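A rough sketch of that idea: the strategy receives the learner, reads its training data, and forwards it to the selection function. Everything here is an assumption made for illustration. ToyLearner and its X_training attribute are hypothetical stand-ins for the real learner, the uncertainty function is made up, and the scoring only loosely follows the ranked batch-mode idea (uncertain points that are far from already known points score higher); it is not the paper's exact formula or the code that was later merged.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyLearner:
    """Hypothetical stand-in for an active learner holding its training data."""
    X_training: list

    def uncertainty(self, x):
        # Made-up utility for the sketch: points near 0 are the most uncertain.
        return max(0.0, 1.0 - abs(x[0]))


def similarity(a, b):
    """Map Euclidean distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def select_ranked_batch(uncertainty, X_pool, X_labeled, n_instances):
    """Greedily pick instances that are uncertain and far from known points.

    Assumes X_labeled is non-empty.
    """
    known = list(X_labeled)               # grows as the batch is built
    remaining = list(range(len(X_pool)))
    batch = []
    for _ in range(min(n_instances, len(remaining))):
        alpha = len(remaining) / (len(remaining) + len(known))

        def score(i):
            nearest = max(similarity(X_pool[i], k) for k in known)
            return alpha * (1.0 - nearest) + (1.0 - alpha) * uncertainty[i]

        best = max(remaining, key=score)
        batch.append(best)
        known.append(X_pool[best])        # treat the pick as known from now on
        remaining.remove(best)
    return batch


def ranked_batch_strategy(learner, X_pool, n_instances=1):
    # The strategy receives the learner, so it can forward the training data.
    uncertainty = [learner.uncertainty(x) for x in X_pool]
    query_idx = select_ranked_batch(uncertainty, X_pool, learner.X_training, n_instances)
    return query_idx, [X_pool[i] for i in query_idx]


learner = ToyLearner(X_training=[[1.0]])
pool = [[0.9], [0.1], [0.12], [-0.6]]
query_idx, query_instances = ranked_batch_strategy(learner, pool, n_instances=2)
print(query_idx, query_instances)
```

With this toy data the batch contains the points -0.6 and 0.1 rather than the two near-duplicates 0.1 and 0.12, because each pick lowers the score of its close neighbours.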

Also, a quick note. In modAL.density, you can find the similarize_distance decorator, which serves a similar purpose to your euclidean_sim. Feel free to use it if you find it suitable!
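For readers unfamiliar with the pattern, a decorator of this kind can be sketched as below. This is only an illustration of the general idea of wrapping a distance function so that it returns a similarity; to_similarity is a hypothetical name, and the real similarize_distance in modAL.density may use a different signature or transformation.

```python
import functools
import math


def to_similarity(distance):
    """Wrap a distance function so that it returns a similarity in (0, 1]."""
    @functools.wraps(distance)
    def similarity(*args, **kwargs):
        # Distance 0 maps to similarity 1; larger distances approach 0.
        return 1.0 / (1.0 + distance(*args, **kwargs))
    return similarity


@to_similarity
def euclidean_sim(a, b):
    return math.dist(a, b)


print(euclidean_sim([0, 0], [0, 0]))
print(euclidean_sim([0, 0], [3, 4]))
```

The benefit of the decorator form is that any distance function can be reused as a similarity without writing a separate wrapper for each one.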

dataframing commented on May 27, 2024

Ah! I see now. If that's the case, it shouldn't be too bad going forward — I had assumed the classifier was the core estimator.

> Also, a quick note. In modAL.density, you can find the similarize_distance decorator, which can be used for similar purposes you used the euclidean_sim. Feel free to use it if you find it suitable!

Hahaha yes, I hand-rolled my own during the implementation but realized it's already there and modular enough for any distance function afterwards. Thanks!!

cosmic-cortex commented on May 27, 2024

OK, the feature is now implemented and merged into the dev branch! Thanks @dataframing!

nikolay-bushkov commented on May 27, 2024

Thanks for the feature! I went through the tutorial and it works fine, but at https://github.com/cosmic-cortex/modAL/blob/308af9b0ffff30597431ffac5ca44e3ad518c607/examples/ranked_batch_mode.py#L36
I see a possibility of getting X_raw.shape[0] as a training index, which would result in an IndexError.
Could you please check whether my suspicion is right?
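The suspected failure mode can be illustrated as follows. The linked line is not reproduced in this thread, so this sketch assumes the indices are drawn with an exclusive upper bound of X_raw.shape[0] + 1 (np.random.randint treats its high argument as exclusive). The standard-library random.randrange is used as a stand-in because its stop argument is also exclusive; the toy data is made up.

```python
import random

X_raw = [[5.1], [4.9], [6.3], [5.8]]   # toy pool standing in for the real data
n = len(X_raw)                          # plays the role of X_raw.shape[0]

# With an exclusive upper bound of n + 1, the value n itself is a possible draw.
possible_buggy = list(range(0, n + 1))
# With an exclusive upper bound of n, every draw is a valid row index.
possible_fixed = list(range(0, n))


def can_index(data, i):
    """Return True if data[i] is a valid lookup, False on IndexError."""
    try:
        data[i]
        return True
    except IndexError:
        return False


rng = random.Random(0)
safe_indices = [rng.randrange(0, n) for _ in range(3)]

print(n in possible_buggy, can_index(X_raw, n))
print(all(can_index(X_raw, i) for i in safe_indices))
```

If the assumption holds, dropping the + 1 from the upper bound would be enough to keep every drawn index inside the pool.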

Also, in order to have consistent documentation across the website, Jupyter notebooks and py-files, it is convenient to use Sphinx with a couple of extensions... PyTorch tutorials, for example, are built using these things.

cosmic-cortex commented on May 27, 2024

I plan to switch to Sphinx; several people have suggested that the API reference and the website should be merged. It might take a while, however, since I am not familiar with it.

nikolay-bushkov commented on May 27, 2024

Recently, I was responsible for refactoring the documentation here: http://docs.deeppavlov.ai/en/master/intro/hello_bot.html
I can contribute a similar refactoring to modAL if you are OK with it, but first we need to discuss several things.

I see some Sphinx/readthedocs artifacts in the repo, but in the end GitHub Pages is used for the site, am I right?

I strongly suggest choosing NumPy or Google style for docstrings (http://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html). Conversion could be done with https://github.com/dadadel/pyment. For example, sklearn uses NumPy style, while PyTorch uses Google style. Personally, I prefer the latter when type annotations are used in function signatures. What do you think?
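If Sphinx is adopted, either style can be parsed by the napoleon extension linked above. A minimal conf.py fragment might look like this; restricting it to Google style is an assumption made only for the example, not a decision taken in this thread.

```python
# conf.py (fragment): enable autodoc plus napoleon so that Sphinx can parse
# Google- or NumPy-style docstrings.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
]

# Assumption for illustration: only Google style is used in the code base.
napoleon_google_docstring = True
napoleon_numpy_docstring = False
```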

Regarding type annotations: as I understand it, Python 2 support is not planned (and that's great!), so it would be useful to enforce type annotations, which simplify docstrings (type annotations are more about syntax, while docstrings are more about semantics).
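As a small illustration of how the two fit together, here is a hypothetical function (select_top is not part of modAL) with type annotations in the signature and a Google-style docstring that describes only the meaning of each argument, leaving the types to the annotations.

```python
from typing import List, Sequence


def select_top(utility: Sequence[float], n_instances: int = 1) -> List[int]:
    """Pick the instances with the largest utility.

    Args:
        utility: Utility value for each instance in the pool.
        n_instances: Number of instances to select.

    Returns:
        Indices of the selected instances, highest utility first.
    """
    order = sorted(range(len(utility)), key=lambda i: utility[i], reverse=True)
    return order[:n_instances]


print(select_top([0.1, 0.9, 0.5], n_instances=2))
```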

Also, I have not found licensing information. I suggest using Apache 2.0 as a friendly one for both academia and industry.

Waiting for your reply...

cosmic-cortex commented on May 27, 2024

Sorry for not answering sooner, I was on vacation for the last two weeks.

It sounds great! I would really appreciate your help! I have opened a new issue (#22) for discussing this. GitHub Pages is used for the site itself and readthedocs for the autogenerated documentation. I would like to merge the two, as is done for the PyTorch docs you mentioned.

I am not really familiar with the NumPy or Google style themselves, but I'll take a look ASAP.

Regarding licensing, modAL uses the MIT license (https://github.com/cosmic-cortex/modAL/blob/master/LICENSE).
