Comments (10)

cosmic-cortex commented on May 27, 2024

Hi there!

You are right, multiple instances can be selected using n_instances in the built-in query strategies, but so far this is very simple: it only returns the instances with the largest utility values, which can be very close to each other in the feature space. I think it is a good idea to implement more sophisticated strategies, as you suggested.

Currently, this is what a query strategy looks like in general:

```python
# utility_measure and select_instances are placeholders for your own functions
def custom_query_strategy(classifier, X, a_keyword_argument=42):
    # measure the utility of each instance in the pool
    utility = utility_measure(classifier, X)

    # select the indices of the instances to be queried
    query_idx = select_instances(utility)

    # return the indices and the instances
    return query_idx, X[query_idx]
```

In the built-in query strategies, the select_instances() step is multi_argmax(values, n_instances=1) from modAL.utils.selection, which simply returns the n_instances largest utilities, as described above. By replacing it with the selection procedure described in the Ranked batch-mode paper, that method can easily be included in the current query strategies.
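To make the "largest utility only" behaviour concrete, here is a small self-contained sketch in plain NumPy. `select_top_n` is a hypothetical stand-in for `multi_argmax` (not modAL's actual code), the utility is least confidence (1 minus the top class probability), and `FixedProbaClassifier` is a made-up stub that just returns fixed probabilities.

```python
import numpy as np

def select_top_n(utility, n_instances=1):
    # hypothetical stand-in for multi_argmax: indices of the n largest utilities
    utility = np.asarray(utility)
    # stable sort on the negated utility keeps ties in their original order
    return np.argsort(-utility, kind="stable")[:n_instances]

def least_confidence_query(classifier, X, n_instances=1):
    # utility = 1 - probability of the most likely class
    proba = classifier.predict_proba(X)
    utility = 1.0 - np.max(proba, axis=1)
    query_idx = select_top_n(utility, n_instances)
    return query_idx, X[query_idx]

class FixedProbaClassifier:
    # hypothetical stub standing in for a fitted scikit-learn style classifier
    def __init__(self, proba):
        self.proba = np.asarray(proba)

    def predict_proba(self, X):
        return self.proba

X = np.array([[0.0], [0.01], [5.0], [10.0]])
clf = FixedProbaClassifier([[0.5, 0.5], [0.51, 0.49], [0.9, 0.1], [0.99, 0.01]])
idx, Xq = least_confidence_query(clf, X, n_instances=2)
print(idx)  # [0 1]
```

Here the two queried points, 0.0 and 0.01, are nearly identical, which is exactly the redundancy a batch-aware selection step is meant to avoid.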

What I would suggest is to add these functions to modAL.utils.selection. If you open a pull request as you suggested, we can work together on integrating these features!

dataframing commented on May 27, 2024

Sounds good! I'll try and get a PR in by tomorrow night (my time, so Wednesday morning your time?). Thanks!

dataframing commented on May 27, 2024

Hey Tivadar, a quick question: it turns out I'm having some difficulty integrating my implementation of the ranked batch mode learner with modAL's architecture. It's not that I don't understand the individual pieces (I think I understand those...), but I think the problem is that it's a model and not just a query strategy... does that make sense? For example, the paper relies on access to the "model" training data, which doesn't seem to be accessible from within any of the sampling functions in modAL.utils.selection, unless I'm mistaken.

For what it's worth, feel free to check out my implementation in the gist here (the actual implementation is below the notebook). It's not properly commented or finished, but maybe it shows what I mean. The RankedBatchLearner's query method (again, I know it's not aligned with the modAL API/architecture) relies on the initial training set to build the ranked batch.

cosmic-cortex commented on May 27, 2024

Thanks, great work! I'll review your implementation in detail soon.

The training data is not necessarily inaccessible from the functions in modAL.utils.selection. The first argument of every query strategy is the active learner itself, which has access to the training data. Since the selection functions are usually called within the query strategy, you can pass the training data as arguments to the function that selects the instances. I'll try to outline this in code today, after I finish reading the paper.
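A rough sketch of that idea follows; it is not the implementation that was eventually merged. The selection function takes the training data as an explicit argument, and the query strategy pulls it from the learner. It assumes the learner exposes `predict_proba` and an `X_training` attribute (`StubLearner` is a hypothetical stand-in), uses similarity 1 / (1 + Euclidean distance), and, following my reading of the Ranked batch-mode paper, scores each candidate as alpha * (1 - max similarity to the labeled set) + (1 - alpha) * uncertainty with alpha = |unlabeled| / (|unlabeled| + |labeled|). Each picked instance is treated as labeled before the next pick.

```python
import numpy as np

def euclidean_similarity(A, B):
    # pairwise similarity 1 / (1 + Euclidean distance), shape (len(A), len(B))
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return 1.0 / (1.0 + d)

def ranked_batch(X_pool, X_training, uncertainty, n_instances):
    # greedy ranked batch selection (sketch); assumes X_training is non-empty
    X_pool = np.asarray(X_pool, dtype=float)
    labeled = np.asarray(X_training, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    candidates = list(range(len(X_pool)))
    selected = []
    for _ in range(min(n_instances, len(X_pool))):
        alpha = len(candidates) / (len(candidates) + len(labeled))
        # highest similarity of each candidate to anything labeled so far
        sim = euclidean_similarity(X_pool[candidates], labeled).max(axis=1)
        scores = alpha * (1.0 - sim) + (1.0 - alpha) * uncertainty[candidates]
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
        # treat the chosen instance as labeled for the next round
        labeled = np.vstack([labeled, X_pool[best]])
    return np.array(selected)

def ranked_batch_query(learner, X_pool, n_instances=2):
    # the learner (first argument) carries the training data
    uncertainty = 1.0 - learner.predict_proba(X_pool).max(axis=1)
    query_idx = ranked_batch(X_pool, learner.X_training, uncertainty, n_instances)
    return query_idx, X_pool[query_idx]

class StubLearner:
    # hypothetical stand-in for an active learner holding its training data
    def __init__(self, X_training, proba):
        self.X_training = np.asarray(X_training, dtype=float)
        self._proba = np.asarray(proba, dtype=float)

    def predict_proba(self, X):
        return self._proba

learner = StubLearner([[10.0]], [[0.5, 0.5], [0.51, 0.49], [0.6, 0.4]])
X_pool = np.array([[0.0], [0.01], [5.0]])
idx, Xq = ranked_batch_query(learner, X_pool, n_instances=2)
print(idx)  # [0 2]
```

Picking the two most uncertain points would give 0 and 1; here the diversity term swaps the near-duplicate 0.01 for 5.0.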

Also, a quick note: in modAL.density you can find the similarize_distance decorator, which serves the same purpose as your euclidean_sim. Feel free to use it if you find it suitable!
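For reference, the decorator idea looks roughly like this. I believe modAL's `similarize_distance` maps a distance d to 1 / (1 + d), but treat that as an assumption and check `modAL.density` for the exact definition; `similarize` and `euclidean` below are hypothetical names.

```python
import numpy as np
from functools import wraps

def similarize(distance_fn):
    # hypothetical decorator: turn any distance into a similarity 1 / (1 + d)
    @wraps(distance_fn)
    def similarity(*args, **kwargs):
        return 1.0 / (1.0 + distance_fn(*args, **kwargs))
    return similarity

@similarize
def euclidean(a, b):
    # Euclidean distance between two vectors; the decorator converts it
    return np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))

print(euclidean([0, 0], [3, 4]))  # distance 5, so 1 / (1 + 5)
```

Because the wrapper only post-processes the return value, the same decorator works for any distance function with any signature.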

dataframing commented on May 27, 2024

Ah! I see now. If that's the case, it shouldn't be too bad going forward; I had assumed the classifier was the core estimator.

> Also, a quick note: in modAL.density you can find the similarize_distance decorator, which serves the same purpose as your euclidean_sim. Feel free to use it if you find it suitable!

Hahaha yes, I hand-rolled my own during the implementation, but realized afterwards that it's already there and modular enough to work with any distance function. Thanks!!

cosmic-cortex commented on May 27, 2024

Ok, now the feature is implemented and merged to the dev branch! Thanks @dataframing!

nikolay-bushkov commented on May 27, 2024

Thanks for the feature! I went through the tutorial and it works fine, but here https://github.com/cosmic-cortex/modAL/blob/308af9b0ffff30597431ffac5ca44e3ad518c607/examples/ranked_batch_mode.py#L36 I see a possibility of getting X_raw.shape[0] as a training index, which would result in an IndexError...
Could you check my suspicion, please?
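The linked line is not reproduced here, so this only illustrates the general pitfall behind that kind of IndexError, assuming the indices are drawn with NumPy's random integer sampling: both `np.random.randint` and `Generator.integers` treat `high` as exclusive by default, so the upper bound should be the number of rows itself, not one more. `n_samples` is a hypothetical stand-in for `X_raw.shape[0]`.

```python
import numpy as np

n_samples = 150  # hypothetical pool size standing in for X_raw.shape[0]
rng = np.random.default_rng(0)

# `high` is exclusive: high=n_samples yields 0..n_samples-1, all valid rows.
# high=n_samples + 1 could also yield n_samples, one past the last row.
safe = rng.integers(low=0, high=n_samples, size=5)

# Sampling without replacement additionally avoids picking a row twice.
training_indices = rng.choice(n_samples, size=5, replace=False)
```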

Also, in order to have consistent documentation across the website, Jupyter notebooks and .py files, it is convenient to use Sphinx with a couple of extensions... The PyTorch tutorials, for example, are built this way.
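A minimal `conf.py` sketch of such a setup, assuming the built-in Sphinx extensions `sphinx.ext.autodoc` and `sphinx.ext.napoleon` plus the third-party `sphinx-gallery` package (which, as far as I know, is what the PyTorch tutorials use to turn `.py` files into pages). The directory names are hypothetical.

```python
# docs/conf.py (hypothetical layout): minimal Sphinx configuration sketch
project = "modAL"

extensions = [
    "sphinx.ext.autodoc",          # build the API reference from docstrings
    "sphinx.ext.napoleon",         # parse Google/NumPy style docstrings
    "sphinx.ext.viewcode",         # link documented objects to their source
    "sphinx_gallery.gen_gallery",  # render example scripts as tutorial pages
]

napoleon_google_docstring = True
napoleon_numpy_docstring = True

sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # hypothetical: where the .py examples live
    "gallery_dirs": "auto_examples",  # hypothetical: generated gallery output
}
```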

cosmic-cortex commented on May 27, 2024

I plan to switch to Sphinx; several people have suggested that the API reference and the website should be merged. It might take a while, however, since I am not familiar with it.

nikolay-bushkov commented on May 27, 2024

Recently, I was responsible for refactoring the documentation here: http://docs.deeppavlov.ai/en/master/intro/hello_bot.html
I can contribute similar refactoring to modAL if you're OK with it, but first we need to discuss several things.

I see some Sphinx/readthedocs artifacts in the repo, but in the end GitHub Pages is used for the site, am I right?

I strongly suggest choosing the NumPy or Google style for docstrings (http://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html). Conversion could be done with https://github.com/dadadel/pyment. For example, sklearn uses the NumPy style, while PyTorch uses the Google style. Personally, I prefer the latter when type annotations are used in function signatures. What do you think?

Regarding type annotations: as I understand it, Python 2 support is not planned (and that's great!), so it would be useful to require type annotations, which simplifies docstrings (type annotations are more about syntax, while docstrings are more about semantics).
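For comparison, here is the same hypothetical helper (`top_n_google` / `top_n_numpy`, not part of modAL) documented both ways. In the Google style the types live in the annotated signature and the docstring only describes meaning; in the NumPy style the types are spelled out in the docstring sections.

```python
import numpy as np


def top_n_google(values: np.ndarray, n_instances: int = 1) -> np.ndarray:
    """Select the indices of the largest values.

    Args:
        values: Utility score of each instance in the pool.
        n_instances: Number of indices to return.

    Returns:
        Indices of the ``n_instances`` largest values, largest first.
    """
    return np.argsort(-np.asarray(values), kind="stable")[:n_instances]


def top_n_numpy(values, n_instances=1):
    """Select the indices of the largest values.

    Parameters
    ----------
    values : numpy.ndarray
        Utility score of each instance in the pool.
    n_instances : int, default 1
        Number of indices to return.

    Returns
    -------
    numpy.ndarray
        Indices of the ``n_instances`` largest values, largest first.
    """
    return np.argsort(-np.asarray(values), kind="stable")[:n_instances]
```

Both forms are understood by Sphinx's napoleon extension, so the choice is mostly about where the type information should live.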

Also, I have not found any licensing information. I suggest using Apache 2.0, since it is friendly to both academia and industry.

Waiting for your reply...

cosmic-cortex commented on May 27, 2024

Sorry for not answering sooner, I was on vacation for the last two weeks.

It sounds great! I would really appreciate your help! I have opened a new issue (#22) for discussing this. GitHub Pages is used for the site itself and readthedocs for the autogenerated documentation. I would like to merge the two, as in the PyTorch docs you mentioned.

I am not really familiar with the NumPy or Google style themselves, but I'll take a look ASAP.

Regarding licensing, I use MIT license (https://github.com/cosmic-cortex/modAL/blob/master/LICENSE).
