
Directions forward (behavioralthor) · 6 comments · open

ardila commented on September 9, 2024
Directions forward


Comments (6)

ardila commented on September 9, 2024

MCC confusion matrices

HMO confusion matrix
[screenshot: HMO confusion matrix, 2013-09-30]

Pixel confusion matrix
[screenshot: pixel confusion matrix, 2013-09-30]


yamins81 commented on September 9, 2024

Comments:

  1. For both of the options above, we could also replace the
    "average-L3-hard" score with the "HMO-0" score, correct? By HMO-0, I mean
    the current HMO model as extracted so far. I haven't yet completely
    thought through what I think is best here. Or we could do V1-hard or
    HMAX-hard, right? I am kind of leaning toward HMO0-hard at the moment.
    What are your thoughts?

  2. The method you described might generally be called "worst margin", i.e.
    you pick images as the ones with the worst margin on a classifier. I
    think this should be amended in two ways:
    a) First, we should make sure that any margins are averaged over a set
    of splits, so that the "bad images" are truly those that have stably
    bad margins, regardless of the specific distractors (a minimal sketch
    of this is given just after point 3 below).
    b) We should include a set of additional distractors that are randomly
    chosen with respect to margins. The reason is that I have often had the
    impression that "hard images" (or hard objects) are hard because of the
    "easier" distractors that are also in a given set. In other words, the
    presence of those "easier distractors" is what exposes the difficulty of
    a given "hard" image or object. If we remove all the easy ones, then it
    might suddenly look easy to solve the hard images, because they get
    "moved into place" on top of where the easy ones used to be. Then, once
    we try to combine the solution back in complementarily, it won't work.
    So we'll want to keep at least some easy distractors around that are
    uniformly distributed in image space with regard to margins on the test
    algorithms (and classes).

  3. I assume you think we should draw the images from which to choose
    this set from the pixel-hard synsets, as opposed to 250K random images.
    Is that why you're saying we'll start extracting the "PixelHardSynsets"
    set tomorrow? How many hard synsets are you thinking of? Or will that
    be set by N1, to fix the size of the total set?
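
To make 2(a) concrete, here is a minimal sketch of a split-averaged margin
computation, assuming precomputed features X and binary labels y for a single
2-way task; the function name and the use of scikit-learn's LinearSVC are
illustrative choices, not anything fixed in the repo:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedShuffleSplit

def split_averaged_margins(X, y, n_splits=10, seed=0):
    """Average each image's signed margin over several random train/test
    splits, so "hard" images are those with stably bad margins rather
    than artifacts of one particular split."""
    margins = np.zeros(len(y))
    counts = np.zeros(len(y))
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=0.5,
                                      random_state=seed)
    for train_idx, test_idx in splitter.split(X, y):
        clf = LinearSVC().fit(X[train_idx], y[train_idx])
        # signed distance to the decision boundary: positive = correct side
        d = clf.decision_function(X[test_idx])
        sign = np.where(y[test_idx] == clf.classes_[1], 1.0, -1.0)
        margins[test_idx] += d * sign
        counts[test_idx] += 1
    # only average over the splits in which each image landed in the test set
    return margins / np.maximum(counts, 1)
```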

--> On a separate note, what we're doing here is basically stacking a
hierarchical series of increasingly stringent tests to winnow down the
set: starting with pixels, and using that to cut the set down a lot;
then cutting it down further with HMO0- or HMAX- or whatever we decide
on in point 1) above. We then run THAT through either the HMO procedure
directly, or THEN through humans to re-weight it.

  1. From the plot you made for the HMO0 model, I don't agree that we're
    seeing saturation in the performance as a function of training
    examples. In fact, it looks to me like a slow (approximately
    logarithmic) increase, much like in the case of HvM (see the sketch
    after this list). I expect that performance will keep increasing slowly
    with the number of examples. But I think N1 = 400 is fine, probably,
    since we don't need to push out to saturation; we just need a
    representative sample.

  2. Does this plan relate clearly to the psychophysics plan you came up with
    a couple of months ago? Can you spell that out a little more explicitly
    now, again?
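
On point 1, one quick way to adjudicate "saturation vs. slow logarithmic
growth" is to fit both functional forms to the training curve and compare
residuals. A sketch, with made-up placeholder numbers standing in for points
read off the actual plot:

```python
import numpy as np
from scipy.optimize import curve_fit

# Placeholder (n_train, accuracy) points; read the real ones off the plot.
n = np.array([25.0, 50, 100, 200, 300, 400])
acc = np.array([0.42, 0.48, 0.55, 0.60, 0.63, 0.65])

def log_curve(n, a, b):   # slow logarithmic growth, no saturation
    return a + b * np.log(n)

def sat_curve(n, c, k):   # saturating exponential
    return c * (1.0 - np.exp(-n / k))

p_log, _ = curve_fit(log_curve, n, acc)
p_sat, _ = curve_fit(sat_curve, n, acc, p0=[0.7, 100.0])
rss = lambda f, p: float(np.sum((acc - f(n, *p)) ** 2))
print("log RSS:", rss(log_curve, p_log), " sat RSS:", rss(sat_curve, p_sat))
```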

On Mon, Sep 30, 2013 at 5:47 PM, Diego Ardila [email protected]:

@yamins81 https://github.com/yamins81
There are 2 main goals

  1. A screening set that is representative of the difficulty in the
    1000-way categorization task, for creating a challenge submission

  2. A screening set that is representative of the difficulty that humans
    are good at, across all of imagenet, for getting better neural fits

re 1)
We should use random L3 models (5 sets of features, one from each random
model) and find a set of images that is hard to separate, on average, for
the model class. This would mean extracting #N1 images from each synset,
then getting margins for all 2-ways for each image. Then we could just
take the mean of each image's negative margins as its score, and take the
#N2 lowest-scoring images (a sketch of this scoring rule follows).
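
A sketch of that scoring rule, assuming the pairwise margins (already
averaged over the 5 random L3 feature sets and over splits) are collected in
a single array; the names and array layout here are my assumptions:

```python
import numpy as np

def hardness_scores(margins):
    """margins: float array of shape (n_images, n_pairwise_tasks), each
    image's margin in every 2-way task it appears in. The score is the
    mean of the image's negative margins; images with no negative
    margins score 0 (easy)."""
    neg_sum = np.minimum(margins, 0.0).sum(axis=1)   # sum of negative margins
    n_neg = np.count_nonzero(margins < 0, axis=1)
    return np.where(n_neg > 0, neg_sum / np.maximum(n_neg, 1), 0.0)

def pick_hardest(margins, n2):
    """Indices of the #N2 lowest-scoring (hardest) images."""
    return np.argsort(hardness_scores(margins))[:n2]
```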

re 2)
We should find the largest negative margins as above, but then, for each
of these margins, test it in humans. This means that we will have a list
of tuples ranked by margin (most negative first):
(image, distractor_synset, margin)

And we will search down this ordered list using psychophysics to find the
first #N2 tuples that have a human performance above some threshold (a
sketch follows).
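
A sketch of that search, where human_performance stands in for actually
running the psychophysics experiment on a given (image, distractor_synset)
2-way task; all names here are hypothetical:

```python
def screen_with_psychophysics(ranked_tuples, human_performance, threshold, n2):
    """Walk down the margin-ranked list (most negative margin first) and
    keep the first #N2 tuples on which humans perform above threshold."""
    kept = []
    for image, distractor_synset, margin in ranked_tuples:
        if human_performance(image, distractor_synset) > threshold:
            kept.append((image, distractor_synset, margin))
            if len(kept) == n2:
                break
    return kept
```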

Here are some training curve results for MCC2 classification.
The results for LinearSVC are still being calculated (it takes about 210
minutes to generate one of these curves).

[screenshot: MCC2 training curve](https://f.cloud.github.com/assets/2701347/1241139/cdc9188e-2a17-11e3-99a5-c8acd6783cb1.png)

Immediate points of action:

  1. Deciding how many images per synset to extract (#N1), then extracting
    them.
  2. Deciding the size of the screening set (#N2)

#N1 seems to be around 400 given the training curve (saturation around
300-350 training examples, plus 50-100 test examples).

If you agree with this decision for #N1, then I will create a new dataset
called PixelHardSynsets, which you should then extract:

import imagenet
dataset = imagenet.dldataset.PixelHardSynsets




ardila commented on September 9, 2024

Some vocabulary:
challenge subset -> dataset for goal one
imagenet subset -> dataset for goal two

1)

The various options are:
- Just pixels
- V1 (probably will require some engineering effort/setup time from me)
- HMax (probably will require some engineering effort/setup time from me)
- V1+HMax (probably will require some engineering effort/setup time from me)
- Random L3
- HMO
The problem with HMO-hard is that if we believe HMO is capturing key axes of difficulty, then screening on it will remove those axes from the dataset. This is OK if we have some principled way of combining the model we screen on the challenge subset with our existing model, but even if we do, at some point we should think about regularization (how many times is it fair to screen on a new dataset and add more components to the model?).
If we are not combining models, then we want to remove only the axes of difficulty that will automatically be captured by almost any member of the model class, which is why I suggested random L3s

2)

a) agreed
b) Once we have a set of tuples with high deltas:
(image, distractor_synset, delta = model margin minus human performance expressed as a margin (via logistic regression))
we can construct the imagenet subset in several ways; here is one suggestion.
If we think of the deltas as weights, then every distractor synset will have some amount of weight summed over all the tuples. We should take a random sample of images from each synset whose size is proportional to the synset's weight (a sketch follows).
There is then one free parameter, the ratio of hard images to images from distractor synsets,
which can be set empirically to ensure that the screening set is actually difficult for HMO-0.
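
A sketch of that weighted sampling, under the assumption that the tuples
carry nonnegative deltas and that images_by_synset maps each distractor
synset to its candidate images; all names here are hypothetical:

```python
import numpy as np
from collections import defaultdict

def sample_distractor_synsets(tuples, images_by_synset, n_total, seed=0):
    """tuples: [(image, distractor_synset, delta), ...]. Treating the
    deltas as weights, give each distractor synset a share of the
    n_total sampled images proportional to its summed delta."""
    rng = np.random.default_rng(seed)
    weight = defaultdict(float)
    for _, synset, delta in tuples:
        weight[synset] += delta
    total = sum(weight.values())
    sample = []
    for synset, w in weight.items():
        k = int(round(n_total * w / total))
        pool = list(images_by_synset[synset])
        take = min(k, len(pool))      # never oversample a small synset
        sample.extend(rng.choice(pool, size=take, replace=False))
    return sample
```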

3)

It depends on N1. Since we've agreed N1 = 400 is OK, the number of synsets depends on the budget for extraction, which you said was 250,000 images (833 synsets). If that is correct, then you should begin extraction of PixelHardSynsets ASAP (it should be ready about 15 minutes after I post this).


ardila commented on September 9, 2024

PixelHardSynsets is now available: e93d9e99547c2fe05e48d264bf9219589ca9bc54

Here are SVM results (not much different from the MCC results):
[screenshot: SVM results, 2013-10-01]

I am also running the following classifiers using compute_metric_base: 5-NN and SGDClassifier.


ardila commented on September 9, 2024

[screenshot: HMO confusion matrix (conf_mat3)]


ardila commented on September 9, 2024

@yamins81
In talking with Jim about priorities, I think we came to the conclusion that we need to take advantage of the work I've done so far in some way, instead of dropping it all to move to a new problem. Looking through what I have, I was wondering whether you still think that "finding the hard parts of imagenet" is a useful goal.

I'm pretty convinced that I've done this: I have run all 2-ways with the best model I can run, and found the densest part of the space. I have measured human and model performance at just a few points in this space, and it looks like there is a significant gap with humans (just not in 2-ways, because there humans and models are both near ceiling). If you are not convinced of this gap, what would it take to convince you?

Is it possible to run the HMO procedure again on a combination of however much of this dense space would be appropriate + the synthetic set from before?

At the very least, I want to run some sort of apples-to-apples comparison on Imagenet with HMO and the others, especially since I've found that

  1. The gap on HvM is still significant, and here HMO is the most consistent with humans
  2. The consistency between humans and the convnet models is generally low on imagenet, especially in the dense subspace.
