emdist's Issues

emd2d to compare two matrices of different dimensions

I want to use emd2d to compare two matrices (each containing the latitude and longitude of POIs from two sources), but they have different dimensions. As a result, I get an error saying that A and B must be matrices of the same dimensions. Do the matrices have to have the same dimensions in order to calculate the EMD?
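A possible workaround, sketched under assumptions: as far as I understand the emdist documentation, emd2d treats A and B as masses on one shared regular grid, which would explain the same-dimensions requirement, whereas emd() takes signatures whose first column is a weight and whose remaining columns are coordinates. Two point sets of different sizes could then be passed to emd() instead. The coordinates below are made up, `as_signature` is a hypothetical helper, and giving every POI equal weight is an assumption.

```r
# Hypothetical POI coordinates (longitude, latitude); the values are made up.
pois_a <- cbind(lon = c(2.29, 2.34, 2.35),
                lat = c(48.86, 48.85, 48.87))
pois_b <- cbind(lon = c(2.30, 2.33, 2.36, 2.37, 2.31),
                lat = c(48.86, 48.84, 48.87, 48.85, 48.88))

# Hypothetical helper: first column is the weight, remaining columns are the
# coordinates. Equal weights summing to 1 are an assumption for this sketch.
as_signature <- function(points) {
  cbind(weight = rep(1 / nrow(points), nrow(points)), points)
}

sig_a <- as_signature(pois_a)  # 3 rows
sig_b <- as_signature(pois_b)  # 5 rows; the row counts may differ

# Only call into emdist if it is installed.
if (requireNamespace("emdist", quietly = TRUE)) {
  print(emdist::emd(sig_a, sig_b))
}
```

Note that the default Euclidean distance on raw longitude/latitude degrees is not a geographic distance; projecting the coordinates first may be more appropriate.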

Result is zero despite unequal A and B

Consider the following example of two histograms that are fairly similar but not entirely:

A <- c(4,5,5,5,4,1)
B <- c(4,5,5,6,4,3)
emdw(1:length(A), A, 1:length(B), B, dist="manhattan")

The returned result is 0, i.e. to be interpreted as "no work required to turn A into B", despite A and B obviously not being identical. Why is this the case?

I suspected it may have to do with sum(A) being different from sum(B), i.e. not having the same amount of dirt in the two inputs. But I tested other combinations with the same "flaw" and they gave results > 0 as expected. I tried to find documentation/papers on what exactly happens when the sums are not equal, but couldn't find detailed information on this topic besides the following, which indicates that what I'm trying to do should be possible (and comes from the homepage of Rubner himself).

The size of the two signatures can be different. Also, the sum of weights of one signature can be different than the sum of weights of the other (partial match).
http://robotics.stanford.edu/~rubner/emd/default.htm
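One way to read this result, assuming emdist follows the partial-match definition quoted above: only min(sum(A), sum(B)) units have to be moved, and no bin of B may receive more than its own weight. In this example every bin of A is less than or equal to the matching bin of B, so all of A can stay where it is at zero cost. The base-R sketch below checks that such a zero-cost flow is feasible; it does not call emdist, and `flow` and `ground` are names introduced here for illustration.

```r
A <- c(4, 5, 5, 5, 4, 1)
B <- c(4, 5, 5, 6, 4, 3)

# Under partial matching, only the smaller total mass has to be moved.
total_flow <- min(sum(A), sum(B))

# Candidate flow: every unit of A stays in its own bin (diagonal flow matrix).
flow <- diag(A)

# Manhattan ground distance between bin positions 1..6 (1-D case).
ground <- abs(outer(seq_along(A), seq_along(B), "-"))

# Feasibility: no source bin ships more than it has, no target bin receives
# more than it can hold, and the whole smaller mass is moved.
feasible <- all(rowSums(flow) <= A) &&
  all(colSums(flow) <= B) &&
  sum(flow) == total_flow

work <- sum(flow * ground)
print(c(feasible = feasible, work = work))

# If a full match is wanted, one option is to normalise both histograms to
# total mass 1. For equal masses on unit-spaced 1-D bins, the EMD equals the
# summed absolute difference of the cumulative sums.
emd_1d_normalised <- sum(abs(cumsum(A / sum(A)) - cumsum(B / sum(B))))
print(emd_1d_normalised)
```

Other unequal-sum inputs can still give results > 0 whenever one histogram does not fit entirely inside the other bin by bin.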

memory issue

I keep running into a memory error of some sort when I use emd2d. I'm assuming that the error should read 'Unable to allocate memory...' but I'm not sure why it's trying to allocate a negative amount.

mat_a <- matrix(sample(c(rep(0, 3e+07), floor(runif(501900, 1, 30))), 30501900), 
  nrow = 3575, ncol = 8532)
mat_b <- matrix(sample(c(rep(0, 3e+07), floor(runif(501900, 1, 30))), 30501900), 
  nrow = 3575, ncol = 8532)
emdist::emd2d(mat_a, mat_b)
#> Error in emdr(A, B, dist = dist, ...): Unable to memory (-1620.2 MB) in emdist

I'm running R 3.3.2 x64 and emdist 0.3-2 on a machine with 32 GB of RAM.
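A rough size estimate may help frame the problem. The sketch below assumes that the solver needs a dense ground-distance matrix with one 4-byte entry per pair of grid cells; that is an assumption about the implementation, not something confirmed in this thread, and `dense_cost_gib` is a hypothetical helper. It does not explain the negative sign in the message, only why an allocation of this order could not succeed on 32 GB.

```r
# Dimensions taken from the example above.
n_cells   <- 3575 * 8532   # all grid cells per matrix
n_nonzero <- 501900        # non-zero cells per matrix in the example

# Hypothetical helper: size in GiB of a dense n1 x n2 matrix.
# bytes_per_entry = 4 (single precision) is an assumption.
dense_cost_gib <- function(n1, n2, bytes_per_entry = 4) {
  n1 * n2 * bytes_per_entry / 1024^3
}

# Upper estimate: every grid cell is a signature point.
print(dense_cost_gib(n_cells, n_cells))

# Lower estimate: only the non-zero cells are signature points.
print(dense_cost_gib(n_nonzero, n_nonzero))
```

Under either assumption the matrix would be far larger than 32 GB, so aggregating the grid to a coarser resolution before calling emd2d may be the only practical option at this size.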

Maximum iterations reached

Hi Simon,

This package is so useful! Is there a recommended maximum size for the input matrix, i.e. how many location points to include? I have been getting the following warning a lot:

In emdr(A, B, dist = dist, ...) :
emd: Maximum number of iterations has been reached (500)

The output nevertheless makes sense and fits the numbers I put in, but of course a warning can be a bit worrying and maybe means the estimate is not accurate?

When I reduce the number of location points from 200 to 100, the warning is greatly reduced/barely occurs. Perhaps I should even reduce it further. Might an excessive number of location points cause the number of iterations to be regularly exceeded, or is there another more fundamental reason?
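One thing that might be worth trying, as a sketch only: the warning reports a limit of 500, and the call shown in it, emdr(A, B, dist = dist, ...), forwards extra arguments. The example below assumes that emdr accepts a `max.iter` argument that raises this limit; please check ?emdr for your installed version before relying on it. The random signatures and the `random_signature` helper are made up for illustration.

```r
set.seed(1)
n <- 200

# Hypothetical helper: n equally weighted points in the unit square.
# Column 1 is the weight, columns 2 and 3 are the coordinates.
random_signature <- function(n) {
  cbind(weight = rep(1 / n, n), x = runif(n), y = runif(n))
}

A <- random_signature(n)
B <- random_signature(n)

if (requireNamespace("emdist", quietly = TRUE)) {
  # Default limit; may warn that the maximum number of iterations was reached.
  d_default <- suppressWarnings(emdist::emd(A, B))

  # Assumption: max.iter is passed through `...` to emdr and raises the limit.
  d_more <- emdist::emd(A, B, max.iter = 5000)

  print(c(default = d_default, more_iterations = d_more))
}
```

If the two values agree, the default run had probably already converged closely enough; if they differ, the result obtained under the warning should be treated as approximate.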

Clarifying inputs for emd()

Hello Simon,
I was looking at your package for implementation in R and was just wondering if you could clarify the data inputs further for the emd() function, as I am unsure what the function is expecting for values A and B.

Is it looking for A and B to be a function for a distribution? An object that contains points of a probability distribution function? Maybe it would help to describe my use case: I have a number of sample sites, each with time series data. My plan is to build a histogram of the values at each site and compare the resulting distributions using the earth mover's distance. Where I am stuck is what form the input to emd() needs to take. I was hoping you could advise.

Many thanks in advance and I look forward to hearing back from you.
With care,
Bryant
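For the histogram use case, a minimal sketch under assumptions: as I read the emdist documentation, emd() expects each input as a matrix whose first column holds weights and whose remaining columns hold coordinates. A histogram can be turned into that shape by using the normalised bin counts as weights and the bin midpoints as the single coordinate. The two series below are simulated, and `hist_signature` is a hypothetical helper.

```r
set.seed(42)

# Made-up time series values for two sample sites.
series_a <- rnorm(500, mean = 10, sd = 2)
series_b <- rnorm(500, mean = 12, sd = 3)

# Shared breaks so both histograms live on the same axis; pretty() returns
# break points that cover the full range of both series.
breaks <- pretty(range(c(series_a, series_b)), n = 20)

# Hypothetical helper: histogram -> signature matrix.
# Column 1: bin weight (share of observations), column 2: bin midpoint.
hist_signature <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  keep <- h$counts > 0   # drop empty bins; they carry no mass
  cbind(weight = h$counts[keep] / sum(h$counts), position = h$mids[keep])
}

sig_a <- hist_signature(series_a, breaks)
sig_b <- hist_signature(series_b, breaks)

if (requireNamespace("emdist", quietly = TRUE)) {
  print(emdist::emd(sig_a, sig_b))
}
```

Normalising the counts makes both signatures sum to 1, so the comparison is a full match rather than a partial one.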
