s-u / emdist
Earth Mover's Distance implementation for R
License: Other
I would like to use emd2d to compare two matrices (each containing the latitudes and longitudes of POIs from two sources), but they have different dimensions. As a result, I get an error saying that A and B must be matrices of the same dimensions. Is it mandatory for the matrices to have the same dimensions in order to calculate the EMD?
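A minimal sketch of one possible alternative, assuming the POIs are treated as weighted point sets rather than as two grids: emdw() (used elsewhere on this page) takes locations and weights separately, so the two sets do not need the same number of rows. The coordinates, the uniform weights and the names pois_a, pois_b, w_a and w_b are made up for illustration, and plain Euclidean distance on latitude/longitude is only a rough approximation of ground distance.

```r
set.seed(42)

# Two made-up POI sets with different numbers of points (5 vs. 8).
pois_a <- cbind(lat = runif(5, 48.80, 48.90), lon = runif(5, 2.25, 2.42))
pois_b <- cbind(lat = runif(8, 48.80, 48.90), lon = runif(8, 2.25, 2.42))

# Assumption: every POI carries the same weight and each set sums to 1.
w_a <- rep(1 / nrow(pois_a), nrow(pois_a))
w_b <- rep(1 / nrow(pois_b), nrow(pois_b))

# Only call the package if it is installed.
if (requireNamespace("emdist", quietly = TRUE)) {
  print(emdist::emdw(pois_a, w_a, pois_b, w_b))
}
```

emd2d() compares two densities on the same grid, which is presumably why it insists on equal dimensions; a list of coordinates is closer to the signature form shown here.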
Consider the following example of two histograms that are fairly similar but not entirely:
A <- c(4,5,5,5,4,1)
B <- c(4,5,5,6,4,3)
emdw(1:length(A), A, 1:length(B), B, dist="manhattan")
The returned result is 0, i.e. to be interpreted as "no work required to turn A into B", despite A and B obviously not being identical. Why is this the case?
I suspected it may have to do with sum(A) being different from sum(B), i.e. not having the same amount of dirt in the two inputs. But I tested other combinations with the same "flaw" and they gave results > 0 as expected. I tried to find documentation/papers on what exactly happens when the sums are not equal, but couldn't find detailed information on this topic besides the following, which indicates that what I'm trying to do should be possible (and comes from the homepage of Rubner himself).
The size of the two signatures can be different. Also, the sum of weights of one signature can be different than the sum of weights of the other (partial match).
http://robotics.stanford.edu/~rubner/emd/default.htm
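One possible reading of the 0, assuming emdw() follows the partial-match formulation quoted above: only min(sum(A), sum(B)) units of mass are moved, and here every bin of A is no larger than the same bin of B, so all of A can be matched in place at zero cost. The base R sketch below checks that condition and, for comparison, computes a 1-D transport cost after rescaling both histograms to the same total. The rescaling is an illustrative choice, not something the package is claimed to do, and the variable names are hypothetical.

```r
A <- c(4, 5, 5, 5, 4, 1)
B <- c(4, 5, 5, 6, 4, 3)

# Under partial matching only the smaller total mass has to be moved.
moved <- min(sum(A), sum(B))

# If every bin of A fits inside the same bin of B, nothing needs to travel.
fits_in_place <- all(A <= B)

# Rescale both histograms to total mass 1 so that all mass must be matched.
a <- A / sum(A)
b <- B / sum(B)

# 1-D transport cost for unit-spaced bins with Manhattan ground distance:
# the sum of absolute differences between the cumulative distributions.
cost_1d <- sum(abs(cumsum(a - b)))

print(c(moved = moved, fits_in_place = fits_in_place, cost_1d = cost_1d))
```

Other pairs with unequal sums give a positive result whenever some bin of the lighter histogram exceeds the corresponding bin of the heavier one, which would be consistent with the behaviour described above.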
I keep running into a memory error of some sort when I use emd2d. I'm assuming that the error should read 'Unable to allocate memory...' but I'm not sure why it's trying to allocate a negative amount.
mat_a <- matrix(sample(c(rep(0, 3e+07), floor(runif(501900, 1, 30))), 30501900),
                nrow = 3575, ncol = 8532)
mat_b <- matrix(sample(c(rep(0, 3e+07), floor(runif(501900, 1, 30))), 30501900),
                nrow = 3575, ncol = 8532)
emdist::emd2d(mat_a, mat_b)
#> Error in emdr(A, B, dist = dist, ...): Unable to memory (-1620.2 MB) in emdist
I'm running R 3.3.2 x64 and emdist 0.3-2 on a machine with 32GB of ram.
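A negative size would be consistent with a signed integer overflow while computing an allocation size for roughly 30.5 million cells per matrix, though that is an assumption about the package internals and not confirmed here. A possible workaround, sketched below, is to aggregate both matrices to a coarser grid before calling emd2d. The helper block_sum is hypothetical (it is not part of emdist), and the result on the coarse grid only approximates the distance on the original grid.

```r
# Hypothetical helper (not part of emdist): sum a matrix over blocks of
# fr rows by fc columns. Both factors must divide the dimensions evenly.
block_sum <- function(m, fr, fc) {
  stopifnot(nrow(m) %% fr == 0, ncol(m) %% fc == 0)
  row_group <- rep(seq_len(nrow(m) / fr), each = fr)
  col_group <- rep(seq_len(ncol(m) / fc), each = fc)
  # rowsum() adds up rows that share a group label; apply it to the rows,
  # then to the columns via a transpose.
  rows_done <- rowsum(m, row_group)
  t(rowsum(t(rows_done), col_group))
}

# Small demonstration: a 4 x 4 matrix reduced to 2 x 2 blocks.
demo <- matrix(1:16, nrow = 4, ncol = 4)
coarse <- block_sum(demo, 2, 2)
print(coarse)

# For the 3575 x 8532 matrices above, block sizes such as 275 x 108 divide
# the dimensions evenly and would give 13 x 79 grids.
```

Total mass is preserved by the aggregation, so the coarse matrices can be passed to emd2d in place of the originals.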
Hi Simon,
This package is so useful! Is there a recommended size for the length of the matrix - in terms of how many location points to include? I have been getting a lot of the following warning:
In emdr(A, B, dist = dist, ...) :
emd: Maximum number of iterations has been reached (500)
The output nevertheless makes sense and fits the numbers I put in, but of course a warning can be a bit worrying and maybe means the estimate is not accurate?
When I reduce the number of location points from 200 to 100, the warning is greatly reduced/barely occurs. Perhaps I should even reduce it further. Might an excessive number of location points cause the number of iterations to be regularly exceeded, or is there another more fundamental reason?
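A short sketch of one thing to try, assuming (to be verified with ?emdr) that the underlying emdr() accepts a max.iter argument whose default of 500 matches the number in the warning, and that emd() forwards extra arguments to it. The point sets below are random and purely illustrative.

```r
set.seed(7)
n <- 100

# Signature matrices: first column is the weight, the rest are coordinates.
A <- cbind(rep(1 / n, n), runif(n), runif(n))
B <- cbind(rep(1 / n, n), runif(n), runif(n))

# Only call the package if it is installed.
if (requireNamespace("emdist", quietly = TRUE)) {
  # Assumption: max.iter is passed through to emdr(); a larger limit gives
  # the solver more room to converge before it stops with a warning.
  d <- emdist::emd(A, B, max.iter = 5000)
  print(d)
}
```

If the warning still appears with a higher limit, the returned value is the best solution reached so far and may not be optimal, so reducing the number of points (as described above) remains a reasonable fallback.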
Hello Simon,
I was looking at your package for implementation in R and was just wondering if you could clarify the data inputs further for the emd() function, as I am unsure what the function is expecting for values A and B.
Is it looking for A and B to be a function for a distribution? An object that contains points of a probability distribution function? Maybe it would help to describe my use case - I essentially have a bunch of sample sites, and have time series data for these points over time - what I was planning to do is extract the distributions from a histogram for each point, and compare them using earth mover's distance. Where I am stuck with the emd() function is what the input needs to be. I was hoping you could advise.
Many thanks in advance and I look forward to hearing back from you.
With care,
Bryant
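A minimal sketch of how histogram data could be turned into input for emd(), assuming the signature layout from the package documentation: each of A and B is a matrix whose first column holds the weights and whose remaining columns hold the coordinates. The simulated series x and y, the bin width of 0.5 and the choice to drop empty bins are all illustrative assumptions.

```r
set.seed(1)

# Two made-up samples standing in for the values observed at two sites.
x <- rnorm(200)
y <- rnorm(200, mean = 1)

# Use the same breaks for both histograms so the bins are comparable.
rng <- range(x, y)
breaks <- seq(floor(rng[1]), ceiling(rng[2]), by = 0.5)
hx <- hist(x, breaks = breaks, plot = FALSE)
hy <- hist(y, breaks = breaks, plot = FALSE)

# Signature: column 1 = relative frequency, column 2 = bin midpoint.
A <- cbind(hx$counts / sum(hx$counts), hx$mids)
B <- cbind(hy$counts / sum(hy$counts), hy$mids)

# Drop empty bins; the two signatures may then differ in length.
A <- A[A[, 1] > 0, , drop = FALSE]
B <- B[B[, 1] > 0, , drop = FALSE]

# Only call the package if it is installed.
if (requireNamespace("emdist", quietly = TRUE)) {
  print(emdist::emd(A, B))
}
```

With this layout, A and B are neither functions nor density objects, just numeric matrices, and one such matrix would be built per sample site.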