kkduncan / saliencydetection
Saliency detection in images and videos.
In the Canny edge detector (the file guarded by `CANNYEDGEDETECTOR_H_`):

```cpp
/*
 * Perform Gaussian smoothing
 */
smoothImage(src, smoothedImg);

/*
 * Compute the first derivative in the x and y directions
 */
calculateDerivatives(src, deltaX, deltaY);
```
In other words, `smoothedImg` is created, but the derivatives are calculated on `src`.
Also, I didn't fork but derived another repository, https://github.com/bootchk/saliencyLibrary.git , where I have made some trivial changes to clean up the code (while trying to understand it, with the hope of wrapping it in a GIMP plugin).
I started here when I saw you forked resynthesizer. It is curious that repeated applications of resynthesizer (from an image to itself) tend to obliterate unique, man-made (salient?) features, whereas your algorithm seeks to find those features.
What I am hoping to do as a plugin is image summarization or autocrop: threshold the saliency map, then crop the original to the bounding box of the thresholded region.
I don't see that this code operates at all scales (a Gaussian pyramid) as the paper says, but possibly I just misunderstand.
The updatePixelEntropy() function is called every 32 iterations. It stores a result in densityEstimate, but that result does not seem to depend on prior results. Also, after iteration stops, there have likely been additional samples (after the last call to updatePixelEntropy()) that have not contributed to the estimate.
I removed the periodic call to this function and made KernelDensityInfo.entropy() a method (instead of a data field), called only once when creating the saliency map. That doesn't seem to affect the results, good or bad, so this is not much of an issue.
But just for my understanding, is the purpose to be found in equation 5 of your paper? My best guess is that the top equation of 5 should ideally be computed every iteration by updatePixelEntropy(), and that computing it only every 32 iterations is a heuristic simplification of the ideal. But the data flow doesn't seem to support that purpose.
An enhancement.
My initial thought was that retaining the color channels throughout would give better results. In my current code I calculate an orientation difference for each of the R, G, and B channels, each in the range [0, pi], but then sum them into one angle difference (theta) in the range [0, 3pi], still using a kernel sum of K(d, theta). That seems ineffective, e.g. for a yellow highway stripe.
So I will try again, calculating a kernel sum of K(d, thetaR, thetaG, thetaB). Do you have an intuition about that?
Consider depth-enabled photography, where the image has additional channels: infrared IR and laser-ranging depth e.g. the Intel RealSense cameras. In many cases, one channel might have the most readily available object (saliency) information. Without any apriori knowledge of which channel that is, a general-purpose algorithm (for nieve GIMP users) should work channel-wise?
What's your opencv version?
Using the minus operator on float values holding gradient orientations (the angular coordinate of polar coordinates, in radians) yields either the interior (small) or the exterior (large) subtended angle, depending on the order of the operands. Using the expression atan2(sin(x - y), cos(x - y)) instead always gives the smallest angle, and seems to give crisper saliency results.
This issue might have crept in during your translation from legacy code to OpenCV and polar coordinates. I just stumbled upon it as I was working on color gradients.
I am surprised OpenCV does not have an Angle class with this method.
http://stackoverflow.com/questions/1878907/the-smallest-difference-between-2-angles
I found that thresholding before computing saliency gives results more like I expected.
For example, a yellow highway stripe (that I mentioned earlier) now becomes salient. Another example is image 'i2.jpg' from the MIT saliency database (an image of a postage stamp on a letter), where the postage stamp now becomes salient.
I also removed the smoothing preprocess, without much change in the result. As an architectural issue, I think any preprocess should be in the API like postProcessSaliencyMap() is, making it optional.
It's non-intuitive to me that thresholding should give better(?) results. Why should discarding information up front give better results? (Not discarding information is why I attempted to retain the color channels in the input and throughout the calculations, but that was ineffective. More on that to follow in another issue.)
If you threshold, many values become discrete. Then it could be sped up by eliminating floating-point operations and using table lookups for transcendental functions. Premature to try that, I suppose.