xxxnell / flex
Probabilistic deep learning for data streams.
License: MIT License
When the KL-divergence is calculated, the boundary vanishes, so the result does not include its contribution. Therefore, when the sampling number is too small (< 100) or the boundary ratio is too high (> 0.01), the numerically calculated KL-divergence is inaccurate.
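The effect can be reproduced outside the library. The following is a minimal sketch, not flip's implementation: KldSketch, kld, and normalPdf are hypothetical names, and the midpoint rule is only a stand-in for whatever sampling scheme Sketch actually uses. It estimates KL(p || q) on a finite range, so any contribution outside that range is silently dropped.

```scala
object KldSketch {

  /** Midpoint-rule estimate of KL(p || q) on [start, end] using n cells. */
  def kld(p: Double => Double, q: Double => Double,
          start: Double, end: Double, n: Int): Double = {
    val width = (end - start) / n
    (0 until n).map { i =>
      val x = start + (i + 0.5) * width
      val (px, qx) = (p(x), q(x))
      // Cells where either density underflows to zero are skipped.
      // A stricter version would return infinity when only q vanishes.
      if (px > 0 && qx > 0) px * math.log(px / qx) * width else 0.0
    }.sum
  }

  /** Density of a normal distribution, used here only as test input. */
  def normalPdf(mean: Double, sd: Double): Double => Double =
    x => math.exp(-0.5 * math.pow((x - mean) / sd, 2)) / (sd * math.sqrt(2 * math.Pi))
}
```

For N(0, 1) against N(1, 1) the exact value is 0.5. Integrating over [-10, 10] recovers it closely, while restricting the range to [-1, 1] gives only about 0.34, which illustrates how much a dropped boundary region can matter.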
Implement SelectiveSketch, which performs deepUpdate selectively, only when there is a discrepancy between the temporarily collected sample data and the distribution recorded by Sketch.
Apply sbt-release (https://github.com/sbt/sbt-release) to manage versions simply.
ND4J, or N-Dimensional Arrays for Java, is a scientific computing library for the JVM. It is meant to be used in production environments, which means routines are designed to run fast with minimum RAM requirements.
It would be better to replace the array computation with ND4J.
Currently, only Sketch has a working monad operation, so if you include a Dist other than Sketch in a for comprehension, it will not work properly.
The result of the featured sample code in README.md diverges: it returns 0.9724554061259115.
Getting icdf is an expensive operation. Therefore, cache icdf to improve performance.
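One possible shape for such a cache is plain memoization. This is a sketch under assumptions: CachedIcdf is a hypothetical wrapper that is not part of flip, and it assumes the quantile function is queried repeatedly at the same probabilities. If the library instead builds the whole icdf as one object, a lazy val would serve the same purpose.

```scala
import scala.collection.mutable

/** Hypothetical wrapper (not flip API): memoizes an expensive quantile function. */
final class CachedIcdf(icdf: Double => Double) {
  private val cache = mutable.HashMap.empty[Double, Double]

  /** Number of times the underlying icdf was actually evaluated. */
  var misses: Int = 0

  def apply(p: Double): Double =
    // The by-name argument runs only when p is not yet in the cache.
    cache.getOrElseUpdate(p, { misses += 1; icdf(p) })
}
```

The cache must be discarded whenever the underlying distribution is updated, since stale quantiles would otherwise be returned.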
Currently, sample of Sketch isn't implemented. Implement sample using inverse transform sampling.
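Inverse transform sampling draws u from the uniform distribution on (0, 1) and returns icdf(u). The following is a minimal sketch of the technique only: InverseTransformSampling, sample, and exponentialIcdf are hypothetical names, and the exponential quantile function is a stand-in for the icdf that Sketch would supply.

```scala
import scala.util.Random

object InverseTransformSampling {

  /** Draws one sample by pushing a uniform(0, 1) draw through the quantile function. */
  def sample(icdf: Double => Double, rng: Random): Double = {
    // nextDouble() lies in [0, 1); an exact 0 would hit the lower boundary,
    // where a quantile function may return -Infinity, so redraw in that case.
    var u = rng.nextDouble()
    while (u == 0.0) u = rng.nextDouble()
    icdf(u)
  }

  /** Quantile function of the exponential distribution with the given rate (stand-in). */
  def exponentialIcdf(rate: Double): Double => Double =
    p => -math.log(1 - p) / rate
}
```

With rate 2.0 the sample mean of many draws should approach 0.5, the mean of that exponential distribution. Excluding the endpoints of the unit interval keeps boundary values out of the samples.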
Configuration parameters are duplicated. For example, boundaryRatio of EqualizedIcdfSamplingConf duplicates boundaryCorrection of SketchConfB.
To support Java, this project needs an interface and syntactic sugar written in Java.
Custom configuration generates ambiguous implicit values for DistArthmeticSyntax. See flip.experiment.BasicBimodalDistExp.
KLD is given conf1 (the configuration of the first distribution) and conf2 at the same time, but it uses only conf2.
The fastSampling function of Sketch uses a parameter (called ratio) when it defines the sampling points. However, there is no rule for setting the value of this parameter, so it should be made customizable or defined automatically.
sample of Sketch returns boundary values (e.g. List(..., -1.3346329812349141E307, 7.927339694866348E307, ..., 4.2420349412446703E307, ..., -1.1082857601763558E308...)).
When Sketch estimates the density distribution, too low a KL-divergence value is obtained because the boundary is not processed properly. Therefore, as a way of smoothing the edges, we use the large scale ConcatSmoothingPs and then re-examine KL-divergence when performing deepUpdate.
A linting tool such as scalafix (https://github.com/scalacenter/scalafix) should be applied to this open source project.
Currently, RecurSketch only overrides update, and it counts updates only when update is called. Therefore, RecurSketch doesn't update its count when only narrowUpdate is called.
narrowUpdate of RecurSketch must be overridden so that it also updates the count.
smoothing operations are used in several places. Their use in UpdateCmap and DeepUpdate is especially important.
As part of refactoring the smoothing operation, several methods should be applicable dynamically.
import flip._ doesn't work in the IntelliJ syntax highlighter (but compiling works fine).
Currently, 40% of samplings are garbage after flatMap. We have to reduce it by about 10%.
There is some error (about 10-15%) between interpolationPdf and fastPdf. See the fastPdf part of AdaPerSketchSpec.
The sbt experiment command in root should execute all experiments (cf. the flex.experiment package). However, for now, only one experiment is executed (with arg0). Therefore, the experiment command without an argument must run all the experiment codes. See Tasks.
Too many Options in Sketch ops to handle Sketch with empty structure.
Sketch.fastPdf returns (_, NaN) in some cases.
map and flatMap generate NaN when Sketch contains nothing (Sketch.empty).
So:
Flip seems to be able to compose various sampling methodologies such as MCMC or Gibbs.
So far I have had to call runMain to run the implemented experiments. However, as the number of experiments increases, it is no longer feasible to run them all one by one. Therefore, an sbt task that executes all experiments is needed.
Currently, plot contains primitive records only. However, in some cases, a plot with a measurable range, or RangeM, would be useful.
Deploy to the Maven Central repository.
The purpose of concat of RangePlot is unclear. This function seems to decompose more than two primitives.
HCounter is a subcategory of Counter, but they reside in different packages.
Currently, sampling of SamplingDist returns an Option of DensityPlot to handle a Sketch with empty structure. However, it can return DensityPlot.empty instead of None.
Theoretically, the inverse cdf (quantile) returns ±∞ at 0 and 1. However, due to the limitations of the way Sketch treats boundaries, it returns only a large finite value.
For now, we take the approach of artificially removing the two boundary values, but we need a more sophisticated way of getting a new Cmap in this function.
bind returns NaN for this configuration:
val samplingNo = 50
implicit val conf: SketchConf = SketchConf(
  startThreshold = 50,
  thresholdPeriod = 100,
  boundaryCorr = 0.1,
  decayFactor = 0,
  queueSize = 30,
  cmapSize = samplingNo,
  cmapNo = 5,
  cmapStart = Some(-10d),
  cmapEnd = Some(10),
  counterSize = samplingNo
)
For more detail, see the code.
I have now packaged the sampling algorithm independently to separate the sampling methods. However, the legacy code is tightly coupled, so it has to be replaced with the new one.
See cmapForEqualSpaceCumCorr of EqualSpaceCdfUpdate.
FlatMap is too slow.