Comments (7)
I like the concept, but unless I did something wrong here there's a big performance hit for:
function fit!(o::OnlineStat, y::Union{AVec, AMat})
    for i in 1:size(y, 1)
        fit!(o, row(y, i))
    end
end
I tried adding the following:
function fit2!(o::OnlineStat, y::Union{AVec, AMat})
    for i in 1:size(y, 1)
        fit2!(o, row(y, i))
    end
end
fit2!(o::OnlineStat, y::Real) = (fit!(o, y); o)
julia> y = randn(10_000_000);
julia> o1 = Variance(); o2 = Variance()
▌ Variance
▶ value: 0.0
▶ nobs: 0
julia> @time fit!(o1, y)
0.059439 seconds (4 allocations: 160 bytes)
julia> @time OnlineStats.fit2!(o2, y)
0.131593 seconds (10.00 M allocations: 152.588 MB, 7.01% gc time)
Before the rewrite, we had update_get!... how about fit_get!?
from onlinestats.jl.
Something seems weird here. I don't think you can use the same fit! method for Mean and Means, for example. Is a vector multiple observations (Mean), or a single observation (Means)? I like that you drastically reduced the definitions of fit! scattered throughout, but you lost this important distinction.
Also, it's really bad to take row for each item in a column vector. I think these definitions could use a rewrite, and we might need to consider adding subtypes UnivariateOnlineStat and MultivariateOnlineStat to get this to work.
Finally, we need to revisit the idea of row-based vs column-based storage. We should consider a type RowMatrix and flip-flop the row/column calls, so that column-based access is the default.
Thoughts?
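To make the storage point concrete: Julia stores matrices in column-major order, so a column is contiguous in memory while a row is strided. The sketch below is written for Julia 1.x and uses hypothetical `row` and `col` helpers (assumed here to return views; the package's own `row` may differ) to walk the same matrix both ways.

```julia
# Hypothetical helpers, assumed for illustration only.
row(x::AbstractMatrix, i::Integer) = view(x, i, :)   # strided: neighbours are size(x, 1) apart
col(x::AbstractMatrix, j::Integer) = view(x, :, j)   # contiguous in memory

# Sum all elements by visiting one row at a time.
function sum_by_rows(x::AbstractMatrix)
    s = zero(eltype(x))
    for i in 1:size(x, 1)
        for v in row(x, i)
            s += v
        end
    end
    return s
end

# Sum all elements by visiting one column at a time.
function sum_by_cols(x::AbstractMatrix)
    s = zero(eltype(x))
    for j in 1:size(x, 2)
        for v in col(x, j)
            s += v
        end
    end
    return s
end

x = reshape(collect(1.0:6.0), 2, 3)   # 2×3 matrix: [1 3 5; 2 4 6]
sum_by_rows(x) == sum_by_cols(x)      # same result, different memory access pattern
```

Both functions return the same total; only the traversal order differs, which is what a column-based default would change.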
from onlinestats.jl.
The fit! methods for single observations are always more specific than the fit! methods for multiple observations, so using the same method for both Mean and Means does work. But you're right that calling row on a vector is slow. Here's the change:
function fit!(o::OnlineStat, y::AMat)
    for i in 1:size(y, 1)
        fit!(o, row(y, i))
    end
end
function fit!(o::OnlineStat, y::AVec)
    for yi in y
        fit!(o, yi)
    end
end
from onlinestats.jl.
> The fit! methods for single observations are always more specific than the fit! methods for multiple observations
I think I understand now. There's a fit!(o::OnlineStat, y::AVec) defined for each multivariate stat, and that takes precedence over the more abstract one. Got it.
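The dispatch behaviour described here can be reproduced in isolation. This is a minimal sketch for Julia 1.x, not the package's code: `MyMean` and `MyMeans` are hypothetical stand-ins for Mean and Means, and the matrix fallback uses `view` where the thread uses `row`.

```julia
# Hypothetical stand-ins; not the OnlineStats.jl definitions.
abstract type OnlineStat end

mutable struct MyMean <: OnlineStat      # univariate: one observation is a Real
    value::Float64
    n::Int
end
MyMean() = MyMean(0.0, 0)

mutable struct MyMeans <: OnlineStat     # multivariate: one observation is a vector
    value::Vector{Float64}
    n::Int
end
MyMeans(p::Integer) = MyMeans(zeros(p), 0)

# Single-observation methods, one per stat type.
function fit!(o::MyMean, y::Real)
    o.n += 1
    o.value += (y - o.value) / o.n
    return o
end

function fit!(o::MyMeans, y::AbstractVector{<:Real})
    o.n += 1
    o.value .+= (y .- o.value) ./ o.n
    return o
end

# Generic multiple-observation fallbacks on the abstract type.
function fit!(o::OnlineStat, y::AbstractVector)
    for yi in y
        fit!(o, yi)
    end
    return o
end

function fit!(o::OnlineStat, y::AbstractMatrix)
    for i in 1:size(y, 1)
        fit!(o, view(y, i, :))
    end
    return o
end

m  = fit!(MyMean(), [1.0, 2.0, 3.0])        # vector = three observations
ms = fit!(MyMeans(2), [1.0, 2.0])           # vector = one observation
mm = fit!(MyMeans(2), [1.0 2.0; 3.0 4.0])   # matrix = one observation per row
```

For `MyMeans`, the method taking `(::MyMeans, ::AbstractVector{<:Real})` is more specific than the `(::OnlineStat, ::AbstractVector)` fallback, so a vector counts as a single observation. For `MyMean` no such method exists, so the fallback loops over the elements.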
> Here's the change
This should be better.
Back to the original point... I'd be really surprised if returning the OnlineStat from the fit! method makes any noticeable difference in performance. It should be a no-op if you're not using it in the resulting call.
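One way to check this claim is to time both variants inside functions after a warm-up call, so that compilation and global-scope effects do not enter the measurement. The sketch below is for Julia 1.x and uses a hypothetical `RunningVar` type with hypothetical `update!` and `update_ret!` functions in place of Variance and fit!; it makes no claim about what the numbers will be on a given machine.

```julia
# Hypothetical running-variance accumulator (Welford update); not the package's Variance.
mutable struct RunningVar
    mean::Float64
    m2::Float64    # running sum of squared deviations from the mean
    n::Int
end
RunningVar() = RunningVar(0.0, 0.0, 0)

# Variant 1: update in place and return nothing.
function update!(o::RunningVar, y::Real)
    o.n += 1
    d = y - o.mean
    o.mean += d / o.n
    o.m2 += d * (y - o.mean)
    return nothing
end

# Variant 2: same update, but return the object.
update_ret!(o::RunningVar, y::Real) = (update!(o, y); o)

function run_plain!(o::RunningVar, ys)
    for y in ys
        update!(o, y)
    end
    return o
end

function run_ret!(o::RunningVar, ys)
    for y in ys
        update_ret!(o, y)   # returned object is ignored
    end
    return o
end

ys = randn(1_000_000)
run_plain!(RunningVar(), ys); run_ret!(RunningVar(), ys)   # warm up (compile) first
@time run_plain!(RunningVar(), ys)
@time run_ret!(RunningVar(), ys)
```

Comparing the allocation counts reported by the two `@time` lines shows whether returning the object costs anything in the loop.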
from onlinestats.jl.
Strange, I just tried it again and saw no time/allocation difference. In that case, I like the idea of returning the object. I'll run through everything and add it.
from onlinestats.jl.
fit! now returns the OnlineStat. Also FYI, future PRs can go into the dev branch. I'll be using dev, rather than josh, from now on.
from onlinestats.jl.
Cool thanks!
On Feb 6, 2016, at 10:02 AM, Josh Day wrote:
> fit! now returns the OnlineStat. Also FYI, future PRs can go into the dev branch. I'll be using dev, rather than josh, from now on.
from onlinestats.jl.