Comments (7)

joshday commented on May 18, 2024

I like the concept, but unless I did something wrong here there's a big performance hit for:

function fit!(o::OnlineStat, y::Union{AVec, AMat})
    for i in 1:size(y, 1)
        fit!(o, row(y, i))
    end
end

I tried adding the following:

function fit2!(o::OnlineStat, y::Union{AVec, AMat})
    for i in 1:size(y, 1)
        fit2!(o, row(y, i))
    end
end
fit2!(o::OnlineStat, y::Real) = (fit!(o, y); o)

julia> y = randn(10_000_000);

julia> o1 = Variance(); o2 = Variance()
▌ Variance
  ▶     value: 0.0
  ▶      nobs: 0


julia> @time fit!(o1, y)
  0.059439 seconds (4 allocations: 160 bytes)

julia> @time OnlineStats.fit2!(o2, y)
  0.131593 seconds (10.00 M allocations: 152.588 MB, 7.01% gc time)

Before the rewrite, we had update_get!...how about fit_get!?

from onlinestats.jl.

tbreloff commented on May 18, 2024

Something seems weird here. I don't think you can use the same fit! method for Mean and Means, for example. Is a vector multiple observations (Mean), or a single observation (Means)? I like that you drastically reduced the definitions of fit! scattered throughout, but you lost this important distinction.

Also, it's really bad to take row for each item in a column vector. I think these definitions could use a rewrite, and might need to consider adding subtypes UnivariateOnlineStat and MultivariateOnlineStat to get this to work.

Finally we need to revisit the idea of row-based vs column-based storage. We should consider a type RowMatrix and flip-flop the row/column calls, so that column-based access is the default.
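To make the RowMatrix idea concrete, here is a minimal sketch, not the package's design. `ToyRowMatrix` and `observations` are hypothetical names invented for illustration. It relies only on the fact that Julia arrays are column-major, so a column view is contiguous in memory while a row view is strided.

```julia
# Hypothetical wrapper that marks "each row is one observation".
# Not part of OnlineStats.jl; a sketch of the proposal only.
struct ToyRowMatrix{M<:AbstractMatrix}
    data::M
end

# Default: each column of a plain matrix is one observation
# (contiguous in Julia's column-major layout).
observations(x::AbstractMatrix) = (view(x, :, j) for j in 1:size(x, 2))

# Opt-in: each row of the wrapped matrix is one observation (strided views).
observations(x::ToyRowMatrix) = (view(x.data, i, :) for i in 1:size(x.data, 1))

x = [1.0 2.0 3.0;
     4.0 5.0 6.0]

by_col = [collect(obs) for obs in observations(x)]                 # 3 observations of length 2
by_row = [collect(obs) for obs in observations(ToyRowMatrix(x))]   # 2 observations of length 3
```

With this layout a generic fit! loop would only ever call `observations`, and the row/column decision would live in the type of the data rather than in each stat.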

Thoughts?

joshday commented on May 18, 2024

The fit! methods for single observations are always more specific than the fit! methods for multiple observations, so using the same method for both Mean and Means does work. But you're right that calling row on a vector is slow. Here's the change:

function fit!(o::OnlineStat, y::AMat)
    for i in 1:size(y, 1)
        fit!(o, row(y, i))
    end
end
function fit!(o::OnlineStat, y::AVec)
    for yi in y
        fit!(o, yi)
    end
end
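The dispatch argument above can be sketched with toy types. This is an illustration under assumptions, not the OnlineStats.jl source: `ToyStat`, `ToyMean`, `ToyMeans`, and `toyfit!` are hypothetical stand-ins, and the syntax is current Julia rather than the version used in this thread.

```julia
# Hypothetical stand-ins for the package's types.
abstract type ToyStat end

mutable struct ToyMean <: ToyStat      # univariate: one observation is a Real
    value::Float64
    n::Int
end
ToyMean() = ToyMean(0.0, 0)

mutable struct ToyMeans <: ToyStat     # multivariate: one observation is a vector
    value::Vector{Float64}
    n::Int
end
ToyMeans(p::Integer) = ToyMeans(zeros(p), 0)

# Generic fallback: a vector is many scalar observations.
function toyfit!(o::ToyStat, y::AbstractVector)
    for yi in y
        toyfit!(o, yi)
    end
    return o
end

# Generic fallback: each row of a matrix is one observation.
function toyfit!(o::ToyStat, y::AbstractMatrix)
    for i in 1:size(y, 1)
        toyfit!(o, view(y, i, :))
    end
    return o
end

# Single-observation method for the univariate stat.
function toyfit!(o::ToyMean, y::Real)
    o.n += 1
    o.value += (y - o.value) / o.n
    return o
end

# Single-observation method for the multivariate stat. Its first argument is
# more specific than ToyStat, so it wins over the generic vector fallback.
function toyfit!(o::ToyMeans, y::AbstractVector)
    o.n += 1
    o.value .+= (y .- o.value) ./ o.n
    return o
end

m   = toyfit!(ToyMean(), [1.0, 2.0, 3.0])          # three scalar observations
ms  = toyfit!(ToyMeans(2), [1.0, 2.0])             # one vector observation
ms2 = toyfit!(ToyMeans(2), [1.0 2.0; 3.0 4.0])     # two row observations
```

The same vector is treated as three observations by `ToyMean` and as one observation by `ToyMeans`, purely because of which method is most specific for the stat type.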

tbreloff commented on May 18, 2024

> The fit! methods for single observations are always more specific than the fit! methods for multiple observations

I think I understand now. There's a fit!(o::OnlineStat, y::AVec) defined for each multivariate stat, and that takes precedence over the more abstract one. Got it.

> Here's the change

This should be better.

Back to the original point... I'd be really surprised if returning the OnlineStat from the fit! method makes any noticeable difference in performance. It should be a no-op if you're not using it in the resulting call.
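As a rough illustration of what returning the stat buys the caller, here is a self-contained sketch. `ToyVariance`, `toyfit!`, and `toyvalue` are hypothetical names rather than the package's API, and the update rule is Welford's algorithm, chosen for the example and not taken from the OnlineStats.jl source.

```julia
# Hypothetical running-variance stat (Welford's update), for illustration only.
mutable struct ToyVariance
    mean::Float64
    m2::Float64    # running sum of squared deviations from the mean
    n::Int
end
ToyVariance() = ToyVariance(0.0, 0.0, 0)

# Single observation: update in place and return the same object.
function toyfit!(o::ToyVariance, y::Real)
    o.n += 1
    delta = y - o.mean
    o.mean += delta / o.n
    o.m2 += delta * (y - o.mean)
    return o
end

# Many observations: loop, then return the same object.
function toyfit!(o::ToyVariance, y::AbstractVector)
    for yi in y
        toyfit!(o, yi)
    end
    return o
end

# Unbiased sample variance; 0.0 until there are at least two observations.
toyvalue(o::ToyVariance) = o.n > 1 ? o.m2 / (o.n - 1) : 0.0

o = ToyVariance()
returned = toyfit!(o, [1.0, 2.0, 3.0, 4.0])                      # the very same object as `o`
chained  = toyvalue(toyfit!(toyfit!(ToyVariance(), 1.0), 3.0))   # calls can be nested
```

Because the return value is the argument itself, callers that ignore it lose nothing, and callers that want to construct, fit, and query in one expression can do so.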

joshday commented on May 18, 2024

Strange, I just tried it again and saw no time/allocation difference. In that case, I like the idea of returning the object. I'll run through everything and add it.

joshday commented on May 18, 2024

fit! now returns the OnlineStat. Also FYI, future PRs can go into the dev branch. I'll be using dev, rather than josh, from now on.

tbreloff commented on May 18, 2024

Cool thanks!

On Feb 6, 2016, at 10:02 AM, Josh Day wrote:

> fit! now returns the OnlineStat. Also FYI, future PRs can go into the dev branch. I'll be using dev, rather than josh, from now on.
