
EmpiricalRisks.jl's People

Contributors

ahwillia, cody-g, evizero, fredo-dedup, lindahua


EmpiricalRisks.jl's Issues

Memory footprint

Hi! While playing around with Regression.jl, I noticed that the memory allocations for GD() scale with the number of iterations as well as with the number of observations in the dataset. After some digging, I discovered that the source of this is the predict method of EmpiricalRisks.jl, which allocates the predicted vector within the method.

Since Regression.solve! calls Regression.backtrack! multiple times, which in turn calls EmpiricalRisks.predict multiple times, the memory footprint accumulates. So the issue is that the prediction vector gets allocated several times in each iteration when Regression.solve! is used.

I have come up with a potential solution by implementing a predict! method that fills a preallocated prediction vector. Using this method, I could rewrite solve! to preallocate the prediction vector only once.

function predict!{T<:BlasReal}(pm::AffinePred, θ::StridedVector{T}, X::StridedMatrix{T}, r::StridedVector{T})
    d = pm.dim
    n = size(X,2)
    @_checkdims length(θ) == d + 1 && size(X,1) == d && n == length(r)
    b = convert(T, pm.bias) * θ[d+1]
    for col = 1:n
        @inbounds r[col] = b
        for row = 1:d
            @inbounds r[col] += X[row, col] * θ[row]
        end
    end
    r
end

Here is a benchmark:

d = 3
n = 1000000
w = randn(d+1)
X = randn(d, n)
t = sign(X'w[1:d] + w[d+1] + 0.01 * randn(n))

using EmpiricalRisks
pm = AffinePred(d, 1.)

@time predict(pm, w, X)
@time predict(pm, w, X)

r = zeros(n)
@time predict!(pm, w, X, r)
@time predict!(pm, w, X, r)

println("Loop with predict")
@time for i = 1:1000
  s = predict(pm, w, X)
end

println("Loop with predict!")
@time for i = 1:1000
  s = predict!(pm, w, X, r)
end
elapsed time: 0.356688159 seconds (21377200 bytes allocated)
elapsed time: 0.005305386 seconds (8000456 bytes allocated) <-- predict
elapsed time: 0.021158343 seconds (409856 bytes allocated)
elapsed time: 0.006868238 seconds (80 bytes allocated) <-- predict!
Loop with predict
elapsed time: 16.95238578 seconds (8000377088 bytes allocated, 43.88% gc time)
Loop with predict!
elapsed time: 4.788862339 seconds (0 bytes allocated)
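For reference, the preallocation pattern the benchmark exercises can be sketched self-contained in modern Julia syntax (predict_buf! is an illustrative toy here, not the actual Regression.jl or EmpiricalRisks.jl API):

```julia
using LinearAlgebra  # for the in-place mul!

# Toy stand-in for predict!: fills r with X'w without allocating.
function predict_buf!(r::AbstractVector, w::AbstractVector, X::AbstractMatrix)
    mul!(r, transpose(X), w)
end

d, n = 3, 10
w = randn(d); X = randn(d, n)
r = zeros(n)                  # allocated once, outside the solver loop
for iter in 1:100             # e.g. each solve!/backtrack! iteration
    predict_buf!(r, w, X)     # buffer reused; no per-iteration allocation
end
```

The benchmark numbers above show exactly this effect: the per-call allocation of predict turns into zero bytes once the buffer is hoisted out of the loop.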

What do you think? Would you be interested in merging these changes if I create the PRs?

Info about upcoming removal of packages in the General registry

As described in https://discourse.julialang.org/t/ann-plans-for-removing-packages-that-do-not-yet-support-1-0-from-the-general-registry/ we are planning on removing packages that do not support 1.0 from the General registry. This package has been detected to not support 1.0 and is thus slated to be removed. The removal of packages from the registry will happen approximately a month after this issue is opened.

To transition to the new Pkg system using Project.toml, see https://github.com/JuliaRegistries/Registrator.jl#transitioning-from-require-to-projecttoml.
To then tag a new version of the package, see https://github.com/JuliaRegistries/Registrator.jl#via-the-github-app.

If you believe this package has erroneously been detected as not supporting 1.0 or have any other questions, don't hesitate to discuss it here or in the thread linked at the top of this post.
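For maintainers following the links above, a minimal Project.toml has roughly the following shape (the UUID and version here are placeholders, not the package's real values; Registrator assigns and checks the real UUID):

```toml
name = "EmpiricalRisks"
uuid = "00000000-0000-0000-0000-000000000000"  # placeholder
version = "0.3.0"                              # placeholder

[deps]
# one `Name = "uuid"` entry per dependency formerly listed in REQUIRE

[compat]
julia = "1"
```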

Future directions

Hi! Thanks for all your amazing efforts.

Is this package still of interest to you? Since I am switching over my work and Machine Learning research to Julia I am in the process of writing a few libraries that I need. There are many cool things already established, but only "kinda", if you know what I mean.

Long story short, if you are still interested in maintaining this package (and the two sub-packages), I want to contribute the necessary things for getting them up to speed (or my biased interpretation of it; egoistically motivated).

Here's what I want to work on:

  • EmpiricalRisks.jl currently collides with StatsBase with respect to predict. I would need the two to play nicely together.
  • Regression.jl: since it handles deterministic optimization, it would probably be better off building on Optim.jl, where the optimization community is focusing its efforts. I think the two would be a great fit, and it would reduce redundancy. This point depends somewhat on whether my pull request there (which introduces callback functions) gets merged, because I need those.
  • SGDOptim.jl: the mini-batch stream is very verbose to deal with because it is only iterated over once instead of multiple times. I also wonder whether it wouldn't be better to structure the package like Optim.jl, in the sense of being agnostic to where the function and gradient come from, so that it need not depend on EmpiricalRisks.jl.
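On the first point, one conventional resolution is to extend the generic function that StatsBase owns rather than define an independent predict. A self-contained sketch of the pattern, using a stand-in module (in the real fix one would `import StatsBase` and extend `StatsBase.predict`; the predictor type is illustrative, not the actual AffinePred):

```julia
# Stand-in module playing the role of StatsBase, which owns the generic:
module FakeStatsBase
    function predict end
end

struct ToyAffinePred          # illustrative stand-in for EmpiricalRisks.AffinePred
    w::Vector{Float64}
    b::Float64
end

# Adding a method to the owner's function (instead of defining a separate
# `predict`) lets both packages be loaded together without a name clash.
FakeStatsBase.predict(pm::ToyAffinePred, X::AbstractMatrix) = X' * pm.w .+ pm.b
```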

I'm asking because other packages where I made pull requests have been slow to respond, or haven't responded at all (not even negatively), which slows me down enough to make it worth my while to do things from scratch (I do have deadlines). I'd rather contribute to your great efforts than do a reboot, provided you are interested and agree with what I want to change.

What are your thoughts on this? Am I coming off as a crazy person?

Load error in 0.5.0

julia> using EmpiricalRisks
ERROR: LoadError: LoadError: UndefVarError: FloatingPoint not defined
in include_from_node1(::String) at .\loading.jl:488 (repeats 2 times)
in eval(::Module, ::Any) at .\boot.jl:234
in require(::Symbol) at .\loading.jl:415
while loading C:\Users\pawn0002\.julia\v0.5\EmpiricalRisks\src\common.jl, in expression starting on line 35
while loading C:\Users\pawn0002\.julia\v0.5\EmpiricalRisks\src\EmpiricalRisks.jl, in expression starting on line 84
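For anyone hitting this: FloatingPoint was renamed to AbstractFloat (deprecated in Julia 0.4, removed in 0.5), so the fix is a rename in the package source. A sketch, with an illustrative type name standing in for whatever common.jl declares:

```julia
# before (Julia <= 0.4 only):
#     abstract SomeRisk{T<:FloatingPoint}
# after (works on 0.4 and required on 0.5+):
#     abstract SomeRisk{T<:AbstractFloat}

# On 0.5 and later the old name is simply gone; the abstract float
# supertype is reachable as:
supertype(Float64)   # AbstractFloat
```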

isclassifier and decision_function

What do you think about the idea of adding two functions, isclassifier(::Loss)::Bool and decision_function(::Loss, ::PredictionModel)::Function, to the package? I could really use an (extensible) way to check whether I am dealing with a classifier.
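A minimal sketch of what the proposed traits could look like (names, defaults, and the toy loss types here are illustrative of the proposal, not an existing EmpiricalRisks API):

```julia
abstract type Loss end
struct LogisticLoss <: Loss end   # a classification loss
struct SqrLoss <: Loss end        # a regression loss

# Default: a loss is not a classification loss; classifiers opt in.
isclassifier(::Loss) = false
isclassifier(::LogisticLoss) = true

# For classifiers, map raw scores to labels; here, sign thresholding.
decision_function(::LogisticLoss) = u -> sign.(u)
```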

EDIT: after extensive reading, I came up with a better solution.

Add regularizers that may not have a defined gradient

For some regularizers, the proximal operator is a (short) iterative calculation that does not yield a closed-form gradient expression: L1Ball (for the true LASSO), Simplex, ...
This may be a problem for regular optimization algorithms, but not for the proximal gradient method, which does not need these gradients. That could make them a useful addition to this package.
I understand that the EmpiricalRisks package should be as generic as possible, but would it be possible to include them anyway? Perhaps through a subtype of Regularizer with only a prox! method but no value_and_addgrad!()?
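A sketch of the proposed split, using the simplex as the example (type names and the hierarchy are this proposal, not existing package API; the projection follows the standard sort-based algorithm):

```julia
abstract type Regularizer end
# Subtype for regularizers that only support the proximal operator
# and deliberately have no value_and_addgrad! method:
abstract type ProxOnlyRegularizer <: Regularizer end

struct SimplexReg <: ProxOnlyRegularizer end

# Euclidean projection onto the probability simplex
# {x : x .>= 0, sum(x) == 1}, serving as the prox.
function prox!(::SimplexReg, x::AbstractVector)
    u = sort(x; rev = true)
    css = cumsum(u)
    ρ = findlast(i -> u[i] - (css[i] - 1) / i > 0, eachindex(u))
    θ = (css[ρ] - 1) / ρ
    x .= max.(x .- θ, 0)
    return x
end
```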

If you think it's worthwhile, I'd be ready to contribute that extension to the package.

Thanks.

Tag a new release for Julia 0.5

I see there's a similar issue already for tagging a 0.4-compatible release, but I'll second that request now that tests pass on 0.5 after the commits I made in January. Tagging this will also allow my open PR at Regression.jl to pass Travis so we can get that package working in 0.5.
