
drchainsaw / naivenaslib.jl

17 stars · 2 watchers · 1 fork · 2.75 MB

Relentless mutation!!

License: MIT License

Julia 100.00%
deep-learning machine-learning neural-networks transfer-learning morphisms architecture-search hyperparameter-optimization mutation

naivenaslib.jl's People

Contributors

drchainsaw, github-actions[bot], juliatagbot, simeonschaub

Forkers

simeonschaub

naivenaslib.jl's Issues

Work out the pruning use case

Pruning is basically supported, but it would probably need some more quality-of-life functions, as the process is a bit awkward, especially if one wants to automate it.

Pain points:

  • One needs to create a copy of the graph with IoIndices
  • Size changes are not really visible after the metadata has been changed. Add some MutationOp to spy on them when they happen?
  • There exists no simple way right now to translate a "size changed" graph into a pruned one.

Concatenations especially have a tendency to differ, as sizes are just split while indices cannot be "split". It would be good to have a function which helps to get the right number of indices from each input vertex; one hypothetical heuristic is sketched below.
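
For illustration, a proportional split is one hypothetical heuristic (plain Julia, not package code): a concat of inputs with sizes [3, 4] shrinking to 5 total outputs must decide how many indices to keep from each input.

    # Hypothetical heuristic: split the target size proportionally to current sizes
    # (a real implementation would need to fix up rounding so sum(share) == target)
    sizes = [3, 4]; target = 5
    share = round.(Int, target .* sizes ./ sum(sizes))  # == [2, 3]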

What to do if a whole branch/tower/path gets pruned (no indices from it are selected)? Crash? Remove it silently (or with a warning)? Remove all vertices in it? Or try to reconnect it to some size-absorbing vertex in a neighbouring branch/path/tower... ugh...

Help with doing parameter selection in nin direction

See testset "Mutate-prune" - "Merge two vertices".

The basic issue is that when selecting indices which might propagate through a SizeStack vertex after having changed the size with an IoChange mutation, one must be careful not to select "out of range".

Either an improvement which allows the parameter selection to override the size, or a helper function to show which indices are valid (or maybe both), would be useful.

For now, one is better off selecting parameters from the bottom in the "nout" direction.

Checklist for 2.0

With #40 potentially being solved, it might be feasible to go for #41. Since #41 is breaking, I'll just list a few things here which I would also like to change for 2.0, to minimize the risk of forgetting them.

Judging from the influx of issues, I'm probably the only user of this package, and therefore I'm planning to be quite liberal with breakage. Hopefully the end result is a leaner, more idiomatic API which is easier to use. If you are a user of this package and don't want unnecessary breakage, please let me know.

  • Remove size state
  • Remove the min delta factor functions
  • Add bangs to mutating methods
  • Add support for Functors.jl
  • Remove most of the pretty printing and vertex formatting stuff
  • Remove dependencies to LightGraphs and MetaGraphs
  • Remove export of internal/special purpose functions

Loosen up output insert constraints

The current way to model the "rules" for inserting new outputs puts unnecessary constraints on the model:

  1. There is a hard cap on the maximum number of inserts.
  2. The number of inserts plus the number of selected outputs must be equal to the hard cap from 1.

The reason for 1 is that insertions are modeled in the same way as selections: a binary variable per possible insertion position, where 'true' means 'insert' and 'false' means 'don't insert', and the number of binary variables must of course be limited to something.

The reason for 2 is to prevent impossible outcomes such as 'keep current outputs number 1, 3 and 5 and insert a new output at position 10'. What to do with outputs 4-9 in this case? There is absolutely no guarantee that the result is feasible if one just squashes the outputs together and inserts the new output at position 4 instead.

A potential way to relax 1 could be to replace the binary constraint on the insertion variable with a >= 0 constraint and let the value represent how many consecutive outputs to insert starting at that position. If the variable array has the same size as the (current) number of outputs, it should cover all possible ways to insert new neurons, right?

Relaxing 2 seems a fair bit trickier.

One thing that can maybe be exploited: I think things work out if any deficits are strictly confined to the end of the insertvar. In other words, selectvar = [1, 1, 0, 0, 1, 0], insertvar = [1, 0, 0, 0, 1, 0, 0] is feasible: it corresponds to "select indices 1, 2 and 5 and insert new outputs at positions 1 and 5", which results in [new, 1, 2, 5, new, ?, ?], where the last two positions can simply be dropped. By contrast, selectvar = [1, 1, 0, 0, 1, 0], insertvar = [1, 0, 0, 0, 0, 1, 0] is not feasible: it corresponds to "select indices 1, 2 and 5 and insert new outputs at positions 1 and 6", resulting in [new, 1, 2, 5, ?, new, ?], with a hole before the last insertion. A standalone sketch of this rule is below.
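
To make the feasibility rule concrete, here is an illustrative sketch (not package code; the -1/missing encoding and names are made up) which materializes the result of a selectvar/insertvar pair. A result is feasible exactly when all undetermined positions are trailing:

    # Map a selectvar/insertvar pair to the resulting outputs.
    # -1 marks a newly inserted output, `missing` an undetermined position.
    function apply_select_insert(selectvar, insertvar)
        result = Vector{Union{Int, Missing}}(missing, length(insertvar))
        for (pos, ins) in enumerate(insertvar)
            ins == 1 && (result[pos] = -1)  # new output inserted at this position
        end
        selected = findall(==(1), selectvar)  # indices of kept existing outputs
        si = 1
        for pos in eachindex(result)
            if ismissing(result[pos]) && si <= length(selected)
                result[pos] = selected[si]
                si += 1
            end
        end
        return result
    end

    # Feasible iff every undetermined position trails at the end (and can be dropped)
    isfeasible(result) = all(!ismissing, result[1:something(findlast(!ismissing, result), 0)])

    apply_select_insert([1,1,0,0,1,0], [1,0,0,0,1,0,0])  # [-1, 1, 2, 5, -1, missing, missing]: feasible
    apply_select_insert([1,1,0,0,1,0], [1,0,0,0,0,1,0])  # [-1, 1, 2, 5, missing, -1, missing]: infeasible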

Here is a short snippet where a binary variable is constrained to mark the consecutive zeros at the end of another binary variable, using MIP formulations of `not` and `and`:

    using Test
    import JuMP
    import Cbc

    @testset "Last consecutive zeros" begin

        model = JuMP.Model(Cbc.Optimizer)
        JuMP.set_optimizer_attribute(model, "logLevel", 0)

        x = JuMP.@variable(model, x[1:10], Bin)
        a = JuMP.@variable(model, a[1:10], Bin)

        # Force a nonzero entry so the trailing-zero count is nontrivial
        JuMP.@constraint(model, x[6] == 1)
        # a[1] == not(x[10]): a marks consecutive zeros at the end of x
        JuMP.@constraint(model, a[1] == 1 - x[10])
        # a[i+1] == a[i] and not(x[10-i]), as a linear (MIP) formulation
        JuMP.@constraint(model, [i=1:9], 0 <= 1 - x[10-i] + a[i] - 2 * a[i+1] <= 1)

        # Maximize the number of trailing zeros found
        JuMP.@objective(model, Max, sum(a))

        JuMP.optimize!(model)

        @test JuMP.termination_status(model) == JuMP.MOI.OPTIMAL
        @test JuMP.value.(x) == [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
        @test JuMP.value.(a) == [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

    end

What is missing from the above is how to deal with selections. For instance, if sum(select) + sum(insert) + sum(conseczeros) == length(insert), one is constrained to either insert or select but not both. Would it be enough to change the equality to an inequality, or do I need some kind of and-relation between select and conseczeros?

Not top prio right now as things work ok with current formulation. Will revisit when time permits.

Segfault during testing

PkgEval is running into the following when testing this package:

[ Info: Testing computation
[ Info: Testing gradients
[ Info: Testing mutation
[ Info: Testing size mutation

signal (11): Segmentation fault
in expression starting at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/test/mutation/size.jl:30
_ZN11ClpPresolve20gutsOfPresolvedModelEP10ClpSimplexdbibbPKcS3_ at /home/pkgeval/.julia/artifacts/e4a36d92f6628275dd9546eabfde4e94b1ffb986/lib/libClp.so (unknown line)
_ZN21OsiClpSolverInterface7resolveEv at /home/pkgeval/.julia/artifacts/e4a36d92f6628275dd9546eabfde4e94b1ffb986/lib/libOsiClp.so (unknown line)
_Z8CbcMain1iPPKcR8CbcModelPFiPS2_iER19CbcSolverUsefulData at /home/pkgeval/.julia/artifacts/1263af5e59820ee3b62d2f59e030cdcc86380f82/lib/libCbcSolver.so (unknown line)
Cbc_solve at /home/pkgeval/.julia/artifacts/1263af5e59820ee3b62d2f59e030cdcc86380f82/lib/libCbcSolver.so (unknown line)
Cbc_solve at /home/pkgeval/.julia/packages/Cbc/dIPfi/src/gen/libcbc_api.jl:306 [inlined]
optimize! at /home/pkgeval/.julia/packages/Cbc/dIPfi/src/MOI_wrapper/MOI_wrapper.jl:521
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/MathOptInterface.jl:85 [inlined]
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/Utilities/cachingoptimizer.jl:316
unknown function (ip: 0x7fc30e5fc362)
unknown function (ip: 0x7fc30e5e29f9)
unknown function (ip: 0x7fc30e5e2905)
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/Bridges/bridge_optimizer.jl:376 [inlined]
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/MathOptInterface.jl:85 [inlined]
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/Utilities/cachingoptimizer.jl:316
unknown function (ip: 0x7fc30e5e28d2)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
#optimize!#107 at /home/pkgeval/.julia/packages/JuMP/pQApG/src/optimizer_interface.jl:440
optimize! at /home/pkgeval/.julia/packages/JuMP/pQApG/src/optimizer_interface.jl:410 [inlined]
newsizes at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/mutation/size.jl:184
Δsize! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/mutation/size.jl:57
unknown function (ip: 0x7fc30e75f442)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
Δsize! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/api/size.jl:62
Δsize! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/api/size.jl:76
unknown function (ip: 0x7fc30e73f24d)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
Δnin! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/api/size.jl:168
Δnin! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/api/size.jl:170

Can be reproduced by checking out PkgEval and doing:

julia --project PkgEval/bin/test_package.jl --julia=stable --name=NaiveNASlib

I'm not sure whether this is a bug with this package or with MathOptInterface.jl, but since it's being flagged as a crash in PkgEval reports it would be good to fix 🙂

Different changes to inputs of SizeStack cause exact OutSelect to be infeasible

In particular, this happens when (at least) one input increases in size (outputs shall be added) while others decrease, so that the total output size of the SizeStack decreases.

The reason is that the size constraints only target the output "as a whole", and they say that "no new outputs shall be added". This is indeed infeasible, since (at least) one of the inputs has to add new outputs (given the nature of SizeStack, where the outputs are the concatenation of its inputs).

This might be related to #40, but I think it is neither sufficient nor necessary. The simplest fix I can think of is to reformulate the size constraint for SizeStack to look at each input individually instead of at the total output.

For example, if one input has increased in size from N1 to N1+d1 while another has decreased from N2 to N2-d2, then for an exact solution one shall select N1 + N2 - d2 existing outputs and add d1 new outputs; see the worked example below. The input-to-output mapping constraints should take care of the rest.
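
As a plain arithmetic check of the example (illustrative, not package code):

    # Input 1 grows from N1 to N1 + d1, input 2 shrinks from N2 to N2 - d2
    N1, d1 = 5, 2
    N2, d2 = 7, 3
    nselected = N1 + (N2 - d2)  # existing outputs to select (all of input 1, N2 - d2 of input 2)
    ninserted = d1              # new outputs to add (only input 1 needs them)
    @assert nselected + ninserted == (N1 + d1) + (N2 - d2)  # == new SizeStack output size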

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

Revert of remove_edge! after PostSelectOutputs does not revert sizes correctly

This happens when output selection fails with a SizeStack.

Here is roughly the sequence of events:

  1. PostSelectOutputs applies NoutRevert, reverting all vertices to their original size.
  2. After this, FailAlignSizeRevert is applied, meaning that vout is added back as an output to vin with the strategy NoSizeChange.
  3. This adds nout(vin) to the output size of vout without any further size changes, causing it to be too big.

MWE to run in structure.jl:

            @testset "PostSelectOutputs fail select" begin
                v0 = inpt(3, "v0")
                v1 = av(v0, 3, name="v1")
                v2 = av(v0, 4, name="v2")
                v3 = av(v0, 5, name="v3")
                v4 = sv(v1,v2,v3, name = "v4")
                v5 = av(v4, 3, name="v5")

                @test_logs (:warn, r"Could not align size") remove_edge!(v2, v4, strategy=PostSelectOutputs(
                align = PostAlignJuMP(), select = NoutRevert(), fallback=FailAlignSizeWarn()))

                @test inputs(v4) == [v1, v2, v3]
                @test nin(v4) == nout.([v1, v2, v3]) == [3,4,5]
                @test nin_org(v4) == nout_org.([v1, v2, v3]) == [3,4,5]
                @test [nout(v4)] == nin(v5) == [3+4+5]
                @test [nout_org(v4)] == nin_org(v5) == [3+4+5]
            end

Add functionality to add layers

This is much easier than removing layers or changing sizes if one imposes the constraint that new layers are initialized with nin == nout.

Not sure if more is needed. I guess one can always mutate the size afterwards; a rough sketch of the idea is below.
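
A minimal sketch of why the constraint helps (hypothetical helper, not the package API): an identity-initialized layer with nin == nout is initially a no-op, so no other vertex needs to change size when it is inserted.

    using LinearAlgebra

    # Hypothetical: create a size-preserving, identity-initialized dense layer
    identitylayer(sz) = let W = Matrix{Float32}(I, sz, sz)
        x -> W * x  # initially a no-op; training can change W afterwards
    end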

QoL improvements when declaring strategies

A pattern of chaining strategies has emerged in this form:

    PrimaryStrategy(params..., FallBackStrategy1(params..., FallBackStrategy2(params..., ...)))

While I do like the flexibility this offers functionally, it tends to look like dog barf in the code which declares it.

A real-life horror example:

    RemoveStrategy(CheckAligned(CheckNoSizeCycle(ApplyMutation(SelectOutputs(select = SelectDirection(OutSelect{NaiveNASlib.Exact}(NaiveNASlib.LogSelectionFallback("Reverting...", NoutRevert()))), valuefun = default_neuronselect, align=IncreaseSmaller(DecreaseBigger(AlignSizeBoth(FailAlignSizeWarn()))))), FailAlignSizeWarn(msgfun = (vin,vout) -> "Can not remove vertex $(name(vin))! Size cycle detected!"))))

One small thing which could give a little payoff is to just allow setting the last fallback strategy (usually either throw an error or do nothing); see the sketch below.
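
A minimal sketch of that idea (hypothetical helper, assuming each strategy constructor takes its fallback as its only positional argument):

    # Build a strategy chain from constructors, supplying only the innermost fallback
    chainstrategies(ctors...; fallback) = foldr((ctor, inner) -> ctor(inner), ctors; init=fallback)

    # Roughly equivalent to CheckAligned(CheckNoSizeCycle(FailAlignSizeWarn())):
    # strategy = chainstrategies(CheckAligned, CheckNoSizeCycle; fallback=FailAlignSizeWarn())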

Select outputs after creating an edge to size stack at position other than last fails

The root cause is that the variable array for selecting existing outputs can only select from previously existing outputs.

The logic to deal with this in inoutconstraint!(s, ::SizeStack, v, model, vardict::Dict) assumes that any misaligned inputs are last in inputs(v).

It might be possible to avoid aligning select vars but still align insert vars, as they can be assumed to have the same size. The MWE below fails to align sizes and reverts the edge addition when this change is made, though.

MWE testset which runs in select.jl:

    @testset "Create edge to SizeStack pos 1" begin
        inpt = iv(3)
        v1 = av(inpt, 2, "v1")
        v2 = av(inpt, 3, "v2")
        v3 = cc(v1, name="v3")
        v4 = av(v3, 3, "v4")

        g = CompGraph(inpt, v3)
        @test size(g(ones(Float32, 1,3))) == (1, nout(v3))

        create_edge!(v2, v3, pos=1)
        Δoutputs(v3, v -> 1:nout_org(v))

        @test in_inds(op(v4)) == [-1,-1,-1,1,2]

        @test size(g(ones(Float32, 1,3))) == (1, nout(v3))
    end

Size cycles when adding edge

There is currently no check for size cycles (#34) when adding a new edge. The current handling only works for removal because:

  1. It does not actually check anything unless vin == vout and vin/vout is to be removed
  2. It checks before the edge has been made, so if adding the edge creates the cycle, it would not be found even without the limitation in 1

Handle "size loops" when changing structure

Removing a vertex which happens to be the only SizeAbsorb vertex in a fork-path which is eventually joined by a SizeInvariant vertex makes the graph invalid if there is a SizeStack with more than one input after the removed vertex.

The reason is that this results in the impossible situation where the output size of the SizeStack must be equal to its own output size (because the SizeInvariant vertex forces this along the path where the SizeAbsorb vertex was removed) plus a non-zero term from the other inputs, i.e. nout == nout + k with k > 0.

Possible strategies to deal with this are:
1. Detect the size-cycle and don't remove such vertices (probably easy); done in #35.
2. Remove the whole path (might be hard).
3. Remove the vertex and connect the loose ends to some "nearby" vertices (another path maybe) for which the situation does not occur (head hurts).

Change everything to MIP

As a MIP solver is used for selection anyway, one might as well rewrite all types of size-changing ops as a MIP program to:

  1. Reduce code base size
  2. Improve program correctness (the current traversal algorithm seems to fail in some cases which are yet to be figured out)

Almost done WIP in #37

Add option to change (decorating) MutationTrait

The fancy debugging stuff would be a lot more useful if it were possible to plug it into an existing model.

MutationVertex is already prepared for it, so it should be a matter of:

  1. Exposing it from the top-level API (copy)
  2. Handling it (by ignoring it) for all vertices which don't have a MutationTrait

Support for flow control

There is currently very limited (at best) support for controlling the program flow (e.g. compute this x times, compute this if y else that, neural ODEs, etc.).

A possible approach is to implement some kind of "graph in vertex" concept, where the computation of a vertex may consist of one or more graphs (which may in turn be mutated just as any other graph).

Some of the things needed, off the top of my head:

  1. A not-too-cumbersome way to combine graph(s) and control flow so that the graphs are accessible for mutation.
    - Maybe a struct with the graphs and the computation (which uses those graphs in an arbitrary way) as separate fields is enough; see the sketch after this list.
  2. The ability to tie the sizes of arbitrary vertices together.
    - For example, looping the output back as input obviously requires that the output size is the same as the input size.
    - A "use graph1 if x, else graph2" type of vertex also requires that both graph1 and graph2 are aligned with the input/output size of the vertex they are inside.
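
A rough sketch of 1 (hypothetical type; only CompGraph is an existing NaiveNASlib type):

    # Graphs stay accessible for mutation; `computation` decides the control flow
    struct GraphInVertex{F}
        graphs::Vector{CompGraph}
        computation::F  # uses the graphs in an arbitrary way, e.g. looping or branching
    end
    (v::GraphInVertex)(x...) = v.computation(v.graphs, x...)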

Incorrect outputs selected by PostSelectOutputs and remove_edge! of SizeStack

Reason is basically that the edge is removed before outputs are selected.

MWE to run in structure.jl:

            @testset "PostSelectOutputs SizeStack" begin
                v0 = inpt(3, "v0")
                v1 = av(v0, 3, name="v1")
                v2 = av(v0, 4, name="v2")
                v3 = av(v0, 5, name="v3")
                v4 = sv(v1,v2,v3, name = "v4")
                v5 = av(v4, 3, name="v5")

                remove_edge!(v2, v4, strategy=PostSelectOutputs(valuefun = v -> 1:nout_org(v)))

                @test inputs(v4) == [v1, v3]
                @test nin(v4) == nout.([v1, v3]) == [3,5]
                @test out_inds(op(v1)) == 1:3
                @test out_inds(op(v2)) == 1:4
                @test out_inds(op(v3)) == 1:5
                # This would be better if it was [1:3;8:12] but remove_vertex removes the edge before PostSelectOutputs has a chance to see it :(
                @test out_inds(op(v4)) == 1:8
                @test in_inds(op(v5)) == [1:8]
            end

Safer handling of non-constrained vertices when doing output selection

Not 100% sure why this happens but...

When doing output selection on the smaller delta-size graph, the result is sometimes inconsistent with respect to size.

The prime suspect is the case where a vertex is part of the graph only because its inputs are touched, and it has value <= 0 for some of its outputs. This would then cause the optimizer to not select those outputs, as nothing constrains it to do so. As it is only the input which is relevant, its outputs are not part of the set of vertices which will see changes.

The short-term solution is to ensure the value metric is always positive; a sketch is below.

The preferable solution would be to not make its outputs part of the MIP model while still keeping the vertex as something which might need its inputs updated.
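
A sketch of the short-term workaround (hypothetical wrapper, not part of the package):

    # Clamp the value metric so no output is ever "free" for the optimizer to drop
    positivevalue(valuefun; floor=1f-3) = v -> max.(valuefun(v), floor)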

Improve syntax

The library is quite verbose (and probably difficult to use) due to all the layers of wrapped structs.

I guess adding convenience methods for the most common operations is low-hanging fruit.

Remove/refactor MutationOps

The design of the MutationOps (the contents of op.jl) is a result of now-obsolete design choices. In the current design (post #37) they are confusing and add bloat.

Not sure exactly how to clean them up, as some kind of "future size" metadata is still needed for output selection, mostly due to #40, which I think prevents implementing something like "select or insert outputs based on this new desired size".

Currently the library also tries to cater for the case where one does not want to prune an existing model but instead just change the sizes of some architecture "template" or "specification" which generates new models from scratch.

The current design kind of silently handles both without making any assumptions about which one the user wants to do. However, applying a size-only mutation to an actual network might cause severe performance degradation, as outputs are then misaligned with inputs.

If some "select or insert outputs based on this new desired size" can be created, I hope this would allow the same API call (i.e. deltaN{in,out}) to perform size change or output selection based on what the vertex represents (e.g. an actual layer with existing weights, or an architecture spec).

AlignNinToNout leaves undefined references

The code in vertexconstraints!(v::AbstractVertex, s::AlignNinToNout, data) kinda secretly assumes that every output vertex of a vertex in the set of vertices to solve the problem for is also part of that set, and that is not always the case.

If this assumption is not true for a vertex, it will have undefined references in its array of new nins.

MWE to run in the edge testset in structure.jl below. Vertex vh is not in the set, as it is not affected by any size change, and will therefore have an undef nin-variable.

            @testset "Add with hidden SizeStack" begin
                v0 = inpt(3, "v0")
                v1 = av(v0, 5, name="v1")
                v2 = av(v0, 4, name="v2")
                vh = av(v0, 5, name="vh")
                v3 = sv(v1, name = "v3")
                v4 = av(v3, 3, name="v4")
                v5 = sv(v4, vh, name="v5")
                v6 = av(v2, 2, name="v6")


                @test inputs(v3) == [v1]
                create_edge!(v2, v3)

                @test inputs(v3) == [v1, v2]
                @test nin(v4) == [nout(v3)] == [nout(v1) + nout(v2)] == [9]

                @test outputs(v2) == [v6, v3]
                @test inputs(v6) == [v2]
                @test nin(v6) == [nout(v2)] == [4]

                @test outputs(vh) == [v5]
                @test inputs(v5) == [v4, vh]
                @test nin(v5) == [nout(v4), nout(vh)] == [3, 5]
            end

Add possibility to wrap conc and elem ops

The sugar for concatenating or doing element-wise operations on vertex outputs imposes a limitation in the sense that one cannot do anything else in the same vertex. One example of what one might want to do is to log the output (although this may be better to do in a vertex so it is possible to also log the name) or calculate some neuron value metric; a sketch of a wrapper is below.
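
A sketch of what such a wrapper could look like (hypothetical; `f` stands for the existing concat or element-wise computation):

    # Decorate the wrapped computation without affecting its size behaviour
    logged(f; name="conc") = function (x...)
        y = f(x...)
        @info "$name output size: $(size(y))"
        return y
    end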

Replace Cbc with HiGHS

HiGHS seems to be better maintained and is often recommended when people run into issues with Cbc (e.g. #99 😄).

This seems to be the major stopper for now. Even increasing the sizes so the test takes several minutes does not solve it. Worst case, I can just skip testing or maybe use Cbc only for that test, but I'd rather not. The swap itself should be small; see the sketch below.
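
On the JuMP side only the model setup changes; the rest of the formulation is solver-agnostic (sketch):

    import JuMP, HiGHS

    model = JuMP.Model(HiGHS.Optimizer)
    JuMP.set_silent(model)  # replaces Cbc's logLevel=0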

Revert of remove_edge! always adds back edge at last input position

This obviously does not revert the operation...

MWE to run in the structure tests:

            @testset "Revert remove edge SizeStack" begin
                v0 = inpt(3, "v0")
                v1 = av(v0, 5, name="v1")
                v2 = av(v0, 4, name="v2")
                v3 = sv(v0, v1, v2, name = "v3")
                v4 = av(v3, 3, name="v4")
                v5 = av(v2, 2, name="v5")

                @test inputs(v3) == [v0, v1, v2]
                @test nin(v3) == nout.([v0, v1, v2]) == [3,5,4]
                @test [nout(v3)] == nin(v4) == [3+5+4]
                @test nin_org(v3) == nout_org.([v0, v1, v2]) == [3,5,4]
                @test [nout_org(v3)] == nin_org(v4) == [3+5+4]

                struct RevertPost <: AbstractAlignSizeStrategy end
                NaiveNASlib.postalignsizes(::RevertPost, vin, vout) = NaiveNASlib.postalignsizes(FailAlignSizeRevert(), vin, vout)

                remove_edge!(v1, v3, strategy=RevertPost())

                @test inputs(v3) == [v0, v1, v2]
                @test nin(v3) == nout.([v0, v1, v2]) == [3,5,4]
                @test [nout(v3)] == nin(v4) == [3+5+4]
                @test nin_org(v3) == nout_org.([v0, v1, v2]) == [3,5,4]
                @test [nout_org(v3)] == nin_org(v4) == [3+5+4]
            end

Create proper docs

The README is not easy to keep in sync with library updates, as it is always the latest version which is shown. A sketch of a standard setup is below.
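
A standard Documenter.jl setup would solve this by building versioned docs per release. A sketch, assuming docs/make.jl and the repo URL:

    # docs/make.jl (sketch)
    using Documenter, NaiveNASlib

    makedocs(sitename = "NaiveNASlib.jl", modules = [NaiveNASlib])
    deploydocs(repo = "github.com/DrChainsaw/NaiveNASlib.jl.git")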
