
drchainsaw / naivenaslib.jl

17 stars · 2 watchers · 1 fork · 2.75 MB

Relentless mutation!!

License: MIT License

Julia 100.00%
deep-learning machine-learning neural-networks transfer-learning morphisms architecture-search hyperparameter-optimization mutation

naivenaslib.jl's People

Contributors

drchainsaw, github-actions[bot], juliatagbot, simeonschaub

Forkers

simeonschaub

naivenaslib.jl's Issues

Work out the pruning use case

Pruning is basically supported, but it would probably need some more quality-of-life functions, as the process is a bit awkward, especially if one wants to automate it.

Pain points:

  • One needs to create a copy of the graph with IoIndices
  • Size changes are not really visible after the metadata has been changed. Add some MutationOp to spy on them when they happen?
  • There exists no simple way right now to translate a "size changed" graph into a pruned one.

Concatenations especially have a tendency to differ, as sizes are just split while indices cannot be "split". It would be good to have a function which helps to get the right number of indices from each input vertex; one hypothetical heuristic is sketched below.
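
For illustration, a proportional split is one hypothetical heuristic (plain Julia, not package code): a concat of inputs with sizes [3, 4] shrinking to 5 total outputs must decide how many indices to keep from each input.

    # Hypothetical heuristic: split the target size proportionally to current sizes
    # (a real implementation would need to fix up rounding so sum(share) == target)
    sizes = [3, 4]; target = 5
    share = round.(Int, target .* sizes ./ sum(sizes))  # == [2, 3]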

What to do if a whole branch/tower/path gets pruned (no indices from it are selected)? Crash? Remove it silently (or with a warning)? Remove all vertices in it? Or try to reconnect it to some size-absorbing vertex in a neighbouring branch/path/tower... ugh...

Help with doing parameter selection in nin direction

See testset "Mutate-prune" - "Merge two vertices".

The basic issue is that when selecting indices which might propagate through a SizeStack vertex after having changed the size with an IoChange mutation, one must be careful not to select "out of range".

Either an improvement which allows the parameter selection to override the size, or a helper function to show which indices are valid (or maybe both), would be useful.

For now, one is better off selecting parameters from the bottom in the "nout" direction.

Checklist for 2.0

With #40 potentially being solved, it might be feasible to go for #41. Since #41 is breaking, I'll just list a few things here which I would also like to change for 2.0, to minimize the risk of forgetting them.

Judging from the influx of issues, I'm probably the only user of this package, and therefore I'm planning to be quite liberal with breakage. Hopefully the end result is a leaner, more idiomatic API which is easier to use. If you are a user of this package and don't want unnecessary breakage, please let me know.

  • Remove size state
  • Remove the min delta factor functions
  • Add bangs to mutating methods
  • Add support for Functors.jl
  • Remove most of the pretty printing and vertex formatting stuff
  • Remove dependencies to LightGraphs and MetaGraphs
  • Remove export of internal/special purpose functions

Loosen up output insert constraints

The current way to model the "rules" for inserting new outputs puts unnecessary constraints on the model:

  1. There is a hard cap on the maximum number of inserts.
  2. The number of inserts plus the number of selected outputs must be equal to the hard cap from 1.

The reason for 1 is that insertions are modeled in the same way as selections: a binary variable per possible insertion position, where 'true' means 'insert' and 'false' means 'don't insert', and the number of binary variables must of course be limited to something.

The reason for 2 is to prevent impossible outcomes such as 'keep current outputs number 1, 3 and 5 and insert a new output at position 10'. What to do with outputs 4-9 in this case? There is absolutely no guarantee that the result is feasible if one just squashes the outputs together and inserts the new output at position 4 instead.

A potential way to relax 1 could be to replace the binary constraint on the insertion variable with a >= 0 constraint and let the value represent how many consecutive outputs to insert starting at that position. If the variable array has the same size as the (current) number of outputs, it should cover all possible ways to insert new neurons, right?

Relaxing 2 seems a fair bit trickier.

One thing that can maybe be exploited: I think things work out if any deficits are strictly confined to the end of the insertvar. In other words, selectvar = [1, 1, 0, 0, 1, 0], insertvar = [1, 0, 0, 0, 1, 0, 0] is feasible: it corresponds to "select indices 1, 2 and 5 and insert new outputs at positions 1 and 5", which results in [new, 1, 2, 5, new, ?, ?], where the last two positions can simply be dropped. By contrast, selectvar = [1, 1, 0, 0, 1, 0], insertvar = [1, 0, 0, 0, 0, 1, 0] is not feasible: it corresponds to "select indices 1, 2 and 5 and insert new outputs at positions 1 and 6", resulting in [new, 1, 2, 5, ?, new, ?], with a hole before the last insertion. A standalone sketch of this rule is below.
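
To make the feasibility rule concrete, here is an illustrative sketch (not package code; the -1/missing encoding and names are made up) which materializes the result of a selectvar/insertvar pair. A result is feasible exactly when all undetermined positions are trailing:

    # Map a selectvar/insertvar pair to the resulting outputs.
    # -1 marks a newly inserted output, `missing` an undetermined position.
    function apply_select_insert(selectvar, insertvar)
        result = Vector{Union{Int, Missing}}(missing, length(insertvar))
        for (pos, ins) in enumerate(insertvar)
            ins == 1 && (result[pos] = -1)  # new output inserted at this position
        end
        selected = findall(==(1), selectvar)  # indices of kept existing outputs
        si = 1
        for pos in eachindex(result)
            if ismissing(result[pos]) && si <= length(selected)
                result[pos] = selected[si]
                si += 1
            end
        end
        return result
    end

    # Feasible iff every undetermined position trails at the end (and can be dropped)
    isfeasible(result) = all(!ismissing, result[1:something(findlast(!ismissing, result), 0)])

    apply_select_insert([1,1,0,0,1,0], [1,0,0,0,1,0,0])  # [-1, 1, 2, 5, -1, missing, missing]: feasible
    apply_select_insert([1,1,0,0,1,0], [1,0,0,0,0,1,0])  # [-1, 1, 2, 5, missing, -1, missing]: infeasible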

Here is a short snippet where a binary variable is constrained to mark the consecutive zeros at the end of another binary variable, using MIP formulations of `not` and `and`:

    using Test
    import JuMP
    import Cbc

    @testset "Last consecutive zeros" begin

        model = JuMP.Model(Cbc.Optimizer)
        JuMP.set_optimizer_attribute(model, "logLevel", 0)

        x = JuMP.@variable(model, x[1:10], Bin)
        a = JuMP.@variable(model, a[1:10], Bin)

        # Force a nonzero entry so the trailing-zero count is nontrivial
        JuMP.@constraint(model, x[6] == 1)
        # a[1] == not(x[10]): a marks consecutive zeros at the end of x
        JuMP.@constraint(model, a[1] == 1 - x[10])
        # a[i+1] == a[i] and not(x[10-i]), as a linear (MIP) formulation
        JuMP.@constraint(model, [i=1:9], 0 <= 1 - x[10-i] + a[i] - 2 * a[i+1] <= 1)

        # Maximize the number of trailing zeros found
        JuMP.@objective(model, Max, sum(a))

        JuMP.optimize!(model)

        @test JuMP.termination_status(model) == JuMP.MOI.OPTIMAL
        @test JuMP.value.(x) == [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
        @test JuMP.value.(a) == [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

    end

What is missing from the above is how to deal with selections. For instance, if sum(select) + sum(insert) + sum(conseczeros) == length(insert), one is constrained to either insert or select but not both. Would it be enough to change the equality to an inequality, or do I need some kind of and-relation between select and conseczeros?

Not top prio right now as things work ok with current formulation. Will revisit when time permits.

Segfault during testing

PkgEval is running into the following when testing this package:

[ Info: Testing computation
[ Info: Testing gradients
[ Info: Testing mutation
[ Info: Testing size mutation

signal (11): Segmentation fault
in expression starting at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/test/mutation/size.jl:30
_ZN11ClpPresolve20gutsOfPresolvedModelEP10ClpSimplexdbibbPKcS3_ at /home/pkgeval/.julia/artifacts/e4a36d92f6628275dd9546eabfde4e94b1ffb986/lib/libClp.so (unknown line)
_ZN21OsiClpSolverInterface7resolveEv at /home/pkgeval/.julia/artifacts/e4a36d92f6628275dd9546eabfde4e94b1ffb986/lib/libOsiClp.so (unknown line)
_Z8CbcMain1iPPKcR8CbcModelPFiPS2_iER19CbcSolverUsefulData at /home/pkgeval/.julia/artifacts/1263af5e59820ee3b62d2f59e030cdcc86380f82/lib/libCbcSolver.so (unknown line)
Cbc_solve at /home/pkgeval/.julia/artifacts/1263af5e59820ee3b62d2f59e030cdcc86380f82/lib/libCbcSolver.so (unknown line)
Cbc_solve at /home/pkgeval/.julia/packages/Cbc/dIPfi/src/gen/libcbc_api.jl:306 [inlined]
optimize! at /home/pkgeval/.julia/packages/Cbc/dIPfi/src/MOI_wrapper/MOI_wrapper.jl:521
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/MathOptInterface.jl:85 [inlined]
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/Utilities/cachingoptimizer.jl:316
unknown function (ip: 0x7fc30e5fc362)
unknown function (ip: 0x7fc30e5e29f9)
unknown function (ip: 0x7fc30e5e2905)
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/Bridges/bridge_optimizer.jl:376 [inlined]
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/MathOptInterface.jl:85 [inlined]
optimize! at /home/pkgeval/.julia/packages/MathOptInterface/vwZYM/src/Utilities/cachingoptimizer.jl:316
unknown function (ip: 0x7fc30e5e28d2)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
#optimize!#107 at /home/pkgeval/.julia/packages/JuMP/pQApG/src/optimizer_interface.jl:440
optimize! at /home/pkgeval/.julia/packages/JuMP/pQApG/src/optimizer_interface.jl:410 [inlined]
newsizes at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/mutation/size.jl:184
Δsize! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/mutation/size.jl:57
unknown function (ip: 0x7fc30e75f442)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
Δsize! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/api/size.jl:62
Δsize! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/api/size.jl:76
unknown function (ip: 0x7fc30e73f24d)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
Δnin! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/api/size.jl:168
Δnin! at /home/pkgeval/.julia/packages/NaiveNASlib/6yDVa/src/api/size.jl:170

Can be reproduced by checking out PkgEval and doing:

julia --project PkgEval/bin/test_package.jl --julia=stable --name=NaiveNASlib

I'm not sure whether this is a bug with this package or with MathOptInterface.jl, but since it's being flagged as a crash in PkgEval reports it would be good to fix 🙂

Different changes to inputs of SizeStack cause exact OutSelect to be infeasible

In particular, this happens when (at least) one input increases in size (outputs shall be added) while others decrease, so that the total output size of the SizeStack decreases.

The reason is that the size constraints only target the output "as a whole", and they say that "no new outputs shall be added". This is indeed infeasible, since (at least) one of the inputs has to add new outputs (given the nature of SizeStack, where the outputs are the concatenation of its inputs).

This might be related to #40, but I think it is neither sufficient nor necessary. The simplest fix I can think of is to reformulate the size constraint for SizeStack to look at each input individually instead of at the total output.

For example, if one input has increased in size from N1 to N1+d1 while another has decreased from N2 to N2-d2, then for an exact solution one shall select N1 + N2 - d2 existing outputs and add d1 new outputs; see the worked example below. The input-to-output mapping constraints should take care of the rest.
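
As a plain arithmetic check of the example (illustrative, not package code):

    # Input 1 grows from N1 to N1 + d1, input 2 shrinks from N2 to N2 - d2
    N1, d1 = 5, 2
    N2, d2 = 7, 3
    nselected = N1 + (N2 - d2)  # existing outputs to select (all of input 1, N2 - d2 of input 2)
    ninserted = d1              # new outputs to add (only input 1 needs them)
    @assert nselected + ninserted == (N1 + d1) + (N2 - d2)  # == new SizeStack output size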

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

Revert of remove_edge! after PostSelectOutputs does not revert sizes correctly

This happens when output selection fails with a SizeStack.

Here is roughly the sequence of events:

  1. PostSelectOutputs applies NoutRevert, reverting all vertices to their original size.
  2. After this, FailAlignSizeRevert is applied, meaning that vout is added back as an output to vin with the strategy NoSizeChange.
  3. This adds nout(vin) to the output size of vout without any further size changes, causing it to be too big.

MWE to run in structure.jl:

            @testset "PostSelectOutputs fail select" begin
                v0 = inpt(3, "v0")
                v1 = av(v0, 3, name="v1")
                v2 = av(v0, 4, name="v2")
                v3 = av(v0, 5, name="v3")
                v4 = sv(v1,v2,v3, name = "v4")
                v5 = av(v4, 3, name="v5")

                @test_logs (:warn, r"Could not align size") remove_edge!(v2, v4, strategy=PostSelectOutputs(
                align = PostAlignJuMP(), select = NoutRevert(), fallback=FailAlignSizeWarn()))

                @test inputs(v4) == [v1, v2, v3]
                @test nin(v4) == nout.([v1, v2, v3]) == [3,4,5]
                @test nin_org(v4) == nout_org.([v1, v2, v3]) == [3,4,5]
                @test [nout(v4)] == nin(v5) == [3+4+5]
                @test [nout_org(v4)] == nin_org(v5) == [3+4+5]
            end

Add functionality to add layers

This is much easier than removing layers or changing sizes if one imposes the constraint that new layers are initialized with nin == nout.

Not sure if more is needed. I guess one can always mutate the size afterwards; a rough sketch of the idea is below.
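
A minimal sketch of why the constraint helps (hypothetical helper, not the package API): an identity-initialized layer with nin == nout is initially a no-op, so no other vertex needs to change size when it is inserted.

    using LinearAlgebra

    # Hypothetical: create a size-preserving, identity-initialized dense layer
    identitylayer(sz) = let W = Matrix{Float32}(I, sz, sz)
        x -> W * x  # initially a no-op; training can change W afterwards
    end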

QoL improvements when declaring strategies

A pattern of chaining strategies has emerged in this form:

    PrimaryStrategy(params..., FallBackStrategy1(params..., FallBackStrategy2(params..., ...)))

While I do like the flexibility this offers functionally, it tends to look like dog barf in the code which declares it.

A real-life horror example:

    RemoveStrategy(CheckAligned(CheckNoSizeCycle(ApplyMutation(SelectOutputs(select = SelectDirection(OutSelect{NaiveNASlib.Exact}(NaiveNASlib.LogSelectionFallback("Reverting...", NoutRevert()))), valuefun = default_neuronselect, align=IncreaseSmaller(DecreaseBigger(AlignSizeBoth(FailAlignSizeWarn()))))), FailAlignSizeWarn(msgfun = (vin,vout) -> "Can not remove vertex $(name(vin))! Size cycle detected!"))))

One small thing which could give a little payoff is to just allow setting the last fallback strategy (usually either throw an error or do nothing); see the sketch below.
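
A minimal sketch of that idea (hypothetical helper, assuming each strategy constructor takes its fallback as its only positional argument):

    # Build a strategy chain from constructors, supplying only the innermost fallback
    chainstrategies(ctors...; fallback) = foldr((ctor, inner) -> ctor(inner), ctors; init=fallback)

    # Roughly equivalent to CheckAligned(CheckNoSizeCycle(FailAlignSizeWarn())):
    # strategy = chainstrategies(CheckAligned, CheckNoSizeCycle; fallback=FailAlignSizeWarn())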

Select outputs after creating an edge to size stack at position other than last fails

The root cause is that the variable array for selecting existing outputs can only select from previously existing outputs.

The logic to deal with this in inoutconstraint!(s, ::SizeStack, v, model, vardict::Dict) assumes that any misaligned inputs are last in inputs(v).

It might be possible to avoid aligning select vars but still align insert vars, as they can be assumed to have the same size. The MWE below fails to align sizes and reverts the edge addition when this change is made, though.

MWE testset which runs in select.jl:

    @testset "Create edge to SizeStack pos 1" begin
        inpt = iv(3)
        v1 = av(inpt, 2, "v1")
        v2 = av(inpt, 3, "v2")
        v3 = cc(v1, name="v3")
        v4 = av(v3, 3, "v4")

        g = CompGraph(inpt, v3)
        @test size(g(ones(Float32, 1,3))) == (1, nout(v3))

        create_edge!(v2, v3, pos=1)
        Δoutputs(v3, v -> 1:nout_org(v))

        @test in_inds(op(v4)) == [-1,-1,-1,1,2]

        @test size(g(ones(Float32, 1,3))) == (1, nout(v3))
    end

Size cycles when adding edge

There is currently no check for size cycles (#34) when adding a new edge. The current handling only works for removal because:

  1. It does not actually check anything unless vin == vout and vin/vout is to be removed
  2. It checks before the edge has been made, so if adding the edge creates the cycle, it would not be found even without the limitation in 1

Handle "size loops" when changing structure

Removing a vertex which happens to be the only SizeAbsorb vertex in a fork-path which is eventually joined by a SizeInvariant vertex makes the graph invalid if there is a SizeStack with more than one input after the removed vertex.

The reason is that this results in the impossible situation where the output size of the SizeStack must be equal to its own output size (because the SizeInvariant vertex forces this along the path where the SizeAbsorb vertex was removed) plus a non-zero term from the other inputs, i.e. nout == nout + k with k > 0.

Possible strategies to deal with this are:
1. Detect the size-cycle and don't remove such vertices (probably easy); done in #35.
2. Remove the whole path (might be hard).
3. Remove the vertex and connect the loose ends to some "nearby" vertices (another path maybe) for which the situation does not occur (head hurts).

Change everything to MIP

As a MIP solver is used for selection anyway, one might as well rewrite all types of size-changing ops as a MIP program to:

  1. Reduce code base size
  2. Improve program correctness (the current traversal algorithm seems to fail in some cases which are yet to be figured out)

Almost done WIP in #37

Add option to change (decorating) MutationTrait

The fancy debugging stuff would be a lot more useful if it were possible to plug it into an existing model.

MutationVertex is already prepared for it, so it should be a matter of:

  1. Exposing it from the top-level API (copy)
  2. Handling it (by ignoring it) for all vertices which don't have a MutationTrait

Support for flow control

There is currently very limited (at best) support for controlling the program flow (e.g. compute this x times, compute this if y else that, neural ODEs, etc.).

A possible approach is to implement some kind of "graph in vertex" concept, where the computation of a vertex may consist of one or more graphs (which may in turn be mutated just as any other graph).

Some of the things needed, off the top of my head:

  1. A not-too-cumbersome way to combine graph(s) and control flow so that the graphs are accessible for mutation.
    - Maybe a struct with the graphs and the computation (which uses those graphs in an arbitrary way) as separate fields is enough; see the sketch after this list.
  2. The ability to tie the sizes of arbitrary vertices together.
    - For example, looping the output back as input obviously requires that the output size is the same as the input size.
    - A "use graph1 if x, else graph2" type of vertex also requires that both graph1 and graph2 are aligned with the input/output size of the vertex they are inside.
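
A rough sketch of 1 (hypothetical type; only CompGraph is an existing NaiveNASlib type):

    # Graphs stay accessible for mutation; `computation` decides the control flow
    struct GraphInVertex{F}
        graphs::Vector{CompGraph}
        computation::F  # uses the graphs in an arbitrary way, e.g. looping or branching
    end
    (v::GraphInVertex)(x...) = v.computation(v.graphs, x...)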

Incorrect outputs selected by PostSelectOutputs and remove_edge! of SizeStack

Reason is basically that the edge is removed before outputs are selected.

MWE to run in structure.jl:

            @testset "PostSelectOutputs SizeStack" begin
                v0 = inpt(3, "v0")
                v1 = av(v0, 3, name="v1")
                v2 = av(v0, 4, name="v2")
                v3 = av(v0, 5, name="v3")
                v4 = sv(v1,v2,v3, name = "v4")
                v5 = av(v4, 3, name="v5")

                remove_edge!(v2, v4, strategy=PostSelectOutputs(valuefun = v -> 1:nout_org(v)))

                @test inputs(v4) == [v1, v3]
                @test nin(v4) == nout.([v1, v3]) == [3,5]
                @test out_inds(op(v1)) == 1:3
                @test out_inds(op(v2)) == 1:4
                @test out_inds(op(v3)) == 1:5
                # This would be better if it was [1:3;8:12] but remove_vertex removes the edge before PostSelectOutputs has a chance to see it :(
                @test out_inds(op(v4)) == 1:8
                @test in_inds(op(v5)) == [1:8]
            end

Safer handling of non-constrained vertices when doing output selection

Not 100% sure why this happens but...

When doing output selection on the smaller delta-size graph, the result is sometimes inconsistent with respect to size.

The prime suspect is the case where a vertex is part of the graph only because its inputs are touched, and it has value <= 0 for some of its outputs. This would then cause the optimizer to not select those outputs, as nothing constrains it to do so. As it is only the input which is relevant, its outputs are not part of the set of vertices which will see changes.

The short-term solution is to ensure the value metric is always positive; a sketch is below.

The preferable solution would be to not make its outputs part of the MIP model while still keeping the vertex as something which might need its inputs updated.
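
A sketch of the short-term workaround (hypothetical wrapper, not part of the package):

    # Clamp the value metric so no output is ever "free" for the optimizer to drop
    positivevalue(valuefun; floor=1f-3) = v -> max.(valuefun(v), floor)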

Improve syntax

The library is quite verbose (and probably difficult to use) due to all the layers of wrapped structs.

I guess adding convenience methods for the most common operations is low-hanging fruit.

Remove/refactor MutationOps

The design of the MutationOps (the contents of op.jl) is a result of now-obsolete design choices. In the current design (post #37) they are confusing and add bloat.

Not sure exactly how to clean them up, as some kind of "future size" metadata is still needed for output selection, mostly due to #40, which I think prevents implementing something like "select or insert outputs based on this new desired size".

Currently the library also tries to cater for the case where one does not want to prune an existing model but instead just change the sizes of some architecture "template" or "specification" which generates new models from scratch.

The current design kind of silently handles both without making any assumptions about which one the user wants to do. However, applying a size-only mutation to an actual network might cause severe performance degradation, as outputs are then misaligned with inputs.

If some "select or insert outputs based on this new desired size" can be created, I hope this would allow the same API call (i.e. deltaN{in,out}) to perform size change or output selection based on what the vertex represents (e.g. an actual layer with existing weights, or an architecture spec).

AlignNinToNout leaves undefined references

The code in vertexconstraints!(v::AbstractVertex, s::AlignNinToNout, data) kinda secretly assumes that every output vertex of a vertex in the set of vertices to solve the problem for is also part of that set, and that is not always the case.

If this assumption is not true for a vertex, it will have undefined references in its array of new nins.

MWE to run in the edge testset in structure.jl below. Vertex vh is not in the set, as it is not affected by any size change, and will therefore have an undef nin-variable.

            @testset "Add with hidden SizeStack" begin
                v0 = inpt(3, "v0")
                v1 = av(v0, 5, name="v1")
                v2 = av(v0, 4, name="v2")
                vh = av(v0, 5, name="vh")
                v3 = sv(v1, name = "v3")
                v4 = av(v3, 3, name="v4")
                v5 = sv(v4, vh, name="v5")
                v6 = av(v2, 2, name="v6")


                @test inputs(v3) == [v1]
                create_edge!(v2, v3)

                @test inputs(v3) == [v1, v2]
                @test nin(v4) == [nout(v3)] == [nout(v1) + nout(v2)] == [9]

                @test outputs(v2) == [v6, v3]
                @test inputs(v6) == [v2]
                @test nin(v6) == [nout(v2)] == [4]

                @test outputs(vh) == [v5]
                @test inputs(v5) == [v4, vh]
                @test nin(v5) == [nout(v4), nout(vh)] == [3, 5]
            end

Add possibility to wrap conc and elem ops

The sugar for concatenating or doing element-wise operations on vertex outputs imposes a limitation in the sense that one cannot do anything else in the same vertex. One example of what one might want to do is to log the output (although this may be better to do in a vertex so it is possible to also log the name) or calculate some neuron value metric; a sketch of a wrapper is below.
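
A sketch of what such a wrapper could look like (hypothetical; `f` stands for the existing concat or element-wise computation):

    # Decorate the wrapped computation without affecting its size behaviour
    logged(f; name="conc") = function (x...)
        y = f(x...)
        @info "$name output size: $(size(y))"
        return y
    end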

Replace Cbc with HiGHS

HiGHS seems to be better maintained and is often recommended when people run into issues with Cbc (e.g. #99 😄).

This seems to be the major stopper for now. Even increasing the sizes so the test takes several minutes does not solve it. Worst case, I can just skip testing or maybe use Cbc only for that test, but I'd rather not. The swap itself should be small; see the sketch below.
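
On the JuMP side only the model setup changes; the rest of the formulation is solver-agnostic (sketch):

    import JuMP, HiGHS

    model = JuMP.Model(HiGHS.Optimizer)
    JuMP.set_silent(model)  # replaces Cbc's logLevel=0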

Revert of remove_edge! always adds back edge at last input position

This obviously does not revert the operation...

MWE to run in the structure tests:

            @testset "Revert remove edge SizeStack" begin
                v0 = inpt(3, "v0")
                v1 = av(v0, 5, name="v1")
                v2 = av(v0, 4, name="v2")
                v3 = sv(v0, v1, v2, name = "v3")
                v4 = av(v3, 3, name="v4")
                v5 = av(v2, 2, name="v5")

                @test inputs(v3) == [v0, v1, v2]
                @test nin(v3) == nout.([v0, v1, v2]) == [3,5,4]
                @test [nout(v3)] == nin(v4) == [3+5+4]
                @test nin_org(v3) == nout_org.([v0, v1, v2]) == [3,5,4]
                @test [nout_org(v3)] == nin_org(v4) == [3+5+4]

                struct RevertPost <: AbstractAlignSizeStrategy end
                NaiveNASlib.postalignsizes(::RevertPost, vin, vout) = NaiveNASlib.postalignsizes(FailAlignSizeRevert(), vin, vout)

                remove_edge!(v1, v3, strategy=RevertPost())

                @test inputs(v3) == [v0, v1, v2]
                @test nin(v3) == nout.([v0, v1, v2]) == [3,5,4]
                @test [nout(v3)] == nin(v4) == [3+5+4]
                @test nin_org(v3) == nout_org.([v0, v1, v2]) == [3,5,4]
                @test [nout_org(v3)] == nin_org(v4) == [3+5+4]
            end

Create proper docs

The README is not easy to keep in sync with library updates, as it is always the latest version which is shown. A sketch of a standard setup is below.
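
A standard Documenter.jl setup would solve this by building versioned docs per release. A sketch, assuming docs/make.jl and the repo URL:

    # docs/make.jl (sketch)
    using Documenter, NaiveNASlib

    makedocs(sitename = "NaiveNASlib.jl", modules = [NaiveNASlib])
    deploydocs(repo = "github.com/DrChainsaw/NaiveNASlib.jl.git")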
