ctuavastlab / mill.jl

Prototype flexible hierarchical multi-instance learning models.

Home Page: https://ctuavastlab.github.io/Mill.jl/stable/

License: MIT License

Julia 100.00%

flux hierarchical-data json julia machine-learning multi-instance-learning

mill.jl's Issues

About converting CPU training to GPU

I am trying to speed up the training process by using GPU:

# implements the multiple-instance learning model using Neural Networks, as described in
# https://arxiv.org/abs/1609.07257
# Using Neural Network Formalism to Solve Multiple-Instance Problems, Tomas Pevny, Petr Somol
using FileIO, JLD2, Statistics, Mill, Flux
using Flux: throttle, @epochs
using Mill: reflectinmodel
using Base.Iterators: repeated
using CUDAapi, CUDAdrv, CUDAnative

gpu_id = 0

if has_cuda_gpu() && gpu_id >=0
    device!(gpu_id)
    device = Flux.gpu
    @info "Training on GPU-$(gpu_id)"
else
    device = Flux.cpu
    @info "Training on CPU"
end

# load the musk dataset
fMat = load("example/musk.jld2", "fMat")            # matrix with instances, each column is one sample
bagids = load("example/musk.jld2", "bagids")        # ties instances to bags
x = BagNode(ArrayNode(fMat), bagids)        # create BagDataset
y = load("example/musk.jld2", "y")                  # load labels
y = map(i -> maximum(y[i]) + 1, x.bags)     # create labels on bags
y_oh = Flux.onehotbatch(y, 1:2)             # one-hot encoding

# create the model
model = BagModel(
    ArrayModel(Dense(166, 10, Flux.tanh)),                      # model on the level of Flows
    SegmentedMeanMax(10),                                       # aggregation
    ArrayModel(Chain(Dense(20, 10, Flux.tanh), Dense(10, 2)))) |> device  # model on the level of bags

# define loss function
loss(x, y_oh) = Flux.logitcrossentropy(model(x |> device).data, y_oh |> device)

# the usual way of training
evalcb = throttle(() -> @show(loss(x |> device, y_oh |> device)), 1)
opt = Flux.ADAM()
@epochs 10 Flux.train!(loss, params(model), repeated((x, y_oh), 1000), opt, cb=evalcb)

# calculate the error on the training set (no testing set right now)
mean(mapslices(argmax, model(x |> device).data, dims=1)' .!= y)

But an error is raised:

ArgumentError: cannot take the CPU address of a CuArrays.CuArray{Float32,2,Nothing}

Stacktrace:
 [1] unsafe_convert(::Type{Ptr{Float32}}, ::CuArrays.CuArray{Float32,2,Nothing}) at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/CuArrays/YFdj7/src/array.jl:226
 [2] gemm!(::Char, ::Char, ::Float32, ::CuArrays.CuArray{Float32,2,Nothing}, ::Array{Float32,2}, ::Float32, ::Array{Float32,2}) at /home/buildbot/build-worker/worker/juliapro-release-centos7-0_6/build/tmp_julia/share/julia/stdlib/v1.4/LinearAlgebra/src/blas.jl:1167
 [3] gemm_wrapper!(::Array{Float32,2}, ::Char, ::Char, ::CuArrays.CuArray{Float32,2,Nothing}, ::Array{Float32,2}, ::LinearAlgebra.MulAddMul{true,true,Bool,Bool}) at /home/buildbot/build-worker/worker/juliapro-release-centos7-0_6/build/tmp_julia/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:597
 [4] mul! at /home/buildbot/build-worker/worker/juliapro-release-centos7-0_6/build/tmp_julia/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:169 [inlined]
 [5] mul! at /home/buildbot/build-worker/worker/juliapro-release-centos7-0_6/build/tmp_julia/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:208 [inlined]
 [6] * at /home/buildbot/build-worker/worker/juliapro-release-centos7-0_6/build/tmp_julia/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:160 [inlined]
 [7] adjoint at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Zygote/1GXzF/src/lib/array.jl:310 [inlined]
 [8] _pullback at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
 [9] Dense at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Flux/Fj3bt/src/layers/basic.jl:122 [inlined]
 [10] Dense at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Flux/Fj3bt/src/layers/basic.jl:133 [inlined]
 [11] applychain at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Flux/Fj3bt/src/layers/basic.jl:36 [inlined]
 [12] Chain at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Flux/Fj3bt/src/layers/basic.jl:38 [inlined]
 [13] #159 at /data/zhangzhi/nn-rebuttal/Mill.jl/src/modelnodes/arraymodel.jl:14 [inlined]
 [14] mapdata at /data/zhangzhi/nn-rebuttal/Mill.jl/src/datanodes/datanode.jl:57 [inlined]
 [15] mapdata at /data/zhangzhi/nn-rebuttal/Mill.jl/src/datanodes/arraynode.jl:18 [inlined]
 [16] ArrayModel at /data/zhangzhi/nn-rebuttal/Mill.jl/src/modelnodes/arraymodel.jl:14 [inlined]
 [17] _pullback(::Zygote.Context, ::ArrayModel{…}, ::ArrayNode{…}) at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [18] BagModel at /data/zhangzhi/nn-rebuttal/Mill.jl/src/modelnodes/bagmodel.jl:28 [inlined]
 [19] _pullback(::Zygote.Context, ::BagModel{…}, ::BagNode{…}) at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [20] loss at ./In[12]:36 [inlined]
 [21] _pullback(::Zygote.Context, ::typeof(loss), ::BagNode{…}, ::Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}) at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [22] adjoint at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Zygote/1GXzF/src/lib/lib.jl:179 [inlined]
 [23] _pullback at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
 [24] #17 at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Flux/Fj3bt/src/optimise/train.jl:89 [inlined]
 [25] _pullback(::Zygote.Context, ::Flux.Optimise.var"#17#25"{typeof(loss),Tuple{BagNode{…},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}}}) at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [26] pullback(::Function, ::Zygote.Params) at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Zygote/1GXzF/src/compiler/interface.jl:172
 [27] gradient(::Function, ::Zygote.Params) at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Zygote/1GXzF/src/compiler/interface.jl:53
 [28] macro expansion at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Flux/Fj3bt/src/optimise/train.jl:88 [inlined]
 [29] macro expansion at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Juno/f8hj2/src/progress.jl:134 [inlined]
 [30] train!(::typeof(loss), ::Zygote.Params, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{BagNode{…},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}}}}, ::ADAM; cb::Flux.var"#throttled#20"{Flux.var"#throttled#16#21"{Bool,Bool,var"#24#25",Int64}}) at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Flux/Fj3bt/src/optimise/train.jl:81
 [31] top-level scope at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Flux/Fj3bt/src/optimise/train.jl:122
 [32] top-level scope at /home/zhangzhi/.juliapro/JuliaPro_v1.4.1-1/packages/Juno/f8hj2/src/progress.jl:134
 [33] top-level scope at In[12]:41

Looking at the code, I found that the way datanodes, arraymodel and arraynode encapsulate their data interferes with |> device. I had to hack the Mill.jl source code to move the data to the GPU manually. Given your architecture, do you have any suggestions for running on CUDA more simply?

Thanks!

Stop using LearnBase = 0.5 until it's supported by MLDataPattern

Currently, we have LearnBase = "0.4, 0.5" in compat. However, only LearnBase=0.4.1 is used in tests
https://github.com/CTUAvastLab/Mill.jl/runs/3668153165 and LearnBase=0.5 cannot be used, because MLDataPattern.jl supports LearnBase=0.4 but not LearnBase=0.5. To prevent problems once MLDataPattern.jl actually starts supporting it, we should remove 0.5 from compat until then. A similar mistake has already caused our team several headaches, when we added it to compat regardless of JuliaML/MLDataPattern.jl#45.
The code has never run on LearnBase=0.5 and thus 0.5 should not be in compat.
cc @simonmandlik

Error in reflectinmodel

julia> ds = BagNode(ArrayNode(rand(1, 1)), [1])
BagNode with 1 bag(s)
  └── ArrayNode(1, 1)

julia> reflectinmodel(ds, d -> Dense(d, 1, relu), d -> SegmentedMax(d))
ERROR: MethodError: no method matching typemin(::Type{Tracker.TrackedReal{Float32}})
Closest candidates are:
  typemin(::Type{Bool}) at bool.jl:6
  typemin(::Type{Int8}) at int.jl:665
  typemin(::Type{UInt8}) at int.jl:667
  ...
Stacktrace:
 [1] segmented_max_forw(::TrackedArray{…,Array{Float32,2}}, ::Array{Float32,1}, ::AlignedBags) at /home/cisco/.julia/packages/Mill/PcHi7/src/aggregations/segmented_max.jl:20
 [2] (::SegmentedMax{Array{Float32,1}})(::TrackedArray{…,Array{Float32,2}}, ::AlignedBags, ::Nothing) at /home/cisco/.julia/packages/Mill/PcHi7/src/aggregations/segmented_max.jl:13 (repeats 2 times)
 [3] (::getfield(Mill, Symbol("##95#96")){SegmentedMax{Array{Float32,1}},Tuple{AlignedBags}})(::TrackedArray{…,Array{Float32,2}}) at /home/cisco/.julia/packages/Mill/PcHi7/src/aggregations/segmented_max.jl:12
 [4] mapdata(::getfield(Mill, Symbol("##95#96")){SegmentedMax{Array{Float32,1}},Tuple{AlignedBags}}, ::ArrayNode{TrackedArray{…,Array{Float32,2}},Nothing}) at /home/cisco/.julia/packages/Mill/PcHi7/src/datanodes/arrays.jl:16
 [5] (::SegmentedMax{Array{Float32,1}})(::ArrayNode{TrackedArray{…,Array{Float32,2}},Nothing}, ::AlignedBags) at /home/cisco/.julia/packages/Mill/PcHi7/src/aggregations/segmented_max.jl:12
 [6] (::BagModel{ArrayModel{Dense{typeof(relu),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}}},SegmentedMax{Array{Float32,1}},ArrayModel{typeof(identity)}})(::BagNode{ArrayNode{Array{Float64,2},Nothing},AlignedBags,Nothing}) at /home/cisco/.julia/packages/Mill/PcHi7/src/modelnodes/bagmodel.jl:29
 [7] _reflectinmodel(::BagNode{ArrayNode{Array{Float64,2},Nothing},AlignedBags,Nothing}, ::Function, ::getfield(Main, Symbol("##28#30")), ::Dict{Any,Any}, ::Dict{Any,Any}, ::String) at /home/cisco/.julia/packages/Mill/PcHi7/src/modelnodes/modelnode.jl:18
 [8] #reflectinmodel#133(::Dict{Any,Any}, ::Dict{Any,Any}, ::typeof(reflectinmodel), ::BagNode{ArrayNode{Array{Float64,2},Nothing},AlignedBags,Nothing}, ::Function, ::Function) at /home/cisco/.julia/packages/Mill/PcHi7/src/modelnodes/modelnode.jl:12
 [9] reflectinmodel(::BagNode{ArrayNode{Array{Float64,2},Nothing},AlignedBags,Nothing}, ::Function, ::Function) at /home/cisco/.julia/packages/Mill/PcHi7/src/modelnodes/modelnode.jl

Mill version: v1.0.0

MaybeHotMatrix does not support `Flux.onecold`

We use Flux.onecold as the inverse of one-hot encoding.
This works for OneHotMatrix, but not for MaybeHotMatrix. See

julia> t = Flux.onehotbatch(1:3, 1:10)
10×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅

julia> t2 = maybehotbatch(1:3, 1:10)
10×3 MaybeHotMatrix{UInt32, Int64, Bool}:
 1  0  0
 0  1  0
 0  0  1
 0  0  0
 0  0  0
 0  0  0
 0  0  0
 0  0  0
 0  0  0
 0  0  0

julia> Flux.onecold(t)
3-element Vector{Int64}:
 1
 2
 3

julia> Flux.onecold(t2)
ERROR: LoadError: MethodError: no method matching _getindex(::MaybeHotMatrix{UInt32, Int64, Bool}, ::Int64, ::CartesianIndex{1})
Closest candidates are:
  _getindex(::MaybeHotMatrix, ::Union{Integer, AbstractVector{T} where T}, ::Integer) at C:\Users\racinsky\.julia\packages\Mill\f48u2\src\special_arrays\maybe_hot_matrix.jl:32
  _getindex(::MaybeHotMatrix, ::Integer, ::Colon) at C:\Users\racinsky\.julia\packages\Mill\f48u2\src\special_arrays\maybe_hot_matrix.jl:33
  _getindex(::MaybeHotMatrix, ::CartesianIndex{2}) at C:\Users\racinsky\.julia\packages\Mill\f48u2\src\special_arrays\maybe_hot_matrix.jl:34
  ...
Stacktrace:
 [1] getindex(::MaybeHotMatrix{UInt32, Int64, Bool}, ::Int64, ::CartesianIndex{1})
   @ Mill C:\Users\racinsky\.julia\packages\Mill\f48u2\src\special_arrays\maybe_hot_matrix.jl:31
 [2] findminmax!(f::typeof(Base.isgreater), Rval::Matrix{Bool}, Rind::Matrix{CartesianIndex{2}}, A::MaybeHotMatrix{UInt32, Int64, Bool})
   @ Base .\reducedim.jl:928
 [3] _findmax(A::MaybeHotMatrix{UInt32, Int64, Bool}, region::Int64)
   @ Base .\reducedim.jl:1048
 [4] #findmax#726
   @ .\reducedim.jl:1038 [inlined]
 [5] #argmax#728
   @ .\reducedim.jl:1103 [inlined]
 [6] _fast_argmax
   @ C:\Users\racinsky\.julia\packages\Flux\ZnXxS\src\onehot.jl:211 [inlined]
 [7] onecold(y::MaybeHotMatrix{UInt32, Int64, Bool}, labels::UnitRange{Int64}) (repeats 2 times)
   @ Flux C:\Users\racinsky\.julia\packages\Flux\ZnXxS\src\onehot.jl:205
 [8] top-level scope
   @ c:\Projects\others\JsonGrinder.jl\examples\recipes.jl:70
in expression starting at c:\Projects\others\JsonGrinder.jl\examples\recipes.jl:70

Wrapping the matrix in Flux.onehotbatch works:

julia> Flux.onecold(Flux.onehotbatch(t2))
3-element Vector{Int64}:
 1
 2
 3

but that feels cumbersome.
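
Until Flux.onecold gains a method for MaybeHotMatrix, a column-wise fallback can be written against any AbstractMatrix. The sketch below uses only Base Julia and a plain Matrix{Union{Missing,Bool}} as a stand-in for MaybeHotMatrix; `onecold_or_missing` is a hypothetical name and not part of Mill or Flux, and it assumes every fully observed column contains exactly one `true`.

```julia
# Hypothetical helper: decode each column of a (maybe) one-hot matrix.
# A column that contains `missing` decodes to `missing`.
function onecold_or_missing(A::AbstractMatrix, labels=1:size(A, 1))
    map(eachcol(A)) do col
        any(ismissing, col) ? missing : labels[findfirst(isequal(true), col)]
    end
end

# Plain-matrix stand-in for a MaybeHotMatrix whose third observation is missing.
A = Union{Missing,Bool}[true  false missing;
                        false true  missing;
                        false false missing]
decoded = onecold_or_missing(A)
```

Whether a missing column should decode to `missing` or raise an error is a design choice that this sketch does not settle.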

Possible speedups

Noting down some areas where significant speedups may be achieved:

vcat in ProductNodes leads to a lot of copying
data deduplication in leaves may lead to lower memory requirements and also to saving some compute (computing ngrams only once for identical strings in NGramMatrix multiplication)
deduplicating instances in BagNodes in a similar fashion
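
The deduplication idea can be sketched in Base Julia: compute the expensive function once per distinct input and scatter the results back to the original positions. `map_deduplicated` and `count_chars` are hypothetical names used only for illustration; `count_chars` stands in for something costly such as computing n-grams.

```julia
# Apply `f` once per distinct element of `xs` and return results in the original order.
function map_deduplicated(f, xs::AbstractVector)
    position = Dict{eltype(xs),Int}()      # distinct value => index into `uniq`
    uniq = eltype(xs)[]
    idx = Vector{Int}(undef, length(xs))
    for (i, x) in enumerate(xs)
        idx[i] = get!(position, x) do
            push!(uniq, x)
            length(uniq)
        end
    end
    results = map(f, uniq)                 # `f` runs once per distinct input
    return results[idx]
end

calls = Ref(0)
count_chars(s) = (calls[] += 1; length(s))  # stand-in for an expensive computation

strings = ["foo", "quux", "foo", "foo", "quux"]
out = map_deduplicated(count_chars, strings)
```

Here `count_chars` is evaluated twice instead of five times; the price is the dictionary lookup and the index vector, so the trade-off depends on how many duplicates the data contain.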

Registering Nodes with Flux

If we want to calculate gradients with respect to input, we should start by adding

Flux.@functor ArrayNode
Flux.@functor BagNode
Flux.@functor TreeNode

This allows Flux.params to return data nodes, so we can calculate gradients with respect to them.

Not being able to catobs missing bag with weighted bag

The following code

using Mill, JsonGrinder
e1 = ExtractCategorical(["Olda", "Tonda", "Milda"])
node11 = e1("Olda")
n1 = BagNode(missing, AlignedBags([0:-1]))
n2 = WeightedBagNode(node11, [1:nobs(node11)], ones(4))
reduce(catobs, [n1, n2])

produces

ERROR: UndefVarError: B not defined
Stacktrace:
 [1] reduce(::typeof(catobs), ::Array{AbstractBagNode,1}) at C:\Users\racinsky\.julia\packages\Mill\EkuQf\src\datanodes\datanode.jl:38
 [2] top-level scope at none:0

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Use Preferences.jl instead of Ref switches

See https://github.com/JuliaPackaging/Preferences.jl

Documentation of the design

We need to document the design of the library, particularly why we have MillModel, MillFunction, and Aggregation.

We should also describe how to add a custom data type and how to extend reflection in a model.

The question is whether we really need MillFunction & friends.

Add convenience functions

Allow a model to accept AbstractVector{<:AbstractMillNode}, so that the user does not have to call model(reduce(catobs, x)) and can call model(x) instead.
Add

using Zygote
(m::AbstractMillModel)(x::AbstractVector{<:AbstractMillNode}) = m(Zygote.@ignore(reduce(catobs, x)))

so that the reduction can be used inside gradient computation.
Currently it raises an error. We do not need to differentiate through the reduction, so we can stop the gradient there, which simplifies usage.

Editing of the model

We have the problem that models cannot be edited, since they are all static. The following replaceinmodel adds the ability to replace parts of a model and rebuild the parts upstream of them. The current limitation is that we cannot change dimensionality.

replaceinmodel(x, oldnode, newnode) = x
replaceinmodel(x::Mill.ArrayModel, oldnode, newnode) = x == oldnode ? newnode : Mill.ArrayModel(replaceinmodel(x.m, oldnode, newnode))
function replaceinmodel(x::Mill.BagModel, oldnode, newnode)
	if x == oldnode 
		return(newnode)
	else
		return(BagModel(replaceinmodel(x.im, oldnode, newnode),
			replaceinmodel(x.a, oldnode, newnode),
			replaceinmodel(x.bm, oldnode, newnode)))
	end
end

function replaceinmodel(x::Mill.ProductModel, oldnode, newnode)
	if x == oldnode 
		return(newnode)
	else
		return(ProductModel(tuple([replaceinmodel(m, oldnode, newnode) for m in x.ms]...),
			replaceinmodel(x.m, oldnode, newnode)))
	end
end

Unbearably slow reflection of model (again)

While writing the README, I have (again) run into extremely slow creation of a model.

using Mill, Flux
julia> ds = BagNode(
    TreeNode(
        (BagNode(ArrayNode(randn(4,10)),[1:2,3:4,5:5,6:7,8:10]),
        ArrayNode(randn(3,5)),
        BagNode(
            BagNode(ArrayNode(randn(2,30)),[i:i+1 for i in 1:2:30]),
            [1:3,4:6,7:9,10:12,13:15]),
        ArrayNode(randn(2,5)))),
    [1:1,2:3,4:5])

m, k = reflectinmodel(ds, d -> Dense(d, 3, relu), d -> SegmentedMeanMax())

It seems to be a problem with inference, since the second time this is invoked, it is very fast.
More precisely, the problem is in printing. If we do

m, k = reflectinmodel(ds, d -> Dense(d, 3, relu), d -> SegmentedMeanMax());

then it is fast, but just printing the model afterwards is slow.

Reducing bags with some bags missing is not type consistent, resulting in Vector{Any} and losing its type.

When I have a vector of bags such as [bag with treenodes, bag with treenodes, missing bag], the reduction breaks, causing

┌ Error: cannot reduce Any
└ @ Mill ...\Mill\aKR6u\src\datanodes\datanode.jl:36

Update Flux compat?

Any reason Flux.jl compat version should not be updated to latest 0.11.6?

Segmented_sum

We should add segmented_sum for completeness and for use cases where the number of instances in a sample matters.
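
A minimal sketch of what such an aggregation computes, in Base Julia only. It assumes bags are given as one range of column indices per bag (with an empty range such as 0:-1 for an empty bag); it is not Mill's implementation and has no trainable parameters or gradient definitions.

```julia
# Sum the columns (instances) of `x` that belong to each bag.
# `bags` holds one range of column indices per bag; an empty range yields zeros.
function segmented_sum(x::AbstractMatrix{T}, bags::AbstractVector{<:AbstractUnitRange}) where {T}
    out = zeros(T, size(x, 1), length(bags))
    for (j, bag) in enumerate(bags)
        for i in bag
            out[:, j] .+= @view x[:, i]
        end
    end
    return out
end

x = Float32[1  2  3  4;
            10 20 30 40]
bags = [1:2, 3:3, 0:-1, 4:4]   # the third bag is empty
s = segmented_sum(x, bags)
```

Unlike a mean or max, the sum grows with the number of instances: duplicating every instance in a bag doubles its output, which is exactly the information the issue wants to keep.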

Rename MillModel

to AbstractMillModel so that it is clear that it is an abstract type.

TODO: solve missing values systematically

catobs of arrays

An interesting question: what should the output of this be?

julia> reduce(catobs, [Matrix{Union{Missing, Float64}}(undef,1,0),[2.3 1.0]])
1×2 Array{Union{Missing, Float64},2}:
 2.3  1.0

Should it be a Matrix of Unions or a Matrix of Float64?
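
For comparison, this is how plain Base Julia behaves on the same inputs, without Mill. Concatenation promotes the element type to the union, and the type can be narrowed afterwards if no missing value is actually present. This only shows the two options; it does not decide which one catobs should pick.

```julia
a = Matrix{Union{Missing,Float64}}(undef, 1, 0)   # no observations, but missing is allowed
b = [2.3 1.0]                                      # Matrix{Float64}

wide = hcat(a, b)          # element type is promoted to the union
narrow = identity.(wide)   # broadcasting re-infers the element type from the values
```

Keeping the union makes the result type depend only on the input types, while narrowing makes it depend on the values, which is harder on type inference.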

Crashing CatObs

This is part of testing from JsonGrinder

j2 = JSON.parse("""{"c": { "a": {"a":[2,3],"b":[5,6]}}}""")
j3 = JSON.parse("""{"b": {"a":[1,2,3],"b": 1}}""")
j4 = JSON.parse("""{"b": {}}""")
j5 = JSON.parse("""{"b": {}}""")
j6 = JSON.parse("""{}""")

sch = JsonGrinder.schema([j2,j3])
extractor = suggestextractor(sch)
dss = map(extractor, [j2,j3,j4,j5,j6])
dss = map(s -> s[:c], dss)
dss = map(s -> s[:a], dss)
dss = map(s -> s[:a], dss)
dss = map(s -> s.data, dss)
ds = reduce(catobs, dss)

which crashes Julia. I think the problem is a bad promotion to Any in catobs when we are handling missings. I hope we have not opened a can of worms with that.

It somehow crashes on reduce(catobs, ...) applied to this

5-element Array{Array,1}:
 Float32[2.0; 3.0]
 [missing, missing]
 [missing, missing]
 [missing, missing]
 [missing, missing]

but when I create the above value manually, it does not crash (although it produces a wrong output, concatenating the vectors into a single vector).

Terseprint behaving funky again, breaking methods

When terseprint is false, methods are broken.

function experiment(ds::LazyNode{T}) where {T<:Symbol}
	@show ds
	@show T
end
julia> Mill.terseprint(true)
true
julia> methods(experiment)
# 1 method for generic function "experiment":
[1] experiment(ds::LazyNode{…}) where T<:Symbol in Main at C:\Projects\others\Mill.jl\test\lazynode.jl:30
julia> Mill.terseprint(false)
false
julia> methods(experiment)
# 1 method for generic function "experiment":
[1] Error showing value of type Base.MethodList:
ERROR: type DataType has no field var
Stacktrace:
 [1] getproperty at .\Base.jl:28 [inlined]
 [2] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Type{
SYSTEM (REPL): showing an error caused an error
ERROR: type DataType has no field var
Stacktrace:
 [1] getproperty at .\Base.jl:28 [inlined]
 [2] show(::IOContext{REPL.Terminals.TTYTerminal}, ::Type{
SYSTEM (REPL): caught exception of type ErrorException while trying to handle a nested exception; giving up

Consistently, with terse printing enabled it works well; without terse printing it breaks terribly.

Add a high-level description of a package and also specify tags

Such as "julia", "julia-package", "multi-instance learning", etc.

Add tests for unicode inputs for ngram matrix

Currently, Unicode input is not covered by the ngram matrix tests. It would be good to have this covered. Look e.g. at the tests in https://github.com/JuliaLogging/LoggingFormats.jl/blob/master/test/runtests.jl#L8-L12

TODO: write tests for HierarchicalUtils.jl integration

reduce for bags

We should create a special version of reduce for Bags to prevent creating a large number of specialized functions for different numbers of parameters.
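
The pattern can be sketched with a toy type: a dedicated reduce method takes the whole vector at once instead of splatting it into a varargs call, which may otherwise be compiled separately for each argument count. `ToyBags`, `catbags` and `_catbags` are hypothetical names and not Mill's API; the 0:-1 convention for empty bags follows the examples elsewhere in these issues.

```julia
# Toy stand-in for a bag container: one index range per bag.
struct ToyBags
    ranges::Vector{UnitRange{Int}}
end

# Shared implementation: shift the ranges of each container by the number
# of instances contributed by the containers before it.
function _catbags(bs)
    out = UnitRange{Int}[]
    offset = 0
    for b in bs
        n = 0
        for r in b.ranges
            push!(out, isempty(r) ? (0:-1) : r .+ offset)
            n += length(r)
        end
        offset += n
    end
    return ToyBags(out)
end

# Varargs entry point, convenient for a handful of arguments.
catbags(bs::ToyBags...) = _catbags(bs)

# Dedicated method for vectors of any length: no splatting.
Base.reduce(::typeof(catbags), bs::AbstractVector{ToyBags}) = _catbags(bs)

a = ToyBags([1:2, 3:3])
b = ToyBags([0:-1, 1:4])
r = reduce(catbags, [a, b])
```

Base uses the same approach for reduce(vcat, xs) and reduce(hcat, xs).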

Accessing nodes from model from show_traversal is not working in some cases.

For the following model, accessing nodes is sometimes broken.
The following code, using the file stored here,

using Mill, JLD2, FileIO
@load "broken_model.jld2" model
show_traversal(model)
model["zE"]

causes

ERROR: BoundsError: attempt to access String at index [1:2]
Stacktrace:
 [1] checkbounds at .\strings\basic.jl:193 [inlined]
 [2] getindex at .\strings\string.jl:247 [inlined]

Handling missing in input data

At the moment, Mill can handle missing values only in bags, but not in ArrayNodes, i.e. in terminal values.
The question is whether we want to (and should) add support for missing values in Strings, Categorical Arrays, and Dense Arrays. Pevnak suggests:

Missing values in dense matrices will be stored in x = Matrix{Union{Missing, T}} where {T<:Number}. Before the multiplication, we substitute each missing with a trainable value, which converts x to Matrix{T} so that it can be handled by the usual multiplication. I propose to handle substitution and multiplication in ImputingMatrix, which will encapsulate a regular matrix and the vector of values substituted for missing. The substituted values will be trainable, so the network is free to insert whatever it wishes.
Missing values in categorical matrices with k categories (factors in R) will be treated as an additional (k+1)-th value. This means that if a categorical matrix x of size k, n is multiplied from the left by a weight matrix w of size o, k, we effectively add one more dimension, i.e. w would be of size o, k+1 and x would be of size k+1, n. The question is whether this "lifting" should be handled externally, e.g. in JsonGrinder, or in Mill. For the sake of consistency, I would recommend Mill, which means that we would have to create our own OneHotVector, since the default one cannot store missing, together with the conversion. Overloading the constructor also does not make much sense because of the following ambiguity. Assume that OneHotVector(missing, 10) is converted to OneHotVector(11, 11). What should happen with OneHotVector(1, 10)? Was the original or the overloaded variant desired?
Missing strings in NGramMatrix would be handled exactly as in categorical matrices. We will extend the weight matrix by one more column, which will be used to signal missing.

Note that handling missing in Strings and Categorical matrices differs from handling it in dense matrices: in the former cases we are substituting outputs, whereas in the latter we are substituting inputs.
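
The two proposals can be sketched in Base Julia. `ImputingMatrixSketch`, `impute_mul` and `encode_category` are hypothetical names chosen for this illustration and are not Mill's API; the substitute vector ψ is a plain field here, whereas in the proposal it would be a trainable parameter.

```julia
# Dense case: a weight matrix `W` plus a vector `ψ` with one substitute per input row.
struct ImputingMatrixSketch{T}
    W::Matrix{T}
    ψ::Vector{T}
end

# Replace every missing entry in row i of `x` by ψ[i], then multiply as usual.
# Broadcasting `coalesce` over a matrix and a vector applies ψ row-wise.
impute_mul(A::ImputingMatrixSketch, x::AbstractMatrix) = A.W * coalesce.(x, A.ψ)

# Categorical case: `missing` becomes an extra (k+1)-th category.
encode_category(i::Union{Missing,Integer}, k::Integer) =
    [j == coalesce(i, k + 1) for j in 1:(k + 1)]

A = ImputingMatrixSketch(Float32[1 2; 3 4], Float32[10, 20])
x = [1f0 missing; missing 2f0]      # Matrix{Union{Missing,Float32}}
y = impute_mul(A, x)
```

In the dense case the input is repaired before the multiplication; in the categorical case the weight matrix needs one more column, so the missing value selects its own output.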

ArrayModels are too restrictive

ArrayModel(::T) where T<:Union{Function, Chain, Dense} doesn't allow custom functors, for example. We should replace the constraint with Base.Callable or with Any.
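
A small Base Julia illustration of why the constraint bites. `Scale`, `RestrictedModel` and `OpenModel` are hypothetical stand-ins, not Mill types. Note that Base.Callable is defined as Union{Function, Type}, so an instance of a callable struct is not a Base.Callable either; of the two options, only the unconstrained one admits it.

```julia
# A custom functor: a struct that is callable but is not a `Function`.
struct Scale
    a::Float32
end
(s::Scale)(x) = s.a .* x

# A wrapper constrained in the spirit of the current ArrayModel constructor.
struct RestrictedModel{T<:Function}
    m::T
end

# The same wrapper without the constraint.
struct OpenModel{T}
    m::T
end
(m::OpenModel)(x) = m.m(x)

s = Scale(2)
rejected = try
    RestrictedModel(s)     # no constructor method matches a non-Function
    false
catch err
    err isa MethodError
end
```

With the open wrapper, OpenModel(s)(Float32[1, 2]) applies the functor as expected; the cost is that errors from passing a non-callable object surface only at call time.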

Refactor readme: PathNode example to use LazyNode

Remove onehot matrix multiplication if the speedup is not large enough

I think we could remove https://github.com/CTUAvastLab/Mill.jl/blob/master/src/util.jl#L30-L41 since FluxML/Flux.jl#1756 has been merged and the speedup compared to Flux 0.12.8 may not be negligible for our datasets.

Bags refactoring

rename AlignedBags to ContigousBags
maybe we do not need structs for bags at all?

TODO: add map-like function to be able to work with individual samples

Unit tests for replacein functions

Return plain arrays instead of ArrayNodes

Maybe we should get rid of wrapping outputs of models into ArrayNode and return them as plain Arrays.

Make whole model application type stable

Currently, @inferred test fails for ProductNodes.

TODO: reflect HierarchicalUtils.jl in README

and also in docs

Float32

Verify that all layers can correctly handle computation in Float32
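
One way such a check could look, sketched in Base Julia with two toy functions standing in for layers. `preserves_eltype`, `keeps_precision` and `promotes` are hypothetical names; a real test would apply the same check to the actual Mill layers.

```julia
# True if applying `f` to `x` keeps the element type of `x`.
preserves_eltype(f, x) = eltype(f(x)) == eltype(x)

# Toy "layers": the first stays in Float32, the second silently promotes
# to Float64 because `0.5` is a Float64 literal.
keeps_precision(x) = x .* 0.5f0
promotes(x) = x .* 0.5

x = rand(Float32, 3, 4)
```

Here preserves_eltype(keeps_precision, x) holds while preserves_eltype(promotes, x) does not, which is the typical way Float64 sneaks into a Float32 pipeline.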

Constructing a model from a Matrix with missing values fails

using Mill
x = ArrayNode([1f0 2f0; missing missing])
reflectinmodel(x, d -> Chain(Dense(d,10, selu), Dense(10, 10)))

crashes

but

using Mill
x = ArrayNode([1f0 2f0; missing missing])
reflectinmodel(x, d ->Dense(d, 10))

works, which suggests that the problem is in make_imputing

Just questions for attention

I will abuse the issue system and post some questions for @pevnak about his attention implementation.

Why do we need a segmented sum? Why can't we just use a normal one?
https://github.com/pevnak/Mill.jl/blob/2cec5076a6350aa6a292edf59edd8be3e9e7f5b5/example/attention.jl#L11

Do the 4 in the Dense(d, 4, selu) have some relation to the 4 in SegmentedSum(4)?
https://github.com/pevnak/Mill.jl/blob/2cec5076a6350aa6a292edf59edd8be3e9e7f5b5/example/attention.jl#L34

Error in reduce catobs if treenode has vector instead of tuple

reduce(catobs, nodes) fails for the following structure and data, where a TreeNode contains a vector instead of a tuple.

using Mill, JsonGrinder
e1 = ExtractCategorical(["Olda", "Tonda", "Milda"])
e2 = ExtractCategorical(collect(1:10))
node11 = e1("Olda")
node12 = e2([1, 2, 5])
node21 = e1("Tonda")
node22 = e2(4)
t1 = TreeNode([node11, node12])
t2 = TreeNode([node21, node22])
reduce(catobs, [t1, t2])

produces
ERROR: MethodError: no method matching _cattuples(::Array{Array{ArrayNode{SparseMatrixCSC{Float32,Int64},Nothing},1},1})

Add safe defaults

In Mill 2.4.1, the model created by default cannot handle missing values unless they are present in the sample from which the model is created. This confuses almost everyone. It can be trivially fixed by adding these methods:

Mill._make_imputing(x::MaybeHotVector, t::Dense) = Mill.postimputing_dense(t)
Mill._make_imputing(x::MaybeHotMatrix, t::Dense) = Mill.postimputing_dense(t)
Mill._make_imputing(x::NGramMatrix, t::Dense) = Mill.postimputing_dense(t)

Therefore I vote for adding them as soon as possible and adding a way to control this later. Almost everyone is caught by this nuance, and working around it requires a deep understanding of Julia, which is generally missing.

methods(Base.show) is broken although our Base.show is commented out

After commenting out our magic around Base.show, methods(Base.show) is still broken.
In a plain session, methods(Base.show) works, but

using Mill
methods(Base.show)

is broken

Crashing comparison of arrays with missings

ArrayNode([0.0f0 missing 0.0f0 0.0f0 1.0f0]) == ArrayNode([0.0f0 missing 0.0f0 0.0f0 1.0f0]) causes

ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] ==(::ArrayNode{Array{Union{Missing, Float32},2},Nothing}, ::ArrayNode{Array{Union{Missing, Float32},2},Nothing}) at C:\Projects\others\Mill.jl\src\datanodes\arraynode.jl:45
 [2] top-level scope at none:1
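
The underlying Base behaviour can be reproduced without Mill: == on arrays uses three-valued logic and returns missing when the answer cannot be decided, which then fails in a boolean context. `ToyNode` below is a hypothetical stand-in for a data node, and defining == through isequal is only one possible choice, not necessarily the fix adopted in Mill.

```julia
a = [0.0f0 missing 0.0f0 0.0f0 1.0f0]
b = [0.0f0 missing 0.0f0 0.0f0 1.0f0]

three_valued = (a == b)       # `missing`: the comparison cannot be decided
two_valued = isequal(a, b)    # `true`: `isequal` treats `missing` as equal to itself

# Toy wrapper standing in for a data node.
struct ToyNode{D}
    data::D
end

# One possible choice: always return a Bool by comparing with `isequal`.
Base.:(==)(x::ToyNode, y::ToyNode) = isequal(x.data, y.data)
```

The alternative is to let == propagate missing, as Base does for arrays, and to require callers to use isequal when they need a Bool.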

Does it make sense to vcat metadata

function Base.vcat(as::Mill.ArrayNode...)
    data = vcat([a.data for a in as]...)
    metadata = Zygote.@ignore reduce(vcat, [a.metadata for a in as])
    Mill.ArrayNode(data, metadata)
end

terseprint breaks internal julia methods

For example

julia> using Mill
[ Info: Precompiling Mill [1d0525e4-8992-11e8-313c-e310e1f6ddea]
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)

julia> methods(ArrayNode)
# 2 methods for type constructor:
[1] Error showing value of type Base.MethodList:
ERROR: type UnionAll has no field name
Stacktrace:
 [1] getproperty at ./Base.jl:15 [inlined]
 [2] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Type{
SYSTEM (REPL): showing an error caused an error
ERROR: type UnionAll has no field name
Stacktrace:
 [1] getproperty at ./Base.jl:15 [inlined]
 [2] show(::IOContext{REPL.Terminals.TTYTerminal}, ::Type{
SYSTEM (REPL): caught exception of type ErrorException while trying to handle a nested exception; giving up

julia> Mill.terseprint(false)
false

julia> methods(ArrayNode)
# 2 methods for type constructor:
[1] Error showing value of type Base.MethodList:
ERROR: MethodError: no method matching show_datatype(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Type{
Stacktrace:
 [1] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Type{
SYSTEM (REPL): showing an error caused an error
ERROR: MethodError: no method matching show_datatype(::IOContext{REPL.Terminals.TTYTerminal}, ::Type{
Stacktrace:
 [1] show(::IOContext{REPL.Terminals.TTYTerminal}, ::Type{
SYSTEM (REPL): caught exception of type MethodError while trying to handle a nested exception; giving up

Performance tests in continuous integration

We should include some basic performance checks

Explain more thoroughly missing values

Especially how training works, and how missing values are then used during inference.

Support of convolutions

Convolution needs to implement gradient with respect to input.

ArrayNode reflectinmodel sometimes results in weird error

The gradient computation crashes on the following code, using the following data:
https://ufile.io/8wl0eit1

using JLD2, FileIO, Flux, Mill
@load "weird_node.jld2" x1 y
model = reflectinmodel(x1, d -> Chain(Dense(d, settings.k, relu),),
	d -> SegmentedMeanMax(d),
	b = Dict("" =>  d -> Chain(Dense(d, 2),)))
ps = Flux.params(model)
loss = (model, x, y) -> Flux.logitcrossentropy(model(x).data,y)
loss(model, x1,y)
Flux.logitcrossentropy(model(x1).data,y)
gradient(() -> loss(model, x1,y), ps)

with error

ERROR: MethodError: no method matching zero(::Type{Any})
Closest candidates are:
  zero(::Type{Union{Missing, T}}) where T at missing.jl:105
  zero(::Type{Missing}) at missing.jl:103
  zero(::Type{LibGit2.GitHash}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\LibGit2\src\oid.jl:220

MIME"text/html" show overload

Use functionality made by @racinmat, also in ExplainMill and MillExtensions?