
lcsb-biocore / GigaSOM.jl

31 stars · 9 forks · 3.86 MB

Huge-scale, high-performance flow cytometry clustering in Julia

Home Page: http://git.io/GigaSOM.jl

License: Apache License 2.0

Julia 97.87% · TeX 1.19% · Shell 0.70% · Dockerfile 0.24%
artifical-neural-network artificial-intelligence clustering clustering-methods cytof cytometry flow-cytometry huge-scale immunology large-scale mass-cytometry neural-networks self-organizing-map som

gigasom.jl's People

Contributors

exaexa, github-actions[bot], juliatagbot, laurentheirendt, ohunewald


gigasom.jl's Issues

remove deprecation warnings

These deprecation warnings are currently being thrown:

┌ Warning: `DataFrame(columns::AbstractMatrix)` is deprecated, use `DataFrame(columns, :auto)` instead.
│   caller = ip:0x0
└ @ Core :-1

Let's remove them 👍
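A minimal sketch of the replacement the warning asks for, assuming the offending call site constructs a DataFrame directly from a matrix (the matrix `m` below is only an illustration):

using DataFrames

m = rand(5, 3)

# deprecated form that triggers the warning:
#   df = DataFrame(m)

# current API: pass :auto to auto-generate column names x1, x2, x3
df = DataFrame(m, :auto)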

Issue with parallel computation

The following statement is posted under the "High-Level Overview" section of the documentation:

[screenshot of the cited statement from the documentation's "High-Level Overview" section]

I tried it with the following function:

function test()
    addprocs(4)
    d = [1 2 1 4 5;3 2 1 6 5;3 1 1 7 4]
    som = initGigaSOM(d, 3, 3)
    som = trainGigaSOM(som, d)
    mapToGigaSOM(som, d)
    e = embedGigaSOM(som,d)
end

This led to a problem with "worker 2":

ERROR: On worker 2:
KeyError: key GigaSOM [a03a9c34-069e-5582-a11c-5c984cab887c] not found
getindex at .\dict.jl:467 [inlined]
root_module at .\loading.jl:968 [inlined]
deserialize_module at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:953
handle_deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:855
deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:773
deserialize_datatype at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:1251
handle_deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:826
deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:773
handle_deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:833
deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:773 [inlined]
deserialize_msg at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\messages.jl:99
#invokelatest#1 at .\essentials.jl:710 [inlined]
invokelatest at .\essentials.jl:709 [inlined]
message_handler_loop at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:185
process_tcp_streams at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:142
#99 at .\task.jl:356
Stacktrace:
 [1] #remotecall_fetch#143 at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:394 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker, ::Distributed.RRID) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:386
 [3] #remotecall_fetch#146 at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:421 [inlined]
 [4] remotecall_fetch at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:421 [inlined]
 [5] call_on_owner at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:494 [inlined]
 [6] fetch(::Future) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:533
 [7] distribute_array(::Symbol, ::Array{Float64,2}, ::Array{Int64,1}; dim::Int64) at C:\Users\zli3\.julia\packages\GigaSOM\QfzjJ\src\base\distributed.jl:88
 [8] distribute_array at C:\Users\zli3\.julia\packages\GigaSOM\QfzjJ\src\base\distributed.jl:78 [inlined]
 [9] trainGigaSOM(::Som, ::Array{Int64,2}; kernelFun::Function, metric::Distances.Euclidean, somDistFun::Function, knnTreeFun::Type{T} where T, rStart::Float64, rFinal::Float64, radiusFun::Function, epochs::Int64) at C:\Users\zli3\.julia\packages\GigaSOM\QfzjJ\src\analysis\core.jl:159
 [10] trainGigaSOM at C:\Users\zli3\.julia\packages\GigaSOM\QfzjJ\src\analysis\core.jl:156 [inlined]
 [11] test() at D:\...\runSOM.jl:27
 [12] top-level scope at none:1

Any thoughts?
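For what it's worth, a minimal sketch of the usual fix for this kind of worker-side KeyError, assuming the cause is that GigaSOM was loaded before the workers were added (so the package never gets loaded on them):

using Distributed

# add the workers first, then load GigaSOM on all of them,
# so the module is available when objects are deserialized on the workers
addprocs(4)
@everywhere using GigaSOM

d = [1 2 1 4 5; 3 2 1 6 5; 3 1 1 7 4]
som = initGigaSOM(d, 3, 3)
som = trainGigaSOM(som, d)
mapping = mapToGigaSOM(som, d)
e = embedGigaSOM(som, d)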

possible test failure in upcoming Julia version 1.5

A PkgEval run for a Julia pull request which changes the generated numbers for rand(a:b) indicates that the tests of this package might fail in Julia 1.5 (and on Julia's current master branch).

Also, you might be interested in using the new StableRNGs.jl registered package, which provides guaranteed stable streams of random numbers across Julia releases.
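A minimal sketch of how StableRNGs.jl is typically used in tests (the seed and the range below are arbitrary):

using StableRNGs

# a StableRNG produces the same stream on every Julia release,
# so hard-coded expected values in the tests keep working
rng = StableRNG(1234)
x = rand(rng, 1:10, 5)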

Apologies if this is a false positive. Cf. https://github.com/JuliaCI/NanosoldierReports/blob/ab6676206b210325500b4f4619fa711f2d7429d2/pkgeval/by_hash/52c2272_vs_47c55db/logs/GigaSOM/1.5.0-DEV-87d2a04de3.log

Accessing the intermediate SOM states during the training

Is your feature request related to a problem? Please describe.
I trained my SOM for 2000 epochs and would like to store intermediate results (every 500 epochs), something like:

using Serialization, DelimitedFiles

datainfo = loadCSVSet(:test, files, header=false)
som = initGigaSOM(datainfo, 20, 20, seed=seed)
radius_list = [10, 8.9, 7.8, 6.7, 5.6, 4.5, 3.4, 2.3, 1.2, 0.5, 0.1]
for i in 1:10
    # train in 200-epoch blocks, stepping the radius down between blocks
    som = trainGigaSOM(som, datainfo, rStart=radius_list[i], rFinal=radius_list[i+1], epochs=200, radiusFun=linearRadius)
    e = embedGigaSOM(som, datainfo)
    e2 = distributed_collect(e)
    writedlm(string("GigaSOM_iker_1400k_embed_seed", seed, "_epochs", epochs, ".tsv"), e2, '\t')
    # snapshot the partially trained SOM after each block
    open(f -> serialize(f, som), string("partly_trained_", i, ".jls"), "w")
end

I want to assess whether I did enough training. The problem is that with this strategy I can only use a linearRadius, or resort to a very ugly hack of passing in a specific radius function.

Describe the solution you'd like
Perhaps one could input a starting/ending epoch/iteration to the train function (here: `for j = 1:epochs`).

Describe alternatives you've considered
Allow "doing something" (calling a function) every X epochs in order to serialize the som object or save the coordinates, as in the sketch below.
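A sketch of what such a hook could look like; neither `eachEpochs` nor `epochCallback` exists in GigaSOM today, both are hypothetical names for the requested feature:

# hypothetical API sketch -- these keyword arguments do not exist yet;
# the callback would snapshot the SOM every `eachEpochs` epochs
som = trainGigaSOM(som, datainfo, epochs = 2000, rStart = 10.0, rFinal = 0.1,
    eachEpochs = 500,
    epochCallback = (som, epoch) ->
        open(f -> serialize(f, som), string("partly_trained_", epoch, ".jls"), "w"))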

Additional context
none

Adding CSV file import

Although GigaSOM was developed with its main focus on mass cytometry data, it could be used for any kind of multi-dimensional data clustering.

A common file format such as CSV would allow other kinds of data to be imported.
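A minimal sketch of what the import could look like with CSV.jl and DataFrames.jl; the file name and the grid size below are hypothetical, and the clustering calls simply mirror the existing matrix-based API:

using CSV, DataFrames, GigaSOM

# read an arbitrary multi-dimensional dataset and convert it to a numeric matrix
df = CSV.read("mydata.csv", DataFrame)
data = Matrix{Float64}(df)

# from here on, the data is clustered exactly like cytometry data
som = initGigaSOM(data, 10, 10)
som = trainGigaSOM(som, data)
clusters = mapToGigaSOM(som, data)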

support for different distance metrics

It'd be nice if there were better support for different distance metrics for training the SOM and mapping the winners. It looks like this could be implemented easily by passing a metric argument straight through to the `NearestNeighbors.BruteTree`.
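A hedged sketch of what the pass-through could look like; the `metric` and `knnTreeFun` keyword names appear in the current `trainGigaSOM` signature, but forwarding an arbitrary Distances.jl metric all the way to the tree is the behaviour being requested, not necessarily what the released version does (`data` stands for any expression matrix):

using GigaSOM, Distances, NearestNeighbors

som = initGigaSOM(data, 10, 10)

# ask for Manhattan distance; BruteTree accepts arbitrary Distances.jl metrics,
# whereas KDTree is limited to a handful of axis-aligned ones
som = trainGigaSOM(som, data, metric = Cityblock(), knnTreeFun = BruteTree)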

clim kwarg in expressionPalette or expressionColors

I would like to control the dynamic range of the expressionPalette more precisely. For example, my expressions range from -2 to 7, but the interesting region lies from -1 to 1. I want to adjust my colour range so that any expression values outside the limits [-1, 1] are clipped to the boundary colours.

It is possible to do this by pre-processing the input data, but that recalculates the inputs each time I change the colour range, which doesn't seem necessary. It makes more sense for the colour representation to change, not the underlying data.
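A minimal sketch of that pre-processing workaround in plain Julia (`expressions` is a stand-in for whatever is fed to expressionPalette; a proper `clim` keyword is the feature being requested):

# clip expression values into the interesting range before colouring,
# so everything outside [-1, 1] gets the boundary colours
clim = (-1.0, 1.0)
clipped = clamp.(expressions, clim[1], clim[2])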

As you can imagine, this will eventually be used in an interactive figure :)

Missing `dselect` and other functions from `dataops.jl`

Versions after 0.6.8 have the majority of the functions removed from `dataops.jl`.

Any

Steps to reproduce

1. Install latest GigaSOM
2. Import distributed dataset
3. Run dselect

Expected behavior
dselect works

Actual behavior
`dselect` cannot be found

Additional information

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
