
lcsb-biocore / GigaSOM.jl

31 stars · 9 forks · 3.86 MB

Huge-scale, high-performance flow cytometry clustering in Julia

Home Page: http://git.io/GigaSOM.jl

License: Apache License 2.0

Julia 97.87% · TeX 1.19% · Shell 0.70% · Dockerfile 0.24%
artifical-neural-network artificial-intelligence clustering clustering-methods cytof cytometry flow-cytometry huge-scale immunology large-scale mass-cytometry neural-networks self-organizing-map som

gigasom.jl's People

Contributors

exaexa, github-actions[bot], juliatagbot, laurentheirendt, ohunewald


gigasom.jl's Issues

remove deprecation warnings

These deprecation warnings are currently being thrown:

┌ Warning: `DataFrame(columns::AbstractMatrix)` is deprecated, use `DataFrame(columns, :auto)` instead.
│   caller = ip:0x0
└ @ Core :-1

Let's remove them 👍
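A minimal sketch of the replacement the warning asks for, assuming the offending call site constructs a DataFrame directly from a matrix (the matrix `m` below is only an illustration):

using DataFrames

m = rand(5, 3)

# deprecated form that triggers the warning:
#   df = DataFrame(m)

# current API: pass :auto to auto-generate column names x1, x2, x3
df = DataFrame(m, :auto)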

Issue with parallel computation

The following statement is posted under the "High-Level Overview" section of the documentation:

[screenshot of the cited statement from the documentation's "High-Level Overview" section]

I tried it with the following function:

function test()
    addprocs(4)
    d = [1 2 1 4 5;3 2 1 6 5;3 1 1 7 4]
    som = initGigaSOM(d, 3, 3)
    som = trainGigaSOM(som, d)
    mapToGigaSOM(som, d)
    e = embedGigaSOM(som,d)
end

This led to a problem with "worker 2":

ERROR: On worker 2:
KeyError: key GigaSOM [a03a9c34-069e-5582-a11c-5c984cab887c] not found
getindex at .\dict.jl:467 [inlined]
root_module at .\loading.jl:968 [inlined]
deserialize_module at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:953
handle_deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:855
deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:773
deserialize_datatype at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:1251
handle_deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:826
deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:773
handle_deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:833
deserialize at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Serialization\src\Serialization.jl:773 [inlined]
deserialize_msg at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\messages.jl:99
#invokelatest#1 at .\essentials.jl:710 [inlined]
invokelatest at .\essentials.jl:709 [inlined]
message_handler_loop at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:185
process_tcp_streams at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:142
#99 at .\task.jl:356
Stacktrace:
 [1] #remotecall_fetch#143 at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:394 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker, ::Distributed.RRID) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:386
 [3] #remotecall_fetch#146 at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:421 [inlined]
 [4] remotecall_fetch at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:421 [inlined]
 [5] call_on_owner at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:494 [inlined]
 [6] fetch(::Future) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:533
 [7] distribute_array(::Symbol, ::Array{Float64,2}, ::Array{Int64,1}; dim::Int64) at C:\Users\zli3\.julia\packages\GigaSOM\QfzjJ\src\base\distributed.jl:88
 [8] distribute_array at C:\Users\zli3\.julia\packages\GigaSOM\QfzjJ\src\base\distributed.jl:78 [inlined]
 [9] trainGigaSOM(::Som, ::Array{Int64,2}; kernelFun::Function, metric::Distances.Euclidean, somDistFun::Function, knnTreeFun::Type{T} where T, rStart::Float64, rFinal::Float64, radiusFun::Function, epochs::Int64) at C:\Users\zli3\.julia\packages\GigaSOM\QfzjJ\src\analysis\core.jl:159
 [10] trainGigaSOM at C:\Users\zli3\.julia\packages\GigaSOM\QfzjJ\src\analysis\core.jl:156 [inlined]
 [11] test() at D:\...\runSOM.jl:27
 [12] top-level scope at none:1

Any thoughts?
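For what it's worth, a minimal sketch of the usual fix for this kind of worker-side KeyError, assuming the cause is that GigaSOM was loaded before the workers were added (so the package never gets loaded on them):

using Distributed

# add the workers first, then load GigaSOM on all of them,
# so the module is available when objects are deserialized on the workers
addprocs(4)
@everywhere using GigaSOM

d = [1 2 1 4 5; 3 2 1 6 5; 3 1 1 7 4]
som = initGigaSOM(d, 3, 3)
som = trainGigaSOM(som, d)
mapping = mapToGigaSOM(som, d)
e = embedGigaSOM(som, d)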

possible test failure in upcoming Julia version 1.5

A PkgEval run for a Julia pull request which changes the generated numbers for rand(a:b) indicates that the tests of this package might fail in Julia 1.5 (and on Julia's current master branch).

Also, you might be interested in using the new StableRNGs.jl registered package, which provides guaranteed stable streams of random numbers across Julia releases.
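A minimal sketch of how StableRNGs.jl is typically used in tests (the seed and the range below are arbitrary):

using StableRNGs

# a StableRNG produces the same stream on every Julia release,
# so hard-coded expected values in the tests keep working
rng = StableRNG(1234)
x = rand(rng, 1:10, 5)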

Apologies if this is a false positive. Cf. https://github.com/JuliaCI/NanosoldierReports/blob/ab6676206b210325500b4f4619fa711f2d7429d2/pkgeval/by_hash/52c2272_vs_47c55db/logs/GigaSOM/1.5.0-DEV-87d2a04de3.log

Accessing the intermediate SOM states during the training

Is your feature request related to a problem? Please describe.
I trained my SOM for 2000 epochs and would like to store intermediate results (every 500 epochs), something like:

using Serialization, DelimitedFiles

datainfo = loadCSVSet(:test, files, header=false)
som = initGigaSOM(datainfo, 20, 20, seed=seed)
radius_list = [10, 8.9, 7.8, 6.7, 5.6, 4.5, 3.4, 2.3, 1.2, 0.5, 0.1]
for i in 1:10
    # train in 200-epoch blocks, stepping the radius down between blocks
    som = trainGigaSOM(som, datainfo, rStart=radius_list[i], rFinal=radius_list[i+1], epochs=200, radiusFun=linearRadius)
    e = embedGigaSOM(som, datainfo)
    e2 = distributed_collect(e)
    writedlm(string("GigaSOM_iker_1400k_embed_seed", seed, "_epochs", epochs, ".tsv"), e2, '\t')
    # snapshot the partially trained SOM after each block
    open(f -> serialize(f, som), string("partly_trained_", i, ".jls"), "w")
end

I want to assess whether I did enough training. The problem is that with this strategy I can only use a linearRadius, or resort to a very ugly hack of passing in a specific radius function.

Describe the solution you'd like
Perhaps one could input a starting/ending epoch/iteration to the train function (here: `for j = 1:epochs`).

Describe alternatives you've considered
Allow "doing something" (calling a function) every X epochs in order to serialize the som object or save the coordinates, as in the sketch below.
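A sketch of what such a hook could look like; neither `eachEpochs` nor `epochCallback` exists in GigaSOM today, both are hypothetical names for the requested feature:

# hypothetical API sketch -- these keyword arguments do not exist yet;
# the callback would snapshot the SOM every `eachEpochs` epochs
som = trainGigaSOM(som, datainfo, epochs = 2000, rStart = 10.0, rFinal = 0.1,
    eachEpochs = 500,
    epochCallback = (som, epoch) ->
        open(f -> serialize(f, som), string("partly_trained_", epoch, ".jls"), "w"))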

Additional context
none

Adding CSV file import

Although GigaSOM was developed with its main focus on mass cytometry data, it could be used for any kind of multi-dimensional data clustering.

A common file format such as CSV would allow other kinds of data to be imported.
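A minimal sketch of what the import could look like with CSV.jl and DataFrames.jl; the file name and the grid size below are hypothetical, and the clustering calls simply mirror the existing matrix-based API:

using CSV, DataFrames, GigaSOM

# read an arbitrary multi-dimensional dataset and convert it to a numeric matrix
df = CSV.read("mydata.csv", DataFrame)
data = Matrix{Float64}(df)

# from here on, the data is clustered exactly like cytometry data
som = initGigaSOM(data, 10, 10)
som = trainGigaSOM(som, data)
clusters = mapToGigaSOM(som, data)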

support for different distance metrics

It'd be nice if there were better support for different distance metrics for training the SOM and mapping the winners. It looks like this could be implemented easily by passing a metric argument straight through to the `NearestNeighbors.BruteTree`.
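A hedged sketch of what the pass-through could look like; the `metric` and `knnTreeFun` keyword names appear in the current `trainGigaSOM` signature, but forwarding an arbitrary Distances.jl metric all the way to the tree is the behaviour being requested, not necessarily what the released version does (`data` stands for any expression matrix):

using GigaSOM, Distances, NearestNeighbors

som = initGigaSOM(data, 10, 10)

# ask for Manhattan distance; BruteTree accepts arbitrary Distances.jl metrics,
# whereas KDTree is limited to a handful of axis-aligned ones
som = trainGigaSOM(som, data, metric = Cityblock(), knnTreeFun = BruteTree)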

clim kwarg in expressionPalette or expressionColors

I would like to control the dynamic range of the expressionPalette more precisely. For example, my expressions range from -2 to 7, but the interesting region lies from -1 to 1. I want to adjust my colour range so that any expression values outside the limits [-1, 1] are clipped to the boundary colours.

It is possible to do this by pre-processing the input data, but that recalculates the inputs each time I change the colour range, which doesn't seem necessary. It makes more sense for the colour representation to change, not the underlying data.
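A minimal sketch of that pre-processing workaround in plain Julia (`expressions` is a stand-in for whatever is fed to expressionPalette; a proper `clim` keyword is the feature being requested):

# clip expression values into the interesting range before colouring,
# so everything outside [-1, 1] gets the boundary colours
clim = (-1.0, 1.0)
clipped = clamp.(expressions, clim[1], clim[2])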

As you can imagine, this will eventually be used in an interactive figure :)

Missing `dselect` and other functions from `dataops.jl`

Versions after 0.6.8 have the majority of the functions removed from `dataops.jl`.

Any

Steps to reproduce

1. Install latest GigaSOM
2. Import distributed dataset
3. Run dselect

Expected behavior
dselect works

Actual behavior
`dselect` cannot be found

Additional information

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
