
bioimage-io / jdll


The Java library to run Deep Learning models

Home Page: https://github.com/bioimage-io/JDLL/wiki

License: Apache License 2.0

Java 93.81% Python 6.10% C 0.09%
bioimage-io deep-learning imglib2 java onnx pytorch tensorflow

jdll's People

Contributors

axtimwalde, carlosuc3m, constantinpape, ctrueden, dependabot[bot], djpbarry, hinerm, noam-dori, stefanhahmann, stephane-d, tinevez, tpietzsch


jdll's Issues

Backend of the model-runner tensors

Hello everyone,
In this issue I want to propose the library Nd4j as the new backend for the model-runner tensors, at least temporarily. As we discussed previously, for convenience the backend was going to be the NDArrays from DJL. However, I found that these NDArrays need an underlying native library (or "engine", as they call it) to work. Currently there are only two engines capable of supporting NDArrays.

This limitation could cause conflicts, because the native library for those two engines would always have to be loaded in order to use these NDArrays. This is why I would recommend using another backend. For now I have continued developing the library on another branch using a different library, Nd4j, as the backend.

This library works in a similar manner to DJL. It also uses JavaCPP as the backend to load C++ native libraries (OpenBLAS, for example), and its arrays (called INDArrays) provide predefined operations such as mean and allow accessing positions via indexing, much like NumPy arrays.

On the other hand, the memory management of these INDArrays is not the best, so we should be quite careful with it.

If you agree with this, I can merge the branch into the main one, and we can keep this solution at least temporarily.
I think that at some point we should also move away from this library, because it is quite heavy and because of its memory management, but it is the fastest and simplest transition at the moment.

I also looked at a JNI wrapper for NumPy, which seems quite nice (also using JavaCPP), but it is almost like writing C++ in Java. It is not simple at all.
Regards,
Carlos

@Stephane-D @tinevez @tomburke-rse @petebankhead @KateMoreva @xion16lm

Dependency artifact not found

Hi, I was trying to use the library as an external dependency. I followed the
instructions in the documentation. However, I encountered the error below:

Could not find artifact org.bioimageanalysis.icy:dl-model-runner:pom:1.0.0 in icy (https://icy-nexus.pasteur.fr/repository/Icy/)

I added several repositories, but the library was still not found. Is there a particular repository that should be added? Or is the library not available for the time being?

Confusion with axis order (F vs C order, ImgLib2 vs NumPy)

I think there is quite a bit of unnecessary back-and-forth transposition of axes currently.

Here is how it should work:

Assume Python wants a numpy.ndarray with axis CYX (in c order) and shape [2,3,4].
That is, 2 channels, height=3, width=4.
So in total 2 * 3 * 4 = 24 elements.

Let's make a flat buffer containing elements (0, 1, 2, ..., 23).
For Python, if we just reshape([2,3,4]) that, we get

[[[ 0.  1.  2.  3.]
  [ 4.  5.  6.  7.]
  [ 8.  9. 10. 11.]]

 [[12. 13. 14. 15.]
  [16. 17. 18. 19.]
  [20. 21. 22. 23.]]]

as desired.

For ImgLib2, the same flat buffer containing elements (0, 1, 2, ..., 23) wraps in f order with axis XYC and shape [4,3,2].

Translating that to EfficientSamJ:

SAM wants [3,h,w], so CYX in c order.
We could just wrap a flat array as XYC with dimensions [w,h,3] in f order in ImgLib2.
Then reshape the same flat array in Python as [3,h,w] and be done.

Instead, currently:

  • We wrap the flat array as CYX (dimensions [3,h,w]) in f order in ImgLib2.
  • Then we use Views to transpose the axes to XYC (dimensions [w,h,3]) to be able to write to it "normally".
  • Because we wrapped as CYX in f order, that is then of course XYC (shape [w,h,3]) in c order in Python.
  • We have to do the np.transpose(im.astype('float32'), (2, 0, 1)) in order to pass it to PyTorch.

We could avoid both the Views transposition and the np.transpose.

Also, in np.transpose(im.astype('float32'), (2, 0, 1)) the order (2,0,1) should probably be (2,1,0).
I think we pass the images with the X and Y axes flipped. SAM doesn't care, of course. Probably there is a corresponding flip in the coordinates of the prompt point list, etc. (maybe even another explicit np.transpose?)
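The C-order/F-order equivalence described above can be checked directly in NumPy. This is a small sketch using the same 2×3×4 example as earlier in the thread; it is illustrative, not part of JDLL:

```python
import numpy as np

# Flat buffer 0..23, standing in for the pixel data shared with ImgLib2.
flat = np.arange(24, dtype=np.float32)

# Python side: reshape as CYX in C order (2 channels, h=3, w=4).
cyx_c = flat.reshape(2, 3, 4)

# ImgLib2 side: the same flat buffer wrapped as XYC in F order, shape [4,3,2].
xyc_f = flat.reshape(4, 3, 2, order="F")

# Both views address the very same element for a given (c, y, x):
# cyx_c[c, y, x] == xyc_f[x, y, c], i.e. one is the full axis
# reversal (2, 1, 0) of the other. No data is moved.
assert np.array_equal(cyx_c, xyc_f.transpose(2, 1, 0))

# Neither reshape copied the buffer; both are views on `flat`.
assert cyx_c.base is flat and xyc_f.base is flat
```

In other words, wrapping the flat array as XYC/[w,h,3] in F order on the Java side and reshaping it as [3,h,w] in C order on the Python side are two views of the same memory, which is why both the Views transposition and the np.transpose can be dropped.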

Consider to what extent JDLL should support (re)training

As of this writing, JDLL supports running models, but not training / retraining / fine tuning. There is an open question about whether it should do so in an engine-agnostic way, and if so, to what extent, and what sorts of training patterns and configurations to accommodate.

@axtimwalde suggests that there are some very common patterns that could be supported pretty easily.

My perspective is that we should start by having some Java-based plugins that need to do training/tuning use Appose directly to invoke the deep learning framework training API of their choice. Once we have several such plugins doing this, we can scrutinize them for commonalities and consider what sorts of API might be worth generalizing into JDLL, if any. My intuition is that the needs will be rather diverse, and most or all plugins will not need such an engine-agnostic API from Java itself; nonetheless, JDLL could provide at least a subset of training functionality in an engine-agnostic way, if there is value in doing so.

Finally, there is also the bioimage-io engine that runs models on the server side as services, and we could simply say that if you want to do training, you should rely on that mechanism rather than running it locally on your own hardware (bioimage-io engine can run locally also, although it is heavier weight than using an in-process or interprocess/Appose-based approach, due to the Hypha server backend).

DecodeNumpy via BufferAccess classes.

Here's an alternate version of DecodeNumpy.build:

https://github.com/bioimage-io/model-runner-java/blob/ea4bec4616d81ce63c3fb1ed4e6c13aeb0e4c53c/src/main/java/io/bioimage/modelrunner/numpy/DecodeNumpy.java#L248

This may require a version of ImgLib2 with imglib/imglib2#299. The earliest such version would be imglib2-5.13.0.

Currently, pom-scijava is at imglib2-5.12.0, but @ctrueden mentioned he was interested in releasing a pom-scijava with a version bump to imglib2-5.13.0.

The main reason for this change is so that it does not copy the ByteBuffer but rather uses it directly.

public static <T extends NativeType<T>> Img<T> build(ByteBuffer buf, ByteOrder byteOrder, String dtype, long[] shape) throws IllegalArgumentException
{
    buf.order(byteOrder);
    if (dtype.equals("byte")) {
        ByteAccess access = new ByteBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.bytes(access, shape);
    } else if (dtype.equals("ubyte")) {
        ByteAccess access = new ByteBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.unsignedBytes(access, shape);
    } else if (dtype.equals("int16")) {
        ShortAccess access = new ShortBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.shorts(access, shape);
    } else if (dtype.equals("uint16")) {
        ShortAccess access = new ShortBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.unsignedShorts(access, shape);
    } else if (dtype.equals("int32")) {
        IntAccess access = new IntBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.ints(access, shape);
    } else if (dtype.equals("uint32")) {
        IntAccess access = new IntBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.unsignedInts(access, shape);
    } else if (dtype.equals("int64")) {
        LongAccess access = new LongBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.longs(access, shape);
    } else if (dtype.equals("float32")) {
        FloatAccess access = new FloatBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.floats(access, shape);
    } else if (dtype.equals("float64")) {
        DoubleAccess access = new DoubleBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.doubles(access, shape);
    } else {
        throw new IllegalArgumentException("Unsupported tensor type: " + dtype);
    }
}
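The point of the BufferAccess variant is to wrap the incoming ByteBuffer directly instead of copying it. The NumPy analogue of the same zero-copy idea (a sketch for illustration, not part of JDLL) is np.frombuffer, which builds an array over existing bytes without copying them:

```python
import numpy as np

# Raw little-endian float32 payload, as it might come out of an .npy file.
raw = np.arange(6, dtype="<f4").tobytes()

# frombuffer wraps the existing bytes: no copy, just an array view.
arr = np.frombuffer(raw, dtype="<f4").reshape(2, 3)

assert arr.shape == (2, 3)
assert arr[1, 2] == 5.0
# The array does not own its data; it is backed by `raw` directly,
# just as the ByteBufferAccess-backed Img is backed by the ByteBuffer.
assert not arr.flags.owndata
```

The Java version above achieves the same thing: each ByteBufferAccess/ShortBufferAccess/etc. reads through the original buffer, so mutations and reads go straight to the decoded .npy payload with no intermediate copy.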

Clarify relations of the java libraries

Hi @carlosuc3m,
great to see all the progress of the java libraries!
For adoption, I think it would be very helpful to have an overview of which libraries exist and how they relate to each other.
I have created a small PR #4 to add this to the README (as far as I have an overview of the libraries), but I have two more questions:

java runner in Fiji

Hi!

This issue is to follow the discussion with @uschmidt83 in issue stardist/stardist#68 and see if there's something that would be nice to consider for all the plugins.

I include @ivan-ea @carlosuc3m @lmoyasans and @cfusterbarcelo as they are working on the integration of the library in deepImageJ and how to deal with the dependencies inside Fiji.

  • The java library can deal with different versions of TF, PyTorch and ONNX. Our plan was to provide all of them in a specific folder inside Fiji.
  • The java library is getting integrated inside deepImageJ but most probably in the future, it will be given as a dependency in the Update Sites (similar to the TF manager).

Please note we're trying to be a bit quick with this to update the paper status and rebuttal, but it would be nice if, in the longer term, we could think about this carefully to make things clean and easier for everyone.

Easier configuration/running

So, I got pulled into JDLL by the Spanish folks in Prague.

I was looking at the README and the first thing I thought is that I might help you folks with some automation: a template ready to clone and start playing from there.

It's based on Kotlin (and some or all of it in Gradle); I'm pretty confident I could provide something along these lines:

// 0. Setting Up JDLL
// no need, just clone the template repo

// 1. Downloading a model (optional)
downloadModel {
     // enum, statically typed
    model = Model.`B. Sutilist bacteria segmentation - Widefield microscopy - 2D UNet`
    // set with some default value, but customizable
    //dst = projectDir / "models"
}

// 2. Installing DL engines
// we may also implement all the necessary logic expressing the compatibility among
// the different DL frameworks, OSes and architectures, printing errors if incompatible
// or warnings if a best-effort attempt is being made
framework {
    // if engine, cpu and gpu are not specified, then 
    // `EngineInstall::installEnginesForModelByNameinDir` will be called
    // engine = Tensorflow.`2.0` // also enum, statically typed
    // cpu = true
    // gpu = true
    // set with some default value, but customizable
    installationDir = projectDir / "engines"
}
// will automatically fail if `!installed`

// 3. Creating the tensors
val img1 = model.create<FloatType>() // [1, 512, 512, 1] inferred from `model`
tensor {
     input = build(model.inputs.bxyc, img1) // "input_1" might be inferred
     outputEmpty = buildEmptyTensor(model.outputs.bxyc) // "conv2d_19" might be inferred
     outputBlankTensor = buildBlankTensor<FloatType>(model.outputs.bxyc) // [1, 512, 512, 3] inferred
}

// 4. Loading the model
dlEngine { // or dlCompatibleEngine {
    framework = TensorFlow.`2.7.0`
    cpu = true
    gpu = true
    // engineDir inferred
}

// the rest of the step can be created and executed automatically
// everything gets inferred:
// - model load
// - model run
// - cleanup

Following the Gradle philosophy of "convention over configuration", we could assume conventions for the framework and make that step completely optional as well. Something similar applies to cpu/gpu = true.

Can JDLL run in the Termux Linux emulator

Can JDLL run in the Android app known as Termux?

Termux is a Linux terminal emulator for Android, but its file structure is not identical to Linux. Libraries written in pure Java run fine in such an environment, and seeing that JDLL is only 0.1% written in C, I was wondering if I could do some binary-classification deep learning without using the 0.1% of code written in C. Will any and all AI/deep learning in JDLL require using the C code? I am indeed trying to write, compile and execute code using the JDLL library on my Android device (ARM 32-bit processor).

Support choosing which GPU to use

As suggested by @axtimwalde at today's BDV/ImgLib2 meeting: it would be nice if JDLL had engine-agnostic API for choosing which GPU to use, when a machine has access to multiple GPU options.

Removing the dependency on org.bioimageanalysis.icy

In the pom.xml there is a dependency on org.bioimageanalysis.icy. It is undesirable to require the download of the complete Icy package: this Java library shouldn't depend on any particular software. The same goes for the naming of the Java package; it should be something like org.bioimageio.

update developer guidelines for the library

Hi @carlosuc3m @Stephane-D @constantinpape @oeway @lmoyasans @ivan-ea @tinevez

As discussed in previous meetings, it would be nice to have some minimal information about how to use this library, just so others can integrate it as a backend. Nothing really fancy, but enough to start playing with it. How do you feel about it? Could there be a way in which we could help collaboratively?

I have created an entry in the BioImage Model Zoo documentation [Resources for developers/Java BioImageIO library] that will display the readme file of this repository as soon as the repo becomes public.

@fjug @arrmunoz @dasv74 @constantinpape
