
bioimage-io / jdll


The Java library to run Deep Learning models

Home Page: https://github.com/bioimage-io/JDLL/wiki

License: Apache License 2.0

Java 93.81% Python 6.10% C 0.09%
bioimage-io deep-learning imglib2 java onnx pytorch tensorflow

jdll's People

Contributors

axtimwalde, carlosuc3m, constantinpape, ctrueden, dependabot[bot], djpbarry, hinerm, noam-dori, stefanhahmann, stephane-d, tinevez, tpietzsch


jdll's Issues

Backend of the model-runner tensors

Hello everyone,
In this issue I want to propose the library Nd4j as the new backend for the model-runner tensors, at least temporarily. As we discussed previously, for convenience the backend was going to be the NDArrays from DJL. However, I found that these NDArrays need an underlying native library (or "engine", as they call it) to work. Currently there are only two engines capable of supporting NDArrays.

This limitation could cause conflicts, because the native library for those two engines would always have to be loaded in order to use these NDArrays. This is why I would recommend using another backend. For now I have continued developing the library on another branch using a different library, Nd4j, as the backend.

This library works in a similar manner to DJL. It also uses JavaCPP as the backend to load C++ native libraries (OpenBLAS, for example), and its arrays (called INDArrays) provide predefined operations such as mean and allow accessing positions via indexing, much like NumPy arrays.

On the other hand, the memory management of these INDArrays is not the best, so we should be quite careful with it.

If you agree with this, I can merge the branch into the main one, and we can keep this solution at least temporarily.
I think that at some point we should also move away from this library, because it is quite heavy and because of its memory management, but it is the fastest and simplest transition at the moment.

I also looked at a JNI wrapper for NumPy, which seems quite nice (also using JavaCPP), but it is almost like writing C++ in Java. It is not simple at all.
Regards,
Carlos

@Stephane-D @tinevez @tomburke-rse @petebankhead @KateMoreva @xion16lm

Dependency artifact not found

Hi, I was trying to use the library as an external dependency. I followed the
instructions in the documentation. However, I encountered the error below:

Could not find artifact org.bioimageanalysis.icy:dl-model-runner:pom:1.0.0 in icy (https://icy-nexus.pasteur.fr/repository/Icy/)

I added several repositories, but the library was still not found. Is there a particular repository that should be added? Or is the library not available for the time being?

Confusion with axis order (F vs C order, ImgLib2 vs NumPy)

I think there is quite a bit of unnecessary back-and-forth transposition of axes currently.

Here is how it should work:

Assume Python wants a numpy.ndarray with axis CYX (in c order) and shape [2,3,4].
That is, 2 channels, height=3, width=4.
So in total 2 * 3 * 4 = 24 elements.

Let's make a flat buffer containing elements (0, 1, 2, ..., 23).
For Python, if we just reshape([2,3,4]) that, we get

[[[ 0.  1.  2.  3.]
  [ 4.  5.  6.  7.]
  [ 8.  9. 10. 11.]]

 [[12. 13. 14. 15.]
  [16. 17. 18. 19.]
  [20. 21. 22. 23.]]]

as desired.

For ImgLib2, the same flat buffer containing elements (0, 1, 2, ..., 23) wraps in f order with axis XYC and shape [4,3,2].

Translating that to EfficientSamJ:

SAM wants [3,h,w], so CYX in c order.
We could just wrap a flat array as XYC with dimensions [w,h,3] in f order in ImgLib2.
Then reshape the same flat array in Python as [3,h,w] and be done.

Instead, currently:

  • We wrap the flat array as CYX (dimensions [3,h,w]) in f order in ImgLib2.
  • Then we use Views to transpose the axes to XYC (dimensions [w,h,3]) to be able to write to it "normally".
  • Because we wrapped as CYX in f order, that is then of course XYC (shape [w,h,3]) in c order in Python.
  • We have to do the np.transpose(im.astype('float32'), (2, 0, 1)) in order to pass it to PyTorch.

We could avoid both the Views transposition and the np.transpose.

Also, in np.transpose(im.astype('float32'), (2, 0, 1)) the order (2,0,1) should probably be (2,1,0).
I think we pass the images with the X and Y axes flipped. SAM doesn't care, of course. Probably there is a corresponding flip in the coordinates of the prompt point list, etc. (maybe even another explicit np.transpose?)
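The C-order/F-order equivalence described above can be checked directly in NumPy. This is a small sketch using the same 2×3×4 example as earlier in the thread; it is illustrative, not part of JDLL:

```python
import numpy as np

# Flat buffer 0..23, standing in for the pixel data shared with ImgLib2.
flat = np.arange(24, dtype=np.float32)

# Python side: reshape as CYX in C order (2 channels, h=3, w=4).
cyx_c = flat.reshape(2, 3, 4)

# ImgLib2 side: the same flat buffer wrapped as XYC in F order, shape [4,3,2].
xyc_f = flat.reshape(4, 3, 2, order="F")

# Both views address the very same element for a given (c, y, x):
# cyx_c[c, y, x] == xyc_f[x, y, c], i.e. one is the full axis
# reversal (2, 1, 0) of the other. No data is moved.
assert np.array_equal(cyx_c, xyc_f.transpose(2, 1, 0))

# Neither reshape copied the buffer; both are views on `flat`.
assert cyx_c.base is flat and xyc_f.base is flat
```

In other words, wrapping the flat array as XYC/[w,h,3] in F order on the Java side and reshaping it as [3,h,w] in C order on the Python side are two views of the same memory, which is why both the Views transposition and the np.transpose can be dropped.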

Consider to what extent JDLL should support (re)training

As of this writing, JDLL supports running models, but not training / retraining / fine tuning. There is an open question about whether it should do so in an engine-agnostic way, and if so, to what extent, and what sorts of training patterns and configurations to accommodate.

@axtimwalde suggests that there are some very common patterns that could be supported pretty easily.

My perspective is that we should start by having some Java-based plugins that need to do training/tuning use Appose directly to invoke the deep learning framework training API of their choice. Once we have several such plugins doing this, we can scrutinize them for commonalities and consider what sorts of API might be worth generalizing into JDLL, if any. My intuition is that the needs will be rather diverse, and most or all plugins will not need such an engine-agnostic API from Java itself; nonetheless, JDLL could provide at least a subset of training functionality in an engine-agnostic way, if there is value in doing so.

Finally, there is also the bioimage-io engine that runs models on the server side as services, and we could simply say that if you want to do training, you should rely on that mechanism rather than running it locally on your own hardware (bioimage-io engine can run locally also, although it is heavier weight than using an in-process or interprocess/Appose-based approach, due to the Hypha server backend).

DecodeNumpy via BufferAccess classes.

Here's an alternate version of DecodeNumpy.build:

https://github.com/bioimage-io/model-runner-java/blob/ea4bec4616d81ce63c3fb1ed4e6c13aeb0e4c53c/src/main/java/io/bioimage/modelrunner/numpy/DecodeNumpy.java#L248

This may require a version of ImgLib2 with imglib/imglib2#299. The earliest such version would be imglib2-5.13.0.

Currently, pom-scijava is at imglib2-5.12.0, but @ctrueden mentioned he was interested in releasing a pom-scijava with a version bump to imglib2-5.13.0.

The main reason for this change is so that it does not copy the ByteBuffer but rather uses it directly.

public static <T extends NativeType<T>> Img<T> build(ByteBuffer buf, ByteOrder byteOrder, String dtype, long[] shape) throws IllegalArgumentException
{
    buf.order(byteOrder);
    if (dtype.equals("byte")) {
        ByteAccess access = new ByteBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.bytes(access, shape);
    } else if (dtype.equals("ubyte")) {
        ByteAccess access = new ByteBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.unsignedBytes(access, shape);
    } else if (dtype.equals("int16")) {
        ShortAccess access = new ShortBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.shorts(access, shape);
    } else if (dtype.equals("uint16")) {
        ShortAccess access = new ShortBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.unsignedShorts(access, shape);
    } else if (dtype.equals("int32")) {
        IntAccess access = new IntBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.ints(access, shape);
    } else if (dtype.equals("uint32")) {
        IntAccess access = new IntBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.unsignedInts(access, shape);
    } else if (dtype.equals("int64")) {
        LongAccess access = new LongBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.longs(access, shape);
    } else if (dtype.equals("float32")) {
        FloatAccess access = new FloatBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.floats(access, shape);
    } else if (dtype.equals("float64")) {
        DoubleAccess access = new DoubleBufferAccess(buf, true);
        return (Img<T>) ArrayImgs.doubles(access, shape);
    } else {
        throw new IllegalArgumentException("Unsupported tensor type: " + dtype);
    }
}
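The point of the BufferAccess variant is to wrap the incoming ByteBuffer directly instead of copying it. The NumPy analogue of the same zero-copy idea (a sketch for illustration, not part of JDLL) is np.frombuffer, which builds an array over existing bytes without copying them:

```python
import numpy as np

# Raw little-endian float32 payload, as it might come out of an .npy file.
raw = np.arange(6, dtype="<f4").tobytes()

# frombuffer wraps the existing bytes: no copy, just an array view.
arr = np.frombuffer(raw, dtype="<f4").reshape(2, 3)

assert arr.shape == (2, 3)
assert arr[1, 2] == 5.0
# The array does not own its data; it is backed by `raw` directly,
# just as the ByteBufferAccess-backed Img is backed by the ByteBuffer.
assert not arr.flags.owndata
```

The Java version above achieves the same thing: each ByteBufferAccess/ShortBufferAccess/etc. reads through the original buffer, so mutations and reads go straight to the decoded .npy payload with no intermediate copy.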

Clarify relations of the java libraries

Hi @carlosuc3m,
great to see all the progress of the java libraries!
For adoption, I think it would be very helpful to have an overview of which libraries exist and how they relate to each other.
I have created a small PR #4 to add this to the README (as far as I have an overview of the libraries), but I have two more questions:

java runner in Fiji

Hi!

This issue is to follow the discussion with @uschmidt83 in issue stardist/stardist#68 and see if there's something that would be nice to consider for all the plugins.

I include @ivan-ea @carlosuc3m @lmoyasans and @cfusterbarcelo as they are working on the integration of the library in deepImageJ and how to deal with the dependencies inside Fiji.

  • The java library can deal with different versions of TF, PyTorch and ONNX. Our plan was to provide all of them in a specific folder inside Fiji.
  • The java library is getting integrated inside deepImageJ but most probably in the future, it will be given as a dependency in the Update Sites (similar to the TF manager).

Please note we're trying to be a bit quick with this to update the paper status and rebuttal, but it would be nice if, in the longer term, we could think about this carefully to make things clean and easier for everyone.

Easier configuration/running

So, I got pulled into JDLL by the Spanish folks in Prague.

I was looking at the README and the first thing I thought is that I might help you folks with some automation: a template ready to clone and start playing from there.

It's based on Kotlin (and some or all of it in Gradle); I'm pretty confident I could provide something along these lines:

// 0. Setting Up JDLL
// no need, just clone the template repo

// 1. Downloading a model (optional)
downloadModel {
     // enum, statically typed
    model = Model.`B. Sutilist bacteria segmentation - Widefield microscopy - 2D UNet`
    // set with some default value, but customizable
    //dst = projectDir / "models"
}

// 2. Installing DL engines
// we may also implement all the necessary logic expressing the compatibility among
// the different DL frameworks, OSes and architectures, printing errors if incompatible
// or warnings if a best-effort attempt is being made
framework {
    // if engine, cpu and gpu are not specified, then 
    // `EngineInstall::installEnginesForModelByNameinDir` will be called
    // engine = Tensorflow.`2.0` // also enum, statically typed
    // cpu = true
    // gpu = true
    // set with some default value, but customizable
    installationDir = projectDir / "engines"
}
// will automatically fail if `!installed`

// 3. Creating the tensors
val img1 = model.create<FloatType>() // [1, 512, 512, 1] inferred from `model`
tensor {
     input = build(model.inputs.bxyc, img1) // "input_1" might be inferred
     outputEmpty = buildEmptyTensor(model.outputs.bxyc) // "conv2d_19" might be inferred
     outputBlankTensor = buildBlankTensor<FloatType>(model.outputs.bxyc) // [1, 512, 512, 3] inferred
}

// 4. Loading the model
dlEngine { // or dlCompatibleEngine {
    framework = TensorFlow.`2.7.0`
    cpu = true
    gpu = true
    // engineDir inferred
}

// the rest of the step can be created and executed automatically
// everything gets inferred:
// - model load
// - model run
// - cleanup

Following the Gradle philosophy of "convention over configuration", we could assume conventions for the framework and make that step completely optional as well. Something similar applies to cpu/gpu = true.

Can JDLL run in the Termux Linux emulator

Can JDLL run in the Android app known as Termux?

Termux is a Linux terminal emulator for Android, but its file structure is not identical to Linux. Libraries written in pure Java run fine in such an environment, and seeing that JDLL is only 0.1% written in C, I was wondering if I could do some binary-classification deep learning without using the 0.1% of code written in C. Will any and all AI/deep learning in JDLL require using the C code? I am indeed trying to write, compile and execute code using the JDLL library on my Android device (ARM 32-bit processor).

Support choosing which GPU to use

As suggested by @axtimwalde at today's BDV/ImgLib2 meeting: it would be nice if JDLL had engine-agnostic API for choosing which GPU to use, when a machine has access to multiple GPU options.

Removing the dependency on org.bioimageanalysis.icy

In the pom.xml there is a dependency on org.bioimageanalysis.icy. It is undesirable to require the download of the complete Icy package: this Java library shouldn't depend on any particular software. The same goes for the naming of the Java package; it should be something like org.bioimageio.

update developer guidelines for the library

Hi @carlosuc3m @Stephane-D @constantinpape @oeway @lmoyasans @ivan-ea @tinevez

As discussed in previous meetings, it would be nice to have some minimal information about how to use this library, just so others can integrate it as a backend. Nothing really fancy, but enough to start playing with it. How do you feel about it? Could there be a way in which we could help collaboratively?

I have created an entry in the BioImage Model Zoo documentation [Resources for developers/Java BioImageIO library] that will display the readme file of this repository as soon as the repo becomes public.

@fjug @arrmunoz @dasv74 @constantinpape
