
Comments (8)

cocoa-xu commented on May 30, 2024

Could be a separate library? Perhaps yes. But since nx is one of the dependencies of this project, it should be fine if I do it here.

from evision.

cocoa-xu commented on May 30, 2024

To get an ROI of an image, we may use the Access behaviour; the code would look something like the following:

img = Evision.imread!("image.jpg")
img[{10..30, 10..30, :all}]

from evision.

cocoa-xu commented on May 30, 2024

Checklist:

Required callbacks

  • constant
  • from_binary
  • eye
  • iota
  • random_uniform
  • random_normal
  • backend_deallocate
  • backend_copy
  • backend_transfer
  • to_batched
  • to_binary
  • inspect
  • as_type
  • bitcast
  • reshape
  • squeeze
  • broadcast
  • transpose
  • pad
  • reverse
  • dot
  • clip
  • slice
  • put_slice
  • take
  • take_along_axis
  • gather
  • concatenate
  • select
  • conv
  • all
  • any
  • sum
  • product
  • reduce_max
  • reduce_min
  • argmax
  • argmin
  • reduce
  • window_reduce
  • window_sum
  • window_product
  • window_max
  • window_min
  • map
  • sort
  • argsort
  • window_scatter_max
  • window_scatter_min
  • indexed_add
  • cholesky
  • lu
  • qr
  • triangular_solve
  • eigh
  • svd
  • add
  • subtract
  • multiply
  • power
  • remainder
  • divide
  • atan2
  • min
  • max
  • quotient
  • bitwise_and
  • bitwise_or
  • bitwise_xor
  • left_shift
  • right_shift
  • equal
  • not_equal
  • greater
  • less
  • greater_equal
  • less_equal
  • logical_and
  • logical_or
  • logical_xor
  • abs
  • bitwise_not
  • ceil
  • conjugate
  • floor
  • negate
  • round
  • sign
  • count_leading_zeros
  • population_count
  • real
  • imag

Optional callbacks

  • optional
  • solve
  • determinant
  • logical_not
  • cumulative_sum
  • cumulative_product
  • cumulative_min
  • cumulative_max

from evision.

vans163 commented on May 30, 2024

To get an ROI of an image, we may use the Access behaviour; the code would look something like the following:

img = Evision.imread!("image.jpg")
img[{10..30, 10..30, :all}]

I don't mind having a sane API. I feel Python goes off the rails with its syntax, and here is an article explaining just that: https://www.cigrainger.com/introducing-explorer/

By sane I mean

Evision.crop(img, %{x_begin: 10, x_end: 30, y_begin: 10, y_end: 30})

#(maybe instead of crop it could be called a generic mutate, since it will probably need to incorporate stride and other fun stuff)

from evision.

cocoa-xu commented on May 30, 2024

@vans163 thanks for the suggestion :) I agree that evision should have this kind of helper function. I plan to put them in a dedicated module, maybe Evision.Mat.Image. PRs and/or ideas are welcome!

from evision.

cocoa-xu commented on May 30, 2024

OpenCV does not support the following types

  • :s64
  • :u32
  • :u64

(Although it's possible to store values with those types using custom types, the resulting Mat/tensor will be incompatible with most existing functions in OpenCV)

The type inference function, Nx.Type.infer/1, in Nx.Backend returns {:s, 64} for integers by default. I wonder if it's possible to add an optional callback where the backend implementation can report which types are supported (or simply use the infer/1 function if it is present in a custom backend)?

cc @josevalim What do you think? If this sounds good, I can open a PR for this :)

from evision.

josevalim commented on May 30, 2024

Unfortunately I think this won't be enough. :( For example, we will automatically cast an int to s64 inside defn, and by the time we execute defn, we don't know the compiler/backend yet. There is a rewrite_types functionality, but that will push the concern to the users.

I think the best option for now is for you to simply treat s64 as s32 and document that the maximum precision is s32, so everything gets downcast. I would perhaps raise for u32/u64 though.

from evision.

cocoa-xu commented on May 30, 2024

For the :u32, :u64 and :s64 types, I looked into OpenCV's source code and I found that it's much more difficult than I thought to have full support for these types. I'll document all my findings here (will use :s64 for all the examples below).

1. cv::Mat_<_Tp>

At first glance, it's possible to instantiate an int64_t-type cv::Mat by using the template matrix class cv::Mat_<_Tp> with _Tp=int64_t.

template<typename _Tp> class Mat_ : public Mat
{
public:
    typedef _Tp value_type;
    typedef typename DataType<_Tp>::channel_type channel_type;
    typedef MatIterator_<_Tp> iterator;
    typedef MatConstIterator_<_Tp> const_iterator;
    // ... skipped
}

2. cv::DataType<_Tp>

In order to achieve that, we need to have the corresponding specialized template class for DataType<_Tp>.

#define CV_64S 8

namespace cv {
template<> class DataType<int64_t>
{
public:
    typedef int64_t     value_type;
    typedef int64_t     work_type;
    typedef value_type  channel_type;
    typedef value_type  vec_type;
    enum { generic_type = 0,
           depth        = CV_64S,
           channels     = 1,
           fmt          = (int)'I',
           type         = CV_MAKETYPE(depth, channels)
    };
};
}

3. Adding the custom CV_64S macro

Of course, the CV_64S macro (shown in the code above) does not exist in OpenCV's source code (as of OpenCV 4.6.0), so we have to define it ourselves.

Existing types (like CV_8U) are defined in modules/core/include/opencv2/core/hal/interface.h:

#define CV_CN_MAX     512
#define CV_CN_SHIFT   3
#define CV_DEPTH_MAX  (1 << CV_CN_SHIFT)

#define CV_8U   0
#define CV_8S   1
#define CV_16U  2
#define CV_16S  3
#define CV_32S  4
#define CV_32F  5
#define CV_64F  6
#define CV_16F  7

Here we see the first hard-coded thing: #define CV_CN_SHIFT 3. Because OpenCV has 8 pre-defined types, it uses the least-significant 3 bits in the cv::Mat's 32-bit flags member.

MSB                                   LSB
| 31..........................3 | 2...0 |
| ............................. | depth |
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxx | DDD   |

The other hard-coded thing is #define CV_CN_MAX 512, and 512 = 1 << 9, therefore, cv::Mat's channel information is stored from bit 3 to bit 11.

MSB                                        LSB
| 31................12 | 11......3 | 2...0 |
| .................... | channels  | depth |
| xxxxxxxxxxxxxxxxxxxx | CCCCCCCCC | DDD   |

The channel information in the cv::Mat is used by some OpenCV functions (via CV_MAT_CN(mat.type())) for sanity checks, for example, in functions that only work with 3-channel 2D images.

Now, let's suppose that we agreed we can reduce the number of bits for channels from 9 to 8, and use that saved 1 bit for depth:

MSB                                       LSB
| 31................12 | 11.....4 | 3...0 |
| .................... | channels | depth |
| xxxxxxxxxxxxxxxxxxxx | CCCCCCCC | DDDD  |

Then we can make the following modifications to that header file

#define CV_CN_MAX     256
#define CV_CN_SHIFT   4
#define CV_DEPTH_MAX  (1 << CV_CN_SHIFT)

#define CV_8U   0
#define CV_8S   1
#define CV_16U  2
#define CV_16S  3
#define CV_32S  4
#define CV_32F  5
#define CV_64F  6
#define CV_16F  7
// add the custom `CV_64S` macro
#define CV_64S  8
// and since now we can have up to $2^4=16$ types
// so it's possible to add `CV_64U` (`:u64`) and `CV_32U` (`:u32`) as well
#define CV_64U  9
#define CV_32U  10

On the surface, this looks pretty legit, and in fact, if you make all the changes above to OpenCV, you can compile and run a program that creates a cv::Mat with CV_64S as its type.

#include <iostream>
#include <opencv2/opencv.hpp>
#include <vector>

using namespace cv;

template <typename T, typename AS=T>
void print_data(cv::Mat& mat, const char * name) {
    for (int i = 0; i < 3; i++) {
        std::cout << name << '[' << i << "]: " << (AS)mat.template at<T>(i) << '\n';
    }
    std::cout << '\n';
}

int main() {
    std::vector<int64_t> data1 = {INT64_MAX, INT64_MAX - 1, INT64_MAX - 2};
    std::vector<int64_t> data2 = {0, 1, 2};
    std::vector<int> as_shape = {1, 1, 3};
    cv::Mat mat1((int)as_shape.size(), as_shape.data(), CV_64S, data1.data());
    cv::Mat mat2((int)as_shape.size(), as_shape.data(), CV_64S, data2.data());

    print_data<uint64_t>(mat1, "mat1");
    print_data<uint64_t>(mat2, "mat2");
}

The output is

mat1[0]: 9223372036854775807
mat1[1]: 9223372036854775806
mat1[2]: 9223372036854775805

mat2[0]: 0
mat2[1]: 1
mat2[2]: 2

4. The magic number -- 0x28442211

However, once we try to do some operation on them, even the simplest one, like adding two matrices, we would get an incorrect result:

int main() {
    // ... skipped
    print_data<uint64_t>(mat2, "mat2");

    // add `mat1` and `mat2`
    auto mat3 = cv::Mat(mat1 + mat2);
    print_data<uint64_t>(mat3, "mat3");
}

The output is:

mat1[0]: 9223372036854775807
mat1[1]: 9223372036854775806
mat1[2]: 9223372036854775805

mat2[0]: 0
mat2[1]: 1
mat2[2]: 2

mat3[0]: 16777215
mat3[1]: 0
mat3[2]: 0

Obviously, we got some wrong numbers. But we do have some clues from the value 16777215, which is $2^{24}-1$, or in other words, 0xFF_FF_FF.

This means somewhere deep inside OpenCV, it still thinks that these matrices are some other type instead of CV_64S.

After a quick grep through OpenCV's code base, the following lines in particular drew my attention (in modules/core/include/opencv2/core/cvdef.h):

/** Size of each channel item,
   0x28442211 = 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem) */
#define CV_ELEM_SIZE1(type) ((0x28442211 >> CV_MAT_DEPTH(type)*4) & 15)

It's a pretty compact way to store the size info of all 8 data types into a single 32-bit integer.

// LSB
// 0001
#define CV_8U   0
// 0001
#define CV_8S   1
// 0010
#define CV_16U  2
// 0010
#define CV_16S  3
// 0100
#define CV_32S  4
// 0100
#define CV_32F  5
// 1000
#define CV_64F  6
// MSB
// 0010
#define CV_16F  7

I would probably do the same thing if I knew that my library would only deal with 8 data types.

Nevertheless, for this line, it's still relatively simple to change it so that it fits our needs.

As a reminder, we've added 3 types after the existing ones:

// add the custom `CV_64S` macro
#define CV_64S  8
// and since now we can have up to $2^4=16$ types
// so it's possible to add `CV_64U` (`:u64`) and `CV_32U` (`:u32`) as well
#define CV_64U  9
#define CV_32U  10

Hence we should prepend three 4-bit size entries to this magic number:

/** Original
   0x28442211 = 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem) 

   Size of each channel item (new),
   0x48828442211 = 0100 1000 1000 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem) 

    MSB
    0100 - CV_32U
    1000 - CV_64U
    1000 - CV_64S
    ...
    LSB
*/

#define CV_ELEM_SIZE1(type) (int)((0x48828442211 >> CV_MAT_DEPTH(type)*4) & 15)

5. More changes needed, but is it worth the effort?

Well, it would be a happy ending if it worked after all the patches above, but I found more hard-coded things in OpenCV's code base, for example, this data conversion function in modules/core/src/matrix_sparse.cpp

static ConvertData getConvertElem(int fromType, int toType)
{
    static ConvertData tab[][8] =
    {{ convertData_<uchar, uchar>, convertData_<uchar, schar>,
      convertData_<uchar, ushort>, convertData_<uchar, short>,
      convertData_<uchar, int>, convertData_<uchar, float>,
      convertData_<uchar, double>, 0 },

    { convertData_<schar, uchar>, convertData_<schar, schar>,
      convertData_<schar, ushort>, convertData_<schar, short>,
      convertData_<schar, int>, convertData_<schar, float>,
      convertData_<schar, double>, 0 },

    { convertData_<ushort, uchar>, convertData_<ushort, schar>,
      convertData_<ushort, ushort>, convertData_<ushort, short>,
      convertData_<ushort, int>, convertData_<ushort, float>,
      convertData_<ushort, double>, 0 },

    { convertData_<short, uchar>, convertData_<short, schar>,
      convertData_<short, ushort>, convertData_<short, short>,
      convertData_<short, int>, convertData_<short, float>,
      convertData_<short, double>, 0 },

    { convertData_<int, uchar>, convertData_<int, schar>,
      convertData_<int, ushort>, convertData_<int, short>,
      convertData_<int, int>, convertData_<int, float>,
      convertData_<int, double>, 0 },

    { convertData_<float, uchar>, convertData_<float, schar>,
      convertData_<float, ushort>, convertData_<float, short>,
      convertData_<float, int>, convertData_<float, float>,
      convertData_<float, double>, 0 },

    { convertData_<double, uchar>, convertData_<double, schar>,
      convertData_<double, ushort>, convertData_<double, short>,
      convertData_<double, int>, convertData_<double, float>,
      convertData_<double, double>, 0 },

    { 0, 0, 0, 0, 0, 0, 0, 0 }};

    ConvertData func = tab[CV_MAT_DEPTH(fromType)][CV_MAT_DEPTH(toType)];
    CV_Assert( func != 0 );
    return func;
}

Again, it's not hard to add a few specialized template functions of convertData_. The core issue here from my perspective is -- is it worth all the effort?

The reasons why I hesitate to go further are that:

  1. Even if I managed to find all the relevant hard-coded lines and patched them correctly, only a limited set of OpenCV operations would be available for these added types.

  2. The raw_type in Evision.Mat (or the value of int cv::Mat::type()) will differ from the ones returned by the official build.

    It doesn't seem to be a huge problem at first glance; however, OpenCV does have the functionality to persist/serialise cv::Mat to disk. Therefore, if one tries to load serialised data that was generated by the original code, it would fail or return wrong data.

    Simply put, the header part of the serialised data will be different because we changed what the underlying bits represent in cv::Mat's flags member.

  3. Even if we somehow managed to recognise whether the serialised data was produced by the modified code or the original one, the number of patches, together with all the Python code in this project, would make it even harder for anyone who's willing to contribute to this project.

  4. It's possible to submit all the patches upstream (to OpenCV), yet I personally highly doubt that they would accept the PR, for two reasons:

    1. all the compatibility issues (as in 2.);
    2. these types are not often used in computer vision (otherwise OpenCV would have supported these types in the first place).
  5. Even if they were willing to add these types, the new types would not be available until the next major update (OpenCV 5.0) because of these compatibility issues.

    For example, CV_USRTYPE1 was available in OpenCV 3.x, and OpenCV decided to replace CV_USRTYPE1 with CV_16F (half-precision float). But they had to do that in a major update, i.e., OpenCV 4.0.

    // modules/core/include/opencv2/core/hal/interface.h
    
    // in OpenCV 3.x
    #define CV_8U   0
    #define CV_8S   1
    #define CV_16U  2
    #define CV_16S  3
    #define CV_32S  4
    #define CV_32F  5
    #define CV_64F  6
    #define CV_USRTYPE1  7
    
    // in OpenCV 4.x
    #define CV_USRTYPE1 (void)"CV_USRTYPE1 support has been dropped in OpenCV 4.0"
    
    #define CV_8U   0
    #define CV_8S   1
    #define CV_16U  2
    #define CV_16S  3
    #define CV_32S  4
    #define CV_32F  5
    #define CV_64F  6
    #define CV_16F  7

6. Any workarounds?

There are two workarounds that I can think of at the moment, and each has different trade-offs.

a. Map these types to some other types

It's possible to set a map for those unsupported types in the config.exs file.

config :evision, unsupported_type_map: %{
  {:s, 64} => {:f, 64},
  {:u, 64} => {:f, 64},
  {:u, 32} => {:f, 32}
}

The above config would map :s64 and :u64 to :f64, and map :u32 to :f32. And the very first drawback is that it would be a totally different type. Secondly, value-wise :f64 does not cover every single possible value of :u64 or :s64.

A 64-bit double (assuming the IEEE 754 standard) has 52 bits of mantissa, so every integer of magnitude up to $2^{53}-1$ can be stored in a double without losing precision, whereas larger integers are not all representable. (reference: stackoverflow)

b. Use other Nx backends

i) :nx

Nx.BinaryBackend is implemented in pure Elixir, and :nx is a dependency of this library, so you can use it out of the box. However, Nx.BinaryBackend could be really slow if you have a relatively large matrix.

ii) :torchx

Torchx.Backend is another Nx backend and it uses libtorch. Very fast and superb library, but the official prebuilt binaries of libtorch only support x86_64 CPUs (and Apple Silicon (aarch64-apple-darwin) via brew).

from evision.
