Comments (8)
Could be a separate library? Perhaps yes. But since nx is one of the dependencies of this project, it should be fine if I do it here.
from evision.
To get an ROI of an image, we may use the Access behaviour. The code would look something like the following:
img = Evision.imread!("image.jpg")
img[{10..30, 10..30, :all}]
Checklist:
Required callbacks
- constant
- from_binary
- eye
- iota
- random_uniform
- random_normal
- backend_deallocate
- backend_copy
- backend_transfer
- to_batched
- to_binary
- inspect
- as_type
- bitcast
- reshape
- squeeze
- broadcast
- transpose
- pad
- reverse
- dot
- clip
- slice
- put_slice
- take
- take_along_axis
- gather
- concatenate
- select
- conv
- all
- any
- sum
- product
- reduce_max
- reduce_min
- argmax
- argmin
- reduce
- window_reduce
- window_sum
- window_product
- window_max
- window_min
- map
- sort
- argsort
- window_scatter_max
- window_scatter_min
- indexed_add
- cholesky
- lu
- qr
- triangular_solve
- eigh
- svd
- add
- subtract
- multiply
- power
- remainder
- divide
- atan2
- min
- max
- quotient
- bitwise_and
- bitwise_or
- bitwise_xor
- left_shift
- right_shift
- equal
- not_equal
- greater
- less
- greater_equal
- less_equal
- logical_and
- logical_or
- logical_xor
- abs
- bitwise_not
- ceil
- conjugate
- floor
- negate
- round
- sign
- count_leading_zeros
- population_count
- real
- imag
Optional callbacks
- optional
- solve
- determinant
- logical_not
- cumulative_sum
- cumulative_product
- cumulative_min
- cumulative_max
I don't mind having a sane API. I feel Python goes off the rails with its syntax, and here is an article explaining just that: https://www.cigrainger.com/introducing-explorer/
By sane I mean
Evision.crop(img, %{x_begin: 10, x_end: 30, y_begin: 10, y_end: 30})
#(maybe instead of crop it could be called a generic mutate, since it will probably need to incorporate stride and other fun stuff)
@vans163 thanks for the suggestion :) I agree that evision should have this kind of helper function. I plan to put them in a dedicated module, maybe Evision.Mat.Image. PRs and/or ideas are welcome!
OpenCV does not support the following types:
- :s64
- :u32
- :u64
(Although it's possible to store values of those types using custom types, the resulting Mat/tensor will be incompatible with most existing functions in OpenCV.)
The type inference function, Nx.Type.infer/1, in Nx.Backend returns {:s, 64} for integers by default. I wonder if it's possible to add an optional callback where the backend implementation can report which types are supported (or simply use the infer/1 function if it is present in a custom backend)?
cc @josevalim What do you think? If this sounds good, I can open a PR for this :)
Unfortunately I think this won't be enough. :( For example, inside defn, we will automatically cast an int to s64, and by the time we execute defn, we don't know the compiler/backend yet. There is a rewrite_types functionality but that will push the concern onto the users.
I think the best option for now is for you to simply treat s64 as s32 and document that the maximum precision is s32, so everything gets downcast. I would perhaps raise for u32/u64 though.
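One way to picture that suggestion is the rough sketch below. It is written in C++ only for illustration and is not evision's actual code: the `NxType` tags, `kUnsupported` and `to_cv_depth` are hypothetical names, and the depth codes are the `CV_8U` ... `CV_32S` values from OpenCV's interface.h.

```cpp
// Hypothetical tags for Nx integer types; not part of any real API.
enum class NxType { u8, s8, u16, s16, s32, s64, u32, u64 };

// Sentinel meaning "no OpenCV depth exists; the caller should raise".
constexpr int kUnsupported = -1;

// Map an Nx integer type to an OpenCV depth code
// (CV_8U=0, CV_8S=1, CV_16U=2, CV_16S=3, CV_32S=4).
constexpr int to_cv_depth(NxType t) {
    return t == NxType::u8  ? 0
         : t == NxType::s8  ? 1
         : t == NxType::u16 ? 2
         : t == NxType::s16 ? 3
         : t == NxType::s32 ? 4
         : t == NxType::s64 ? 4   // downcast: s64 is stored as s32
         : kUnsupported;          // u32 / u64: report as unsupported
}

static_assert(to_cv_depth(NxType::s64) == to_cv_depth(NxType::s32),
              "s64 shares the s32 depth");
static_assert(to_cv_depth(NxType::u64) == kUnsupported, "u64 is rejected");
```

The sketch covers only the type mapping. Values outside the s32 range would still need a documented policy when they are narrowed.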
For the :u32, :u64 and :s64 types, I looked into OpenCV's source code and found that full support for these types is much more difficult than I thought. I'll document all my findings here (using :s64 for all the examples below).
1. cv::Mat_<_Tp>
At first glance, it's possible to instantiate an int64_t-type cv::Mat by using the template matrix class cv::Mat_<_Tp> with _Tp=int64_t.
template<typename _Tp> class Mat_ : public Mat
{
public:
    typedef _Tp value_type;
    typedef typename DataType<_Tp>::channel_type channel_type;
    typedef MatIterator_<_Tp> iterator;
    typedef MatConstIterator_<_Tp> const_iterator;
    // ... skipped
};
2. cv::DataType<_Tp>
In order to achieve that, we need a corresponding specialized template class for DataType<_Tp>.
#define CV_64S 8
namespace cv {
template<> class DataType<int64_t>
{
public:
typedef int64_t value_type;
typedef int64_t work_type;
typedef value_type channel_type;
typedef value_type vec_type;
enum { generic_type = 0,
depth = CV_64S,
channels = 1,
fmt = (int)'I',
type = CV_MAKETYPE(depth, channels)
};
};
}
3. Adding the custom CV_64S macro
Of course, the CV_64S macro (shown in the code above) does not exist in OpenCV's source code (as of OpenCV 4.6.0), so we have to define it ourselves.
Existing types (like CV_8U) are defined in modules/core/include/opencv2/core/hal/interface.h:
#define CV_CN_MAX 512
#define CV_CN_SHIFT 3
#define CV_DEPTH_MAX (1 << CV_CN_SHIFT)
#define CV_8U 0
#define CV_8S 1
#define CV_16U 2
#define CV_16S 3
#define CV_32S 4
#define CV_32F 5
#define CV_64F 6
#define CV_16F 7
Here we see the first hard-coded thing: #define CV_CN_SHIFT 3. Because OpenCV has 8 pre-defined types, it uses the least-significant 3 bits of cv::Mat's 32-bit flags member for the depth.
Bit | 31-3 | 2-0 |
---|---|---|
Field | | DDD |
MSB                                 LSB
31............................| 2...0 |
|.............................| depth |
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxx| DDD |
The other hard-coded thing is #define CV_CN_MAX 512. Since 512 = 1 << 9, cv::Mat's channel information takes 9 bits and is stored from bit 3 to bit 11.
Bit | 31-12 | 11-3 | 2-0 |
---|---|---|---|
Field | | CCCCCCCCC | DDD |
MSB                                  LSB
31...................| 11......3 | 2...0 |
|....................| channels | depth |
|xxxxxxxxxxxxxxxxxxxx| CCCCCCCCC | DDD |
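The layout above can be sketched with a few constexpr helpers. This is a minimal sketch that mirrors the arithmetic of OpenCV's CV_MAKETYPE, CV_MAT_DEPTH and CV_MAT_CN macros as I understand them; the helper and constant names are made up for the illustration. Note that the channel field stores the channel count minus one, which is how 512 channels fit into 9 bits.

```cpp
// Values taken from the interface.h excerpt (OpenCV 4.x).
constexpr int kCnShift = 3;                          // CV_CN_SHIFT
constexpr int kCnMax = 512;                          // CV_CN_MAX
constexpr int kDepthMask = (1 << kCnShift) - 1;      // bits 0..2
constexpr int kCnMask = (kCnMax - 1) << kCnShift;    // bits 3..11

// Hypothetical helpers mirroring CV_MAKETYPE, CV_MAT_DEPTH and CV_MAT_CN.
constexpr int make_type(int depth, int channels) {
    return (depth & kDepthMask) + ((channels - 1) << kCnShift);
}
constexpr int depth_of(int type) { return type & kDepthMask; }
constexpr int channels_of(int type) {
    return ((type & kCnMask) >> kCnShift) + 1;
}

// CV_8U (depth 0) with 3 channels packs to 16, the value of CV_8UC3.
static_assert(make_type(0, 3) == 16, "8UC3");
// CV_64F (depth 6) with 2 channels round-trips through the packed value.
static_assert(depth_of(make_type(6, 2)) == 6, "depth round-trips");
static_assert(channels_of(make_type(6, 2)) == 2, "channels round-trip");
// The largest channel count still fits in the 9-bit field.
static_assert(channels_of(make_type(0, 512)) == 512, "512 channels fit");
```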
The channels information in the cv::Mat is used by some OpenCV functions (via CV_MAT_CN(mat.type())) for sanity checks, for example, in functions that only work with 3-channel 2D images.
Now, let's suppose we agree that we can reduce the number of bits for channels from 9 to 8, and use the saved bit for depth:
Bit | 31-12 | 11-4 | 3-0 |
---|---|---|---|
Field | | CCCCCCCC | DDDD |
MSB                                  LSB
31...................| 11.....4 | 3...0 |
|....................| channels | depth |
|xxxxxxxxxxxxxxxxxxxx| CCCCCCCC | DDDD |
Then we can make the following modifications to that header file
#define CV_CN_MAX 256
#define CV_CN_SHIFT 4
#define CV_DEPTH_MAX (1 << CV_CN_SHIFT)
#define CV_8U 0
#define CV_8S 1
#define CV_16U 2
#define CV_16S 3
#define CV_32S 4
#define CV_32F 5
#define CV_64F 6
#define CV_16F 7
// add the custom `CV_64S` macro
#define CV_64S 8
// and since now we can have up to $2^4=16$ types
// so it's possible to add `CV_64U` (`:u64`) and `CV_32U` (`:u32`) as well
#define CV_64U 9
#define CV_32U 10
On the surface, this looks pretty legit. In fact, if you make all the changes mentioned above to OpenCV, you can compile code that creates a cv::Mat initialized with CV_64S as its type.
#include <cstdint>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

// Print the first three elements of `mat`, reading each as T and printing it as AS.
template <typename T, typename AS=T>
void print_data(cv::Mat& mat, const char * name) {
    for (int i = 0; i < 3; i++) {
        std::cout << name << '[' << i << "]: " << (AS)mat.template at<T>(i) << '\n';
    }
    std::cout << '\n';
}

int main() {
    std::vector<int64_t> data1 = {INT64_MAX, INT64_MAX - 1, INT64_MAX - 2};
    std::vector<int64_t> data2 = {0, 1, 2};
    std::vector<int> as_shape = {1, 1, 3};
    // CV_64S is the custom depth added by the patch above
    cv::Mat mat1((int)as_shape.size(), as_shape.data(), CV_64S, data1.data());
    cv::Mat mat2((int)as_shape.size(), as_shape.data(), CV_64S, data2.data());
    print_data<uint64_t>(mat1, "mat1");
    print_data<uint64_t>(mat2, "mat2");
}
The output is
mat1[0]: 9223372036854775807
mat1[1]: 9223372036854775806
mat1[2]: 9223372036854775805
mat2[0]: 0
mat2[1]: 1
mat2[2]: 2
4. The magic number -- 0x28442211
However, once we try to do some operation on them, even the simplest one, like adding two matrices, we would get an incorrect result:
int main() {
// ... skipped
print_data<uint64_t>(mat2, "mat2");
// add `mat1` and `mat2`
auto mat3 = cv::Mat(mat1 + mat2);
print_data<uint64_t>(mat3, "mat3");
}
The output is:
mat1[0]: 9223372036854775807
mat1[1]: 9223372036854775806
mat1[2]: 9223372036854775805
mat2[0]: 0
mat2[1]: 1
mat2[2]: 2
mat3[0]: 16777215
mat3[1]: 0
mat3[2]: 0
Obviously, we got some wrong numbers. But we do have a clue in the value 16777215, which is 0xFF_FF_FF.
This means that somewhere deep inside OpenCV, it still thinks that these matrices are of some other type instead of CV_64S.
After a quick grep in OpenCV's code base, the following lines in particular drew my attention (in modules/core/include/opencv2/core/cvdef.h):
/** Size of each channel item,
0x28442211 = 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem) */
#define CV_ELEM_SIZE1(type) ((0x28442211 >> CV_MAT_DEPTH(type)*4) & 15)
It's a pretty compact way to store the size info of all 8 data types into a single 32-bit integer.
// LSB
// 0001
#define CV_8U 0
// 0001
#define CV_8S 1
// 0010
#define CV_16U 2
// 0010
#define CV_16S 3
// 0100
#define CV_32S 4
// 0100
#define CV_32F 5
// 1000
#define CV_64F 6
// MSB
// 0010
#define CV_16F 7
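To make the lookup concrete, here is a minimal sketch of the nibble decoding that CV_ELEM_SIZE1 performs. The function name `elem_size1` is made up, and the constant is widened to 64 bits as an assumption of this sketch: in the real macro it is a 32-bit int, so shifting it by 32 bits (depth 8) would be undefined behaviour in C++. With the widened constant, the lookup for depth 8 simply finds no size.

```cpp
#include <cstdint>

// Decode nibble number `depth` of the constant, counting from the
// least-significant end, the same arithmetic CV_ELEM_SIZE1 uses.
constexpr int elem_size1(std::uint64_t magic, int depth) {
    return static_cast<int>((magic >> (depth * 4)) & 15);
}

constexpr std::uint64_t kOriginalMagic = 0x28442211;

static_assert(elem_size1(kOriginalMagic, 0) == 1, "CV_8U  is 1 byte");
static_assert(elem_size1(kOriginalMagic, 4) == 4, "CV_32S is 4 bytes");
static_assert(elem_size1(kOriginalMagic, 6) == 8, "CV_64F is 8 bytes");
static_assert(elem_size1(kOriginalMagic, 7) == 2, "CV_16F is 2 bytes");
// Depth 8 (the custom CV_64S) lies beyond the eight stored nibbles.
static_assert(elem_size1(kOriginalMagic, 8) == 0, "no size recorded");
```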
I would probably do the same thing if I knew that my library would only deal with 8 data types.
Nevertheless, for this line, it's still relatively simple to change it so that it fits our needs.
As a reminder, we've added 3 types after the existing ones:
// add the custom `CV_64S` macro
#define CV_64S 8
// and since now we can have up to $2^4=16$ types
// so it's possible to add `CV_64U` (`:u64`) and `CV_32U` (`:u32`) as well
#define CV_64U 9
#define CV_32U 10
Hence we should prepend three 4-bit size entries to this magic number:
/** Original
0x28442211 = 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem)
Size of each channel item (new),
0x48828442211 = 0100 1000 1000 0010 1000 0100 0100 0010 0010 0001 0001 ~ array of sizeof(arr_type_elem)
MSB
0100 - CV_32U
1000 - CV_64U
1000 - CV_64S
...
LSB
*/
#define CV_ELEM_SIZE1(type) (int)((0x48828442211 >> CV_MAT_DEPTH(type)*4) & 15)
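As a sanity check on the new constant, the sketch below rebuilds it from a list of element sizes, one nibble per depth code. The names `kSizes` and `pack` are made up for the illustration. Because the value no longer fits in 32 bits, the sketch keeps it in a 64-bit unsigned integer.

```cpp
#include <cstddef>
#include <cstdint>

// Element sizes in bytes, indexed by depth code 0..10:
// 8U, 8S, 16U, 16S, 32S, 32F, 64F, 16F, then the added 64S, 64U, 32U.
constexpr int kSizes[] = {1, 1, 2, 2, 4, 4, 8, 2, 8, 8, 4};
constexpr std::size_t kCount = sizeof(kSizes) / sizeof(kSizes[0]);

// Pack one 4-bit nibble per depth, with depth 0 in the lowest nibble.
constexpr std::uint64_t pack(std::size_t i = 0) {
    return i == kCount
        ? 0
        : (static_cast<std::uint64_t>(kSizes[i]) << (4 * i)) | pack(i + 1);
}

static_assert(pack() == 0x48828442211ULL, "matches the patched constant");
static_assert(((pack() >> (8 * 4)) & 15) == 8, "CV_64S is 8 bytes");
static_assert(((pack() >> (9 * 4)) & 15) == 8, "CV_64U is 8 bytes");
static_assert(((pack() >> (10 * 4)) & 15) == 4, "CV_32U is 4 bytes");
```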
5. More changes needed, but is it worth the effort?
Well, it would be a happy ending if everything worked after all the patches above, but I found more hard-coded things in OpenCV's code base. For example, this data conversion function in modules/core/src/matrix_sparse.cpp:
static ConvertData getConvertElem(int fromType, int toType)
{
static ConvertData tab[][8] =
{{ convertData_<uchar, uchar>, convertData_<uchar, schar>,
convertData_<uchar, ushort>, convertData_<uchar, short>,
convertData_<uchar, int>, convertData_<uchar, float>,
convertData_<uchar, double>, 0 },
{ convertData_<schar, uchar>, convertData_<schar, schar>,
convertData_<schar, ushort>, convertData_<schar, short>,
convertData_<schar, int>, convertData_<schar, float>,
convertData_<schar, double>, 0 },
{ convertData_<ushort, uchar>, convertData_<ushort, schar>,
convertData_<ushort, ushort>, convertData_<ushort, short>,
convertData_<ushort, int>, convertData_<ushort, float>,
convertData_<ushort, double>, 0 },
{ convertData_<short, uchar>, convertData_<short, schar>,
convertData_<short, ushort>, convertData_<short, short>,
convertData_<short, int>, convertData_<short, float>,
convertData_<short, double>, 0 },
{ convertData_<int, uchar>, convertData_<int, schar>,
convertData_<int, ushort>, convertData_<int, short>,
convertData_<int, int>, convertData_<int, float>,
convertData_<int, double>, 0 },
{ convertData_<float, uchar>, convertData_<float, schar>,
convertData_<float, ushort>, convertData_<float, short>,
convertData_<float, int>, convertData_<float, float>,
convertData_<float, double>, 0 },
{ convertData_<double, uchar>, convertData_<double, schar>,
convertData_<double, ushort>, convertData_<double, short>,
convertData_<double, int>, convertData_<double, float>,
convertData_<double, double>, 0 },
{ 0, 0, 0, 0, 0, 0, 0, 0 }};
ConvertData func = tab[CV_MAT_DEPTH(fromType)][CV_MAT_DEPTH(toType)];
CV_Assert( func != 0 );
return func;
}
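The table above is indexed by depth in both dimensions and is sized for 8 depth codes, so a depth code of 8 or more would index past its end. The toy sketch below shows the same dispatch pattern with an explicit range check and one extra row and column for a 64-bit integer type. All names and the three toy depth codes are made up; it is not OpenCV code.

```cpp
#include <cstdint>

using ConvertFn = void (*)(const void* from, void* to, int n);

// Stand-in for convertData_<T1, T2>: element-wise static_cast of n items.
template <typename From, typename To>
void convert_data(const void* from, void* to, int n) {
    const From* src = static_cast<const From*>(from);
    To* dst = static_cast<To*>(to);
    for (int i = 0; i < n; ++i) {
        dst[i] = static_cast<To>(src[i]);
    }
}

// Toy depth codes for this sketch only (not OpenCV's values).
enum ToyDepth { kInt32 = 0, kDouble = 1, kInt64 = 2 };
constexpr int kDepthCount = 3;

// One converter per (from, to) pair; the last row and column are the
// additions a new 64-bit integer type would need.
constexpr ConvertFn kTable[kDepthCount][kDepthCount] = {
    {convert_data<std::int32_t, std::int32_t>, convert_data<std::int32_t, double>,
     convert_data<std::int32_t, std::int64_t>},
    {convert_data<double, std::int32_t>, convert_data<double, double>,
     convert_data<double, std::int64_t>},
    {convert_data<std::int64_t, std::int32_t>, convert_data<std::int64_t, double>,
     convert_data<std::int64_t, std::int64_t>},
};

constexpr bool in_range(int depth) { return depth >= 0 && depth < kDepthCount; }

// Return the converter for the pair, or nullptr when a depth is out of range.
constexpr ConvertFn get_convert_elem(int from, int to) {
    return (in_range(from) && in_range(to)) ? kTable[from][to] : nullptr;
}
```

A caller would check the returned pointer before using it, for example `get_convert_elem(kInt64, kDouble)(src, dst, n)` to convert `n` elements.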
Again, it's not hard to add a few specialized template functions of convertData_. The core issue here, from my perspective, is whether it's worth all the effort.
The reasons why I hesitate to go further are:
1. Even if I managed to find all the relevant hard-coded lines and patched them correctly, only a limited set of OpenCV operations would be available for these added types.
2. The raw_type in Evision.Mat (or the value of int cv::Mat::type()) will differ from the ones returned by the official build. It doesn't seem to be a huge problem at first glance; however, OpenCV does have the functionality to persist/serialise cv::Mat to disk. Therefore, if one tries to load serialised data that was generated by the original code, it would fail or return wrong data. Simply put, the header part of the serialised data will be different because we changed what the underlying bits represent in cv::Mat's flags member.
3. Even if we somehow managed to recognise whether the serialised data was produced by the modified code or the original one, the amount of patches together with all the Python code in this project would make it even harder for anyone who's willing to contribute to this project.
4. It's possible to submit all the patches upstream (to OpenCV), yet I personally highly doubt that they would accept the PR because:
   - of all the compatibility issues (as in 2.);
   - these types are not often used in computer vision (otherwise OpenCV would have supported these types in the first place).
5. Even if they were willing to add these types, the new types would not be available until the next major update (OpenCV 5.0) because of these compatibility issues. For example, CV_USRTYPE1 was available in OpenCV 3.x, and OpenCV decided to replace CV_USRTYPE1 with CV_16F (half-precision float). But they had to do that in a major update, i.e., OpenCV 4.0.
   // modules/core/include/opencv2/core/hal/interface.h
   // in OpenCV 3.x
   #define CV_8U 0
   #define CV_8S 1
   #define CV_16U 2
   #define CV_16S 3
   #define CV_32S 4
   #define CV_32F 5
   #define CV_64F 6
   #define CV_USRTYPE1 7
   // in OpenCV 4.x
   #define CV_USRTYPE1 (void)"CV_USRTYPE1 support has been dropped in OpenCV 4.0"
   #define CV_8U 0
   #define CV_8S 1
   #define CV_16U 2
   #define CV_16S 3
   #define CV_32S 4
   #define CV_32F 5
   #define CV_64F 6
   #define CV_16F 7
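The mismatch in raw type values can be illustrated with a small sketch that mirrors the CV_MAKETYPE arithmetic for both values of CV_CN_SHIFT. The helper names are made up, and the sketch only shows that the raw integers differ; it does not model OpenCV's persistence format.

```cpp
// Hypothetical helper mirroring CV_MAKETYPE for a given CV_CN_SHIFT.
constexpr int make_type(int depth, int channels, int cn_shift) {
    return (depth & ((1 << cn_shift) - 1)) + ((channels - 1) << cn_shift);
}

// Hypothetical helper mirroring CV_MAT_CN for a given shift and CV_CN_MAX.
constexpr int channels_of(int type, int cn_shift, int cn_max) {
    return ((type & ((cn_max - 1) << cn_shift)) >> cn_shift) + 1;
}

constexpr int kStockShift = 3;    // official build
constexpr int kPatchedShift = 4;  // build with the widened depth field

// A 3-channel 8-bit image (depth code 0, 3 channels):
static_assert(make_type(0, 3, kStockShift) == 16, "stock raw type");
static_assert(make_type(0, 3, kPatchedShift) == 32, "patched raw type");

// Decoding the patched value with the stock layout yields 5 channels.
static_assert(channels_of(32, kStockShift, 512) == 5, "misread channel count");

// Single-channel types are unaffected, because the channel field is zero.
static_assert(make_type(5, 1, kStockShift) == make_type(5, 1, kPatchedShift),
              "single-channel 32F is unchanged");
```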
6. Any workarounds?
There are two workarounds that I can think of at the moment, and each has different trade-offs.
a. Map these types to some other types
It's possible to set a map for those unsupported types in the config.exs file.
config :evision, unsupported_type_map: %{
{:s, 64} => {:f, 64},
{:u, 64} => {:f, 64},
{:u, 32} => {:f, 32}
}
The above config would map :s64 and :u64 to :f64, and map :u32 to :f32. The very first drawback is that the data would end up with a totally different type. Secondly, value-wise, :f64 does not cover every single possible value of :u64 or :s64.
A 64-bit double (assuming the IEEE 754 standard) has 52 bits of mantissa, so the largest integer up to which every value can be stored in a double without losing precision is 2^53 (9007199254740992).
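The precision limit is easy to check with a round trip through double. This is a minimal sketch, assuming IEEE 754 doubles with round-to-nearest conversion; `survives_double` is a made-up helper name.

```cpp
#include <cstdint>

constexpr std::int64_t kTwoPow53 = std::int64_t{1} << 53;  // 9007199254740992

// True when converting to double and back returns the same integer.
constexpr bool survives_double(std::int64_t v) {
    return static_cast<std::int64_t>(static_cast<double>(v)) == v;
}

static_assert(survives_double(kTwoPow53 - 1), "2^53 - 1 is exact");
static_assert(survives_double(kTwoPow53), "2^53 is exact");
static_assert(!survives_double(kTwoPow53 + 1), "2^53 + 1 is rounded");
```

So an :s64 or :u64 value above 2^53 may come back changed after being mapped to :f64.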
b. Use other Nx backends
i) :nx
Nx.BinaryBackend is implemented in pure Elixir, and :nx is a dependency of this library, so you can use it out of the box. However, Nx.BinaryBackend can be really slow if you have a relatively large matrix.
ii) :torchx
Torchx.Backend is another Nx backend and it uses libtorch. It is a very fast and superb library, but the official prebuilt binaries of libtorch only support x86_64 CPUs (and Apple Silicon (aarch64-apple-darwin) via brew).