bkloppenborg / liboi

OpenCL Interferometry Library

Home Page: https://github.com/bkloppenborg/liboi/wiki

License: GNU Lesser General Public License v3.0

C++ 85.19% Shell 6.80% Makefile 1.60% Python 4.14% C 0.86% CMake 0.98% M4 0.43%

liboi's Introduction

OpenCL Interferometry Library (liboi)

Description

The OpenCL Interferometry Library (liboi) is a C / C++ library that aims to provide software developers with routines commonly used in interferometry. The software relies heavily on the heterogeneous computing environment provided by the Open Computing Language (OpenCL) to target a wide range of traditional and multi-core CPUs, servers, hand-held/embedded devices, specialized hardware, and Graphics Processing Units (GPUs).

Features

The library currently provides:

  • OpenGL / OpenCL Interop (copy OpenGL image to OpenCL buffers)
  • Image to Fourier transform via a discrete Fourier transform (DFT)
  • Fourier transform to interferometric data (visibility squared, bispectra)
  • Image data to chi, chi squared, and log(likelihood).

Installing prerequisites

Apple / OS X

To use liboi on OS X, the following are required:

  • OpenCL 1.1 support (OS X 10.7 or higher)
  • gcc 4.7.4 or later †
  • cmake 2.8 or higher
  • cfitsio and ccfits

The OS can be installed or upgraded through the App Store. Several of the additional required libraries can be installed through MacPorts:

sudo port install gcc47
sudo port install cmake
sudo port install cfitsio
sudo port install git

Please note that Apple may have installed a fake version of gcc located at /opt/local/bin/gcc that is a wrapper for clang. Be sure to specify that you want to use the MacPorts-installed compiler using

export CC=/opt/local/bin/gcc-mp-4.7
export CXX=/opt/local/bin/g++-mp-4.7

before running any ./configure or cmake commands!

Next you will need to download and install ccfits. You can do this using commands similar to the following

wget http://heasarc.gsfc.nasa.gov/fitsio/CCfits/CCfits-2.4.tar.gz
tar xvzf CCfits-2.4.tar.gz
cd CCfits
./configure --prefix=/opt/local --with-cfitsio=/opt/local
make
sudo make install

After this follow the installation instructions below.

† When we last attempted to compile liboi on a Mac, the machine had Xcode 4.6 (which included Apple clang 4.2, based on llvm-clang 3.2svn), which did not have full C++11 support. We do not have access to an Apple system for development, so we cannot try compiling on a more recent system. If you wish to help get liboi running on a Mac, please contact us!

Debian/Ubuntu

To use liboi on a Debian / Ubuntu system, the following should be installed:

  • gcc and g++ v4.6.3 (or later)
  • cmake v2.8
  • cfitsio, ccfits
  • An OpenGL library (optional, enables OpenCL-OpenGL interoperability)
  • An OpenCL 1.1 compliant device and library

Most of these packages can be easily installed through apt-get. First the compiler, cmake, cfitsio, and ccfits:

sudo apt-get install build-essential g++ cmake libccfits0 libccfits-dev git

To enable OpenCL-OpenGL interoperability you should also install an OpenGL library. This should install the prerequisites:

sudo apt-get install libglu1-mesa libglu1-mesa-dev

Lastly install the OpenCL Installable Client Driver (ICD) loader:

sudo apt-get install opencl-headers ocl-icd-libopencl1

NVIDIA GPUs:

On systems that do not have NVIDIA unified virtual memory (e.g. Ubuntu 13.10 and earlier) you only need the display drivers (e.g. one of nvidia-current, nvidia-304, or nvidia-331), OpenCL headers, and an OpenCL ICD loader:

sudo apt-get install nvidia-319 opencl-headers nvidia-opencl-dev

On later Ubuntu systems you will need to add the UVM package. You may also need to install special modprobe rules for the NVIDIA UVM drivers. These can be found in the following packages:

sudo apt-get install nvidia-331 nvidia-331-uvm nvidia-modprobe nvidia-opencl-icd-331

If you prefer, you can install the drivers from NVIDIA instead.

If you receive an error involving clCreateFromGLTexture during the linking stage, edit src/CMakeLists.txt and replace the OpenCL version testing section with the following to force OpenCL 1.1 compatibility mode:

#if(${OpenCL_VERSION_STRING} VERSION_EQUAL 1.0)
#    add_definitions(-DDETECTED_OPENCL_1_0)
#elseif(${OpenCL_VERSION_STRING} VERSION_EQUAL 1.1)
    add_definitions(-DDETECTED_OPENCL_1_1)
#elseif(${OpenCL_VERSION_STRING} VERSION_EQUAL 1.2)
#    add_definitions(-DDETECTED_OPENCL_1_2)    
#elseif(${OpenCL_VERSION_STRING} VERSION_EQUAL 2.0)
#    add_definitions(-DDETECTED_OPENCL_2_0)
#else(${OpenCL_VERSION_STRING} VERSION_EQUAL 2.0)
#    add_definitions(-DDETECTED_OPENCL_UNKNOWN_VERSION)     
#endif(${OpenCL_VERSION_STRING} VERSION_EQUAL 1.0)

AMD GPUs and AMD CPUs:

sudo apt-get install fglrx opencl-headers

Or, if you prefer, you can install the AMD graphics drivers and then install the [OpenCL SDK](http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/downloads/).

Intel CPUs and GPUs

For Intel CPUs you need to install the Intel OpenCL SDK.

Unlike on Windows and Apple systems, Intel does not provide support for OpenCL on their GPUs with their display drivers. Instead an open source project called Beignet is filling the gap.

We have successfully compiled liboi and verified that it functions with Beignet, however doing so was not a straightforward process. See the beignet.md document in this directory for further details.

Checkout / getting a copy of LibOI source code

After installing the aforementioned prerequisites, you simply need to check out a copy of LibOI:

git clone https://github.com/bkloppenborg/liboi.git
cd liboi
git submodule update --init

If you wish to have the bleeding-edge development version of liboi (which often includes the latest features and bugfixes), check out the develop branch:

git checkout develop

otherwise you can stay on the default master branch.

Building instructions

After you have obtained a copy of the source and initialized the submodules, simply

cd build
cmake ..
make

If you have installed a library in a non-standard location, please see the Overriding library locations section below. If you encounter any errors in the compilation steps, please contact us. (If you are on an Apple machine, be sure to set the export lines mentioned above!)

Overriding library locations

If you have installed a library in a non-standard location, it may be necessary to override the library installation location. The following environment variables are checked by CMake when building:

CFITSIO_ROOT_DIR    - path to directory above fitsio.h and libcfitsio.*
CCFITS_ROOT_DIR     - path to directory above CCfits/ (the folder)
OPENCL_ROOT_DIR     - path to directory containing OpenCL
                      that is OpenCL/cl.hpp (Apple) or CL/cl.hpp (everyone else)

These can be set by typing export VARIABLE=/path/to/directory before calling cmake in the building instructions above. CMake should indicate that the directory you specified is used, rather than the default search path on your computer.

Testing and benchmarking

After compiling liboi it is useful to test that it is functioning correctly on your hardware. For this purpose we have packaged the liboi_tests and liboi_benchmark programs in the liboi/bin directory.

The liboi_tests program executes a series of unit tests and compares the results of liboi's calculations with analytic results. By default liboi_tests runs on the first OpenCL-compatible GPU it can find. You can also have it run on an OpenCL-supporting CPU by specifying the -cpu option. See liboi_tests -h for more information.

When you execute liboi_tests you should see something like this:

liboi/bin$ ./liboi_tests 
[==========] Running 34 tests from 12 test cases.
[----------] Global test environment set-up.
[----------] 14 tests from ChiTest
[ RUN      ] ChiTest.CPU_Chi_ZERO
[       OK ] ChiTest.CPU_Chi_ZERO (3 ms)
...
[----------] Global test environment tear-down
[==========] 34 tests from 12 test cases ran. (5987 ms total)
[  PASSED  ] 34 tests.

In the ideal situation all tests will pass, but frequently a few tests will fail. In particular, the following tests often fail for the following reasons:

  • CRoutine_Sum_NVidia.CL_Sum_CPU_CHECK executes a parallel sum. Due to some optimizations specific to NVidia GPUs, this test seems to always fail on non-NVidia hardware. Thus if you have an Intel or AMD GPU, don't worry if this test fails.
  • CRoutine_DFT.CL_UniformDisk compares the real (indexed as .s[0]) and imaginary (indexed as .s[1]) components of an analytical uniform disk Fourier transform with a discrete Fourier transform of an image. The analytic and DFT answers are required to agree to 3% or better. Out of the 10 UV points tested, it is common for a few to fail, particularly on older hardware.
  • CRoutine_DFT.CL_DFT_LARGE_UNEVEN_N_UV_POINTS compares an analytical and DFT result when there are a large uneven number of UV points (10221 in total). This test often fails on older hardware where shared memory limitations occur.

The liboi_benchmark program assesses how fast your hardware can execute the following sequence:

  1. Copy a 128x128 image from RAM to OpenCL device memory
  2. Compute the DFT on a reference data set
  3. Compute a chi-squared
  4. Sum the chi-squared
  5. Copy the summed chi-squared back to the host.

By default this will execute on a GPU. You can specify the -cpu option to run the benchmark program on your OpenCL-compatible CPU. You can also change the image size, image scale, and number of iterations used in the benchmark. See liboi_benchmark -h for this and other options.

liboi_benchmark will print some information about your hardware, print metadata about the data file being used, and then benchmark liboi. When you run liboi_benchmark you should see something like the following:

liboi/bin$ ./liboi_benchmark 
Running Benchmark with: 

Device information: 
Device Name: Tahiti

...

Data set information for: 
 /home/bkloppenborg/workspace/liboi/bin/../samples/PointSource_noise.oifits
N Vis: 0
N V2 : 525
N T3 : 700
N UV : 529
Average JD: 2456253.90049
Building kernels.
Starting benchmark.
Iteration 0 Chi2: 1993.02527
Iteration 100 Chi2: 1993.02527
Iteration 200 Chi2: 1993.02527
Iteration 300 Chi2: 1993.02527
Iteration 400 Chi2: 1993.02527
Iteration 500 Chi2: 1993.02527
Iteration 600 Chi2: 1993.02527
Iteration 700 Chi2: 1993.02527
Iteration 800 Chi2: 1993.02527
Iteration 900 Chi2: 1993.02527
Benchmark Test completed!
1000 iterations in 6.33000 seconds. Throughput 157.97788 iterations/sec.

The important aspects to note are (1) that the chi-squared is constant as a function of iteration number, (2) that the chi-squared is near the reference number above, and (3) your throughput. The performance of liboi is limited by the DFT, whose cost scales linearly with the product of the number of UV points and the number of pixels. As a guide to what you can expect, here are some representative values from liboi_benchmark on various hardware:

| OpenCL device            | Iterations/sec | Other information              |
|--------------------------|----------------|--------------------------------|
| NVIDIA GTX 970m          | 322            | Using optirun / bumblebee      |
| NVIDIA GeForce GTX 570   | 260            |                                |
| Intel i7-3520M (GPU)     | 210            | Apple driver, HD Graphics 4000 |
| AMD Radeon R9 280x       | 155            |                                |
| Intel i7-4770K (GPU)     | 135            | Beignet, HD Graphics 4600      |
| NVIDIA GeForce 8600m GT  | 60             |                                |
| NVIDIA GeForce 8400 GS   | 50             |                                |
| i7-4770K (CPU)           | 5              | Running on 4 physical cores.   |

All tests were performed on various Linux distributions using manufacturer drivers unless otherwise noted. Any additional benchmarks would be appreciated.
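As a rough illustration of the DFT cost scaling described above, the sketch below counts the complex exponentials a direct DFT evaluates per benchmark iteration. It assumes the default 128x128 image and the 529 UV points reported in the sample output; the function names are hypothetical and not part of liboi.

```cpp
#include <cstddef>

// Number of complex exponentials a direct DFT evaluates per iteration:
// one for every (UV point, pixel) pair.
std::size_t dft_terms(std::size_t n_uv, std::size_t image_width,
                      std::size_t image_height) {
    return n_uv * image_width * image_height;
}

// Terms evaluated per second for a measured throughput in iterations/sec.
double dft_terms_per_second(std::size_t terms_per_iteration,
                            double iterations_per_second) {
    return static_cast<double>(terms_per_iteration) * iterations_per_second;
}
```

For the sample run, dft_terms(529, 128, 128) gives 8,667,136 terms per iteration, so the reported throughput of about 158 iterations/sec corresponds to roughly 1.4 billion terms per second. Doubling either the pixel count or the number of UV points would be expected to roughly halve the iteration rate.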

Licensing and Acknowledgements

LibOI is free software, distributed under the [GNU Lesser General Public License (Version 3)](http://www.gnu.org/licenses/lgpl.html).

If you use this software as part of a scientific publication, please cite as:

Kloppenborg, B.; Baron, F. (2012), "LibOI: The OpenCL Interferometry Library" (Version X). Available from https://github.com/bkloppenborg/liboi.

liboi's People

Contributors

bkloppenborg

liboi's Issues

liboi CPU code fails to produce correct output

Running liboi with a CPU OpenCL device always produces incorrect output. In order to determine the cause of this, I have modified liboi_tests in commit ea59e3c to have a -cpu flag so that tests can be executed on the CPU. In the end, all tests should be executed on both types of devices by default.

Support Intel HD Graphics / Intel Iris Graphics on Linux platforms

Reading over the Beignet mailing lists it appears that it will be some time before OpenCL - OpenGL interop is enabled on Linux systems. A suitable workaround for this platform would be to:

  1. Get a pointer to the OpenGL texturebuffer
  2. Copy the image to RAM
  3. Copy the image back to the GPU, but to an OpenCL buffer.

We will need to implement some new functionality in CLibOI::SetImageSource, perhaps catching the else // mImage_gl == NULL block to notify CRoutine_ImageToBuffer that the buffer is pure OpenGL without any interop enabled.

liboi throws an OpenCL error from CRoutine_Sum::ComputeSum for data sets with nData = [33-64]

Due to a bug in NVidia parallel reduction SDK example code (i.e. oclReduce), parallel sums with N = [33-64] will access memory outside of bounds. liboi detects this error and throws the following exception:

terminate called without an active exception
Error Detected
Unable to copy summed value to host. CRoutine_Sum::ComputeSum 
OpenCL Error: -5 Out of resources

The bug in oclReduce needs to be fixed and propagated into liboi's sum CPU code.
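The idea behind a fix can be illustrated with a CPU emulation of a work-group tree reduction in which every read of the input is guarded by the data length. This is only a sketch of the bounds-checking principle, not the oclReduce code or liboi's kernel; the function name and the group-size handling are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// CPU emulation of a work-group tree reduction.
// group_size plays the role of the OpenCL local size and must be a power of two.
float guarded_tree_sum(const std::vector<float>& input, std::size_t group_size) {
    if (group_size == 0 || (group_size & (group_size - 1)) != 0) {
        throw std::invalid_argument("group_size must be a power of two");
    }
    std::vector<float> scratch(group_size, 0.0f);

    // Load phase: each emulated work item accumulates a strided slice.
    // The loop condition i < input.size() is the guard that prevents
    // out-of-bounds reads when the length is not a multiple of group_size.
    for (std::size_t lid = 0; lid < group_size; ++lid) {
        float acc = 0.0f;
        for (std::size_t i = lid; i < input.size(); i += group_size) {
            acc += input[i];
        }
        scratch[lid] = acc;
    }

    // Tree phase: halve the number of active items until one value remains.
    for (std::size_t stride = group_size / 2; stride > 0; stride /= 2) {
        for (std::size_t lid = 0; lid < stride; ++lid) {
            scratch[lid] += scratch[lid + stride];
        }
    }
    return scratch[0];
}
```

Because items without data contribute zero instead of reading past the end, lengths such as 33 or 64 are handled the same way as any other length.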

DFT tests incomplete.

The testing environment for CRoutine_DFT presently checks the CPU routine using a point source in the corner of the array. I haven't been able to get the DFT routine to pass with a point source in the center of the image.

We need to:

  1. Implement CPU point source check for the center of the image
  2. Implement CPU asymmetric point source check (non-zero phases)
  3. Implement CPU vs. OpenCL checks.
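A point-source check of this kind can be sketched on the CPU as follows. The sketch assumes a phase center at pixel (width/2, height/2), a sign convention of exp(-2πi(ux + vy)), and normalization by total flux; liboi's actual conventions may differ, and both function names are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// Analytic visibility of a unit point source offset by (x, y) from the
// phase center: unit amplitude with a linear phase ramp.
std::complex<double> point_source_visibility(double x, double y,
                                             double u, double v) {
    return std::polar(1.0, -2.0 * kPi * (u * x + v * y));
}

// Direct DFT of a row-major image at a single (u, v) point, normalized by
// total flux. Assumption: the phase center is pixel (width/2, height/2).
std::complex<double> dft_at(const std::vector<double>& image,
                            std::size_t width, std::size_t height,
                            double pixel_scale, double u, double v) {
    std::complex<double> sum(0.0, 0.0);
    double flux = 0.0;
    for (std::size_t row = 0; row < height; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            const double value = image[row * width + col];
            if (value == 0.0) continue;  // empty pixels contribute nothing
            const double x = (static_cast<double>(col) -
                              static_cast<double>(width / 2)) * pixel_scale;
            const double y = (static_cast<double>(row) -
                              static_cast<double>(height / 2)) * pixel_scale;
            sum += value * point_source_visibility(x, y, u, v);
            flux += value;
        }
    }
    return flux != 0.0 ? sum / flux : std::complex<double>(0.0, 0.0);
}
```

Under these assumptions a single bright pixel at the center should give a visibility of 1 with zero phase at every UV point, while an off-center pixel should reproduce the analytic phase ramp, which exercises the non-zero phase case.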

Add WCS information to exported FITS images

Adding WCS code to the saved FITS images seems fairly easy. Fabien indicates it would be something like this:

    // Write keywords to get WCS to work //
  fits_write_key_dbl(fptr, "CDELT1", -scale, 3, "Milli-arcsecs per pixel", status);
  fits_write_key_dbl(fptr, "CDELT2", scale, 3, "Milli-arcsecs per pixel", status);
  fits_write_key_dbl(fptr, "CRVAL1", 0.0, 3, "X-coordinate of ref pixel", status);
  fits_write_key_dbl(fptr, "CRVAL2", 0.0, 3, "Y-coordinate of ref pixel", status);
  fits_write_key_lng(fptr, "CRPIX1", naxes[0]/2, "Ref pixel in X", status);
  fits_write_key_lng(fptr, "CRPIX2", naxes[1]/2, "Ref pixel in Y", status);
  fits_write_key_str(fptr, "CTYPE1", "RA",  "Name of X-coordinate", status);
  fits_write_key_str(fptr, "CTYPE2", "DEC", "Name of Y-coordinate", status);
  fits_write_key_str(fptr, "CUNIT1", "mas", "Unit of X-coordinate", status);
  fits_write_key_str(fptr, "CUNIT2", "mas", "Unit of Y-coordinate", status);

What is not clear is whether or not the scale needs to be in mas/pixel or rad/pixel.

Chi2 kernel buffer should be zeroed out

(importing from Redmine)

If multiple data sets are to be supported, the chi2 buffer must be zeroed before computing the chi2, lest un-updated values may remain in the end of the buffer.
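The effect can be reproduced on the CPU with a shared buffer that is larger than the current data set. The sketch below assumes a single chi2 buffer sized for the largest data set and uses hypothetical function names; it is not liboi's kernel code.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Fill chi2 with squared, error-normalized residuals for one data set.
// The buffer may be longer than the data set, so it is zeroed first to
// clear values left over from a previous, longer data set.
void compute_chi2(const std::vector<double>& data,
                  const std::vector<double>& model,
                  const std::vector<double>& error,
                  std::vector<double>& chi2) {
    std::fill(chi2.begin(), chi2.end(), 0.0);
    const std::size_t n =
        std::min({data.size(), model.size(), error.size(), chi2.size()});
    for (std::size_t i = 0; i < n; ++i) {
        const double residual = (data[i] - model[i]) / error[i];
        chi2[i] = residual * residual;
    }
}

// Sum over the whole buffer, as a parallel sum over a fixed-size buffer would.
double sum_buffer(const std::vector<double>& buffer) {
    return std::accumulate(buffer.begin(), buffer.end(), 0.0);
}
```

Without the std::fill call, summing the full buffer after a four-element data set followed by a two-element one would include two stale entries from the first data set.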

liboi will not compile with Xcode 4.5

Report from Fabien. liboi will not compile using Xcode 4.5 on Mac OS 10.7. The version of clang that is shipped with Xcode does not appear to support the full c++11 features required for liboi.

$ clang -v      
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin11.4.2
Thread model: posix

Compiling with clang --std=c++11 and --stdlib=libc++ gives:

[  5%] Building CXX object lib/ccoifits/src/CMakeFiles/ccoifits_static.dir/COI_ARRAY.cpp.o
In file included from /Users/fbaron/fb/liboi/lib/ccoifits/src/COI_ARRAY.cpp:8:
In file included from /Users/fbaron/fb/liboi/lib/ccoifits/src/COI_ARRAY.h:11:
/Users/fbaron/fb/liboi/lib/ccoifits/src/COI_TABLE.h:16:10: fatal error: 'array' file not found
#include <array>
         ^
1 error generated.
make[2]:  [lib/ccoifits/src/CMakeFiles/ccoifits_static.dir/COI_ARRAY.cpp.o] Error 1
make[1]:  [lib/ccoifits/src/CMakeFiles/ccoifits_static.dir/all] Error 2
make: *** [all] Error 2

Compiling with clang --std=c++11 and --stdlib=libc++ gives:

[  1%] Building CXX object lib/gtest-1.6.0/CMakeFiles/gtest.dir/src/gtest-all.cc.o
In file included from /Users/fbaron/fb/liboi/lib/gtest-1.6.0/src/gtest-all.cc:39:
In file included from /Users/fbaron/fb/liboi/lib/gtest-1.6.0/include/gtest/gtest.h:57:
In file included from /Users/fbaron/fb/liboi/lib/gtest-1.6.0/include/gtest/internal/gtest-internal.h:40:
/Users/fbaron/fb/liboi/lib/gtest-1.6.0/include/gtest/internal/gtest-port.h:499:13: fatal error: 'tr1/tuple' file not found
#   include <tr1/tuple>  // NOLINT
            ^
1 error generated.
make[2]:  [lib/gtest-1.6.0/CMakeFiles/gtest.dir/src/gtest-all.cc.o] Error 1
make[1]:  [lib/gtest-1.6.0/CMakeFiles/gtest.dir/all] Error 2
make: *** [all] Error 2

liboi assumes OpenGL is always wanted.

The COpenCL::Init function cl_context_properties variable assumes that OpenGL is always wanted; however, it is conceivable that liboi could be used from a console application with no access to a GUI. This property should be pushed upstream and specified by the application developer, rather than assumed in the library.

Data export method

Implement a method to export modeled data to OIFITS format. Perhaps the best option would be to use an OIDataList from ccoifits and add writing routines to ccoifits.

Switch to OpenCL Installable Client Drivers

Installable Client Drivers for OpenCL are being released by NVidia, Intel and AMD/ATI. liboi should link against these instead of vendor-specific OpenCL implementations. This might involve rewriting the COpenCL class to use different context initialization functions.

Rework Chi Framework

The Chi/Chi2 routines should be able to output Chis based upon V2, T3, complex vis, closure amp, differential visibilities, and closure phase only. Presently the routines require both V2 and T3 to be present in the data.

Flagged data can result in poor performance

Flagging data in OIFITS files can result in poor performance for certain values of flagged points. Here are some test results on a MIRC 6T data file (8 channels, 15 rows). From the test results below it is clear this is a problem size division and/or memory access pattern issue.

This data set, 2011Nov03-epsAur-avg5.oifits, consists of
0 Vis
720 V2
960 T3
2860 UV Points

We flagged only V2 records.

Flagging N values inside of a MIRC (8-channel) data file results in the following performance:

  • 0: good (no values flagged)
  • 1: good
  • 2: bad
  • 3: bad
  • 4: good
  • 5: good
  • 6: bad
  • 7: good
  • 8: good (entire row flagged)

Flagging N values between rows of a MIRC (8-channel) data file results in the following performance:

  • 0: good (no values flagged)
  • 1: good
  • 2: bad
  • 3: bad
  • 4: good
  • 5: good
  • 6: bad
  • 7: bad
  • 8: good
  • 9: good
  • 10: good
  • 11: bad
  • 12: good
  • 13: good
  • 14: good
  • 15: good

Flagging one entire column (15) plus an extra element in each row to total N:

  • 16: good
  • 17: bad
  • 18: good
  • 19: good
  • 20: good
  • 21: good
  • 22: bad
  • 23: good
  • 24: good
  • 25: good
  • 26: good
  • 28: good
  • 29: bad
  • 30: good

dft_2d.cl speedup

Right now the local execution size is hard coded to 128 units. On newer GPUs, this limit can be increased. The function

CRoutine_DFT::FT

should determine group sizes automatically by querying the OpenCL context for its capabilities. The kernel itself will need to be modified to ensure it doesn't read from or write to invalid memory locations.

The DFT kernel occupies 77% of the GPU's time, so this should be regarded as a high-priority item.

liboi host memory image must be defined prior to Init() being called.

Liboi requires that the image source be specified prior to Init() being called. If you initialize your images after this, then liboi will not internally allocate buffers correctly. For host memory, you can do something like this:

float * image_liboi_bug = 0;            // placeholder host pointer, no image data yet
liboi.SetImageSource(image_liboi_bug);  // register the image source before calling Init()
Ideally you should be able to say where your images come from without defining the memory location.

normalize_float.cl speedup

The normalize_float.cl kernel could be accelerated by computing the 1/x and storing it into local/global memory. Right now this kernel achieves ~60% occupancy and accounts for < 0.1% of GPU time. Therefore this is quite low in terms of priority.
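The proposed optimization amounts to one division followed by multiplications. A CPU sketch of the idea, with a hypothetical function name and no claim to match the kernel's actual interface:

```cpp
#include <numeric>
#include <vector>

// Normalize an image to unit total flux. The reciprocal of the total is
// computed once and reused, so each pixel costs a multiply, not a divide.
void normalize_in_place(std::vector<float>& image) {
    const float total = std::accumulate(image.begin(), image.end(), 0.0f);
    if (total == 0.0f) {
        return;  // nothing to normalize; avoids dividing by zero
    }
    const float inverse_total = 1.0f / total;
    for (float& pixel : image) {
        pixel *= inverse_total;
    }
}
```

In a kernel, the stored reciprocal would live in local or global memory as the issue suggests; here it is simply a local variable.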

Debian packaging of kernels

As of revision 1c0054d liboi supports Debian packaging; however, the OpenCL kernels are not built into or distributed inside the package. We should probably stringify the kernels and build them into liboi to make distribution easier. A few existing solutions to this problem:

  • ArrayFire's packaging, which supports namespaces and stringification of individual kernels, but requires a separate executable (bin2cpp) to be built.
  • Kitware's viscl, which will compile all of the OpenCL source into a single include via CMake REGEX functions.

I like the Kitware approach, but liboi will require that each kernel be assembled separately. We need a hybrid solution.

Implement a plan interface for the library

In order to be easy to use, it would be best for the programmer to be able to define a "plan" of execution. The plan should:

  1. Be a vector of CRoutine* objects.
  2. Handle routine (re)allocation, initialization, and destruction.
  3. Detect if a plan is being executed.
  4. Terminate a plan during execution?

LIBOI should be modified to accept plan objects and execute the plan as instructed. This gives the programmer maximum flexibility as to what a plan should do. The plans could be invoked by:

CLibOI::run(plan, data_num, output, n):
    for routine in plan:
        switch(routine.id):
            ...

As the routines allocate and manage their own memory internally, the remaining memory (mostly buffers) could either be managed by the plan or by CLibOI.
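One possible shape for such a plan is sketched below in C++. The Routine base class is a hypothetical stand-in for CRoutine, the two example routines exist only for illustration, and a plain float buffer replaces OpenCL memory; none of these names come from liboi.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for liboi's CRoutine base class.
class Routine {
public:
    virtual ~Routine() = default;
    virtual void init() {}
    virtual void run(std::vector<float>& buffer) = 0;
};

// Example routines used only to demonstrate the plan.
class ScaleRoutine : public Routine {
public:
    explicit ScaleRoutine(float factor) : factor_(factor) {}
    void run(std::vector<float>& buffer) override {
        for (float& x : buffer) x *= factor_;
    }
private:
    float factor_;
};

class OffsetRoutine : public Routine {
public:
    explicit OffsetRoutine(float offset) : offset_(offset) {}
    void run(std::vector<float>& buffer) override {
        for (float& x : buffer) x += offset_;
    }
private:
    float offset_;
};

// Owns its routines, runs them in order, reports whether it is executing,
// and can be asked to stop between routines.
class Plan {
public:
    void add(std::unique_ptr<Routine> routine) {
        if (!routine) return;
        routine->init();
        routines_.push_back(std::move(routine));
    }
    void execute(std::vector<float>& buffer) {
        stop_requested_ = false;
        running_ = true;
        for (auto& routine : routines_) {
            if (stop_requested_) break;
            routine->run(buffer);
        }
        running_ = false;
    }
    bool is_running() const { return running_; }
    void request_stop() { stop_requested_ = true; }
    std::size_t size() const { return routines_.size(); }
private:
    std::vector<std::unique_ptr<Routine>> routines_;
    std::atomic<bool> running_{false};
    std::atomic<bool> stop_requested_{false};
};
```

Ownership through std::unique_ptr covers allocation and destruction, while the two atomic flags cover detection and termination. A stop request only takes effect between routines, so a long-running routine would still finish before the plan halts.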

CVectorList should be replaced by vector< std::shared_ptr<type> >

With C++11 the shared_ptr data type (in <memory>) can be used with vectors to store data. This data type ensures that objects are destroyed when there are no longer any references to them. The CVectorList data type is therefore superseded and should be replaced.

Note, this will likely require several internal changes to the library to make it C++11 compatible.
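The lifetime behaviour described above can be seen in a small sketch. DataBlock is a hypothetical payload type that counts live instances so that destruction is observable; it is not a liboi class.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical payload type; tracks how many instances are alive.
struct DataBlock {
    static int live_count;
    DataBlock() { ++live_count; }
    DataBlock(const DataBlock&) { ++live_count; }
    ~DataBlock() { --live_count; }
};
int DataBlock::live_count = 0;

// The container type proposed as a replacement for CVectorList.
using DataList = std::vector<std::shared_ptr<DataBlock>>;

// Build a list of n freshly allocated blocks.
DataList make_blocks(std::size_t n) {
    DataList list;
    list.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        list.push_back(std::make_shared<DataBlock>());
    }
    return list;
}
```

Clearing the list destroys every block that has no other owner. A block for which another shared_ptr copy still exists survives until that copy is released.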

loglike values seem to have increased

Since the non-convex chi2 kernels were introduced the log likelihood values appear to have increased dramatically. This needs to be addressed before version 1.0 is released.

NVidia-based sum kernel produces incorrect output on CPU

Possibly related to issue #32, the CRoutine_Sum kernel currently produces extremely incorrect values when executed on the CPU. It also appears to occasionally generate incorrect sums when executed on an ATI GPU. The first aspect was revealed in commit ea59e3c, whereas the second aspect manifests occasionally on my ATI R9 280x.

Since this will be the third sum kernel we've used (see commit ac60283), we'll make the sum kernel an abstract class with a unified interface and let the user decide which kernel they should use.

New non-convex chi2 formulation

In a discussion with Fabien Baron, it appears stars that are well resolved often have underestimated chi2r for the triple products when T3 amplitudes are small. His solution is to implement the non-convex chi2 for the bispectra:

t3amp_residual = (t3amp_data - t3amp)/t3amperr
t3phi_residual = mod360(t3phi_data - t3phi)/t3phi_err

Alternatively, the phase expression can be rewritten as:

abs( exp(complex(0, t3_t3phi) ) - t3_model/abs(t3_model) ) /(t3_t3phierr)

In order to implement this formulation we will need to create a new chi2 kernel and (possibly) rearrange how data is stored on the GPU. In particular, it might be worthwhile to split the T3_amp and T3_phi into two separate memory blocks. Profiling will be needed to ensure memory access patterns do not significantly degrade performance.
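The residual expressions above can be written out on the CPU as a reference. In this sketch wrap_degrees is a hypothetical stand-in for mod360 and is assumed to wrap into [-180, 180). The first phase residual takes degrees and the phasor form takes radians; both unit choices are assumptions.

```cpp
#include <cmath>
#include <complex>

// Wrap an angle in degrees into [-180, 180). Stand-in for mod360.
double wrap_degrees(double angle) {
    double wrapped = std::fmod(angle + 180.0, 360.0);
    if (wrapped < 0.0) wrapped += 360.0;
    return wrapped - 180.0;
}

// Amplitude residual: (t3amp_data - t3amp) / t3amperr.
double t3amp_residual(double t3amp_data, double t3amp_model, double t3amp_err) {
    return (t3amp_data - t3amp_model) / t3amp_err;
}

// Phase residual in degrees, wrapped so that 179 and -179 differ by 2, not 358.
double t3phi_residual(double t3phi_data, double t3phi_model, double t3phi_err) {
    return wrap_degrees(t3phi_data - t3phi_model) / t3phi_err;
}

// Alternative phase residual: distance between the unit phasor of the data
// phase and the unit phasor of the model bispectrum. Angles in radians.
// The model amplitude must be non-zero.
double t3phi_phasor_residual(double t3phi_data_rad,
                             std::complex<double> t3_model,
                             double t3phi_err_rad) {
    const std::complex<double> data_phasor = std::polar(1.0, t3phi_data_rad);
    const std::complex<double> model_phasor = t3_model / std::abs(t3_model);
    return std::abs(data_phasor - model_phasor) / t3phi_err_rad;
}
```

The phasor form avoids explicit wrapping, at the cost of saturating at 2 / t3phi_err when data and model are in antiphase.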

Chi, chi2, and loglike for individual types of data

At present liboi calculates the chi, chi2, and loglike for all of the loaded data at the same time; however, some applications might find it useful to only calculate these quantities for subsets of the available interferometric data.

Data randomization for bootstrapping

For bootstrapping, data must be somehow randomized and placed into the GPU's memory.

At present this library reads OIFITS data from the file and then stores it in RAM. When

COILibData::CopyToOpenCLDevice 

is called, the data in RAM are copied to the GPU, allocating memory as required. When ~COILibData() is called, all memory on the GPU and CPU is deallocated.

At present there is no method to bootstrap data without creating a new fake OIFITS data file and copying it to the GPU. It would be nice if each COILibData object could bootstrap itself and keep track of all allocated memory.
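One common way to bootstrap is to resample the data points with replacement. The sketch below shows that technique on host memory, using hypothetical helper names; how COILibData would store or upload the resampled values is not addressed.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw n indices uniformly, with replacement, from [0, n).
std::vector<std::size_t> bootstrap_indices(std::size_t n, std::mt19937& rng) {
    std::vector<std::size_t> indices;
    if (n == 0) return indices;
    indices.reserve(n);
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    for (std::size_t i = 0; i < n; ++i) {
        indices.push_back(pick(rng));
    }
    return indices;
}

// Gather values at the given indices. Applying the same indices to the data
// and to its uncertainties keeps each measurement paired with its error.
template <typename T>
std::vector<T> resample(const std::vector<T>& values,
                        const std::vector<std::size_t>& indices) {
    std::vector<T> result;
    result.reserve(indices.size());
    for (std::size_t index : indices) {
        result.push_back(values.at(index));
    }
    return result;
}
```

Because only the index list changes between realizations, a device-side implementation could in principle keep the original data on the GPU and upload just the indices, although that design is an assumption and not something liboi currently does.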

Switch to smart pointers internally

At present liboi uses standard pointers. Although this is fine, memory leaks could be easily introduced if new functions/routines are added in the future. Instead we should switch to std::shared_ptr or something similar to ensure all memory is freed upon exit.

This will likely require some modifications to programs that use liboi. The C-interface may need to unwrap pointers in some instances.

Create standardized C++ header for the library.

At present we are linking against COILib.h in external programs. We should probably make a standard header against which external programs can link and place this in the includes directory.

OpenCL Chi/Chi2 and loglike values returning NaNs

(importing from Redmine)

Sometimes the Chi, Chi2, and Loglike OpenCL values are NaN's whereas the CPU-based version of the tests always reports a valid number. This appears to be an issue with the parallel sum kernel.

Data read error

In some circumstances data is not read in. This is due to uninitialized memory in COILibDataList::ReadFile

Compilation error

Ran into an issue when compiling liboi (GLuint not recognized).
Adding #include <GL/gl.h> on line 55 of liboi.hpp fixed it.

Create logging framework

It would be nice if liboi had a built-in logging framework with different levels of verbosity. The logging framework should also compute md5 or sha hashes of any files read in (kernels, data files). The suggestion comes from a conversation with Will M. about problems when doing support work for HITRAN.

DFT routine does not accept arbitrary sized images

The CRoutine_DFT::FT(...) function assumes a local working size of 128. This setting arbitrarily restricts the image sizes (in total number of pixels) to multiples of 128. This number should be determined dynamically from

  • the size of the image
  • the total local working size supported by the GPU (as the DFT kernel uses shared memory)
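A common way to lift this kind of restriction is to round the global work size up to a multiple of the local size and have the kernel skip the padding items. The host-side arithmetic is sketched below with hypothetical function names. On a real device the local size limit would come from an OpenCL query such as clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE, which is not shown here.

```cpp
#include <cstddef>

// Smallest multiple of local_size that is greater than or equal to n_items.
std::size_t padded_global_size(std::size_t n_items, std::size_t local_size) {
    if (local_size == 0) return n_items;
    return ((n_items + local_size - 1) / local_size) * local_size;
}

// Mirrors the guard a kernel needs once the global size exceeds the item
// count: padding work items must not read or write image memory.
bool work_item_is_active(std::size_t global_id, std::size_t n_items) {
    return global_id < n_items;
}
```

For a 100x100 image (10,000 pixels) and a local size of 128, the padded global size is 10,112, and the last 112 work items would be inactive. A 128x128 image needs no padding.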
