adaboost's People

Contributors

akankshasehrawat, arushisinghal, bits2zbytes, czgdp1807, fiza11, tanvi141, vi1i


adaboost's Issues

Complete documentation

Description of the problem

Documentation for some classes and functions isn't complete in .hpp files. The issue can be resolved by adding docstrings to all the missing places.
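For instance, a missing docstring could be filled in with a comment block like the following; the function name and signature here are illustrative, not necessarily what appears in the codebase:

```cpp
// Hypothetical example of the kind of docstring to add to the .hpp files.
namespace adaboost { namespace core {

    /*
    * Used for computing the dot product of two vectors.
    *
    * @param vec1 First vector.
    * @param vec2 Second vector.
    * @param result Variable for storing the result.
    */
    template <class data_type_vector>
    void product(const Vector<data_type_vector>& vec1,
                 const Vector<data_type_vector>& vec2,
                 data_type_vector& result);

}}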

Example of the problem

References/Other comments

Adding support for Windows

Description of the problem

There isn't any documentation in README for building the project and working with it on Windows.
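A README section for Windows might sketch something like the following, assuming the existing CMake build and a Visual Studio toolchain (the generator name is one common choice, not a tested recipe):

```
mkdir build
cd build
cmake -G "Visual Studio 16 2019" ..
cmake --build . --config Release
```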

Example of the problem

References/Other comments

Using non-default streams in CUDA

Description of the problem

Currently, in #6 all operations are issued to the default stream. However, we could use non-default streams to launch kernels for different operations so that they execute in parallel.
One example of such a situation is filling n vectors in parallel with fill_vector_kernel launched in n separate streams. Another is filling an n*m matrix with n or m kernels launched in separate streams.
Before moving on to the implementation we can discuss the API for the above use case.
Please comment below if you have thought of something. I will come up with a design soon.
One more advantage of non-default streams, as claimed by https://devblogs.nvidia.com/how-overlap-data-transfers-cuda-cc/, is overlapping data transfers with kernel execution. However, IMO this isn't very useful for this library: it may be that the user only wants to copy a small Vector back to the host, and the time spent creating streams for that isn't worth it.
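As a rough sketch of the first use case, assuming a fill_vector_kernel with the signature below (the helper fill_vectors_parallel is hypothetical, not the proposed API), n fills could be issued on separate streams like this:

```
#include <cuda_runtime.h>
#include <vector>

// Minimal fill kernel; signature assumed to mirror the one in #6.
template <class data_type>
__global__ void fill_vector_kernel(data_type* vec, unsigned size, data_type value)
{
    unsigned idx = blockIdx.x*blockDim.x + threadIdx.x;
    if(idx < size)
        vec[idx] = value;
}

// Fill n device vectors, one non-default stream per vector; kernels on
// different streams may execute concurrently on the device.
void fill_vectors_parallel(float** device_vecs, unsigned n,
                           unsigned size, float value)
{
    std::vector<cudaStream_t> streams(n);
    const unsigned block_size = 256;
    const unsigned num_blocks = (size + block_size - 1)/block_size;
    for(unsigned i = 0; i < n; i++)
    {
        cudaStreamCreate(&streams[i]);
        fill_vector_kernel<<<num_blocks, block_size, 0, streams[i]>>>(
            device_vecs[i], size, value);
    }
    for(unsigned i = 0; i < n; i++)
    {
        cudaStreamSynchronize(streams[i]);
        cudaStreamDestroy(streams[i]);
    }
}
```

Whether stream creation should be per-call like this or cached in some stream-pool object is exactly the API question to discuss.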

Example of the problem

Design Discussions

Description of the problem

This issue aims at discussing the design of the software to be developed, covering the following topics,

  1. File structure
  2. User facing APIs
  3. Class design
  4. Hardware requirements (mainly GPUs)

I will try to come up with the first one ASAP. However, if you have already prepared something, let us know in the comments.

Example of the problem

References/Other comments

We will follow https://web.stanford.edu/~hastie/Papers/samme.pdf
If you have something to suggest that can be used in the project, let us know.

Removing unnecessary lines from .travis.yml and fixing CMakeLists.txt

Description of the problem

  1. The following line is not needed,

    - ls -a

  2. The following should be changed to use the 1.10.0 release,

    adaboost/CMakeLists.txt

    Lines 23 to 31 in 5df11ee

    if(INSTALL_GOOGLETEST AND BUILD_TESTS)
    include(${CMAKE_ROOT}/Modules/ExternalProject.cmake)
    ExternalProject_Add(googletest
    GIT_REPOSITORY https://github.com/google/googletest
    GIT_TAG master
    SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/googletest-src"
    BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/googletest-build"
    TEST_COMMAND "")
    endif()
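A possible fix is to pin the GIT_TAG to the 1.10.0 release tag (release-1.10.0 on GitHub) instead of master, keeping the rest of the block as-is:

```
if(INSTALL_GOOGLETEST AND BUILD_TESTS)
    include(${CMAKE_ROOT}/Modules/ExternalProject.cmake)
    ExternalProject_Add(googletest
        GIT_REPOSITORY https://github.com/google/googletest
        GIT_TAG release-1.10.0  # pin to the 1.10.0 release instead of master
        SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/googletest-src"
        BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/googletest-build"
        TEST_COMMAND "")
endif()
```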

Example of the problem

References/Other comments

Boost.Python or SWIG or PyBindGen

Description of the problem

I was exploring options for generating Python wrappers for the C++ code and came across PyBindGen. SWIG is fine, but I think PyBindGen is more logical to use.
What do you say?

I went through Boost.Python as well, and I believe it is better than both of the above alternatives because of the greater amount of content available on the web.

Example of the problem

References/Other comments

[1] https://pybindgen.readthedocs.io/en/latest/tutorial/#what-is-pybindgen

Bug in matrix multiplication using shared memory

Description of the problem

The following code doesn't give correct results for matrix multiplication using shared memory,

template <class data_type_matrix>
__device__
data_type_matrix get_element(
    data_type_matrix* mat,
    unsigned row,
    unsigned col,
    unsigned stride)
{
    return mat[row*stride+col];
}

template <class data_type_matrix>
__device__
void set_element(
    data_type_matrix* mat,
    unsigned row,
    unsigned col,
    data_type_matrix value,
    unsigned stride)
{
    mat[row*stride+col] = value;
}

template <class data_type_matrix>
__device__
data_type_matrix* get_sub_matrix(
    data_type_matrix* mat,
    unsigned block_row,
    unsigned block_col,
    unsigned stride)
{
    data_type_matrix* mat_sub =
        new data_type_matrix[BLOCK_SIZE*BLOCK_SIZE];
    mat_sub = &mat[stride*BLOCK_SIZE*block_row+BLOCK_SIZE*block_col];
    return mat_sub;
}

template <class data_type_matrix>
__global__
void multiply_kernel(
    data_type_matrix* mat1,
    data_type_matrix* mat2,
    data_type_matrix* result,
    unsigned mat1_cols,
    unsigned mat2_cols,
    unsigned result_cols)
{
    unsigned block_row = blockIdx.y;
    unsigned block_col = blockIdx.x;
    data_type_matrix* result_sub = get_sub_matrix(result, block_row,
                                                  block_col, result_cols);

    unsigned row = threadIdx.y;
    unsigned col = threadIdx.x;

    for(unsigned m = 0; m < (mat1_cols + BLOCK_SIZE - 1)/BLOCK_SIZE; m++)
    {
        data_type_matrix* mat1_sub = get_sub_matrix(mat1, block_row,
                                                    m, mat1_cols);
        data_type_matrix* mat2_sub = get_sub_matrix(mat2, m,
                                                    block_col, mat2_cols);

        __shared__ data_type_matrix mat1_shared[BLOCK_SIZE][BLOCK_SIZE];
        __shared__ data_type_matrix mat2_shared[BLOCK_SIZE][BLOCK_SIZE];

        mat1_shared[row][col] = get_element(mat1_sub, row, col, mat1_cols);
        mat2_shared[row][col] = get_element(mat2_sub, row, col, mat2_cols);

        data_type_matrix cvalue = 0.0;

        __syncthreads();

        for(unsigned e = 0; e < BLOCK_SIZE; e++)
        {
            cvalue += mat1_shared[row][e] * mat2_shared[e][col];
        }

        __syncthreads();

        set_element(result_sub, row, col, cvalue, result_cols);
    }
}

template <class data_type_matrix>
void multiply_gpu(const MatrixGPU<data_type_matrix>& mat1,
                  const MatrixGPU<data_type_matrix>& mat2,
                  MatrixGPU<data_type_matrix>& result)
{
    adaboost::utils::check(mat1.get_cols() == mat2.get_rows(),
                           "Order of matrices don't match.");
    dim3 gridDim((mat2.get_cols() + BLOCK_SIZE)/BLOCK_SIZE,
                 (mat1.get_rows() + BLOCK_SIZE)/BLOCK_SIZE);
    dim3 blockDim(BLOCK_SIZE, BLOCK_SIZE);
    multiply_kernel
    <<<gridDim, blockDim>>>
    (mat1.get_data_pointer(),
     mat2.get_data_pointer(),
     result.get_data_pointer(),
     mat1.get_cols(),
     mat2.get_cols(),
     result.get_cols());
}

Example of the problem

#include<gtest/gtest.h>
#include<string>
#include<adaboost/cuda/cuda_data_structures.hpp>
#include<adaboost/utils/cuda_wrappers.hpp>
#include<stdexcept>

TEST(Cuda, MatricesGPU)
{
    adaboost::utils::cuda::cuda_event_t has_happened;
    adaboost::utils::cuda::cuda_event_create(&has_happened);
    adaboost::cuda::core::MatrixGPU<float> mat_f;
    EXPECT_EQ(0, mat_f.get_cols())<<"Number of columns should be 0";
    EXPECT_EQ(0, mat_f.get_rows())<<"Number of rows should be 0.";
    adaboost::cuda::core::MatrixGPU<float> mat1(3, 3), mat2(3, 3), mat3(2, 1);
    mat1.fill(4.0);
    mat2.fill(5.0);
    mat1.copy_to_device();
    mat2.copy_to_device();
    adaboost::utils::cuda::cuda_event_record(has_happened);
    adaboost::utils::cuda::cuda_event_synchronize(has_happened);
    adaboost::cuda::core::MatrixGPU<float> result1(3, 3);
    adaboost::cuda::core::multiply_gpu(mat1, mat2, result1);
    adaboost::utils::cuda::cuda_event_record(has_happened);
    adaboost::utils::cuda::cuda_event_synchronize(has_happened);
    result1.copy_to_host();
    for(unsigned int i = 0; i < 3; i++)
    {
        for(unsigned int j = 0; j < 3; j++)
        {
            std::cout<<i<<" "<<j<<" "<<result1.at(i, j)<<std::endl;
            EXPECT_EQ(60.0, result1.at(i, j));
        }
    }
    mat3.set(0, 0, 6.0);
    mat3.set(1, 0, 6.0);
    EXPECT_THROW({
        try
        {
            adaboost::cuda::core::multiply_gpu(mat1, mat3, result1);
        }
        catch(const std::logic_error& e)
        {
            EXPECT_STREQ("Order of matrices don't match.", e.what());
            throw;
        }
    }, std::logic_error);
}

int main(int ac, char* av[])
{
    testing::InitGoogleTest(&ac, av);
    return RUN_ALL_TESTS();
}

References/Other comments

Refactoring Codebase

Description of the problem

Currently the codebase mixes CUDA kernels with the C++ classes and the API is confusing. The changes to be made are summarised in the following points,

  • Remove the fill method from both the Matrix and Vector classes - The first phase of refactoring should remove fill from both of these classes and their GPU counterparts, shifting it to an operations module. This will avoid kernel calls inside class methods and make the Matrix and Vector APIs unambiguous.

  • Shift the product and multiply functions to the operations module - The reasons for doing this are similar to the above.

  • Discuss an API for using streams, i.e., working on #2. Currently, methods like Vector.fill infer from block_size whether to use the GPU. Instead, separate functions with clean APIs should be provided for GPU and CPU.
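One possible shape for the proposed split, with hypothetical names that are meant only to seed the discussion, not as the final API:

```
// Sketch: Matrix/Vector stay as plain containers; fill/product/multiply
// become free functions with explicit CPU and GPU entry points.
namespace adaboost { namespace core { namespace operations {

    template <class T>
    void fill(Vector<T>& vec, T value);  // CPU

    template <class T>
    void multiply(const Matrix<T>& a, const Matrix<T>& b,
                  Matrix<T>& result);    // CPU

}}}

namespace adaboost { namespace cuda { namespace operations {

    // GPU versions take an explicit stream (default stream by default),
    // which also addresses #2.
    template <class T>
    void fill(core::VectorGPU<T>& vec, T value, cudaStream_t stream = 0);

    template <class T>
    void multiply(const core::MatrixGPU<T>& a, const core::MatrixGPU<T>& b,
                  core::MatrixGPU<T>& result, cudaStream_t stream = 0);

}}}
```

The namespace carries the CPU/GPU distinction, so no method has to guess from block_size which device to use.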

Example of the problem

References/Other comments

Parallelizing Artificial Neural Networks Using CUDA C

Description of the problem : Parallelize artificial neural networks using CUDA C on the MNIST dataset.

My teammate Riddhi Thakker and I, from Team Cutting Edge, would like to perform this task to become more familiar with the project.

CUDA libraries are installed even if they aren't built

Description of the problem

libadaboost_cuda* files are installed even if they aren't built. See,

adaboost/CMakeLists.txt

Lines 46 to 51 in 5164fea

install(FILES
${CMAKE_BINARY_DIR}/libs/libadaboost_core.so
${CMAKE_BINARY_DIR}/libs/libadaboost_utils.so
${CMAKE_BINARY_DIR}/libs/libadaboost_cuda.so
${CMAKE_BINARY_DIR}/libs/libadaboost_cuda_wrappers.so
DESTINATION ${CMAKE_INSTALL_PREFIX}/lib)

A minor check is needed to fix this,

if(BUILD_CUDA)
    install(...)
endif()

The above is just a hint, not a complete solution.
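Spelled out, the guarded install might look like this (paths copied from the snippet above; assumes BUILD_CUDA is the option that gates building the CUDA targets):

```
install(FILES
    ${CMAKE_BINARY_DIR}/libs/libadaboost_core.so
    ${CMAKE_BINARY_DIR}/libs/libadaboost_utils.so
    DESTINATION ${CMAKE_INSTALL_PREFIX}/lib)

if(BUILD_CUDA)
    install(FILES
        ${CMAKE_BINARY_DIR}/libs/libadaboost_cuda.so
        ${CMAKE_BINARY_DIR}/libs/libadaboost_cuda_wrappers.so
        DESTINATION ${CMAKE_INSTALL_PREFIX}/lib)
endif()
```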

Example of the problem

References/Other comments

[Discussion] Implementation of AdaBoost

Description of the problem

The task is to discuss the implementation of both Multi-Class and Two-Class AdaBoost. The paper we have referred to is this: https://web.stanford.edu/~hastie/Papers/samme.pdf

Current Thoughts

  • For multi-class AdaBoost we will implement Algorithm 4 in the paper, which is SAMME.R

Initial Doubts

  • For two-class AdaBoost, which algorithm should we be following?
  • We are having some trouble understanding the paper, especially this line in Algorithm 4:

    [equation image from the paper, not reproduced here]

where the RHS is expanded as:

[equation image from the paper, not reproduced here]

We are not able to understand what the f and the I symbols mean.
