adaboost's People

Contributors

akankshasehrawat, arushisinghal, bits2zbytes, czgdp1807, fiza11, tanvi141, vi1i


adaboost's Issues

Complete documentation

Description of the problem

Documentation for some classes and functions isn't complete in .hpp files. The issue can be resolved by adding docstrings to all the missing places.
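For instance, a missing docstring could be filled in with a comment block like the following; the function name and signature here are illustrative, not necessarily what appears in the codebase:

```cpp
// Hypothetical example of the kind of docstring to add to the .hpp files.
namespace adaboost { namespace core {

    /*
    * Used for computing the dot product of two vectors.
    *
    * @param vec1 First vector.
    * @param vec2 Second vector.
    * @param result Variable for storing the result.
    */
    template <class data_type_vector>
    void product(const Vector<data_type_vector>& vec1,
                 const Vector<data_type_vector>& vec2,
                 data_type_vector& result);

}}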

Example of the problem

References/Other comments

Adding support for Windows

Description of the problem

There isn't any documentation in README for building the project and working with it on Windows.
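A README section for Windows might sketch something like the following, assuming the existing CMake build and a Visual Studio toolchain (the generator name is one common choice, not a tested recipe):

```
mkdir build
cd build
cmake -G "Visual Studio 16 2019" ..
cmake --build . --config Release
```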

Example of the problem

References/Other comments

Using non-default streams in CUDA

Description of the problem

Currently, in #6 all operations are issued to the default stream. However, we could use non-default streams to launch kernels for different operations so that they execute in parallel.
One example of such a situation is filling n vectors in parallel with fill_vector_kernel launched in n separate streams. Another is filling an n*m matrix with n or m kernels launched in separate streams.
Before moving on to the implementation we can discuss the API for the above use case.
Please comment below if you have thought of something. I will come up with a design soon.
One more advantage of non-default streams, as claimed by https://devblogs.nvidia.com/how-overlap-data-transfers-cuda-cc/, is overlapping data transfers with kernel execution. However, IMO this isn't very useful for this library: it may be that the user only wants to copy a small Vector back to the host, and the time spent creating streams for that isn't worth it.
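As a rough sketch of the first use case, assuming a fill_vector_kernel with the signature below (the helper fill_vectors_parallel is hypothetical, not the proposed API), n fills could be issued on separate streams like this:

```
#include <cuda_runtime.h>
#include <vector>

// Minimal fill kernel; signature assumed to mirror the one in #6.
template <class data_type>
__global__ void fill_vector_kernel(data_type* vec, unsigned size, data_type value)
{
    unsigned idx = blockIdx.x*blockDim.x + threadIdx.x;
    if(idx < size)
        vec[idx] = value;
}

// Fill n device vectors, one non-default stream per vector; kernels on
// different streams may execute concurrently on the device.
void fill_vectors_parallel(float** device_vecs, unsigned n,
                           unsigned size, float value)
{
    std::vector<cudaStream_t> streams(n);
    const unsigned block_size = 256;
    const unsigned num_blocks = (size + block_size - 1)/block_size;
    for(unsigned i = 0; i < n; i++)
    {
        cudaStreamCreate(&streams[i]);
        fill_vector_kernel<<<num_blocks, block_size, 0, streams[i]>>>(
            device_vecs[i], size, value);
    }
    for(unsigned i = 0; i < n; i++)
    {
        cudaStreamSynchronize(streams[i]);
        cudaStreamDestroy(streams[i]);
    }
}
```

Whether stream creation should be per-call like this or cached in some stream-pool object is exactly the API question to discuss.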

Example of the problem

Design Discussions

Description of the problem

This issue aims at discussing the design of the software to be developed, covering the following topics,

  1. File structure
  2. User facing APIs
  3. Class design
  4. Hardware requirements (mainly GPUs)

I will try to come up with the first one ASAP. However, if you have already prepared something, let us know in the comments.

Example of the problem

References/Other comments

We will follow https://web.stanford.edu/~hastie/Papers/samme.pdf
If you have something to suggest that can be used in the project, let us know.

Removing unnecessary lines from .travis.yml and fixing CMakeLists.txt

Description of the problem

  1. The following line is not needed,

    - ls -a

  2. The following should be changed to use the 1.10.0 release,

    adaboost/CMakeLists.txt

    Lines 23 to 31 in 5df11ee

    if(INSTALL_GOOGLETEST AND BUILD_TESTS)
    include(${CMAKE_ROOT}/Modules/ExternalProject.cmake)
    ExternalProject_Add(googletest
    GIT_REPOSITORY https://github.com/google/googletest
    GIT_TAG master
    SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/googletest-src"
    BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/googletest-build"
    TEST_COMMAND "")
    endif()
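A possible fix is to pin the GIT_TAG to the 1.10.0 release tag (release-1.10.0 on GitHub) instead of master, keeping the rest of the block as-is:

```
if(INSTALL_GOOGLETEST AND BUILD_TESTS)
    include(${CMAKE_ROOT}/Modules/ExternalProject.cmake)
    ExternalProject_Add(googletest
        GIT_REPOSITORY https://github.com/google/googletest
        GIT_TAG release-1.10.0  # pin to the 1.10.0 release instead of master
        SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/googletest-src"
        BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/googletest-build"
        TEST_COMMAND "")
endif()
```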

Example of the problem

References/Other comments

Boost.Python or SWIG or PyBindGen

Description of the problem

I was exploring options for generating Python wrappers for the C++ code and came across PyBindGen. SWIG is fine, but I think PyBindGen is more logical to use.
What do you say?

I went through Boost.Python as well, and I believe it is better than both of the above alternatives because of the greater amount of content available on the web.

Example of the problem

References/Other comments

[1] https://pybindgen.readthedocs.io/en/latest/tutorial/#what-is-pybindgen

Bug in matrix multiplication using shared memory

Description of the problem

The following code doesn't give correct results for matrix multiplication using shared memory,

template <class data_type_matrix>
__device__
data_type_matrix get_element(
    data_type_matrix* mat,
    unsigned row,
    unsigned col,
    unsigned stride)
{
    return mat[row*stride+col];
}

template <class data_type_matrix>
__device__
void set_element(
    data_type_matrix* mat,
    unsigned row,
    unsigned col,
    data_type_matrix value,
    unsigned stride)
{
    mat[row*stride+col] = value;
}

template <class data_type_matrix>
__device__
data_type_matrix* get_sub_matrix(
    data_type_matrix* mat,
    unsigned block_row,
    unsigned block_col,
    unsigned stride)
{
    data_type_matrix* mat_sub =
        new data_type_matrix[BLOCK_SIZE*BLOCK_SIZE];
    mat_sub = &mat[stride*BLOCK_SIZE*block_row+BLOCK_SIZE*block_col];
    return mat_sub;
}

template <class data_type_matrix>
__global__
void multiply_kernel(
    data_type_matrix* mat1,
    data_type_matrix* mat2,
    data_type_matrix* result,
    unsigned mat1_cols,
    unsigned mat2_cols,
    unsigned result_cols)
{
    unsigned block_row = blockIdx.y;
    unsigned block_col = blockIdx.x;
    data_type_matrix* result_sub = get_sub_matrix(result, block_row,
                                                  block_col, result_cols);

    unsigned row = threadIdx.y;
    unsigned col = threadIdx.x;

    for(unsigned m = 0; m < (mat1_cols + BLOCK_SIZE - 1)/BLOCK_SIZE; m++)
    {
        data_type_matrix* mat1_sub = get_sub_matrix(mat1, block_row,
                                                    m, mat1_cols);
        data_type_matrix* mat2_sub = get_sub_matrix(mat2, m,
                                                    block_col, mat2_cols);

        __shared__ data_type_matrix mat1_shared[BLOCK_SIZE][BLOCK_SIZE];
        __shared__ data_type_matrix mat2_shared[BLOCK_SIZE][BLOCK_SIZE];

        mat1_shared[row][col] = get_element(mat1_sub, row, col, mat1_cols);
        mat2_shared[row][col] = get_element(mat2_sub, row, col, mat2_cols);

        data_type_matrix cvalue = 0.0;

        __syncthreads();

        for(unsigned e = 0; e < BLOCK_SIZE; e++)
        {
            cvalue += mat1_shared[row][e] * mat2_shared[e][col];
        }

        __syncthreads();

        set_element(result_sub, row, col, cvalue, result_cols);
    }
}

template <class data_type_matrix>
void multiply_gpu(const MatrixGPU<data_type_matrix>& mat1,
                  const MatrixGPU<data_type_matrix>& mat2,
                  MatrixGPU<data_type_matrix>& result)
{
    adaboost::utils::check(mat1.get_cols() == mat2.get_rows(),
                           "Order of matrices don't match.");
    dim3 gridDim((mat2.get_cols() + BLOCK_SIZE)/BLOCK_SIZE,
                 (mat1.get_rows() + BLOCK_SIZE)/BLOCK_SIZE);
    dim3 blockDim(BLOCK_SIZE, BLOCK_SIZE);
    multiply_kernel
    <<<gridDim, blockDim>>>
    (mat1.get_data_pointer(),
     mat2.get_data_pointer(),
     result.get_data_pointer(),
     mat1.get_cols(),
     mat2.get_cols(),
     result.get_cols());
}

Example of the problem

#include<gtest/gtest.h>
#include<string>
#include<adaboost/cuda/cuda_data_structures.hpp>
#include<adaboost/utils/cuda_wrappers.hpp>
#include<stdexcept>

TEST(Cuda, MatricesGPU)
{
    adaboost::utils::cuda::cuda_event_t has_happened;
    adaboost::utils::cuda::cuda_event_create(&has_happened);
    adaboost::cuda::core::MatrixGPU<float> mat_f;
    EXPECT_EQ(0, mat_f.get_cols())<<"Number of columns should be 0";
    EXPECT_EQ(0, mat_f.get_rows())<<"Number of rows should be 0.";
    adaboost::cuda::core::MatrixGPU<float> mat1(3, 3), mat2(3, 3), mat3(2, 1);
    mat1.fill(4.0);
    mat2.fill(5.0);
    mat1.copy_to_device();
    mat2.copy_to_device();
    adaboost::utils::cuda::cuda_event_record(has_happened);
    adaboost::utils::cuda::cuda_event_synchronize(has_happened);
    adaboost::cuda::core::MatrixGPU<float> result1(3, 3);
    adaboost::cuda::core::multiply_gpu(mat1, mat2, result1);
    adaboost::utils::cuda::cuda_event_record(has_happened);
    adaboost::utils::cuda::cuda_event_synchronize(has_happened);
    result1.copy_to_host();
    for(unsigned int i = 0; i < 3; i++)
    {
        for(unsigned int j = 0; j < 3; j++)
        {
            std::cout<<i<<" "<<j<<" "<<result1.at(i, j)<<std::endl;
            EXPECT_EQ(60.0, result1.at(i, j));
        }
    }
    mat3.set(0, 0, 6.0);
    mat3.set(1, 0, 6.0);
    EXPECT_THROW({
        try
        {
            adaboost::cuda::core::multiply_gpu(mat1, mat3, result1);
        }
        catch(const std::logic_error& e)
        {
            EXPECT_STREQ("Order of matrices don't match.", e.what());
            throw;
        }
    }, std::logic_error);
}

int main(int ac, char* av[])
{
    testing::InitGoogleTest(&ac, av);
    return RUN_ALL_TESTS();
}

References/Other comments

Refactoring Codebase

Description of the problem

Currently the codebase mixes CUDA kernels with the C++ classes and the API is confusing. The changes to be made are summarised in the following points,

  • Remove the fill method from both the Matrix and Vector classes - The first phase of refactoring should remove fill from both of these classes and their GPU counterparts, shifting it to an operations module. This will avoid kernel calls inside class methods and make the Matrix and Vector APIs unambiguous.

  • Shift the product and multiply functions to the operations module - The reasons for doing this are similar to the above.

  • Discuss an API for using streams, i.e., working on #2. Currently, methods like Vector.fill infer from block_size whether to use the GPU. Instead, separate functions with clean APIs should be provided for GPU and CPU.
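One possible shape for the proposed split, with hypothetical names that are meant only to seed the discussion, not as the final API:

```
// Sketch: Matrix/Vector stay as plain containers; fill/product/multiply
// become free functions with explicit CPU and GPU entry points.
namespace adaboost { namespace core { namespace operations {

    template <class T>
    void fill(Vector<T>& vec, T value);  // CPU

    template <class T>
    void multiply(const Matrix<T>& a, const Matrix<T>& b,
                  Matrix<T>& result);    // CPU

}}}

namespace adaboost { namespace cuda { namespace operations {

    // GPU versions take an explicit stream (default stream by default),
    // which also addresses #2.
    template <class T>
    void fill(core::VectorGPU<T>& vec, T value, cudaStream_t stream = 0);

    template <class T>
    void multiply(const core::MatrixGPU<T>& a, const core::MatrixGPU<T>& b,
                  core::MatrixGPU<T>& result, cudaStream_t stream = 0);

}}}
```

The namespace carries the CPU/GPU distinction, so no method has to guess from block_size which device to use.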

Example of the problem

References/Other comments

Parallelizing Artificial Neural Networks Using CUDA C

Description of the problem : Parallelize artificial neural networks using CUDA C on the MNIST dataset.

My teammate Riddhi Thakker and I, from Team Cutting Edge, would like to perform this task to become more familiar with the project.

CUDA libraries are installed even if they aren't built

Description of the problem

libadaboost_cuda* files are installed even if they aren't built. See,

adaboost/CMakeLists.txt

Lines 46 to 51 in 5164fea

install(FILES
${CMAKE_BINARY_DIR}/libs/libadaboost_core.so
${CMAKE_BINARY_DIR}/libs/libadaboost_utils.so
${CMAKE_BINARY_DIR}/libs/libadaboost_cuda.so
${CMAKE_BINARY_DIR}/libs/libadaboost_cuda_wrappers.so
DESTINATION ${CMAKE_INSTALL_PREFIX}/lib)

A minor check is needed to fix this,

if(BUILD_CUDA)
    install(...)
endif()

The above is just a hint, not a complete solution.
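Spelled out, the guarded install might look like this (paths copied from the snippet above; assumes BUILD_CUDA is the option that gates building the CUDA targets):

```
install(FILES
    ${CMAKE_BINARY_DIR}/libs/libadaboost_core.so
    ${CMAKE_BINARY_DIR}/libs/libadaboost_utils.so
    DESTINATION ${CMAKE_INSTALL_PREFIX}/lib)

if(BUILD_CUDA)
    install(FILES
        ${CMAKE_BINARY_DIR}/libs/libadaboost_cuda.so
        ${CMAKE_BINARY_DIR}/libs/libadaboost_cuda_wrappers.so
        DESTINATION ${CMAKE_INSTALL_PREFIX}/lib)
endif()
```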

Example of the problem

References/Other comments

[Discussion] Implementation of AdaBoost

Description of the problem

The task is to discuss the implementation of both Multi-Class and Two-Class AdaBoost. The paper we have referred to is this: https://web.stanford.edu/~hastie/Papers/samme.pdf

Current Thoughts

  • For multi-class AdaBoost we will implement Algorithm 4 in the paper, which is SAMME.R

Initial Doubts

  • For two-class AdaBoost, which algorithm should we be following?
  • We are having some trouble understanding the paper, especially this line in Algorithm 4:

    [equation image from the paper, not reproduced here]

where the RHS is expanded as:

[equation image from the paper, not reproduced here]

We are not able to understand what the f and the I symbols mean.
