jayavanth / back40computing

Automatically exported from code.google.com/p/back40computing

License: BSD 3-Clause "New" or "Revised" License

Cuda 93.11% Shell 1.16% Makefile 3.35% C++ 2.38% Brainfuck 0.01%

back40computing's People

Forkers

sree314

back40computing's Issues

Problem Including in Multiple files

Very nice performance with your sorting function!  A 2x boost on all sorting is 
always welcome.  Just looked at the updates and see that you guys have been 
very busy, so I guess this is already on the radar.  I wanted to use it in 
multiple files that are compiled separately, but received linking errors.   

What steps will reproduce the problem?
1. Include radixsort_api.cu in several source files and link the resulting 
object files together:
   nvcc -c file1.cu
   nvcc -c file2.cu
   nvcc -c main.cu
   nvcc -o main main.o file1.o file2.o

This gives linker errors because some variables are defined multiple times.
I attached compiler_error.txt, which contains the errors I received, and the 
source files. 
Thanks,
Scott

Original issue reported on code.google.com by [email protected] on 11 Aug 2010 at 11:58
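
For readers hitting the same error: a multiple-definition failure of this kind usually means that a file defining ordinary (non-inline) variables or functions was included in more than one translation unit. The sketch below shows one header-safe way to hold shared state in plain C++. The names pass_count and bump_pass_count are hypothetical and are not taken from radixsort_api.cu, and the report does not say which symbols actually clashed.

```cpp
#ifndef SKETCH_SORT_STATE_H
#define SKETCH_SORT_STATE_H

// Problematic in a file that several translation units include:
//     int g_pass_count = 0;      // one definition per object file -> link error
//     void reset() { /*...*/ }   // same problem for non-inline functions

// Header-safe alternative: keep the state in a function-local static inside
// an inline function. Every translation unit then refers to one shared object.
inline int& pass_count() {
    static int count = 0;
    return count;
}

// Functions defined in a shared header are marked inline so that repeated
// definitions across object files are permitted and merged by the linker.
inline int bump_pass_count() {
    return ++pass_count();
}

#endif  // SKETCH_SORT_STATE_H
```

Another option, under the same assumptions, is to include the defining file in exactly one source file and expose only declarations to the others.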

Branch FastSortSm20 radix_sort fails to handle floating point key ranges between 1.0 and 2.0

What steps will reproduce the problem?
1. Create a key array of floating-point numbers (float or double) and a value 
array of integers. For example: 
  //key[i] = (double)N / (double)i; 
  double key[5]   = { 0.0, 5.0, 2.5, 1.66667, 1.25 }; 
  int    value[5] = { 0, 1, 2, 3, 4 };

2. Use the radix_sort from branch FastSortSm20. Keys of the form 1.xx (between 
1.0 and 2.0) are placed after all other elements.
  The output is

    key[] = [0, 2.5, 5, 1.25, 1.66667 ]     
  value[] = [0, 2, 1, 4, 3 ]


3. Using the radix_sort in trunk gives the correct result.

    key[] = [0, 1.25, 1.66667, 2.5, 5 ]     
  value[] = [0, 4, 3, 2, 1 ]


What version of the product are you using? On what operating system?

Ubuntu 10.10 x64
CUDA 4.0 (GPU GTX 470)
r893, branches/FastSortSm20
r893, trunk

Please provide any additional information below.

I want to sort the elements in each row vector of a floating-point 2D matrix. 
The Enactor::SmallSort() interface in the branch FastSortSm20 seems a good 
fit. Even better, it allows specifying a cudaStream in the interface. I want to 
split the row vectors across several streams to take advantage of concurrent 
kernel execution. Note that Compute Capability 2.0 hardware supports 16 
concurrent kernels.

Original issue reported on code.google.com by [email protected] on 3 Jul 2012 at 10:26
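
For context, a radix sort can order floating-point keys correctly only if their bit patterns are first mapped to unsigned integers whose unsigned order matches numeric order. The usual mapping sets the sign bit of non-negative values and inverts all bits of negative values. The sketch below shows that mapping on the host in plain C++ and applies it to keys like those in this report. It illustrates the general technique only: the function names are hypothetical and are not taken from trunk or from the FastSortSm20 branch, and std::sort stands in for the radix passes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a double to a 64-bit pattern whose unsigned order equals numeric order.
inline std::uint64_t double_to_ordered_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    const std::uint64_t sign = std::uint64_t(1) << 63;
    // Negative: invert everything. Non-negative: set the sign bit.
    return (bits & sign) ? ~bits : (bits | sign);
}

// Inverse of double_to_ordered_bits.
inline double ordered_bits_to_double(std::uint64_t bits) {
    const std::uint64_t sign = std::uint64_t(1) << 63;
    bits = (bits & sign) ? (bits & ~sign) : ~bits;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Sort doubles by sorting their transformed bit patterns as unsigned integers.
inline std::vector<double> sort_via_ordered_bits(std::vector<double> keys) {
    std::vector<std::uint64_t> bits(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i)
        bits[i] = double_to_ordered_bits(keys[i]);
    std::sort(bits.begin(), bits.end());
    for (std::size_t i = 0; i < keys.size(); ++i)
        keys[i] = ordered_bits_to_double(bits[i]);
    return keys;
}
```

With this mapping, keys in [1.0, 2.0) compare below 2.5 and 5.0 like any other values, so a per-digit pass over the transformed bits should yield the order that trunk reports.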

BFS taking too much time

What steps will reproduce the problem?
1. Run make in the test/bfs folder.
2. Run test_bfs_5.0_i386 random 32 128 --v --undirected 

What is the expected output? What do you see instead?
The expected output is a BFS traversal of the graph above, but only partial 
output is displayed, as follows:

Using device 0: Tesla C2070
  Selecting 128 undirected random edges in COO format... Done selecting (0s).
  Converting 32 vertices, 256 edges (unordered rows) to CSR format... Done converting (0s).

Degree Histogram (32 vertices, 256 directed edges):
    Degree 2^-1: 0 (0.00%)
    Degree 2^0: 0 (0.00%)
    Degree 2^1: 2 (6.25%)
    Degree 2^2: 13 (40.62%)
    Degree 2^3: 17 (53.12%)

Running non-instrumented distance-marking copied-to-device tests...

---------------------------------------------------------------
Work Histogram:
Depth, Expanded, Unique-Expanded, Discovered
0, 1, 1, 1
1, 7, 5, 5
2, 42, 22, 19
3, 158, 30, 7


Warmup iteration: 0.000 ms

GPU 0 source path: 32 elements (128 bytes)
GPU 0 collision mask: 4 elements (4 bytes)
GPU 0 queue sizes: compact 332 elements (1328 bytes), expand 332 elements (1328 bytes)

BFS min occupancy 8, level-grid size 112
Warmup iteration: 1.532 ms


BFS min occupancy 8, level-grid size 112
Warmup iteration: 0.579 ms


BFS expand min occupancy 8, level-grid size 112
BFS compact min occupancy 8, level-grid size 112
Warmup iteration: 0.986 ms


BFS one_phase min occupancy 8, level-grid size 112
BFS expand min occupancy 8, level-grid size 112
BFS compact min occupancy 8, level-grid size 112



After this there is no output for a long time. The graph is small, so why is it taking so much time? Am I missing something?
I'm using Linux and a recent version of the product.

Original issue reported on code.google.com by [email protected] on 25 Aug 2012 at 12:05

sort enactor does not accept unsigned int problem size specification

What steps will reproduce the problem?
1. If N is a const unsigned int, this will not work:
sort_enactor.Sort(sort_storage, N);

2. This will work:
sort_enactor.Sort(sort_storage, (int) N);

What is the expected output? What do you see instead?
I get unspecified launch failures in the first case.

What version of the product are you using? On what operating system?
r603 Ubuntu Linux 11.04

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 10 Jul 2011 at 9:54
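
The report does not identify why the argument type matters. One plausible mechanism, stated here only as an assumption, is that the size parameter is a deduced template type, so passing an unsigned int instantiates different code than passing an int. The sketch below demonstrates that deduction effect and an explicit checked conversion. Both function names are hypothetical and are not part of the b40c API.

```cpp
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for an entry point templated on the size type.
// Reports whether the deduced size type is signed.
template <typename SizeT>
bool deduced_size_is_signed(SizeT /*num_elements*/) {
    return std::is_signed<SizeT>::value;
}

// Convert an unsigned problem size to int, refusing values that do not fit,
// so the cast from the report's workaround cannot silently wrap around.
inline int to_signed_size(unsigned int n) {
    if (n > static_cast<unsigned int>(std::numeric_limits<int>::max()))
        throw std::overflow_error("problem size does not fit in int");
    return static_cast<int>(n);
}
```

A caller would then write something like sort_enactor.Sort(sort_storage, to_signed_size(N)), which is the cast from step 2 with a range check added.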

Incorrect sorting on large arrays when doubles are all integral.

What steps will reproduce the problem?
1. Perform a sort of sufficient size that LARGE_SORT is used.  Keys-only or 
keys with values will reproduce this issue.
2. Use keys of type double whose values are all integers.
3. A basic example I use is the natural numbers decreasing from 1000000 to 1.

What is the expected output? What do you see instead?

I expect to see the numbers 1 through to 1000000 in increasing order.  I 
instead see negative numbers for the sorted keys.

What version of the product are you using? On what operating system?

Version v1.0.655 (SVN r655).  Windows 8 64-bit, targeting 64-bit, 
compute_13,sm_13.  Running on an NVIDIA GeForce GTX 580.


Please provide any additional information below.

I have an (inelegant) workaround of multiplying all numbers by some number 
(7.76345621464357) before sorting, and dividing again afterwards, for problem 
sizes over 100,000.  This is of course not ideal - speed, roundoff, etc :).

Original issue reported on code.google.com by [email protected] on 18 Feb 2013 at 9:00
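
One property that distinguishes this input: a double holding an integer between 1 and 1000000 has all of its low 32 bits equal to zero, because the value needs at most 20 significant bits of the 52-bit mantissa. Multiplying by a non-integral constant, as in the workaround, generally fills those bits in. The sketch below checks the property on the host. Whether this is what triggers the failure in r655 is not established by the report, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Return the low 32 bits of a double's IEEE 754 bit pattern.
inline std::uint32_t low_word(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
}

// True if every integer in [first, last], stored as a double, has a zero
// low word, i.e. the least significant radix digits are all identical.
inline bool low_words_all_zero(int first, int last) {
    for (int i = first; i <= last; ++i) {
        if (low_word(static_cast<double>(i)) != 0) return false;
    }
    return true;
}
```

For the keys in this report, low_words_all_zero(1, 1000000) holds, while a non-integral value such as 0.1 has a non-zero low word.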

Incorrect results with (key,value) = (unsigned long long,unsigned int)

What steps will reproduce the problem?
1. Sorting the particular input sequence of unsigned long long values in the 
file input_data gives incorrect output. (It works on most other inputs; I 
include one such example, working_data.) 
2. Compiling and running the included sort_by_key.cu should reproduce the problem.

What is the expected output? What do you see instead?
I also sorted them using thrust v1.2.1; the outputs are in the files below.
Expected:
  thrust_data
  thrust_indices
Received:
  b40c_data
  b40c_indices


What version of the product are you using? On what operating system?
Using rv208 of b40c and thrust v1.2.1, compiled on a 64-bit Linux machine with 
a C2050 GPU, using nvcc 3.1.  
I tried both:
  a) nvcc -O2 -arch=sm_20 -o sort-test sort_by_key.cu
  b) nvcc -o sort-test sort_by_key.cu

Original issue reported on code.google.com by [email protected] on 15 Aug 2010 at 5:55
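
A host-side reference is a simple way to check results like these independently of any GPU library. The sketch below sorts (key, value) pairs with std::stable_sort and reports the first position where two key arrays differ. It assumes the key and value types from the issue title; the function names are hypothetical and are not part of b40c or thrust.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

typedef unsigned long long Key;
typedef unsigned int Value;
typedef std::pair<Key, Value> KeyValue;

// Compare pairs by key only, so equal keys keep their input order.
inline bool key_less(const KeyValue& a, const KeyValue& b) {
    return a.first < b.first;
}

// Sort keys and values together on the host (stable, by key).
inline void reference_sort_by_key(std::vector<Key>& keys,
                                  std::vector<Value>& values) {
    std::vector<KeyValue> pairs(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i)
        pairs[i] = std::make_pair(keys[i], values[i]);
    std::stable_sort(pairs.begin(), pairs.end(), key_less);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        keys[i] = pairs[i].first;
        values[i] = pairs[i].second;
    }
}

// Index of the first differing element, or -1 if the arrays match.
inline long first_mismatch(const std::vector<Key>& a,
                           const std::vector<Key>& b) {
    if (a.size() != b.size()) return 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) return static_cast<long>(i);
    return -1;
}
```

Because a radix sort is stable, values for duplicate keys should also match this reference, not just the keys.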

kernel launches from templates are not allowed in system files

I am running it on a GTX560 on Ubuntu 10.10.
When I try to make, it gives errors such as:
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(481): error: kernel launches from templates are not allowed in system files
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(488): error: kernel launches from templates are not allowed in system files
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(504): error: kernel launches from templates are not allowed in system files
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(517): error: kernel launches from templates are not allowed in system files
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(521): error: kernel launches from templates are not allowed in system files

I found that the first error occurs in:
    if ((_device_sm_version == 130) && (_work_decomposition.num_elements > static_cast<unsigned int>(_device_props.multiProcessorCount * _cycle_elements * 2))) { 
        FlushKernel<void><<<_grid_size, B40C_RADIXSORT_THREADS, scan_scatter_attrs.sharedSizeBytes>>>();
        synchronize_if_enabled("FlushKernel");
    }
Can you give me some help with this?
Thanks

Original issue reported on code.google.com by [email protected] on 27 Nov 2011 at 5:33

EarlyExitRadixSortingEnactor bug

The following code that uses v270 (compiled under Windows 7) works fine with 
SM20, but not with SM12:
http://encode.ru/attachment.php?attachmentid=1488&d=1297084367

Replacing
EarlyExitRadixSortingEnactor<K, V> sorting_enactor;
with
SingleGridRadixSortingEnactor<K, V> sorting_enactor;
solves the problem.

[email protected]

Original issue reported on code.google.com by [email protected] on 9 Feb 2011 at 11:54

a bug in enactor_cull.cuh

What steps will reproduce the problem?
1. Run "make cull". You will get many compiler errors, but they can be fixed 
easily. (Most of them seem to be caused by function-call arguments that do not 
match the declarations of the underlying functions.) 
2. After fixing the compile errors, run 
"microbench_bfs_5.0_x86_64 grid2d 5000 --src=randomize --i=50 --quick 
--device=1 --queue-sizing=0.5"
3. You will get an "illegal addr" error, because the space for d_filter_mask is 
not allocated.

Original issue reported on code.google.com by [email protected] on 28 Mar 2013 at 4:29

simple_sort.cu is not working in revision r648

Test cases fail with the following errors:

Using device 0: GeForce 8800 GTS 512
Simple key-value sort: INCORRECT: [0]: 102400 != 4128
Small-problem key-value sort: INCORRECT: [0]: 102400 != 4128
Small-problem restricted-range key-value sort: INCORRECT: [0]: 47840 != 4128

Original issue reported on code.google.com by [email protected] on 18 Aug 2011 at 6:05

Please add documentation for make

What steps will reproduce the problem?
1. Go to ../test/bfs
2. make

What is the expected output? What do you see instead?
../../b40c/partition/upsweep/kernel_policy.cuh(110): error: identifier "CUDA_ARCH" is undefined
../../b40c/partition/downsweep/kernel_policy.cuh(122): error: identifier "CUDA_ARCH" is undefined


What version of the product are you using? On what operating system?
Version v1.0.655 (SVN r655)
NVCC  release 4.0, V0.2.1221

Please provide any additional information below.
ubuntu 11.04
GTS 450

Original issue reported on code.google.com by [email protected] on 28 Nov 2011 at 3:16

compiler warning

What steps will reproduce the problem?

code snippet:

    b40c::radix_sort::Enactor sort_enactor;
    b40c::util::PingPongStorage<unsigned int, Scalar4> sort_storage(d_keys,d_values);
    sort_enactor.Sort(sort_storage, N);


What is the expected output? What do you see instead?
I get lots of compiler warnings:

.../b40c/radix_sort/enactor.cuh:529:40: warning: suggest parentheses around assignment used as truth value

The same warning appears for similar lines in enactor_base.h, etc.

What version of the product are you using? On what operating system?
r603 

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 9 Jul 2011 at 3:21
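
For reference, GCC emits this warning when the result of an assignment is used directly as a condition, as in if (retval = f()). Wrapping the assignment in an extra pair of parentheses, or comparing it explicitly, states the intent and silences the warning. The sketch below shows the explicit form. The functions run_pass and first_error are hypothetical and do not reproduce the code at enactor.cuh:529.

```cpp
// Hypothetical status-returning step; 0 means success.
inline int run_pass(int status) {
    return status;
}

// Return the first non-zero status, or 0 if every step succeeds.
inline int first_error(const int* statuses, int count) {
    int retval = 0;
    for (int i = 0; i < count; ++i) {
        // `if (retval = run_pass(statuses[i])) break;` compiles, but it
        // triggers "suggest parentheses around assignment used as truth value".
        if ((retval = run_pass(statuses[i])) != 0) break;
    }
    return retval;
}
```

The behavior is unchanged; only the condition is spelled out so the assignment cannot be mistaken for a comparison.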
