tony-tan / cuda_freshman

CMake 2.63% Cuda 95.63% C 1.62% Makefile 0.12%

cuda_freshman's Introduction

Hi there 👋

🔭 I’m currently working on:
- Reinforcement Learning
🚀 Some old projects:
- CUDA Freshman is a repository for learning CUDA; the accompanying blog posts are written in Chinese
- DIPpro is a repository for learning digital image processing; the accompanying blog posts are written in Chinese
🌱 I’m currently learning reinforcement learning algorithms and some mathematics subjects such as topology, differential geometry, and functional analysis.

cuda_freshman's People

oftenliu tianxingyzxq lsclone booool xia00100 damilytutu yyyqy outmanwt 906527105 szqxx so2bin xiaoxiaotao yishengcheng 2251713364 zhouhaocomeon1 buddhisant fanghao6666 zhuangbility111 wwd605075811 git-nibird thuliusj lhyxx zkwalt neodai liuxubit llljun phny phnyhyl taotaolin missximon 18369674616 lijiunderstand mlbo cufer luckmoon pinery-sls ginkgo-cheung zyf12389 dltensor sarsigmadelta niuliling123 zhenlin-work hwscut haochenye wangcongbme juno119 hx2009302823 lebronhe yz-27 royzon kuozhang eyxxxxx smallflyfly lightsalt2011 neineit aliang-ai bmfire1 y-hann kylewu11 cqray1990 yfeng-44 qtguo uniwangwang tangzhiyi11 oldify herolin12 lbboier billxw haoran-001 xiongjun19 leviome learnpythontheew xiaoyu1004 perfcv tianzhao-007 sustcsonglin adam1iu yolunghiu zivzone amanda-barbara xiangchunyang code-fool lemon-lm pzw520125 sowhat1 bingooyang studytutorials duzhiqiang2019 jiatongdu fffzlfk zyzzu gjhan3 stevengu999 cc1019054695 xuxingxian liu-rj doorteeth yangtze736 gongzhanli feizhouxiaozhu

cuda_freshman's Issues

Code 5 has errors

In code 5, the CPU function printMatrix has a bug: C[j] should be changed to ic[j]:
void printMatrix(float * C,const int nx,const int ny)
{
    float *ic=C;
    printf("Matrix<%d,%d>:",ny,nx);
    for(int i=0;i<ny;i++)
    {
        for(int j=0;j<nx;j++)
        {
            printf("%6f ",ic[j]); // change C[j] -> ic[j]
        }
        ic+=nx;
        printf("\n");
    }
}

In code 5, the GPU kernel printThreadIndex also has a bug: the last format specifier should be changed from %d to %f, otherwise every printed value is 0:
__global__ void printThreadIndex(float *A, const int nx, const int ny){
    int ix = threadIdx.x + blockIdx.x * blockDim.x;
    int iy = threadIdx.y + blockIdx.y * blockDim.y;
    unsigned int idx = iy * nx + ix;
    printf("thread_id(%d,%d) block_id(%d,%d) coordinate(%d,%d)"
           "global index %2d ival %2f\n",threadIdx.x,threadIdx.y, // change %2d -> %2f
           blockIdx.x,blockIdx.y,ix,iy,idx,A[idx]);
}

9_sum_matrix2D raises an error

https://github.com/Tony-Tan/CUDA_Freshman/tree/master/9_sum_matrix2D

Error: exception thrown at 0x00007FF760F91640 (in SumMat2D.exe): 0xC0000005: access violation reading location 0x000001C80E020000.

build

Error in blog content [2.2]

Hi Tony,
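Thank you very much for sharing. I would like to point out a possible error in the blog.

Blog link: https://face2ai.com/CUDA-F-2-2-%E6%A0%B8%E5%87%BD%E6%95%B0%E8%AE%A1%E6%97%B6/

In this section, where you analyze the "Waterloo" in running time caused by an incomplete block, the data size should have been (1 << 24) + 1. However, your printed output shows that the data size was actually 1 << (24 + 1) = 33,554,432. That is why the elapsed time is nearly doubled, so this measurement may be wrong.

Regards,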
Juncfang

Why can't this be compiled successfully?

/**
 * Please refer to the NVIDIA end user license agreement (EULA) associated
 * with this source code for terms and conditions that govern your use of
 * this software. Any use, reproduction, disclosure, or distribution of
 * this software and related documentation outside the terms of the EULA
 * is strictly prohibited.
 */
#include <stdio.h>
#include <stdlib.h>

#include <cuda.h>
#include <cuda_runtime_api.h>

static const int WORK_SIZE = 256;

/**
 * This macro checks return value of the CUDA runtime call and exits
 * the application if the call failed.
 *
 * See cuda.h for error code descriptions.
 */
#define CHECK_CUDA_RESULT(N) {                                          \
	CUresult result = N;                                                \
	if (result != 0) {                                                  \
		printf("CUDA call on line %d returned error %d\n", __LINE__,    \
			result);                                                    \
		exit(1);                                                        \
	} }

int main(int argc, char **argv)
{
CUmodule module;
CUcontext context;
CUdevice device;
CUdeviceptr deviceArray;
CUfunction process;

void *kernelArguments[] = { &deviceArray };
int deviceCount;
unsigned int idata[WORK_SIZE], odata[WORK_SIZE];

for (int i = 0; i < WORK_SIZE; ++i) {
	idata[i] = i;
}

CHECK_CUDA_RESULT(cuInit(0));
CHECK_CUDA_RESULT(cuDeviceGetCount(&deviceCount));
if (deviceCount == 0) {
	printf("No CUDA-compatible devices found\n");
	exit(1);
}
CHECK_CUDA_RESULT(cuDeviceGet(&device, 0));
CHECK_CUDA_RESULT(cuCtxCreate(&context, 0, device));

CHECK_CUDA_RESULT(cuModuleLoad(&module, "bitreverse.fatbin"));
CHECK_CUDA_RESULT(cuModuleGetFunction(&process, module, "bitreverse"));

CHECK_CUDA_RESULT(cuMemAlloc(&deviceArray, sizeof(int) * WORK_SIZE));
CHECK_CUDA_RESULT(
		cuMemcpyHtoD(deviceArray, idata, sizeof(int) * WORK_SIZE));

CHECK_CUDA_RESULT(
		cuLaunchKernel(process, 1, 1, 1, WORK_SIZE, 1, 1, 0, NULL, kernelArguments, NULL));

CHECK_CUDA_RESULT(
		cuMemcpyDtoH(odata, deviceArray, sizeof(int) * WORK_SIZE));

for (int i = 0; i < WORK_SIZE; ++i) {
	printf("Input value: %u, output value: %u\n", idata[i], odata[i]);
}

CHECK_CUDA_RESULT(cuMemFree(deviceArray));
CHECK_CUDA_RESULT(cuCtxDestroy(context));

return 0;

}
: undefined reference to ‘cuDeviceGetCount’
HSigmoid.cu:58: undefined reference to ‘cuModuleLoad’
/cudaHelloworld/src/HSigmoid.cu:59: undefined reference to ‘cuModuleGetFunction’
cudaHelloworld/src/HSigmoid.cu:61: undefined reference to ‘cuMemAlloc_v2’
cudaHelloworld/src/HSigmoid.cu:62: undefined reference to ‘cuMemcpyHtoD_v2’
cudaHelloworld/src/HSigmoid.cu:68: undefined reference to ‘cuMemcpyDtoH_v2’
cudaHelloworld/src/HSigmoid.cu:75: undefined reference to ‘cuMemFree_v2’
/cudaHelloworld/src/HSigmoid.cu:76: undefined reference to ‘cuCtxDestroy_v2’