- 🔭 I’m currently working on:
- 🚀 Some old projects:
  - CUDA Freshman is a repository for learning CUDA; the accompanying blog posts are written in Chinese
  - DIPpro is a repository for learning digital image processing; the accompanying blog posts are written in Chinese
- 🌱 I’m currently learning reinforcement learning algorithms and some mathematics subjects such as topology, differential geometry, and functional analysis.
cuda_freshman's Introduction
cuda_freshman's Issues
Code 5 has errors
In code 5, the CPU printMatrix function has an error: C[j] should be changed to ic[j]:
void printMatrix(float * C,const int nx,const int ny)
{
    float *ic=C;
    printf("Matrix<%d,%d>:",ny,nx);
    for(int i=0;i<ny;i++)
    {
        for(int j=0;j<nx;j++)
        {
            printf("%6f ",ic[j]); // change C[j] -> ic[j]
        }
        ic+=nx;
        printf("\n");
    }
}
In code 5, the GPU printThreadIndex kernel has an error: the last format specifier should be changed from %d to %f, otherwise every value is printed as 0:
__global__ void printThreadIndex(float *A, const int nx, const int ny){
    int ix = threadIdx.x + blockIdx.x * blockDim.x;
    int iy = threadIdx.y + blockIdx.y * blockDim.y;
    unsigned int idx = iy * nx + ix;
    printf("thread_id(%d,%d) block_id(%d,%d) coordinate(%d,%d)"
           "global index %2d ival %2f\n",threadIdx.x,threadIdx.y, // change %2d -> %2f
           blockIdx.x,blockIdx.y,ix,iy,idx,A[idx]);
}
9_sum_matrix2D raises an error
https://github.com/Tony-Tan/CUDA_Freshman/tree/master/9_sum_matrix2D
Error: Exception thrown at 0x00007FF760F91640 (in SumMat2D.exe): 0xC0000005: Access violation reading location 0x000001C80E020000.
build
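Error in blog content [2.2]
Hi Tony,
Thank you very much for sharing. I would like to point out a possible error in the blog.
Blog link: https://face2ai.com/CUDA-F-2-2-%E6%A0%B8%E5%87%BD%E6%95%B0%E8%AE%A1%E6%97%B6/
In this section, where you analyze the "Waterloo" in timing caused by an incomplete block, the data size should be (1 << 24) + 1, but your printed output shows that the data size is actually 1 << (24 + 1) = 33,554,432. That is why the elapsed time is nearly doubled, so this measurement may be wrong.
Regards,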
Juncfang
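The discrepancy reported above is consistent with C operator precedence, where `+` binds more tightly than `<<`. The following is a minimal sketch, assuming the size was written without parentheses (the blog's source line is not quoted in this issue, so that is a guess). The helper names are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical helpers, named for illustration only. */

/* Intended size: 2^24 elements plus one extra element,
 * which forces a partially filled last block. */
size_t size_intended(void)
{
    return ((size_t)1 << 24) + 1;
}

/* What the unparenthesized expression evaluates to:
 * `+` has higher precedence than `<<`, so this is 1 << (24 + 1). */
size_t size_as_parsed(void)
{
    return (size_t)1 << 24 + 1;
}

/* Number of blocks needed to cover n elements with block_size threads each. */
size_t blocks_needed(size_t n, size_t block_size)
{
    return (n + block_size - 1) / block_size;
}
```

With an assumed block size of 1024, `blocks_needed(size_intended(), 1024)` gives 16385 blocks, while `blocks_needed(size_as_parsed(), 1024)` gives 32768, roughly twice the work. Many compilers flag the unparenthesized form with a parentheses warning.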
Why can't this be compiled successfully?
/**
 * Copyright 1993-2012 NVIDIA Corporation. All rights reserved.
 * Please refer to the NVIDIA end user license agreement (EULA) associated
 * with this source code for terms and conditions that govern your use of
 * this software. Any use, reproduction, disclosure, or distribution of
 * this software and related documentation outside the terms of the EULA
 * is strictly prohibited.
 */
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cuda_runtime_api.h>
static const int WORK_SIZE = 256;
/**
 * This macro checks return value of the CUDA runtime call and exits
 * the application if the call failed.
 * See cuda.h for error code descriptions.
 */
#define CHECK_CUDA_RESULT(N) {                                          \
    CUresult result = N;                                                \
    if (result != 0) {                                                  \
        printf("CUDA call on line %d returned error %d\n", __LINE__,    \
            result);                                                    \
        exit(1);                                                        \
    } }
int main(int argc, char **argv)
{
    CUmodule module;
    CUcontext context;
    CUdevice device;
    CUdeviceptr deviceArray;
    CUfunction process;
    void *kernelArguments[] = { &deviceArray };
    int deviceCount;
    unsigned int idata[WORK_SIZE], odata[WORK_SIZE];
    for (int i = 0; i < WORK_SIZE; ++i) {
        idata[i] = i;
    }
    CHECK_CUDA_RESULT(cuInit(0));
    CHECK_CUDA_RESULT(cuDeviceGetCount(&deviceCount));
    if (deviceCount == 0) {
        printf("No CUDA-compatible devices found\n");
        exit(1);
    }
    CHECK_CUDA_RESULT(cuDeviceGet(&device, 0));
    CHECK_CUDA_RESULT(cuCtxCreate(&context, 0, device));
    CHECK_CUDA_RESULT(cuModuleLoad(&module, "bitreverse.fatbin"));
    CHECK_CUDA_RESULT(cuModuleGetFunction(&process, module, "bitreverse"));
    CHECK_CUDA_RESULT(cuMemAlloc(&deviceArray, sizeof(int) * WORK_SIZE));
    CHECK_CUDA_RESULT(
        cuMemcpyHtoD(deviceArray, idata, sizeof(int) * WORK_SIZE));
    CHECK_CUDA_RESULT(
        cuLaunchKernel(process, 1, 1, 1, WORK_SIZE, 1, 1, 0, NULL, kernelArguments, NULL));
    CHECK_CUDA_RESULT(
        cuMemcpyDtoH(odata, deviceArray, sizeof(int) * WORK_SIZE));
    for (int i = 0; i < WORK_SIZE; ++i) {
        printf("Input value: %u, output value: %u\n", idata[i], odata[i]);
    }
    CHECK_CUDA_RESULT(cuMemFree(deviceArray));
    CHECK_CUDA_RESULT(cuCtxDestroy(context));
    return 0;
}
: undefined reference to `cuDeviceGetCount'
HSigmoid.cu:58: undefined reference to `cuModuleLoad'
/cudaHelloworld/src/HSigmoid.cu:59: undefined reference to `cuModuleGetFunction'
cudaHelloworld/src/HSigmoid.cu:61: undefined reference to `cuMemAlloc_v2'
cudaHelloworld/src/HSigmoid.cu:62: undefined reference to `cuMemcpyHtoD_v2'
cudaHelloworld/src/HSigmoid.cu:68: undefined reference to `cuMemcpyDtoH_v2'
cudaHelloworld/src/HSigmoid.cu:75: undefined reference to `cuMemFree_v2'
/cudaHelloworld/src/HSigmoid.cu:76: undefined reference to `cuCtxDestroy_v2'
Header include order prevents the code from compiling ((.text+0x20): undefined reference to `main')
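22 transform_matrix2D: almost all of the code is wrong
Transpose methods 0 and 1 run successfully only because of the matrix size chosen. Compare with:
int nx = 289;
int ny = 289;
The original sets both to 1<<12; the transpose only succeeds when nx and ny are both multiples of 2. Method 5 has the same problem.
In the switch statement, case 4 and case 5 call the same function, which is a typo.
In the final comparison of the CPU and GPU outputs, the same pointer is passed for both arguments, so every method is reported as correct. This is also a typo. I hope the author can update the code.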
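The last point, comparing a buffer with itself, can be illustrated on the host. This is a minimal sketch in plain C, not the repository's code: `transpose_host` and `results_match` are hypothetical names, and the check is only meaningful when the reference and the result under test live in two different buffers.

```c
#include <stddef.h>

/* CPU reference transpose (hypothetical helper).
 * `in` has ny rows and nx columns; `out` has nx rows and ny columns.
 * Works for any nx and ny, including sizes that are not powers of two. */
void transpose_host(const float *in, float *out, size_t nx, size_t ny)
{
    for (size_t iy = 0; iy < ny; ++iy) {
        for (size_t ix = 0; ix < nx; ++ix) {
            out[ix * ny + iy] = in[iy * nx + ix];
        }
    }
}

/* Returns 1 if all n elements agree within eps, 0 otherwise.
 * Passing the same pointer for both arguments always returns 1,
 * which is why a self-comparison cannot detect a wrong kernel. */
int results_match(const float *expected, const float *actual,
                  size_t n, float eps)
{
    for (size_t i = 0; i < n; ++i) {
        float diff = expected[i] - actual[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (diff > eps) {
            return 0;
        }
    }
    return 1;
}
```

In a test of the kernels, `expected` would be the output of `transpose_host` and `actual` the buffer copied back from the device. Running the comparison with a non-square size such as nx = 3, ny = 2 also exercises the indexing that a square matrix can hide.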
The blog can no longer be opened.
It seems to be because you updated your personal homepage. We're having a really bad day.
The Unicorns have taken over. We're doing our best to get them under control and get GitHub back up and running.
Code 6_sum_matrix has an error
6_sum_matrix
In the sumMatrix() function:
idx=ix+iy * ny; should be changed to
idx=ix+iy * nx;
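The fix follows from row-major storage: each row holds nx elements, so moving down one row advances the offset by nx, not ny. Below is a minimal host-side sketch in C; `row_major_index` and `sum_matrix_host` are hypothetical names, not functions from the repository.

```c
#include <stddef.h>

/* Row-major offset of element (ix, iy) in a matrix with nx columns.
 * Each full row contributes nx elements, hence iy * nx. */
size_t row_major_index(size_t ix, size_t iy, size_t nx)
{
    return ix + iy * nx;
}

/* Element-wise sum of two nx-by-ny matrices on the host
 * (hypothetical reference for checking a kernel's output). */
void sum_matrix_host(const float *a, const float *b, float *c,
                     size_t nx, size_t ny)
{
    for (size_t iy = 0; iy < ny; ++iy) {
        for (size_t ix = 0; ix < nx; ++ix) {
            size_t idx = row_major_index(ix, iy, nx);
            c[idx] = a[idx] + b[idx];
        }
    }
}
```

For nx = 4 and ny = 2, element (3, 1) sits at offset 3 + 1 * 4 = 7. Using ny as the stride gives 3 + 1 * 2 = 5, which is the offset of element (1, 1), so two threads would write the same location. The two formulas coincide only when nx equals ny, which is why the bug is invisible on square matrices.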
Boundary check in the reduceNeighbored function in example 10
if (tid >= n) return;
There seems to be a problem here.
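hello world does not print the messages from inside the kernel
After downloading and compiling the code, running hello_world produces no output from inside the kernel.
cudaDeviceReset(); // without this line, hello world from the GPU is not printed
cudaDeviceSynchronize();
Adding either of these two calls makes no difference.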
Typo in the matrix printing code
CUDA_Freshman/include/freshman.h
Line 74 in e33a9d6
There seems to be a line of code that has no effect.
6_sum_matrix/sum_matrix.cu : 64
cudaMemcpy(C_from_gpu,C_dev,nBytes,cudaMemcpyDeviceToHost);
This line of code does nothing, right?