
cmu10-714's Introduction

Hi there 👋

cmu10-714's People

Contributors

pkuflyingpig


cmu10-714's Issues

About conv operation in HW4

There may be a mistake in the conv op's gradient function, at the following line:

X_grad = conv(out_grad, W_permute, padding=K-1-self.padding)

I have a test case with:

  • A's shape: (1, 16, 16, 1)
  • B's shape: (7, 7, 1, 1)
  • stride=3 & padding=2

When I test the above case, I get the following error:

tests/hw4/test_conv.py::test_op_conv[backward-needle.backend_ndarray.ndarray_backend_cuda-Z_shape16-W_shape16-1-0]  
- ValueError: operands could not be broadcast together with shapes (1,16,16,1) (1,17,17,1)

The implementation may not have considered large convolution kernels, in particular the case where $(H + 2p - k) \bmod s \ne 0$ (with input size $H$, padding $p$, kernel size $k$, and stride $s$).

Could you check this case and improve the implementation? Thanks for your reply.
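
For concreteness, here is a sketch of where the extra pixels come from, assuming the usual backward scheme (dilate out_grad by $s - 1$, then convolve with the flipped kernel at padding $k - 1 - p$) and assuming needle's dilate appends $s - 1$ zeros after every element, so an axis of length $n$ grows to $ns$:

$$H_{\text{out}} = \left\lfloor \frac{H + 2p - k}{s} \right\rfloor + 1 = \left\lfloor \frac{16 + 4 - 7}{3} \right\rfloor + 1 = 5$$

The dilated out_grad therefore has height $5 \cdot 3 = 15$, and the reconstructed X_grad has height $15 + 2(k - 1 - p) - k + 1 = 15 + 8 - 7 + 1 = 17 \ne 16$, which is exactly the (1,17,17,1) vs. (1,16,16,1) mismatch in the error above.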

May I ask about the Adam memory-check test failure?

Hi, after the Adam memory check failed, I found I couldn't debug the code to reduce the number of Tensors used. Could you help me take a look?

My code is as follows:

class Adam(Optimizer):
    def __init__(
        self,
        params,
        lr=0.01,
        beta1=0.9,
        beta2=0.999,
        eps=1e-8,
        weight_decay=0.0,
    ):
        super().__init__(params)
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.eps = eps
        self.weight_decay = weight_decay
        self.t = 0
        from collections import defaultdict
        self.m = defaultdict(float)
        self.v = defaultdict(float)

    def step(self):
        ### BEGIN YOUR SOLUTION
        self.t += 1
        for param in self.params:
            grad = param.grad.detach() + self.weight_decay * param.detach()
            
            self.m[param] = self.beta1 * self.m.get(param, 0) + (1 - self.beta1) * grad
            self.v[param] = self.beta2 * self.v.get(param, 0) + (1 - self.beta2) * (grad ** 2)
            # breakpoint()
            m_t1_hat = self.m[param] / (1 - self.beta1 ** (self.t))
            v_t1_hat = self.v[param] / (1 - self.beta2 ** (self.t))
            
            param.cached_data -= (self.lr * m_t1_hat / ((v_t1_hat ** 0.5) + self.eps)).cached_data

The command-line output is as follows:

========================================================================= test session starts ==========================================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.5.0 -- /home/wplf/miniconda3/bin/python3
cachedir: .pytest_cache
rootdir: /home/wplf/dl-sys/hw2
collected 93 items / 86 deselected / 7 selected                                                                                                                        

tests/hw2/test_nn_and_optim.py::test_optim_adam_1 PASSED                                                                                                         [ 14%]
tests/hw2/test_nn_and_optim.py::test_optim_adam_weight_decay_1 PASSED                                                                                            [ 28%]
tests/hw2/test_nn_and_optim.py::test_optim_adam_batchnorm_1 PASSED                                                                                               [ 42%]
tests/hw2/test_nn_and_optim.py::test_optim_adam_batchnorm_eval_mode_1 PASSED                                                                                     [ 57%]
tests/hw2/test_nn_and_optim.py::test_optim_adam_layernorm_1 PASSED                                                                                               [ 71%]
tests/hw2/test_nn_and_optim.py::test_optim_adam_weight_decay_bias_correction_1 PASSED                                                                            [ 85%]
tests/hw2/test_nn_and_optim.py::test_optim_adam_z_memory_check_1 FAILED                                                                                          [100%]
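
For reference, a minimal sketch of a step() variant that detaches the stored moments each step. The assumption (not a confirmed diagnosis) is that the memory check fails because each step's m and v keep references into previous steps' computation graphs, so the live-Tensor count grows without bound; detaching the stored values keeps the per-step count constant:

def step(self):
    self.t += 1
    for param in self.params:
        # operate on detached tensors so no autograd history is recorded
        grad = param.grad.detach() + self.weight_decay * param.detach()
        # detach the stored moments: otherwise m/v hold references into
        # every previous step's intermediate tensors
        u = self.beta1 * self.m.get(param, 0) + (1 - self.beta1) * grad
        v = self.beta2 * self.v.get(param, 0) + (1 - self.beta2) * grad ** 2
        self.m[param], self.v[param] = u.detach(), v.detach()
        m_hat = self.m[param] / (1 - self.beta1 ** self.t)
        v_hat = self.v[param] / (1 - self.beta2 ** self.t)
        # in-place update of the raw buffer, as in the original post
        param.cached_data -= (self.lr * m_hat / (v_hat ** 0.5 + self.eps)).cached_data

Whether this passes also depends on how many temporaries the test tolerates per step; the point of the sketch is only that nothing here retains a growing graph across steps.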

About freeing GPU memory in HW4

Hello!
I have read the implementation of CudaArray in ndarray_backend_cuda.cu:

struct CudaArray {
  CudaArray(const size_t size) {
    cudaError_t err = cudaMalloc(&ptr, size * ELEM_SIZE);
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
    this->size = size;
  }
  ~CudaArray() { cudaFree(ptr); }
  size_t ptr_as_int() { return (size_t)ptr; }
  
  scalar_t* ptr;
  size_t size;
};

The CudaArray class frees GPU memory in its destructor. May I ask how the destruction of a CudaArray is triggered? I don't seem to see any delete operation in the code. I'd appreciate some guidance, thanks!
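
For context: in this codebase CudaArray is exposed to Python through pybind11, so its lifetime follows the Python wrapper object's. A minimal sketch of that binding pattern (the exact .def lines are illustrative, not copied from the file):

#include <pybind11/pybind11.h>
namespace py = pybind11;

// CudaArray as defined above: cudaMalloc in the constructor,
// cudaFree in the destructor.

PYBIND11_MODULE(ndarray_backend_cuda, m) {
  // pybind11 owns the C++ object behind the Python "Array" handle.
  // When the Python object's reference count drops to zero and it is
  // garbage-collected, pybind11 invokes ~CudaArray(), which calls
  // cudaFree -- hence no explicit `delete` appears in the C++ sources.
  py::class_<CudaArray>(m, "Array")
      .def(py::init<size_t>())
      .def_readonly("size", &CudaArray::size)
      .def("ptr", &CudaArray::ptr_as_int);
}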

About the CUDA matmul optimization in HW3: are grid and block swapped?

When launching the matmul kernel, your code sets grid = (256, 256, 1) and block = (ceil(M/256), ceil(P/256), 1).
Are these two variables swapped? Normally block sets the number of threads per block and grid sets the number of blocks, but in your program it is the other way around.

  /// BEGIN YOUR SOLUTION
  dim3 grid(BASE_THREAD_NUM, BASE_THREAD_NUM, 1);
  dim3 block((M + BASE_THREAD_NUM - 1) / BASE_THREAD_NUM, (P + BASE_THREAD_NUM - 1) / BASE_THREAD_NUM, 1);
  MatmulKernel<<<grid, block>>>(a.ptr, b.ptr, out->ptr, M, N, P);
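
For comparison, a sketch of the conventional configuration, using a hypothetical TILE of 16 threads per block dimension (16 x 16 = 256 threads per block) and one thread per output element:

constexpr int TILE = 16;              // hypothetical per-dimension thread count
dim3 block(TILE, TILE, 1);            // threads per block
dim3 grid((M + TILE - 1) / TILE,      // blocks to cover the M rows
          (P + TILE - 1) / TILE, 1);  // blocks to cover the P columns
MatmulKernel<<<grid, block>>>(a.ptr, b.ptr, out->ptr, M, N, P);

Note that the swapped version still spans the same thread lattice per axis (gridDim * blockDim is unchanged), so the kernel can index correctly as long as it computes the global index as blockIdx * blockDim + threadIdx; the cost is efficiency, e.g. for M, P <= 256 every block holds a single thread.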
