NCCL communication boundary (alignment) issue? about megatron-llama (10 comments, OPEN)

alibaba commented on September 27, 2024
NCCL communication boundary (alignment) issue?

from megatron-llama.

Comments (10)

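li-yi-dong commented on September 27, 2024

We haven't run into any problems so far.

Have you checked numerical precision bit for bit? Or is what DeepSpeed does here unnecessary?

As far as I know, alignment does not affect the correctness of NCCL collective communication, though it may affect speed. With the current model dimensions and partitioning scheme, the buffers are already 4-byte aligned.

Precision has been checked against the Hugging Face implementation, both op by op and end to end.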
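
As a rough illustration of the alignment point above, the sketch below checks whether each rank's shard of a flat buffer ends on a 4-byte boundary and, if not, how many elements the buffer would need after padding. The helper names `is_shard_aligned` and `padded_numel` are hypothetical and are not taken from Megatron-LLaMA or DeepSpeed; only the 4-byte figure comes from the comment.

```python
from math import gcd


def is_shard_aligned(numel: int, world_size: int, elem_bytes: int,
                     align_bytes: int = 4) -> bool:
    """True if the buffer splits evenly across ranks and each shard's
    size in bytes is a multiple of align_bytes."""
    if numel % world_size != 0:
        return False
    shard_bytes = (numel // world_size) * elem_bytes
    return shard_bytes % align_bytes == 0


def padded_numel(numel: int, world_size: int, elem_bytes: int,
                 align_bytes: int = 4) -> int:
    """Smallest element count >= numel for which is_shard_aligned holds."""
    # A shard is aligned when its element count is a multiple of `unit`.
    unit = align_bytes // gcd(elem_bytes, align_bytes)
    chunk = world_size * unit
    return -(-numel // chunk) * chunk  # ceiling division, then scale back up


# A 4096 x 4096 fp16 weight split over 8 ranks needs no padding.
print(is_shard_aligned(4096 * 4096, world_size=8, elem_bytes=2))
# A 10-element fp16 buffer split over 4 ranks would have to grow.
print(padded_numel(10, world_size=4, elem_bytes=2))
```

With typical hidden sizes (multiples of 64 or more) and power-of-two data-parallel sizes, the check passes without padding, which is consistent with the statement that the current setup is already aligned.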

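li-yi-dong commented on September 27, 2024

Precision has been checked against the Hugging Face implementation, both op by op and end to end.

OK, thanks. I have one more question. Your press release says that Megatron-LM's DistributedOptimizer implements ZeRO-2, but from reading the code it looks like it implements ZeRO-1: after grad_buffer is summed, the gradients that do not belong to the local rank are not freed. I'm not sure whether I read it correctly. Press release

I think so too.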

li-yi-dong commented on September 27, 2024

We haven't run into any problems so far.

Baibaifan commented on September 27, 2024

We haven't run into any problems so far.

Have you checked numerical precision bit for bit? Or is what DeepSpeed does here unnecessary?

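Baibaifan commented on September 27, 2024

Precision has been checked against the Hugging Face implementation, both op by op and end to end.

OK, thanks. I have one more question. Your press release says that Megatron-LM's DistributedOptimizer implements ZeRO-2, but from reading the code it looks like it implements ZeRO-1: after grad_buffer is summed, the gradients that do not belong to the local rank are not freed. I'm not sure whether I read it correctly. Press release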
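
To make the ZeRO-1 versus ZeRO-2 distinction in this question concrete, here is a back-of-the-envelope sketch using the memory accounting from the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for optimizer state. The function name `zero_bytes_per_rank` is hypothetical, and the numbers are the paper's model, not a measurement of Megatron-LM or Megatron-LLaMA.

```python
def zero_bytes_per_rank(num_params: int, dp_size: int, stage: int,
                        opt_bytes_per_param: int = 12) -> float:
    """Approximate model-state bytes held by one data-parallel rank.

    stage 0: nothing is partitioned
    stage 1: optimizer states are partitioned
    stage 2: optimizer states and gradients are partitioned
    """
    params = 2.0 * num_params            # fp16 weights, always replicated here
    grads = 2.0 * num_params             # fp16 gradients
    opt = float(opt_bytes_per_param) * num_params
    if stage >= 1:
        opt /= dp_size
    if stage >= 2:
        grads /= dp_size                 # only the local shard is kept
    return params + grads + opt


n, dp = 7_500_000_000, 64
for stage in (0, 1, 2):
    print(stage, round(zero_bytes_per_rank(n, dp, stage) / 1e9, 1), "GB")
```

The only difference between stage 1 and stage 2 in this model is whether the non-local gradients are released after reduction, which is exactly what the question is asking about.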

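Baibaifan commented on September 27, 2024

Precision has been checked against the Hugging Face implementation, both op by op and end to end.

OK, thanks. I have one more question. Your press release says that Megatron-LM's DistributedOptimizer implements ZeRO-2, but from reading the code it looks like it implements ZeRO-1: after grad_buffer is summed, the gradients that do not belong to the local rank are not freed. I'm not sure whether I read it correctly. Press release

I think so too.

Looks like whoever wrote the press release should have his bonus docked.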

yinzhijian commented on September 27, 2024

It should be ZeRO-2: the buffer is released right after reduce_scatter_grad.

Baibaifan commented on September 27, 2024

reduce_scatter_grad

Could you paste the code? I'd like to study it. Thanks!

yinzhijian commented on September 27, 2024

reduce_scatter_grad

Could you paste the code? I'd like to study it. Thanks!

    def _collect_grad(self, param, group_idx):
        bucket = self._bucket_assignment[group_idx].get_param_bucket(param)
        bucket.collect_param_grad(param)

        if bucket.is_all_grad_collected():
            target_buffer = self._param_buffer[group_idx].get_bucket_receiving_buffer(bucket)
            bucket.reduce_scatter_grad(target_buffer)

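Once a bucket has collected all of its gradients, it calls the reduce-scatter collective and finally returns the borrowed buffer to the pool, which frees the gradients (i.e. _grad_buffer).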

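    @nvtx.annotate("reduce_scatter_grad", color="indigo")
    def reduce_scatter_grad(self, target_buffer):
        assert self.is_all_grad_collected()

        # Sum the bucket's gradients across the group; each rank receives
        # only its own shard in target_buffer.
        dist.reduce_scatter_tensor(output=target_buffer,
                                   input=self._grad_buffer,
                                   group=self._dp_group,
                                   async_op=False)

        # Hand the full-size gradient buffer back to the shared pool.
        Bucket._grad_buffer_pool.return_buffer(self._borrowed_grad_buffer)
        self._borrowed_grad_buffer = None
        self._grad_buffer = None
        # Scale the reduced shard by 1 / _num_partitions.
        target_buffer.div_(self._num_partitions)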
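
For readers who want to trace the pattern without a multi-GPU setup, below is a minimal single-process sketch of the same idea: full-size gradient buffers are borrowed from a pool, reduce-scattered into per-rank shards, and then returned. Everything here (`BufferPool`, `reduce_scatter_mean`, plain Python lists instead of tensors) is a hypothetical stand-in and not the Megatron-LLaMA implementation; it also assumes the divisor equals the number of ranks.

```python
from typing import List


class BufferPool:
    """Hypothetical pool of reusable flat buffers (lists of floats)."""

    def __init__(self) -> None:
        self._free: List[List[float]] = []

    def borrow(self, size: int) -> List[float]:
        # Reuse a returned buffer of the right size if one is available.
        for i, buf in enumerate(self._free):
            if len(buf) == size:
                return self._free.pop(i)
        return [0.0] * size

    def return_buffer(self, buf: List[float]) -> None:
        self._free.append(buf)

    def num_free(self) -> int:
        return len(self._free)


def reduce_scatter_mean(grad_buffers: List[List[float]]) -> List[List[float]]:
    """Simulate reduce-scatter over all ranks, then divide by the rank count.

    grad_buffers[r] is rank r's full-size gradient buffer. The result holds,
    for each rank, only that rank's shard of the averaged gradient.
    """
    world = len(grad_buffers)
    numel = len(grad_buffers[0])
    assert numel % world == 0, "buffer must split evenly across ranks"
    shard = numel // world
    shards = []
    for rank in range(world):
        lo, hi = rank * shard, (rank + 1) * shard
        summed = [sum(buf[i] for buf in grad_buffers) for i in range(lo, hi)]
        shards.append([v / world for v in summed])
    return shards


pool = BufferPool()
full = [pool.borrow(4) for _ in range(2)]   # one full-size buffer per rank
full[0][:] = [1.0, 2.0, 3.0, 4.0]
full[1][:] = [3.0, 4.0, 5.0, 6.0]

shards = reduce_scatter_mean(full)          # each rank keeps 2 of 4 elements
for buf in full:
    pool.return_buffer(buf)                 # full-size buffers are released
full = None

print(shards)
print(pool.num_free())
```

After the reduction, each simulated rank holds only its shard, and the full-size buffers sit in the pool ready for the next bucket, which mirrors the release step described in the comment.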

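Baibaifan commented on September 27, 2024
_collect_grad

What I told the author was that "Megatron-LM's DistributedOptimizer implements ZeRO-1, so the press release is wrong". I wasn't talking about Megatron-LLaMA. I'm not sure how you read it; could you take another look at my question?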
