
Comments (3)

sserdoubleh commented on May 16, 2024

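This part of the code is fine. checkpoints is an optional parameter, and setting it here is actually unnecessary: the GPU memory savings it gives are negligible.
As for NSP training, may I ask whether you are fine-tuning the provided NSP model?
The parameters in that model checkpoint ran into problems as Paddle evolved across versions, so please avoid fine-tuning it further if you can. If you do need to fine-tune it, try adding the line use_amp="False" to the config. Training worked normally in our earlier tests, so give it a try, but expect it to run about twice as slow.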

from knover.

onewaymyway commented on May 16, 2024

_calc_logits is defined as follows:
def _calc_logits(self, enc_out, checkpoints=None, seq_pos=None):

The code calls fc_out = self._calc_logits(outputs["enc_out"], inputs["tgt_pos"]). This passes inputs["tgt_pos"] positionally into checkpoints, so the arguments are misaligned. The call has to be changed to either fc_out = self._calc_logits(outputs["enc_out"], seq_pos=inputs["tgt_pos"]) or fc_out = self._calc_logits(outputs["enc_out"], None, inputs["tgt_pos"]).
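The misalignment can be reproduced without Paddle. The sketch below is a hypothetical stand-in (ToyModel is not part of Knover); only the signature mirrors _calc_logits, and instead of computing logits it returns how Python bound each argument.

```python
class ToyModel:
    """Hypothetical stand-in: only the signature mirrors Knover's _calc_logits."""

    def _calc_logits(self, enc_out, checkpoints=None, seq_pos=None):
        # Report the argument binding instead of computing logits.
        return {"enc_out": enc_out, "checkpoints": checkpoints, "seq_pos": seq_pos}


model = ToyModel()
enc_out = "enc_out_tensor"  # placeholder for outputs["enc_out"]
tgt_pos = [3, 7]            # placeholder for inputs["tgt_pos"]

# Buggy call: the second positional argument lands in `checkpoints`.
buggy = model._calc_logits(enc_out, tgt_pos)

# Fix 1: pass seq_pos by keyword.
fixed_kw = model._calc_logits(enc_out, seq_pos=tgt_pos)

# Fix 2: fill the checkpoints slot explicitly with None.
fixed_pos = model._calc_logits(enc_out, None, tgt_pos)

print(buggy["checkpoints"], buggy["seq_pos"])     # [3, 7] None
print(fixed_kw["seq_pos"], fixed_pos["seq_pos"])  # [3, 7] [3, 7]
```

Both fixes bind tgt_pos to seq_pos. As a design choice, declaring the optional parameters keyword-only (def _calc_logits(self, enc_out, *, checkpoints=None, seq_pos=None)) would make the buggy call raise a TypeError instead of silently misbinding.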

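I was training NSP for model ensembling, for the multi-skill dialogue competition. The error above appeared when I tried to train the NSP task. After I made the change described above, training ran normally.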

Thanks a lot for the patient answers :)

sserdoubleh commented on May 16, 2024

Right, I hadn't checked the argument alignment. The unnecessary checkpoints parameter has now been removed.
Fixed in #31.