Hello, while training on my own dataset I found that at around iteration 156.7K the outputs d0 and d1 became NaN, which made the l0 and l1 losses NaN as well:
```
[epoch: 308/100000, batch: 2456/ 4085, ite: 156707] train loss: 2.115248, tar: 0.097755
l0: 0.090264, l1: 0.090268, l2: 0.094700, l3: 0.108498, l4: 0.157684, l5: 0.269343, l6: 0.561054
[epoch: 308/100000, batch: 2464/ 4085, ite: 156708] train loss: 2.114669, tar: 0.097731
l0: 0.110660, l1: 0.110660, l2: 0.116909, l3: 0.147194, l4: 0.230913, l5: 0.414125, l6: 0.675684
[epoch: 308/100000, batch: 2472/ 4085, ite: 156709] train loss: 2.115880, tar: 0.097773
l0: 0.101519, l1: 0.101512, l2: 0.107206, l3: 0.128373, l4: 0.198813, l5: 0.377140, l6: 0.674387
[epoch: 308/100000, batch: 2480/ 4085, ite: 156710] train loss: 2.116687, tar: 0.097785
l0: 0.092943, l1: 0.092937, l2: 0.097863, l3: 0.117802, l4: 0.182888, l5: 0.299898, l6: 0.505494
[epoch: 308/100000, batch: 2488/ 4085, ite: 156711] train loss: 2.115991, tar: 0.097769
l0: 0.104595, l1: 0.104529, l2: 0.109673, l3: 0.131785, l4: 0.201885, l5: 0.407138, l6: 0.842563
[epoch: 308/100000, batch: 2496/ 4085, ite: 156712] train loss: 2.118025, tar: 0.097791
l0: nan, l1: nan, l2: 2.413359, l3: 2.419617, l4: 2.441422, l5: 2.419549, l6: 2.403301
[epoch: 308/100000, batch: 2504/ 4085, ite: 156713] train loss: nan, tar: nan
l0: nan, l1: nan, l2: 2.489194, l3: 2.498003, l4: 2.527905, l5: 2.497976, l6: 2.474765
```
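One stopgap while debugging is to guard the optimizer step so that a single NaN batch does not poison the weights and spread to every later iteration, as happens in the log above. Below is a minimal sketch of such a guard in plain Python; `safe_step` is a hypothetical helper (not from the repo), and in the real training loop the check would wrap the PyTorch loss value (e.g. `loss.item()`) before calling `optimizer.step()`:

```python
import math

def safe_step(loss_value, step_fn):
    """Run step_fn() only when the loss is finite.

    loss_value: the scalar loss for this batch (e.g. loss.item() in PyTorch).
    step_fn:    callable that applies the gradient update.
    Returns True if the step ran, False if the batch was skipped.
    """
    if not math.isfinite(loss_value):
        # NaN or Inf: skip this batch instead of writing NaN into the weights.
        return False
    step_fn()
    return True

# Example: a fake "optimizer step" that just counts how often it ran.
steps_taken = []
safe_step(2.118025, lambda: steps_taken.append(1))   # normal batch -> step runs
safe_step(float("nan"), lambda: steps_taken.append(1))  # NaN batch -> skipped
```

This only masks the symptom; the usual root causes at this point in training are an exploding gradient or a log/division on a degenerate prediction, so lowering the learning rate, adding gradient clipping, or checking the d0/d1 activations for overflow is still needed.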