l2g's People

Contributors

pengtaojiang

l2g's Issues

ResNet model

Hello, could you provide the file 'ilsvrc-cls_rna-a1_cls1000_ep-0001.params'? The Google Drive link has expired. Thank you very much.

Question about saliency maps

Hello, if I want to train on other datasets, do I need to regenerate the corresponding saliency maps? Is there training code for that?

Problem downloading the dataset

Hello, the PASCAL VOC 2012 download link you provide seems to yield a 0 KB file. I am not sure whether the upload went wrong; I would appreciate your help.

AttributeError: 'Net' object has no attribute 'module'

This error occurred when running test_l2g_voc.py:
validating ... models.resnet38
Weights cannot be loaded:
[]
0it [00:05, ?it/s]
Traceback (most recent call last):
File "/home/work/Desktop/Projects/xy/L2G/scripts/test_l2g_voc.py", line 123, in
validate(args)
File "/home/work/Desktop/Projects/xy/L2G/scripts/test_l2g_voc.py", line 76, in validate
cam_map = model.module.get_heatmaps()
File "/home/work/anaconda3/envs/l2g/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1178, in getattr
type(self).name, name))
AttributeError: 'Net' object has no attribute 'module'

Process finished with exit code 1
What might be causing this?
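
For context, the .module attribute exists only after a model has been wrapped in torch.nn.DataParallel; if the checkpoint is loaded into a bare Net instance, the attribute is missing. A minimal defensive sketch (the helper name is hypothetical; get_heatmaps is the method called in the script above):

import torch.nn as nn

def get_heatmaps_safely(model):
    # Unwrap nn.DataParallel if present; a bare Net has no .module attribute.
    net = model.module if isinstance(model, nn.DataParallel) else model
    return net.get_heatmaps()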

Network architecture

Hello, a quick question: which network architectures do the local and global networks in the source code use, exactly? Is it ResNet-38 plus DeepLab-v2?

A question

How does the code pass several cropped images to the local model at a time? Does it stitch the crops together into one picture?
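
For reference, a common pattern (not necessarily what this repository does) is to stack the crops along the batch dimension rather than stitching them into a single image, so the local network sees each crop separately in one forward pass. A minimal sketch with hypothetical sizes:

import torch

# Six hypothetical 320x320 RGB crops taken from one image
crops = [torch.randn(3, 320, 320) for _ in range(6)]
batch = torch.stack(crops, dim=0)  # shape (6, 3, 320, 320)
# local_cams = local_net(batch)    # one forward pass over all crops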

denseCRF

Hello. Most papers currently apply denseCRF post-processing when comparing performance. How can denseCRF refinement be applied to the localization maps and segmentation results in this project's code?
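
For reference, a typical denseCRF refinement with the pydensecrf package looks like the sketch below; the kernel parameters are common illustrative defaults, not values taken from this repository:

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(img, probs, n_iters=10):
    # img: (H, W, 3) uint8 RGB image; probs: (C, H, W) class probabilities
    c, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, c)
    d.setUnaryEnergy(unary_from_softmax(probs))
    d.addPairwiseGaussian(sxy=3, compat=3)    # smoothness kernel
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=np.ascontiguousarray(img),
                           compat=10)         # appearance kernel
    q = d.inference(n_iters)
    return np.argmax(np.array(q).reshape(c, h, w), axis=0)  # (H, W) labels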

Pretrained weights

May I ask what kind of weights 'ilsvrc-cls_rna-a1_cls1000_ep-0001.params' are, and on which dataset they were trained?

Question about parameter training

Hello, I would like to ask: the global and local networks use the same model in the code but are separate instances, and the local network's output goes through .clone().detach(). What does that line do? Do the two branches share parameters? If not, how are the local network's parameters trained; are they updated independently but simultaneously with the global network?
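
For context, .clone().detach() copies a tensor and cuts the copy out of the autograd graph, so a loss computed against the copy sends no gradient back into the network that produced it. A minimal illustration:

import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).clone().detach()  # same values, but detached from the graph
print(y.requires_grad)        # False: losses on y send no gradient to x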

About the number of epochs

Hello, I trained with your experimental settings, but the model had already converged by epoch 4 or 5. Why is the number of epochs set to 10? Thanks!

How do I use my own dataset?

Hello, if I want to train on my own dataset and generate pseudo labels, how should I do that? There does not seem to be any code here for training on one's own data.

Question about attention maps

Hello, while testing your code on the VOC2012 dataset, I found that the attention maps were missing at the pseudo-label generation stage, yet I could not find any code you provide that loads a checkpoint to generate them. At which stage of the code are these attention maps generated?

Discrepancy in Pseudo Label Accuracy & Repository Settings for COCO

Hello Pengtao,
I reproduced the weakly supervised semantic segmentation on COCO using this repo and followed all the default settings. However, I achieved a pseudo-label accuracy of ~40%. To my knowledge, the final performance rarely exceeds the pseudo-label accuracy by a large margin, so there seems to be a discrepancy compared to the paper's reported results on the COCO val set.

I was wondering if there are specific settings or adjustments required for COCO that aren't set by default in the repository?

Thank you,
TC

Training without saliency maps

Hello, if I want to reproduce the results without using saliency maps, is it enough to comment out the following code?
# Binarize the local pseudo label against the background threshold
feat_local_label_bool = (feat_local_label > args.bg_thr).type_as(feat_local_label)
# Resize the cropped saliency maps to the feature-map resolution
crop_sals = F.interpolate(crop_sals, (h, w)).type_as(feat_local_label)
# Gate the foreground channels with the saliency map
feat_local_label[:, :-1, :, :] = feat_local_label_bool[:, :-1, :, :] * crop_sals.repeat(1, args.num_classes, 1, 1)
# Gate the background channel with the saliency complement
feat_local_label[:, -1, :, :] = feat_local_label[:, -1, :, :] * ((1 - crop_sals).squeeze(1))

Performance on the PASCAL VOC 2012 test set

After downloading the test set from the official PASCAL website, there are no _cls classification label files for the images. When pseudo labels were generated for the test images in the experiments, had the dataset been given classification labels in advance?

Predictions on the test set come out empty

Hello, I used the trained check_final.pth to predict segmentation results on the test set, but the output directory contains no segmentation maps; I am not sure whether this is because the test set has no labels. This has been bothering me for quite a while. Could you help me figure it out?

cam_png

Is cam_png in the code the pseudo label? How is it related to the pseudo labels used for segmentation?

Data question

Do data/coco14/JPEGImages and SegmentationClass hold the training-set images or the validation-set images?

A problem in the code

Excuse me, what is the folder named SegmentationOnClassAug for? And what is used to calculate the mIoU of the 'Attentions' in the test stage? I assumed the 'Attentions' were evaluated against the 1464 segmentation-labeled images of VOC12, but the code does not seem to match my understanding.

PASCAL VOC test-result problem

Hello, I generated pseudo_seg_labels following the README and got an mIoU of 70.3. I then used the pseudo masks to supervise deeplabv2_resnet training (replacing every SegmentationClassAug directory in train_aug.txt with the pseudo_seg_labels directory). After training, I loaded the resulting checkpoint_final.pth for testing, but the result on val is very poor, only about 22% mIoU. My pretrained weights were converted following the DeepLab README (generate deeplabv2_resnet101_msc-vocaug.pth from train2_iter_20000.caffemodel), i.e. the vocaug.pth. Could that be the cause?

Question about crop_size

Hello, does crop_size affect the model's accuracy in your method? The labels of the local crops are identical to the global image's labels, but if crop_size is too small, a crop may contain only background while its class label is still 1. Could that degrade accuracy?

Question about saliency maps

Are the VOC12 saliency maps generated directly with PoolNet's pretrained weights, or was PoolNet first trained on the VOC dataset?

DeepLab training problem

Hello, when training DeepLab with the generated VOC pseudo masks, I used the same pretrained models as the paper but cannot reach the paper's accuracy. I get vgg16_v1: 65%, vgg16_v2: 66.2%, resnet_v1: 68.5%, resnet_v2: 68.9% (all mIoU on the val set), while the paper reports 68.1%, 68.5%, 72%, and 72.1% respectively. Are there any hyperparameter settings to watch out for when training DeepLab on the generated pseudo masks, e.g. does the choice of random seed matter? Thanks.

Local loss becomes NaN during training with this code

Hello, I used the code from this repository and installed the dependencies (frameworks such as torch and torchvision at the versions pinned in requirements.txt; other libraries without pinned versions, due to compatibility issues). Because of hardware limits I reduced the batch size from 3 to 1 and trained the classification model on PASCAL VOC 2012, with no other hyperparameters changed. During training, the local loss became NaN. Do you know what the problem might be?
[screenshot]
The training configuration is as follows:
[screenshot]

MS COCO dataset question

Where can the SegmentationClass images of the coco14 dataset be downloaded? The official site only provides the JPEGImages.

How are the attention maps in the framework figure generated?

At which step are the local attention maps in the paper's framework figure generated? I did not see them after running the code. The attention maps produced are global ones, and cam_png is not it either. Do I need to process them myself?

Letter of thanks

Very nice work! Thanks for the authors' contribution. The proposed method is effective and helpful.

Training a binary classifier

Hello, and thanks for the open-source code! I am running experiments with it on a dataset that contains only one category: images containing the foreground object, plus images containing only background. How should I set num_classes in this case? Currently I use two classes, labeling background-only images 0 and foreground images 1. Could this cause problems, and do you have any suggestions?
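
For reference, many weakly supervised codebases treat a single foreground category as a one-class multi-label problem rather than a two-class softmax. A minimal sketch, not taken from this repository, with illustrative names:

import torch
import torch.nn.functional as F

# Hypothetical single-class setup: the label is a length-1 multi-hot vector,
# 1.0 if the foreground object is present in the image, 0.0 otherwise.
logits = torch.randn(4, 1)                       # (batch, num_classes=1)
labels = torch.tensor([[1.], [0.], [1.], [0.]])  # presence / absence
loss = F.multilabel_soft_margin_loss(logits, labels)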

The aggregation of the saliency map

Hi, thanks for your work.
I am confused about the aggregation of the saliency map. After obtaining the local CAM, the saliency map is used to revise the pseudo mask, as in https://github.com/PengtaoJiang/L2G/blob/main/scripts/train_l2g_sal_voc.py#L203 . It seems that a CAM foreground region is removed if it has no overlap with the saliency map, i.e., the transfer loss would be L_st = ||0 - G_i|| when |S_i| = 0, instead of L_st = ||A_i - G_i|| when |S_i| = 0 as the paper reports.
Thank you.

Large gap in mIoU: where did it go wrong?

Hello, I followed the workflow in the provided md file and ran the VOC dataset, but the resulting mIoU is far off: after training the DeepLab-v2 provided with the code, the mIoU is only 0.49. Could you advise where the problem might be?

Below are the hyperparameters in each .sh file (only the file paths were changed):

In train_l2g_sal_voc.sh:

EXP=exp_voc
RUN_FILE=train_l2g_sal_voc.py
BASH_FILE=train_l2g_sal_voc.sh
GPU_ID=0
CROP_SIZE=320
PATCH_NUM=6

mkdir -p runs/${EXP}/model/
cp ${BASH_FILE} runs/${EXP}/model/${BASH_FILE}
cp scripts/${RUN_FILE} runs/${EXP}/model/${RUN_FILE} 

CUDA_VISIBLE_DEVICES=0 python3 ./scripts/${RUN_FILE} \
    --img_dir=/mnt/data/zcw/segmentation/data/VOCdevkit/VOC2012/ \
    --train_list=./data/voc12/train_cls.txt \
    --test_list=./data/voc12/val_cls.txt \
    --epoch=10 \
    --lr=0.001 \
    --batch_size=3 \
    --iter_size=1 \
    --dataset=pascal_voc \
    --input_size=448 \
    --crop_size=${CROP_SIZE} \
    --disp_interval=100 \
    --num_classes=20 \
    --num_workers=16 \
    --patch_size=${PATCH_NUM} \
    --snapshot_dir=./runs/${EXP}/model/  \
    --att_dir=./runs/${EXP}/  \
    --decay_points='5' \
    --kd_weights=10 \
    --bg_thr=0.001
    # --load_checkpoint="./runs/${EXP}/model/author_pretrained/pascal_voc_epoch_9.pth" \
    # --current_epoch=10

In test_l2g_voc.sh:

EXP=exp_voc
TYPE=ms
THR=0.25

CUDA_VISIBLE_DEVICES=1 python3 ./scripts/test_l2g_voc.py \
    --img_dir=/mnt/data/zcw/segmentation/data/VOCdevkit/VOC2012/JPEGImages/ \
    --test_list=./data/voc12/train_cls.txt \
    --arch=vgg \
    --batch_size=1 \
    --dataset=pascal_voc \
    --input_size=224 \
    --num_classes=20 \
    --thr=${THR} \
    --restore_from=./runs/${EXP}/model/author_pretrained/pascal_voc_epoch_9.pth \
    --save_dir=./runs/${EXP}/${TYPE}/attention/ \
    --multi_scale \
    --cam_png=./runs/${EXP}/cam_png/

CUDA_VISIBLE_DEVICES=1 python3 scripts/evaluate_mthr_voc.py \
    --datalist ./data/voc12/train_aug.txt \
    --gt_dir /mnt/data/zcw/segmentation/data/VOCdevkit/VOC2012/SegmentationClassAug/ \
    --save_path ./runs/${EXP}/${TYPE}/result.txt \
    --pred_dir ./runs/${EXP}/${TYPE}/attention/

In gen_gt_voc.sh:

EXP=exp_voc
TYPE=ms

CUDA_VISIBLE_DEVICES=0 python3 gen_gt.py \
   --dataset=pascal_voc \
   --datalist=data/voc12/train_aug.txt \
   --gt_dir=/mnt/data/zcw/segmentation/data/VOCdevkit/VOC2012/JPEGImages/ \
   --save_path=/mnt/data/zcw/segmentation/data/VOCdevkit/VOC2012/pseudo_seg_labels/ \
   --pred_dir=./runs/${EXP}/${TYPE}/attention/ \
   --num_workers=16

Ablation study

In experiment 1, when comparing mIoU among sliding windows, local images, and the L2G method, CAM is used as the baseline, but the original CAM was proposed without a segmentation step. So in this experiment, is CAM's way of generating attention maps replaced by the attention-generation method of this paper?

about reproducing

Hello.
Thank you for sharing the impressive work, L2G!

I tried to reproduce L2G following the implementation scripts; however, I obtained about 66% mIoU on VOC2012, which is far behind the paper's 72.1%.

I used the scripts below.

  1. use the pretrained models for VOC, without running ./train_l2g_sal_voc.sh
  2. run ./test_l2g_voc.sh
  3. run ./gen_gt_voc.sh
  4. run deeplab v2 python main.py train --config-path configs/voc12_resnet_dplv2.yaml

Could you check the reproduction of this work?
Do you have any recommendations regarding this result?

# data/scores/voc12_resnet_v2/deeplabv2_resnet101_msc/val/scores_crf.json
{
    "Class IoU": {
        "0": 0.8893978389769717,
        "1": 0.7332514237124688,
        "2": 0.2647600116462558,
        "3": 0.7530019963539774,
        "4": 0.6229389456497264,
        "5": 0.6844399420943811,
        "6": 0.8477865759662723,
        "7": 0.7323184116750477,
        "8": 0.8361064618817893,
        "9": 0.268236496479961,
        "10": 0.7934048627881579,
        "11": 0.4293026338035794,
        "12": 0.7782026771867929,
        "13": 0.7793843619160901,
        "14": 0.6993258587392808,
        "15": 0.7108989092335944,
        "16": 0.5175071559996408,
        "17": 0.7581466783853126,
        "18": 0.4200272117472957,
        "19": 0.8026335398365552,
        "20": 0.5178384991250041
    },
    "Frequency Weighted IoU": 0.839895707892765,
    "Mean Accuracy": 0.8471590439388089,
    "Mean IoU": 0.6589957377713407,
    "Pixel Accuracy": 0.9036374933774816
}

The role of box_generation

Hello, after reading your code I find the box_generation function in LoadData.py somewhat hard to understand. Could you explain how box_generation produces the boxes, and the meaning and use of the resulting boxes? Many thanks!
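
For orientation, a hypothetical re-creation of what a crop-box generator of this kind typically does (this is not the repository's actual implementation; crop_size and patch_num mirror the training-script arguments):

import random

def box_generation_sketch(img_h, img_w, crop_size=320, patch_num=6):
    # Hypothetical sketch: sample patch_num random crop_size x crop_size
    # boxes inside the image; each (x1, y1, x2, y2) box is later used to
    # cut out a local view for the local network.
    boxes = []
    for _ in range(patch_num):
        x1 = random.randint(0, img_w - crop_size)
        y1 = random.randint(0, img_h - crop_size)
        boxes.append((x1, y1, x1 + crop_size, y1 + crop_size))
    return boxes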
