nyukat / breast_cancer_classifier

Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

Home Page: https://ieeexplore.ieee.org/document/8861376

License: GNU Affero General Public License v3.0

breast_cancer_classifier's Issues

question about model

Why does the model classify only 3 classes (incomplete, normal, benign)? Why is the malignant class not included?

random number generator on len(datum[view])

Hi,

Thanks so much for this repo. Great stuff. And love the paper.

The value of len(datum[view]) seems to be consistently 1, so image_index is always zero.

Is that right?

Thanks again.

ImageNet weights

Is it possible to share the weights of the model pre-trained on ImageNet?

Thanks for the great contribution to mammography! I really appreciate your work. I'm using this repository for my school project, and I wonder how to use it to predict a single case (4 mammogram images) instead of 16 (the 16 images found in sample_data/images).
I tried many things before asking and none of them worked: it keeps asking for the rest of the images, or fails with a batch_size error.

Thanks for reading my question.

Import Error: No module named 'imageIO'

Hi everyone, I have downloaded the files, and when I run run.sh the program breaks at each stage when importing reading_images.py, at line 27: ImportError: No module named 'imageio'. I am doing this through the command line. To check whether it is installed, I ran pip install imagio and it said requirement already satisfied. I then checked whether the problem is the Python path by running python3 and import imageio in the same directory, and that did not throw an error.

input images dimension

Hi,

Why are the provided sample images not of the dimensions 2677x1942 and 2974x1748 mentioned in the README?

Outputs predictions

Hi, I have a question about how the outputs are measured: are benign and malignant both independent probabilities from 0 to 1? Since they don't sum to 1, I assumed they are independent. But what about having probabilities of, e.g., 0.8 for both?

I know this is not the right place for such a question, my apologies.

unable to read HDF5 files benign and malignant

I used the given sample images and ran run.sh; all of the steps completed successfully. But I am unable to read the HDF5 heatmap files for benign and malignant.

I tried tools like the ones below to convert them to PNG:
hdf5 viewer
https://mygeodata.cloud/converter/hdf5-to-png

Any recommendations on how to read these HDF5 files?
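
One way to inspect such heatmaps is to load them with h5py and write them out as 8-bit PNGs. The sketch below is a minimal example and not the repository's own tooling. It assumes the h5py and imageio packages are installed, that each file holds a single 2-D dataset (the code takes the first key rather than assuming a dataset name), and that heatmap values lie in [0, 1]. The function names are hypothetical.

```python
import numpy as np


def heatmap_to_uint8(heatmap):
    """Map heatmap values, assumed to lie in [0, 1], to 8-bit gray levels."""
    clipped = np.clip(np.asarray(heatmap, dtype=np.float64), 0.0, 1.0)
    return np.round(clipped * 255).astype(np.uint8)


def convert_hdf5_to_png(hdf5_path, png_path):
    """Read the first dataset in an HDF5 file and save it as a PNG."""
    import h5py      # imported lazily so the helper above works without it
    import imageio

    with h5py.File(hdf5_path, "r") as f:
        first_key = list(f.keys())[0]  # assumption: one dataset per file
        heatmap = np.array(f[first_key])
    imageio.imwrite(png_path, heatmap_to_uint8(heatmap))
```

If the resulting image looks rotated or mirrored, the stored array may be transposed relative to the displayed mammogram; that detail is not confirmed here.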

Question on when to use models in practical settings

Dear authors,

Thanks so much for releasing this very useful code!

I have a related question that I hope to hear from you:
I am assuming that radiologists or healthcare workers at NYU are using your models.

How do you determine when the models are good enough for practical use?
Is it all up to a particular clinical standard? OR is there a set of general standards somewhere?

Any pointers would be much appreciated!!

Anh

Preprocessing

I have problems in the preprocessing phase when I run the code in Jupyter.
In fact, when I run the following part, it gives me this error:

model_input = load_inputs(
image_path="sample_single_output/cropped.png",
metadata_path="sample_single_output/cropped_metadata.pkl",
use_heatmaps=False,
)

FileNotFoundError Traceback (most recent call last)
in
2 image_path="sample_single_output/cropped.png",
3 metadata_path="sample_single_output/cropped_metadata.pkl",
----> 4 use_heatmaps=False,

Failed to crop image because image is invalid. attempt to get argmax of an empty sequence

Stage 1: Crop Mammograms
sample_data/images\0_L_CC.png
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence
sample_data/images\0_R_CC.png
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence
sample_data/images\1_L_MLO.png
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence
sample_data/images\1_R_MLO.png
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence
sample_data/images\1_R_CC.png
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence
sample_data/images\1_L_CC.png
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence
sample_data/images\0_L_MLO.png
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence
sample_data/images\0_R_MLO.png
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\Lubomir\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\Lubomir\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\Lubomir\Downloads\breast_cancer_classifier-master\breast_cancer_classifier-master\src\cropping\crop_mammogram.py", line 351, in crop_mammogram_one_image_short_path
return list(zip([scan['short_file_path']] * 4, cropping_info))
TypeError: 'NoneType' object is not iterable
"""

Hello,

I tried to run it with a custom dataset, but I keep getting this error:
Failed to crop image because image is invalid. attempt to get argmax of an empty sequence

Exams 0 and 1 are all images converted from DICOM to PNG and then cropped to 2290 × 1890 pixels.

It seems that it fails only when my own images are added to the dataset.

More Dataset Information

Hi All,

I am currently testing out your suite of models and was hoping to learn more about the dataset on which the models were trained. I have read your paper describing the data in detail, but I could not find the answer to my question in it. I was curious as to what is the statistical distribution of the original dicom images in regards to the Relative X-Ray Exposure as well as the Exposure Index/Sensitivity values found in the dicom tags?

Permission to release my PyTorch implementation for the training procedure and the dataset implementation

Hi everyone, I highly appreciate the work you put into this fantastic paper in the field of breast cancer classification.

I've noticed that this code doesn't include the code for the training procedure, so, I've implemented my own custom training procedure, and a dataset implementation (Pytorch's dataset class).

I wanted to release my implementation for others to use in the future, and I just wanted to make sure that this is fine by the NYU team.

Training procedure?

Hi! I would like to train your model on a dummy dataset. Can you guide me or, better yet, provide some helper code? Specifically for the view-wise model for a single image. The model outputs a CSV file with the resulting probabilities for malignant/benign, so I'm assuming that the training dataset would be images with binary labels for the malignant and benign classes. Am I assuming right?

Official request of the dataset

Hi, I wanted to know whether the dataset has been made public.

If not, is there a way I can formally request access to the data under the regulations of your organisation, or are you planning to release it in the future?

For processing or for presentation in DICOM

Are the images in the dataset raw data (for processing) or processed data (for presentation)?

Predict on DDSM

First, thanks for your great work on mammogram classification. Recently, I tried to run your model on public datasets (DDSM and CBIS-DDSM), but the result is always predicted as BENIGN. Below is a sample case for your reference.

Predicted by model (only image):
left_benign right_benign left_malignant right_malignant
0.2456 0.3804 0.0131 0.0716
0.1495 0.5369 0.0180 0.1072
0.1644 0.1658 0.0338 0.0284
0.1821 0.3101 0.0121 0.0585

Ground truth:
left_benign right_benign left_malignant right_malignant
1 1 0 0
0 0 1 1
0 0 1 1
0 0 0 0

I am aware of the imbalance issue described in #9, so I selected 2 obvious MALIGNANT cases and 1 obvious BENIGN case, and applied all the preprocessing described in #9 and the dataset report. But the result is still predicted as BENIGN.

So, have you evaluated the released model (in your code: breast_cancer_classifier/models/sample_image_model.p) on DDSM or INBreast? The other question I have is: what is the difference between DDSM and your private dataset?
Thanks

RuntimeError: Error(s) in loading state_dict for SplitBreastModel:

With Device_type = 'cpu', I am getting below error during running 'Stage 4a: Run Classifier (Image)' stage.

Traceback (most recent call last):
File "src/modeling/run_model.py", line 238, in <module>
main()
File "src/modeling/run_model.py", line 233, in main
parameters=parameters,
File "src/modeling/run_model.py", line 188, in load_run_save
model, device = load_model(parameters)
File "src/modeling/run_model.py", line 51, in load_model
model.load_state_dict(torch.load(parameters["model_path"])["model"])
File "/usr/local/envs/py3env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SplitBreastModel:
Missing key(s) in state_dict: "fc1_cc.weight", "fc1_cc.bias", "fc1_mlo.weight", "fc1_mlo.bias", "output_layer_cc.fc_layer.weight", "output_layer_cc.fc_layer.bias", "output_layer_mlo.fc_layer.weight", "output_layer_mlo.fc_layer.bias".
Unexpected key(s) in state_dict: "fc1_lcc.weight", "fc1_lcc.bias", "fc1_rcc.weight", "fc1_rcc.bias", "fc1_lmlo.weight", "fc1_lmlo.bias", "fc1_rmlo.weight", "fc1_rmlo.bias", "output_layer_lcc.fc_layer.weight", "output_layer_lcc.fc_layer.bias", "output_layer_rcc.fc_layer.weight", "output_layer_rcc.fc_layer.bias", "output_layer_lmlo.fc_layer.weight", "output_layer_lmlo.fc_layer.bias", "output_layer_rmlo.fc_layer.weight", "output_layer_rmlo.fc_layer.bias".
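
The traceback shows a checkpoint with per-view keys (fc1_lcc, fc1_rcc, ...) being loaded into a model that expects per-view-type keys (fc1_cc, fc1_mlo), which suggests that the weights file and the model variant do not match. A quick way to see such a mismatch before calling load_state_dict is to compare the two key sets. The sketch below uses plain lists in place of real state dicts, and the function name is hypothetical.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the error message."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# A few key names taken from the traceback above.
expected = ["fc1_cc.weight", "fc1_mlo.weight"]
found = ["fc1_lcc.weight", "fc1_rcc.weight", "fc1_lmlo.weight", "fc1_rmlo.weight"]
missing, unexpected = diff_state_dict_keys(expected, found)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real model, the two arguments would be model.state_dict().keys() and the keys of the loaded checkpoint's "model" entry, as in the traceback.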

Weights Resnet 22

Is it possible to share the original weights of the pre-trained resnet22?

some request

Can you share some training details in the following?

ModuleNotFoundError: No module named 'src'

python3 src/cropping/crop_mammogram.py \
    --input-data-folder $DATA_FOLDER \
    --output-data-folder $CROPPED_IMAGE_PATH \
    --exam-list-path $INITIAL_EXAM_LIST_PATH \
    --cropped-exam-list-path $CROPPED_EXAM_LIST_PATH \
    --num-processes $NUM_PROCESSES

Hi, when I try to execute this, I get the following error. Can you help me please? Thanks :)

Traceback (most recent call last):
File "src/cropping/crop_mammogram.py", line 32, in <module>
import src.utilities.pickling as pickling
ModuleNotFoundError: No module named 'src'
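
The import src.utilities.pickling only resolves when the repository root (the folder that contains src) is on Python's module search path. A common fix is to run the scripts from the repository root with that folder added to PYTHONPATH. The same effect can be sketched from Python as below; add_repo_root_to_path is a hypothetical helper, and the current working directory is assumed to be the repository root.

```python
import os
import sys


def add_repo_root_to_path(repo_root):
    """Put the repository root at the front of sys.path so `import src...` resolves."""
    repo_root = os.path.abspath(repo_root)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# Assumption: the current working directory is the repository root,
# i.e. the folder that contains the `src` package.
root = add_repo_root_to_path(os.getcwd())
print("src folder found:", os.path.isdir(os.path.join(root, "src")))
```

In a shell, the equivalent is to run export PYTHONPATH=$(pwd):$PYTHONPATH from the repository root before launching the script.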

dataset

Is this dataset publicly available?

helper function

Thank you for your contributions.

For anyone applying another dataset to this repository, this code may be useful.

`import os
import pickle

import natsort

def divide_list(l, n):
    # Yield successive chunks of n items from list l
    for i in range(0, len(l), n):
        yield l[i:i + n]

def make_dict(file_list):
    # Assumption: the four sorted files of an exam are ordered L-CC, L-MLO, R-CC, R-MLO
    mkdict = {
        'horizontal_flip': "NO",
        'L-CC': [file_list[0].split('.')[0]],
        'L-MLO': [file_list[1].split('.')[0]],
        'R-CC': [file_list[2].split('.')[0]],
        'R-MLO': [file_list[3].split('.')[0]],
    }
    return mkdict

sample_file_path = 'your folder path'
sample_file_list = natsort.natsorted(os.listdir(sample_file_path))
n = 4
result = list(divide_list(sample_file_list, n))

bin_list = []
for i in result:
    bin_list.append(make_dict(i))
with open(sample_file_path + '/nccpatient.pkl', 'wb') as f:
    pickle.dump(bin_list, f)
`
This code creates your pkl file.

Anyway, I have a question,

In your repository's image_prediction.csv:
left_benign,right_benign,left_malignant,right_malignant
0.0580,0.0754,0.0091,0.0179
0.0646,0.9536,0.0012,0.7258
0.4388,0.3526,0.2325,0.1061
0.3765,0.6483,0.0909,0.2597

Are these results written in the same order as the patients in the pkl file?

Working with dicom image

Hi,

I want to run the model on my own DICOM mammogram images, but I can't find the procedure for converting a DICOM image to a PNG file.
I tried the following procedure to produce the PNG file:
First, I load the DICOM image as an array and apply histogram equalization (in OpenCV) to get the expected contrast.
Then, I save the array as a PNG file with int32 data type.
Finally, I run your code without any error, but I'm not sure whether the result is correct.

Thanks for your contribution!
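
One possible conversion, sketched under assumptions, is to write the raw pixel array to a 16-bit-capable grayscale PNG without any contrast adjustment. The example assumes the pydicom and pypng packages are installed and that the images use 12 bits per pixel; the bit depth should be checked against the DICOM header of your own data. The function names are hypothetical.

```python
import numpy as np


def prepare_pixels(pixel_array, bitdepth=12):
    """Clip raw pixel values to [0, 2**bitdepth - 1] and cast to uint16."""
    max_value = 2 ** bitdepth - 1
    return np.clip(pixel_array, 0, max_value).astype(np.uint16)


def save_dicom_image_as_png(dicom_filename, png_filename, bitdepth=12):
    """Write the DICOM pixel data to a grayscale PNG of the given bit depth."""
    import png       # pypng, imported lazily
    import pydicom

    pixels = prepare_pixels(pydicom.dcmread(dicom_filename).pixel_array, bitdepth)
    with open(png_filename, "wb") as f:
        writer = png.Writer(
            width=pixels.shape[1],
            height=pixels.shape[0],
            bitdepth=bitdepth,
            greyscale=True,
        )
        writer.write(f, pixels.tolist())
```

Histogram equalization and int32 storage both change the pixel statistics the model sees, so skipping them is the more conservative choice here, although the source does not confirm which preprocessing the model expects.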

Some images don't work in crop_single_mammogram.py

Thank you for sharing your work.
I'm trying to adapt your repository to our dataset.

`score_heat_list = []
import glob
import os
def make_dir(name):
    if not os.path.isdir(name):
        os.makedirs(name)
        print(name, "folder has been created.")
    else:
        print("The folder already exists.")

make_dir('save_imageheatmap_model_figure_folder')

def json_extract_feature(json_data):
    """
    Collect the annotated image types and contour points from one annotation file.

    Components of json_data:
    'user id' = no
    'case_id' = split.('_')[1] = patients number
    'contour_list' = dict('image_type',dict())
    """
    patient = json_data['case_id']

    temp_image_type = []
    temp_image_type1 = []
    temp_image_type2 = []
    temp_image_type3 = []

    temp_key = []

    temp_contour = []
    temp_contour1 = []
    temp_contour2 = []
    temp_contour3 = []

    for image_type in json_data['contour_list']['cancer']:
        if image_type == 'lcc':
            temp_image_type.append(image_type)
        elif image_type == 'lmlo':
            temp_image_type1.append(image_type)
        elif image_type == 'rcc':
            temp_image_type2.append(image_type)
        elif image_type == 'rmlo':
            temp_image_type3.append(image_type)

        for key in json_data['contour_list']['cancer'][image_type]:
            for contour in json_data['contour_list']['cancer'][image_type][key]:
                # Store each contour point as [y, x], i.e. (row, column)
                bin_list = [contour.get('y'), contour.get('x')]
                if image_type == 'lcc':
                    temp_contour.append(bin_list)
                elif image_type == 'lmlo':
                    temp_contour1.append(bin_list)
                elif image_type == 'rcc':
                    temp_contour2.append(bin_list)
                elif image_type == 'rmlo':
                    temp_contour3.append(bin_list)

    return temp_image_type,temp_image_type1,temp_image_type2,temp_image_type3,temp_contour,temp_contour1,temp_contour2,temp_contour3

from skimage import draw
import numpy as np
def polygon2mask(image_shape, polygon):
    """Compute a mask from polygon.
    Parameters
    ----------
    image_shape : tuple of size 2.
        The shape of the mask.
    polygon : array_like.
        The polygon coordinates of shape (N, 2) where N is
        the number of points.
    Returns
    -------
    mask : 2-D ndarray of type 'bool'.
        The mask that corresponds to the input polygon.
    Notes
    -----
    This function does not do any border checking, so that all
    the vertices need to be within the given shape.
    Examples
    --------
    >>> image_shape = (128, 128)
    >>> polygon = np.array([[60, 100], [100, 40], [40, 40]])
    >>> mask = polygon2mask(image_shape, polygon)
    >>> mask.shape
    (128, 128)
    """
    polygon = np.asarray(polygon)
    vertex_row_coords, vertex_col_coords = polygon.T
    fill_row_coords, fill_col_coords = draw.polygon(
        vertex_row_coords, vertex_col_coords, image_shape)
    # np.bool was removed in recent NumPy releases; the builtin bool is equivalent
    mask = np.zeros(image_shape, dtype=bool)
    mask[fill_row_coords, fill_col_coords] = True
    return mask
##############################################################################################################################
from tqdm import tqdm
from src.heatmaps.run_producer_single import produce_heatmaps
import json
from PIL import Image
annotation_folder = r'/home/ncc/Desktop/2020_deep_learning_breastcancer/annotation_SN/'
import pickle
for png in tqdm(png_list[0:8]):
print(PATH+png)
crop_single_mammogram(PATH+png, horizontal_flip = 'NO', view = png.split('_')[1].split('.')[0],
cropped_mammogram_path = PATH+'cropped_image/'+png, metadata_path = PATH+png.split('.')[0]+'.pkl',num_iterations = 100, buffer_size = 50)
print(PATH+'cropped_image/'+png)
get_optimal_center_single(PATH+'cropped_image/'+png,PATH+png.split('.')[0]+'.pkl')
model_input = load_inputs(
image_path=PATH+'cropped_image/'+png,
metadata_path=PATH+png.split('.')[0]+'.pkl',
use_heatmaps=False,
)
####################################################################################################################################
parameters = dict(
device_type='gpu',
gpu_number='0',

patch_size=256,

stride_fixed=20,
more_patches=5,
minibatch_size=10,
seed=np.random.RandomState(shared_parameters["seed"]),

initial_parameters="/home/ncc/Desktop/breastcancer/nccpatient/breast_cancer_classifier/models/sample_patch_model.p",
input_channels=3,
number_of_classes=4,

cropped_mammogram_path=PATH+'cropped_image/'+png,
metadata_path=PATH+png.split('.')[0]+'.pkl',
heatmap_path_malignant=PATH+png.split('.')[0]+'_malignant_heatmap.hdf5',
heatmap_path_benign=PATH+png.split('.')[0]+'_benign_heatmap.hdf5',

heatmap_type=[0, 1],  # 0: malignant 1: benign 0: nothing

use_hdf5="store_true"

)
###########################################################################################################################

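# read annotation SN00000016_L-CC.png

# Reading the code, the JSON file with the same name is loaded 4 times; this should be fixed when the code is streamlined.
# The annotations are relative to the original image, not the cropped one, yet the image being displayed is the cropped one.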

# print(png.split('_')[0])
with open(PATH+png.split('.')[0]+'.pkl','rb') as f:
    location_data = pickle.load(f)
print(location_data)
start_point1 = list(location_data['window_location'])[0]
endpoint1 = list(location_data['window_location'])[1]
start_point2 = list(location_data['window_location'])[2]
endpoint2 = list(location_data['window_location'])[3]
print(start_point1,start_point2)
with open(annotation_folder+'Cancer_'+png.split('_')[0]+'.json') as json_file:
    json_data = json.load(json_file)

temp_image_type,temp_image_type1,temp_image_type2,temp_image_type3,temp_contour,temp_contour1,temp_contour2,temp_contour3 = json_extract_feature(json_data)

import operator
if png.split('_')[1].split('.')[0] =='L-CC':
    new_contour_list = temp_contour
if png.split('_')[1].split('.')[0] =='L-MLO':
    new_contour_list = temp_contour1
if png.split('_')[1].split('.')[0] =='R-CC':
    new_contour_list = temp_contour2  
if png.split('_')[1].split('.')[0] =='R-MLO':
    new_contour_list = temp_contour3

im = Image.open(PATH+png)
im_cropped = Image.open(PATH+'cropped_image/'+png)
print('original image:',im.size,'cropped image:',im_cropped.size)
new_contour = []
for image_list in new_contour_list:
    # print('_',image_list)
    new_temp_contour =map(operator.add,image_list,reversed(list(np.array(im.size)/2)))
    new_contour.append(list(new_temp_contour))
    # print(new_contour)
try:
    # 'window_location': (103, 2294, 0, 1041)
    img = polygon2mask(im.size[::-1],np.array(list(new_contour)))
    img_cropped = img[start_point1:endpoint1,start_point2:endpoint2]
    im = cv2.imread(PATH+png)
    im_cropped = cv2.imread(PATH+'cropped_image/'+png)
except ValueError as e:
    img = np.zeros(im.size)

###########################################################################################################################
random_number_generator = np.random.RandomState(shared_parameters["seed"])

# random_number_generator = np.random.RandomState(shared_parameters["seed"])
produce_heatmaps(parameters)
image_heatmaps_parameters = shared_parameters.copy()
image_heatmaps_parameters["view"] = png.split('_')[1].split('.')[0]
image_heatmaps_parameters["use_heatmaps"] = True
image_heatmaps_parameters["model_path"] = "/home/ncc/Desktop/breastcancer/nccpatient/breast_cancer_classifier/models/ImageHeatmaps__ModeImage_weights.p"



model, device = load_model(image_heatmaps_parameters)

model_input = load_inputs(
image_path=PATH+'cropped_image/'+png,
metadata_path=PATH+png.split('.')[0]+'.pkl',
use_heatmaps=True,
benign_heatmap_path=PATH+png.split('.')[0]+'_benign_heatmap.hdf5',
malignant_heatmap_path=PATH+png.split('.')[0]+'_malignant_heatmap.hdf5')

batch = [
process_augment_inputs(
    model_input=model_input,
    random_number_generator=random_number_generator,
    parameters=image_heatmaps_parameters,
    ),
]

tensor_batch = batch_to_tensor(batch, device)
y_hat = model(tensor_batch)
###############################################################
fig, axes = plt.subplots(1, 5, figsize=(16, 4))
x = tensor_batch[0].cpu().numpy()
axes[0].imshow(im, cmap="gray")
axes[0].imshow(img, cmap = 'autumn', alpha = 0.4)
axes[0].set_title("OG_Image")

axes[1].imshow(im_cropped, cmap="gray")
axes[1].imshow(img_cropped, cmap = 'autumn', alpha = 0.4)
axes[1].set_title("Image")

axes[2].imshow(x[0], cmap="gray")
axes[2].imshow(img_cropped, cmap = 'autumn', alpha = 0.4)
axes[2].set_title("Image")

axes[3].imshow(x[1], cmap=LinearSegmentedColormap.from_list("benign", [(0, 0, 0), (0, 1, 0)]))
axes[3].set_title("Benign Heatmap")

axes[4].imshow(x[2], cmap=LinearSegmentedColormap.from_list("malignant", [(0, 0, 0), (1, 0, 0)]))
axes[4].set_title("Malignant Heatmap")
plt.savefig('save_imageheatmap_model_figure_folder'+'/'+png.split('.')[0]+'.png')
################################################################
predictions = np.exp(y_hat.cpu().detach().numpy())[:, :2, 1]
predictions_dict = {
    "image" : png,
    "benign": float(predictions[0][0]),
    "malignant": float(predictions[0][1]),
}

print(predictions_dict)
score_heat_list.append(predictions_dict)`

The attached file is a cropped mammogram produced by this code.
The issue is that some mammograms are not cropped well. Am I doing something wrong?

missing single_image files

Hi,
Is it intended, or are these files just missing? They are mentioned in the newly uploaded notebook:
sample_single_output/cropped.png
sample_single_output/cropped_metadata.pkl

What loss we should use to train the SplitBreastModel ?

I see that log_softmax is calculated in forward() of the output layer used by all the models. Should we be using BCELoss or CrossEntropyLoss in that case?
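
For context, a layer ending in log_softmax produces log-probabilities. In PyTorch, NLLLoss is the loss that takes log-probabilities directly, CrossEntropyLoss combines log-softmax with NLLLoss, and BCELoss expects probabilities in [0, 1]. The NumPy sketch below illustrates the negative log-likelihood computation on toy numbers. The shapes and values are assumptions, and the loss actually used by the authors is not stated in this issue.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def nll_loss(log_probs, targets):
    """Mean negative log-likelihood of the target class for each row."""
    rows = np.arange(len(targets))
    return -log_probs[rows, targets].mean()


# Toy batch: 3 samples, 2 classes (negative / positive) for one label.
logits = np.array([[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
targets = np.array([0, 1, 1])
loss = nll_loss(log_softmax(logits), targets)
print(round(float(loss), 4))
```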

Issue with image_extension when parameter use-hdf5 is used

There is an issue in run_producer with image_extension when use-hdf5 is added as a parameter in run.sh.

Traceback:

Traceback (most recent call last):
  File "src/heatmaps/run_producer.py", line 392, in <module>
    main()
  File "src/heatmaps/run_producer.py", line 388, in main
    produce_heatmaps(model, device, parameters)
  File "src/heatmaps/run_producer.py", line 344, in produce_heatmaps
    making_heatmap_with_large_minibatch_potential(parameters, model, exam_list, device)
  File "src/heatmaps/run_producer.py", line 270, in making_heatmap_with_large_minibatch_potential
    all_patches, all_cases = sample_patches(exam, parameters)
  File "src/heatmaps/run_producer.py", line 223, in sample_patches
    parameters=parameters,
  File "src/heatmaps/run_producer.py", line 240, in sample_patches_single
    parameters,
  File "src/heatmaps/run_producer.py", line 102, in ori_image_prepare
    image = loading.load_image(image_path, view, horizontal_flip)
  File "src/data_loading/loading.py", line 59, in load_image
    image = read_image_mat(image_path)
  File "src/utilities/reading_images.py", line 37, in read_image_mat
    data = h5py.File(file_name, 'r')
  File "env_nyukat/lib/python3.6/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "env_nyukat/lib/python3.6/site-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'sample_output/cropped_images/0_L_CC.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Issue seems to go away by hard-coding here

def get_image_path(short_file_path, parameters):
    """
    Convert short_file_path to full file path
    """
    return os.path.join(parameters['original_image_path'], short_file_path + '.png')

The intention has probably been not to use use-hdf5 parameter at all, but it is listed in run_producer and it does allow the script to be modified to save also in png format (e.g. for visualization purposes) by adding here

saving_images.save_image_as_png(img_as_ubyte(heatmap_malignant), os.path.join(
        parameters['save_heatmap_path'][0], 
        short_file_path + '.png'
    ))
saving_images.save_image_as_png(img_as_ubyte(heatmap_benign), os.path.join(
        parameters['save_heatmap_path'][1],
        short_file_path + '.png'
    ))

There is a somewhat similar issue in run_model with image_extension when use-hdf5 is added as a parameter in run.sh.

Traceback:

Traceback (most recent call last):
  File "src/modeling/run_model.py", line 238, in <module>
    main()
  File "src/modeling/run_model.py", line 233, in main
    parameters=parameters,
  File "src/modeling/run_model.py", line 189, in load_run_save
    predictions = run_model(model, device, exam_list, parameters)
  File "src/modeling/run_model.py", line 82, in run_model
    horizontal_flip=datum["horizontal_flip"],
  File "src/data_loading/loading.py", line 59, in load_image
    image = read_image_mat(image_path)
  File "src/utilities/reading_images.py", line 37, in read_image_mat
    data = h5py.File(file_name, 'r')
  File "env_nyukat/lib/python3.6/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "env_nyukat/lib/python3.6/site-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'sample_output/cropped_images/0_L_CC.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Perhaps the safest solution is to hard-code here the correct file extension

loaded_image = loading.load_image(
    image_path=os.path.join(parameters["image_path"], short_file_path + ".png"),
    view=view,
    horizontal_flip=datum["horizontal_flip"],
    )
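
As an alternative to hard-coding the extension in both places, the path lookup could check which file actually exists on disk. The sketch below is a hypothetical helper, not part of the repository; the candidate extensions and their order are assumptions.

```python
import os


def resolve_image_path(image_dir, short_file_path, extensions=(".png", ".hdf5")):
    """Return the first existing file among the candidate extensions."""
    for ext in extensions:
        candidate = os.path.join(image_dir, short_file_path + ext)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        "no file for %r in %r with extensions %r"
        % (short_file_path, image_dir, extensions)
    )
```

With cropped images saved as PNG, a call such as resolve_image_path("sample_output/cropped_images", "0_L_CC") would return the .png path regardless of whether use-hdf5 was set for the heatmaps.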

Running on CPU

Hi
Nice library! Will I need a GPU to run this? Could you perhaps add a README section on running this on gcloud (which I suppose already supports PyTorch and GPUs)?

Thanks
Supraja

View the original image with heatmaps

Hi,

Thanks for the great contribution to mammography! I really appreciate your work. I just wonder how to view the heatmap predictions on top of the original image.

I've tried to view the HDF5 file, but didn't get it quite right.

Thanks!
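
One way to visualize a heatmap on top of an image is alpha blending, sketched below with NumPy. It assumes the heatmap has already been loaded as an array with values in [0, 1] and has the same shape as the image it is drawn on. If the heatmaps correspond to the cropped image rather than the original one, the cropped image should be used. The function name and default color are hypothetical.

```python
import numpy as np


def overlay_heatmap(image, heatmap, color=(1.0, 0.0, 0.0), max_alpha=0.6):
    """Blend a [0, 1] heatmap over a grayscale image; returns float RGB in [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    heatmap = np.clip(np.asarray(heatmap, dtype=np.float64), 0.0, 1.0)
    if image.shape != heatmap.shape:
        raise ValueError("image and heatmap must have the same shape")
    # Rescale the grayscale image to [0, 1]; a constant image maps to 0.
    span = image.max() - image.min()
    gray = (image - image.min()) / span if span > 0 else np.zeros_like(image)
    rgb = np.stack([gray, gray, gray], axis=-1)
    # Opacity grows with the heatmap value, up to max_alpha.
    alpha = (max_alpha * heatmap)[..., np.newaxis]
    return (1.0 - alpha) * rgb + alpha * np.asarray(color)
```

The returned array can be passed to matplotlib's imshow or scaled to 8 bits and saved as a PNG.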

How do we manipulate the tensors with requires_grad=True?

Hi,

I am writing a training procedure based on the run_model() code, where the probabilities across the views are averaged. But that works only if the tensor is detached and converted to a NumPy array.

def run_model() { ..
  batch_predictions = compute_batch_predictions(output, mode=parameters["model_mode"])
  pred_df = pd.DataFrame({k: v[:, 1] for k, v in batch_predictions.items()})
  pred_df.columns.names = ["label", "view_angle"]
  predictions = pred_df.T.reset_index().groupby("label").mean().T[LABELS.LIST].values
}

Can I do something like the following?

def run_model() { ..
  gt = np.transpose(birads_labels['label'].values.reshape(predictions.shape[1], 1))
  gt = torch.tensor(gt, dtype=torch.float, requires_grad=True)
  predictions = torch.tensor(predictions, requires_grad=True)
  l = loss(predictions, gt)
  l.backward()
  optimizer.step()

So I was wondering: can we detach the tensor in compute_batch_predictions() and still use it later as a tensor with requires_grad=True to backpropagate?
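
As general PyTorch behavior, wrapping a NumPy result in torch.tensor(..., requires_grad=True) creates a new leaf tensor with no history, so gradients computed from it do not reach the model's parameters. Averaging with torch operations keeps the result attached to the autograd graph. The sketch below uses a toy linear layer as a stand-in for the real network; all names, shapes, and the choice of binary cross-entropy are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
toy_model = torch.nn.Linear(4, 2)               # stand-in for the real network
views = [torch.randn(3, 4) for _ in range(2)]   # two "views", batch of 3

# Per-view log-probabilities, as a log_softmax output layer would produce.
log_probs = [torch.log_softmax(toy_model(v), dim=-1) for v in views]

# Average the positive-class probability over views with torch ops only,
# so the result stays attached to the autograd graph.
prediction = torch.stack([lp.exp()[:, 1] for lp in log_probs]).mean(dim=0)

target = torch.tensor([0.0, 1.0, 1.0])          # targets need no gradient
loss = torch.nn.functional.binary_cross_entropy(prediction, target)
loss.backward()
print(toy_model.weight.grad is not None)
```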

Converting .hdf5 heatmap file to PNG.

Is there an option to convert output heatmap files from HDF5 format to PNG or JPEG, so I can open it in regular image viewer software?

Thank you in advance.

Working with my own images

Hi, I am trying to get a result from your single-image model. However, when I use my own DICOM images, the cropping script fails on them: the cropped output is the same as the 16-bit PNG input. Also, when I load it into the model for inference and visualize the tensor batch, I see an almost fully black image. Long story short, I cannot obtain a consistent trio of raw image, cropped image, and tensor batch. I am working with uint16 image data. What could be the reason for this? Many thanks.
Kindest regards.

Module Not Found Error: No module named 'src'

Hi, when I try to execute run.sh, I get the following error. Can you help me please? Thanks :)

Stage 1: Crop Mammograms
Traceback (most recent call last):
File "src/cropping/crop_mammogram.py", line 32, in <module>
import src.utilities.pickling as pickling
ModuleNotFoundError: No module named 'src'
Stage 2: Extract Centers
Traceback (most recent call last):
File "src/optimal_centers/get_optimal_centers.py", line 32, in <module>
from src.constants import INPUT_SIZE_DICT
ModuleNotFoundError: No module named 'src'
Stage 3: Generate Heatmaps
Traceback (most recent call last):
File "src/heatmaps/run_producer.py", line 36, in <module>
import src.heatmaps.models as models
ModuleNotFoundError: No module named 'src'
Stage 4a: Run Classifier (Image)
Traceback (most recent call last):
File "src/modeling/run_model.py", line 34, in <module>
import src.utilities.pickling as pickling
ModuleNotFoundError: No module named 'src'
Stage 4b: Run Classifier (Image+Heatmaps)
Traceback (most recent call last):
File "src/modeling/run_model.py", line 34, in <module>
import src.utilities.pickling as pickling
ModuleNotFoundError: No module named 'src'

Testing on other data sets

Hi, I was testing the model against another mammography dataset and was wondering whether the input image dimensions have to be exact. Your sample cropped images are (2440X3656) and (2607X3818), while ours are (1993X4396) and (2133X4906).

P_00005,RIGHT, CC, MALIGNANT, 0.1293 ,0.0123
P_00005,RIGHT,MLO, MALIGNANT ,0.1293, 0.0123
P_00007, LEFT, CC, BENIGN ,0.3026, 0.1753
P_00007,LEFT,MLO, BENIGN, 0.3026, 0.1753

As you can see, the probabilities for benign and malignant (respectively) are very low. Out of a dataset of 200, the model predicted only about 10 correctly.

Patch Classifier

KeyError: 'module name can\'t contain "."'

I get the above error during the stage 3 run.
After some research, I discovered that the torchvision package needs to be updated to 0.2.1 (pytorch/vision#474).
This is just an FYI.

Tensorflow Error: Default MaxPoolingOp only supports NHWC on device type CPU

I'm running run_single_tf.sh with the CPU device type and the following error occurs:

2020-06-12 14:28:39.745875: E tensorflow/core/common_runtime/executor.cc:624] Executor failed to create kernel. Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
	 [[{{node model/resnet/first/max_pooling2d/MaxPool}}]]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
	 [[{{node model/resnet/first/max_pooling2d/MaxPool}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/modeling/run_model_single_tf.py", line 224, in <module>
    main()
  File "src/modeling/run_model_single_tf.py", line 220, in main
    run(parameters)
  File "src/modeling/run_model_single_tf.py", line 171, in run
    y_hat = sess.run(y, feed_dict={x: x_data})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
	 [[node model/resnet/first/max_pooling2d/MaxPool (defined at /home/user/Projects/Dok/git/breast_cancer_classifier/src/modeling/models_tf.py:133) ]]

How can I solve this issue?
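
The message suggests that the CPU max-pooling kernel only accepts channels-last (NHWC) input, while the graph passes it channels-first (NCHW) data. The difference between the two layouts can be illustrated with NumPy. This is only a sketch of the axis permutation, not a patch for `models_tf.py`, and both function names are hypothetical:

```python
import numpy as np


def nchw_to_nhwc(batch):
    """Reorder (N, C, H, W) to (N, H, W, C), i.e. channels-last."""
    return np.transpose(batch, (0, 2, 3, 1))


def nhwc_to_nchw(batch):
    """Reorder (N, H, W, C) back to (N, C, H, W), i.e. channels-first."""
    return np.transpose(batch, (0, 3, 1, 2))


# Example: a batch of 2 single-channel images of size 4x5.
x = np.zeros((2, 1, 4, 5), dtype=np.float32)
print(nchw_to_nhwc(x).shape)  # (2, 4, 5, 1)
```

In TensorFlow the same permutation is available as `tf.transpose(x, perm=[0, 2, 3, 1])`. Whether the model should transpose around the pooling op or be built channels-last throughout on CPU is a design choice this sketch does not settle.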

How to create a new exam list ?

Hi, thanks for sharing your code!

Is there any utility to create the exam list in the format this code needs from standard mammography datasets such as DDSM or INBREAST?

Standard deviation in AUC

Hi,
I am performing some experiments with your models.
Could you share the formula used to calculate the standard deviation of the AUCs in Table S4 (page 8 of 13) of the arXiv paper?

Thank you,

Question on your patch classifier.

I have a question about your patch classifier.
In your scripts, number_of_classes of your patch_classifier (DenseNet121) is set to 4.

Question 1. Do these 4 classes mean: no_benign, benign, no_malignant, malignant?

Question 2. Is the configuration of the output stage in your patch classifier something like:
AvePool2D (7x7x1024, strides=1x1) --> linear(1000) --> softmax(4)?

Dataset

Is this dataset publicly available? Where is the download link?

train our own data without 4 views

Our data do not have the 4 standard views. How should I train on our own data?

Resize transform for images during training

Apart from being cropped, was any explicit resize applied to the images? I am trying to fine-tune on the DDSM dataset and noticed varying performance with different resize values. Thanks for sharing the model weights.

Batch size != 1

Hi,

With the latest update of the repo, I can no longer use --batch-size 2 to run the full model (Stage 4a in run.sh).
The problem is in this assertion:

def compute_batch_predictions(y_hat, mode):
    """
    Format predictions from different heads
    """

    if mode == MODELMODES.VIEW_SPLIT:
        assert y_hat[VIEWANGLES.CC].shape == (1, 4, 2)
        assert y_hat[VIEWANGLES.MLO].shape == (1, 4, 2)

The first dimension depends on the batch-size.

Hope it is useful.
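
One way to keep the sanity check without fixing the batch dimension to 1 is to validate only the trailing dimensions and require both views to agree on the batch size. A minimal sketch, assuming the per-view outputs behave like NumPy arrays of shape (batch, 4, 2); the string keys "CC" and "MLO" stand in for the repository's `VIEWANGLES` constants, and the function name is hypothetical:

```python
import numpy as np


def check_view_split_shapes(y_hat, batch_size=None):
    """Validate per-view outputs of shape (batch, 4, 2).

    Returns the observed batch size. Pass batch_size only when an exact
    match is required; a final partial batch may legitimately be smaller.
    """
    cc_shape = y_hat["CC"].shape
    mlo_shape = y_hat["MLO"].shape
    # Only the trailing dimensions are fixed by the model heads.
    assert cc_shape[1:] == (4, 2), cc_shape
    assert mlo_shape[1:] == (4, 2), mlo_shape
    # Both views must cover the same number of exams.
    assert cc_shape[0] == mlo_shape[0], (cc_shape, mlo_shape)
    if batch_size is not None:
        assert cc_shape[0] == batch_size, (cc_shape, batch_size)
    return cc_shape[0]


outputs = {"CC": np.zeros((2, 4, 2)), "MLO": np.zeros((2, 4, 2))}
print(check_view_split_shapes(outputs))  # 2
```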

How long did it take to complete the Stage 3 (Generate Heatmaps) ?

I am running your code in a 'cpu' environment, and it is extremely slow.
How long did Stage 3 take to complete (for your 4 sample images) in a single-GPU environment?

fine tuning

If I have a dataset that doesn't have the 4 views of an exam, how could I benefit from the NYU model and fine-tune it?

Can I run the same image over the 4 views or the 2 views in the single_image model and take the average?

And why are there Right and Left when the preprocessing flips all images to be Left?

OSError: Unable to open file (unable to open file: name = 'sample_output/heatmaps/heatmap_benign/0_L_CC.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Hi!

Thank you for a great project.

I'm trying to run the project on a K80 on Google Cloud using their provided PyTorch image.

The model fails when trying to create the heatmaps with the following error.

I've tried creating the missing directories and reading through the source code.

Any help would be greatly appreciated.
Stage 1: Crop Mammograms
Error: the directory to save cropped images already exists.
Stage 2: Extract Centers
Stage 3: Generate Heatmaps
Traceback (most recent call last):
File "src/heatmaps/run_producer.py", line 29, in <module>
import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
Stage 4a: Run Classifier (Image)
100%|██████████| 4/4 [00:19<00:00, 5.10s/it]
Stage 4b: Run Classifier (Image+Heatmaps)
0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "src/modeling/run_model.py", line 195, in <module>
main()
File "src/modeling/run_model.py", line 190, in main
parameters=parameters,
File "src/modeling/run_model.py", line 148, in load_run_save
predictions = run_model(model, exam_list, parameters)
File "src/modeling/run_model.py", line 80, in run_model
horizontal_flip=datum["horizontal_flip"],
File "/home/birgermoell/chexnet/breast_cancer_classifier/src/data_loading/loading.py", line 72, in load_heatmaps
benign_heatmap = load_image(benign_heatmap_path, view, horizontal_flip)
File "/home/birgermoell/chexnet/breast_cancer_classifier/src/data_loading/loading.py", line 59, in load_image
image = read_image_mat(image_path)
File "/home/birgermoell/chexnet/breast_cancer_classifier/src/utilities/reading_images.py", line 37, in read_image_mat
data = h5py.File(file_name, 'r')
File "/opt/anaconda3/lib/python3.7/site-packages/h5py/_hl/files.py", line 312, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/opt/anaconda3/lib/python3.7/site-packages/h5py/_hl/files.py", line 142, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'sample_output/heatmaps/heatmap_benign/0_L_CC.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
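
The log suggests a chain of failures: Stage 3 stopped on the missing `tensorflow` import, so no heatmap files were written, and Stage 4b then failed when it tried to open them. A small pre-flight check before Stage 4b makes that visible earlier. This is a sketch under assumptions: the layout `<heatmaps_dir>/<subdir>/<name>.hdf5` is inferred from the path in the error message, only the `heatmap_benign` subdirectory is known from the log, and both function names are hypothetical:

```python
import os


def expected_heatmap_paths(heatmaps_dir, names, subdirs=("heatmap_benign",)):
    """Build the heatmap paths Stage 4b would try to open.

    The directory layout is inferred from the error message above;
    pass further subdirectory names if your output uses them.
    """
    return [
        os.path.join(heatmaps_dir, subdir, name + ".hdf5")
        for subdir in subdirs
        for name in names
    ]


def find_missing(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == "__main__":
    paths = expected_heatmap_paths("sample_output/heatmaps", ["0_L_CC"])
    for path in find_missing(paths):
        print("missing heatmap:", path)
```

If the check reports missing files, rerunning Stage 3 after installing the missing dependency is the likely next step; creating the directories by hand does not produce the heatmaps.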

Train model on other radiography dataset

Is there a way to train this model on another radiography dataset? The dataset I am working with consists of X-ray images of castings and welds in mechanical components, in which I want to detect and classify defects.

Data

Hello!
Great work. Where can I download the breast data? Thanks!