In the segmentation task, EncodedPixels seems to have an extra space, which I remove, but... about gloria HOT 15 CLOSED

church-XP commented on September 14, 2024

In the segmentation task, EncodedPixels seems to have an extra space, which I remove, but...

from gloria.

Comments (15)

church-XP commented on September 14, 2024

At the beginning, when I was working on the split task,
preprocessing could not generate train.csv.
I ran the code on its own and the generation succeeded,
using stage_2_sample_submission.csv as PNEUMOTHORAX_ORIGINAL_TRAIN_CSV.

But even though I already have the paperwork,
the error displayed is KeyError: 'EncodedPixels'.
I thought the column name had an extra space, so I took it out.
But then there was another problem:
ValueError: 'a' cannot be empty unless no samples are taken
I really don't know how to deal with this.
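
One possible cause of the KeyError: 'EncodedPixels' above, consistent with the extra space mentioned, is a header such as `ImageId, EncodedPixels`, where the column name and every value begin with a space. The following is a minimal sketch under that assumption, using two made-up rows rather than the real CSV. It normalizes names and values in pandas instead of editing the file by hand:

```python
import io

import pandas as pd

# Hypothetical two-row sample; the real file layout is an assumption.
raw = "ImageId, EncodedPixels\nimg_a, -1\nimg_b, 387620 23 996 33\n"

# Read everything as text so " -1" is not silently parsed as a number.
df = pd.read_csv(io.StringIO(raw), dtype=str)

# Strip stray whitespace from the column names and from the mask strings.
df.columns = df.columns.str.strip()
df["EncodedPixels"] = df["EncodedPixels"].str.strip()

# "-1" means no mask (healthy); anything else is a positive sample.
df["Label"] = (df["EncodedPixels"] != "-1").astype(float)
print(df)
```

If only the column name is stripped and the values are left as `" -1"`, a comparison against `"-1"` never matches and every row ends up labelled 1.0.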


church-XP commented on September 14, 2024

I found out that stage_2_sample_submission.csv is the prediction sample that you need to submit on Kaggle.
Therefore, the data should be split using train.csv instead.
But it doesn't work:

KeyError: '1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520'
So I would like to ask how I should match the masks to the ImageId.
How do you map ID_9979c1b39 to the ImageId 1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520?
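
A KeyError whose key is an ImageId is what a plain dictionary lookup raises when the CSV lists an ID that has no file with the same name on disk. As a rough sketch (the `img_paths` entries and the file path are hypothetical stand-ins, not the real dataset), the mismatch can be counted before building the `Path` column:

```python
import pandas as pd

# Hypothetical stand-ins: file stems found on disk, and IDs listed in the CSV.
img_paths = {"ID_9979c1b39": "/data/siim/ID_9979c1b39.dcm"}
df = pd.DataFrame(
    {
        "ImageId": [
            "ID_9979c1b39",
            "1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520",
        ]
    }
)

# IDs that appear in the CSV but have no file with the same stem.
missing = sorted(set(df["ImageId"]) - set(img_paths))
print(f"{len(missing)} of {df['ImageId'].nunique()} ImageIds have no matching file")

# Series.map yields NaN for unmatched keys instead of raising KeyError.
df["Path"] = df["ImageId"].map(img_paths)
matched = df.dropna(subset=["Path"])
print(matched)
```

If nearly every ID is missing, the CSV and the image folder most likely come from different releases of the dataset, and no renaming on one side will make them line up.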


marshuang80 commented on September 14, 2024

Hi there, I am having some difficulties understanding your questions. Can you elaborate on what you meant by "paperwork"? What is the "split task" you were working on? Can you please also provide the detailed error message and the specific command you ran so I can help you debug from my end? It is difficult to help without knowing the scripts you are referring to.

To answer your first question, the "a" here refers to the first parameter of np.random.choice, which is the array that we are randomly sampling from. In this case, it is the variable neg_series, which comes from self.df_neg["ImageId"].unique() from this line. I recommend you double-check whether the ImageId column is correctly processed.
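
For illustration, the error can be reproduced in isolation. This is only a sketch: the DataFrame is made up, and filtering on `Label == 0.0` is an assumption about how the negative subset is built.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which every row is labelled positive.
df = pd.DataFrame({"ImageId": ["a", "b", "c"], "Label": [1.0, 1.0, 1.0]})

# Assumed construction of the negative subset; it is empty here.
df_neg = df[df["Label"] == 0.0]
neg_series = df_neg["ImageId"].unique()

try:
    np.random.choice(neg_series, size=2, replace=True)
    error_message = None
except ValueError as err:
    # Sampling from an empty array is rejected by NumPy.
    error_message = str(err)

print(len(neg_series), error_message)
```

So the message points at an empty pool of negative ImageIds rather than at the sampling call itself.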

For your reference, I have attached PNEUMOTHORAX_TRAIN_CSV below:
train.csv


church-XP commented on September 14, 2024

I'm sorry for how I expressed that. Let me rephrase.

I am working on your segmentation task.
I've updated the data paths in constants.py.
I can't find the file for
PNEUMOTHORAX_ORIGINAL_TRAIN_CSV = "train-rle.csv"
The only files I downloaded were
stage_2_train.csv and stage_2_sample_submission.csv.

I thought stage_2_train.csv was your train-rle.csv, but it didn't work.
It tells me I am missing "train.csv".
In your preprocess_datasets.py,
preprocess_pneumonia_data(test_fac=0.15) doesn't seem to work.

PNEUMOTHORAX_ORIGINAL_TRAIN_CSV cannot be used to generate train.csv, valid.csv, and test.csv,
so I did this separately, but there seems to be a problem with the assignment.

`import os

import pandas as pd
import tqdm
from sklearn.model_selection import train_test_split

# The PNEUMOTHORAX_* constants are expected to come from the repository's constants.py

if __name__ == "__main__":
    try:
        df = pd.read_csv(PNEUMOTHORAX_ORIGINAL_TRAIN_CSV)
    except Exception:
        raise Exception(
            "Please make sure the SIIM Pneumothorax dataset is "
            f"stored at {PNEUMOTHORAX_DATA_DIR}"
        )

    # get image paths, keyed by file name without the .dcm extension
    img_paths = {}
    for subdir, dirs, files in tqdm.tqdm(os.walk(PNEUMOTHORAX_IMG_DIR)):
        for f in files:
            if "dcm" in f:
                # remove dcm
                file_id = f[:-4]
                img_paths[file_id] = os.path.join(subdir, f)

    # no encoded pixels mean healthy
    df["Label"] = df.apply(
        lambda x: 0.0 if x["EncodedPixels"] == "-1" else 1.0, axis=1
    )
    # raises KeyError if an ImageId has no matching file
    df["Path"] = df["ImageId"].apply(lambda x: img_paths[x])

    # split data: 70% train, 15% valid, 15% test
    train_df, test_val_df = train_test_split(df, test_size=0.15 * 2, random_state=0)
    test_df, valid_df = train_test_split(test_val_df, test_size=0.5, random_state=0)

    print(f"Number of train samples: {len(train_df)}")
    print(train_df["Label"].value_counts())
    print(f"Number of valid samples: {len(valid_df)}")
    print(valid_df["Label"].value_counts())
    print(f"Number of test samples: {len(test_df)}")
    print(test_df["Label"].value_counts())

    train_df.to_csv(PNEUMOTHORAX_TRAIN_CSV)
    valid_df.to_csv(PNEUMOTHORAX_VALID_CSV)
    test_df.to_csv(PNEUMOTHORAX_TEST_CSV)`

I use this to generate the three CSV files.
It looks like I was using stage_2_train.csv incorrectly,
so I replaced it with stage_2_sample_submission.csv,
and it worked!
But after all that, something still goes wrong.


church-XP commented on September 14, 2024

This is the train.csv I generated using the code.
But it is definitely wrong, because all the labels are 1.0.
Anyway, I reran the command
$ python run.py -c ./configs/pneumothorax_segmentation_config.yaml --train --test --train_pct 0.01

and this appeared:


church-XP commented on September 14, 2024

I really hope you can solve my problem. Thank you very much


church-XP commented on September 14, 2024

The code uses what's in your preprocess_datasets.py,
def preprocess_pneumonia_data(test_fac=0.15),
with this part changed:

    if __name__ == "__main__":
        try:
            df = pd.read_csv(PNEUMOTHORAX_ORIGINAL_TRAIN_CSV)
        except:
            raise Exception(
                "Please make sure the the SIIM Pneumothorax dataset is \
                stored at {PNEUMOTHORAX_DATA_DIR}"
            )

The rest is the same as your code.


church-XP commented on September 14, 2024

Oh, I found something wrong with my train.csv.
In the dataset I downloaded,
the files usually have names of the form ID_0011FE81E.dcm.
My image IDs are different from yours.
I tried to use stage_2_train.csv, which looks like this:

but it failed.


marshuang80 commented on September 14, 2024

Hi @church-XP, you are seeing the error message "ValueError: 'a' cannot be empty unless no samples are taken" because you have created your train/val/test.csv with stage_2_sample_submission.csv, which only contains positive samples.

Can you please try the following:

1. Download the images by running `python download_images.py`, which should be in your SIIM data directory. You can also find the script here (https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data?select=download_images.py)
2. The previous step should download the dicom-images-train folder for you. Please set that as your PNEUMOTHORAX_IMG_DIR
3. Set PNEUMOTHORAX_ORIGINAL_TRAIN_CSV to your train.csv
4. Rerun preprocess_pneumonia_data
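
Before rerunning, it may help to confirm that the labelled data contains both classes, since a positives-only CSV leads to the error above. The sketch below uses a made-up DataFrame. It also passes `stratify` to `train_test_split`, which is an addition of this sketch and not something the repository's split is known to do, so that each of the three splits keeps both classes:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labelled frame; replace with the DataFrame built from the CSV.
df = pd.DataFrame(
    {
        "ImageId": [f"img_{i}" for i in range(40)],
        "Label": [0.0] * 28 + [1.0] * 12,
    }
)

# Fail early if one class is missing entirely.
counts = df["Label"].value_counts()
if counts.size < 2:
    raise ValueError(f"Expected both classes, found only: {counts.to_dict()}")

# 70% train, then the remaining 30% is halved into test and valid.
train_df, test_val_df = train_test_split(
    df, test_size=0.15 * 2, random_state=0, stratify=df["Label"]
)
test_df, valid_df = train_test_split(
    test_val_df, test_size=0.5, random_state=0, stratify=test_val_df["Label"]
)

for name, part in [("train", train_df), ("valid", valid_df), ("test", test_df)]:
    print(name, len(part), part["Label"].value_counts().to_dict())
```

With a heavily imbalanced dataset, an unstratified split can leave a small validation or test set with very few samples of one class. Stratifying avoids that at the cost of a slightly less random partition.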

Hope that helps.


church-XP commented on September 14, 2024

So the files I downloaded aren't what you trained with?
You use the stage 1 files, not the stage 2 files?

I should use python download_images.py to download them, right?
Hmm,
I cannot connect. It says: Could not automatically determine credentials.

I followed the steps, but I still have the problem.

I searched on Google, but that didn't fix my problem.
Is there another way for me to download it?


church-XP commented on September 14, 2024

I tried it both on the server and in my computer's terminal.
I was wondering if there is any other way to get this part of the dataset.

Does this mean that the data is no longer stored on Cloud Health, so I can't access it there?


church-XP commented on September 14, 2024

Also, it is not available on Kaggle either (the link points to an empty page).


church-XP commented on September 14, 2024

I found the dataset another way. I hope it works.


marshuang80 commented on September 14, 2024

Good luck!


zyt0211 commented on September 14, 2024

"I found the dataset another way. I hope it works."

Hello! How were your questions about the dataset resolved? I'm struggling with this problem.
