
Comments (15)

church-XP commented on September 14, 2024

At the beginning, when I was working on the split task, preprocessing could not generate train.csv.
When I ran the code on its own, generation succeeded,
with stage_2_sample_submission.csv as PNEUMOTHORAX_ORIGINAL_TRAIN_CSV.

But even though I already have the paperwork, the error displayed is
KeyError: 'EncodedPixels'
I thought it was caused by an extra space, so I took it out.
But then there was another problem:
ValueError: 'a' cannot be empty unless no samples are taken
I really don't know how to deal with this.

from gloria.

church-XP commented on September 14, 2024

I found out that this file is the prediction sample you need to submit on Kaggle.
Therefore, the data should be split from train.csv instead.
But that doesn't work either:

KeyError: '1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520'
So I would like to ask how the masks correspond to the ImageId values.
How do you map ID_9979c1b39 to the ImageId 1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520?


marshuang80 commented on September 14, 2024

Hi there, I am having some difficulty understanding your questions. Can you elaborate on what you meant by "paperwork"? What is the "split task" you were working on? Can you please also provide the detailed error message and the specific command you ran so I can help you debug from my end? It is difficult to help without knowing the scripts you are referring to.

To answer your first question, the "a" here refers to the first parameter of np.random.choice, which is the list that we are randomly sampling from. In this case, it is the variable neg_series, which comes from self.df_neg["ImageId"].unique(). I recommend you double-check whether the ImageId column is correctly processed.
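
To make the point above concrete, here is a minimal sketch of how an empty pool of negative samples leads to this ValueError. The helper name sample_negative_ids and the toy DataFrames are hypothetical and not taken from the repository; only np.random.choice and the ImageId and Label columns come from this thread.

```python
import numpy as np
import pandas as pd


def sample_negative_ids(df, n_samples):
    """Randomly pick ImageIds from rows labelled negative (Label == 0.0).

    Hypothetical helper for illustration; not the repository's implementation.
    """
    df_neg = df[df["Label"] == 0.0]
    neg_series = df_neg["ImageId"].unique()
    # np.random.choice raises ValueError when neg_series is empty
    # and at least one sample is requested.
    return np.random.choice(neg_series, size=n_samples, replace=True)


# Toy data (made up): one table with both classes, one with positives only.
mixed = pd.DataFrame({"ImageId": ["a", "b", "c"], "Label": [0.0, 1.0, 0.0]})
positives_only = pd.DataFrame({"ImageId": ["d", "e"], "Label": [1.0, 1.0]})

print(sample_negative_ids(mixed, 2))

try:
    sample_negative_ids(positives_only, 2)
except ValueError as err:
    print(f"ValueError: {err}")
```

If the second call fails in the same way on your own CSV, the file probably contains no rows that were labelled as negative.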

For your reference, I have attached PNEUMOTHORAX_TRAIN_CSV below:
train.csv


church-XP commented on September 14, 2024

I'm sorry for my wording. Let me rephrase that.

I am working on your segmentation task.
I've updated the data paths in constants.py, but I can't find the file for
PNEUMOTHORAX_ORIGINAL_TRAIN_CSV = "train-rle.csv"
The only files I downloaded were
stage_2_train.csv and stage_2_sample_submission.csv

I thought stage_2_train.csv was your train-rle.csv, but it didn't work.
It tells me I lack "train.csv".
In your preprocess_datasets.py,
preprocess_pneumonia_data(test_fac=0.15) doesn't seem to work.

PNEUMOTHORAX_ORIGINAL_TRAIN_CSV cannot be used to produce train.csv, valid.csv, and test.csv,
so I did this separately, but there seems to be a problem with the assignment:
[screenshot]

```python
import os

import pandas as pd
import tqdm
from sklearn.model_selection import train_test_split

# Assumption: the PNEUMOTHORAX_* constants are defined in the repo's constants.py
from gloria.constants import *

if __name__ == "__main__":
    try:
        df = pd.read_csv(PNEUMOTHORAX_ORIGINAL_TRAIN_CSV)
    except FileNotFoundError as err:
        raise Exception(
            "Please make sure the SIIM Pneumothorax dataset is "
            f"stored at {PNEUMOTHORAX_DATA_DIR}"
        ) from err

    # map each image ID (file name without ".dcm") to its full path
    img_paths = {}
    for subdir, dirs, files in tqdm.tqdm(os.walk(PNEUMOTHORAX_IMG_DIR)):
        for f in files:
            if f.endswith(".dcm"):
                file_id = f[:-4]
                img_paths[file_id] = os.path.join(subdir, f)

    # no encoded pixels ("-1") means healthy; strip() guards against
    # values that carry stray whitespace, e.g. " -1"
    df["Label"] = df.apply(
        lambda x: 0.0 if str(x["EncodedPixels"]).strip() == "-1" else 1.0, axis=1
    )
    # raises KeyError if an ImageId has no matching .dcm file
    df["Path"] = df["ImageId"].apply(lambda x: img_paths[x])

    # split data: 70% train, 15% test, 15% valid
    train_df, test_val_df = train_test_split(df, test_size=0.15 * 2, random_state=0)
    test_df, valid_df = train_test_split(test_val_df, test_size=0.5, random_state=0)

    print(f"Number of train samples: {len(train_df)}")
    print(train_df["Label"].value_counts())
    print(f"Number of valid samples: {len(valid_df)}")
    print(valid_df["Label"].value_counts())
    print(f"Number of test samples: {len(test_df)}")
    print(test_df["Label"].value_counts())

    train_df.to_csv(PNEUMOTHORAX_TRAIN_CSV)
    valid_df.to_csv(PNEUMOTHORAX_VALID_CSV)
    test_df.to_csv(PNEUMOTHORAX_TEST_CSV)
```
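
One possible cause of both KeyError: 'EncodedPixels' and a Label column that is 1.0 everywhere is stray whitespace in the CSV header or values, as the earlier remark about an extra space suggests. The sketch below assumes that such whitespace is present and strips it before deriving the label. The function load_rle_csv and the inline sample rows are hypothetical and only serve as an illustration.

```python
import io

import pandas as pd


def load_rle_csv(source):
    """Read an RLE CSV and derive Label, tolerating stray whitespace.

    Hypothetical helper; the column names follow the ones used in this thread.
    """
    df = pd.read_csv(source)
    # " EncodedPixels" -> "EncodedPixels"
    df.columns = df.columns.str.strip()
    # " -1" -> "-1"
    df["EncodedPixels"] = df["EncodedPixels"].astype(str).str.strip()
    # "-1" marks an image without a mask, i.e. a negative sample
    df["Label"] = (df["EncodedPixels"] != "-1").astype(float)
    return df


# Made-up rows that imitate a header and values with leading spaces.
sample = io.StringIO("ImageId, EncodedPixels\nimg_a, -1\nimg_b, 10 5 20 5\n")
df = load_rle_csv(sample)
print(df[["ImageId", "Label"]])
print(df["Label"].value_counts())
```

If value_counts() shows only 1.0 on the real file even after stripping, the CSV itself probably holds no negative rows.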

I used this to generate the three CSV files.
It looks like I was using stage_2_train.csv incorrectly,
so I replaced it with stage_2_sample_submission.csv,
and it worked!
But after all that, something still goes wrong.


church-XP commented on September 14, 2024

[screenshot of the generated train.csv]
This is the train.csv I generated using the code,
but it is definitely wrong: all the labels are 1.0.
Anyway, I reran the command
$ python run.py -c ./configs/pneumothorax_segmentation_config.yaml --train --test --train_pct 0.01
and this appeared:
[screenshot]


church-XP commented on September 14, 2024

I really hope you can solve my problem. Thank you very much.


church-XP commented on September 14, 2024

The code uses what's in your preprocess_datasets.py, from
def preprocess_pneumonia_data(test_fac=0.15):

    if __name__ == "__main__":
        try:
            df = pd.read_csv(PNEUMOTHORAX_ORIGINAL_TRAIN_CSV)
        except:
            raise Exception(
                "Please make sure the the SIIM Pneumothorax dataset is \
                stored at {PNEUMOTHORAX_DATA_DIR}"
            )

The rest is the same as your code.


church-XP commented on September 14, 2024

Oh, I found something wrong with my train.csv.
In the dataset I downloaded,
the files usually have names of the form id_0011FE81E.dcm,
so my image IDs are different from yours.
I tried to use stage_2_train.csv, which looks like this:
[screenshot of stage_2_train.csv]
but it failed:
[screenshot]


marshuang80 commented on September 14, 2024

Hi @church-XP, you are seeing the error message "ValueError: 'a' cannot be empty unless no samples are taken" because you have created your train/val/test.csv with stage_2_sample_submission.csv, which only contains positive samples.

Can you please try the following:

  1. Download the images by running python download_images.py, which should be in your SIIM data directory. You can also find the script here (https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data?select=download_images.py)
  2. The previous step should download the dicom-images-train folder for you. Please set that as your PNEUMOTHORAX_IMG_DIR
  3. Set PNEUMOTHORAX_ORIGINAL_TRAIN_CSV to your train.csv
  4. Rerun preprocess_pneumonia_data
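
Before rerunning the preprocessing after these steps, it may help to confirm that every ImageId in the CSV has a matching DICOM file under the image directory, since a mismatch would surface as a KeyError like the one reported earlier in this thread. A minimal sketch, with hypothetical helper names (collect_dicom_paths, find_missing_ids) and made-up IDs:

```python
import os
import tempfile


def collect_dicom_paths(img_dir):
    """Map each DICOM file name (without the .dcm extension) to its full path."""
    paths = {}
    for subdir, _dirs, files in os.walk(img_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".dcm":
                paths[stem] = os.path.join(subdir, name)
    return paths


def find_missing_ids(image_ids, img_dir):
    """Return the ImageIds that have no matching .dcm file under img_dir."""
    paths = collect_dicom_paths(img_dir)
    return sorted(set(image_ids) - set(paths))


# Self-contained demo on a throwaway directory with made-up IDs.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "study_1"))
    open(os.path.join(tmp, "study_1", "1.2.3.dcm"), "w").close()
    missing = find_missing_ids(["1.2.3", "ID_abc"], tmp)
    print(missing)  # → ['ID_abc']
```

On real data, the first argument would be the ImageId column of the original CSV and the second the directory set as PNEUMOTHORAX_IMG_DIR. An empty result means every row can be resolved to a file.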

Hope that helps.


church-XP commented on September 14, 2024

[screenshot]
So the files I downloaded aren't what you trained with:
you use the stage 1 files, not the stage 2 files.

I should use python download_images.py to download them, right?
Hmm, I can't connect. It says "Could not automatically determine credentials."
[screenshot]
I followed the steps, but I still have a problem:
[screenshot]
I googled it, but that didn't fix my problem.
Is there a second way for me to download it, or ...


church-XP commented on September 14, 2024

I tried it on the server and in my computer's terminal.
I was wondering if there is any other way to get this part of the dataset.
[screenshot]

Does this mean the data is no longer saved in Cloud Health, so I can't access it there?


church-XP commented on September 14, 2024

Also, the Kaggle copy is not saved either (the link points to an empty page).


church-XP commented on September 14, 2024

I found the dataset another way. I hope it works.


marshuang80 commented on September 14, 2024

Good luck!


zyt0211 commented on September 14, 2024

"I found the dataset another way. I hope it works."

Hello! How were your questions about the dataset resolved? I'm struggling with the same problem.
