Comments (15)
At first, when I was working on the split task,
preprocessing could not generate train.csv.
I ran the code on its own and the generation succeeded,
using stage_2_sample_submission.csv as PNEUMOTHORAX_ORIGINAL_TRAIN_CSV.
But even though I already have the paperwork,
the error shown is KeyError: 'EncodedPixels'
I thought it was caused by an extra space, so I removed it.
But then there was another problem:
ValueError: 'a' cannot be empty unless no samples are taken
I really don't know how to deal with this.
from gloria.
I found out that this is the sample prediction file that you need to submit on Kaggle.
Therefore, the data should be split using train.csv instead.
But it doesn't work:
KeyError: '1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520'
So I would like to ask how I should match the masks to the ImageId.
How do you map ID_9979c1b39 to the ImageId 1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520?
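One way to investigate a KeyError like the one above is to list which ImageId values have no matching DICOM file before building the Path column. The sketch below assumes, as the preprocessing snippet in this thread does, that the lookup key is the file name without its .dcm extension. The helper names build_path_index and find_unmatched_ids, the folder names, and the ID "1.2.3.4" are hypothetical and only serve as an illustration.

```python
import os


def build_path_index(file_paths):
    """Map each .dcm file name (without extension) to its full path."""
    index = {}
    for path in file_paths:
        stem, ext = os.path.splitext(os.path.basename(path))
        if ext.lower() == ".dcm":
            index[stem] = path
    return index


def find_unmatched_ids(image_ids, index):
    """Return the IDs from the CSV that have no file in the index."""
    return sorted(set(image_ids) - set(index))


# Hypothetical file listing mixing the two naming schemes seen in this thread.
file_paths = [
    "dicom-images-train/1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520.dcm",
    "stage_2_images/ID_9979c1b39.dcm",
]
index = build_path_index(file_paths)

# IDs as they might appear in the ImageId column of the CSV.
csv_ids = [
    "1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520",
    "1.2.3.4",  # made-up ID with no file on disk
]
unmatched = find_unmatched_ids(csv_ids, index)
print(unmatched)
```

If the unmatched list is not empty, the CSV and the image folder probably come from different releases of the dataset, and `img_paths[x]` would raise a KeyError for those IDs.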
from gloria.
Hi there, I am having some difficulties understanding your questions. Can you elaborate on what you meant by "paperwork"? What is the "split task" you were working on? Can you please also provide the detailed error message and the specific command you ran so I can help you debug from my end? It is difficult to help without knowing the scripts you are referring to.
To answer your first question, the "a" here refers to the first parameter of np.random.choice, which is the array that we are randomly sampling from. In this case, it is the variable neg_series, which comes from self.df_neg["ImageId"].unique(). I recommend you double-check that the ImageId column is processed correctly.
For your reference, I have attached PNEUMOTHORAX_TRAIN_CSV below:
train.csv
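The error can be reproduced in isolation with a small sketch. It assumes a DataFrame in which every row is labelled positive, so the negative subset is empty; the column names follow the thread, while the sample rows and the variable error_message are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data in which every sample is positive (Label == 1.0).
df = pd.DataFrame({"ImageId": ["img_a", "img_b"], "Label": [1.0, 1.0]})

# Selecting the negatives therefore yields an empty frame.
df_neg = df[df["Label"] == 0.0]
neg_series = df_neg["ImageId"].unique()

try:
    # Sampling from an empty array is what triggers the ValueError.
    np.random.choice(neg_series, 1, replace=False)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(len(neg_series), error_message)
```

Checking `len(neg_series)` before sampling is a quick way to confirm whether the generated CSV contains any negative samples at all.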
from gloria.
I'm sorry for my wording. Let me rephrase that.
I am working on your segmentation task.
I've updated the data paths in constants.py.
I can't find
PNEUMOTHORAX_ORIGINAL_TRAIN_CSV = "train-rle.csv"
The only files I downloaded were
stage_2_train.csv and stage_2_sample_submission.csv
I thought stage_2_train.csv was your train-rle.csv, but it didn't work.
It tells me I am missing "train.csv".
In your preprocess_datasets.py,
preprocess_pneumonia_data(test_fac=0.15) doesn't seem to work:
PNEUMOTHORAX_ORIGINAL_TRAIN_CSV cannot be used to produce train.csv, valid.csv, and test.csv.
So I did this separately, but there seems to be a problem with the label assignment.
`import os

import pandas as pd
import tqdm
from sklearn.model_selection import train_test_split

# The PNEUMOTHORAX_* constants are assumed to come from constants.py

if __name__ == "__main__":
    try:
        df = pd.read_csv(PNEUMOTHORAX_ORIGINAL_TRAIN_CSV)
    except Exception:
        raise Exception(
            "Please make sure the SIIM Pneumothorax dataset is "
            f"stored at {PNEUMOTHORAX_DATA_DIR}"
        )

    # get image paths
    img_paths = {}
    for subdir, dirs, files in tqdm.tqdm(os.walk(PNEUMOTHORAX_IMG_DIR)):
        for f in files:
            if "dcm" in f:
                # remove the ".dcm" extension
                file_id = f[:-4]
                img_paths[file_id] = os.path.join(subdir, f)

    # no encoded pixels means healthy
    # (exact match on "-1": a value with a leading space is labelled 1.0)
    df["Label"] = df.apply(
        lambda x: 0.0 if x["EncodedPixels"] == "-1" else 1.0, axis=1
    )
    # raises KeyError if an ImageId has no matching file in img_paths
    df["Path"] = df["ImageId"].apply(lambda x: img_paths[x])

    # split data: 70% train, 15% test, 15% valid
    train_df, test_val_df = train_test_split(df, test_size=0.15 * 2, random_state=0)
    test_df, valid_df = train_test_split(test_val_df, test_size=0.5, random_state=0)

    print(f"Number of train samples: {len(train_df)}")
    print(train_df["Label"].value_counts())
    print(f"Number of valid samples: {len(valid_df)}")
    print(valid_df["Label"].value_counts())
    print(f"Number of test samples: {len(test_df)}")
    print(test_df["Label"].value_counts())

    train_df.to_csv(PNEUMOTHORAX_TRAIN_CSV)
    valid_df.to_csv(PNEUMOTHORAX_VALID_CSV)
    test_df.to_csv(PNEUMOTHORAX_TEST_CSV)`
I used this to generate the three CSV files.
It looks like I used stage_2_train.csv incorrectly,
so I replaced it with stage_2_sample_submission.csv,
and it worked!
But after all that, something still goes wrong.
from gloria.
This is the train.csv I generated using the code.
But it is definitely wrong: all the labels are 1.0.
Anyway, I reran the command
$ python run.py -c ./configs/pneumothorax_segmentation_config.yaml --train --test --train_pct 0.01
and this appeared:
from gloria.
I really hope you can solve my problem. Thank you very much
from gloria.
The code uses what's in your preprocess_datasets.py
def preprocess_pneumonia_data(test_fac=0.15)
if __name__ == "__main__":
    try:
        df = pd.read_csv(PNEUMOTHORAX_ORIGINAL_TRAIN_CSV)
    except:
        raise Exception(
            "Please make sure the the SIIM Pneumothorax dataset is \
            stored at {PNEUMOTHORAX_DATA_DIR}"
        )
The rest is the same as your code
from gloria.
Oh, I found something wrong with my train.csv.
I downloaded the dataset.
The files usually have the form id_0011FE81E.dcm
My image IDs are different from yours.
I tried to use stage_2_train.csv, which looks similar,
but it failed.
from gloria.
Hi @church-XP, you are seeing the error message "ValueError: 'a' cannot be empty unless no samples are taken"
because you have created your train/val/test.csv with stage_2_sample_submission.csv, which only contains positive samples.
Can you please try the following:
- Download the images by running python download_images.py, which should be in your SIIM data directory. You can also find the script here (https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data?select=download_images.py)
- The previous step should download the dicom-images-train folder for you. Please set that as your PNEUMOTHORAX_IMG_DIR
- Set PNEUMOTHORAX_ORIGINAL_TRAIN_CSV to your train.csv
- Rerun preprocess_pneumonia_data
Hope that helps.
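Before rerunning the preprocessing, it may be worth checking that the source CSV actually yields both labels. The sketch below is only an illustration: it assumes the CSV can carry a leading space in the EncodedPixels column name and in its values (the "extra space" mentioned earlier in this thread), and the three sample rows are made up rather than taken from the dataset.

```python
import io

import pandas as pd

# Made-up rows; a leading space follows each comma.
csv_text = (
    "ImageId, EncodedPixels\n"
    "img_a, -1\n"
    "img_b, 10 5 20 5\n"
    "img_c, -1\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Normalize whitespace in the column names and in the mask strings.
df.columns = df.columns.str.strip()
df["EncodedPixels"] = df["EncodedPixels"].astype(str).str.strip()

# "-1" means no mask (healthy); anything else is a positive sample.
df["Label"] = (df["EncodedPixels"] != "-1").astype(float)

label_counts = df["Label"].value_counts().to_dict()
print(label_counts)
```

If only the label 1.0 shows up in the counts, either the file contains no negative rows or the "-1" comparison is failing because of whitespace, and sampling negatives during training will fail.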
from gloria.
So the files I downloaded aren't what you trained with.
You used the stage 1 files, not the stage 2 files.
I should use python download_images.py to download them, right?
Hmm,
I cannot connect; it says "Could not automatically determine credentials."
I followed the steps, but I still have the problem.
I googled it, but that didn't fix my problem.
Is there another way for me to download it?
from gloria.
I tried it on the server and in my computer's terminal.
I was wondering if there is any other way to get this part of the dataset.
Does this mean the data is no longer stored on Cloud Health, so I can't access it there?
from gloria.
Also, it is not stored on Kaggle either (the link leads to an empty page).
from gloria.
I found the dataset another way. I hope it works.
from gloria.
Good luck!
from gloria.
I found the dataset another way. I hope it works.
Hello! How were your questions about the dataset resolved? I'm struggling with this problem.
from gloria.