
error while running run_DESeq_stats.R (samsa2, closed)

rachel1898 commented on July 17, 2024

error while running run_DESeq_stats.R

from samsa2.

Comments (5)

lisakmalins commented on July 17, 2024

Hi @rachel1898, I got the same error message when I was running SAMSA2 and was able to fix it, so I might be able to help. Did you make sure all of your filenames start with either control_ or experimental_ before starting the pipeline? Could you post a picture of your step_2_output/raw_counts.txt file?
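A quick pre-flight check along these lines can flag misnamed files before the pipeline is started. It is only a sketch: check_sample_prefixes is a hypothetical helper, not part of SAMSA2, and it assumes the raw input files are passed to it as arguments (for example input_files/*, the directory used in the reproduction later in this thread).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of SAMSA2): report every input file
# whose basename does not begin with "control_" or "experimental_".
# Returns 1 if any file is reported, 0 otherwise.
check_sample_prefixes() {
  local status=0 path name
  for path in "$@"; do
    name=${path##*/}                  # strip any leading directory
    case "$name" in
      control_*|experimental_*) ;;    # expected prefix, nothing to report
      *) printf 'unexpected prefix: %s\n' "$name"; status=1 ;;
    esac
  done
  return "$status"
}
```

Running check_sample_prefixes input_files/* prints each offending name and returns a non-zero status if any file lacks one of the two prefixes.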

lisakmalins commented on July 17, 2024

Question for the developers: was SAMSA2 written expecting the sample names to be strictly numeric? I ran into two errors in run_DESeq_stats.R that I was able to fix by modifying the script. First, I got the same error that @rachel1898 posted above, because my sample names were not numeric, so this operation (line 134) replaced them all with NA:

raw_counts_table$X2 <- as.numeric(as.character(raw_counts_table$X2))

When those NA values were then used as column names, rbind (line 145) threw that error, because it requires the column names of the two data frames to match.

Later in the script, I ran into another issue: my count values were being turned into factors, which introduced more NAs. I think this might also have been caused by my sample names not being numeric. I was able to fix the problem by modifying a few lines of the script, but I'm wondering whether the real issue is that I misunderstood the usage instructions and should have renamed my files before running the pipeline to avoid these problems.

Can you provide some guidance on what characters are allowed to be in the input filenames?

transcript commented on July 17, 2024

Hello,

No, SAMSA2 wasn't written to explicitly expect numeric sample names (although I believe that whitespace characters can sometimes mess things up).
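Since whitespace is mentioned as a possible source of trouble, a similar sketch can list any input filenames that contain spaces or tabs. check_no_whitespace is a hypothetical helper, not part of SAMSA2; it only inspects the names it is given and does not rename anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SAMSA2): print every basename that contains
# whitespace. Returns 1 if any such name is found, 0 otherwise.
check_no_whitespace() {
  local status=0 path name
  for path in "$@"; do
    name=${path##*/}                  # strip any leading directory
    case "$name" in
      *[[:space:]]*) printf '%s\n' "$name"; status=1 ;;
    esac
  done
  return "$status"
}
```

For example, check_no_whitespace input_files/* lists names such as "control 1_TINY_R1.fastq" that would be worth renaming before a run.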

I suspect what's happening is that, if the samples are too different, the rbind command leads to a bunch of additional rows because it can't find any common rows to merge on.

One useful check: look at the head of a couple of your files and see if they match the example files in https://github.com/transcript/samsa2/blob/master/sample_files_paired-end/6_RefSeq_org_results/control_1_TINY.RefSeq_annot_organism.tsv or similar.
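One way to make that comparison less manual is to count the tab-separated fields on the first line of each file and compare the result with the count for the example file. This is a rough sketch that assumes the results files are tab-separated, as the .tsv example suggests; first_line_field_counts is a hypothetical helper, and a matching count does not by itself guarantee the contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each file, print "<file><TAB><number of
# tab-separated fields on the first line>". Only the first line is read.
first_line_field_counts() {
  local file
  for file in "$@"; do
    printf '%s\t%s\n' "$file" "$(head -n 1 "$file" | awk -F'\t' '{ print NF }')"
  done
}
```

Passing one of your files together with control_1_TINY.RefSeq_annot_organism.tsv shows at a glance whether the first lines have the same number of columns.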

Another option is to run this in RStudio and see if any of the intermediate tables look invalid.

Finally, if none of this is able to yield results or if you want me to look at one or two of the input files to see if I spot any inconsistencies, you could drop me an email ([email protected]) with one or two attached.

Sorry to hear you're having issues with my pipeline, and I hope I can resolve them!

lisakmalins commented on July 17, 2024

Hi @transcript, thank you for the thoughtful response!

I have a reproducible example of the behavior that @rachel1898 and I observed:

git clone https://github.com/transcript/samsa2.git
cd samsa2
# Use the sample files, but remove an underscore
# control_1_TINY_R1.fastq --> control_1TINY_R1.fastq
cp -r sample_files_paired-end/1_starting_files input_files
for f in input_files/*; do mv "$f" "${f/_TINY/TINY}"; done
bash setup_and_test/package_installation.bash
bash setup_and_test/full_database_download.bash
bash bash_scripts/master_script.sh
# run_DESeq_stats.R fails

Error message:

[1] "USAGE: $ run_DESeq_stats.R -I working_directory/ -O save.filename"
Working directory is  /redacted/path/to/folder/samsa2/output_files/step_5_output/RefSeq_results/org_results
Error in match.names(clabs, names(xi)) :
  names do not match previous names
Calls: rbind ... eval -> eval -> eval -> rbind -> rbind -> match.names
In addition: Warning message:
NAs introduced by coercion
Execution halted
'Rscript /redacted/path/to/folder/samsa2/R_scripts/run_DESeq_stats.R -I /redacted/path/to/folder/samsa2/output_files/step_5_output/RefSeq_results/org_results -O RefSeq_org_DESeq_results.tab -R /redacted/path/to/folder/samsa2/output_files/step_2_output/raw_counts.txt' exited with non-zero status 1

The problem is that, to parse information out of the filenames, run_DESeq_stats.R splits each name on underscores into fields, expects the second field to be numeric, and, after transposing, uses that field as the column names for rbind.

# Using example data without messing with filenames
                                   V1   V2           X1 X2                   X3
1      control_1_TINY.cleaned.forward 2719      control  1 TINY.cleaned.forward
2      control_2_TINY.cleaned.forward 2695      control  2 TINY.cleaned.forward
3 experimental_3_TINY.cleaned.forward 2682 experimental  3 TINY.cleaned.forward
4 experimental_4_TINY.cleaned.forward 2684 experimental  4 TINY.cleaned.forward

However, if the filenames do not follow that pattern and the second field is not numeric, NAs are introduced:

# Example with second underscore removed from filenames
                                  V1   V2
1      control_1TINY.cleaned.forward 2719
2      control_2TINY.cleaned.forward 2695
3 experimental_3TINY.cleaned.forward 2682
4 experimental_4TINY.cleaned.forward 2684

# Split on underscore
                                  V1   V2           X1                    X2
1      control_1TINY.cleaned.forward 2719      control 1TINY.cleaned.forward
2      control_2TINY.cleaned.forward 2695      control 2TINY.cleaned.forward
3 experimental_3TINY.cleaned.forward 2682 experimental 3TINY.cleaned.forward
4 experimental_4TINY.cleaned.forward 2684 experimental 4TINY.cleaned.forward

# Coercing column X2 to numeric introduces NAs
# and prevents rbind with the complete_table data frame
                                  V1   V2           X1 X2
1      control_1TINY.cleaned.forward 2719      control NA
2      control_2TINY.cleaned.forward 2695      control NA
3 experimental_3TINY.cleaned.forward 2682 experimental NA
4 experimental_4TINY.cleaned.forward 2684 experimental NA
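The parsing described above can be mimicked in the shell to check names before running the pipeline. This is a sketch under stated assumptions: check_second_field_numeric is a hypothetical helper, not part of SAMSA2 or run_DESeq_stats.R, and it accepts only plain digit strings as the second underscore-separated field, which is stricter than R's as.numeric() (that would also accept values such as 1.5).

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of SAMSA2): take the second "_"-separated field
# of each basename and report it if it is not a plain digit string, since a
# non-numeric value there would become NA after numeric coercion.
# Returns 1 if any name is reported, 0 otherwise.
check_second_field_numeric() {
  local status=0 name rest second
  for name in "$@"; do
    name=${name##*/}                  # strip any leading directory
    rest=${name#*_}                   # drop the first field (e.g. "control")
    second=${rest%%_*}                # keep text up to the next underscore
    case "$second" in
      ''|*[!0-9]*)
        printf "non-numeric second field '%s': %s\n" "$second" "$name"
        status=1 ;;
    esac
  done
  return "$status"
}
```

With the renamed files above, it flags control_1TINY.cleaned.forward because its second field is 1TINY.cleaned.forward, while the unmodified example names such as control_1_TINY.cleaned.forward pass.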

I have a fix that allows more flexibility in the filenames. Are you open to pull requests?

transcript commented on July 17, 2024

Fix by @lisakmalins added!
