Comments (5)
Hi @rachel1898, I got the same error message when I was running SAMSA2 and I was able to fix it, so I might be able to help. Did you make sure your filenames all start with control_ or experimental_ before starting the pipeline? Could you post a picture of your step_2_output/raw_counts.txt file?
from samsa2.
Question for the developers: Was SAMSA2 written expecting the sample names to be strictly numeric? I ran into two errors in run_DESeq_stats.R which I was able to fix by modifying the script. First I got the same error that @rachel1898 posted above because my sample names were not numeric, so this operation (line 134) replaced them all with NA:
raw_counts_table$X2 <- as.numeric(as.character(raw_counts_table$X2))
So then when those values got used as column names, it made rbind (line 145) throw that error because it needs the column names to be the same between the two dataframes.
Later in the script I had another issue with my count values being turned into factors and introducing more NAs, which I think also might have happened because of my sample names not being numeric. I was able to fix the problem by modifying a few lines of the script, but I was wondering if maybe the real issue was that I misunderstood the usage instructions and should have changed my filenames before running the pipeline to avoid any problems.
Can you provide some guidance on what characters are allowed to be in the input filenames?
Hello,
No, SAMSA2 wasn't written to explicitly expect numeric samples (although I believe that whitespace characters can sometimes mess things up).
I suspect what's happening is that, if the samples are too different, the rbind command leads to a bunch of additional rows because it can't find any common rows to merge on.
One useful check: look at the head of a couple of your files and see if they match the example files in https://github.com/transcript/samsa2/blob/master/sample_files_paired-end/6_RefSeq_org_results/control_1_TINY.RefSeq_annot_organism.tsv or similar.
Another option is to run this in RStudio and see if any of the intermediate tables look invalid.
Finally, if none of this is able to yield results or if you want me to look at one or two of the input files to see if I spot any inconsistencies, you could drop me an email ([email protected]) with one or two attached.
Sorry to hear you're having issues with my pipeline, and I hope I can resolve them!
Hi @transcript, thank you for the thoughtful response!
I have a reproducible example of the behavior that @rachel1898 and I observed:
git clone https://github.com/transcript/samsa2.git
cd samsa2
# Use the sample files, but remove an underscore
# control_1_TINY_R1.fastq --> control_1TINY_R1.fastq
cp -r sample_files_paired-end/1_starting_files input_files
for f in input_files/*; do mv "$f" "${f/_TINY/TINY}"; done
bash setup_and_test/package_installation.bash
bash setup_and_test/full_database_download.bash
bash bash_scripts/master_script.sh
# run_DESeq_stats.R fails
Error message:
[1] "USAGE: $ run_DESeq_stats.R -I working_directory/ -O save.filename"
Working directory is /redacted/path/to/folder/samsa2/output_files/step_5_output/RefSeq_results/org_results
Error in match.names(clabs, names(xi)) :
names do not match previous names
Calls: rbind ... eval -> eval -> eval -> rbind -> rbind -> match.names
In addition: Warning message:
NAs introduced by coercion
Execution halted
'Rscript /redacted/path/to/folder/samsa2/R_scripts/run_DESeq_stats.R -I /redacted/path/to/folder/samsa2/output_files/step_5_output/RefSeq_results/org_results -O RefSeq_org_DESeq_results.tab -R /redacted/path/to/folder/samsa2/output_files/step_2_output/raw_counts.txt' exited with non-zero status 1
The problem is that in order to parse the information out of the filenames, run_DESeq_stats.R splits them by underscore into fields, expects the second field to be numeric, and after transposing uses it as column names for rbind.
# Using example data without messing with filenames
V1 V2 X1 X2 X3
1 control_1_TINY.cleaned.forward 2719 control 1 TINY.cleaned.forward
2 control_2_TINY.cleaned.forward 2695 control 2 TINY.cleaned.forward
3 experimental_3_TINY.cleaned.forward 2682 experimental 3 TINY.cleaned.forward
4 experimental_4_TINY.cleaned.forward 2684 experimental 4 TINY.cleaned.forward
However, if the filenames do not follow that pattern and the second field is not numeric, NAs are introduced:
# Example with second underscore removed from filenames
V1 V2
1 control_1TINY.cleaned.forward 2719
2 control_2TINY.cleaned.forward 2695
3 experimental_3TINY.cleaned.forward 2682
4 experimental_4TINY.cleaned.forward 2684
# Split on underscore
V1 V2 X1 X2
1 control_1TINY.cleaned.forward 2719 control 1TINY.cleaned.forward
2 control_2TINY.cleaned.forward 2695 control 2TINY.cleaned.forward
3 experimental_3TINY.cleaned.forward 2682 experimental 3TINY.cleaned.forward
4 experimental_4TINY.cleaned.forward 2684 experimental 4TINY.cleaned.forward
# Coercing column X2 to numeric induces NA
# and prevents rbind with complete_table dataframe
V1 V2 X1 X2
1 control_1TINY.cleaned.forward 2719 control NA
2 control_2TINY.cleaned.forward 2695 control NA
3 experimental_3TINY.cleaned.forward 2682 experimental NA
4 experimental_4TINY.cleaned.forward 2684 experimental NA
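The filename requirement described above can be checked before starting the pipeline. Below is a minimal sketch in POSIX shell, not part of SAMSA2: the function names second_field_is_numeric and check_names are hypothetical, and the check assumes the second underscore-separated field should be a plain integer, which is stricter than what R's as.numeric accepts.

```shell
#!/bin/sh
# Hypothetical helper (not part of SAMSA2): succeeds only if the second
# underscore-separated field of a sample name consists solely of digits.
second_field_is_numeric() {
    name=$1
    # A name without any underscore has no second field.
    case $name in
        *_*) ;;
        *) return 1 ;;
    esac
    rest=${name#*_}      # drop the first field and its underscore
    field=${rest%%_*}    # keep everything up to the next underscore
    case $field in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *) return 0 ;;
    esac
}

# Hypothetical wrapper: print every path whose basename would fail the
# check, and return non-zero if at least one did.
check_names() {
    status=0
    for f in "$@"; do
        base=${f##*/}    # strip any leading directory
        if ! second_field_is_numeric "$base"; then
            echo "second field not numeric: $base"
            status=1
        fi
    done
    return $status
}
```

After sourcing these definitions, running check_names input_files/* on the renamed files from the reproducible example would flag names such as control_1TINY_R1.fastq, while the unmodified sample files would pass.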
I have a fix that allows more flexibility in the filenames. Are you open to pull requests?
Fix by @lisakmalins added!