clbutler / rm_trips
RepeatMasker Trinity based Parse Script
Hi there!
Amazing job done with RM_TRIPS! I want to apply it to some TE annotations on chromosome-level assemblies. Should I turn off the isoform filtering (step 3)?
Hi Christopher,
Thanks for sharing your script to parse RM output. I just used it as part of the FastTE pipeline - very useful!
I went through the R script more or less line by line today because I wanted to change the output format slightly, and I noticed a bug. I traced it back to the way the RM output is read in on line 26. Basically, if a line does not start with whitespace before the first column (sw_score), all of the columns of that line are shifted to the left and the resulting data frame is incorrect. I found a fix that gets around this issue, see below.
Lines in raw RM output:
27245 0.0 0.0 1.7 Chr1 20619926 20623101 (9804570) + TE_00000209 __ClassII_DNA_CACTA_nMITE 2683 5805 (1708) 7595
625 2.4 0.0 2.4 Chr1 20623765 20623848 (9803823) + TE_00000209 __ClassII_DNA_CACTA_nMITE 5806 5887 (1626) 7596
1409 0.6 0.0 0.0 Chr1 20624203 20624367 (9803304) + TE_00000209 __ClassII_DNA_CACTA_nMITE 5878 6042 (1471) 7597
Original representation in the R data frame (first row should start with 27245, but all columns are shifted to the left):
sw_score perc_div perc_del perc_insert qry_id qry_start qry_end qry_left matching_repeat repeat_id matching_class no_bp_in_complement in_repeat_start in_repeat_end
0 0 1.7 Chr1 20619926 20623101 -9804570 + TE_00000209 __ClassII_DNA_CACTA_nMITE 2683 5805 -1708 7595
625 2.4 0 2.4 Chr1 20623765 20623848 -9803823 + TE_00000209 __ClassII_DNA_CACTA_nMITE 5806 5887 -1626
1409 0.6 0 0 Chr1 20624203 20624367 -9803304 + TE_00000209 __ClassII_DNA_CACTA_nMITE 5878 6042 -1471
R data frame after fixing:
sw_score perc_div perc_del perc_insert qry_id qry_start qry_end qry_left matching_repeat repeat_id matching_class no_bp_in_complement in_repeat_start in_repeat_end
27245 0 0 1.7 Chr1 20619926 20623101 -9804570 + TE_00000209 __ClassII_DNA_CACTA_nMITE 2683 5805 -1708
625 2.4 0 2.4 Chr1 20623765 20623848 -9803823 + TE_00000209 __ClassII_DNA_CACTA_nMITE 5806 5887 -1626
1409 0.6 0 0 Chr1 20624203 20624367 -9803304 + TE_00000209 __ClassII_DNA_CACTA_nMITE 5878 6042 -1471
This is how I fixed it (replace line 26 with these two lines of code):
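# Strip leading and trailing whitespace so every line starts with sw_score
str.res <- unlist(stringr::str_trim(x))
# Split the trimmed line into fields on runs of whitespace
str.res <- unlist(stringr::str_split(str.res, "\\s+"))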
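To illustrate why trimming first matters, here is a minimal sketch in base R, with `trimws()` and `strsplit()` standing in for the stringr calls above so that it runs without extra packages. The two leading spaces on the second sample line are an assumption (RepeatMasker right-aligns the score column, and that padding is not visible in the lines pasted above), and the helper names `split_only` and `trim_then_split` are hypothetical, not taken from the script.

```r
# Two raw RepeatMasker lines. The padding before "625" is assumed:
# short scores are right-aligned, long scores start at column 1.
raw <- c(
  "27245 0.0 0.0 1.7 Chr1 20619926 20623101 (9804570) + TE_00000209 __ClassII_DNA_CACTA_nMITE 2683 5805 (1708) 7595",
  "  625 2.4 0.0 2.4 Chr1 20623765 20623848 (9803823) + TE_00000209 __ClassII_DNA_CACTA_nMITE 5806 5887 (1626) 7596"
)

# Hypothetical helper: split on whitespace without trimming.
# A padded line yields an empty first field, an unpadded line does not.
split_only <- function(x) strsplit(x, "\\s+")[[1]]

# Hypothetical helper: trim first, then split (same idea as the fix above).
trim_then_split <- function(x) strsplit(trimws(x), "\\s+")[[1]]

naive <- lapply(raw, split_only)
fixed <- lapply(raw, trim_then_split)

# Field counts differ between lines without trimming, which is what
# misaligns the columns once a fixed position is assumed for each field.
print(lengths(naive))
print(lengths(fixed))

# After trimming, the first field is the score on every line.
print(vapply(fixed, function(f) f[1], character(1)))
```

With trimming, every line produces the same number of fields, so any code that picks fields by position sees sw_score first regardless of how the line was padded.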
Hope this is useful to others as well.
Best wishes,
Jasmin
The following code gives this error:
Error in rename(test2, lowextremety = qry_start) :
object 'qry_start' not found
It seems this is because the extremety headings are removed using select, and there are also still headings named "qry_start", etc. I am curious if the rename should be the other way around or if there is something else going on here. I am a little confused as to what this precise bit of code does, and was wondering if it could just be removed? Thanks for any help.
```r
#####test2#####
# next snippet of code keeps merged elements that should have been merged separate
test2 <- test2 %>% dplyr::select(-c('lowextremety', 'highextremety', 'mergedfraglength'))
test2 <- rename(test2, lowextremety = qry_start)
test2 <- rename(test2, highextremety = qry_end)
test2 <- rename(test2, mergedfraglength = qry_width)
```
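For context, dplyr's `rename()` takes arguments of the form `new_name = old_name`, so the failing call expects a column named `qry_start` to already exist in `test2`. The sketch below is one possible way to check for that before renaming. It is written in base R, uses a toy data frame with invented values, and the helper `rename_if_present` is hypothetical. It is a diagnostic aid, not the script's intended logic.

```r
# Toy stand-in for test2 (values invented): the extremety columns exist,
# but there are no qry_start / qry_end / qry_width columns.
test2 <- data.frame(
  lowextremety = c(100L, 500L),
  highextremety = c(200L, 650L),
  mergedfraglength = c(101L, 151L)
)

# Hypothetical helper: rename `old` to `new` only when `old` is present,
# otherwise report the missing column and return the data unchanged.
rename_if_present <- function(df, new, old) {
  if (old %in% names(df)) {
    names(df)[names(df) == old] <- new
  } else {
    message("column '", old, "' not found; nothing renamed to '", new, "'")
  }
  df
}

# qry_start is absent here, so the data frame is returned as-is.
test2 <- rename_if_present(test2, "lowextremety", "qry_start")
print(names(test2))

# When the old column does exist, the rename goes through.
demo <- rename_if_present(data.frame(qry_start = 1L), "lowextremety", "qry_start")
print(names(demo))
```

Running `names(test2)` right before the failing `rename()` call in the real script would show whether the `qry_*` columns survive to that point.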