
rivm-bioinformatics / sars2seq


SARS2seq is a pipeline designed to process raw FastQ data from targeted SARS-CoV-2 sequencing and generate biologically correct consensus sequences of the SARS-CoV-2 genome.

Home Page: https://rivm-bioinformatics.github.io/SARS2seq/

License: GNU Affero General Public License v3.0

Python 97.83% Shell 2.17%
bioinformatics consensus-sequences coronavirus coronavirus-analysis ngs-analysis public-health python rivm snakemake snakemake-workflow virology viruses

sars2seq's People

Contributors

florianzwagemaker, github-actions[bot], ids-bioinformatics, khajji


sars2seq's Issues

Unique amplicon coverages overview does not properly take primer start and stopping positions into account

As reported by @landmanf
The current scripts for retrieving the unique amplicon coverages do not properly handle the starting and stopping positions of said amplicons.

Currently, the ending position of amplicon 1 is immediately the starting position of amplicon 2. However, there should be a "void" space in between those two amplicons as this is the space where both amplicons overlap and therefore this area shouldn't be counted for either amplicon.

The issue can probably be found in the snippet below:

import pandas as pd


def Find_NonOverlap(df):
    # One dict per amplicon, in the order of the input dataframe
    dd = df.to_dict(orient="records")
    startingpoint = {}
    endingpoint = {}
    lastindex = list(enumerate(dd))[-1][0]
    for x, v in enumerate(dd):
        s = v.get("leftstop")
        t_end = v.get("rightstart")
        # "leftstop" of the next amplicon, if there is one
        if x != lastindex:
            end_override = dd[x + 1].get("leftstop")
        else:
            end_override = None
        if end_override is not None:
            if end_override in range(s, t_end):
                # The next amplicon starts inside this one: cut the end short
                primerstart = s
                primerend = end_override
            else:
                primerstart = s
                primerend = t_end
        else:
            primerstart = s
            primerend = t_end
        # Note: the start is never adjusted for overlap with the previous amplicon
        startingpoint[primerstart] = v.get("name")
        endingpoint[primerend] = v.get("name")
    startdf = (
        pd.DataFrame.from_dict(startingpoint, orient="index")
        .reset_index()
        .rename(columns={0: "name", "index": "unique_start"})
    )
    enddf = (
        pd.DataFrame.from_dict(endingpoint, orient="index")
        .reset_index()
        .rename(columns={0: "name", "index": "unique_end"})
    )
    df = pd.merge(df, startdf, on="name", how="inner")
    df = pd.merge(df, enddf, on="name", how="inner")
    return df
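
For illustration only, here is a minimal sketch of how the "void" between two overlapping amplicons could be left out of both of them. It is not the pipeline's actual fix. The function name unique_regions is hypothetical, the keys "name", "leftstop" and "rightstart" are taken from the snippet above, and the sketch assumes that only neighbouring amplicons overlap. The coordinates in the example are made up.

```python
def unique_regions(amplicons):
    """Add unique_start/unique_end to each amplicon record.

    `amplicons` is a list of dicts with the keys "name", "leftstop" and
    "rightstart", e.g. the output of df.to_dict(orient="records").
    """
    # Work in genomic order so that neighbours in the list are neighbours on the genome
    ordered = sorted(amplicons, key=lambda a: a["leftstop"])
    result = []
    for i, amp in enumerate(ordered):
        start, end = amp["leftstop"], amp["rightstart"]
        if i > 0:
            # Skip the part shared with the previous amplicon
            start = max(start, ordered[i - 1]["rightstart"])
        if i < len(ordered) - 1:
            # Stop where the next amplicon begins
            end = min(end, ordered[i + 1]["leftstop"])
        result.append({**amp, "unique_start": start, "unique_end": end})
    return result


# Hypothetical coordinates: amplicon_1 and amplicon_2 overlap between 342 and 385
example = [
    {"name": "amplicon_1", "leftstop": 54, "rightstart": 385},
    {"name": "amplicon_2", "leftstop": 342, "rightstart": 704},
    {"name": "amplicon_3", "leftstop": 664, "rightstart": 1004},
]
for row in unique_regions(example):
    print(row["name"], row["unique_start"], row["unique_end"])
```

With these example values, amplicon_1 ends at 342 and amplicon_2 only starts at 385, so the overlapping stretch is counted for neither amplicon.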

This issue is for logging purposes.

Don't include a sample if the sample name has a space

Rewrite the regex rules so that the pipeline does not run a sample whose name contains whitespace:

Valid filename:
Example_data.fastq.gz

Invalid filename:
Example_data .fastq.gz (note the space between the sample name and the file extension)

Change the regex to something like the following for nanopore data: ([ ]*)([\S]*)\.f(ast)?q(\.gz)?

  • Group 1 matches only if there's a space in the sample name; it is an empty group if all is well.
  • Group 2 matches the actual sample name as long as there's no space in it; it is an empty group if there's a space.
  • Groups 3 & 4 match the file extension(s).

Using these groups allows us to leave a sample out of the samplesheet if it has a space in its name.
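
As a rough sketch of the idea, and not the pipeline's actual rule, the check could look like the following. It uses a simplified variant of the pattern above: a single capture group for the sample name, applied with re.fullmatch so that any whitespace in the name makes the whole match fail. The names SAMPLE_PATTERN and sample_name are hypothetical.

```python
import re

# Simplified variant of the proposed pattern: the sample name must consist
# of non-whitespace characters only, followed by .fq or .fastq, optionally gzipped
SAMPLE_PATTERN = re.compile(r"(\S+)\.f(?:ast)?q(?:\.gz)?")


def sample_name(filename):
    """Return the sample name, or None if the file should be skipped."""
    match = SAMPLE_PATTERN.fullmatch(filename)
    return match.group(1) if match else None


for fname in ["Example_data.fastq.gz", "Example_data .fastq.gz"]:
    print(repr(fname), "->", sample_name(fname))
```

A file for which sample_name returns None would simply not be added to the samplesheet.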
