pneumokity's People

Contributors

antunderwood, carmensheppard, dependabot[bot]

pneumokity's Issues

Median multiplicity as quality metric

Look at using median multiplicity (MM) as a measure of mixtures in mixed samples. PneumoCaT2 is more sensitive to mixtures and the MM cut-off is set quite low; it could be set higher to reduce the number of mixtures found. Alternatively, both serotypes could potentially be called fully.

E.g. the 11D type strain appears mixed in both runs, 709662 and 917067. It is not called in PneumoCaT1 but appears in the coverage summary. The median multiplicity of the second type (15C) is only 5, but its hit hashes figure is very high.

The median multiplicity of the top hits should probably be reported alongside hit % as a quality metric.
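A minimal sketch of how median multiplicity could be reported next to hit %, assuming the standard `mash screen` output columns (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment) and assuming that "hit %" means the shared-hashes fraction. The function names, the thresholds and the example rows are invented for illustration and are not taken from PneumoKITy.

```python
import csv
import io
from typing import NamedTuple


class ScreenHit(NamedTuple):
    reference: str
    percent_hashes: float
    median_multiplicity: float


def parse_mash_screen(tsv_text, min_percent=90.0):
    """Return hits with hit % >= min_percent, best hit first."""
    hits = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        shared, total = (int(x) for x in row[1].split("/"))
        percent = 100.0 * shared / total
        if percent >= min_percent:
            hits.append(ScreenHit(row[4], percent, float(row[2])))
    return sorted(hits, key=lambda h: h.percent_hashes, reverse=True)


def flag_low_multiplicity(hits, min_multiplicity=10):
    """Names of hits whose median multiplicity is below the cut-off."""
    return [h.reference for h in hits if h.median_multiplicity < min_multiplicity]


# Invented example rows (not real run data).
EXAMPLE = (
    "0.999\t998/1000\t60\t0\t11D.fasta\t\n"
    "0.997\t950/1000\t5\t0\t15C.fasta\t\n"
    "0.800\t100/1000\t2\t0\t01.fasta\t\n"
)

if __name__ == "__main__":
    for hit in parse_mash_screen(EXAMPLE):
        print(hit.reference, f"{hit.percent_hashes:.1f}%", hit.median_multiplicity)
```

Reporting both values lets a reviewer see at a glance when a second hit has a high hit % but low read depth, as in the 11D/15C case described above.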

18F reference sequence failing with <70% hit RED RAG status

The reference sequence for 18F from the PneumoCaT ENA set of sequences is failing with RED RAG status (<70% hit). Investigate the references in the DB.

Check whether this happens with other 18F isolates (if available), but given that the reference sequence should be from the very organism it is tested against, it should match 100%.

List index out of range interpreting serotype 39 result

Analysing mash screen tsv for 709775 serotype 39 type strain:

Analysing mash screen output.
Traceback (most recent call last):
File "/home/carmen.sheppard/gitrepository/PneumoCaT2/pneumocat2.py", line 80, in <module>
main(args, version)
File "/home/carmen.sheppard/gitrepository/PneumoCaT2/pneumocat2.py", line 38, in main
run_parse(analysis, tsvfile)
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/PneumoCaT2/run_scripts/run_stage1.py", line 160, in run_parse
group_check(filtered_df, analysis.database)
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/PneumoCaT2/run_scripts/run_stage1.py", line 92, in group_check
stage1_result = list(get_pheno_list(results, session))
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/PneumoCaT2/run_scripts/run_stage1.py", line 34, in get_pheno_list
out_res.append(pheno[0][0])
IndexError: list index out of range
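The traceback suggests that a lookup returned no rows before `pheno[0][0]` was indexed. A hedged sketch of one way to guard that step follows. The internals of `get_pheno_list` are not shown in this issue, so `lookup`, `first_phenotypes` and `MissingPhenotypeError` are hypothetical stand-ins and not the project's code.

```python
class MissingPhenotypeError(LookupError):
    """Raised when a hit has no matching phenotype record (hypothetical)."""


def first_phenotypes(hits, lookup):
    """Return the first phenotype value for each hit.

    `lookup` is any callable that returns a list of row tuples for a hit,
    standing in for the database query made in the real code.
    """
    out = []
    for hit in hits:
        rows = lookup(hit)
        if not rows:
            # Fail with a message that names the hit instead of an IndexError.
            raise MissingPhenotypeError(
                f"no phenotype record found for mash hit {hit!r}"
            )
        out.append(rows[0][0])
    return out


if __name__ == "__main__":
    table = {"39": [("39",)]}  # invented example records
    print(first_phenotypes(["39"], lambda h: table.get(h, [])))
```

Raising a named error keeps the failure visible while pointing at the reference that is missing from the database.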

Check Mixed serotype determination - for hit and group in mix.

Missed mixed serotype determination for one sample with top hits 17F 99.7% and 15B 94.5% (ID 300858). Check whether this is because stage 2 is missing for 15BC, or because the result is not handled correctly, i.e. stage 2 determination is run when it should not be, or it overrides the stage 1 result.

Filename checking for read 1 and read 2

Implement filename checking using a glob pattern or regex (user-supplied if different from the standard) to check that read 1 and read 2 are both present and to store them correctly in the file (if -i is used). Possibly also remove the matched pattern from the sample ID.
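A minimal sketch of the check described above, using only the standard library. The default patterns (`*_R1*.fastq.gz`, `*_R2*.fastq.gz`), the function names and the `_R1` marker are assumptions chosen for illustration; the real naming standard is not stated in this issue.

```python
import glob
import os


def find_read_pair(input_dir,
                   r1_pattern="*_R1*.fastq.gz",
                   r2_pattern="*_R2*.fastq.gz"):
    """Return (read1, read2) paths, or raise if either is missing or ambiguous."""
    r1 = sorted(glob.glob(os.path.join(input_dir, r1_pattern)))
    r2 = sorted(glob.glob(os.path.join(input_dir, r2_pattern)))
    if len(r1) != 1 or len(r2) != 1:
        raise FileNotFoundError(
            f"expected exactly one read 1 and one read 2 file in {input_dir}, "
            f"found {len(r1)} and {len(r2)}"
        )
    return r1[0], r2[0]


def sample_id_from_read1(read1_path, marker="_R1"):
    """Derive a sample ID by dropping everything from the read marker onwards."""
    return os.path.basename(read1_path).split(marker)[0]
```

Accepting the patterns as parameters lets a user override them when files do not follow the default naming.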

Add sample ID to stdout

Errors can give no indication of which sample failed, e.g.:

Running PneumoCaT 2.0a - Development
Reference CTV.db database at /phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/ctvdb selected.
WARNING: Existing files in output dir will be overwritten
Used Mash version 2.2
Analysing mash screen output.
6A_6B_6C_6D
Screen reference: /phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/ctvdb/6A_6B_6C_6D/wciN.msh
Analysing mash screen output.
Allele wciN unrecognised possible variant
Traceback (most recent call last):
File "/home/carmen.sheppard/gitrepository/pneumocat2/pneumocat2.py", line 79, in <module>
main(args, version)
File "/home/carmen.sheppard/gitrepository/pneumocat2/pneumocat2.py", line 49, in main
start_analysis(analysis)
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/run_scripts/run_stage2.py", line 39, in start_analysis
sort_genes(gene, analysis, gene.var_type, session)
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/run_scripts/screen_genes.py", line 40, in sort_genes
stage2_var = get_variant_ids(hit_genes, allele_or_gene, analysis.grp_id, session)[0]
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/run_scripts/utilities.py", line 231, in get_variant_ids
raise CtvdbError
exceptions.CtvdbError: CtvdbError: check CTV.db and folder integrity, missing or mismatching information may be present.

Add sample info to stdout early in run
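One possible shape for this, sketched with the standard library only: print the sample ID before any analysis starts, and repeat it if the run fails. `run_with_sample_context` and its parameters are hypothetical names, not functions from the project.

```python
import sys


def run_with_sample_context(sample_id, analyse, stream=sys.stdout):
    """Print the sample ID up front, then run `analyse()`.

    If `analyse` raises, the sample ID is printed again next to the error
    type and message before the exception is re-raised unchanged.
    """
    print(f"Sample: {sample_id}", file=stream, flush=True)
    try:
        return analyse()
    except Exception as err:
        print(
            f"ERROR in sample {sample_id}: {type(err).__name__}: {err}",
            file=stream,
            flush=True,
        )
        raise
```

Because the exception is re-raised, the original traceback is still shown; the extra lines only tie it to a sample.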

df.append() deprecated

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

-> Update to the concat method. The warning does not crash the run but produces annoying screen output.
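The change the warning asks for looks roughly like this. The column names and values are invented; only `pandas.concat` with `ignore_index=True` is the point of the sketch.

```python
import pandas as pd

# Invented example frames standing in for accumulated results and a new row.
results = pd.DataFrame([{"sample": "s1", "hit": "19F"}])
new_row = pd.DataFrame([{"sample": "s2", "hit": "06A"}])

# Deprecated form that triggers the FutureWarning:
#   results = results.append(new_row, ignore_index=True)
# Replacement:
results = pd.concat([results, new_row], ignore_index=True)
```

When many rows are added in a loop, collecting them in a list and calling `pd.concat` once at the end avoids repeated copying.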

singularity image

Hello, I am just checking whether you are going to provide a pneumokity Singularity image. Thanks.

Error writing empty dataframe

Some samples fail when writing output files

Sample: 308597_RR1600088809-1
Traceback (most recent call last):
File "/home/carmen.sheppard/gitrepository/PneumoCaT2/pneumocat2.py", line 80, in <module>
main(args, version)
File "/home/carmen.sheppard/gitrepository/PneumoCaT2/pneumocat2.py", line 67, in main
handle_results(analysis)
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/PneumoCaT2/run_scripts/tools.py", line 314, in handle_results
collate_results(analysis.csv_collate, results)
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/PneumoCaT2/run_scripts/tools.py", line 292, in collate_results
df = pd.read_csv(collate_file)
File "/home/carmen.sheppard/.conda/envs/pneumocat2/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/carmen.sheppard/.conda/envs/pneumocat2/lib/python3.7/site-packages/pandas/io/parsers.py", line 457, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/carmen.sheppard/.conda/envs/pneumocat2/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/home/carmen.sheppard/.conda/envs/pneumocat2/lib/python3.7/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/carmen.sheppard/.conda/envs/pneumocat2/lib/python3.7/site-packages/pandas/io/parsers.py", line 1917, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

This occurs in handle_results, the last part of the script.
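The `EmptyDataError` is raised by `pd.read_csv` when the file has no columns to parse. A hedged sketch of a tolerant read follows; `read_collate_file` and its `columns` argument are hypothetical and do not reflect how `collate_results` is actually written.

```python
import os

import pandas as pd


def read_collate_file(path, columns):
    """Read the collate CSV, or return an empty frame with the expected columns.

    Covers a missing file, a zero-byte file, and a file pandas cannot
    find any columns in.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return pd.DataFrame(columns=columns)
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError:
        return pd.DataFrame(columns=columns)
```

Returning an empty frame with the expected columns lets the caller append the current sample's results and write the file as usual.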

Remove text from "predicted phenotype" field for group level determination

Remove the extra text from the predicted phenotype for those samples that are determined only to group level. It will just cause headaches for downstream analysis.

E.g.: Serotype within 35A_35C_42 unable to determine serotype using PneumoKITy due to requirement for sensitive sequence analysis.

Report these in a form similar to the others, e.g. 24F/B or 12A/12B/44/46.
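As a rough illustration, an underscore-separated group name such as the one in the message above could be turned into the slash-separated form. `group_label` is a hypothetical helper; it does not attempt the abbreviated style seen in 24F/B.

```python
def group_label(group_name):
    """Convert an underscore-separated group name to a slash-separated label."""
    return "/".join(part for part in group_name.split("_") if part)


if __name__ == "__main__":
    print(group_label("35A_35C_42"))  # 35A/35C/42
```

A plain label like this can be split or matched downstream without stripping the explanatory sentence first.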

"mixed serotypes" for 19F and 32F/A when not mixed

Some serotypes give "mixed serotypes" when they are not mixed. This is probably because more than one sequence is detected at stage 1 (as expected) and the sample is then resolved later to a single type rather than a mixture.

False CtvdbError

An unrecognised wciN allele is leading to a false CtvdbError:
Running PneumoCaT 2.0a - Development
Reference CTV.db database at /phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/ctvdb selected.
WARNING: Existing files in output dir will be overwritten
Used Mash version 2.2
Analysing mash screen output.
6A_6B_6C_6D
Screen reference: /phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/ctvdb/6A_6B_6C_6D/wciN.msh
Analysing mash screen output.
Allele wciN unrecognised possible variant
Traceback (most recent call last):
File "/home/carmen.sheppard/gitrepository/pneumocat2/pneumocat2.py", line 79, in <module>
main(args, version)
File "/home/carmen.sheppard/gitrepository/pneumocat2/pneumocat2.py", line 49, in main
start_analysis(analysis)
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/run_scripts/run_stage2.py", line 39, in start_analysis
sort_genes(gene, analysis, gene.var_type, session)
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/run_scripts/screen_genes.py", line 40, in sort_genes
stage2_var = get_variant_ids(hit_genes, allele_or_gene, analysis.grp_id, session)[0]
File "/phengs/hpc_storage/home/carmen.sheppard/gitrepository/pneumocat2/run_scripts/utilities.py", line 231, in get_variant_ids
raise CtvdbError
exceptions.CtvdbError: CtvdbError: check CTV.db and folder integrity, missing or mismatching information may be present.
