Giter Site home page Giter Site logo

Comments (11)

pplaw avatar pplaw commented on August 20, 2024 1

Hi Felix,

Thank you very much and sorry for not having the most up-to-date SNPsplit to start with (I was on 0.3.2). It is now working fine! Thank you very much!

from snpsplit.

FelixKrueger avatar FelixKrueger commented on August 20, 2024

Hi Pui,

I theory you should be able use this information as to construct a genome, but you will need to adhere to the same rules that the script uses internally.

First of all, you cannot use / characters in names as it would probably do bad things. I would recommend an underscore _ instead, so maybe:

--strain JF1_Ms

Next, the SNPs will need to be contained within a folder called:

SNPs_JF1_Ms

inside there, the script expects one file per chromosome, e.g. like so:

chr10.txt
chr11.txt
chr12.txt
chr13.txt
chr14.txt
chr15.txt
chr16.txt
chr17.txt
chr18.txt
chr19.txt
chr1.txt
chr2.txt
chr3.txt
chr4.txt
chr5.txt
chr6.txt
chr7.txt
chr8.txt
chr9.txt
chrMT.txt
chrX.txt
chrY.txt

So you will have to split your tsv file by chromosome. And finally, these file need to adhere to the following format:

>10
42459235        10      3101362 1       G/C     1/1:101:31:0:284,101,0:255,90,0:2:51:31:0,0,22,9:0:-0.69311:.:1
42459276        10      3102112 1       T/C     1/1:127:50:0:288,164,0:255,151,0:2:44:50:0,0,22,28:0:-0.693147:.:1
42459298        10      3102661 1       C/G     1/1:127:59:0:290,192,0,.,.,.:255,178,0,.,.,.:2:48:59:0,0,34,25:0:-0.693147:.:1
42459325        10      3103479 1       A/G     1/1:127:49:0:288,161,0:255,148,0:2:57:49:0,0,17,32:0:-0.693147:.:1
...

where the first line is the name of the chromosome (like in a FastA file).

this is also explained in the option:

--skip_filtering              This option skips reading and filtering the VCF file. This assumes that a folder named
                              'SNPs_<Strain_Name>' exists in the working directory, and that text files with SNP information
                              are contained therein in the following format:
                                      SNP-ID     Chromosome  Position    Strand   Ref/SNP
                          example:   33941939        9       68878541       1       T/G

I believe column 6 is optional, but the first 5 need to be present. It is sufficient to put 1 as strand if the sequence is based relative to the top strand. So you will need to reformat your file just a little, and you should be able to run it. If you have no one to do this for you I might be able to help.

I hope this makes sense?

from snpsplit.

pplaw avatar pplaw commented on August 20, 2024

Hi Felix

Thanks a lot for your advice! It does make sense to me. I have tried to reformat the tsv file that I have and can I please check with you whether txt files look like this will be fine? (e.g. for chr1.txt, in the order of: "SNP-ID", "Chromosome", "Pos", "Strand", "Ref/SNP")

image

Thanks

Best,
Pui

from snpsplit.

FelixKrueger avatar FelixKrueger commented on August 20, 2024

Ref/SNP needs to be present without the double quotes, just

1        1      3000176    1        T/G
2        1      3000223    1        T/A
...

Otherwise it looks good to me!

from snpsplit.

pplaw avatar pplaw commented on August 20, 2024

Hi Felix,

Thanks a lot for your reply. I have now created the folder "SNPs_JF1_Ms" with the txt files in my working directory. However, when I tried to run the following command,:

SNPsplit_genome_preparation --reference /ref_genome/mm10/ --strain JF1_Ms --skip_filtering

I have an error message returned:
You need to provide a VCF file detailing SNPs positions with '--vcf_file your.file' (e.g.: --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz). Please respecify!

Can I please check with you whether I need to enter something for the --vcf flag even if I am using --skip_filtering?

Many thanks

from snpsplit.

FelixKrueger avatar FelixKrueger commented on August 20, 2024

Hmm, can you just try to specify any file as --vcf any_file, it should be skipped then. Maybe the logic needs to be changed to that specifying a VCF file is not enforced if you are using --skip_filtering. Let me know if this works, else I'll take a look.

from snpsplit.

pplaw avatar pplaw commented on August 20, 2024

Hi Felix,

Thanks a lot for your reply. I tried to specify different files or folders in my working directory to --vcf and in all cases I have the same message returned:
image

from snpsplit.

FelixKrueger avatar FelixKrueger commented on August 20, 2024

OK, I'll try to make up a hardcoded version for you and will attach it here once I've got it.

from snpsplit.

FelixKrueger avatar FelixKrueger commented on August 20, 2024

Quick question, which version of SNPsplit are you using? While trying to adapt the genome preparation script I noticed that it just seems to work straight out the box of you will (version 0.4.0), but this is because --skip_filtering has already hardcoded the list of chromosomes.

Command:

SNPsplit_genome_preparation --reference /bi//scratch/Genomes/Mouse/GRCm38/ --strain JF1_Ms --skip_filtering
Reference genome folder provided is /bi//scratch/Genomes/Mouse/GRCm38/  (absolute path is '/bi/scratch/Genomes/Mouse/GRCm38/)'

Summarising SNPsplit Genome Preparation Parameters
==================================================
Reading/filtering VCF file:             No (skipped by user)
Reference genome:                       /bi/scratch/Genomes/Mouse/GRCm38/
N-masking:                              Yes
Full SNP genome:                        No
SNP strain:                             JF1_Ms

Using the following chromosomes (HARDCODED IN!!!):
1       2       3       4       5       6       7       8       9       10      11      12      13      14      15      16      17      18      19      X       Y       MT

Skipped reading the VCF file and filtering SNPs again (specified by user)

Now reading in and storing sequence information of the genome specified in: /bi/scratch/Genomes/Mouse/GRCm38/

chr 1 (195471971 bp)
chr 10 (130694993 bp)
...

This uses standard mouse chromosomes 1-9, chrMT, chrX and chrY.

can you just try cloning the current development version and try again?

git clone https://github.com/FelixKrueger/SNPsplit.git

from snpsplit.

FelixKrueger avatar FelixKrueger commented on August 20, 2024

Maybe this is also of interest for you, I have just added provisional support for the MGP v7 variants file, which appears to include the strains you are interested in. Please see the description here:

https://github.com/FelixKrueger/SNPsplit/blob/master/CHANGELOG.md#snpsplit_genome_preparation

from snpsplit.

FelixKrueger avatar FelixKrueger commented on August 20, 2024

As a final comment, I have conducted some sorting followed by IMPLICON processing for B6/JF1 mice, and it worked just fine!

from snpsplit.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.