smithlabcode / dnmtools

Tools for analyzing DNA methylation data

Home Page: https://dnmtools.readthedocs.io

License: GNU General Public License v3.0

Makefile 1.16% M4 5.45% C++ 90.04% Python 1.74% Dockerfile 0.29% Shell 1.32%

dnmtools's Issues

Update PMD documentation to specify expected array input

The PMD section in the docs mentions the intricacies of the PMD algorithm when array data is given as input, but it does not say much about the expected input file format. Array data needs to be converted to methcounts, and we may need to provide specific commands and examples on how to do so.

amrtester does not verify input format

I accidentally gave it a BAM file and it ran to completion, taking a very long time and giving nonsense results.

radmeth is too slow

It takes several hours in the human genome to test for all 28M CpGs on a two-case test with 10 replicates each. According to perf, half of the time in the code is spent on __ieee754_log_fma, which I think has to do with log-likelihoods within GSL. We may be able to optimize these.

roimethstat seems unable to identify overlapping regions

Hi Methpipe team,

I'm David, a PhD student, and I am working on calculating the methylation in PMD regions and their flanking areas. Basically, I chop the PMDs and the extended regions into smaller bins and calculate the average methylation level. However, for some regions, the coordinates seem to overlap (e.g. the end coordinate surpasses the start coordinate of the next region). No matter how I sort (sort -k 1,1 -k 3,3n -k 2,2n -k 6,6 or sort -k1,1 -k2,2n), the program still states that the region of interest file isn't sorted.

My version of Methpipe is methpipe-5.0.0. Attached is an example screenshot (I do not think the -nan in the PMD definition, for which I also used Methpipe, is the reason). Thank you very much for the help.

Best Regards,
David

amrfinder eliminate_amrs_by_size

Currently we arbitrarily remove AMRs that are less than half the gap limit in size. We reference this as a "hack that has produced excellent results" in the manual, but our conversation about hmr today got me thinking about size cutoffs.

Is it possible to implement something like hmr's size distribution/quantile cutoff approach in amrfinder? Would it be more appropriate?

Segmentation fault from allelicmeth

I got a "Segmentation fault" error by running allelicmeth:
allelicmeth -c hg38/chrs_split/ -o sample_2.allelic sample_2.epiread
I checked the output. After about the 32,000,000th line, it kept writing:
chr1 0 + CpG
which I think overloaded the memory.

Comparison with other tools

Would it be possible to share a comparison of dnmtools with other tools?

Based on personal experience, I found your package faster (especially abismal) and more reliable (especially when calling HMRs on data produced by dnmtools). But colleagues are not convinced, referring to some recent benchmark studies. I was wondering if you know of any studies or internal comparisons supporting my personal experience of dnmtools being faster or more reliable.

Thanks so much!

sensible output from levels command when run on symmetric CpG counts files

A common scenario is to analyze methylation levels in subsets of the CpG sites (or some other specific local context) relative to some genomic intervals. This would mean that much of the typical output from the levels command contains invalid information -- lots of nan, etc. I wonder if there is a way to either circumvent this messy output, or allow the user to request only certain output among all of what levels can provide.

selectsites crashes when input file is a dir

amrfinder output

Hello,

I have a question regarding the output from running amrfinder. One of the columns looks something like "+:136". Is this the score for the input region? Also, what does the +/- denote? Thanks.

space between AMRs for merging

Currently this is set to be 1000 bp, which seems too large. I think something closer to 100 bp is more reasonable. Especially since this is tied to minimum size filtering.
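
To make the effect of the merging distance concrete, here is a minimal sketch of gap-based merging. It is not amrfinder's code: `Interval`, `merge_within_gap`, and `max_gap` are illustrative names, and the sketch assumes intervals on a single chromosome that are already sorted by start position.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical interval type: half-open [start, end) on one chromosome.
struct Interval {
  std::size_t start;
  std::size_t end;
};

// Merge consecutive intervals separated by fewer than max_gap bases.
// Assumes the input is sorted by start position.
std::vector<Interval>
merge_within_gap(const std::vector<Interval> &sorted,
                 const std::size_t max_gap) {
  std::vector<Interval> merged;
  for (const auto &iv : sorted) {
    if (!merged.empty() && iv.start < merged.back().end + max_gap)
      merged.back().end = std::max(merged.back().end, iv.end);
    else
      merged.push_back(iv);
  }
  return merged;
}
```

With intervals [100, 200), [250, 300), and [1100, 1200), a `max_gap` of 1000 collapses all three into one region, while a `max_gap` of 100 keeps the last one separate.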

SYM - collapsing counts for other methylation context

Hi @iromeo @songqiang @saketkc @egor-dolzhenko

Thanks for this great tool. I find the sym option of dnmtools useful for making sense of methylated positions in the CpG context. However, I guess it does not work for the CXG, CCG, and CHH contexts, which are important for those who might want to use this tool for plant DNA methylation calling.

I need your suggestions for processing the CHG (CXG, CCG) and CHH outputs for downstream analysis, since dnmtools sym was not designed to handle CHH and CHG methylation calls. What would you advise, given that I need this methylation information to understand methylation patterns in my plant of interest? Thanks.

Regards.

Update needed for documentation for format

The description for the -B option needs to be updated.
The option -B should indicate that the output will be in BAM format.

diff ERROR: bad order:

Hello,

I am trying to use dnmtools diff to calculate methylation differences between two .meth files generated using dnmtools counts. The command I used is dnmtools diff "$RESULTS"/hap1.sort.meth "$RESULTS"/hap2.sort.meth -o "$RESULTS"/2023-06-11_hap1_hap2_DiffMeth.diff -A

I get the following error :
ERROR: bad order:
chrom=chr1 [id=0] pos=0
appears after
chrom=chr1 [id=0] pos=140725478431896

I tried sorting the .meth files even though they are already sorted. One strange thing is that the end of the error message says "pos=140725478431896". There is definitely no position 140725478431896 in my data, since my genome is only 350 Mb. I also noticed that if I rerun the exact same command, it generates the same error but with a different pos value at the end.

For example, the same command with the same input files produced the following error
ERROR: bad order:
chrom=chr1 [id=0] pos=0
appears after
chrom=chr1 [id=0] pos=7811889804981395782

Output validation of methpipe programs

Although our error checking has improved since the first release, there are still cases where certain programs fail silently, truncating output or producing an empty file, attempt to write to an unknown location and never create the output file, or produce incorrect output in some other way.

This issue is meant as a discussion of what the scope of our validation should be. Currently, methbase checks for file existence but not content. Should we have a tool in methpipe that, given a directory of methpipe-related-extension files (.meth, .hmr, .pmd, etc) and a chromosome directory, reads each file to validate their content?

How would we identify truncation in files where there could be no region on a particular chromosome?

What constitutes "neighboring" CpG in radmerge

Hi, thanks for the great package, I had a quick question. In the radmerge function what constitutes a neighboring CpG? How far apart do significant CpGs need to be to be considered a part of a separate DMR?

Simplify symmetric-cpgs

There is no need to have different behavior for excluding sites marked as having been mutated. This utility should not make such a decision, and the lines of output should be the same as the number of (symmetric) CpGs in the input file, regardless of any inference about whether a site has mutated.

Question with design matrix for radmeth regression and dmr output

Hello. I previously posted a question on the 'issues' page for methpipe, which is now archived, about how to create the design matrix when you want to account for biological variation between replicates within each group. I understand radmeth regression is set up to do this, but I am confused about the design matrix format.

I am running radmeth regression, followed by radmeth adjust and then radmeth merge to call DMRs. My factor of interest is "is_b1", the factor with respect to which I want to test for differential methylation. Within each of the two groups, we have replicates from male and female mice. Ultimately, I want to remove any effects due to the sex of the replicates and call DMRs between groups that are genotype-dependent. Do I need other columns in my design matrix, besides my factor of interest, to represent my covariates?

I already tested what the DMR output would look like when running radmeth regression with different design matrices, and this is what I got. See below. In the third and last design matrix there is no 'base' column, just 'is_b1' and 'is_F', and its count is very different from the first two, with over 1 million DMRs called. The only difference between the third matrix and the first two is that the first two have a 'base' column. I can't explain why the DMR counts are so different: we went from a count in the tens to millions when there is no 'base' column. I am using the default parameters for radmeth adjust and radmeth merge.

To conclude, I am trying to figure out the right way to design my matrix and call DMRs based on my factor of interest, is_b1, while accounting for covariates, which in this case is sex. Thanks, and any help is appreciated.

hmr - Segmentation fault (core dumped)

After merging two methylomes with the merge-methcounts command, the hmr command fails with a segmentation fault: hmr -o 1_2_knockdown.symmetric_CpGs.sorted.hmr 1_2_knockdown.symmetric_CpGs.sorted.meth

Without merging on each file it works fine.

radmeth adjust outputs -1-1 in 7th field

When the average methylation difference in radmeth is zero, based on the C/(C+T) estimates of each case-control, radmeth adjust outputs lines like this:

chr1    10866   +       CpG     -1      -1      -1-1    3       3       25      25

This causes problems when filtering significant CpGs. For example, we often filter significant FDR-corrected CpGs (as the documentation suggests on page 16) using

awk '$7 <= 0.01' input-adjusted.radmeth >output-significant.radmeth

This does not filter these lines with zero average difference. Should the p-values for these lines be 1 instead?
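
Until the output is settled, a stricter filter can refuse malformed fields outright. The sketch below is an assumption about how such a check could look, not part of radmeth: the hypothetical helper `is_significant` accepts a field only if it parses completely as a number between 0 and the cutoff, so a value like `-1-1` is never treated as significant.

```cpp
#include <cstdlib>
#include <string>

// Returns true only if `field` parses completely as a number in
// [0, cutoff]. Malformed fields such as "-1-1" are rejected instead of
// being compared as text. (Hypothetical helper, not radmeth code.)
bool
is_significant(const std::string &field, const double cutoff) {
  if (field.empty()) return false;
  char *end = nullptr;
  const double p = std::strtod(field.c_str(), &end);
  if (*end != '\0') return false;  // trailing characters, e.g. "-1-1"
  return p >= 0.0 && p <= cutoff;
}
```

Applied to the seventh field of each line, this keeps `0.005` or `2.2e-16` at a cutoff of 0.01 and drops both `-1-1` and `-1`.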

dmr says hmr file is not sorted

I produced hmr files using the following command:
dnmtools hmr -o ${METHA}_hypo.bed ${METHA}.sym.meth

However when I run dnmtools dmr I get the error:
regions not sorted in file: 231_NO_a_BS_hypo.bed

The hmr output had chrM after chrX, so I tried sorting using: sort -k1,1V -k2,2n 231_NO_a_oxBS_hypo.bed > 231_NO_a_oxBS_hypo.sorted.bed but I still get the same error. I've attached the two bed files. There are some random contigs (like chr14_GL000225v1_random), but as far as I can tell those are sorted correctly in the sorted file.

Can you provide some explanation as to how these files need to be sorted to get dmr to run? Or can you have dmr print out the line where it encounters something it thinks is out of order?

231_NO_a_BS_hypo.bed.txt
231_NO_a_BS_hypo.sorted.bed.txt

Sensitivity to CG level relationship in simulation data

Hello,
I compared abismal and bismark performance on simulated data generated with MethylFastq. I simulated samples with 10%, 50%, and 90% CG. What I found was that sensitivity was highest at 10% for all bin sizes. Isn't it counterintuitive that the samples with the highest CG levels had the highest number of false negatives?

Thank you!

dnmtools hmr_rep: what is the "CONVERGED" for replicate methylomes.

Hi Guilherme,

Many thanks for the nice tool dnmtools! Would you kindly help me with a problem I see when running the command dnmtools hmr_rep? When there are 2 or 3 replicate sets, the results are as follows:
[separating by cpg desert]
[cpgs retained: 28091984]
[deserts removed: 103730]
ITR F size B size F PARAMS B PARAMS DELTA
1 3.71 21.13 0.544 2.033 5.770 0.963 1.00e+00
2 6.29 35.22 0.519 1.768 5.695 0.956 4.88e-01
3 8.21 45.26 0.513 1.670 5.683 0.954 2.59e-03
4 9.27 50.67 0.511 1.626 5.677 0.952 4.14e-04
5 9.77 53.20 0.510 1.606 5.673 0.952 4.97e-05
6 9.99 54.30 0.509 1.597 5.672 0.951 2.09e-06
7 10.09 54.76 0.509 1.592 5.672 0.951 4.60e-06
8 10.13 54.95 0.509 1.590 5.672 0.951 2.17e-06
9 10.14 55.01 0.509 1.589 5.673 0.951 5.92e-07
10 10.15 55.04 0.509 1.588 5.673 0.951 1.04e-07

When there are 4 replicate sets, the results showed as the following:
[separating by cpg desert]
[cpgs retained: 27883296]
[deserts removed: 105941]
ITR F size B size F PARAMS B PARAMS DELTA
1 3.36 19.36 0.518 1.774 5.348 0.920 1.00e+00
2 4.69 26.45 0.508 1.651 5.509 0.935 4.53e-01
3 5.23 28.94 0.508 1.598 5.586 0.940 7.30e-04
4 5.41 29.50 0.508 1.568 5.639 0.943 1.04e-04
5 5.45 29.41 0.509 1.548 5.681 0.946 5.04e-05
6 5.45 29.14 0.510 1.534 5.715 0.948 4.10e-05
CONVERGED

And when there are 5 replicate sets, the results showed as the following:
[separating by cpg desert]
[cpgs retained: 27675526]
[deserts removed: 107588]
ITR F size B size F PARAMS B PARAMS DELTA
1 3.07 16.15 0.523 1.477 5.558 0.927 1.00e+00
2 3.95 19.62 0.519 1.342 5.803 0.945 4.45e-01
3 4.19 19.89 0.524 1.281 5.980 0.955 7.84e-04
4 4.24 19.34 0.530 1.244 6.127 0.964 2.97e-04
5 4.22 18.70 0.535 1.217 6.250 0.970 2.20e-04
CONVERGED

I wonder if this is normal? And what does "CONVERGED" mean for replicate methylomes? Thanks!

With my best.

hmr requires symmetric data, fails on lifted over .meth files

I have perhaps an unusual use case. I am using whole genome methylation data from multiple closely related vertebrates. In order to compare apples-to-apples, I need a common coordinate system so I lifted them all over to human hg38. Now I want to run them through hmr (and dmr and several other of your wonderful tools!). However hmr is complaining that error: input is not symmetric-CpGs: .

My data is inherently symmetric because I specified merged symmetric coverage when I generated the bed file (that I converted to .meth with an awk script). My original data is from nanopore, and counts generated with modkit. It was direct methylation calling, not bisulfite, but once it has been converted to .meth format, I figured it didn't matter anymore.

Anyway, dnmtools hmr monkey.meth works fine, but after I lift it over to hg38, and filtered the lines with dnmtools liftfilter I tried dnmtools hmr monkey.hg38.meth and it fails with the symmetric problem. I wrote a script to remove any lines that "looked symmetric" and still no joy. Ideas?

HMR analysis

Hello @iromeo

Thanks for this great tool, I must also comment that your documentation is so crisp and clear.

I am exploring the use of your pipeline for analyzing DNA methylation data from a plant genome. In carrying out this aspect of the pipeline, I read from the documentation that the param.txt was trained on human data, and for correctness, the param.txt should always be regenerated based on the data to be used. My question is how can this be done? Is there a pipeline to insert my data to obtain this param model?

Regards.

bsrate output the distribution over reads

This is something that needs to be included. It should be pretty straightforward. Just keep a matrix where the rows are the number of Cs covered by a read and the columns are the number of converted cytosines; each entry is then the number of reads covering that many Cs, with that many of them converted. Alternatively a histogram can be kept, but that loses flexibility for later calculations.
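
A minimal sketch of the proposed matrix, assuming nothing about bsrate's internals: `ConversionTable`, `add_read`, and `conversion_rate` are illustrative names. The table grows on demand, and the overall conversion rate can still be recovered from it afterwards.

```cpp
#include <cstddef>
#include <vector>

// counts[n][k] = number of reads covering n cytosines, k of them converted.
// Hypothetical structure for illustration; not taken from bsrate.
struct ConversionTable {
  std::vector<std::vector<std::size_t>> counts;

  void add_read(const std::size_t n_cytosines,
                const std::size_t n_converted) {
    if (n_converted > n_cytosines) return;  // ignore impossible input
    if (counts.size() <= n_cytosines) counts.resize(n_cytosines + 1);
    auto &row = counts[n_cytosines];
    if (row.size() <= n_converted) row.resize(n_converted + 1, 0);
    ++row[n_converted];
  }

  // Overall conversion rate recovered from the table.
  double conversion_rate() const {
    std::size_t total = 0, converted = 0;
    for (std::size_t n = 0; n < counts.size(); ++n)
      for (std::size_t k = 0; k < counts[n].size(); ++k) {
        total += n * counts[n][k];
        converted += k * counts[n][k];
      }
    return total == 0 ? 0.0 : static_cast<double>(converted) / total;
  }
};
```

Keeping the full table rather than a single histogram means per-read quantities, such as the fraction of reads with no converted cytosines, remain available later.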

Are mate1 and mate2 alignment stats for non-paired reads? abismal

Hello,

This is the mapstats output I received after abismal alignment:

pairs:
total_pairs: 10299069
mapped:
num_mapped: 8885567
num_unique: 8327681
num_ambiguous: 557886
percent_mapped: 86.2754
percent_unique: 80.8586
percent_ambiguous: 5.41686
unique_error:
edits: 29271754
total_bases: 2325133228
error_rate: 0.0125893
num_unmapped: 1413502
num_skipped: 0
percent_unmapped: 13.7246
percent_skipped: 0
mate1:
total_reads: 1971388
mapped:
num_mapped: 1065463
num_unique: 702897
num_ambiguous: 362566
percent_mapped: 54.0463
percent_unique: 35.6549
percent_ambiguous: 18.3914
unique_error:
edits: 1129707
total_bases: 98145837
error_rate: 0.0115105
num_unmapped: 905925
num_skipped: 0
percent_unmapped: 45.9537
percent_skipped: 0
mate2:
total_reads: 1971388
mapped:
num_mapped: 723658
num_unique: 404643
num_ambiguous: 319015
percent_mapped: 36.708
percent_unique: 20.5258
percent_ambiguous: 16.1823
unique_error:
edits: 1016202
total_bases: 53483055
error_rate: 0.0190004
num_unmapped: 1247730
num_skipped: 0
percent_unmapped: 63.292
percent_skipped: 0

I am wondering whether mate1 and mate2 are statistics for non-paired reads. Or are they supposed to be summary statistics of pair1 and pair2, which wouldn't explain the read count gap? Also, would the overall alignment rate be the sum of all mapped reads from pairs, mate1, and mate2 divided by the total reads from the same categories? Thanks!

Speeding up dnmtools states

Hi,
Is there a way to use higher memory/cores with DNMtools states?

Best
H

roimethstat and order of chromosomes

It seems roimethstat has problems when the chromosomes are not in the same order between the methylation file and the regions file.

Segmentation fault (core dumped) while using bsrate

I am trying to use bsrate function and am getting segmentation fault error. Here is the glimpse of my code:
"
INbam="${!SGE_TASK_ID}"
samtools view -H "${INbam/.bam/}".sam > "${INbam/.bam/}".headers.txt &&
awk -F "\t" '{if ($3=="MT") print}' "${INbam/.bam/}".sam > "${INbam/.bam/}".reads.sam &&
cat "${INbam/.bam/}".headers.txt "${INbam/.bam/}".reads.sam > "${INbam/.bam/}".chrMT.sam &&
rm "${INbam/.bam/}".headers.txt "${INbam/.bam/}".reads.sam && \

bsrate -c "${INbam/.bam/}".chrMT.sam -o "${INbam/.bam/}".bsrate "${INbam/.bam/}".sam
"

I am providing the bam files using an array job. My chrMT.sam files have headers text followed by the MT reads. I have attached my input file for bsrate. Any help will be appreciated.
S9_.bsbolt-grch38.sorted.deduped.chrMT.txt

Thanks.

Akshay

The `sym` command does not verify sorted order on inputs

Describe the bug
If the sym command is run without the input being sorted within a chromosome it will give incorrect results.

To Reproduce
Steps to reproduce the behavior:

Take any .counts file that is sorted.
Run shuf from the command line to permute the lines randomly
Run dnmtools sym on that file.

Expected behavior
The appropriate behavior is for the program to exit with a complaint about the order being incorrect.
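
The kind of check being asked for can be sketched as follows. This is an illustration only: `Site` is a simplified stand-in for one line of a counts file, and `first_unsorted_site` is a hypothetical helper, not a function in dnmtools.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Each site is (chromosome, position); a simplified stand-in for a line
// of a counts file.
using Site = std::pair<std::string, std::size_t>;

// Returns the index of the first site that is out of order within its
// chromosome, or sites.size() if none is found.
std::size_t
first_unsorted_site(const std::vector<Site> &sites) {
  for (std::size_t i = 1; i < sites.size(); ++i)
    if (sites[i].first == sites[i - 1].first &&
        sites[i].second <= sites[i - 1].second)
      return i;
  return sites.size();
}
```

Returning the offending index, rather than a plain yes/no, lets the program report which line triggered the complaint before exiting.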

The format command hangs when the input is empty but has a BAM header

Easily reproduced. Happens in some public data where the reads are of such poor quality that none map at all. The cause is finding a maximum possible "suffix length" for reads to determine which pairs of reads are mates, and if there are no reads this becomes infinite, and then a loop that tries to find the right suffix length will loop std::numeric_limits<size_t>::max() times.
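
A guard for the empty case could look like the sketch below. It assumes, without having checked the source, that the maximum possible suffix length is bounded by the shortest read name; `max_suffix_length` is a hypothetical function name.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Largest suffix length worth trying: no longer than the shortest read
// name. Returns 0 for empty input instead of leaving the "infinite"
// sentinel in place. (Assumed logic, not the actual format code.)
std::size_t
max_suffix_length(const std::vector<std::string> &read_names) {
  if (read_names.empty()) return 0;
  std::size_t shortest = std::numeric_limits<std::size_t>::max();
  for (const auto &name : read_names)
    shortest = std::min(shortest, name.size());
  return shortest;
}
```

With a bound of 0 for header-only input, a loop over candidate suffix lengths has nothing to iterate over and the command can exit immediately.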

File format bug in levels command

The levels command verifies the file format to ensure it includes all cytosines, and warns the user if it doesn't, requiring the -relaxed flag. The input may be gz compressed, but the format verification assumes the file is plain text.
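
One way to avoid misreading compressed input is to look for the gzip magic bytes (0x1f 0x8b) before running a plain-text check. The helpers below are a sketch with hypothetical names, not code from the levels command.

```cpp
#include <fstream>
#include <string>

// True if the buffer starts with the two gzip magic bytes 0x1f 0x8b.
bool
has_gzip_magic(const std::string &bytes) {
  return bytes.size() >= 2 &&
         static_cast<unsigned char>(bytes[0]) == 0x1f &&
         static_cast<unsigned char>(bytes[1]) == 0x8b;
}

// Reads the first two bytes of a file and applies the same test.
bool
file_looks_gzipped(const std::string &filename) {
  std::ifstream in(filename, std::ios::binary);
  char head[2] = {0, 0};
  in.read(head, 2);
  return in.gcount() == 2 && has_gzip_magic(std::string(head, 2));
}
```

If the test succeeds, the format verification would need to read through a decompressing stream instead of the raw file.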

Targeted Methylation Sequencing

Is your feature request related to a problem? Please describe.
I tried to use dnmtools as I always do (indexing hg38, trimming reads, mapping, counting methylation) on targeted methylation data from ELSA-Seq (e.g. SRR15143251: https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR15143251&display=metadata), but the end result was a very small number of CpGs (~500) with very low coverage.

Describe the solution you'd like
I am not sure what the cause was, but to make sure the data is good, I used another DNAm pipeline and it worked as expected (but much slower, of course). I was wondering if dnmtools is meant to work with targeted data. If not, would it be possible to add this feature?

Describe alternatives you've considered
If it is expected to work already, are there any changes I need to make to the regular instructions I follow for WGBS?

Thank you for your great tool and I hope it becomes even more popular.

merge-methcounts should run to completion when chroms are "unsorted the same way" across files

merge-methcounts stops at chr19 when the order of chromosomes is 10-19, then 1, then 20-22, then 2, then 3-9, then X, Y and M in human samples. There may be an internal check for ordering that stops the program when reached, and this is not desired behavior

CpGx

Hello,

for methylation outputs, what is "CpGx" as a methylation site category?

Thank you.

Replace C style heap allocation with stl containers

This is the wise choice and was forgotten after I worked with C code for several days. After noticing some memory leaks, I realized a character array from the heap should be vector<char> v and accessed with &v[0]. This should impact format-reads.cpp and uniq.cpp. Possibly methcounts.cpp.
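
The pattern described here can be sketched as follows. `fill_buffer` stands in for any C-style routine that writes into a caller-supplied buffer, and both function names are made up for the example; `size` is assumed to be at least 1.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// A C-style routine that writes into a caller-supplied buffer.
static void
fill_buffer(char *buf, const std::size_t size, const char *text) {
  std::strncpy(buf, text, size - 1);
  buf[size - 1] = '\0';
}

// Assumes size >= 1.
std::string
read_into_vector(const char *text, const std::size_t size) {
  std::vector<char> v(size, '\0');     // replaces: char *v = new char[size];
  fill_buffer(&v[0], v.size(), text);  // v.data() is equivalent here
  return std::string(&v[0]);
}  // the vector releases its memory here, even if an exception is thrown
```

Because the vector owns the memory, there is no matching `delete[]` to forget on early returns or error paths.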

pmd: sorted order of cpgs in input file

The only criterion we need is for chromosomes to be kept together. It would be much more robust and lead to fewer locale problems if we check that the chrom differs rather than increases to ensure these are kept together.
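
A check of this kind needs no ordering between chromosome names at all. The sketch below, with the hypothetical name `chroms_kept_together`, only verifies that a name never reappears once a different name has been seen.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// True if every chromosome name forms one contiguous run, in any order.
bool
chroms_kept_together(const std::vector<std::string> &chrom_per_line) {
  std::unordered_set<std::string> seen;
  std::string prev;
  for (const auto &chrom : chrom_per_line) {
    if (chrom != prev) {
      // A name that reappears after a different one breaks contiguity.
      if (!seen.insert(chrom).second) return false;
      prev = chrom;
    }
  }
  return true;
}
```

An order such as chr10, chr1, chr2 passes, since no comparison between names is made, while chr1, chr2, chr1 is rejected.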

Radmeth Merge DMR Criterion

Hi,

Thanks for making such an approachable tool.

I was wondering about the exact criteria for calling a DMR in radmeth merge. I noticed in the code that at least one of the sites in the DMR is required to have been significant at the set FDR cutoff level before adjustment and multiple test correction. I was wondering if you could tell me more about the rationale for this - I can see that it would be more stringent, but I don't really get what the properties of this are.

Relatedly, are there other DMR-calling criteria that you have tried on radmeth regression output with FDR correction?

dnmtools allelic error: could not find chrom: chr1

Hi,
I am using DNMTOOLS Version: 1.1.0.
The command I am running is dnmtools allelic -c GRCh38.primary_assembly.genome.fa -o "$epiread".allelic "$epiread".
The error I am getting is could not find chrom: chr1.
However, the input epiread files do have chr1 data. Head of an epiread file is shown below-

I wonder if I am not specifying the correct genome file to the allelic command. Does the flag -c need the complete fasta file of HG38 or something else?
Best.

HMRs for replicates should be able to read tabular format

This is one of the purposes of the tabular format. It would reduce the work done in hmr_rep, but would require some logic for parsing the header lines in the tabular format as output by merge-methcounts.

pmr calling question

Hi,

I'm using dnmtools to call PMRs for downstream allele-specific methylation (ASM) detection. Since our study looks at loss of imprinting, the methylation in the ASM regions can be lower than 50%. For example, if there is a 50% loss of imprinting on the methylated allele, the overall methylation combining the two alleles will be 25%. Is there a way to tweak dnmtools hmr -partial to allow a deviation from 50% methylation?
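
The arithmetic behind the 25% figure can be written out as follows, under the assumption (not stated in the issue) that both alleles contribute equally to the reads. The symbols $m_A$ and $m_B$ are illustrative names for the per-allele methylation levels.

```latex
\bar{m} = \frac{m_A + m_B}{2}, \qquad
\text{intact imprinting: } \frac{1 + 0}{2} = 0.5, \qquad
\text{50\% loss on the methylated allele: } \frac{0.5 + 0}{2} = 0.25
```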

Is there a detailed explanation about how dnmtools hmr -partial works? For example, how does it determine the PMR boundaries? How does it judge the congruence of the CpGs in the PMRs? And how does it decide what's the range of the methylation the PMRs should be in?

Thanks.

mlml output format

The output format currently for mlml starts with 3-column bed for individual sites. It possibly should be closer to the format for methcounts. We have redundant info, and no indication of the type of site (although CpG, symmetric seems most relevant).

Docs subdirectory

The documentation subdirectory should be changed to docs in the future, as the documentation continues to be migrated to readthedocs from the previous latex/pdf format.

bigWig_to_methcounts

Is there an equivalent to bigWig_to_methcounts from your methpipe here?
We still use your BigWig tracks from UCSC and would need to convert them to methcounts, and bigWig_to_methcounts is Python 2. Nothing major, just a question. Thanks for your wonderful work!

Read name suffix length in format_reads

This should be learnable from the data based on, say, a few hundred thousand reads. It would involve opening the file for the check, learning the suffix length, then starting the current process.

radmeth-adjust replaces significant p-values in scientific notation with '1' and thus hides best DMC

The 'methpipe' to 'dnmtools' code refactoring introduced a critical bug in radmeth-adjust. In version 1.2.1 and the current master branch, the adjust step replaces significant p-values written in scientific notation with '1.0', which hides the best DMCs and marks them as totally insignificant.

Example:
Fragment of DMC table after regression step:

chr5    167087304       +       CpG     0.944042        2100    390     2421    453
chr5    167087308       +       CpG     0.000236524     2109    769     2432    724
chr5    167087366       +       CpG     0.00873441      2289    238     2633    211
chr5    167087627       +       CpG     2.22045e-16     2015    473     2366    1151
chr5    167087636       +       CpG     2.22045e-16     2000    1087    2359    1797
chr5    167087645       +       CpG     1.9873e-14      1949    964     2330    1619
chr5    167087722       +       CpG     3.29441e-11     1781    1102    2106    1587
chr5    167087880       +       CpG     0.464406        1605    1237    1864    1459
chr5    167087934       +       CpG     0.779924        1723    1609    1972    1846
chr5    167087938       +       CpG     0.743816        1732    1644    1991    1885

After adjust step:

chr5    167087304       +       CpG     0.944042        0.000158571     0.0199993       2100    390     2421    453
chr5    167087308       +       CpG     0.000236524     0.000158571     0.0199993       2109    769     2432    724
chr5    167087366       +       CpG     0.00873441      0.000158571     0.0199993       2289    238     2633    211
chr5    167087627       +       CpG     1       1       1       2015    473     2366    1151
chr5    167087636       +       CpG     1       1       1       2000    1087    2359    1797
chr5    167087645       +       CpG     1       1       1       1949    964     2330    1619
chr5    167087722       +       CpG     1       1       1       1781    1102    2106    1587
chr5    167087880       +       CpG     0.464406        0.794533        0.910647        1605    1237    1864    1459
chr5    167087934       +       CpG     0.779924        0.794533        0.910647        1723    1609    1972    1846
chr5    167087938       +       CpG     0.743816        0.794533        0.910647        1732    1644    1991    1885

The bug is in the is_number function: it isn't aware that a number in scientific notation can contain '-' and 'e' characters:

static bool
is_number(const string& str) {
  for (const char &c : str)
    if (c != '.' && !std::isdigit(c)) return false;
  return true;
}

Example DMC table attached.
dmc.txt

I've implemented & tested a fix, will attach it as pull request
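
For illustration, one way such a check could accept scientific notation is to let `std::strtod` do the parsing and require that it consumes the whole string. This is a sketch only and not necessarily the fix in the pull request.

```cpp
#include <cstdlib>
#include <string>

// Accepts anything strtod can parse in full, including "2.22045e-16".
static bool
is_number(const std::string &str) {
  if (str.empty()) return false;
  char *end = nullptr;
  std::strtod(str.c_str(), &end);
  // Something must have been parsed, and nothing may be left over.
  return end != str.c_str() && *end == '\0';
}
```

Note that this version is more permissive than a digit-by-digit test: `strtod` also accepts leading whitespace and spellings such as `inf` and `nan`, which may or may not be wanted here.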

selectsites only works as expected when chroms are lexicographically sorted

If the methcounts file and the BED file have consistent chrom order, selectsites fast-forwards the BED file, skipping chromosomes that are lexicographically inferior to the next chrom in the methcounts file. We need to steer away from requiring LC_ALL=C lexicographic ordering, as it is very rarely how FASTA files order their chromosomes.

the entropy values of all CpG sites were negative

Hi,
I installed version 5.0.0 successfully and ran it without errors. However, the entropy values of all CpG sites were negative. Is this normal?

Methcounts format for array data

Ideally the methcounts format for array data should be as uniform with the ordinary methcounts output as possible. I think this means using a numerical value for reads and methylation level. I propose here to use 0 coverage for no value, and 1 coverage for some value, while a 0.0 level would be required for 0 coverage. This seems consistent with the use in the pmd program. Fixing this requires changes to the docs and to the pmd program source.
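
The proposed convention can be sketched as a small formatter. The function name `array_site_to_counts` is hypothetical, and the column order (chrom, position, strand, context, level, coverage) as well as the fixed `+` strand and `CpG` context are assumptions made for the example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Format one array probe as a methcounts-like line. Assumed column order:
// chrom, position, strand, context, level, coverage.
// Coverage is 1 when the probe has a value and 0 otherwise; a probe
// without a value always gets level 0.
std::string
array_site_to_counts(const std::string &chrom, const std::size_t pos,
                     const bool has_value, const double level) {
  std::ostringstream out;
  out << chrom << '\t' << pos << "\t+\tCpG\t"
      << (has_value ? level : 0.0) << '\t' << (has_value ? 1 : 0);
  return out.str();
}
```

A probe with no measurement then looks like any other zero-coverage site, so downstream programs need no special case for missing values.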
