Comments (18)

dsampath31 commented on August 29, 2024

@lincj1994 Hi, any luck with this issue? My script has also been stuck in the same state for quite some time.

from battenberg.

sunnaa0423 commented on August 29, 2024

Me too! All the output .tab files are empty as well. Can anyone help with that?

jcesar101 commented on August 29, 2024

Hi,

just to confirm, did the pipeline eventually resume execution, or not?

Could you please also specify which Battenberg version, R version and parameters you are using when invoking the pipeline?

Where did you obtain the reference data used in these executions?

Kind regards and thank you.

dsampath31 commented on August 29, 2024

sunnaa0423 commented on August 29, 2024

Hi,

I used Battenberg v2.2.10 and R v4.3.2. I tried BAM files of around 7 GB and 44 GB, but the pipeline got stuck at [1] "minCount=10". The wait exceeded 10 hours for the 7 GB data and 5 days for the 44 GB data.

The output files include normal_alleleFrequencies_chrn.txt (1-23), tumor_alleleFrequencies_chrn.txt (1-23), tumor_mutantBAF.tab, tumor_mutantLogR.tab, tumor_normalBAF.tab, tumor_normalLogR.tab and tumor_alleleCounts.tab. The .txt files seem normal, but all the .tab files contain only column names.
Here are the input parameters and the command I executed:

nb="/input/normal_hg38_sort_rmdup.bam" 
tb="/input/tumor_hg38_sort_rmdup.bam" 
outdir="/output/" 
Rscript /battenberg_pipline_test/battenberg_wgs.R -t tumor -n normal --tb ${tb} --nb ${nb} --sex Male -o ${outdir}

The attached battenberg_wgs.R script was downloaded from the inst/example directory on GitHub and modified to update the file paths.
Attachment: battenberg_wgs.json

Thanks for your help.

fswirsky commented on August 29, 2024

Hi,

I'm running into the same issue. I'm running the battenberg_wgs.R pipeline on a matched tumour and normal BAM pair (131 GB and 77 GB, respectively). The job has been running for 5 days, and there has been no change for almost all of that time. The most recent output is:

Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Reading locis
Done reading locis
Multi pos start:
Reading locis
Done reading locis
Multi pos start:
Reading locis
Reading locis
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
[1] "minCount=10"

This output appeared about 1 hour into the run and hasn't changed since. I'm running the most recent version of Battenberg (v2.2.9) and R (v4.3.1). As for sunnaa0423, the only output files are normal_alleleFrequencies_chrn.txt (1-23), tumor_alleleFrequencies_chrn.txt (1-23), tumor_mutantBAF.tab, tumor_mutantLogR.tab, tumor_normalBAF.tab, tumor_normalLogR.tab and tumor_alleleCounts.tab. The alleleFrequencies .txt files look normal, but all the .tab files contain only column names. Any help is greatly appreciated.
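A quick way to confirm the symptom reported above (every .tab output holding only its header row) is to list those files. The following is a minimal POSIX shell sketch. It assumes the .tab files sit in a single output directory and start with exactly one header line. The function name header_only_tabs is hypothetical and not part of Battenberg.

```shell
#!/bin/sh
# Hypothetical helper (not part of Battenberg): list .tab files in a directory
# that contain at most one line, i.e. a header and no data rows.
# Assumption: each Battenberg .tab output starts with exactly one header line.
header_only_tabs() {
  dir=${1:-.}
  for f in "$dir"/*.tab; do
    [ -e "$f" ] || continue                  # the glob matched nothing
    rows=$(awk 'END { print NR }' "$f")      # counts the last line even without a trailing newline
    if [ "$rows" -le 1 ]; then
      echo "$f"
    fi
  done
}
```

For example, header_only_tabs /output would print every .tab file in that directory that has no data rows.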

jcesar101 commented on August 29, 2024

Hi,

thank you for the additional details about this issue. Given the step where execution hangs, and the fact that the .tab files are essentially empty, it looks like not enough memory is allocated to the job when the BAF and LogR data frames are created.

Could you please confirm how much memory is currently being allocated and, if possible, try to increase the memory to see if that fixes the problem?

fswirsky commented on August 29, 2024

Hi,

Thank you for the prompt reply. At the moment I'm running it on an HPC across 34 cores with 122 GB per core. I'm not sure whether the job is automatically distributed across the cores, so maybe that is the problem. The current job is using my entire resource allowance on the HPC, so I can kill it and retry with ~1 TB of memory on a single core instead.

a3schiller commented on August 29, 2024

Hi!

I have the same issue as @fswirsky. I'm using the latest version from the development branch, running in "paired" mode with tumor and normal BAM files of around 100 GB each. I'm running on 28 cores (--cpu 28) with 8 GB per core, but it seems that only one core is actually used.

Thanks in advance!

jcesar101 commented on August 29, 2024

Hi,

the default number of worker threads (if the --cpu argument is not specified) is 8, so it is not clear why only one core is being used. Could this be related to resource allocation in your running environment? For instance, on an SGE HPC cluster we have to specify the number of cores allocated to the job (-pe [ smp | mpi | openmp ] <number_of_cores>) in addition to the package's --cpu parameter.

a3schiller commented on August 29, 2024

Hi,

Thanks for your fast reply! I can see that 28 cores are allocated to Battenberg, but only one is running at 100% while the rest stay at 0%. So it seems the cores allocated to Battenberg are not being used properly.

Best,
Alice

jcesar101 commented on August 29, 2024

The environment heavily affects how parallelisation is performed (shared memory allocation, disk access, inter-process communication). It's also worth noting that not all steps in Battenberg can be parallelised (for instance, consolidating results at sample level). Even steps that are parallelised do not always benefit from more threads, and can get slower instead (e.g., because of thread-synchronisation overhead).

I would start with a relatively small number of threads and then gradually increase it to see how the overall execution performs. In our environment I saw a significant improvement when moving from 8 to 16 cores, but going beyond that had the opposite effect. This may differ in other environments.
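The "start small, then increase gradually" advice can be scripted. This is a sketch under stated assumptions. time_threads is a hypothetical helper. The command passed to it must accept the --cpu option mentioned in this thread. Each run's output goes to run_cpuN.log in the current directory. A full Battenberg run can take hours or days, so it is probably wiser to try this on a reduced input first.

```shell
#!/bin/sh
# Hypothetical helper: run the same command at several thread counts and time each run.
# Assumption: the command accepts a "--cpu N" option (as battenberg_wgs.R does in this thread).
# Usage: time_threads "4 8 16" Rscript battenberg_wgs.R -t tumor -n normal ...
time_threads() {
  counts=$1
  shift
  for n in $counts; do
    start=$(date +%s)
    if "$@" --cpu "$n" > "run_cpu${n}.log" 2>&1; then
      status=0
    else
      status=$?                              # exit status of the failed run
    fi
    end=$(date +%s)
    # One line per run: thread count, elapsed wall-clock seconds, exit status
    echo "$n $((end - start)) $status"
  done
}
```

Each output line gives the thread count, the elapsed seconds and the exit status. The helper appends only --cpu, so runs that share the same -o directory would overwrite each other's results. In practice each thread count probably needs its own output directory.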

a3schiller commented on August 29, 2024

Hi again!

After a lot of troubleshooting, I believe the problem is in the function getAlleleCounts. It produces both the tumor and normal alleleFrequencies_chr.txt files, but every value in them (Count_A, Count_C, Count_G, Count_T, Good_depth) is 0, which I assume is wrong. The files alleleCounts.tab, mutantBAF.tab, mutantLogR.tab, normalBAF.tab and normalLogR.tab also contain only headers and no data. As a result, the SNPpos passed to split_genome is empty, and execution gets stuck in its first while loop.

Since the problem originates in getAlleleCounts, my guess is that it is related to alleleCounter rather than Battenberg itself, but I haven't got any further in my troubleshooting. I still wanted to post it here in case it helps you, or in case someone has a solution!

Best,

Alice
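To quantify the all-zero counts described above, the following sketch reports how many loci in each alleleFrequencies file have a nonzero Good_depth. It assumes tab-separated files whose header row contains a Good_depth column (the column names quoted in this comment). It finds that column by name rather than by position. covered_loci is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: for each file, print "<file> <loci with Good_depth > 0> <total loci>".
# Assumption: tab-separated input whose header row contains a column named Good_depth.
covered_loci() {
  for f in "$@"; do
    awk -F'\t' -v name="$f" '
      col == 0 {                                   # still looking for the header row
        for (i = 1; i <= NF; i++) if ($i == "Good_depth") col = i
        next
      }
      { rows++; if ($col + 0 > 0) covered++ }      # data row: count loci with usable reads
      END { printf "%s\t%d\t%d\n", name, covered, rows }
    ' "$f"
  done
}
```

For example, covered_loci tumor_alleleFrequencies_chr*.txt prints one tab-separated line per file. A covered count of 0 everywhere would be consistent with alleleCounter not matching any reads at the requested loci.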

sunnaa0423 commented on August 29, 2024

Hi there!

I found that the hang at "minCount=10" mainly comes from the function concatenateG1000SnpFiles(), which is reached via battenberg() -> prepare_wgs() -> getBAFsAndLogRs() -> concatenateG1000SnpFiles(). This function is defined in https://github.com/Wedge-lab/battenberg/tree/master/R/util.R, but I couldn't find it in the source files of the R package installed on the HPC. Also, redefining it in my script doesn't override the calls made inside the package. How should I fix this? I need help!

Thanks!

Below is the function with my modification on line 6: chrom <- paste0("chr", chrom)

concatenateG1000SnpFiles<-function(inputStart, inputEnd, chr_names) {
  data = list()
  for(chrom in chr_names) {
    filename = paste(inputStart, chrom, inputEnd, sep="")
    if(file.exists(filename) && file.info(filename)$size>0) {
      chrom <- paste0("chr", chrom)    # "chr" prefix must be added; otherwise the positions won't intersect with the other data, and the run gets stuck at "minCount=10".
      data[[chrom]] = cbind(chromosome=chrom, read_table_generic(filename))
    }
  }
  return(as.data.frame(do.call(rbind, data)))
}

jcesar101 commented on August 29, 2024

Hi Alice,

apologies for not replying earlier. That could very well be a problem with alleleCounter, either in launching the command itself (e.g., executable location/permissions or environment variables) or during its execution (e.g., resources, dependencies).

You can try executing the following command to test this tool outside of the Battenberg pipeline:

alleleCounter -b <bam_file> -l <single_chromosome_reference_file> -o <output_file> -m 20 -q 35
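Before running alleleCounter by hand, it may be worth checking that the chromosome names in the loci file actually exist in the BAM header. The "1" versus "chr1" naming mismatch comes up repeatedly in this thread. This sketch assumes the header has been saved with samtools view -H, and that the loci file is tab-separated with the chromosome in column 1, like the 1000G loci files shown later in the thread. contigs_from_header and missing_contigs are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: compare chromosome names in a loci file against the contigs in a BAM header.
# Assumptions: the header was saved with `samtools view -H file.bam > header.sam`, and the
# loci file is tab-separated with the chromosome name in column 1.

# Hypothetical helper: read SAM header text on stdin and print the SN: name of every @SQ line.
contigs_from_header() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
  }'
}

# Hypothetical helper: print chromosome names in the loci file that are absent from the header.
# Empty output means every name in the loci file matched a contig.
missing_contigs() {
  header_file=$1
  loci_file=$2
  contigs=$(mktemp)
  contigs_from_header < "$header_file" | sort -u > "$contigs"
  cut -f1 "$loci_file" | sort -u | comm -23 - "$contigs"
  rm -f "$contigs"
}
```

For example, after samtools view -H tumor.bam > tumor_header.sam, running missing_contigs tumor_header.sam <single_chromosome_reference_file> prints any chromosome names the BAM does not contain.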

jcesar101 commented on August 29, 2024

Hi,

to prevent errors with or without the "chr" prefix, two sets of reference files are provided at the following link. Please try this alternative before modifying the code.

sunnaa0423 commented on August 29, 2024

Hi,

I did download the reference files from that link. Under the 1000G_loci_hg38 folder there are only the following three types of files:
[Screenshot: file listing of the 1000G_loci_hg38 folder]

I'm not sure whether two of these file types are the two sets of reference files you mentioned, but both of them contain "chr". There is only one file named XXX_allele_indexXXX, and it is the one read by concatenateG1000SnpFiles(). However, this file does not contain a column with "chr".
[Screenshot: contents of the allele_index file]

getBAFsAndLogRs <- function(tumourAlleleCountsFile.prefix, normalAlleleCountsFile.prefix,
                            figuresFile.prefix, BAFnormalFile, BAFmutantFile, logRnormalFile,
                            logRmutantFile, combinedAlleleCountsFile, chr_names, g1000file.prefix,
                            minCounts = NA, samplename = "sample1", seed = as.integer(Sys.time())) {
    set.seed(seed)
    input_data = concatenateAlleleCountFiles(tumourAlleleCountsFile.prefix, ".txt", chr_names)
    normal_input_data = concatenateAlleleCountFiles(normalAlleleCountsFile.prefix, ".txt", chr_names)

    allele_data = concatenateG1000SnpFiles(g1000file.prefix, ".txt", chr_names)

    # Build "chromosome_position" keys for each data set
    chrpos_allele = paste(allele_data[, 1], "_", allele_data[, 2], sep = "")
    chrpos_normal = paste(normal_input_data[, 1], "_", normal_input_data[, 2], sep = "")
    chrpos_tumour = paste(input_data[, 1], "_", input_data[, 2], sep = "")
    # Keep only loci present in all three; if the naming differs, the intersection is empty
    matched_data = Reduce(intersect, list(chrpos_allele, chrpos_normal, chrpos_tumour))
    allele_data = allele_data[chrpos_allele %in% matched_data, ]
    normal_input_data = normal_input_data[chrpos_tumour %in% matched_data, ]
    input_data = input_data[chrpos_tumour %in% matched_data, ]
    ..............

After concatenateG1000SnpFiles() reads this file, the chromosome column of the resulting allele_data variable does not contain "chr":
head(allele_data)

[Screenshot: output of head(allele_data)]

Therefore, its keys cannot intersect with those of the other variables.
I don't know how to solve this. Could you please give me some detailed solutions?
Thank you.
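The failed intersection can be reproduced outside R. getBAFsAndLogRs(), as quoted above, builds "chromosome_position" keys and keeps only the keys present in every data set, so 1_16103 and chr1_16103 can never match. The following sketch counts the keys shared by two tab-separated files. It assumes the chromosome is in column 1, the position in column 2, and that header lines start with "#". chrpos_keys and shared_keys are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: count "chromosome_position" keys shared by two tab-separated files, mirroring
# the paste(chrom, "_", pos) + intersect() step of getBAFsAndLogRs().
# Assumptions: chromosome in column 1, position in column 2, header lines start with "#".

# Hypothetical helper: print the sorted, de-duplicated chrom_pos keys of one file.
chrpos_keys() {
  awk -F'\t' '!/^#/ && NF >= 2 { print $1 "_" $2 }' "$1" | sort -u
}

# Hypothetical helper: print how many keys the two files have in common.
shared_keys() {
  keys_a=$(mktemp)
  keys_b=$(mktemp)
  chrpos_keys "$1" > "$keys_a"
  chrpos_keys "$2" > "$keys_b"
  comm -12 "$keys_a" "$keys_b" | wc -l | tr -d ' '
  rm -f "$keys_a" "$keys_b"
}
```

Two files that should describe overlapping loci ought to give a nonzero count. A result of 0 suggests that their chromosome names differ.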

jcesar101 commented on August 29, 2024

Hi,

the zip file downloaded from ORA includes a file named 1000G_loci_hg38.zip, which contains files named 1kg.phase3.v5a_GRCh38nounref_loci_chr_N_.txt with these contents:

1	16103
1	51479
1	51898
1	51928
1	54490
...

And it also includes a file named 1000G_loci_hg38_chr.zip that contains the same set of files 1kg.phase3.v5a_GRCh38nounref_loci_chr_N_.txt, but with these contents:

chr1	16103
chr1	51479
chr1	51898
chr1	51928
chr1	54490
...

Therefore, in this case, try using the zip files ending in "_chr.zip" (extracted from the large downloaded zip file), because their chromosome names should already carry the "chr" prefix.
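The difference between the two loci sets shown above is only the "chr" prefix in column 1. The recommendation here is to use the provided "_chr.zip" files. If only an unprefixed copy is at hand, the following fallback sketch writes a prefixed copy. It assumes a tab-separated file with the chromosome in column 1 and no header line. add_chr_prefix is a hypothetical helper. Whichever naming is used still has to be consistent across the BAM and all the reference files, and this sketch rewrites only one file.

```shell
#!/bin/sh
# Hypothetical fallback helper: copy a tab-separated loci file, prepending "chr" to column 1.
# Assumptions: chromosome in column 1, no header line. Prefer the provided *_chr.zip files.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       NF > 0 && $1 !~ /^chr/ { $1 = "chr" $1 }   # leave already-prefixed names unchanged
       { print }' "$1" > "$2"
}
```

For example, add_chr_prefix loci_chr1.txt loci_chr1.prefixed.txt turns a line such as "1	16103" into "chr1	16103".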
