how to generate the Output file: [genes|transcripts]_data_per_samples.txt ? about tpmcalculator HOT 12 CLOSED

ncbi commented on June 6, 2024

how to generate the Output file: [genes|transcripts]_data_per_samples.txt ?

from tpmcalculator.

Comments (12)

r78v10a07 commented on June 6, 2024

Hi,
The attached file is a very simple Python script that you need to modify slightly so that it matches your output file names. I created a modified version that may work.

The script works with the _genes.out files (compressed or not) and expects files like this:

sample1_genes.out.gz
sample2_genes.out.gz
sample3_genes.out.gz

It will generate a matrix for the ExonTPM values with these columns:

Gene_Chr_Start   Chr   Start   End   ExonLength   sample1   sample2   sample3

and a similar matrix for the ExonReads values:

Gene_Chr_Start   Chr   Start   End   ExonLength   sample1   sample2   sample3

tpmcalculator2matrixes.py.gz

Please try it and let me know.

r78v10a07 commented on June 6, 2024

Hi,
Are your chromosome names or gene names purely numeric?
If so, change line 30 to:

data[column]['Gene_Chr_Start'] = data[column]['Gene_Id'].map(str) + '_' + data[column]["Chr"].map(str) + '_' + data[column]["Start"].map(str)

Let me know if this works.
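
To illustrate why the `.map(str)` calls matter: when pandas parses `Chr` (or `Gene_Id`) as integers, or as a mix of integers and strings, joining it to a string with `+` fails. The following is a minimal sketch with a made-up three-row frame; only the column names come from the script, the values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: Chr mixes integers and strings, as can happen when
# chromosome names such as 1, 2 and X are read from a large file.
df = pd.DataFrame({
    "Gene_Id": ["geneA", "geneB", "geneC"],
    "Chr": [1, 2, "X"],
    "Start": [100, 200, 300],
})

# Writing df["Gene_Id"] + "_" + df["Chr"] here would fail, because a str
# cannot be concatenated with an int. Converting every part to str first
# makes the concatenation safe regardless of how the columns were parsed.
df["Gene_Chr_Start"] = (
    df["Gene_Id"].map(str) + "_" + df["Chr"].map(str) + "_" + df["Start"].map(str)
)

print(df["Gene_Chr_Start"].tolist())
```

The same conversion is harmless when the columns are already strings, so it can be applied unconditionally.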

kmeusemann commented on June 6, 2024

Dear TPMcalculator team,
it's quite urgent, so any hints are much appreciated!
best Karen

r78v10a07 commented on June 6, 2024

Hi,
You can use this simple Python script to create the matrix file.
Run it in the folder that contains all the TPMCalculator results.
It will process the _sorted_genes.out files. If you want to process any other files, just change the suffix in the script.

tpmcalculator2matrixes.py.gz
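
As a rough illustration of how the suffix determines both which files are picked up and what the sample columns are called, here is a small sketch. The helper `sample_name` is hypothetical and not part of the attached script; it assumes the sample name is simply the file name with the suffix (and an optional `.gz`) removed.

```python
import os

def sample_name(path, suffix="_genes.out"):
    """Return the sample name for an output file, or None if the suffix does not match."""
    name = os.path.basename(path)
    # Compressed outputs carry an extra .gz after the suffix.
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    if not name.endswith(suffix):
        return None
    return name[: -len(suffix)]

print(sample_name("sample1_genes.out.gz"))
print(sample_name("Sample2.sorted_genes.out"))
print(sample_name("sample3_sorted_genes.out", suffix="_sorted_genes.out"))
print(sample_name("sample1_genes.ent"))
```

With the default suffix, `Sample2.sorted_genes.out` yields the sample name `Sample2.sorted`; choosing a longer suffix changes the resulting column name accordingly.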

r78v10a07 commented on June 6, 2024

Did you solve the problem with the script I sent?

alexbougdour commented on June 6, 2024

Hi,
I've tested the tpmcalculator2matrixes.py.gz script without any success.
Alex

r78v10a07 commented on June 6, 2024

What are the errors?
What are your input files?

alexbougdour commented on June 6, 2024

The input files were BAM files generated by subread-align. TPMCalculator generates results_genes.uni, .out and .ent files for each sample, but no merged file.
When running the tpmcalculator2matrixes.py script, here is what I got:

(tpmcalculator) alexandre@alexandre-Precision-Tower-5810:~/Documents/Spombe_data_tmp$ python tpmcalculator2matrixes.py
ExonTPM
Data columns: 0
Data rows: 0
Traceback (most recent call last):
  File "/home/alexandre/miniconda3/envs/tpmcalculator/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2889, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Gene_Id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tpmcalculator2matrixes.py", line 29, in <module>
    data[column]['Gene_Chr_Start'] = data[column]['Gene_Id'] + '_' + data[column]["Chr"] + '_' + data[column]["Start"].map(str)
  File "/home/alexandre/miniconda3/envs/tpmcalculator/lib/python3.8/site-packages/pandas/core/frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/alexandre/miniconda3/envs/tpmcalculator/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2891, in get_loc
    raise KeyError(key) from err
KeyError: 'Gene_Id'
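
The `Data columns: 0` and `Data rows: 0` lines printed before the traceback suggest that no file in the directory matched the suffix the script searches for, so the merged table stayed empty and had no `Gene_Id` column to look up. This is an inference from the log, not something confirmed in the thread. A hedged sketch of a guard that fails early with a clearer message follows; the helper `find_outputs` and the file name `results_genes.out` are hypothetical and not part of the original script.

```python
import os
import tempfile

def find_outputs(root, suffix):
    """Collect files under root whose names end with suffix or suffix + '.gz'."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix) or name.endswith(suffix + ".gz"):
                matches.append(os.path.join(dirpath, name))
    if not matches:
        # Stop here instead of continuing with an empty DataFrame.
        raise FileNotFoundError(
            "no files ending in %r found under %r; check the suffix" % (suffix, root)
        )
    return sorted(matches)

# Demonstration in a temporary directory with one hypothetical output file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "results_genes.out"), "w").close()
    found = [os.path.basename(p) for p in find_outputs(tmp, "_genes.out")]
    print(found)
    try:
        find_outputs(tmp, "_sorted_genes.out")
    except FileNotFoundError as err:
        message = str(err)
        print(message)
```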

alexbougdour commented on June 6, 2024

Thanks a lot ! It works just fine.
Alex

r78v10a07 commented on June 6, 2024

It uses the file name as the sample name.

venkan commented on June 6, 2024

Hi Roberto and Alex,

I have _genes.out files in a folder like below:

Sample1.sorted_genes.out
Sample2.sorted_genes.out
Sample3.sorted_genes.out

And I have this python code:

import os
import pandas

data = {}
columns = ['ExonTPM', 'ExonReads']
output_suffix = "_genes.out"
files = [ f for ds, df, files in os.walk('./') for f in files if output_suffix in f]
for column in columns:
    print(column)
    data[column] = pandas.DataFrame()
    for f in files:
        # Get sample name removing the suffix and check if the output is compressed
        if f.endswith('.gz'):
            output_suffix_real = output_suffix + '.gz'
        else:
            output_suffix_real = output_suffix
        s = f.replace(output_suffix_real, '')
        df = pandas.read_csv(f, sep='\t')
        df = df[['Gene_Id', 'Chr', 'Start', 'End', 'ExonLength', column]]
        df = df.rename(index=str, columns={column: s})
        if data[column].empty:
            data[column] = df
        else:
            data[column] = data[column].merge(df, on=['Gene_Id', 'Chr', 'Start', 'End', 'ExonLength'], how='outer')
    print('Data columns: ' + str(len(data[column].columns)))
    print('Data rows: ' + str(len(data[column])))

    # Printing TSV matrices
    data[column]['Gene_Chr_Start'] = data[column]['Gene_Id'] + '_' + data[column]["Chr"] + '_' + data[column]["Start"].map(str)
    data[column] = data[column].drop(['Gene_Id'], axis=1)
    cols = data[column].columns.tolist()
    cols = cols[-1:] + cols[:-1]
    data[column] = data[column][cols]
    data[column].to_csv( column + '.tsv', sep='\t', index=False, na_rep='0')

I ran it with: python tpmcalculator2matrixes.py

This gave the following error:

ExonTPM
sys:1: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
Data columns: 8
Data rows: 49773
Traceback (most recent call last):
  File "tpmcalculator2matrixes.py", line 30, in <module>
    data[column]['Gene_Chr_Start'] = data[column]['Gene_Id'] + '_' + data[column]["Chr"] + '_' + data[column]["Start"].map(str)
  File "/soft/apps/Python/2.7.11-goolf-1.7.20/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/ops.py", line 639, in wrapper
    arr = na_op(lvalues, rvalues)
  File "/soft/apps/Python/2.7.11-goolf-1.7.20/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/ops.py", line 586, in na_op
    result[mask] = op(x[mask], _values_from_object(y[mask]))
TypeError: cannot concatenate 'str' and 'int' objects

May I know what could be the issue?
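
The `DtypeWarning` for column 1 (`Chr`) together with the `TypeError` points to chromosome names being parsed partly as integers. One possible fix, sketched below on the assumption that each file has the columns the script above selects, is to read the identifier columns as strings and to convert each part with `.map(str)` before concatenating. The function name `build_matrix` and the in-memory sample tables are made up for illustration.

```python
import io
import pandas as pd

KEYS = ["Gene_Id", "Chr", "Start", "End", "ExonLength"]

def build_matrix(samples, column):
    """Merge one value column from several per-sample tables into a matrix.

    samples maps a sample name to a path or file-like object holding a
    tab-separated table with the KEYS columns plus `column`.
    """
    merged = None
    for name, source in samples.items():
        # Reading Gene_Id and Chr as str avoids mixed int/str columns.
        df = pd.read_csv(source, sep="\t", dtype={"Gene_Id": str, "Chr": str})
        df = df[KEYS + [column]].rename(columns={column: name})
        merged = df if merged is None else merged.merge(df, on=KEYS, how="outer")
    # Build the row label from explicitly stringified parts.
    label = (merged["Gene_Id"].map(str) + "_" + merged["Chr"].map(str)
             + "_" + merged["Start"].map(str))
    merged = merged.drop(columns=["Gene_Id"])
    merged.insert(0, "Gene_Chr_Start", label)
    return merged

# Hypothetical two-sample input with a numeric gene id and chromosome name.
header = "Gene_Id\tChr\tStart\tEnd\tExonLength\tExonTPM\n"
s1 = header + "1\t1\t100\t500\t400\t2.5\n"
s2 = header + "1\t1\t100\t500\t400\t4.0\ngeneB\tX\t10\t90\t80\t1.0\n"
matrix = build_matrix(
    {"Sample1": io.StringIO(s1), "Sample2": io.StringIO(s2)}, "ExonTPM"
)
# Missing values are written as 0, as in the original script.
print(matrix.to_csv(sep="\t", index=False, na_rep="0"))
```

Passing `dtype` for the identifier columns should also silence the mixed-types warning, since pandas no longer has to guess the type of `Chr`.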

venkan commented on June 6, 2024

@r78v10a07 Thanks a lot Roberto. It worked.
