Comments (12)

r78v10a07 commented on June 6, 2024

Hi,
The attached script is a very simple Python script that you need to modify slightly so that it matches your output file names. I created a modified version that may work.

The script works with the _genes.out files (compressed or not) and expects files named like this:

sample1_genes.out.gz
sample2_genes.out.gz
sample3_genes.out.gz

It will generate a matrix for the ExonTPM values with these columns:

Gene_Chr_Start   Chr   Start   End   ExonLength   sample1   sample2   sample3   

and a similar matrix for the exon reads:

Gene_Chr_Start   Chr   Start   End   ExonLength   sample1   sample2   sample3   

tpmcalculator2matrixes.py.gz

Please try it and let me know.

from tpmcalculator.

r78v10a07 commented on June 6, 2024

Hi,
Are your chromosome names or gene names purely numeric?
Change line 30 to:

data[column]['Gene_Chr_Start'] = data[column]['Gene_Id'].map(str) + '_' + data[column]["Chr"].map(str) + '_' + data[column]["Start"].map(str)

Let me know if this works.
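For readers who want to see why the cast matters, here is a minimal sketch, assuming pandas is installed. The toy frame and the helper name make_key are invented for illustration and are not part of the attached script; only the column names come from the thread.

```python
import pandas as pd

# Toy frame (made up): numeric-only chromosome names end up as integers.
df = pd.DataFrame({
    "Gene_Id": ["geneA", "geneB"],
    "Chr": [1, 2],
    "Start": [100, 2500],
})


def make_key(frame):
    """Build the Gene_Chr_Start key, casting every part to str first."""
    return (frame["Gene_Id"].map(str) + "_"
            + frame["Chr"].map(str) + "_"
            + frame["Start"].map(str))


df["Gene_Chr_Start"] = make_key(df)
print(df["Gene_Chr_Start"].tolist())  # ['geneA_1_100', 'geneB_2_2500']
```

Without the .map(str) calls, adding the string '_' to an integer column is what raises the TypeError.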

kmeusemann commented on June 6, 2024

Dear TPMcalculator team,
It's quite urgent, so any hints are much appreciated!
Best, Karen

r78v10a07 commented on June 6, 2024

Hi,
You can use this simple Python script to create the matrix file.
Run it in the folder that contains all the TPMCalculator results.
It will process the _sorted_genes.out files. If you want to process files with a different name, just change the suffix in the script.

tpmcalculator2matrixes.py.gz

r78v10a07 commented on June 6, 2024

Did you solve the problem with the script I sent?

alexbougdour commented on June 6, 2024

Hi,
I've tested the tpmcalculator2matrixes.py.gz script without any success.
Alex

r78v10a07 commented on June 6, 2024

What are the errors?
What are your input files?

alexbougdour commented on June 6, 2024

The input files were BAM files generated by subread-align. TPMCalculator generates results_genes.uni, .out and .ent files for each sample, but no merged file.
When running the tpmcalculator2matrixes.py script, here is what I got:

(tpmcalculator) alexandre@alexandre-Precision-Tower-5810:~/Documents/Spombe_data_tmp$ python tpmcalculator2matrixes.py
ExonTPM
Data columns: 0
Data rows: 0
Traceback (most recent call last):
  File "/home/alexandre/miniconda3/envs/tpmcalculator/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2889, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Gene_Id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tpmcalculator2matrixes.py", line 29, in <module>
    data[column]['Gene_Chr_Start'] = data[column]['Gene_Id'] + '_' + data[column]["Chr"] + '_' + data[column]["Start"].map(str)
  File "/home/alexandre/miniconda3/envs/tpmcalculator/lib/python3.8/site-packages/pandas/core/frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/alexandre/miniconda3/envs/tpmcalculator/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2891, in get_loc
    raise KeyError(key) from err
KeyError: 'Gene_Id'
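One possible reading of this log: "Data columns: 0" and "Data rows: 0" suggest that no input file matched the suffix the script was looking for, so the merged table was empty and had no Gene_Id column. The sketch below is a hypothetical guard, not part of the attached script; the function name find_outputs and the error message are invented. It fails early with a clearer message when nothing matches.

```python
import os


def find_outputs(folder, suffix="_genes.out"):
    """Return sorted paths of files ending in suffix or suffix + '.gz'.

    Raises FileNotFoundError when nothing matches, instead of letting an
    empty table fail later with KeyError: 'Gene_Id'.
    """
    matches = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            if name.endswith(suffix) or name.endswith(suffix + ".gz"):
                matches.append(os.path.join(root, name))
    if not matches:
        raise FileNotFoundError(
            "no files ending in %r (or %r) under %r; check the suffix"
            % (suffix, suffix + ".gz", folder)
        )
    return sorted(matches)
```

Calling find_outputs('.', '_sorted_genes.out') in a folder that only holds results_genes.out would raise immediately, which points at the suffix rather than at pandas.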

alexbougdour commented on June 6, 2024

Thanks a lot ! It works just fine.
Alex

r78v10a07 commented on June 6, 2024

It uses the file name as the sample name.
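As a rough illustration of that naming rule, here is a small sketch that strips an optional .gz extension and the output suffix from a file name. The helper sample_name is hypothetical and only approximates what the script does; the suffix value is an assumption you would adjust to your own files.

```python
import os


def sample_name(path, suffix="_genes.out"):
    """Derive a sample name from a TPMCalculator output file name."""
    name = os.path.basename(path)
    if name.endswith(".gz"):
        name = name[:-len(".gz")]      # drop the compression extension
    if name.endswith(suffix):
        name = name[:-len(suffix)]     # drop the output suffix
    return name


print(sample_name("sample1_genes.out.gz"))      # sample1
print(sample_name("Sample2.sorted_genes.out"))  # Sample2.sorted
```

With the default suffix, a file called Sample2.sorted_genes.out yields the column name Sample2.sorted, so pick the suffix that leaves the name you want in the matrix header.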

venkan commented on June 6, 2024

Hi Roberto and Alex,

I have _genes.out files in a folder like below:

Sample1.sorted_genes.out
Sample2.sorted_genes.out
Sample3.sorted_genes.out

And I have this python code:

import os
import pandas

data = {}
columns = ['ExonTPM', 'ExonReads']
output_suffix = "_genes.out"
files = [ f for ds, df, files in os.walk('./') for f in files if output_suffix in f]
for column in columns:
    print(column)
    data[column] = pandas.DataFrame()
    for f in files:
        # Get sample name removing the suffix and check if the output is compressed
        if f.endswith('.gz'):
            output_suffix_real = output_suffix + '.gz'
        else:
            output_suffix_real = output_suffix
        s = f.replace(output_suffix_real, '')
        df = pandas.read_csv(f, sep='\t')
        df = df[['Gene_Id', 'Chr', 'Start', 'End', 'ExonLength', column]]
        df = df.rename(index=str, columns={column: s})
        if data[column].empty:
            data[column] = df
        else:
            data[column] = data[column].merge(df, on=['Gene_Id', 'Chr', 'Start', 'End', 'ExonLength'], how='outer')
    print('Data columns: ' + str(len(data[column].columns)))
    print('Data rows: ' + str(len(data[column])))

    # Printing TSV matrices
    data[column]['Gene_Chr_Start'] = data[column]['Gene_Id'] + '_' + data[column]["Chr"] + '_' + data[column]["Start"].map(str)
    data[column] = data[column].drop(['Gene_Id'], axis=1)
    cols = data[column].columns.tolist()
    cols = cols[-1:] + cols[:-1]
    data[column] = data[column][cols]
    data[column].to_csv( column + '.tsv', sep='\t', index=False, na_rep='0')

I ran it with: python tpmcalculator2matrixes.py

This gave the error below:

ExonTPM
sys:1: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
Data columns: 8
Data rows: 49773
Traceback (most recent call last):
  File "tpmcalculator2matrixes.py", line 30, in <module>
    data[column]['Gene_Chr_Start'] = data[column]['Gene_Id'] + '_' + data[column]["Chr"] + '_' + data[column]["Start"].map(str)
  File "/soft/apps/Python/2.7.11-goolf-1.7.20/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/ops.py", line 639, in wrapper
    arr = na_op(lvalues, rvalues)
  File "/soft/apps/Python/2.7.11-goolf-1.7.20/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/ops.py", line 586, in na_op
    result[mask] = op(x[mask], _values_from_object(y[mask]))
TypeError: cannot concatenate 'str' and 'int' objects

May I know what could be the issue?
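A likely cause, judging from the DtypeWarning about column 1, is that the Chr column holds a mix of integers and strings, so adding '_' to it fails on the integer values. One way to avoid this, sketched below under the assumption that pandas is installed, is to read the identifier columns as strings from the start. The inline text is a made-up stand-in for a _genes.out file, with fewer columns than the real output.

```python
import io

import pandas as pd

# Made-up stand-in for a *_genes.out file; real files have more columns.
text = (
    "Gene_Id\tChr\tStart\n"
    "g1\t1\t10\n"
    "g2\tX\t20\n"
)

# Reading the identifier columns as str avoids mixed int/str values.
df = pd.read_csv(io.StringIO(text), sep="\t",
                 dtype={"Gene_Id": str, "Chr": str})

df["Gene_Chr_Start"] = (df["Gene_Id"] + "_" + df["Chr"] + "_"
                        + df["Start"].map(str))
print(df["Gene_Chr_Start"].tolist())  # ['g1_1_10', 'g2_X_20']
```

In the script above, the same dtype argument could be passed to the existing pandas.read_csv(f, sep='\t') call.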

venkan commented on June 6, 2024

@r78v10a07 Thanks a lot Roberto. It worked.
