aweimann / traitar

License: GNU General Public License v3.0

Python 100.00%

traitar's Issues

error while running traitar phenotype

I get the following error when trying to use traitar. Not sure what's up. Please advise.

NM! fixed it!

error:

Traceback (most recent call last):
  File "/export/scratch/tward/miniconda3/envs/traitar/bin/traitar", line 492, in <module>
    args.func(args)
  File "/export/scratch/tward/miniconda3/envs/traitar/bin/traitar", line 32, in phenolyze
    p = Traitar(args.input_dir, args.output_dir, args.sample2file, args.cpus, args.rearrange_heatmap, args.heatmap_format, args.no_heatmap_phenotype_clustering, args.no_heatmap_sample_clustering, args.gene_gff_type, args.primary_models, args.secondary_models)
  File "/export/scratch/tward/miniconda3/envs/traitar/bin/traitar", line 71, in __init__
    self.s2f = self.parse_sample_f()
  File "/export/scratch/tward/miniconda3/envs/traitar/bin/traitar", line 129, in parse_sample_f
    if not os.path.exists(os.path.join(self.input_dir,i)):
  File "/export/scratch/tward/miniconda3/envs/traitar/lib/python2.7/posixpath.py", line 68, in join
    if b.startswith('/'):
AttributeError: 'float' object has no attribute 'startswith'

command:

traitar phenotype /home/data/shotgun_fastas /home/traitar_test/traitar_file.txt from_nucleotides /home/trait_test/traitar_output

The files in the /home/data/shotgun_fastas dir are:

ls /home/data/shotgun_fastas
5425.137.R1.fna  5636.141.R1.fna  5636.205.R1.fna  5636.268.R1.fna  5636.344.R1.fna  5636.384.R1.fna  5636.441.R1.fna  5636.484.R1.fna  5636.547.R1.fna  5636.59.R1.fna   5636.77.R1.fna
5425.249.R1.fna  5636.143.R1.fna  5636.224.R1.fna  5636.269.R1.fna  5636.347.R1.fna  5636.389.R1.fna  5636.446.R1.fna  5636.487.R1.fna  5636.54.R1.fna   5636.600.R1.fna  5636.80.R1.fna
5425.286.R1.fna  5636.144.R1.fna  5636.225.R1.fna  5636.278.R1.fna  5636.349.R1.fna  5636.393.R1.fna  5636.451.R1.fna  5636.488.R1.fna  5636.551.R1.fna  5636.602.R1.fna  5636.88.R1.fna
5425.297.R1.fna  5636.146.R1.fna  5636.227.R1.fna  5636.288.R1.fna  5636.350.R1.fna  5636.395.R1.fna  5636.456.R1.fna  5636.495.R1.fna  5636.552.R1.fna  5636.604.R1.fna  5636.92.R1.fna
5425.342.R1.fna  5636.153.R1.fna  5636.229.R1.fna  5636.290.R1.fna  5636.353.R1.fna  5636.397.R1.fna  5636.457.R1.fna  5636.497.R1.fna  5636.558.R1.fna  5636.60.R1.fna   combined_seqs.fna
5425.470.R1.fna  5636.155.R1.fna  5636.231.R1.fna  5636.299.R1.fna  5636.354.R1.fna  5636.398.R1.fna  5636.458.R1.fna  5636.517.R1.fna  5636.561.R1.fna  5636.614.R1.fna
5425.475.R1.fna  5636.158.R1.fna  5636.235.R1.fna  5636.304.R1.fna  5636.356.R1.fna  5636.420.R1.fna  5636.462.R1.fna  5636.51.R1.fna   5636.563.R1.fna  5636.617.R1.fna
5425.485.R1.fna  5636.186.R1.fna  5636.239.R1.fna  5636.305.R1.fna  5636.365.R1.fna  5636.421.R1.fna  5636.463.R1.fna  5636.524.R1.fna  5636.577.R1.fna  5636.618.R1.fna
5425.550.R1.fna  5636.187.R1.fna  5636.243.R1.fna  5636.312.R1.fna  5636.367.R1.fna  5636.422.R1.fna  5636.468.R1.fna  5636.526.R1.fna  5636.584.R1.fna  5636.628.R1.fna
5636.118.R1.fna  5636.193.R1.fna  5636.244.R1.fna  5636.317.R1.fna  5636.370.R1.fna  5636.424.R1.fna  5636.472.R1.fna  5636.529.R1.fna  5636.587.R1.fna  5636.638.R1.fna
5636.121.R1.fna  5636.194.R1.fna  5636.248.R1.fna  5636.327.R1.fna  5636.372.R1.fna  5636.432.R1.fna  5636.476.R1.fna  5636.530.R1.fna  5636.589.R1.fna  5636.640.R1.fna
5636.126.R1.fna  5636.195.R1.fna  5636.258.R1.fna  5636.330.R1.fna  5636.378.R1.fna  5636.434.R1.fna  5636.478.R1.fna  5636.536.R1.fna  5636.58.R1.fna   5636.643.R1.fna
5636.127.R1.fna  5636.198.R1.fna  5636.263.R1.fna  5636.332.R1.fna  5636.380.R1.fna  5636.436.R1.fna  5636.479.R1.fna  5636.53.R1.fna   5636.591.R1.fna  5636.646.R1.fna
5636.131.R1.fna  5636.200.R1.fna  5636.264.R1.fna  5636.334.R1.fna  5636.381.R1.fna  5636.439.R1.fna  5636.483.R1.fna  5636.543.R1.fna  5636.598.R1.fna  5636.655.R1.fna

Attached is the traitar_file.txt
traitar_file.txt

Download fails if folder doesn't exist

traitar config

Print version

To be included in pipelines, or to reproduce results, a (semantic) version number would be great.
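
One way to expose a version number is argparse's built-in `version` action. This is a minimal sketch, assuming a hypothetical `__version__` string; the value and the parser layout are illustrative and not Traitar's actual code:

```python
import argparse

# Hypothetical version string; a real package would define this once,
# for example in its __init__.py, and bump it with every release.
__version__ = "0.0.0"


def build_parser():
    """Return a parser whose -v/--version flag prints the version and exits."""
    parser = argparse.ArgumentParser(prog="traitar")
    parser.add_argument(
        "-v", "--version",
        action="version",
        version="%(prog)s " + __version__,
    )
    return parser
```

A pipeline could then record the output of the `--version` flag next to its results.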

File "/home/aweimann/.local/lib/python2.7/site-packages/traitar/get_external_data.py", line 15, in download
response = urllib2.urlopen("ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam27.0/Pfam-A.hmm.gz", timeout = 5, stream = True)
TypeError: urlopen() got an unexpected keyword argument 'stream'
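
`stream=True` is an argument of `requests.get`, not of `urllib2.urlopen`, which explains the `TypeError`. Below is a hedged sketch of a chunked download that uses only the standard library (written for Python 3, where `urllib2` became `urllib.request`). The function name `download` and its parameters are illustrative, not Traitar's actual code. It creates the target folder when it is missing and copies the response in fixed-size chunks, so the whole file is never held in memory:

```python
import os
import shutil
import urllib.request


def download(url, out_dir, timeout=5, chunk_size=1024 * 1024):
    """Stream url into out_dir and return the path of the local copy."""
    # Create the target folder if it does not exist yet.
    os.makedirs(out_dir, exist_ok=True)
    target = os.path.join(out_dir, os.path.basename(url))
    # copyfileobj reads chunk_size bytes at a time instead of the whole body.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(target, "wb") as handle:
            shutil.copyfileobj(response, handle, chunk_size)
    return target
```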

feature track generation

Although Traitar is being run for the first time on the target directory, the following prompt appears:

output dir traitar_out/phenotype_prediction/phypat+PGL/feat_gffs already exists; press 1 to continue with data from a previous run; press 2 to remove this directory; press 3 to abort

Also, GNU parallel is run suspiciously often.

legend placement

Reconsider the placement of the legends in the heatmap plot, i.e., swap the sample color key and the phenotype color key.

requirements

@kmooren reported that dependencies are installed although already met by system packages.

Come up with a README for basic usage

Auto-install python dependencies

pandas
tornado
nose
matplotlib
scipy
requests

67 Traits?

Probably doesn't belong here, but a list of all traits (plus their pfam yes/no combinations) would be awesome.

improve heatmap

combined heatmap for phypat and phypat+GGL with 4 different colors.
use discrete colors instead of color gradient.

ERROR: reduce the number of sample categories to less than 15

Hi,

I have installed v1.1.2 on my local cluster and it works with the sample data. When I run it using my own data however I get the above message even though I have only 14 Samples and Categories.

I have noticed the following in the traitar code though at line 140:

if len(uq) > 12:
    sys.exit("reduce the number of sample categories to less than 15")

Is this a typo?

I am currently attempting to run with 9 samples and categories which seems to have got further.

Thanks,

Matt
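
The check and its message disagree: the quoted code rejects more than 12 categories, while the message mentions 15. Here is a minimal sketch of one way to keep the two in sync by deriving the message from a single constant. `MAX_CATEGORIES`, its value, and the function name are assumptions for illustration, since the issue does not say which limit was intended:

```python
# Assumed limit, taken from the `> 12` comparison quoted in the issue.
MAX_CATEGORIES = 12


def check_categories(categories):
    """Raise ValueError if there are more unique categories than allowed."""
    unique = set(categories)
    if len(unique) > MAX_CATEGORIES:
        # The message is built from the same constant as the check.
        raise ValueError(
            "reduce the number of sample categories to at most %d (found %d)"
            % (MAX_CATEGORIES, len(unique))
        )
    return len(unique)
```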

Pfam track generation

In addition to the Pfam feature track generation relevant for specific phenotypes, generate one track for the entire Pfam annotation.

Error during execution

The prodigal step finished fine but soon after I got the message about the Pfam annotation, I got a very long error:
running Pfam annotation with hmmer. This step can take a while. A rough estimate for sequential Pfam annotation of genome samples of ~3 Mbs is 10 min per genome.
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in getitem
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 3162, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in getitem
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 3446, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in getitem
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2808, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in getitem
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2219, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in getitem
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2917, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in getitem
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2156, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in getitem
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 1826, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in getitem
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2299, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
ls: cannot access /home/maria/Desktop/Traitar_test_20160425/pfam_annotation/*_filtered_best.dat: No such file or directory
running phenotype prediction
Traceback (most recent call last):
File "/usr/local/bin/predict.py", line 110, in
annotate_and_predict((pt1, pt2), tarfile.open(args.model_tar, mode = "r:gz"), args.annotation_matrix,args.pfam_pts_mapping_f, args.out_dir, args.voters)
File "/usr/local/bin/predict.py", line 88, in annotate_and_predict
aggr_dfs = aggregate(pred_df, k)
File "/usr/local/bin/predict.py", line 31, in aggregate
maj_pred_dfs[0].iloc[:,i / k] = pred_df.iloc[:, i: i + k].apply(filter_pred, axis = 1, is_majority = True, k = k)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 98, in setitem
self._setitem_with_indexer(indexer, value)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 416, in _setitem_with_indexer
value = self._align_frame(indexer, value)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 587, in _align_frame
raise ValueError('Incompatible indexer with DataFrame')
ValueError: Incompatible indexer with DataFrame
Traceback (most recent call last):
File "/usr/local/bin/predict.py", line 110, in
annotate_and_predict((pt1, pt2), tarfile.open(args.model_tar, mode = "r:gz"), args.annotation_matrix,args.pfam_pts_mapping_f, args.out_dir, args.voters)
File "/usr/local/bin/predict.py", line 88, in annotate_and_predict
aggr_dfs = aggregate(pred_df, k)
File "/usr/local/bin/predict.py", line 31, in aggregate
maj_pred_dfs[0].iloc[:,i / k] = pred_df.iloc[:, i: i + k].apply(filter_pred, axis = 1, is_majority = True, k = k)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 98, in setitem
self._setitem_with_indexer(indexer, value)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 416, in _setitem_with_indexer
value = self._align_frame(indexer, value)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 587, in _align_frame
raise ValueError('Incompatible indexer with DataFrame')
ValueError: Incompatible indexer with DataFrame
Traceback (most recent call last):
File "/usr/local/bin/merge_preds.py", line 79, in
comb_preds(args.phypat_dir, args.phypat_GGL_dir, args.out_dir, args.voters)
File "/usr/local/bin/merge_preds.py", line 19, in comb_preds
m1_scores = ps.read_csv("%s/predictions_majority-vote_mean-score.txt"%phypat_dir, index_col = 0, sep = "\t")
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 218, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 502, in init
self._make_engine(self.engine)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 610, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in init
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 330, in pandas.parser.TextReader.cinit (pandas/parser.c:3200)
File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5559)
IOError: File /home/maria/Desktop/Traitar_test_20160425/phenotype_prediction/phypat/predictions_majority-vote_mean-score.txt does not exist
running feature track generation
Traceback (most recent call last):
File "/usr/local/bin/traitar", line 329, in
args.func(args)
File "/usr/local/bin/traitar", line 19, in phenolyze
p.run(args.mode)
File "/usr/local/bin/traitar", line 164, in run
self.run_feature_track_generation(self.s2f.loc[:,"sample_name"], mode)
File "/usr/local/bin/traitar", line 249, in run_feature_track_generation
phypat_preds = ps.read_csv(os.path.join(self.phypat_dir, "predictions_majority-vote.txt"), index_col = 0, sep = "\t")
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 218, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 502, in init
self._make_engine(self.engine)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 610, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in init
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 330, in pandas.parser.TextReader.cinit (pandas/parser.c:3200)
File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5559)
IOError: File /home/maria/Desktop/Traitar_test_20160425/phenotype_prediction/phypat/predictions_majority-vote.txt does not exist
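
The `KeyError` in the first tracebacks names duplicate column labels (`accession`, `score`, `bias`, `from`, `to`), which suggests the labels were taken from the HMMER table header, where those names repeat. Below is a hedged sketch of parsing such a per-domain table with explicit, unique column names instead. It assumes the standard 22 whitespace-separated fields followed by a free-text description, and every name in it is illustrative rather than taken from `hmmer2filtered_best.py`. Columns 12 and 13 (0-based) are the independent E-value and the domain bit score, the two columns the failing line filters on:

```python
COLUMNS = [
    "target_name", "target_accession", "tlen",
    "query_name", "query_accession", "qlen",
    "full_evalue", "full_score", "full_bias",
    "dom_number", "dom_total", "c_evalue", "i_evalue",
    "dom_score", "dom_bias",
    "hmm_from", "hmm_to", "ali_from", "ali_to", "env_from", "env_to",
    "acc", "description",
]


def parse_domtblout(lines):
    """Yield one dict per data line, skipping blank and '#' comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split into at most 23 fields so spaces in the description survive.
        fields = line.rstrip("\n").split(None, len(COLUMNS) - 1)
        yield dict(zip(COLUMNS, fields))


def apply_thresholds(rows, eval_threshold, bit_score_threshold):
    """Keep hits with i-Evalue <= eval_threshold and score >= bit_score_threshold."""
    return [
        row for row in rows
        if float(row["i_evalue"]) <= eval_threshold
        and float(row["dom_score"]) >= bit_score_threshold
    ]
```

Whether the original failure also depends on the installed pandas version is not clear from the report; the sketch only sidesteps the duplicate labels.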

Alternative gene prediction software

e.g. (meta)genemark

Showcase example output

Include a pretty heatmap as an example output, embedding it in README.md

make temporary file redundant for domtblout2gene_generic.py

no prodigal required if from_genes

We don't require prodigal if in from_genes mode.

Heatmap

Introduce heatmap with dendrogram as a primary visualization of the output.

check if config and pfam hmm path are available

Print help if (sub)program is called w/o args

I get an ugly python error, instead of the (intended) usage.
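
A minimal sketch of how an argparse-based entry point can print the usage instead of failing when no subcommand is given. The parser layout and the single `phenotype` subcommand with one argument are simplified assumptions, not Traitar's actual command-line definition:

```python
import argparse


def build_parser():
    """Build a parser with one illustrative subcommand."""
    parser = argparse.ArgumentParser(prog="traitar")
    subparsers = parser.add_subparsers(dest="command")
    phenotype = subparsers.add_parser("phenotype", help="run phenotype prediction")
    phenotype.add_argument("input_dir")
    return parser


def main(argv):
    """Return an exit status; print the usage when no subcommand is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        # No subcommand given: show the help text instead of a traceback.
        parser.print_help()
        return 2
    return 0
```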

example data

Change example data to strains that are in GIDEON but weren't used for the training. This way we have the actual labels in case somebody wants to check the predictions

Print smth. before the hmmsearch step

As this step takes quite a while, write something to stderr before you start it.
prodigal finishes rapidly, while hmmsearch takes forever (at least w/o parallel).

Running hmmsearch to search against Pfam HMMs, this might take a while!

Undocumented sample file

Document what the sample file should look like in the repo's README (and ideally in the program's help, too; from what I can see, it seems to be just a tsv: filename\tsome_name\n)
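
Until the format is documented, a sample file of the guessed shape can be generated from the input directory. This is a sketch under assumptions: the `sample_name` column appears in the tracebacks elsewhere in this tracker, while the `sample_file_name` header and the function name are hypothetical and should be checked against the program itself:

```python
import csv
import os


def write_sample_file(input_dir, out_path, suffix=".fna"):
    """Write a two-column TSV listing every file in input_dir ending in suffix."""
    names = sorted(f for f in os.listdir(input_dir) if f.endswith(suffix))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header names are assumptions; check them against the program's help.
        writer.writerow(["sample_file_name", "sample_name"])
        for name in names:
            # Use the file name without its extension as the sample name.
            writer.writerow([name, os.path.splitext(name)[0]])
    return len(names)
```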

only one sample supplied

Account for the case where there is only one sample, i.e., no clustering is possible.

Python 3.x

Consider migrating to python 3

traitar pfam /tmp/traitar-pfam uses too much RAM

The above command gets killed after exhausting system memory on a 4 GiB desktop machine. The download procedure should not use that much memory.

Optimize config

Optimize Pfam HMM db download in config mode and check config if the program is started in phenotype mode.

GNU parallel --will-cite

to get rid of annoying message.

traitAR stdout

Make sure only useful output gets to the user e.g. discard Prodigal stdout, heatmap logs etc.

Review paths

Review how external scripts, data models etc. are referenced to make sure that the program can be executed from any location.

Update refs

s/bioRxiv/mSystems/ 😄

feature track generation

Breaks if the continue option is selected after the prompt reports that the directory already exists, but the directory holds no previous results.

Heatmap generation fails

Hi,

I'm using the v.1.11 release, installed in a virtual environment alongside the python dependencies, using python 2.7.3.

It works until heatmap generation, I get the following error message:

[...]
running heatmap generation
traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py:789: RuntimeWarning: divide by zero encountered in double_scalars
  automin = (y[2] - y[1]) / clen
traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py:790: RuntimeWarning: divide by zero encountered in double_scalars
  automax = (y[-2] - y[-3]) / clen
Traceback (most recent call last):
  File "traitar/bin/heatmap.py", line 482, in <module>
    heatmap(matrix, row_header, column_header, primary_pt_models, args.color_f, args.row_method, args.column_method, args.row_metric, args.column_metric,   args.out_f, args.sample_f, secondary_pt_models)
  File "traitar/bin/heatmap.py", line 288, in heatmap
    cb = mpl.colorbar.ColorbarBase(axsl, cmap=cmap_p, norm=norm, spacing='proportional', ticks=bounds, boundaries=bounds)
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 323, in __init__
    self.draw_all()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 346, in draw_all
    X, Y = self._mesh()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 814, in _mesh
    y = self._proportional_y()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 789, in _proportional_y
    automin = (y[2] - y[1]) / clen
IndexError: index 2 is out of bounds for axis 0 with size 2
traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py:789: RuntimeWarning: divide by zero encountered in double_scalars
  automin = (y[2] - y[1]) / clen
traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py:790: RuntimeWarning: divide by zero encountered in double_scalars
  automax = (y[-2] - y[-3]) / clen
Traceback (most recent call last):
  File "traitar/bin/heatmap.py", line 482, in <module>
    heatmap(matrix, row_header, column_header, primary_pt_models, args.color_f, args.row_method, args.column_method, args.row_metric, args.column_metric,   args.out_f, args.sample_f, secondary_pt_models)
  File "traitar/bin/heatmap.py", line 288, in heatmap
    cb = mpl.colorbar.ColorbarBase(axsl, cmap=cmap_p, norm=norm, spacing='proportional', ticks=bounds, boundaries=bounds)
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 323, in __init__
    self.draw_all()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 346, in draw_all
    X, Y = self._mesh()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 814, in _mesh
    y = self._proportional_y()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 789, in _proportional_y
    automin = (y[2] - y[1]) / clen
IndexError: index 2 is out of bounds for axis 0 with size 2
Traceback (most recent call last):
  File "traitar/bin/heatmap.py", line 482, in <module>
    heatmap(matrix, row_header, column_header, primary_pt_models, args.color_f, args.row_method, args.column_method, args.row_metric, args.column_metric,   args.out_f, args.sample_f, secondary_pt_models)
  File "traitar/bin/heatmap.py", line 288, in heatmap
    cb = mpl.colorbar.ColorbarBase(axsl, cmap=cmap_p, norm=norm, spacing='proportional', ticks=bounds, boundaries=bounds)
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 323, in __init__
    self.draw_all()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 346, in draw_all
    X, Y = self._mesh()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 814, in _mesh
    y = self._proportional_y()
  File "traitar/local/lib/python2.7/site-packages/matplotlib/colorbar.py", line 789, in _proportional_y
    automin = (y[2] - y[1]) / clen
IndexError: index 2 is out of bounds for axis 0 with size 2

I am using the following command:

traitar phenotype traitar_lao traitar_lao_sub/samples.txt from_genes traitar_lao_sub_OUT/ -c 1

For the test data set heatmap generation works.

Attached a subset of the dataset I'm using (first 4 samples instead of 33), as otherwise the file couldn't be uploaded.
traitar_lao_sub.zip

Is this maybe related to the fact that I only have 1 category in the samples.txt file?

Heatmap breaks if no predictions are made

The heatmap call needs a minimum number of positive predictions (I think two); otherwise it fails.

Get rid of future warnings

Improve naming of result files

The result files generated by traitar are partly not self-explanatory.

Improve the naming.
Add a description to the README.

config problem

abort traitar phenotype if traitar pfam has not been run

IOError: [Errno 2] No such file or directory: '/home/aaron/traitar/traitar/config.json'
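
A hedged sketch of a guard that turns the raw `IOError` into a readable message when the config file is missing. The function name is illustrative and the hint to run `traitar pfam` follows the request above; this is not Traitar's actual code:

```python
import json
import os


def load_config(config_path):
    """Return the parsed config, or exit with a readable message if missing."""
    if not os.path.isfile(config_path):
        raise SystemExit(
            "config file %s not found; run 'traitar pfam' first" % config_path
        )
    with open(config_path) as handle:
        return json.load(handle)
```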

Improve heatmap details

Rethink color bars e.g. make optional according to user input
Introduce a limit on the number of samples visualized in the heatmap or, as a more advanced option, scale everything according to the number of samples.

Error message

Hello. I have installed traitar on a Mac, and it runs OK, except that I get this error message during heatmap generation:

traitar phenotype /work/01_SAG/05_traitar/01_input /work/01_SAG/05_traitar/Traitar_samples.txt from_nucleotides /work/01_SAG/05_traitar/02_output -c 4

/usr/local/sources/bioinfo/lib/python2.7/site-packages/matplotlib/axes/_base.py:3045: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0
'bottom=%s, top=%s') % (bottom, top))
Traceback (most recent call last):
File "/usr/local/sources/bioinfo/bin/heatmap.py", line 482, in
heatmap(matrix, row_header, column_header, primary_pt_models, args.color_f, args.row_method, args.column_method, args.row_metric, args.column_metric, args.out_f, args.sample_f, secondary_pt_models)
File "/usr/local/sources/bioinfo/bin/heatmap.py", line 278, in heatmap
dr = dr[idx1]
UnboundLocalError: local variable 'idx1' referenced before assignment
/usr/local/sources/bioinfo/lib/python2.7/site-packages/matplotlib/axes/_base.py:3045: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0
'bottom=%s, top=%s') % (bottom, top))
Traceback (most recent call last):
File "/usr/local/sources/bioinfo/bin/heatmap.py", line 482, in
heatmap(matrix, row_header, column_header, primary_pt_models, args.color_f, args.row_method, args.column_method, args.row_metric, args.column_metric, args.out_f, args.sample_f, secondary_pt_models)
File "/usr/local/sources/bioinfo/bin/heatmap.py", line 278, in heatmap
dr = dr[idx1]
UnboundLocalError: local variable 'idx1' referenced before assignment
/usr/local/sources/bioinfo/lib/python2.7/site-packages/matplotlib/axes/_base.py:3045: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0
'bottom=%s, top=%s') % (bottom, top))
Traceback (most recent call last):
File "/usr/local/sources/bioinfo/bin/heatmap.py", line 482, in
heatmap(matrix, row_header, column_header, primary_pt_models, args.color_f, args.row_method, args.column_method, args.row_metric, args.column_metric, args.out_f, args.sample_f, secondary_pt_models)
File "/usr/local/sources/bioinfo/bin/heatmap.py", line 278, in heatmap
dr = dr[idx1]
UnboundLocalError: local variable 'idx1' referenced before assignment

Integrate example phenotype prediction

Missing parentheses in call to 'print'

Hi there,

Just read about traitar and wanted to give it a shot! Installation went fine, however it looks like I'm not able to make it run. After calling traitar I get the following error:

  File "/usr/local/bin/traitar", line 173
    print self.user_message % out_dir
                                    ^
SyntaxError: Missing parentheses in call to 'print'

I've played around a little with this error by commenting out this print line, but then another line pops up. If I'm correct, this looks like an error caused by using the Python 2 print statement instead of the Python 3 print function.

traitar installed as a python3.4 module using pip. Should I install using pip2 instead?

Thanks, looking forward to using traitar!

Sander

check if external programs are available

Include checks to see if Prodigal, hmmer and parallel are available.
If parallel execution is desired and parallel is not installed, fall back to sequential execution
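
Both checks can be sketched with `shutil.which` from the Python 3 standard library. The function names are hypothetical, and the executable names passed in (for example `prodigal`, `hmmsearch`, `parallel`) are assumptions about how the tools are installed:

```python
import shutil


def missing_tools(tools):
    """Return the names from tools that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def choose_cpus(requested_cpus, parallel_cmd="parallel"):
    """Fall back to one CPU when GNU parallel is not installed."""
    if requested_cpus > 1 and shutil.which(parallel_cmd) is None:
        return 1
    return requested_cpus
```

A start-up check could call `missing_tools(["prodigal", "hmmsearch"])` and abort with the returned names, while `choose_cpus` quietly degrades to sequential execution.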

Packaging of config file

Taken from #48:

Unfortunately my own config file was packaged into the Traitar source distribution, which is why it's looking for the aaron folder.

Is this something you can easily change in future releases?
We should ship a vanilla version of the Traitar software.

From my understanding, the config file gets created during/after the initial installation?
Then it shouldn't be a problem for new users, but maybe for users who update Traitar...

-c leads to "Unknown option" message

Hi,

I'm not sure "-c" for using multiple processors is working as expected. There are two "Unknown option" messages appearing:

bach@serendipity:~/traitar/test$ traitar phenotype indir samples.txt from_genes outdir -c 12
running_ annotation with hmmer. This step can take a while. A rough estimate for sequential Pfam annotation of genome samples of ~3 Mbs is 10 min per genome.
Unknown option: will-cite
Unknown option: will-cite

Best,
B.

PS: (this was on the "traitar-master" ZIP downloaded from github)

traitAR name

I can't help but think of traitAntibioticResistance when reading the name, especially given the current coolness of anything AR-related. What is AR in traitAR?

Doesn't work for me at the first step

lala@kw1322:~/projects/traitar_results$ traitar phenotype ../IMG_annotation/ samples.txt from_genes .
Traceback (most recent call last):
  File "/usr/local/bin/traitar", line 492, in <module>
    args.func(args)
  File "/usr/local/bin/traitar", line 32, in phenolyze
    p = Traitar(args.input_dir, args.output_dir, args.sample2file, args.cpus, args.rearrange_heatmap, args.heatmap_format, args.no_heatmap_phenotype_clustering, args.no_heatmap_sample_clustering, args.gene_gff_type, args.primary_models, args.secondary_models)
  File "/usr/local/bin/traitar", line 71, in __init__
    self.s2f = self.parse_sample_f()
  File "/usr/local/bin/traitar", line 142, in parse_sample_f
    if len(i) > 30:
TypeError: object of type 'float' has no len()
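
A `float` where a string is expected is what pandas produces for an empty cell (`NaN`), so one plausible cause, not confirmed in the issue, is an empty or missing field in the sample file. Below is a hedged, standard-library sketch that reports such rows before the pipeline starts. The function name and the assumption of a tab-separated file with a header row are illustrative:

```python
import csv


def find_incomplete_rows(sample_path):
    """Return 1-based line numbers of rows with a missing or empty field."""
    bad = []
    with open(sample_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        for row in reader:
            if not row:
                # Entirely blank lines are skipped, not reported.
                continue
            # A short row or an empty cell would be read as NaN by pandas.
            if len(row) < len(header) or any(not cell.strip() for cell in row):
                bad.append(reader.line_num)
    return bad
```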