artpoon / gotoh2

Lightweight and customizable Python/C extension for pairwise alignment of genetic sequences using the Gotoh algorithm

License: GNU Affero General Public License v3.0

Languages: C 34.66%, Python 65.28%, Shell 0.06%

Contributors

artpoon, rouxcil

gotoh2's Issues

Can't load module from package root

If I launch the Python interpreter from ~/git/gotoh2, I get the following result:

>>> from gotoh2 import aligner
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "gotoh2/aligner.py", line 1, in <module>
    import Cgotoh2
ImportError: No module named Cgotoh2

However, the test script works fine, and if I launch the interpreter elsewhere things seem to be fine.
I must be doing something dumb with the setup script...
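
One way to narrow this down, offered as a sketch rather than a diagnosis: in an interactive session the first entry of sys.path is the empty string (the current directory), and the traceback above shows the relative path gotoh2/aligner.py, so the source checkout may be shadowing the installed package while the compiled Cgotoh2 extension lives elsewhere. The helper name where_is below is hypothetical, and the snippet assumes Python 3.4 or later for importlib.util.find_spec.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # In an interactive session sys.path[0] is '', i.e. the current directory,
    # so a source checkout in the working directory is searched first.
    print("first search path entry:", repr(sys.path[0]))
    for name in ("gotoh2", "Cgotoh2"):
        print(name, "->", where_is(name))
```

Running this once from ~/git/gotoh2 and once from another directory should show whether the two names resolve to different locations.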

Benchmarking script fails with unused matrices

Encountered this when running through the integrase sequences in IN.txt for benchmark testing.
This exception arose with the sequence at index 26 (here I've hard-coded it for testing):

Traceback (most recent call last):
  File "benchmark.py", line 16, in <module>
    g2.align(ref, seqs[26])
  File "/usr/local/lib/python2.7/dist-packages/gotoh2-0.1-py2.7-linux-x86_64.egg/gotoh2/aligner.py", line 73, in align
    self.matrix
RuntimeError: Traceback failed, try local alignment

This usually means that the traceback failed (returning a NULL value as alignment score) but this routine doesn't print the usual debugging statement. I've added a couple of print statements and found that the cost and bits matrices haven't been modified from their initial values.

Missing file U54771.txt for unit test

Looks like another file got missed. I've been happy with TravisCI as a way to make sure the test suite has all its dependencies in git. You just add a config file to call your tests and then register the repository with TravisCI. It runs the tests after every commit and sends you an e-mail if you broke something.

$ . runTests.sh 
.......E
======================================================================
ERROR: test_issue6 (test.TestIssues)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/data/don/git/gotoh2/tests/test.py", line 87, in test_issue6
    with open('U54771.txt') as f:
IOError: [Errno 2] No such file or directory: 'U54771.txt'

----------------------------------------------------------------------
Ran 8 tests in 0.334s

FAILED (errors=1)

Convert_fasta doesn't strip \r line terminators

This would probably be a quick fix (gotoh2_utils.py line 90), changing

sequence += line.strip('\n').upper()

to

sequence += line.strip('\n').strip('\r').upper()

Current behaviour, with \r left in both the header and the sequence:

>>> handle = open(cwd+'/data/weeklydumps/baseline/GISAID-0417_0508.fasta')
>>> fasta=convert_fasta(handle)
>>> handle.close()
>>> fasta[0]
['hCoV-19/Australia/NT12/2020|EPI_ISL_426900|2020-03-25\r', ''NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAA\rAA........
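
A minimal sketch of a parser that tolerates LF, CRLF, or stray CR characters is shown below. It is not the actual convert_fasta from gotoh2_utils.py; the name parse_fasta and the [header, sequence] return shape are assumptions based on the output above. The key point is rstrip('\r\n'), which removes any trailing mix of the two characters in one call.

```python
import io


def parse_fasta(handle):
    """Return a list of [header, sequence] pairs from an open FASTA handle."""
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.rstrip('\r\n')  # drops LF, CRLF, or a trailing CR
        if line.startswith('>'):
            if header is not None:
                records.append([header, ''.join(chunks)])
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append([header, ''.join(chunks)])
    return records


# Illustrative input with Windows-style line endings (made-up records).
demo = io.StringIO(">seq1\r\nacgt\r\nAA\r\n>seq2\r\nTT\r\n")
print(parse_fasta(demo))
```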

Aligner failed on an HIV RT sequence

Accession number is AY812749

Traceback (most recent call last):
  File "script.py", line 104, in <module>
    # re-align the protein sequences to prevent gaps from breaking codons
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/gotoh2-0.1-py3.5-macosx-10.12-x86_64.egg/gotoh2.py", line 84, in align
    self.matrix
RuntimeError: Traceback failed, try local alignment

Not working for sequences of about 100bp or more

To reproduce:

from gotoh2.aligner import Aligner
g2 = Aligner()
g2.is_global = False
s1 = "TTTTTAGATGGGATAGATAAAGCTCAAGAAGAACATGAAAGATATCACAGCAATTGGAGAGCAATGGCTAGTGATTTTAATCTGCCACCTATAGTAGCAA"
s2 = "TTTTTGGATGGAATAGATAAGGCTCAAGAAGAACATGAGAAATATCACAACAATTGGAGAGCAATGGCTAGTGATTTTAACCTACCACCCGTGGTAGCAA"

We can align slightly shorter sequences:

>>> g2.align(s1[:90], s2[:90])
('TTTTTAG-ATGGGA-TAGATAAAG-CTCAAGAAGAACATGA-AAGATATCACA-GCAATTGGAGAGCAATGGCTAGTGATTTTAATC-T-GCCACC-T', 'TTTTT-GGATGG-AATAGATAA-GGCTCAAGAAGAACATGAGAA-ATATCACAA-CAATTGGAGAGCAATGGCTAGTGATTTTAA-CCTA-CCACCC-', 380)

but we can't do all 100:

>>> g2.align(s1, s2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/gotoh2-0.1-py2.7-linux-x86_64.egg/gotoh2/aligner.py", line 73, in align
    self.matrix
RuntimeError: Traceback failed, try local alignment

Wrong path calculation

Working through some test cases, I've come across the following error. For sequences ACGT and ACT, match/mismatch scores of -5/+4 and gap open/extend penalties +5,+1, we get the following cost matrix:

    *   A   C   T
*   0   6   7   8 
A   6  -5   1   2 
C   7   1 -10  -4 
G   8   2  -4  -6 
T   9   3  -3  -9 

results in the following traceback:

  i j type
0 4 3 Diagonal
1 3 2 Vertical
2 2 2 Diagonal
3 1 1 Diagonal

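But the best path should be V, D, D, D (-9, -6, -10, -5, 0).

Update: no, it's not. Entering the final cell vertically from -6 would cost -6 + 6 = 0, whereas the diagonal step from -4 gives -4 - 5 = -9, so the reported traceback is consistent with the matrix.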
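
For checking cases like this by hand, here is a pure-Python sketch of the affine-gap recurrences. It is not the package's C code. It assumes costs are minimized and that a gap of length k costs gap_open + k * gap_extend, which is inferred from the first row (0 6 7 8) of the matrix above. The function and variable names are hypothetical.

```python
INF = float("inf")


def gotoh_cost_matrix(seq1, seq2, match=-5, mismatch=4, gap_open=5, gap_extend=1):
    """Fill the overall cost matrix for a global affine-gap alignment.

    Costs are minimized; a gap of length k costs gap_open + k * gap_extend.
    Rows follow seq1, columns follow seq2.
    """
    n, m = len(seq1), len(seq2)
    best = [[INF] * (m + 1) for _ in range(n + 1)]   # best cost of any kind
    vert = [[INF] * (m + 1) for _ in range(n + 1)]   # ends with a seq1 base over a gap
    horiz = [[INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap over a seq2 base

    best[0][0] = 0
    for i in range(1, n + 1):
        vert[i][0] = best[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        horiz[0][j] = best[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from the best cell or extend an existing one.
            vert[i][j] = min(best[i - 1][j] + gap_open + gap_extend,
                             vert[i - 1][j] + gap_extend)
            horiz[i][j] = min(best[i][j - 1] + gap_open + gap_extend,
                              horiz[i][j - 1] + gap_extend)
            step = match if seq1[i - 1] == seq2[j - 1] else mismatch
            best[i][j] = min(best[i - 1][j - 1] + step, vert[i][j], horiz[i][j])
    return best


for row in gotoh_cost_matrix("ACGT", "ACT"):
    print(" ".join("%3d" % value for value in row))
```

Under those assumptions the sketch reproduces the matrix quoted in this issue, with -9 in the bottom-right cell.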

Alignment sometimes fails when one sequence contains the other

I would expect that it should always be able to align one string against its substring. Even weirder, it's successful against a longer string.

Here's a script to reproduce the failure:

from gotoh2.aligner import Aligner


GAP_OPEN_COST = 10
V3LOOP_REF = ('TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT'
              'TCTATGCAACAGGAGACATAGTAGGAGATATAAGACAGGCACATTGT')


def main():
    seed_ref = ''.join([
        "GTACCCCACTCTGTGTTACTCCAAACTGCACGAATGATATCCGTACTACTGCTAACAGTACTAAG",
        "AACAACAGTAGTATTAGTAAAGAAATGATGAGTTGTTCTTTCAATATGACCACAGAAGTAAGAGA",
        "TAAGAAAGAGAAGGTAAATGCACTTTTTTATAAACTTGATATAGTACCACTTAATATTAGTTCGG",
        "GTAATAATAATAGCTCTGATGATAATAACAGTTCTGGTAAATATTATAGGTTAATAAATTGTAAT",
        "ACCTCAGCCGTAACACAGGCCTGTCCAAAAGTCTCTTTTGACCCAATTCCTATACATTATTGTGC",
        "TCCAGCGGGTTATGCGATTCTAAAGTGTAATAATAAGACCTTCAATGGAACAGGACCATGCAATA",
        "ATGTCAGCACAGTACAATGTACACATGGAATTAAACCAGTGGTATCGACTCAACTACTGTTAAAT",
        "GGTAGTCTAGCAGAAGAAGAAATAATAATTAGATCTCAAAATATAACAGACAATGTCAAAACAAT",
        "AATAGTACATCTTAATGAATCTGTAGAAATTAATTGCACAAGACCCAACAACAATACAAGAAAAA",
        "GTATAAGGATAGGACCAGGACAAGCATTCTATGCAACAGGAGACATAGTAGGAGATATAAGACAG",
        "GCACATTGTAACATTAGTAGAACAGCATGGAACAAAACCTTACAAAGAGTAAGTAAAAGATTATC",
        "AGAGTACTTCCCTAATAAAACAATAAAATTTGAAAGACACTCAGGAGGAGACCTAGAAATTACAA",
        "CACATAGCTTTAATTGTAGAGGAGAATTTTTCTATTGCAATACATCAAGCCTGTTTAATAGTGAA",
        "TTAGACAGTAATGGTACATTCAAAATTAATGGGACAGAAAATGGAACTGGAACAGAAAATTCAAA",
        "CATCACACTCCAATGCAGAATAAAACAAATTATAAACATGTGGCAGGAGGTAGGACGAGCAATGT",
        "ATGCCCCTCCCATTGCAGGAGAAATAACATGTAGATCAAATATCACAGGATTACTACTAACAAGG",
        "GATGGAGGAGACACGAGTGACGAGATATTCAGGCCTGGAGGAGGAGATATGAGGGACAATTGGAG"])
    aligner = Aligner()
    aligner.gap_open_penalty = GAP_OPEN_COST
    print(V3LOOP_REF in seed_ref)
    _, _, score = aligner.align(V3LOOP_REF, seed_ref)
    print(score)
    seed_ref = seed_ref[450:]
    print(V3LOOP_REF in seed_ref)
    try:
        _, _, score = aligner.align(V3LOOP_REF, seed_ref)
        print(score)
    except RuntimeError as ex:
        print(ex)  # ex.message is not available in Python 3

main()

Results:

True
-495
True
Traceback failed, try local alignment

This reference is part of HIV1-C-BR-JX140663 that we've been using to map reads for our G2P algorithm. Related to this: the other Gotoh library we use doesn't seem to penalize big gaps at the end. I don't know if that's a problem or not.

Build failed: can't find Biopp.csv

I cloned the repo and tried to run the setup. It complains about the Biopp.csv file that's listed as a data file in setup.py.

Here's the end of the installation output:

running install_data
creating build/bdist.linux-x86_64/egg/gotoh2/models
copying gotoh2/models/HYPHY_NUC.csv -> build/bdist.linux-x86_64/egg/gotoh2/models/
copying gotoh2/models/NWALIGN.csv -> build/bdist.linux-x86_64/egg/gotoh2/models/
error: can't copy 'gotoh2/models/Biopp.csv': doesn't exist or not a regular file

Calling align() causes crash

To reproduce:

>>> from gotoh2.aligner import Aligner
>>> al = Aligner()
>>>al.align('ACGT', 'ACT')  # unit test
my_settings.alphabet = ACGT?
('ACGT', 'AC-T', 13)
>>> al.align('ACGT', 'ACGTTTTTT')
my_settings.alphabet = ACGT?
*** Error in `python': free(): corrupted unsorted chunks: 0x000000000161e5f0 ***

Refactor broke traceback

Original motivation for issue #1.
I've substituted a C struct for the alignment matrices being passed around among the core functions, which tidies things up a fair amount. For the first unit test, the R matrix is computed correctly:

0 6 7 8 
6 -5 1 2 
7 1 -10 -4 
8 2 -4 -6 
9 3 -3 -9 

but the traceback is wrong:

alen i j type
0 4 3 Vertical
1 3 3 Diagonal
2 2 2 D
3 1 1 D

ImportError under Python 3

Feel free to close this if you just want to target Python 2.

I got farther with the current version than I did last time I tried: the installation now works. However, it looks like the module can't be imported.

I tried running the tests under Python 3, and after commenting out the broken print statements, I got this error:

E
======================================================================
ERROR: test (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test
Traceback (most recent call last):
  File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/mnt/data/don/git/gotoh2/tests/test.py", line 2, in <module>
    from gotoh2.aligner import Aligner
  File "/home/don/v3scratch/lib/python3.5/site-packages/gotoh2-0.1-py3.5-linux-x86_64.egg/gotoh2/aligner.py", line 1, in <module>
    import Cgotoh2
ImportError: No module named 'Cgotoh2'


----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

I would guess that you are a victim of the Python 3 changes to module initialization.

Missing NL4-3.txt file in unit tests

This is similar to issue #8: you're missing another file. You also still have the print output in your tests.

...
---AAAGGG---
TTAAAAGGGGTT
-14
..E.
======================================================================
ERROR: test_pol (test.TestHIV)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/data/don/git/gotoh2/tests/test.py", line 73, in test_pol
    with open('NL4-3.txt', 'rU') as f:
IOError: [Errno 2] No such file or directory: 'NL4-3.txt'

----------------------------------------------------------------------
Ran 7 tests in 0.002s

FAILED (errors=1)

In the test, it looks like you're just reading a single line from each of those two files. It might be easier to include two string literals in the test, so you don't have to keep track of the files.

Failing simple alignment with gop=10 (default) but not with gop=8

from gotoh2 import Aligner, map_coordinates

g1 = Aligner(gop=10)  # default gop=10
g2 = Aligner(gop=8)   

ref   = 'TACGTA'
query = 'TACTA'  # G removed
a1 = g1.align(ref, query)
a2 = g2.align(ref, query)
print(a1)
print(a2)

('TACGTA', 'TACT-A', 14)
('TACGTA', 'TAC-TA', 16)

Implement local alignment

This is already half done (the top row and left-most column are zeroed out). The next step is to locate the minimum value along the bottom row and right-most column to find the start point of the alignment (global alignment defaults to M,N, where M is the length of sequence 1 and N is the length of sequence 2).
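
The search described above could look roughly like the following. This is a sketch in Python rather than the C core, it assumes the cost matrix is a list of rows with lower values being better, and the name find_traceback_start is hypothetical. Ties are resolved in favour of the first candidate visited, which is an arbitrary choice.

```python
def find_traceback_start(cost):
    """Return (i, j) of the lowest cost in the bottom row or right-most column."""
    last_i = len(cost) - 1
    last_j = len(cost[0]) - 1
    # Bottom row first, then the right-most column (corner already included).
    candidates = [(last_i, j) for j in range(last_j + 1)]
    candidates += [(i, last_j) for i in range(last_i)]
    return min(candidates, key=lambda ij: cost[ij[0]][ij[1]])


# Made-up matrix for illustration; the minimum (-10) sits in the last column.
demo = [
    [0, 0, 0],
    [0, -5, -10],
    [0, 1, -4],
]
print(find_traceback_start(demo))
```

For a global alignment the function would simply be skipped and the traceback would start at (M, N).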

Test failed, can't find HXB2-RT.txt

I removed the Biopp.csv requirement from setup.py to see if things would work without it. The install worked, but one of the tests failed.

~/git/gotoh2$ . runTests.sh 
...
---AAAGGG---
TTAAAAGGGGTT
16
..E.
======================================================================
ERROR: test_pol (test.TestHIV)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/data/don/git/gotoh2/tests/test.py", line 71, in test_pol
    with open('HXB2-RT.txt', 'rU') as f:
IOError: [Errno 2] No such file or directory: 'HXB2-RT.txt'

----------------------------------------------------------------------
Ran 7 tests in 0.002s

FAILED (errors=1)

You've also got some debugging output polluting your unit test report.

Unexpected behaviour around terminal gaps

I was trying to understand the scoring when a read is a small portion of the reference, and I found some strange behaviour in both Gotoh2 and the Gotoh library we use in MiCall. It looks like Gotoh2 is forcing the first nucleotides of each sequence to align, instead of allowing a large gap at the start. Strangely, it's not symmetrical: Gotoh2 doesn't force the last nucleotides to align.

So I think there's a bug with forcing the first nucleotides to align, but I also have an enhancement request to allow terminal gaps without cost. I would like to be able to use a large reference to score a bunch of small reads, and make the length of the reference not affect the score, only the length of the read and the number of matches, substitutions, and internal gaps. Feel free to pull that into a separate issue, or just decline that part if it's not useful for you.
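
To make the enhancement request concrete, here is a sketch of the scoring I have in mind, applied to an alignment that has already been produced. It is not how either library scores. The function name is hypothetical, higher scores are better, the match value of 5 is inferred from the 290 score for 58 identical bases below, the mismatch value of -4 is an assumption, and a gap of length k is assumed to cost gap_open + k * gap_extend. Only the columns spanned by the read are scored, so gaps before its first base and after its last base cost nothing.

```python
def score_without_terminal_gaps(aligned_ref, aligned_read,
                                match=5, mismatch=-4, gap_open=10, gap_extend=3):
    """Score an alignment, ignoring columns outside the span of the read."""
    read_cols = [k for k, base in enumerate(aligned_read) if base != '-']
    if not read_cols:
        return 0
    score = 0
    previous = None  # which sequence, if any, had a gap in the previous column
    for k in range(read_cols[0], read_cols[-1] + 1):
        ref_base, read_base = aligned_ref[k], aligned_read[k]
        if ref_base == '-':
            kind = 'ref'
        elif read_base == '-':
            kind = 'read'
        else:
            kind = None
            score += match if ref_base == read_base else mismatch
        if kind is not None:
            score -= gap_extend
            if kind != previous:
                score -= gap_open  # charged once per run of gaps
        previous = kind
    return score


# The same read against a longer and a shorter reference gives the same score.
print(score_without_terminal_gaps('TGCACAAGAC', '--CACAAG--'))
print(score_without_terminal_gaps('GCACAAGA', '-CACAAG-'))
```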

Here's the script I used:

try:
    from gotoh2.aligner import Aligner
except ImportError:
    Aligner = None

try:
    from gotoh import align_it
except ImportError:
    align_it = None


GAP_OPEN_COST = 10
GAP_EXTEND_COST = 3


def align(seq1, seq2, terminal_cost):
    if Aligner is not None:
        aligner = Aligner()
        aligner.gap_open_penalty = GAP_OPEN_COST
        aligner.gap_extend_penalty = GAP_EXTEND_COST
        return aligner.align(seq1, seq2)
    return align_it(seq1,
                    seq2,
                    GAP_OPEN_COST,
                    GAP_EXTEND_COST,
                    terminal_cost)


def display_alignment(seq1, seq2, terminal_cost):
    aseq1, aseq2, score = align(seq1, seq2, terminal_cost)
    print(score)
    print(aseq1)
    print(aseq2)
    print('')


def run_scenario(terminal_cost, is_reversed=False):
    if Aligner is not None:
        title = 'Gotoh2'
    else:
        title = 'Gotoh'
    ref = 'TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT'
    if is_reversed:
        title += ' reversed'
        ref = ''.join(reversed(ref))
    if terminal_cost == 0:
        title += ' without terminal cost'
    print(title)
    display_alignment(ref, ref, terminal_cost)
    display_alignment(ref, ref[1:-1], terminal_cost)
    display_alignment(ref, ref[2:-1], terminal_cost)
    display_alignment(ref, ref[2:-2], terminal_cost)
    display_alignment(ref, ref[10:-10], terminal_cost)
    display_alignment(ref[1:-1], ref[10:-10], terminal_cost)
    display_alignment(ref[2:-1], ref[10:-10], terminal_cost)
    display_alignment(ref[2:-2], ref[10:-10], terminal_cost)



def main():
    run_scenario(terminal_cost=1)
    run_scenario(terminal_cost=1, is_reversed=True)
    run_scenario(terminal_cost=0)


main()

Here are the results for Gotoh2:

Gotoh2
290
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT

254
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
G-CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA-

246
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
C--ACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA-

238
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
C--ACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGC--

110
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
C---------C-AACAACAATACAAGAAAAAGTATAAGGATAGGACCA----------

116
GCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA
CCA---------ACAACAATACAAGAAAAAGTATAAGGATAGGACCA---------

119
CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA
CC-AA-------CAACAATACAAGAAAAAGTATAAGGATAGGACCA---------

122
CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGC
C-CAA-------CAACAATACAAGAAAAAGTATAAGGATAGGACCA--------

Gotoh2 reversed
290
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT

254
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
A-CGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACG-

246
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
C--GAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACG-

238
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
C--GAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACAC--

110
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
A-C---------CAGGATAGGAATATGAAAAAGAACATAACAACAACC----------

116
ACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACG
ACCA---GGA------TAGGAATATGAAAAAGAACATAACAACAACC---------

119
CGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACG
A---C-----CAGGATAGGAATATGAAAAAGAACATAACAACAACC---------

122
CGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACAC
A---C-----CAGGATAGGAATATGAAAAAGAACATAACAACAACC--------

Gotoh2 without terminal cost
290
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT

254
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
G-CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA-

246
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
C--ACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA-

238
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
C--ACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGC--

110
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
C---------C-AACAACAATACAAGAAAAAGTATAAGGATAGGACCA----------

116
GCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA
CCA---------ACAACAATACAAGAAAAAGTATAAGGATAGGACCA---------

119
CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA
CC-AA-------CAACAATACAAGAAAAAGTATAAGGATAGGACCA---------

122
CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGC
C-CAA-------CAACAATACAAGAAAAAGTATAAGGATAGGACCA--------

The original Gotoh has its own quirks. I don't understand why the score goes up from 290 to 293 when the read gets two bases shorter if the terminal cost is zero. I don't understand why shortening the reference sometimes reduces the score when the terminal cost is zero.

Here are the results when I run with the original Gotoh:

Gotoh
290
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT

280
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
-GCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA-

275
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
--CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA-

270
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
--CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGC--

190
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
----------CCAACAACAATACAAGAAAAAGTATAAGGATAGGACCA----------

190
GCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA
---------CCAACAACAATACAAGAAAAAGTATAAGGATAGGACCA---------

190
CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA
--------CCAACAACAATACAAGAAAAAGTATAAGGATAGGACCA---------

190
CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGC
--------CCAACAACAATACAAGAAAAAGTATAAGGATAGGACCA--------

Gotoh reversed
290
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT

280
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
-ACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACG-

275
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
--CGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACG-

270
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
--CGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACAC--

190
TACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACGT
----------ACCAGGATAGGAATATGAAAAAGAACATAACAACAACC----------

190
ACGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACG
---------ACCAGGATAGGAATATGAAAAAGAACATAACAACAACC---------

190
CGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACACG
--------ACCAGGATAGGAATATGAAAAAGAACATAACAACAACC---------

190
CGAACAGGACCAGGATAGGAATATGAAAAAGAACATAACAACAACCCAGAACAC
--------ACCAGGATAGGAATATGAAAAAGAACATAACAACAACC--------

Gotoh without terminal cost
290
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT

293
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
-GCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA-

291
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
--CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA-

286
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
--CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGC--

230
TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT
----------CCAACAACAATACAAGAAAAAGTATAAGGATAGGACCA----------

227
GCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA
---------CCAACAACAATACAAGAAAAAGTATAAGGATAGGACCA---------

224
CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCA
--------CCAACAACAATACAAGAAAAAGTATAAGGATAGGACCA---------

224
CACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGC
--------CCAACAACAATACAAGAAAAAGTATAAGGATAGGACCA--------

Strange bit matrix behaviour with local alignment

If we run:

        self.g2.set_model('HYPHY_NUC')
        self.g2.gap_open_penalty = 5
        self.g2.gap_extend_penalty = 1
        self.g2.is_global = False
        result = self.g2.align('AT', 'ATTTTT')
        expected = ('AT----', 'ATTTTT', 10)
        self.assertEqual(expected, result)

then the result is fine. If we append a T then we throw an exception:

traceback failed: i=2 j=2 bit=64
RuntimeError: Traceback failed, try local alignment

where I've added extra debugging text

Failed to build on macOS Catalina

In file included from /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8/Python.h:11:
In file included from /Library/Developer/CommandLineTools/usr/lib/clang/12.0.0/include/limits.h:21:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/limits.h:63:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/cdefs.h:807:2: error: 
      Unsupported architecture
#error Unsupported architecture
 ^
In file included from src/_gotoh2.c:10:
In file included from /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8/Python.h:11:
In file included from /Library/Developer/CommandLineTools/usr/lib/clang/12.0.0/include/limits.h:21:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/limits.h:64:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/machine/limits.h:8:2: error: 
      architecture not supported
#error architecture not supported
 ^
In file included from src/_gotoh2.c:10:
In file included from /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8/Python.h:25:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:64:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:71:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_types.h:27:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:33:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/machine/_types.h:34:2: error: 
      architecture not supported
#error architecture not supported
 ^
In file included from src/_gotoh2.c:10:
In file included from /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8/Python.h:25:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:64:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:71:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_types.h:27:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:55:9: error: 
      unknown type name '__int64_t'
typedef __int64_t       __darwin_blkcnt_t;      /* total blocks */
        ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:56:9: error: 
      unknown type name '__int32_t'; did you mean '__int128_t'?
typedef __int32_t       __darwin_blksize_t;     /* preferred block size */
        ^
note: '__int128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:57:9: error: 
      unknown type name '__int32_t'; did you mean '__int128_t'?
typedef __int32_t       __darwin_dev_t;         /* dev_t */
        ^
note: '__int128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:60:9: error: 
      unknown type name '__uint32_t'; did you mean '__uint128_t'?
typedef __uint32_t      __darwin_gid_t;         /* [???] process and gro...
        ^
note: '__uint128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:61:9: error: 
      unknown type name '__uint32_t'; did you mean '__uint128_t'?
typedef __uint32_t      __darwin_id_t;          /* [XSI] pid_t, uid_t, o...
        ^
note: '__uint128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:62:9: error: 
      unknown type name '__uint64_t'
typedef __uint64_t      __darwin_ino64_t;       /* [???] Used for 64 bi...
        ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:68:9: error: 
      unknown type name '__darwin_natural_t'
typedef __darwin_natural_t __darwin_mach_port_name_t; /* Used by mach */
        ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:70:9: error: 
      unknown type name '__uint16_t'; did you mean '__uint128_t'?
typedef __uint16_t      __darwin_mode_t;        /* [???] Some file attributes */
        ^
note: '__uint128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:71:9: error: 
      unknown type name '__int64_t'
typedef __int64_t       __darwin_off_t;         /* [???] Used for file sizes */
        ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:72:9: error: 
      unknown type name '__int32_t'; did you mean '__int128_t'?
typedef __int32_t       __darwin_pid_t;         /* [???] process and gro...
        ^
note: '__int128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:73:9: error: 
      unknown type name '__uint32_t'; did you mean '__uint128_t'?
typedef __uint32_t      __darwin_sigset_t;      /* [???] signal set */
        ^
note: '__uint128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:74:9: error: 
      unknown type name '__int32_t'; did you mean '__int128_t'?
typedef __int32_t       __darwin_suseconds_t;   /* [???] microseconds */
        ^
note: '__int128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:75:9: error: 
      unknown type name '__uint32_t'; did you mean '__uint128_t'?
typedef __uint32_t      __darwin_uid_t;         /* [???] user IDs */
        ^
note: '__uint128_t' declared here
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types.h:76:9: error: 
      unknown type name '__uint32_t'; did you mean '__uint128_t'?
typedef __uint32_t      __darwin_useconds_t;    /* [???] microseconds */
        ^
note: '__uint128_t' declared here
In file included from src/_gotoh2.c:10:
In file included from /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8/Python.h:25:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:64:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:71:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_types.h:43:9: error: 
      unknown type name '__uint32_t'; did you mean '__uint128_t'?
typedef __uint32_t      __darwin_wctype_t;
        ^
note: '__uint128_t' declared here
In file included from src/_gotoh2.c:10:
In file included from /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8/Python.h:25:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:64:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:75:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_va_list.h:31:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/machine/types.h:37:2: error: 
      architecture not supported
#error architecture not supported
 ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
error: command 'xcrun' failed with exit status 1

Problems installing on Langley (Ubuntu 16.04.6 LTS)

running install
Checking .pth file support in /usr/local/lib/python3.5/dist-packages/
/usr/bin/python3 -E -c pass
TEST PASSED: /usr/local/lib/python3.5/dist-packages/ appears to support .pth files
running bdist_egg
running egg_info
writing top-level names to gotoh2.egg-info/top_level.txt
writing gotoh2.egg-info/PKG-INFO
writing dependency_links to gotoh2.egg-info/dependency_links.txt
reading manifest file 'gotoh2.egg-info/SOURCES.txt'
writing manifest file 'gotoh2.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
failed to import Cython: /usr/local/lib/python3.5/dist-packages/Cython/Compiler/Scanning.so: undefined symbol: _Py_ZeroStruct
error: Cython does not appear to be installed
