Giter Site home page Giter Site logo

Comments (5)

benmwebb avatar benmwebb commented on August 15, 2024

Maybe I don't understand you completely - could you come up with a self-contained example (perhaps as a gist) that reproduces the problem? You can't have gaps in seq_id by construction - in mmCIF seq_id always starts at 1 and is contiguous, and indexes into the entity sequence. If there are gaps in the PDB numbering you either need to add the missing residues to the entity sequence (and then just not provide coordinates for them) or map the non-contiguous PDB numbering to contiguous mmCIF numbering, using auth_seq_id_map - see https://python-ihm.readthedocs.io/en/latest/usage.html#residue-numbering

from python-ihm.

mtrellet avatar mtrellet commented on August 15, 2024

Thanks for your reply Ben!

You were right, the issue comes from the mapping indeed, I've just put the starting shift in the auth_seq_id_map and not the full mapping. See: https://gist.github.com/mtrellet/5926bb29e8f3952acf3eace7676af44f/919c4fd98f98f45011a20522d494501343c7c21a (1st revision)

I've just tested with a list containing the numbering gap instead:
See: https://gist.github.com/mtrellet/5926bb29e8f3952acf3eace7676af44f/b5b0c8ef4ada3265eed7cbf6936b76b21f3f823e (2nd revision)

The list used as auth_seq_id_map (gap in bold):

[5, 6, 7, 8, 9, 10, 11 ... 1383, 1384, 1385, 1506, 1507 ... 1532, 1533, 1534]

However, two issues at first glance:

  1. I get a shift in the _pdbx_poly_seq_scheme.pdb_seq_num field of my mmCIF file that should start at 5 (see list above).
_pdbx_poly_seq_scheme.asym_id
_pdbx_poly_seq_scheme.entity_id
_pdbx_poly_seq_scheme.seq_id
_pdbx_poly_seq_scheme.mon_id
_pdbx_poly_seq_scheme.pdb_seq_num
_pdbx_poly_seq_scheme.auth_seq_num
_pdbx_poly_seq_scheme.pdb_mon_id
_pdbx_poly_seq_scheme.auth_mon_id
_pdbx_poly_seq_scheme.pdb_strand_id
A 1 1 U 6 6 U U A
A 1 2 G 7 7 G G A
A 1 3 A 8 8 A A A
A 1 4 A 9 9 A A A

If I traceback the writer function (_PolySeqSchemeDumper.dump) I can see that the auth_seq_num (where the error occurs in the mmCIF file) is set with the following line:
auth_seq_num = asym._get_auth_seq_id(num+1)
Removing the +1 outputs the proper _pdbx_poly_seq_scheme.pdb_seq_num. However, this also outputs 0 as first pdb_seq_num when no auth_seq_id_map is provided... So there must be a better solution..!

  1. I still get the index out of range error at the same step than before.

I've printed the seq_id_range of my AsymUnits after initializing them with the auth_seq_id_map and it seems fine. My guess is that the error I get when dumping the system is that the get_atoms function of my MyModel class is overwriting the seq_id and then introduces the gap in seq_id. And it later on leads to the error I reported in my first post. So I guess I need to modify the get_atoms() function to not reassign the seq_id field. However, since it is supposed to be a contiguous sequence starting at 1, why not setting it up as an iterable counter? This should not be possible to overwrite it, or am I getting that wrong again?

Thanks again for your time!

from python-ihm.

benmwebb avatar benmwebb commented on August 15, 2024

In both cases, you need to use seq_id (not auth_seq_id or "PDB" numbering, e.g. r.id[1]) when referencing any IHM object - e.g. when you create your AsymUnitRange or Atom objects. One way to achieve this would be to keep a reverse mapping.

e.g. in your second case replace

seqA_ids = []

for r in s.get_residues():
    if r.get_parent().id == "A" and r.resname != "HOH":
        seqA.append(r.resname.strip())
        seqA_ids.append(r.id[1])

with

seq_id_from_pdb_A = {}
pdb_from_seq_id_A = {}

seq_id = 0
for r in s.get_residues():
    if r.get_parent().id == "A" and r.resname != "HOH":
        seq_id += 1
        seqA.append(r.resname.strip())
        seq_id_from_from_pdb_A[r.id(1)] = seq_id
        pdb_from_seq_id_A[seq_id] = r.id(1)

Then use seq_id_from_pdb_A[r.id(1)] when making your Atom or AsymUnitRange objects, and pass pdb_from_seq_id_A as auth_seq_id_map to the AsymUnit constructor.

The list used as auth_seq_id_map (gap in bold):

[5, 6, 7, 8, 9, 10, 11 ... 1383, 1384, 1385, 1506, 1507 ... 1532, 1533, 1534]

auth_seq_id_map is used as a map, i.e. auth_seq_id = auth_seq_id_map[seq_id]. Since Python numbering starts at 0 while mmCIF numbering starts at 1, if you want to use a list or tuple here you'll want to add a null entry at the start of the list (since there will never be a seq_id = 0 used to index that value). That's why your numbering ends up off by 1. I'll add some clarification to the API docs here.

My guess is that the error I get when dumping the system is that the get_atoms function of my MyModel class is overwriting the seq_id and then introduces the gap in seq_id

Right, it's because you're using the PDB numbering, not seq_id, here:

atomsA.append((c.id, r.id[1], a.name[0], a.name, a.coord[0], a.coord[1], a.coord[2]))

This should be

atomsA.append((c.id, seq_id_from_pdb_A[[r.id[1]], a.name[0], a.name, a.coord[0], a.coord[1], a.coord[2]))

However, since it is supposed to be a contiguous sequence starting at 1, why not setting it up as an iterable counter?

That wouldn't work because seq_id is per-residue, and get_atoms() could return one, multiple, or no atoms for each residue (your sequence has to be contiguous, but you don't have to have coordinates for every residue).

from python-ihm.

mtrellet avatar mtrellet commented on August 15, 2024

auth_seq_id_map is used as a map, i.e. auth_seq_id = auth_seq_id_map[seq_id]. Since Python numbering starts at 0 while mmCIF numbering starts at 1, if you want to use a list or tuple here you'll want to add a null entry at the start of the list (since there will never be a seq_id = 0 used to index that value). That's why your numbering ends up off by 1. I'll add some clarification to the API docs here.

Yes indeed, that what I was tempted to do (start with a null in my list) I just was not sure it was a "good practice" and that a neater way could exist. And actually a dictionary is way better (also for genericity purposes).

That wouldn't work because seq_id is per-residue, and get_atoms() could return one, multiple, or no atoms for each residue (your sequence has to be contiguous, but you don't have to have coordinates for every residue).

Very true, forgot this "detail". Thanks for the explanation!

Just to summarize, everything worked fine with the new auth_seq_id_map, thanks a lot for your complete answer! I updated the gist in case someone is interested...

And another (almost) related question: Do we plan to output the _atom_site.auth_seq_id records or this would be too much redundant with the _pdbx_poly_seq_scheme.auth_seq_num? I know we've done it in the model we deposited on PDB-dev for instance.

from python-ihm.

benmwebb avatar benmwebb commented on August 15, 2024

Do we plan to output the _atom_site.auth_seq_id records

I don't have the library output this because (as you say) it's redundant, and I prefer not to make the files any larger than they have to be. But it would be easy enough to add if needed.

from python-ihm.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.