haddocking / haddock-tools

Set of useful HADDOCK utility scripts

License: Apache License 2.0

Python 89.47% Makefile 0.35% C++ 4.35% Shell 1.68% Awk 4.16%
haddock computational-structural-biology integrative-modeling python structural-biology utrecht-university

haddock-tools's Introduction

haddock-tools



Important❗

Please note that most scripts here are outdated!

While some of them might still work, it is recommended to check our utilities currently in development, such as:



A set of useful HADDOCK utility scripts, which require Python 3.7+.

About

This is a collection of scripts useful for pre- and post-processing and analysis for HADDOCK runs. Requests for new scripts will be taken into consideration, depending on the effort and general usability of the script.

Installation

Download the zip archive or clone the repository with git. Cloning is the recommended option, as it makes getting updates very simple.

# To download
git clone https://github.com/haddocking/haddock-tools

# To compile the executables
cd haddock-tools
make

# To update
cd haddock-tools && git pull origin master

Scripts

Restraints-related

passive_from_active.py

A Python script that derives a list of passive residues from a PDB file and a list of active residues. Unless a surface list is provided, it automatically calculates the surface residues from the PDB file in order to filter out buried residues. By default, neighbors of the active residues are searched within 6.5 Angstroms, and surface residues are those whose relative side-chain or main-chain accessibility is above 15%.

Requirements:

pip install freesasa

pip install biopython

Usage:

./passive_from_active.py [-h] [-c CHAIN_ID] [-s SURFACE_LIST]
                              pdb_file active_list

positional arguments:
  pdb_file              PDB file
  active_list           List of active residues IDs (int) separated by commas

optional arguments:
  -h, --help            show this help message and exit
  -c CHAIN_ID, --chain-id CHAIN_ID
                        Chain id to be used in the PDB file (default: All)
  -s SURFACE_LIST, --surface-list SURFACE_LIST
                        List of surface residues IDs (int) separated by commas
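
As a rough illustration of the selection described above, the sketch below keeps surface residues that lie within 6.5 Angstroms of any active residue. It is not the script's implementation: the function name and the `coords` and `accessibility` inputs are assumptions made for the example, and the atom-to-atom distance criterion is a guess at how "neighbors" are defined. The real script computes accessibility from the PDB file with freesasa.

```python
import math

def _dist(a, b):
    """Euclidean distance between two (x, y, z) tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def passive_from_active(coords, accessibility, active, radius=6.5, cutoff=15.0):
    """Return passive residue IDs (hypothetical helper, not the script's API).

    coords: {resid: [(x, y, z), ...]} atom coordinates per residue (assumed input)
    accessibility: {resid: (side_chain_rel, main_chain_rel)} in percent (assumed input)
    """
    active = set(active)
    # Surface residue: side-chain OR main-chain relative accessibility above the cutoff
    surface = {res for res, (sc, mc) in accessibility.items() if sc > cutoff or mc > cutoff}
    active_atoms = [xyz for res in active for xyz in coords.get(res, [])]
    passive = []
    for res in sorted(surface - active):
        # Neighbor if any atom lies within `radius` of any atom of an active residue
        if any(_dist(a, b) <= radius for a in active_atoms for b in coords.get(res, [])):
            passive.append(res)
    return passive

coords = {
    1: [(0.0, 0.0, 0.0)],
    2: [(3.0, 0.0, 0.0)],   # close to residue 1 and exposed
    3: [(5.0, 0.0, 0.0)],   # close to residue 1 but buried
    4: [(20.0, 0.0, 0.0)],  # exposed but too far away
}
accessibility = {1: (40.0, 10.0), 2: (30.0, 5.0), 3: (2.0, 3.0), 4: (50.0, 50.0)}
passive = passive_from_active(coords, accessibility, active=[1])
print(",".join(str(r) for r in passive))
```

The comma-separated output mirrors the format the script expects for its own residue lists.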

active-passive_to_ambig.py

A Python script to create ambiguous interaction restraints for use in HADDOCK based on lists of active and passive residues (refer to the HADDOCK software page for more information).

Usage:

     ./active-passive_to_ambig.py <active-passive-file1> <active-passive-file2>

where <active-passive-file> is a file consisting of two space-delimited lines: the first line lists the active residue numbers and the second line the passive residue numbers. One file per input structure should thus be provided.
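
The two-line input format can be produced and checked with a few lines of Python. This is only a sketch of the file layout described above; the helper names and the temporary file name are made up for the example and are not part of the script.

```python
import os
import tempfile

def write_active_passive(path, active, passive):
    """Write line 1 = active residues, line 2 = passive residues, space-delimited."""
    with open(path, "w") as handle:
        handle.write(" ".join(str(r) for r in active) + "\n")
        handle.write(" ".join(str(r) for r in passive) + "\n")

def read_active_passive(path):
    """Read the file back into two lists of integers."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    # Pad so that a missing or empty passive line still yields two lists
    lines += [""] * (2 - len(lines))
    active = [int(tok) for tok in lines[0].split()]
    passive = [int(tok) for tok in lines[1].split()]
    return active, passive

# Hypothetical residue numbers, written to a throwaway file
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "protein1.act-pass")
write_active_passive(example_path, active=[12, 15, 48], passive=[10, 11, 50])
active, passive = read_active_passive(example_path)
print(active, passive)
```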

restrain_bodies.py

A Python script to create distance restraints that lock several chains together. Useful to avoid unnatural flexibility or movement due to sequence/numbering gaps during the refinement stage of HADDOCK.

Usage:

./restrain_bodies.py [-h] [--exclude EXCLUDE [EXCLUDE ...]] [--verbose] structures [structures ...]

  positional arguments:
    structures            PDB structures to restraint

  optional arguments:
    -h, --help            show this help message and exit
    --exclude EXCLUDE [EXCLUDE ...], -e EXCLUDE [EXCLUDE ...] Chains to exclude from the calculation
    --verbose, -v

restrain_ligand.py

Calculates distances between neighboring residues of a ligand molecule and produces a set of unambiguous distance restraints for HADDOCK to keep it in place during semi-flexible refinement. Produces, at most, one restraint per ligand atom.

Usage:

./restrain_ligand.py [-h] -l LIGAND [-p] pdbf

positional arguments:
  pdbf                  PDB file

optional arguments:
  -h, --help            show this help message and exit
  -l LIGAND, --ligand LIGAND
                        Ligand residue name
  -p, --pml             Write Pymol file with restraints
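
The "at most one restraint per ligand atom" idea can be sketched as follows. This is an assumption-laden illustration, not the script's algorithm: the tuple layout, the `cutoff` value, the nearest-atom selection rule and the exact selection syntax are all made up for the example. The trailing `1.0 1.0` follows what a user reports in the issues further down this page.

```python
import math

def ligand_restraints(protein_atoms, ligand_atoms, cutoff=10.0):
    """Build one CNS-style restraint line per ligand atom, or none if nothing is close.

    Atoms are tuples (segid, resi, atom_name, (x, y, z)); this layout is hypothetical.
    """
    restraints = []
    for l_seg, l_resi, l_name, l_xyz in ligand_atoms:
        best = None
        for p_seg, p_resi, p_name, p_xyz in protein_atoms:
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(l_xyz, p_xyz)))
            # Keep only the closest protein atom within the cutoff
            if d <= cutoff and (best is None or d < best[0]):
                best = (d, p_seg, p_resi, p_name)
        if best is None:
            continue  # no neighbor found: this ligand atom gets no restraint
        d, p_seg, p_resi, p_name = best
        restraints.append(
            "assign (segid {} and resi {} and name {}) "
            "(segid {} and resi {} and name {}) {:.3f} 1.0 1.0".format(
                p_seg, p_resi, p_name, l_seg, l_resi, l_name, d
            )
        )
    return restraints

protein = [("A", 10, "CA", (0.0, 0.0, 0.0)), ("A", 11, "CA", (5.0, 0.0, 0.0))]
ligand = [("B", 1, "C1", (1.0, 0.0, 0.0)), ("B", 1, "C2", (100.0, 0.0, 0.0))]
restraints = ligand_restraints(protein, ligand)
for line in restraints:
    print(line)
```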

haddock_tbl_validation

The validate_tbl.py script in that directory will check the correctness of your restraints (CNS format) for HADDOCK.

Usage:

usage: python validate_tbl.py [-h] [--pcs] file

This script validates a restraint file (*.tbl).

positional arguments:
  file        TBL file to be validated

  optional arguments:
    -h, --help  show this help message and exit
    --pcs       PCS mode

calc-accessibility.py

$ python3 haddock-CSB-tools/calc-accessibility.py -h
usage: calc-accessibility.py [-h] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--cutoff CUTOFF] pdb_input

positional arguments:
  pdb_input             PDB structure

optional arguments:
  -h, --help            show this help message and exit
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
  --cutoff CUTOFF       Relative cutoff for sidechain accessibility

$ python3 haddock-CSB-tools/calc-accessibility.py complex_1w.pdb --cutoff 0.4
02/11/2020 17:10:57 L157 INFO - Calculate accessibility...
02/11/2020 17:10:57 L228 INFO - Chain: A - 115 residues
02/11/2020 17:10:57 L228 INFO - Chain: B - 81 residues
02/11/2020 17:10:57 L234 INFO - Applying cutoff to side_chain_rel - 0.4
02/11/2020 17:10:57 L244 INFO - Chain A - 82,83,84,85,86,87,88,90,91,94,95,98,99,102,104,106,109,113,116,117,118,122,128,129,130,132,139,141,144,145,148,149,150,151,153,156,158,160,162,163,167,168,169,170,171,173,174,175,176,178,179,180,181,183,184,186,188,194,196
02/11/2020 17:10:57 L244 INFO - Chain B - 1,2,4,5,8,11,12,15,18,21,23,24,25,26,27,30,31,33,34,37,38,41,43,44,45,46,47,50,63,64,67,69,70,73,74,76,77,78,79,80,81

create_cif.py

Converts the cluster*.pdb files in a run directory to IHM mmCIF format

Warning: Limited functionality, still a work in progress! Tested for hetero-complexes with ambig restraints.

Requires ihm and biopython; install them with:

$ pip install ihm --install-option="--without-ext"
$ pip install biopython
$ python3 create_cif.py -h
usage: create_cif.py [-h] run_directory

positional arguments:
  run_directory  Location of the uncompressed run, ex:
                 /home/rodrigo/runs/47498-protein-protein

optional arguments:
  -h, --help     show this help message and exit


$ python3 haddock-CSB-tools/create_cif.py ~/projects/cif_parser/47518-cif
[23/02/2021 13:45:25] INFO Converting the cluster*.pdb structures to .cif
[23/02/2021 13:45:25] INFO Looking for models in /Users/rodrigo/projects/cif_parser/47518-cif
[23/02/2021 13:45:25] INFO Found 4 structures
[23/02/2021 13:45:25] INFO Looking for the tblfile field in /Users/rodrigo/projects/cif_parser/47518-cif/job_params.json
[23/02/2021 13:45:25] INFO tblfile field found, extracting information
[23/02/2021 13:45:25] INFO Converting /Users/rodrigo/projects/cif_parser/47518-cif/cluster1_1.pdb
[23/02/2021 13:45:26] INFO Saving as cluster1_1.cif
[23/02/2021 13:45:26] INFO Converting /Users/rodrigo/projects/cif_parser/47518-cif/cluster1_2.pdb
[23/02/2021 13:45:26] INFO Saving as cluster1_2.cif
[23/02/2021 13:45:27] INFO Converting /Users/rodrigo/projects/cif_parser/47518-cif/cluster1_3.pdb
[23/02/2021 13:45:27] INFO Saving as cluster1_3.cif
[23/02/2021 13:45:28] INFO Converting /Users/rodrigo/projects/cif_parser/47518-cif/cluster1_4.pdb
[23/02/2021 13:45:29] INFO Saving as cluster1_4.cif

PDB-related

contact-segid

A C++ program to calculate all heavy-atom interchain contacts (where the chain identification is taken from the segid) within a given distance cutoff in Angstroms.

Usage:

   contact-segid <pdb file> <cutoff>

contact-chainID

A C++ program to calculate all heavy-atom interchain contacts (where the chain identification is taken from the chainID) within a given distance cutoff in Angstroms.

Usage:

   contact-chainID <pdb file> <cutoff>
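
For readers who want to see the logic behind both contact programs, here is a minimal Python sketch of a heavy-atom interchain contact search. It is not a port of the C++ code: the function name, the returned tuple layout and the hydrogen test are assumptions for the example. The column positions follow the standard PDB ATOM record layout.

```python
import math

# Fixed-column ATOM record, used here only to build well-formed demo lines
ATOM_FMT = ("ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}      {:<4s}{:>2s}")

def heavy_atom_contacts(pdb_lines, cutoff, use_segid=False):
    """Return (chain1, resi1, atom1, chain2, resi2, atom2, distance) tuples."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        line = line.rstrip("\n").ljust(80)
        name = line[12:16].strip()
        element = line[76:78].strip()
        # Skip hydrogens: trust the element column, fall back to the atom name
        if element == "H" or (not element and name.lstrip("0123456789").startswith("H")):
            continue
        chain = line[72:76].strip() if use_segid else line[21]
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, int(line[22:26]), name, xyz))
    contacts = []
    for i, (c1, r1, n1, p1) in enumerate(atoms):
        for c2, r2, n2, p2 in atoms[i + 1:]:
            if c1 == c2:
                continue  # only interchain pairs are of interest
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))
            if d <= cutoff:
                contacts.append((c1, r1, n1, c2, r2, n2, round(d, 3)))
    return contacts

demo = [
    ATOM_FMT.format(1, " CA", "ALA", "A", 1, 0.0, 0.0, 0.0, 1.0, 0.0, "A", "C"),
    ATOM_FMT.format(2, " HA", "ALA", "A", 1, 1.0, 0.0, 0.0, 1.0, 0.0, "A", "H"),
    ATOM_FMT.format(3, " N", "GLY", "B", 2, 3.0, 0.0, 0.0, 1.0, 0.0, "B", "N"),
    ATOM_FMT.format(4, " O", "GLY", "B", 2, 10.0, 0.0, 0.0, 1.0, 0.0, "B", "O"),
]
contacts = heavy_atom_contacts(demo, cutoff=5.0)
print(contacts)
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a sketch but slow for large complexes.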

molprobity.py

A Python script to predict the protonation state of histidine residues for HADDOCK. It uses MolProbity for this, calling the Reduce software, which should be in the path.

Usage:

    ./molprobity.py <PDBfile>

Example:

./molprobity.py 1F3G.pdb
## Executing Reduce to assign histidine protonation states
## Input PDB: 1F3G.pdb
HIS ( 90 )	-->	HISD
HIS ( 75 )	-->	HISE

An optimized file is also written to disk; in this example it would be called 1F3G_optimized.pdb.

pdb_blank_chain

Simple perl script to remove the chainID from a PDB file

Usage:

    pdb_blank_chain inputfile > outputfile

pdb_blank_segid

Simple perl script to remove the segid from a PDB file

Usage:

    pdb_blank_segid inputfile > outputfile

pdb_blank_chain-segid

Simple perl script to remove both the chainID and segid from a PDB file

Usage:

    pdb_blank_chain-segid inputfile > outputfile

pdb_chain-to-segid

Simple perl script to copy the chainID to the segid in a PDB file

Usage:

    pdb_chain-to-segid inputfile > outputfile
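
The chainID and segid manipulations in this group of scripts all operate on fixed columns of the PDB ATOM/HETATM records: the chainID is column 22 and the segid occupies columns 73-76. A minimal Python sketch of the chain-to-segid copy, assuming standard column positions (the function name is hypothetical and this is not the script's code):

```python
def chain_to_segid(line):
    """Copy the chainID (column 22) into the segid field (columns 73-76)."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave all other records untouched
    newline = "\n" if line.endswith("\n") else ""
    body = line.rstrip("\n").ljust(80)
    chain = body[21]
    # 0-based slice 72:76 corresponds to columns 73-76
    body = body[:72] + chain.ljust(4) + body[76:]
    return body.rstrip() + newline

# Demo record built with fixed-width fields so the columns are guaranteed to line up
record = ("ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    "
          "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}").format(
    1, " CA", "ALA", "A", 1, 0.0, 0.0, 0.0, 1.0, 0.0
)
converted = chain_to_segid(record)
print(converted)
```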

pdb_segid-to-chain

Simple perl script to copy the segid to the chainID in a PDB file

Usage:

    pdb_segid-to-chain inputfile > outputfile

pdb_chain-segid

Simple perl script to copy the chainID to the segid in case the latter is empty (or vice versa) in a PDB file

Usage:

    pdb_chain-segid inputfile > outputfile

pdb_setchain

Simple perl script to set the chainID in a PDB file

Usage:

     pdb_setchain -v CHAIN=chainID inputfile > outputfile

joinpdb

Simple perl script to concatenate separate single-structure PDB files into a multi-model PDB file.

Usage:

     joinpdb  -o outputfile  [inputfiles]

    where inputfiles is a list of PDB files to be concatenated
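
A multi-model PDB file wraps each structure in MODEL/ENDMDL records. The sketch below shows that layout in Python; it is an illustration of the file format rather than a description of what joinpdb does internally, and the function name is hypothetical.

```python
def join_models(structures):
    """Combine the text of several single-structure PDB files into one multi-model file.

    structures: list of strings, each holding the content of one PDB file.
    """
    out = []
    for number, text in enumerate(structures, start=1):
        # MODEL record: the serial number is right-justified in columns 11-14
        out.append("MODEL     {:>4d}".format(number))
        for line in text.splitlines():
            # Drop END/ENDMDL/MODEL records from the inputs to avoid nested models
            if line.startswith(("END", "MODEL")):
                continue
            out.append(line)
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"

first = "REMARK structure 1\nATOM      1  CA  ALA A   1\nEND\n"
second = "REMARK structure 2\nATOM      1  CA  ALA A   1\nEND\n"
joined = join_models([first, second])
print(joined, end="")
```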

pdb_mutate.py

A Python script to mutate residues for HADDOCK. A mutation list file is used as input, and the output is the corresponding PDB file(s) of the mutant(s). Each mutation in the mutation list file has the format "PDBid ChainID ResidueID ResidueNameWT ResidueNameMut".

Usage:

    ./pdb_mutate.py <mutation list file>

Example:

./pdb_mutate.py mut_1A22.list

## In mut_1A22.list, residues 14, 18 and 21 in chain A will be mutated to ALA:
## 1A22.pdb A 14 MET ALA
## 1A22.pdb A 18 HIS ALA
## 1A22.pdb A 21 HIS ALA
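
The mutation list format is simple enough to parse by hand. The sketch below reads the five whitespace-separated fields into named tuples; the `Mutation` type and function name are made up for the example, and skipping blank or `#`-prefixed lines is an assumption rather than documented behavior of the script.

```python
from collections import namedtuple

# Field order follows "PDBid ChainID ResidueID ResidueNameWT ResidueNameMut"
Mutation = namedtuple("Mutation", "pdb chain resid wt mut")

def parse_mutation_list(lines):
    """Parse mutation list lines into Mutation tuples."""
    mutations = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines are ignored
        pdb, chain, resid, wt, mut = line.split()
        mutations.append(Mutation(pdb, chain, int(resid), wt.upper(), mut.upper()))
    return mutations

example = [
    "1A22.pdb A 14 MET ALA",
    "1A22.pdb A 18 HIS ALA",
    "1A22.pdb A 21 HIS ALA",
]
mutations = parse_mutation_list(example)
for m in mutations:
    print(m.pdb, m.chain, m.resid, m.wt, "->", m.mut)
```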

pdb_strict_format.py

A Python script to check the format of PDB files against HADDOCK format rules. A PDB file is used as input, and a console message is printed whenever a badly formatted line triggers an error or a warning. The script follows the wwPDB format guidelines and checks residues against a list of known ligands and amino acids recognized by HADDOCK.

Usage:

./pdb_strict_format.py [-h] [-nc] pdb

This script validates a PDB file (*.pdb).

positional arguments:
  pdb                PDB file

optional arguments:
  -h, --help         show this help message and exit
  -nc, --no_chainid  Ignore empty chain ids

param_to_json.py

A python script to transform a haddockparam.web file into a JSON structure. It is possible to use it as a class and then access extra functions like: change_value(key, value) ; update(subdict_to_replace) ; dump_keys() ; get_value(key) ; write_json()

Usage:

./param_to_json.py [-h] [-o OUTPUT] [-g GET] [-e [EXAMPLE]] web

This script parses a HADDOCK parameter file (*.web) and transforms it to JSON
format. It also allows to change a parameter of the haddockparam.web

positional arguments:
  web                   HADDOCK parameter file

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Path of JSON output file
  -g GET, --get GET     Get value of a particular parameter
  -e [EXAMPLE], --example [EXAMPLE]
                        Print an example

renumber_model.py

A Python script to match the chains of a model to a given reference. The numbering relationship is obtained via sequence alignment with the BLOSUM62 matrix. This is recommended only for complexes with high similarity.

This script supports multi-chain complexes but expects the chains to match sequentially between the reference and the model.

Ref  Model
A   A
B   B
C   C

Usage:

$ python renumber_model.py example_data/renumber_model/ref.pdb example_data/renumber_model/to_refine.BL00010001.pdb

 [2022-07-13 16:29:05,492 renumber_model:L211 INFO] Getting sequence numbering relationship via BLOSUM62 alignment
 [2022-07-13 16:29:05,525 renumber_model:L114 DEBUG] Writing alignment to blosum62_A.aln
 [2022-07-13 16:29:05,526 renumber_model:L151 DEBUG] Sequence identity between chain A of example_data/ref.pdb and chain A of example_data/to_refine.BL00010001.pdb is 100.00%
 [2022-07-13 16:29:05,527 renumber_model:L114 DEBUG] Writing alignment to blosum62_C.aln
 [2022-07-13 16:29:05,528 renumber_model:L151 DEBUG] Sequence identity between chain C of example_data/ref.pdb and chain C of example_data/to_refine.BL00010001.pdb is 100.00%
 [2022-07-13 16:29:05,529 renumber_model:L114 DEBUG] Writing alignment to blosum62_D.aln
 [2022-07-13 16:29:05,529 renumber_model:L151 DEBUG] Sequence identity between chain D of example_data/ref.pdb and chain D of example_data/to_refine.BL00010001.pdb is 98.18%
 [2022-07-13 16:29:05,531 renumber_model:L114 DEBUG] Writing alignment to blosum62_E.aln
 [2022-07-13 16:29:05,531 renumber_model:L151 DEBUG] Sequence identity between chain E of example_data/ref.pdb and chain E of example_data/to_refine.BL00010001.pdb is 95.50%
 [2022-07-13 16:29:05,531 renumber_model:L213 INFO] Renumbering model according to numbering relationship
 [2022-07-13 16:29:05,531 renumber_model:L178 INFO] Renumbered model name: to_refine.BL00010001_renumbered.pdb
 [2022-07-13 16:29:05,539 renumber_model:L199 WARNING] Ignored residues [43, 82, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202] in model's chain D
 [2022-07-13 16:29:05,539 renumber_model:L199 WARNING] Ignored residues [17, 42, 77, 80, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237] in model's chain E
 [2022-07-13 16:29:05,539 renumber_model:L216 INFO] Renumbering complete
 [2022-07-13 16:29:05,540 renumber_model:L217 INFO] DO NOT trust this renumbering blindly!
 [2022-07-13 16:29:05,540 renumber_model:L218 INFO] Check the .aln files for more information

A _renumbered.pdb file is created in the same directory as the input file together with multiple blosum62_ChainID.aln files.
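
To make the "numbering relationship" concrete, the sketch below derives a model-to-reference residue number mapping from an already computed gapped alignment. It deliberately leaves out the alignment step itself (the script uses BLOSUM62 for that), and the function name, arguments and toy sequences are assumptions for the example. Model residues aligned to a gap have no counterpart in the reference, which is the kind of residue the script reports as ignored.

```python
def numbering_from_alignment(ref_aln, model_aln, ref_numbers, model_numbers):
    """Map model residue numbers to reference residue numbers.

    ref_aln, model_aln: gapped sequences of equal length ('-' marks a gap)
    ref_numbers, model_numbers: residue numbers of the ungapped sequences, in order
    """
    mapping, ignored = {}, []
    ref_iter, model_iter = iter(ref_numbers), iter(model_numbers)
    for ref_char, model_char in zip(ref_aln, model_aln):
        ref_num = next(ref_iter) if ref_char != "-" else None
        model_num = next(model_iter) if model_char != "-" else None
        if ref_num is not None and model_num is not None:
            mapping[model_num] = ref_num
        elif model_num is not None:
            ignored.append(model_num)  # model residue without a reference partner
    return mapping, ignored

# Toy alignment: the reference starts one residue earlier, the model has an insertion
ref_aln = "MKT-AY"
model_aln = "-KTGAY"
mapping, ignored = numbering_from_alignment(
    ref_aln, model_aln, ref_numbers=[10, 11, 12, 13, 14], model_numbers=[1, 2, 3, 4, 5]
)
print(mapping, ignored)
```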

License

Apache Licence 2.0

haddock-tools's People

Contributors

amjjbonvin, jbibbe4, joaorodrigues, mieczyslaw, mtrellet, pkoukos, rvhonorato, skelm

haddock-tools's Issues

floats in restraints file

@amjjbonvin @JoaoRodrigues I have noticed that restrain_bodies.py, after defining the contact names, adds a distance followed by 0.0 and 0.0. restrain_ligand.py does something similar, but adds 1.0 1.0 after the distance. For active-passive_to_ambig.py, three floats 2.0 2.0 2.0 are hardcoded (so no real distance information?).

  1. Are those floats in use by CNS? If so, what's the meaning of distance=2.0 for ambig restraints?
  2. Do 0.0, 1.0, 2.0 mean anything, or are they only there to match a format?
  3. Can ambiguous restraints be created between small molecules as well or only between protein or DNA chains?

Thanks.

molprobity.py gives a TypeError

The bug
When I use the molprobity.py script to predict the His protonation states, it gives a TypeError (Python 3.9.4 on MacOS Catalina, 10.15.7):

## Executing Reduce to assign histidine protonation states
## Input PDB: e2aP2_1f3g-clean.pdb
Traceback (most recent call last):
  File "/Users/jbibbe/software/haddock-tools/molprobity.py", line 143, in <module>
    fout.write(line+'\n')
TypeError: can't concat str to bytes

This same error also happens in Python 3.7.10.

When I use Python 2.7.13, I get the following error:

Traceback (most recent call last):
  File "/Users/jbibbe/software/haddock-tools/molprobity.py", line 137, in <module>
    hadded, process_error = run_molprobity(open_fhandle)
  File "/Users/jbibbe/software/haddock-tools/molprobity.py", line 69, in run_molprobity
    tmp_file.write(cmd_stdin)
TypeError: expected a string or other character buffer object

Quick fix
I fixed this for Python 3 by adding a line at line 138:
hadded = hadded.decode()
I haven’t tried to fix it for Python 2.7.13.
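
The root cause is that, under Python 3, output captured from a subprocess is `bytes`, which cannot be concatenated with a `str`. The following self-contained sketch reproduces the error and the `decode()` fix. It does not call Reduce: a small Python one-liner stands in for the external program, and the variable names are only illustrative.

```python
import subprocess
import sys

# Stand-in for the external Reduce call (assumption): any command writing to stdout
result = subprocess.run(
    [sys.executable, "-c", "print('HIS ( 90 )')"],
    stdout=subprocess.PIPE,
)
raw = result.stdout  # bytes under Python 3

try:
    raw + "\n"  # same kind of operation that fails in the report
    error_message = None
except TypeError as err:
    error_message = str(err)
print("TypeError:", error_message)

# The quick fix: decode the captured bytes into str before further string handling
text = raw.decode()
print(text.strip())
```

Passing `universal_newlines=True` to the subprocess call is another way to receive `str` directly.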

To Reproduce
molprobity.py file.pdb

Expected behavior
The script should return a list of the histidines in the protein structure, with their protonation states. Example:

## Executing Reduce to assign histidine protonation states
## Input PDB: 2J8S-renumbered-clean.pdb 
HIS ( 526 )	-->	HISD
HIS ( 1042 )	-->	HISD
HIS ( 2525 )	-->	HISD
HIS ( 4526 )	-->	HISD
HIS ( 338 )	-->	HISE
HIS ( 505 )	-->	HISE
HIS ( 525 )	-->	HISE

Desktop:

  • OS: MacOS Catalina (10.15.7)
  • Python Version: 3.9.4, 3.7.10 and 2.7.13.

HADDOCK local crashing during run.

I am running haddock-local iteratively for around 100 protein-protein complexes. After a few successful docking runs, the terminal freezes with the following error.

Please share a possible solution. Thanks.

calculating structure 190
      queue command:
      /bin/csh MARK3_run11_it1_refine_190.job
Exception in thread Thread-1193:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 765, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/DATA/yogesh/alphaCov/haddock/haddock2.4-2021-05/Haddock/Main/QueueThread.py", line 10, in runqueuecommand
    process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1216, in _execute_child
    errpipe_read, errpipe_write = self.pipe_cloexec()
  File "/usr/lib64/python2.7/subprocess.py", line 1168, in pipe_cloexec
    r, w = os.pipe()
OSError: [Errno 24] Too many open files

restrain_ligand.py & restrain_bodies.py not compatible with Python 3

I am using the newest version at 9a08ad3. I noticed that restrain_ligand.py & restrain_bodies.py crash with Python 3.8.5 (was working fine with Python2):

  1. Syntax error:
File "/haddock-tools/restrain_ligand.py", line 23
    except ImportError, e:
  2. When the above is corrected (with 'as'), the next one is:
File "haddock-tools/restrain_ligand.py", line 62, in <module>
    ligand_com = np.asarray(ligand_com, dtype=np.float32)
  File "/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'map'
  3. Can't compare integer and None (when verbose is not given):
File "haddock-tools/restrain_bodies.py", line 193, in <module>
    elif args.verbose > 1:
TypeError: '>' not supported between instances of 'NoneType' and 'int'

New issues may come out when those above are corrected. This is required to make it compatible with HADDOCK 2.5.
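
The three errors above correspond to common Python 2 to 3 porting patterns. The snippet below illustrates one possible fix for each in isolation. It is a sketch, not a patch for the scripts: the module name is deliberately nonexistent, the sample values are invented, and using `action="count"` for `--verbose` is a guess based on the `args.verbose > 1` comparison.

```python
import argparse

# 1. Python 3 exception syntax: "except ImportError as e" replaces "except ImportError, e"
try:
    import module_that_does_not_exist  # hypothetical name, only here to trigger the error
except ImportError as e:
    import_error = str(e)

# 2. map() returns a lazy iterator in Python 3; turn it into a list before handing it
#    to an array constructor such as np.asarray
ligand_com = list(map(float, ["1.0", "2.5", "-3.0"]))

# 3. Give the verbosity counter a default of 0 so it is never None when compared
parser = argparse.ArgumentParser()
parser.add_argument("--verbose", "-v", action="count", default=0)
args = parser.parse_args([])  # simulate running without -v
is_debug = args.verbose > 1

print(import_error)
print(ligand_com, args.verbose, is_debug)
```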

Descriptive Headers

More of a recommendation than a real 'issue'. Each script should have its own description and usage written at the top of the file, and print this if run without arguments or with -h or similar. This avoids a lengthy list of usage/descriptions on the front page and makes the scripts much more user-friendly.
