Импортируем необходимые пакеты.
import sys
import modeller
import _modeller
import modeller.automodel
import nglview
Скачаем pdb лизоцима из радужной форели и последовательность лизоцима из виргинской американской куропатки.
%%bash
wget http://www.pdb.org/pdb/files/1lmp.pdb
wget http://www.uniprot.org/uniprot/P00700.fasta
--2017-12-09 16:09:11-- http://www.pdb.org/pdb/files/1lmp.pdb Resolving www.pdb.org (www.pdb.org)... 128.6.244.52 Connecting to www.pdb.org (www.pdb.org)|128.6.244.52|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: https://www.rcsb.org/pdb/files/1lmp.pdb [following] --2017-12-09 16:09:11-- https://www.rcsb.org/pdb/files/1lmp.pdb Resolving www.rcsb.org (www.rcsb.org)... 132.249.213.110 Connecting to www.rcsb.org (www.rcsb.org)|132.249.213.110|:443... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: http://files.rcsb.org/view/1lmp.pdb [following] --2017-12-09 16:09:13-- http://files.rcsb.org/view/1lmp.pdb Resolving files.rcsb.org (files.rcsb.org)... 132.249.213.140 Connecting to files.rcsb.org (files.rcsb.org)|132.249.213.140|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/plain] Saving to: `1lmp.pdb.1' 0K .......... .......... .......... .......... .......... 121K 50K .......... .......... .......... .......... .......... 238K 100K .......... .......... ........ 137K=0.8s 2017-12-09 16:09:14 (155 KB/s) - `1lmp.pdb.1' saved [131301] --2017-12-09 16:09:14-- http://www.uniprot.org/uniprot/P00700.fasta Resolving www.uniprot.org (www.uniprot.org)... 128.175.240.211, 193.62.193.81 Connecting to www.uniprot.org (www.uniprot.org)|128.175.240.211|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 205 [text/plain] Saving to: `P00700.fasta.1' 0K 100% 20.7M=0s 2017-12-09 16:09:15 (20.7 MB/s) - `P00700.fasta.1' saved [205/205]
env=modeller.environ()
env.io.hetatm=True
# Создаем выравнивание
alignm=modeller.alignment(env)
alignm.append(file='P00700.fasta', align_codes='all',alignment_format='FASTA')
mdl = modeller.model(env, file='1lmp.pdb', model_segment=('FIRST:'+'A', 'LAST:'+'A'))
alignm.append_model(mdl, atom_files='1lmp.pdb', align_codes='1lmp')
alignm[0].code = 'model'
alignm[1].code = 'reference'
# Производим выравнивание
alignm.salign()
alignm.write(file='all_in_one.ali', alignment_format='PIR')
# Выбираем объект для моделирования
s = alignm[0]
pdb = alignm[1]
# Создаем объект automodel
a = modeller.automodel.automodel(env, alnfile='all_in_one.ali', knowns = pdb.code, sequence = s.code)
a.name='mod'+s.code
a.starting_model = 1
a.ending_model = 1
a.make()
MODELLER 9.19, 2017/07/19, r11078 PROTEIN STRUCTURE MODELLING BY SATISFACTION OF SPATIAL RESTRAINTS Copyright(c) 1989-2017 Andrej Sali All Rights Reserved Written by A. Sali with help from B. Webb, M.S. Madhusudhan, M-Y. Shen, G.Q. Dong, M.A. Marti-Renom, N. Eswar, F. Alber, M. Topf, B. Oliva, A. Fiser, R. Sanchez, B. Yerkovich, A. Badretdinov, F. Melo, J.P. Overington, E. Feyfant University of California, San Francisco, USA Rockefeller University, New York, USA Harvard University, Cambridge, USA Imperial Cancer Research Fund, London, UK Birkbeck College, University of London, London, UK Kind, OS, HostName, Kernel, Processor: 4, Linux shadbox 3.2.0-29-generic x86_64 Date and time of compilation : 2017/07/19 14:40:43 MODELLER executable type : x86_64-intel8 Job starting time (YY/MM/DD HH:MM:SS): 2017/12/09 16:08:53 read_pd_459W> Residue type NAG not recognized. 'automodel' model building will treat this residue as a rigid body. To use real parameters, add the residue type to ${LIB}/restyp.lib, its topology to ${LIB}/top_*.lib, and suitable forcefield parameters to ${LIB}/par.lib. rdpdb___459W> Residue type NDG not recognized. 'automodel' model building will treat this residue as a rigid body. To use real parameters, add the residue type to ${LIB}/restyp.lib, its topology to ${LIB}/top_*.lib, and suitable forcefield parameters to ${LIB}/par.lib. SALIGN_____> adding the next group to the alignment; iteration 1 fndatmi_285W> Only 129 residues out of 132 contain atoms of type CA (This is usually caused by non-standard residues, such as ligands, or by PDB files with missing atoms.) fndatmi_285W> Only 129 residues out of 132 contain atoms of type CA (This is usually caused by non-standard residues, such as ligands, or by PDB files with missing atoms.) check_ali___> Checking the sequence-structure alignment. Implied intrachain target CA(i)-CA(i+1) distances longer than 8.0 angstroms: ALN_POS TMPL RID1 RID2 NAM1 NAM2 DIST ---------------------------------------------- END OF TABLE read_to_681_> topology.submodel read from topology file: 3 fndatmi_285W> Only 129 residues out of 132 contain atoms of type CA (This is usually caused by non-standard residues, such as ligands, or by PDB files with missing atoms.) patch_s_522_> Number of disulfides patched in MODEL: 4 The following 1 residues contain 6-membered rings with poor geometries after transfer from templates. Rebuilding rings from internal coordinates: <Residue 34 (type PHE)> mdtrsr__446W> A potential that relies on one protein is used, yet you have at least one known structure available. MDT, not library, potential is used. iup2crm_280W> No topology library in memory or assigning a BLK residue. Default CHARMM atom type assigned: C1 --> CT2 This message is written only for the first such atom. 0 atoms in HETATM/BLK residues constrained to protein atoms within 2.30 angstroms and protein CA atoms within 10.00 angstroms 0 atoms in residues without defined topology constrained to be rigid bodies condens_443_> Restraints marked for deletion were removed. Total number of restraints before, now: 12088 11216 iupac_m_397W> Atoms were not swapped because of the uncertainty of how to handle the H atom. iupac_m_397W> Atoms were not swapped because of the uncertainty of how to handle the H atom. >> ENERGY; Differences between the model's features and restraints: Number of all residues in MODEL : 129 Number of all, selected real atoms : 998 998 Number of all, selected pseudo atoms : 0 0 Number of all static, selected restraints : 11216 11216 COVALENT_CYS : F NONBONDED_SEL_ATOMS : 1 Number of non-bonded pairs (excluding 1-2,1-3,1-4): 2138 Dynamic pairs routine : 2, NATM x NATM cell sorting Atomic shift for contacts update (UPDATE_DYNAMIC) : 0.390 LENNARD_JONES_SWITCH : 6.500 7.500 COULOMB_JONES_SWITCH : 6.500 7.500 RESIDUE_SPAN_RANGE : 0 99999 NLOGN_USE : 15 CONTACT_SHELL : 4.000 DYNAMIC_PAIRS,_SPHERE,_COULOMB,_LENNARD,_MODELLER : T T F F F SPHERE_STDV : 0.050 RADII_FACTOR : 0.820 Current energy : 714.0712 Summary of the restraint violations: NUM ... number of restraints. NUMVI ... number of restraints with RVIOL > VIOL_REPORT_CUT[i]. RVIOL ... relative difference from the best value. NUMVP ... number of restraints with -Ln(pdf) > VIOL_REPORT_CUT2[i]. RMS_1 ... RMS(feature, minimally_violated_basis_restraint, NUMB). RMS_2 ... RMS(feature, best_value, NUMB). MOL.PDF ... scaled contribution to -Ln(Molecular pdf). # RESTRAINT_GROUP NUM NUMVI NUMVP RMS_1 RMS_2 MOL.PDF S_i ------------------------------------------------------------------------------------------------------ 1 Bond length potential : 1018 0 0 0.006 0.006 11.612 1.000 2 Bond angle potential : 1377 0 6 2.161 2.161 131.42 1.000 3 Stereochemical cosine torsion poten: 656 0 24 47.763 47.763 232.03 1.000 4 Stereochemical improper torsion pot: 416 0 0 1.276 1.276 16.373 1.000 5 Soft-sphere overlap restraints : 2138 0 0 0.001 0.001 0.55872 1.000 6 Lennard-Jones 6-12 potential : 0 0 0 0.000 0.000 0.0000 1.000 7 Coulomb point-point electrostatic p: 0 0 0 0.000 0.000 0.0000 1.000 8 H-bonding potential : 0 0 0 0.000 0.000 0.0000 1.000 9 Distance restraints 1 (CA-CA) : 2404 0 0 0.107 0.107 33.792 1.000 10 Distance restraints 2 (N-O) : 2562 0 2 0.161 0.161 76.374 1.000 11 Mainchain Phi dihedral restraints : 0 0 0 0.000 0.000 0.0000 1.000 12 Mainchain Psi dihedral restraints : 0 0 0 0.000 0.000 0.0000 1.000 13 Mainchain Omega dihedral restraints: 128 0 3 4.757 4.757 34.153 1.000 14 Sidechain Chi_1 dihedral restraints: 105 0 0 58.104 58.104 11.912 1.000 15 Sidechain Chi_2 dihedral restraints: 73 0 0 69.261 69.261 32.700 1.000 16 Sidechain Chi_3 dihedral restraints: 26 0 1 87.355 87.355 20.242 1.000 17 Sidechain Chi_4 dihedral restraints: 17 0 0 84.750 84.750 12.145 1.000 18 Disulfide distance restraints : 4 0 0 0.008 0.008 0.40520E-01 1.000 19 Disulfide angle restraints : 8 0 0 2.612 2.612 1.2053 1.000 20 Disulfide dihedral angle restraints: 4 0 0 25.934 25.934 2.6060 1.000 21 Lower bound distance restraints : 0 0 0 0.000 0.000 0.0000 1.000 22 Upper bound distance restraints : 0 0 0 0.000 0.000 0.0000 1.000 23 Distance restraints 3 (SDCH-MNCH) : 1595 0 0 0.350 0.350 38.386 1.000 24 Sidechain Chi_5 dihedral restraints: 0 0 0 0.000 0.000 0.0000 1.000 25 Phi/Psi pair of dihedral restraints: 127 1 9 19.447 22.570 12.389 1.000 26 Distance restraints 4 (SDCH-SDCH) : 696 0 0 0.653 0.653 46.132 1.000 27 Distance restraints 5 (X-Y) : 0 0 0 0.000 0.000 0.0000 1.000 28 NMR distance restraints 6 (X-Y) : 0 0 0 0.000 0.000 0.0000 1.000 29 NMR distance restraints 7 (X-Y) : 0 0 0 0.000 0.000 0.0000 1.000 30 Minimal distance restraints : 0 0 0 0.000 0.000 0.0000 1.000 31 Non-bonded restraints : 0 0 0 0.000 0.000 0.0000 1.000 32 Atomic accessibility restraints : 0 0 0 0.000 0.000 0.0000 1.000 33 Atomic density restraints : 0 0 0 0.000 0.000 0.0000 1.000 34 Absolute position restraints : 0 0 0 0.000 0.000 0.0000 1.000 35 Dihedral angle difference restraint: 0 0 0 0.000 0.000 0.0000 1.000 36 GBSA implicit solvent potential : 0 0 0 0.000 0.000 0.0000 1.000 37 EM density fitting potential : 0 0 0 0.000 0.000 0.0000 1.000 38 SAXS restraints : 0 0 0 0.000 0.000 0.0000 1.000 39 Symmetry restraints : 0 0 0 0.000 0.000 0.0000 1.000 # Heavy relative violation of each residue is written to: model.V99990001 # The profile is NOT normalized by the number of restraints. # The profiles are smoothed over a window of residues: 1 # The sum of all numbers in the file: 10882.5967 List of the violated restraints: A restraint is violated when the relative difference from the best value (RVIOL) is larger than CUTOFF. ICSR ... index of a restraint in the current set. RESNO ... residue numbers of the first two atoms. ATM ... IUPAC atom names of the first two atoms. FEAT ... the value of the feature in the model. restr ... the mean of the basis restraint with the smallest difference from the model (local minimum). viol ... difference from the local minimum. rviol ... relative difference from the local minimum. RESTR ... the best value (global minimum). VIOL ... difference from the best value. RVIOL ... relative difference from the best value. ------------------------------------------------------------------------------------------------- Feature 25 : Phi/Psi pair of dihedral restraints List of the RVIOL violations larger than : 6.5000 # ICSR RESNO1/2 ATM1/2 INDATM1/2 FEAT restr viol rviol RESTR VIOL RVIOL 1 3584 101D 102G C N 772 774 60.56 82.20 86.23 5.23 -62.40 127.51 21.17 1 102G 102G N CA 774 775 -74.96 8.50 -41.20 report______> Distribution of short non-bonded contacts: DISTANCE1: 0.00 2.10 2.20 2.30 2.40 2.50 2.60 2.70 2.80 2.90 3.00 3.10 3.20 3.30 3.40 DISTANCE2: 2.10 2.20 2.30 2.40 2.50 2.60 2.70 2.80 2.90 3.00 3.10 3.20 3.30 3.40 3.50 FREQUENCY: 0 0 0 0 0 8 8 33 76 100 80 116 135 184 164 << end of ENERGY. >> Summary of successfully produced models: Filename molpdf ---------------------------------------- model.B99990001.pdb 714.07123
Посмотрим, что получилось.
nglview.show_structure_file('model.B99990001.pdb')
Добавим лиганд
stri = ''
for i in alignm[0].residues:
stri += i.code
stri += '...'
alignm.append_sequence(stri)
# Выбираем объект для моделирования
s = alignm[2]
pdb = alignm[1]
s.code = 'model_ligand'
alignm.salign()
alignm.write(file='all_in_one.ali', alignment_format='PIR')
# Создаем объект automodel
a = modeller.automodel.automodel(env, alnfile='all_in_one.ali', knowns= pdb.code , sequence = s.code )
a.name='mod'+s.code
a.starting_model = 1
a.ending_model = 1
a.make()
SALIGN_____> adding the next group to the alignment; iteration 1 SALIGN_____> adding the next group to the alignment; iteration 2 automodel__W> Topology and/or parameter libraries already in memory. These will be used instead of the automodel defaults. If this is not what you want, clear them before creating the automodel object with env.libs.topology.clear() and env.libs.parameters.clear() fndatmi_285W> Only 129 residues out of 132 contain atoms of type CA (This is usually caused by non-standard residues, such as ligands, or by PDB files with missing atoms.) fndatmi_285W> Only 129 residues out of 132 contain atoms of type CA (This is usually caused by non-standard residues, such as ligands, or by PDB files with missing atoms.) check_ali___> Checking the sequence-structure alignment. Implied intrachain target CA(i)-CA(i+1) distances longer than 8.0 angstroms: ALN_POS TMPL RID1 RID2 NAM1 NAM2 DIST ---------------------------------------------- END OF TABLE getf_______W> RTF restraint not found in the atoms list: residue type, indices: 10 129 atom names : C +N atom indices : 996 0 getf_______W> RTF restraint not found in the atoms list: residue type, indices: 10 129 atom names : C CA +N O atom indices : 996 991 0 997 fndatmi_285W> Only 129 residues out of 132 contain atoms of type CA (This is usually caused by non-standard residues, such as ligands, or by PDB files with missing atoms.) patch_s_522_> Number of disulfides patched in MODEL: 4 The following 1 residues contain 6-membered rings with poor geometries after transfer from templates. Rebuilding rings from internal coordinates: <Residue 34 (type PHE)> mdtrsr__446W> A potential that relies on one protein is used, yet you have at least one known structure available. MDT, not library, potential is used. iup2crm_280W> No topology library in memory or assigning a BLK residue. Default CHARMM atom type assigned: C1 --> CT2 This message is written only for the first such atom. 43 atoms in HETATM/BLK residues constrained to protein atoms within 2.30 angstroms and protein CA atoms within 10.00 angstroms 43 atoms in residues without defined topology constrained to be rigid bodies condens_443_> Restraints marked for deletion were removed. Total number of restraints before, now: 13489 12617 iupac_m_397W> Atoms were not swapped because of the uncertainty of how to handle the H atom. >> ENERGY; Differences between the model's features and restraints: Number of all residues in MODEL : 132 Number of all, selected real atoms : 1041 1041 Number of all, selected pseudo atoms : 0 0 Number of all static, selected restraints : 12617 12617 COVALENT_CYS : F NONBONDED_SEL_ATOMS : 1 Number of non-bonded pairs (excluding 1-2,1-3,1-4): 2306 Dynamic pairs routine : 2, NATM x NATM cell sorting Atomic shift for contacts update (UPDATE_DYNAMIC) : 0.390 LENNARD_JONES_SWITCH : 6.500 7.500 COULOMB_JONES_SWITCH : 6.500 7.500 RESIDUE_SPAN_RANGE : 0 99999 NLOGN_USE : 15 CONTACT_SHELL : 4.000 DYNAMIC_PAIRS,_SPHERE,_COULOMB,_LENNARD,_MODELLER : T T F F F SPHERE_STDV : 0.050 RADII_FACTOR : 0.820 Current energy : 718.2312 Summary of the restraint violations: NUM ... number of restraints. NUMVI ... number of restraints with RVIOL > VIOL_REPORT_CUT[i]. RVIOL ... relative difference from the best value. NUMVP ... number of restraints with -Ln(pdf) > VIOL_REPORT_CUT2[i]. RMS_1 ... RMS(feature, minimally_violated_basis_restraint, NUMB). RMS_2 ... RMS(feature, best_value, NUMB). MOL.PDF ... scaled contribution to -Ln(Molecular pdf). # RESTRAINT_GROUP NUM NUMVI NUMVP RMS_1 RMS_2 MOL.PDF S_i ------------------------------------------------------------------------------------------------------ 1 Bond length potential : 1018 0 0 0.007 0.007 12.176 1.000 2 Bond angle potential : 1377 0 6 2.156 2.156 132.36 1.000 3 Stereochemical cosine torsion poten: 656 0 24 48.002 48.002 235.21 1.000 4 Stereochemical improper torsion pot: 416 0 0 1.329 1.329 17.402 1.000 5 Soft-sphere overlap restraints : 2306 1 2 0.008 0.008 16.756 1.000 6 Lennard-Jones 6-12 potential : 0 0 0 0.000 0.000 0.0000 1.000 7 Coulomb point-point electrostatic p: 0 0 0 0.000 0.000 0.0000 1.000 8 H-bonding potential : 0 0 0 0.000 0.000 0.0000 1.000 9 Distance restraints 1 (CA-CA) : 2404 0 0 0.102 0.102 31.228 1.000 10 Distance restraints 2 (N-O) : 2562 0 0 0.146 0.146 68.217 1.000 11 Mainchain Phi dihedral restraints : 0 0 0 0.000 0.000 0.0000 1.000 12 Mainchain Psi dihedral restraints : 0 0 0 0.000 0.000 0.0000 1.000 13 Mainchain Omega dihedral restraints: 128 0 3 4.623 4.623 32.267 1.000 14 Sidechain Chi_1 dihedral restraints: 105 0 1 63.317 63.317 20.015 1.000 15 Sidechain Chi_2 dihedral restraints: 73 0 0 68.738 68.738 35.370 1.000 16 Sidechain Chi_3 dihedral restraints: 26 0 0 85.081 85.081 17.552 1.000 17 Sidechain Chi_4 dihedral restraints: 17 0 0 102.301 102.301 9.9326 1.000 18 Disulfide distance restraints : 4 0 0 0.006 0.006 0.26394E-01 1.000 19 Disulfide angle restraints : 8 0 0 3.234 3.234 1.8475 1.000 20 Disulfide dihedral angle restraints: 4 0 0 26.706 26.706 2.3136 1.000 21 Lower bound distance restraints : 0 0 0 0.000 0.000 0.0000 1.000 22 Upper bound distance restraints : 0 0 0 0.000 0.000 0.0000 1.000 23 Distance restraints 3 (SDCH-MNCH) : 1595 0 0 0.383 0.383 39.213 1.000 24 Sidechain Chi_5 dihedral restraints: 0 0 0 0.000 0.000 0.0000 1.000 25 Phi/Psi pair of dihedral restraints: 127 1 10 18.612 19.696 -0.63497E-01 1.000 26 Distance restraints 4 (SDCH-SDCH) : 696 0 1 0.556 0.556 33.124 1.000 27 Distance restraints 5 (X-Y) : 1401 0 0 0.030 0.030 13.287 1.000 28 NMR distance restraints 6 (X-Y) : 0 0 0 0.000 0.000 0.0000 1.000 29 NMR distance restraints 7 (X-Y) : 0 0 0 0.000 0.000 0.0000 1.000 30 Minimal distance restraints : 0 0 0 0.000 0.000 0.0000 1.000 31 Non-bonded restraints : 0 0 0 0.000 0.000 0.0000 1.000 32 Atomic accessibility restraints : 0 0 0 0.000 0.000 0.0000 1.000 33 Atomic density restraints : 0 0 0 0.000 0.000 0.0000 1.000 34 Absolute position restraints : 0 0 0 0.000 0.000 0.0000 1.000 35 Dihedral angle difference restraint: 0 0 0 0.000 0.000 0.0000 1.000 36 GBSA implicit solvent potential : 0 0 0 0.000 0.000 0.0000 1.000 37 EM density fitting potential : 0 0 0 0.000 0.000 0.0000 1.000 38 SAXS restraints : 0 0 0 0.000 0.000 0.0000 1.000 39 Symmetry restraints : 0 0 0 0.000 0.000 0.0000 1.000 # Heavy relative violation of each residue is written to: model_ligand.V99990001 # The profile is NOT normalized by the number of restraints. # The profiles are smoothed over a window of residues: 1 # The sum of all numbers in the file: 11280.1475 List of the violated restraints: A restraint is violated when the relative difference from the best value (RVIOL) is larger than CUTOFF. ICSR ... index of a restraint in the current set. RESNO ... residue numbers of the first two atoms. ATM ... IUPAC atom names of the first two atoms. FEAT ... the value of the feature in the model. restr ... the mean of the basis restraint with the smallest difference from the model (local minimum). viol ... difference from the local minimum. rviol ... relative difference from the local minimum. RESTR ... the best value (global minimum). VIOL ... difference from the best value. RVIOL ... relative difference from the best value. ------------------------------------------------------------------------------------------------- Feature 25 : Phi/Psi pair of dihedral restraints List of the RVIOL violations larger than : 6.5000 # ICSR RESNO1/2 ATM1/2 INDATM1/2 FEAT restr viol rviol RESTR VIOL RVIOL 1 3518 35E 36S C N 276 278 -134.27 -64.10 78.72 8.13 -64.10 78.72 8.13 1 36S 36S N CA 278 279 0.68 -35.00 -35.00 report______> Distribution of short non-bonded contacts: DISTANCE1: 0.00 2.10 2.20 2.30 2.40 2.50 2.60 2.70 2.80 2.90 3.00 3.10 3.20 3.30 3.40 DISTANCE2: 2.10 2.20 2.30 2.40 2.50 2.60 2.70 2.80 2.90 3.00 3.10 3.20 3.30 3.40 3.50 FREQUENCY: 0 0 0 0 0 8 7 37 75 124 111 128 159 171 174 << end of ENERGY. >> Summary of successfully produced models: Filename molpdf ---------------------------------------- model_ligand.B99990001.pdb 718.23120
Белок с лигандом:
nglview.show_structure_file('model_ligand.B99990001.pdb')