Deep Learning for Biology, Part I: Protein Language Models

biology
LLMs
deep-learning
Author

Weyland Joyner

Published

April 2, 2026

Introduction

This notebook is based on Charles Ravarani and Natasha Latysheva’s excellent recent O’Reilly book, Deep Learning for Biology.

One interesting thing about the book is that it uses PyTorch and JAX pretty interchangeably, often mixing the two. That was a good way for me to get some experience with JAX, but I’m unclear on why the authors made that decision.

Otherwise this book is a really fun way to learn about deep learning outside of GPT-style natural language systems.

Protein Language Models

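In this notebook we’ll import and use Meta’s ESM (Evolutionary Scale Modeling). It’s basically BERT with an amino acid vocabulary: an encoder-only transformer trained with a masked language modeling objective, where each token is a single amino acid rather than a word piece.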
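To make the "BERT with an amino acid vocabulary" idea concrete, here is a minimal sketch of what tokenizing and masking a protein sequence could look like. The vocabulary ordering, the special-token names, and the `encode` and `mask_tokens` helpers are illustrative assumptions, not ESM's actual tokenizer. The sketch also simplifies BERT's corruption scheme by always substituting the mask token.

```python
import random

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# BERT-style special tokens; names and ordering here are illustrative only.
SPECIAL_TOKENS = ["<cls>", "<pad>", "<eos>", "<unk>", "<mask>"]

VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}


def encode(sequence: str) -> list[int]:
    """Map a protein sequence to token ids, wrapped in <cls> ... <eos>."""
    unk = VOCAB["<unk>"]
    ids = [VOCAB.get(residue, unk) for residue in sequence.upper()]
    return [VOCAB["<cls>"], *ids, VOCAB["<eos>"]]


def mask_tokens(
    ids: list[int], mask_prob: float = 0.15, seed: int = 0
) -> tuple[list[int], dict[int, int]]:
    """Replace a random subset of residue tokens with <mask>.

    Returns the corrupted ids and a dict {position: original id}, which is
    what a masked language model would be trained to recover.
    """
    rng = random.Random(seed)
    corrupted = list(ids)
    targets: dict[int, int] = {}
    # Skip the first and last positions so <cls> and <eos> are never masked.
    for pos in range(1, len(ids) - 1):
        if rng.random() < mask_prob:
            targets[pos] = ids[pos]
            corrupted[pos] = VOCAB["<mask>"]
    return corrupted, targets


# A short example sequence of ten residues.
ids = encode("GITGARGLAG")
corrupted, targets = mask_tokens(ids, mask_prob=0.3)
```

The whole vocabulary is only a couple dozen tokens, which is the main practical difference from a natural-language BERT: there is no subword tokenization step, just one token per residue.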

!pip install py3Dmol
!pip install adjustText
!pip install obonet
import py3Dmol
import requests
import jax
import torch
import seaborn as sns
from transformers import AutoTokenizer, EsmModel, EsmForMaskedLM
import pandas as pd
from sklearn.manifold import TSNE
from adjustText import adjust_text
from IPython.display import display, HTML
import matplotlib.pyplot as plt
import numpy as np
import os

Let’s Fetch Some Proteins

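def fetch_protein_structure(pdb_id: str) -> str:
    """Download a structure from RCSB and return the PDB file as text."""
    url = f'https://files.rcsb.org/download/{pdb_id}.pdb'
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on an unknown ID or server error
    return response.text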
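A PDB file is a long block of fixed-width text records, and a language model only needs the amino acid sequence out of it. That sequence lives in the SEQRES records. Below is a minimal sketch of pulling it out. The `sequences_from_seqres` helper and the `THREE_TO_ONE` table are hypothetical names introduced for illustration, and the parsing assumes every SEQRES line has a non-blank chain ID so that whitespace splitting lines up.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_seqres(pdb_text: str) -> dict[str, str]:
    """Collect a one-letter sequence per chain from SEQRES records.

    Non-standard residues (for example HYP, hydroxyproline) become "X".
    """
    chains: dict[str, list[str]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        fields = line.split()
        # fields: "SEQRES", serial number, chain id, residue count, residues...
        # Assumes the chain id is present (non-blank) on every line.
        chain_id, residues = fields[2], fields[4:]
        chains.setdefault(chain_id, []).extend(
            THREE_TO_ONE.get(res, "X") for res in residues
        )
    return {chain: "".join(seq) for chain, seq in chains.items()}


# Chain A of 1BKV, copied from the SEQRES records of the PDB file.
sample = """\
SEQRES   1 A   30  PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA
SEQRES   2 A   30  ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP
SEQRES   3 A   30  GLY PRO HYP GLY
"""
chain_a = sequences_from_seqres(sample)["A"]
```

For chain A of 1BKV this gives a 30-residue string with an X at each hydroxyproline position. Mapping modified residues to X is one choice among several; mapping HYP back to its parent proline is another reasonable option.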
fetch_protein_structure('1BKV')
'HEADER    STRUCTURAL PROTEIN                      13-JUL-98   1BKV              \nTITLE     COLLAGEN                                                              \nCOMPND    MOL_ID: 1;                                                            \nCOMPND   2 MOLECULE: T3-785;                                                    \nCOMPND   3 CHAIN: A, B, C;                                                      \nCOMPND   4 ENGINEERED: YES;                                                     \nCOMPND   5 OTHER_DETAILS: SYNTHETIC PEPTIDE CONTAINS REGION FROM HUMAN TYPE III \nCOMPND   6 COLLAGEN                                                             \nSOURCE    MOL_ID: 1                                                             \nKEYWDS    COLLAGEN, HYDROXYPROLINE, HYDROGEN BONDING, TRIPLE HELIX, TYPE III    \nKEYWDS   2 COLLAGEN, STRUCTURAL PROTEIN                                         \nEXPDTA    X-RAY DIFFRACTION                                                     \nAUTHOR    R.Z.KRAMER,J.BELLA,P.MAYVILLE,B.BRODSKY,H.M.BERMAN                    \nREVDAT   7   03-APR-24 1BKV    1       REMARK LINK                              \nREVDAT   6   22-DEC-10 1BKV    1       LINK                                     \nREVDAT   5   24-FEB-09 1BKV    1       VERSN                                    \nREVDAT   4   01-APR-03 1BKV    1       JRNL                                     \nREVDAT   3   23-JUN-00 1BKV    3       SEQRES MODRES HET    LINK                \nREVDAT   3 2                   3       ATOM   DBREF                             \nREVDAT   2   27-APR-99 1BKV    1       JRNL                                     \nREVDAT   1   16-FEB-99 1BKV    0                                                \nJRNL        AUTH   R.Z.KRAMER,J.BELLA,P.MAYVILLE,B.BRODSKY,H.M.BERMAN           \nJRNL        TITL   SEQUENCE DEPENDENT CONFORMATIONAL VARIATIONS OF COLLAGEN     \nJRNL        TITL 2 TRIPLE-HELICAL STRUCTURE.                                    
\nJRNL        REF    NAT.STRUCT.BIOL.              V.   6   454 1999              \nJRNL        REFN                   ISSN 1072-8368                               \nJRNL        PMID   10331873                                                     \nJRNL        DOI    10.1038/8259                                                 \nREMARK   2                                                                      \nREMARK   2 RESOLUTION.    2.00 ANGSTROMS.                                       \nREMARK   3                                                                      \nREMARK   3 REFINEMENT.                                                          \nREMARK   3   PROGRAM     : CNS 0.3                                              \nREMARK   3   AUTHORS     : BRUNGER,ADAMS,CLORE,DELANO,GROS,GROSSE-              \nREMARK   3               : KUNSTLEVE,JIANG,KUSZEWSKI,NILGES,PANNU,              \nREMARK   3               : READ,RICE,SIMONSON,WARREN                            \nREMARK   3                                                                      \nREMARK   3  REFINEMENT TARGET : NULL                                            \nREMARK   3                                                                      \nREMARK   3  DATA USED IN REFINEMENT.                                            
\nREMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 2.00                           \nREMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 50.00                          \nREMARK   3   DATA CUTOFF            (SIGMA(F)) : 2.000                          \nREMARK   3   DATA CUTOFF HIGH         (ABS(F)) : NULL                           \nREMARK   3   DATA CUTOFF LOW          (ABS(F)) : NULL                           \nREMARK   3   COMPLETENESS (WORKING+TEST)   (%) : 92.0                           \nREMARK   3   NUMBER OF REFLECTIONS             : 4643                           \nREMARK   3                                                                      \nREMARK   3  FIT TO DATA USED IN REFINEMENT.                                     \nREMARK   3   CROSS-VALIDATION METHOD          : THROUGHOUT                      \nREMARK   3   FREE R VALUE TEST SET SELECTION  : RANDOM                          \nREMARK   3   R VALUE            (WORKING SET) : 0.228                           \nREMARK   3   FREE R VALUE                     : 0.277                           \nREMARK   3   FREE R VALUE TEST SET SIZE   (%) : 9.600                           \nREMARK   3   FREE R VALUE TEST SET COUNT      : 486                             \nREMARK   3   ESTIMATED ERROR OF FREE R VALUE  : NULL                            \nREMARK   3                                                                      \nREMARK   3  FIT IN THE HIGHEST RESOLUTION BIN.                                  
\nREMARK   3   TOTAL NUMBER OF BINS USED           : 10                           \nREMARK   3   BIN RESOLUTION RANGE HIGH       (A) : 2.00                         \nREMARK   3   BIN RESOLUTION RANGE LOW        (A) : 2.07                         \nREMARK   3   BIN COMPLETENESS (WORKING+TEST) (%) : 90.00                        \nREMARK   3   REFLECTIONS IN BIN    (WORKING SET) : 348                          \nREMARK   3   BIN R VALUE           (WORKING SET) : 0.2650                       \nREMARK   3   BIN FREE R VALUE                    : 0.2810                       \nREMARK   3   BIN FREE R VALUE TEST SET SIZE  (%) : NULL                         \nREMARK   3   BIN FREE R VALUE TEST SET COUNT     : 29                           \nREMARK   3   ESTIMATED ERROR OF BIN FREE R VALUE : NULL                         \nREMARK   3                                                                      \nREMARK   3  NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT.                    \nREMARK   3   PROTEIN ATOMS            : 563                                     \nREMARK   3   NUCLEIC ACID ATOMS       : 0                                       \nREMARK   3   HETEROGEN ATOMS          : 18                                      \nREMARK   3   SOLVENT ATOMS            : 111                                     \nREMARK   3                                                                      \nREMARK   3  B VALUES.                                                           \nREMARK   3   FROM WILSON PLOT           (A**2) : NULL                           \nREMARK   3   MEAN B VALUE      (OVERALL, A**2) : 36.40                          \nREMARK   3   OVERALL ANISOTROPIC B VALUE.                                       
\nREMARK   3    B11 (A**2) : -6.57600                                             \nREMARK   3    B22 (A**2) : 9.38300                                              \nREMARK   3    B33 (A**2) : -2.80800                                             \nREMARK   3    B12 (A**2) : 0.00000                                              \nREMARK   3    B13 (A**2) : -15.60100                                            \nREMARK   3    B23 (A**2) : 0.00000                                              \nREMARK   3                                                                      \nREMARK   3  ESTIMATED COORDINATE ERROR.                                         \nREMARK   3   ESD FROM LUZZATI PLOT        (A) : NULL                            \nREMARK   3   ESD FROM SIGMAA              (A) : NULL                            \nREMARK   3   LOW RESOLUTION CUTOFF        (A) : NULL                            \nREMARK   3                                                                      \nREMARK   3  CROSS-VALIDATED ESTIMATED COORDINATE ERROR.                         \nREMARK   3   ESD FROM C-V LUZZATI PLOT    (A) : NULL                            \nREMARK   3   ESD FROM C-V SIGMAA          (A) : NULL                            \nREMARK   3                                                                      \nREMARK   3  RMS DEVIATIONS FROM IDEAL VALUES.                                   
\nREMARK   3   BOND LENGTHS                 (A) : 0.007                           \nREMARK   3   BOND ANGLES            (DEGREES) : 1.100                           \nREMARK   3   DIHEDRAL ANGLES        (DEGREES) : NULL                            \nREMARK   3   IMPROPER ANGLES        (DEGREES) : 1.030                           \nREMARK   3                                                                      \nREMARK   3  ISOTROPIC THERMAL MODEL : NULL                                      \nREMARK   3                                                                      \nREMARK   3  ISOTROPIC THERMAL FACTOR RESTRAINTS.    RMS    SIGMA                \nREMARK   3   MAIN-CHAIN BOND              (A**2) : NULL  ; NULL                 \nREMARK   3   MAIN-CHAIN ANGLE             (A**2) : NULL  ; NULL                 \nREMARK   3   SIDE-CHAIN BOND              (A**2) : NULL  ; NULL                 \nREMARK   3   SIDE-CHAIN ANGLE             (A**2) : NULL  ; NULL                 \nREMARK   3                                                                      \nREMARK   3  BULK SOLVENT MODELING.                                              \nREMARK   3   METHOD USED : NULL                                                 \nREMARK   3   KSOL        : NULL                                                 \nREMARK   3   BSOL        : NULL                                                 \nREMARK   3                                                                      \nREMARK   3  NCS MODEL : NULL                                                    \nREMARK   3                                                                      \nREMARK   3  NCS RESTRAINTS.                         
RMS   SIGMA/WEIGHT          \nREMARK   3   GROUP  1  POSITIONAL            (A) : NULL  ; NULL                 \nREMARK   3   GROUP  1  B-FACTOR           (A**2) : NULL  ; NULL                 \nREMARK   3                                                                      \nREMARK   3  PARAMETER FILE  1  : PROTEIN_REP.PARAM                              \nREMARK   3  PARAMETER FILE  2  : NULL                                           \nREMARK   3  TOPOLOGY FILE  1   : TOPHCSDX.PRO                                   \nREMARK   3  TOPOLOGY FILE  2   : NULL                                           \nREMARK   3                                                                      \nREMARK   3  OTHER REFINEMENT REMARKS: ADDITIONAL PARAMETERS USED FOR            \nREMARK   3  HYDROXYPROLINE. THREE EXTREMELY STRONG REFLECTIONS (-3 1 1, 4 0     \nREMARK   3  2, AND 5 1 0) WERE UNDERESTIMATED AND HENCE WERE EXCLUDED FROM      \nREMARK   3  REFINEMENT.                                                         \nREMARK   4                                                                      \nREMARK   4 1BKV COMPLIES WITH FORMAT V. 3.30, 13-JUL-11                         \nREMARK 100                                                                      \nREMARK 100 THIS ENTRY HAS BEEN PROCESSED BY BNL.                                \nREMARK 100 THE DEPOSITION ID IS D_1000171873.                                   
\nREMARK 200                                                                      \nREMARK 200 EXPERIMENTAL DETAILS                                                 \nREMARK 200  EXPERIMENT TYPE                : X-RAY DIFFRACTION                  \nREMARK 200  DATE OF DATA COLLECTION        : DEC-96                             \nREMARK 200  TEMPERATURE           (KELVIN) : 108.0                              \nREMARK 200  PH                             : 8.50                               \nREMARK 200  NUMBER OF CRYSTALS USED        : 1                                  \nREMARK 200                                                                      \nREMARK 200  SYNCHROTRON              (Y/N) : N                                  \nREMARK 200  RADIATION SOURCE               : ROTATING ANODE                     \nREMARK 200  BEAMLINE                       : NULL                               \nREMARK 200  X-RAY GENERATOR MODEL          : RIGAKU RUH2R                       \nREMARK 200  MONOCHROMATIC OR LAUE    (M/L) : M                                  \nREMARK 200  WAVELENGTH OR RANGE        (A) : 1.5418                             \nREMARK 200  MONOCHROMATOR                  : NULL                               \nREMARK 200  OPTICS                         : MIRRORS                            \nREMARK 200                                                                      \nREMARK 200  DETECTOR TYPE                  : IMAGE PLATE                        \nREMARK 200  DETECTOR MANUFACTURER          : RIGAKU RAXIS IV                    \nREMARK 200  INTENSITY-INTEGRATION SOFTWARE : DENZO                              \nREMARK 200  DATA SCALING SOFTWARE          : SCALEPACK                          \nREMARK 200                                                                      \nREMARK 200  NUMBER OF UNIQUE REFLECTIONS   : 4853                               \nREMARK 200  RESOLUTION RANGE HIGH      (A) : 2.000                              \nREMARK 200  RESOLUTION RANGE 
LOW       (A) : 50.000                             \nREMARK 200  REJECTION CRITERIA  (SIGMA(I)) : 0.000                              \nREMARK 200                                                                      \nREMARK 200 OVERALL.                                                             \nREMARK 200  COMPLETENESS FOR RANGE     (%) : 95.9                               \nREMARK 200  DATA REDUNDANCY                : NULL                               \nREMARK 200  R MERGE                    (I) : 0.05300                            \nREMARK 200  R SYM                      (I) : NULL                               \nREMARK 200  <I/SIGMA(I)> FOR THE DATA SET  : NULL                               \nREMARK 200                                                                      \nREMARK 200 IN THE HIGHEST RESOLUTION SHELL.                                     \nREMARK 200  HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 2.00                     \nREMARK 200  HIGHEST RESOLUTION SHELL, RANGE LOW  (A) : 2.06                     \nREMARK 200  COMPLETENESS FOR SHELL     (%) : 93.0                               \nREMARK 200  DATA REDUNDANCY IN SHELL       : NULL                               \nREMARK 200  R MERGE FOR SHELL          (I) : 0.13300                            \nREMARK 200  R SYM FOR SHELL            (I) : NULL                               \nREMARK 200  <I/SIGMA(I)> FOR SHELL         : NULL                               \nREMARK 200                                                                      \nREMARK 200 DIFFRACTION PROTOCOL: SINGLE WAVELENGTH                              \nREMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: MOLECULAR REPLACEMENT        \nREMARK 200 SOFTWARE USED: CNS 0.3                                               \nREMARK 200 STARTING MODEL: IDEALIZED 7-FOLD TRIPLE HELIX                        \nREMARK 200                                                                      \nREMARK 200 REMARK: NULL                                      
                   \nREMARK 280                                                                      \nREMARK 280 CRYSTAL                                                              \nREMARK 280 SOLVENT CONTENT, VS   (%): 43.50                                     \nREMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 2.10                     \nREMARK 280                                                                      \nREMARK 280 CRYSTALLIZATION CONDITIONS: PH 8.50                                  \nREMARK 290                                                                      \nREMARK 290 CRYSTALLOGRAPHIC SYMMETRY                                            \nREMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: C 1 2 1                          \nREMARK 290                                                                      \nREMARK 290      SYMOP   SYMMETRY                                                \nREMARK 290     NNNMMM   OPERATOR                                                \nREMARK 290       1555   X,Y,Z                                                   \nREMARK 290       2555   -X,Y,-Z                                                 \nREMARK 290       3555   X+1/2,Y+1/2,Z                                           \nREMARK 290       4555   -X+1/2,Y+1/2,-Z                                         \nREMARK 290                                                                      \nREMARK 290     WHERE NNN -> OPERATOR NUMBER                                     \nREMARK 290           MMM -> TRANSLATION VECTOR                                  \nREMARK 290                                                                      \nREMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS                            \nREMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM             \nREMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY                \nREMARK 290 RELATED MOLECULES.                                                   
\nREMARK 290   SMTRY1   1  1.000000  0.000000  0.000000        0.00000            \nREMARK 290   SMTRY2   1  0.000000  1.000000  0.000000        0.00000            \nREMARK 290   SMTRY3   1  0.000000  0.000000  1.000000        0.00000            \nREMARK 290   SMTRY1   2 -1.000000  0.000000  0.000000        0.00000            \nREMARK 290   SMTRY2   2  0.000000  1.000000  0.000000        0.00000            \nREMARK 290   SMTRY3   2  0.000000  0.000000 -1.000000        0.00000            \nREMARK 290   SMTRY1   3  1.000000  0.000000  0.000000       58.54500            \nREMARK 290   SMTRY2   3  0.000000  1.000000  0.000000        7.81450            \nREMARK 290   SMTRY3   3  0.000000  0.000000  1.000000        0.00000            \nREMARK 290   SMTRY1   4 -1.000000  0.000000  0.000000       58.54500            \nREMARK 290   SMTRY2   4  0.000000  1.000000  0.000000        7.81450            \nREMARK 290   SMTRY3   4  0.000000  0.000000 -1.000000        0.00000            \nREMARK 290                                                                      \nREMARK 290 REMARK: NULL                                                         \nREMARK 300                                                                      \nREMARK 300 BIOMOLECULE: 1                                                       \nREMARK 300 SEE REMARK 350 FOR THE AUTHOR PROVIDED AND/OR PROGRAM                \nREMARK 300 GENERATED ASSEMBLY INFORMATION FOR THE STRUCTURE IN                  \nREMARK 300 THIS ENTRY. THE REMARK MAY ALSO PROVIDE INFORMATION ON               \nREMARK 300 BURIED SURFACE AREA.                                                 \nREMARK 350                                                                      \nREMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN           \nREMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE                \nREMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS          \nREMARK 350 GIVEN BELOW.  
BOTH NON-CRYSTALLOGRAPHIC AND                          \nREMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN.                               \nREMARK 350                                                                      \nREMARK 350 BIOMOLECULE: 1                                                       \nREMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: TRIMERIC                          \nREMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: TRIMERIC                   \nREMARK 350 SOFTWARE USED: PISA                                                  \nREMARK 350 TOTAL BURIED SURFACE AREA: 5250 ANGSTROM**2                          \nREMARK 350 SURFACE AREA OF THE COMPLEX: 5260 ANGSTROM**2                        \nREMARK 350 CHANGE IN SOLVENT FREE ENERGY: -27.0 KCAL/MOL                        \nREMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C                               \nREMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000            \nREMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000            \nREMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000            \nREMARK 375                                                                      \nREMARK 375 SPECIAL POSITION                                                     \nREMARK 375 THE FOLLOWING ATOMS ARE FOUND TO BE WITHIN 0.15 ANGSTROMS            \nREMARK 375 OF A SYMMETRY RELATED ATOM AND ARE ASSUMED TO BE ON SPECIAL          \nREMARK 375 POSITIONS.                                                           \nREMARK 375                                                                      \nREMARK 375 ATOM RES CSSEQI                                                      \nREMARK 375 C    ACY B 401  LIES ON A SPECIAL POSITION.                          \nREMARK 375 CH3  ACY B 401  LIES ON A SPECIAL POSITION.                          \nREMARK 375 C    ACY B 405  LIES ON A SPECIAL POSITION.                          \nREMARK 375 CH3  ACY B 405  LIES ON A SPECIAL POSITION.   
                       \nREMARK 375      HOH A 134  LIES ON A SPECIAL POSITION.                          \nREMARK 375      HOH A 189  LIES ON A SPECIAL POSITION.                          \nREMARK 375      HOH A 211  LIES ON A SPECIAL POSITION.                          \nREMARK 400                                                                      \nREMARK 400 COMPOUND                                                             \nREMARK 400 HYDROGEN BONDS BETWEEN PEPTIDE CHAINS FOLLOW THE RICH AND            \nREMARK 400 CRICK MODEL II FOR COLLAGEN.                                         \nREMARK 465                                                                      \nREMARK 465 MISSING RESIDUES                                                     \nREMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE                       \nREMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN               \nREMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)                \nREMARK 465                                                                      \nREMARK 465   M RES C SSSEQI                                                     \nREMARK 465     PRO A     1                                                      \nREMARK 600                                                                      \nREMARK 600 HETEROGEN                                                            \nREMARK 600                                                                      \nREMARK 600 TWO ACETIC ACID MOLECULES SIT ON A CRYSTALLOGRAPHIC TWO              \nREMARK 600 FOLD.  AS A RESULT THE ASYMMETRIC UNIT FOR THESE TWO ACETIC          \nREMARK 600 ACIDS CONTAIN ONLY ONE OXYGEN ATOM EACH.                             
\nREMARK 610                                                                      \nREMARK 610 MISSING HETEROATOM                                                   \nREMARK 610 THE FOLLOWING RESIDUES HAVE MISSING ATOMS (M=MODEL NUMBER;           \nREMARK 610 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER;          \nREMARK 610 I=INSERTION CODE):                                                   \nREMARK 610   M RES C SSEQI                                                      \nREMARK 610     ACY B  401                                                       \nREMARK 610     ACY B  405                                                       \nREMARK 800                                                                      \nREMARK 800 SITE                                                                 \nREMARK 800 SITE_IDENTIFIER: AC1                                                 \nREMARK 800 EVIDENCE_CODE: SOFTWARE                                              \nREMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE ACY B 401                 \nREMARK 800                                                                      \nREMARK 800 SITE_IDENTIFIER: AC2                                                 \nREMARK 800 EVIDENCE_CODE: SOFTWARE                                              \nREMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE ACY C 402                 \nREMARK 800                                                                      \nREMARK 800 SITE_IDENTIFIER: AC3                                                 \nREMARK 800 EVIDENCE_CODE: SOFTWARE                                              \nREMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE ACY B 403                 \nREMARK 800                                                                      \nREMARK 800 SITE_IDENTIFIER: AC4                                                 \nREMARK 800 EVIDENCE_CODE: SOFTWARE                                              \nREMARK 800 SITE_DESCRIPTION: 
BINDING SITE FOR RESIDUE ACY B 404                 \nREMARK 800                                                                      \nREMARK 800 SITE_IDENTIFIER: AC5                                                 \nREMARK 800 EVIDENCE_CODE: SOFTWARE                                              \nREMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE ACY B 405                 \nDBREF  1BKV A    1    30  PDB    1BKV     1BKV             1     30             \nDBREF  1BKV B   31    60  PDB    1BKV     1BKV            31     60             \nDBREF  1BKV C   61    90  PDB    1BKV     1BKV            61     90             \nSEQRES   1 A   30  PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA          \nSEQRES   2 A   30  ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP          \nSEQRES   3 A   30  GLY PRO HYP GLY                                              \nSEQRES   1 B   30  PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA          \nSEQRES   2 B   30  ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP          \nSEQRES   3 B   30  GLY PRO HYP GLY                                              \nSEQRES   1 C   30  PRO HYP GLY PRO HYP GLY PRO HYP GLY ILE THR GLY ALA          \nSEQRES   2 C   30  ARG GLY LEU ALA GLY PRO HYP GLY PRO HYP GLY PRO HYP          \nSEQRES   3 C   30  GLY PRO HYP GLY                                              \nMODRES 1BKV HYP A    2  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP A    5  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP A    8  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP A   20  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP A   23  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP A   26  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP A   29  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP B   32  PRO  4-HYDROXYPROLINE                
                   \nMODRES 1BKV HYP B   35  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP B   38  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP B   50  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP B   53  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP B   56  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP B   59  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP C   62  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP C   65  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP C   68  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP C   80  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP C   83  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP C   86  PRO  4-HYDROXYPROLINE                                   \nMODRES 1BKV HYP C   89  PRO  4-HYDROXYPROLINE                                   \nHET    HYP  A   2       8                                                       \nHET    HYP  A   5       8                                                       \nHET    HYP  A   8       8                                                       \nHET    HYP  A  20       8                                                       \nHET    HYP  A  23       8                                                       \nHET    HYP  A  26       8                                                       \nHET    HYP  A  29       8                                                       \nHET    HYP  B  32       8                                                       \nHET    HYP  B  35       8                                                       \nHET    HYP  B  38       8                                                       \nHET    HYP  B  50       8                                                       \nHET    HYP 
 B  53       8                                                       \nHET    HYP  B  56       8                                                       \nHET    HYP  B  59       8                                                       \nHET    HYP  C  62       8                                                       \nHET    HYP  C  65       8                                                       \nHET    HYP  C  68       8                                                       \nHET    HYP  C  80       8                                                       \nHET    HYP  C  83       8                                                       \nHET    HYP  C  86       8                                                       \nHET    HYP  C  89       8                                                       \nHET    ACY  B 401       3                                                       \nHET    ACY  B 403       4                                                       \nHET    ACY  B 404       4                                                       \nHET    ACY  B 405       3                                                       \nHET    ACY  C 402       4                                                       \nHETNAM     HYP 4-HYDROXYPROLINE                                                 \nHETNAM     ACY ACETIC ACID                                                      \nHETSYN     HYP HYDROXYPROLINE                                                   \nFORMUL   1  HYP    21(C5 H9 N O3)                                               \nFORMUL   4  ACY    5(C2 H4 O2)                                                  \nFORMUL   9  HOH   *111(H2 O)                                                    \nLINK         C   HYP A   2                 N   GLY A   3     1555   1555  1.33  \nLINK         C   PRO A   4                 N   HYP A   5     1555   1555  1.35  \nLINK         C   HYP A   5                 N   GLY A   6     1555   1555  1.33  \nLINK         C   PRO A   7                 
N   HYP A   8     1555   1555  1.34  \nLINK         C   HYP A   8                 N   GLY A   9     1555   1555  1.33  \nLINK         C   PRO A  19                 N   HYP A  20     1555   1555  1.35  \nLINK         C   HYP A  20                 N   GLY A  21     1555   1555  1.33  \nLINK         C   PRO A  22                 N   HYP A  23     1555   1555  1.34  \nLINK         C   HYP A  23                 N   GLY A  24     1555   1555  1.33  \nLINK         C   PRO A  25                 N   HYP A  26     1555   1555  1.34  \nLINK         C   HYP A  26                 N   GLY A  27     1555   1555  1.33  \nLINK         C   PRO A  28                 N   HYP A  29     1555   1555  1.35  \nLINK         C   HYP A  29                 N   GLY A  30     1555   1555  1.33  \nLINK         C   PRO B  31                 N   HYP B  32     1555   1555  1.34  \nLINK         C   HYP B  32                 N   GLY B  33     1555   1555  1.33  \nLINK         C   PRO B  34                 N   HYP B  35     1555   1555  1.35  \nLINK         C   HYP B  35                 N   GLY B  36     1555   1555  1.33  \nLINK         C   PRO B  37                 N   HYP B  38     1555   1555  1.34  \nLINK         C   HYP B  38                 N   GLY B  39     1555   1555  1.33  \nLINK         C   PRO B  49                 N   HYP B  50     1555   1555  1.34  \nLINK         C   HYP B  50                 N   GLY B  51     1555   1555  1.33  \nLINK         C   PRO B  52                 N   HYP B  53     1555   1555  1.34  \nLINK         C   HYP B  53                 N   GLY B  54     1555   1555  1.33  \nLINK         C   PRO B  55                 N   HYP B  56     1555   1555  1.34  \nLINK         C   HYP B  56                 N   GLY B  57     1555   1555  1.33  \nLINK         C   PRO B  58                 N   HYP B  59     1555   1555  1.34  \nLINK         C   HYP B  59                 N   GLY B  60     1555   1555  1.33  \nLINK         C   PRO C  61                 N   HYP C  62     1555   1555  
1.34  \nLINK         C   HYP C  62                 N   GLY C  63     1555   1555  1.33  \nLINK         C   PRO C  64                 N   HYP C  65     1555   1555  1.34  \nLINK         C   HYP C  65                 N   GLY C  66     1555   1555  1.33  \nLINK         C   PRO C  67                 N   HYP C  68     1555   1555  1.34  \nLINK         C   HYP C  68                 N   GLY C  69     1555   1555  1.33  \nLINK         C   PRO C  79                 N   HYP C  80     1555   1555  1.34  \nLINK         C   HYP C  80                 N   GLY C  81     1555   1555  1.33  \nLINK         C   PRO C  82                 N   HYP C  83     1555   1555  1.34  \nLINK         C   HYP C  83                 N   GLY C  84     1555   1555  1.33  \nLINK         C   PRO C  85                 N   HYP C  86     1555   1555  1.34  \nLINK         C   HYP C  86                 N   GLY C  87     1555   1555  1.33  \nLINK         C   PRO C  88                 N   HYP C  89     1555   1555  1.35  \nLINK         C   HYP C  89                 N   GLY C  90     1555   1555  1.33  \nCISPEP   1 PRO B   58    HYP B   59          0         0.04                     \nSITE     1 AC1  4 ARG B  44  HOH B 112  HOH B 125  ACY B 405                    \nSITE     1 AC2  5 ARG A  14  ILE C  70  THR C  71  HOH C 145                    \nSITE     2 AC2  5 HOH C 195                                                     \nSITE     1 AC3  4 HOH B 111  ACY B 404  HOH C 117  HOH C 210                    \nSITE     1 AC4  6 LEU A  16  GLY B  51  PRO B  52  HYP B  53                    \nSITE     2 AC4  6 HOH B 119  ACY B 403                                          \nSITE     1 AC5  5 ALA B  43  ARG B  44  HOH B 125  HOH B 131                    \nSITE     2 AC5  5 ACY B 401                                                     \nCRYST1  117.090   15.629   39.715  90.00 104.46  90.00 C 1 2 1      12          \nORIGX1      1.000000  0.000000  0.000000        0.00000                         \nORIGX2      0.000000  
1.000000  0.000000        0.00000                         \nORIGX3      0.000000  0.000000  1.000000        0.00000                         \nSCALE1      0.008540  0.000000  0.002202        0.00000                         \nSCALE2      0.000000  0.063984  0.000000        0.00000                         \nSCALE3      0.000000  0.000000  0.026003        0.00000                         \nHETATM    1  N   HYP A   2      26.451  17.823  31.262  0.80 58.37           N  \nHETATM    2  CA  HYP A   2      26.216  16.381  31.120  0.80 57.92           C  \nHETATM    3  C   HYP A   2      26.872  15.787  29.876  0.80 57.67           C  \nHETATM    4  O   HYP A   2      27.597  16.471  29.151  0.80 57.53           O  \nHETATM    5  CB  HYP A   2      24.692  16.266  31.068  0.80 57.93           C  \nHETATM    6  CG  HYP A   2      24.233  17.600  30.581  0.80 57.98           C  \nHETATM    7  CD  HYP A   2      25.203  18.596  31.154  0.80 58.25           C  \nHETATM    8  OD1 HYP A   2      22.926  17.876  31.079  0.80 57.65           O  \nATOM      9  N   GLY A   3      26.601  14.509  29.633  0.80 57.40           N  \nATOM     10  CA  GLY A   3      27.162  13.836  28.479  0.80 57.46           C  \nATOM     11  C   GLY A   3      28.390  13.010  28.819  0.80 57.80           C  \nATOM     12  O   GLY A   3      29.457  13.568  29.071  0.80 58.19           O  \nATOM     13  N   PRO A   4      28.271  11.673  28.844  1.00 57.77           N  \nATOM     14  CA  PRO A   4      29.409  10.806  29.159  1.00 58.02           C  \nATOM     15  C   PRO A   4      30.195  10.423  27.901  1.00 57.97           C  \nATOM     16  O   PRO A   4      29.614  10.243  26.828  1.00 58.52           O  \nATOM     17  CB  PRO A   4      28.752   9.593  29.813  1.00 57.51           C  \nATOM     18  CG  PRO A   4      27.380   9.521  29.174  1.00 56.47           C  \nATOM     19  CD  PRO A   4      27.046  10.892  28.600  1.00 57.40           C  \nHETATM   20  N   HYP A   5      31.532  10.308  
28.015  1.00 57.11           N  \nHETATM   21  CA  HYP A   5      32.359   9.942  26.859  1.00 55.47           C  \nHETATM   22  C   HYP A   5      31.935   8.606  26.262  1.00 54.04           C  \nHETATM   23  O   HYP A   5      31.475   7.715  26.978  1.00 53.93           O  \nHETATM   24  CB  HYP A   5      33.774   9.893  27.431  1.00 56.19           C  \nHETATM   25  CG  HYP A   5      33.721  10.785  28.623  1.00 56.36           C  \nHETATM   26  CD  HYP A   5      32.359  10.532  29.212  1.00 56.90           C  \nHETATM   27  OD1 HYP A   5      33.822  12.155  28.210  1.00 56.15           O  \nATOM     28  N   GLY A   6      32.093   8.474  24.950  1.00 52.43           N  \nATOM     29  CA  GLY A   6      31.712   7.245  24.282  1.00 50.96           C  \nATOM     30  C   GLY A   6      32.655   6.084  24.538  1.00 50.13           C  \nATOM     31  O   GLY A   6      33.615   6.211  25.300  1.00 49.21           O  \nATOM     32  N   PRO A   7      32.402   4.930  23.904  1.00 49.25           N  \nATOM     33  CA  PRO A   7      33.244   3.743  24.073  1.00 49.03           C  \nATOM     34  C   PRO A   7      34.567   3.899  23.333  1.00 48.01           C  \nATOM     35  O   PRO A   7      34.653   4.618  22.334  1.00 48.56           O  \nATOM     36  CB  PRO A   7      32.403   2.622  23.474  1.00 49.64           C  \nATOM     37  CG  PRO A   7      31.611   3.309  22.404  1.00 49.67           C  \nATOM     38  CD  PRO A   7      31.296   4.687  22.960  1.00 48.81           C  \nHETATM   39  N   HYP A   8      35.622   3.228  23.815  1.00 46.01           N  \nHETATM   40  CA  HYP A   8      36.909   3.349  23.126  1.00 43.33           C  \nHETATM   41  C   HYP A   8      36.764   2.890  21.677  1.00 40.36           C  \nHETATM   42  O   HYP A   8      35.998   1.974  21.384  1.00 39.31           O  \nHETATM   43  CB  HYP A   8      37.837   2.447  23.939  1.00 44.01           C  \nHETATM   44  CG  HYP A   8      37.162   2.315  25.281  1.00 44.46           C  
\nHETATM   45  CD  HYP A   8      35.701   2.343  24.989  1.00 45.23           C  \nHETATM   46  OD1 HYP A   8      37.504   3.425  26.112  1.00 44.81           O  \nATOM     47  N   GLY A   9      37.487   3.538  20.771  1.00 38.47           N  \nATOM     48  CA  GLY A   9      37.406   3.165  19.371  1.00 36.29           C  \nATOM     49  C   GLY A   9      37.916   1.759  19.135  1.00 35.63           C  \nATOM     50  O   GLY A   9      38.461   1.134  20.044  1.00 36.69           O  \nATOM     51  N   ILE A  10      37.738   1.252  17.919  1.00 34.41           N  \nATOM     52  CA  ILE A  10      38.203  -0.088  17.592  1.00 33.55           C  \nATOM     53  C   ILE A  10      39.723  -0.086  17.479  1.00 33.07           C  \nATOM     54  O   ILE A  10      40.333   0.930  17.139  1.00 32.34           O  \nATOM     55  CB  ILE A  10      37.614  -0.590  16.257  1.00 33.76           C  \nATOM     56  CG1 ILE A  10      38.148   0.254  15.101  1.00 34.52           C  \nATOM     57  CG2 ILE A  10      36.093  -0.534  16.306  1.00 35.04           C  \nATOM     58  CD1 ILE A  10      37.878  -0.337  13.730  1.00 34.71           C  \nATOM     59  N   THR A  11      40.325  -1.231  17.775  1.00 31.79           N  \nATOM     60  CA  THR A  11      41.770  -1.378  17.708  1.00 31.79           C  \nATOM     61  C   THR A  11      42.261  -1.242  16.266  1.00 30.50           C  \nATOM     62  O   THR A  11      41.560  -1.624  15.329  1.00 30.35           O  \nATOM     63  CB  THR A  11      42.182  -2.733  18.294  1.00 32.86           C  \nATOM     64  OG1 THR A  11      41.484  -2.930  19.531  1.00 33.44           O  \nATOM     65  CG2 THR A  11      43.680  -2.778  18.556  1.00 33.19           C  \nATOM     66  N   GLY A  12      43.456  -0.678  16.106  1.00 27.95           N  \nATOM     67  CA  GLY A  12      44.029  -0.464  14.789  1.00 26.83           C  \nATOM     68  C   GLY A  12      44.395  -1.735  14.047  1.00 27.17           C  \nATOM     69  O   GLY A  12    
  44.463  -2.820  14.636  1.00 27.45           O  \nATOM     70  N   ALA A  13      44.631  -1.600  12.746  1.00 25.14           N  \nATOM     71  CA  ALA A  13      44.994  -2.736  11.905  1.00 25.09           C  \nATOM     72  C   ALA A  13      46.437  -3.149  12.149  1.00 23.54           C  \nATOM     73  O   ALA A  13      47.242  -2.367  12.646  1.00 22.46           O  \nATOM     74  CB  ALA A  13      44.808  -2.381  10.435  1.00 26.15           C  \nATOM     75  N   ARG A  14      46.762  -4.384  11.792  1.00 23.65           N  \nATOM     76  CA  ARG A  14      48.114  -4.876  11.965  1.00 24.06           C  \nATOM     77  C   ARG A  14      49.022  -4.113  11.017  1.00 24.64           C  \nATOM     78  O   ARG A  14      48.594  -3.703   9.927  1.00 22.15           O  \nATOM     79  CB  ARG A  14      48.179  -6.372  11.651  1.00 24.43           C  \nATOM     80  CG  ARG A  14      49.164  -7.124  12.517  1.00 26.44           C  \nATOM     81  CD  ARG A  14      49.573  -8.449  11.898  1.00 22.21           C  \nATOM     82  NE  ARG A  14      50.792  -8.961  12.516  1.00 21.33           N  \nATOM     83  CZ  ARG A  14      50.989 -10.231  12.850  1.00 19.04           C  \nATOM     84  NH1 ARG A  14      50.048 -11.133  12.624  1.00 19.10           N  \nATOM     85  NH2 ARG A  14      52.132 -10.596  13.412  1.00 20.66           N  \nATOM     86  N   GLY A  15      50.271  -3.921  11.429  1.00 22.20           N  \nATOM     87  CA  GLY A  15      51.210  -3.220  10.578  1.00 23.26           C  \nATOM     88  C   GLY A  15      51.546  -4.036   9.337  1.00 23.11           C  \nATOM     89  O   GLY A  15      51.262  -5.233   9.260  1.00 21.10           O  \nATOM     90  N   LEU A  16      52.143  -3.373   8.359  1.00 23.03           N  \nATOM     91  CA  LEU A  16      52.551  -4.007   7.110  1.00 24.12           C  \nATOM     92  C   LEU A  16      53.664  -5.032   7.363  1.00 23.81           C  \nATOM     93  O   LEU A  16      54.486  -4.866   8.271  1.00 
23.89           O  \nATOM     94  CB  LEU A  16      53.054  -2.924   6.150  1.00 26.78           C  \nATOM     95  CG  LEU A  16      53.391  -3.269   4.698  1.00 31.26           C  \nATOM     96  CD1 LEU A  16      52.110  -3.338   3.874  1.00 31.49           C  \nATOM     97  CD2 LEU A  16      54.332  -2.210   4.128  1.00 32.29           C  \nATOM     98  N   ALA A  17      53.696  -6.095   6.565  1.00 23.95           N  \nATOM     99  CA  ALA A  17      54.733  -7.111   6.720  1.00 22.28           C  \nATOM    100  C   ALA A  17      56.089  -6.453   6.483  1.00 22.38           C  \nATOM    101  O   ALA A  17      56.198  -5.500   5.718  1.00 20.24           O  \nATOM    102  CB  ALA A  17      54.520  -8.251   5.725  1.00 23.07           C  \nATOM    103  N   GLY A  18      57.121  -6.961   7.150  1.00 22.29           N  \nATOM    104  CA  GLY A  18      58.440  -6.385   6.992  1.00 21.79           C  \nATOM    105  C   GLY A  18      59.074  -6.648   5.636  1.00 23.03           C  \nATOM    106  O   GLY A  18      58.588  -7.480   4.870  1.00 21.78           O  \nATOM    107  N   PRO A  19      60.156  -5.927   5.299  1.00 23.48           N  \nATOM    108  CA  PRO A  19      60.856  -6.103   4.020  1.00 22.96           C  \nATOM    109  C   PRO A  19      61.670  -7.404   4.010  1.00 23.41           C  \nATOM    110  O   PRO A  19      61.911  -8.002   5.061  1.00 21.80           O  \nATOM    111  CB  PRO A  19      61.758  -4.881   3.947  1.00 22.85           C  \nATOM    112  CG  PRO A  19      62.057  -4.588   5.384  1.00 22.18           C  \nATOM    113  CD  PRO A  19      60.774  -4.866   6.112  1.00 20.03           C  \nHETATM  114  N   HYP A  20      62.070  -7.879   2.816  1.00 25.31           N  \nHETATM  115  CA  HYP A  20      62.858  -9.114   2.743  1.00 25.35           C  \nHETATM  116  C   HYP A  20      64.192  -8.942   3.444  1.00 25.13           C  \nHETATM  117  O   HYP A  20      64.732  -7.838   3.491  1.00 25.43           O  \nHETATM  
118  CB  HYP A  20      63.054  -9.338   1.238  1.00 25.81           C  \nHETATM  119  CG  HYP A  20      61.968  -8.587   0.595  1.00 25.68           C  \nHETATM  120  CD  HYP A  20      61.786  -7.352   1.471  1.00 24.66           C  \nHETATM  121  OD1 HYP A  20      60.785  -9.382   0.612  1.00 26.17           O  \nATOM    122  N   GLY A  21      64.717 -10.036   3.982  1.00 25.65           N  \nATOM    123  CA  GLY A  21      66.000  -9.986   4.656  1.00 27.30           C  \nATOM    124  C   GLY A  21      67.111  -9.584   3.701  1.00 27.59           C  \nATOM    125  O   GLY A  21      66.874  -9.410   2.511  1.00 27.21           O  \nATOM    126  N   PRO A  22      68.341  -9.413   4.195  1.00 29.89           N  \nATOM    127  CA  PRO A  22      69.425  -9.027   3.291  1.00 31.44           C  \nATOM    128  C   PRO A  22      69.893 -10.204   2.448  1.00 33.09           C  \nATOM    129  O   PRO A  22      69.644 -11.361   2.792  1.00 31.48           O  \nATOM    130  CB  PRO A  22      70.519  -8.542   4.230  1.00 33.38           C  \nATOM    131  CG  PRO A  22      70.283  -9.320   5.490  1.00 32.28           C  \nATOM    132  CD  PRO A  22      68.800  -9.552   5.588  1.00 30.14           C  \nHETATM  133  N   HYP A  23      70.566  -9.918   1.321  1.00 33.94           N  \nHETATM  134  CA  HYP A  23      71.065 -10.984   0.447  1.00 35.24           C  \nHETATM  135  C   HYP A  23      71.903 -11.984   1.248  1.00 35.63           C  \nHETATM  136  O   HYP A  23      72.541 -11.622   2.244  1.00 36.58           O  \nHETATM  137  CB  HYP A  23      71.905 -10.246  -0.593  1.00 37.08           C  \nHETATM  138  CG  HYP A  23      71.451  -8.831  -0.552  1.00 37.74           C  \nHETATM  139  CD  HYP A  23      70.874  -8.573   0.808  1.00 35.34           C  \nHETATM  140  OD1 HYP A  23      70.448  -8.606  -1.559  1.00 40.15           O  \nATOM    141  N   GLY A  24      71.888 -13.242   0.818  1.00 35.75           N  \nATOM    142  CA  GLY A  24      72.647 
-14.267   1.506  1.00 34.90           C  \nATOM    143  C   GLY A  24      74.142 -14.121   1.277  1.00 34.41           C  \nATOM    144  O   GLY A  24      74.568 -13.253   0.518  1.00 33.33           O  \nATOM    145  N   PRO A  25      74.966 -14.955   1.930  1.00 34.67           N  \nATOM    146  CA  PRO A  25      76.420 -14.893   1.773  1.00 35.68           C  \nATOM    147  C   PRO A  25      76.878 -15.528   0.462  1.00 37.82           C  \nATOM    148  O   PRO A  25      76.123 -16.259  -0.186  1.00 38.03           O  \nATOM    149  CB  PRO A  25      76.940 -15.650   2.991  1.00 34.52           C  \nATOM    150  CG  PRO A  25      75.871 -16.635   3.289  1.00 35.74           C  \nATOM    151  CD  PRO A  25      74.560 -16.014   2.873  1.00 35.57           C  \nHETATM  152  N   HYP A  26      78.128 -15.260   0.057  1.00 38.89           N  \nHETATM  153  CA  HYP A  26      78.661 -15.821  -1.187  1.00 39.69           C  \nHETATM  154  C   HYP A  26      78.575 -17.341  -1.198  1.00 40.31           C  \nHETATM  155  O   HYP A  26      78.630 -17.981  -0.148  1.00 40.06           O  \nHETATM  156  CB  HYP A  26      80.100 -15.312  -1.219  1.00 40.72           C  \nHETATM  157  CG  HYP A  26      80.070 -14.075  -0.341  1.00 40.26           C  \nHETATM  158  CD  HYP A  26      79.120 -14.419   0.748  1.00 39.04           C  \nHETATM  159  OD1 HYP A  26      79.553 -12.962  -1.072  1.00 41.27           O  \nATOM    160  N   GLY A  27      78.432 -17.911  -2.390  1.00 41.90           N  \nATOM    161  CA  GLY A  27      78.330 -19.354  -2.521  1.00 44.10           C  \nATOM    162  C   GLY A  27      79.675 -20.053  -2.477  1.00 46.30           C  \nATOM    163  O   GLY A  27      80.712 -19.392  -2.441  1.00 46.47           O  \nATOM    164  N   PRO A  28      79.694 -21.396  -2.484  1.00 47.76           N  \nATOM    165  CA  PRO A  28      80.952 -22.152  -2.442  1.00 48.29           C  \nATOM    166  C   PRO A  28      81.748 -22.018  -3.738  1.00 49.26     
      C  \nATOM    167  O   PRO A  28      81.196 -21.658  -4.779  1.00 48.56           O  \nATOM    168  CB  PRO A  28      80.505 -23.599  -2.192  1.00 47.67           C  \nATOM    169  CG  PRO A  28      79.025 -23.528  -1.874  1.00 47.23           C  \nATOM    170  CD  PRO A  28      78.522 -22.287  -2.538  1.00 47.55           C  \nHETATM  171  N   HYP A  29      83.062 -22.299  -3.687  1.00 50.91           N  \nHETATM  172  CA  HYP A  29      83.915 -22.205  -4.880  1.00 52.30           C  \nHETATM  173  C   HYP A  29      83.539 -23.258  -5.919  1.00 53.04           C  \nHETATM  174  O   HYP A  29      83.078 -24.346  -5.571  1.00 52.67           O  \nHETATM  175  CB  HYP A  29      85.332 -22.419  -4.340  1.00 52.51           C  \nHETATM  176  CG  HYP A  29      85.221 -22.274  -2.851  1.00 52.23           C  \nHETATM  177  CD  HYP A  29      83.831 -22.707  -2.500  1.00 51.46           C  \nHETATM  178  OD1 HYP A  29      85.409 -20.901  -2.488  1.00 53.27           O  \nATOM    179  N   GLY A  30      83.737 -22.930  -7.192  1.00 53.79           N  \nATOM    180  CA  GLY A  30      83.403 -23.862  -8.253  1.00 54.84           C  \nATOM    181  C   GLY A  30      84.464 -24.921  -8.477  1.00 55.58           C  \nATOM    182  O   GLY A  30      85.662 -24.614  -8.282  1.00 55.67           O  \nATOM    183  OXT GLY A  30      84.096 -26.059  -8.851  1.00 55.63           O  \nTER     184      GLY A  30                                                      \nATOM    185  N   PRO B  31      23.597  17.268  26.235  1.00 53.99           N  \nATOM    186  CA  PRO B  31      23.594  16.548  24.942  1.00 53.79           C  \nATOM    187  C   PRO B  31      24.551  15.362  25.049  1.00 53.39           C  \nATOM    188  O   PRO B  31      25.259  15.218  26.050  1.00 53.59           O  \nATOM    189  CB  PRO B  31      24.060  17.510  23.854  1.00 53.68           C  \nATOM    190  CG  PRO B  31      24.420  18.791  24.624  1.00 53.27           C  \nATOM    191  CD  PRO 
B  31      24.358  18.526  26.135  1.00 54.22           C  \nHETATM  192  N   HYP B  32      24.574  14.490  24.027  1.00 52.23           N  \nHETATM  193  CA  HYP B  32      25.479  13.339  24.080  1.00 51.36           C  \nHETATM  194  C   HYP B  32      26.904  13.739  24.462  1.00 50.73           C  \nHETATM  195  O   HYP B  32      27.375  14.819  24.095  1.00 50.09           O  \nHETATM  196  CB  HYP B  32      25.409  12.748  22.668  1.00 51.78           C  \nHETATM  197  CG  HYP B  32      24.629  13.741  21.843  1.00 51.66           C  \nHETATM  198  CD  HYP B  32      23.770  14.496  22.794  1.00 51.66           C  \nHETATM  199  OD1 HYP B  32      23.805  13.046  20.900  1.00 50.93           O  \nATOM    200  N   GLY B  33      27.581  12.866  25.203  1.00 49.77           N  \nATOM    201  CA  GLY B  33      28.943  13.143  25.621  1.00 47.87           C  \nATOM    202  C   GLY B  33      29.915  13.253  24.462  1.00 46.74           C  \nATOM    203  O   GLY B  33      29.524  13.113  23.303  1.00 46.13           O  \nATOM    204  N   PRO B  34      31.202  13.507  24.742  1.00 46.08           N  \nATOM    205  CA  PRO B  34      32.202  13.628  23.677  1.00 44.69           C  \nATOM    206  C   PRO B  34      32.641  12.263  23.142  1.00 43.81           C  \nATOM    207  O   PRO B  34      32.422  11.238  23.790  1.00 41.81           O  \nATOM    208  CB  PRO B  34      33.358  14.382  24.344  1.00 44.60           C  \nATOM    209  CG  PRO B  34      32.935  14.620  25.786  1.00 45.64           C  \nATOM    210  CD  PRO B  34      31.796  13.696  26.074  1.00 45.43           C  \nHETATM  211  N   HYP B  35      33.253  12.237  21.942  1.00 43.04           N  \nHETATM  212  CA  HYP B  35      33.714  10.977  21.348  1.00 41.84           C  \nHETATM  213  C   HYP B  35      34.592  10.208  22.330  1.00 40.20           C  \nHETATM  214  O   HYP B  35      35.250  10.807  23.178  1.00 40.68           O  \nHETATM  215  CB  HYP B  35      34.503  11.411  
20.115  1.00 43.28           C  \nHETATM  216  CG  HYP B  35      34.051  12.791  19.810  1.00 44.66           C  \nHETATM  217  CD  HYP B  35      33.530  13.400  21.080  1.00 44.05           C  \nHETATM  218  OD1 HYP B  35      33.008  12.757  18.827  1.00 46.60           O  \nATOM    219  N   GLY B  36      34.596   8.885  22.210  1.00 38.10           N  \nATOM    220  CA  GLY B  36      35.396   8.063  23.098  1.00 37.10           C  \nATOM    221  C   GLY B  36      36.882   8.228  22.851  1.00 36.28           C  \nATOM    222  O   GLY B  36      37.287   8.982  21.964  1.00 35.78           O  \nATOM    223  N   PRO B  37      37.731   7.536  23.623  1.00 36.53           N  \nATOM    224  CA  PRO B  37      39.171   7.684  23.394  1.00 36.38           C  \nATOM    225  C   PRO B  37      39.599   6.844  22.198  1.00 35.16           C  \nATOM    226  O   PRO B  37      38.903   5.908  21.811  1.00 35.23           O  \nATOM    227  CB  PRO B  37      39.800   7.198  24.701  1.00 37.02           C  \nATOM    228  CG  PRO B  37      38.796   6.232  25.265  1.00 38.32           C  \nATOM    229  CD  PRO B  37      37.429   6.593  24.714  1.00 36.53           C  \nHETATM  230  N   HYP B  38      40.744   7.181  21.587  1.00 35.13           N  \nHETATM  231  CA  HYP B  38      41.240   6.432  20.428  1.00 33.27           C  \nHETATM  232  C   HYP B  38      41.390   4.948  20.743  1.00 31.89           C  \nHETATM  233  O   HYP B  38      41.650   4.571  21.884  1.00 32.77           O  \nHETATM  234  CB  HYP B  38      42.579   7.099  20.117  1.00 33.98           C  \nHETATM  235  CG  HYP B  38      42.448   8.480  20.695  1.00 32.48           C  \nHETATM  236  CD  HYP B  38      41.635   8.302  21.939  1.00 34.70           C  \nHETATM  237  OD1 HYP B  38      41.735   9.309  19.792  1.00 36.28           O  \nATOM    238  N   GLY B  39      41.203   4.108  19.734  1.00 31.73           N  \nATOM    239  CA  GLY B  39      41.325   2.680  19.940  1.00 31.28           C  
\nATOM    240  C   GLY B  39      42.769   2.281  20.183  1.00 31.44           C  \nATOM    241  O   GLY B  39      43.692   3.061  19.939  1.00 30.69           O  \nATOM    242  N   ILE B  40      42.968   1.066  20.681  1.00 30.81           N  \nATOM    243  CA  ILE B  40      44.308   0.567  20.941  1.00 30.08           C  \nATOM    244  C   ILE B  40      45.093   0.507  19.632  1.00 28.71           C  \nATOM    245  O   ILE B  40      44.546   0.169  18.585  1.00 26.65           O  \nATOM    246  CB  ILE B  40      44.263  -0.841  21.565  1.00 30.13           C  \nATOM    247  CG1 ILE B  40      43.366  -0.830  22.807  1.00 32.64           C  \nATOM    248  CG2 ILE B  40      45.669  -1.294  21.946  1.00 32.07           C  \nATOM    249  CD1 ILE B  40      43.778   0.183  23.875  1.00 33.96           C  \nATOM    250  N   THR B  41      46.372   0.851  19.701  1.00 27.09           N  \nATOM    251  CA  THR B  41      47.247   0.831  18.536  1.00 25.42           C  \nATOM    252  C   THR B  41      47.391  -0.593  17.990  1.00 25.08           C  \nATOM    253  O   THR B  41      47.447  -1.546  18.760  1.00 24.71           O  \nATOM    254  CB  THR B  41      48.636   1.378  18.912  1.00 24.13           C  \nATOM    255  OG1 THR B  41      48.501   2.748  19.312  1.00 24.13           O  \nATOM    256  CG2 THR B  41      49.606   1.280  17.736  1.00 23.01           C  \nATOM    257  N   GLY B  42      47.445  -0.727  16.665  1.00 23.54           N  \nATOM    258  CA  GLY B  42      47.583  -2.034  16.046  1.00 21.44           C  \nATOM    259  C   GLY B  42      48.892  -2.745  16.354  1.00 23.15           C  \nATOM    260  O   GLY B  42      49.864  -2.131  16.806  1.00 23.32           O  \nATOM    261  N   ALA B  43      48.920  -4.051  16.110  1.00 23.46           N  \nATOM    262  CA  ALA B  43      50.110  -4.855  16.361  1.00 22.57           C  \nATOM    263  C   ALA B  43      51.156  -4.696  15.262  1.00 22.56           C  \nATOM    264  O   ALA B  43    
  50.836  -4.350  14.124  1.00 23.35           O  \nATOM    265  CB  ALA B  43      49.722  -6.320  16.489  1.00 22.88           C  \nATOM    266  N   ARG B  44      52.412  -4.960  15.607  1.00 21.65           N  \nATOM    267  CA  ARG B  44      53.496  -4.873  14.641  1.00 24.19           C  \nATOM    268  C   ARG B  44      53.271  -5.907  13.536  1.00 22.51           C  \nATOM    269  O   ARG B  44      52.703  -6.976  13.777  1.00 20.49           O  \nATOM    270  CB  ARG B  44      54.843  -5.139  15.327  1.00 25.52           C  \nATOM    271  CG  ARG B  44      56.054  -4.750  14.474  1.00 28.44           C  \nATOM    272  CD  ARG B  44      57.372  -4.967  15.213  1.00 26.34           C  \nATOM    273  NE  ARG B  44      57.692  -3.852  16.099  1.00 27.12           N  \nATOM    274  CZ  ARG B  44      58.210  -2.700  15.686  1.00 26.18           C  \nATOM    275  NH1 ARG B  44      58.461  -2.510  14.399  1.00 25.87           N  \nATOM    276  NH2 ARG B  44      58.472  -1.741  16.556  1.00 24.56           N  \nATOM    277  N   GLY B  45      53.707  -5.590  12.321  1.00 23.14           N  \nATOM    278  CA  GLY B  45      53.526  -6.528  11.224  1.00 22.99           C  \nATOM    279  C   GLY B  45      54.382  -7.784  11.300  1.00 22.87           C  \nATOM    280  O   GLY B  45      55.350  -7.863  12.064  1.00 23.80           O  \nATOM    281  N   LEU B  46      54.030  -8.787  10.507  1.00 22.87           N  \nATOM    282  CA  LEU B  46      54.809 -10.019  10.498  1.00 22.06           C  \nATOM    283  C   LEU B  46      56.212  -9.704   9.978  1.00 20.37           C  \nATOM    284  O   LEU B  46      56.411  -8.729   9.252  1.00 19.66           O  \nATOM    285  CB  LEU B  46      54.127 -11.070   9.612  1.00 21.71           C  \nATOM    286  CG  LEU B  46      52.844 -11.668  10.202  1.00 22.29           C  \nATOM    287  CD1 LEU B  46      51.945 -12.219   9.103  1.00 21.21           C  \nATOM    288  CD2 LEU B  46      53.216 -12.751  11.186  1.00 
19.71           C  \nATOM    289  N   ALA B  47      57.196 -10.510  10.357  1.00 21.04           N  \nATOM    290  CA  ALA B  47      58.550 -10.270   9.873  1.00 21.13           C  \nATOM    291  C   ALA B  47      58.546 -10.517   8.360  1.00 20.17           C  \nATOM    292  O   ALA B  47      57.731 -11.286   7.854  1.00 22.28           O  \nATOM    293  CB  ALA B  47      59.532 -11.206  10.561  1.00 23.11           C  \nATOM    294  N   GLY B  48      59.435  -9.852   7.639  1.00 19.15           N  \nATOM    295  CA  GLY B  48      59.479 -10.046   6.203  1.00 20.31           C  \nATOM    296  C   GLY B  48      60.038 -11.410   5.837  1.00 20.94           C  \nATOM    297  O   GLY B  48      60.484 -12.156   6.708  1.00 20.57           O  \nATOM    298  N   PRO B  49      60.014 -11.773   4.548  1.00 21.07           N  \nATOM    299  CA  PRO B  49      60.531 -13.060   4.081  1.00 22.49           C  \nATOM    300  C   PRO B  49      62.057 -13.097   4.036  1.00 22.42           C  \nATOM    301  O   PRO B  49      62.720 -12.070   4.197  1.00 23.46           O  \nATOM    302  CB  PRO B  49      59.930 -13.208   2.670  1.00 23.85           C  \nATOM    303  CG  PRO B  49      58.981 -12.034   2.499  1.00 24.11           C  \nATOM    304  CD  PRO B  49      59.455 -10.980   3.443  1.00 20.81           C  \nHETATM  305  N   HYP B  50      62.635 -14.295   3.841  1.00 23.87           N  \nHETATM  306  CA  HYP B  50      64.095 -14.414   3.767  1.00 22.77           C  \nHETATM  307  C   HYP B  50      64.635 -13.585   2.602  1.00 23.55           C  \nHETATM  308  O   HYP B  50      63.955 -13.401   1.588  1.00 22.12           O  \nHETATM  309  CB  HYP B  50      64.329 -15.904   3.536  1.00 22.58           C  \nHETATM  310  CG  HYP B  50      63.088 -16.571   4.003  1.00 24.90           C  \nHETATM  311  CD  HYP B  50      61.972 -15.606   3.731  1.00 23.69           C  \nHETATM  312  OD1 HYP B  50      63.182 -16.826   5.412  1.00 28.19           O  \nATOM    
313  N   GLY B  51      65.855 -13.086   2.754  1.00 23.66           N  \nATOM    314  CA  GLY B  51      66.465 -12.301   1.699  1.00 26.27           C  \nATOM    315  C   GLY B  51      66.846 -13.147   0.491  1.00 26.99           C  \nATOM    316  O   GLY B  51      66.747 -14.372   0.526  1.00 25.52           O  \nATOM    317  N   PRO B  52      67.286 -12.514  -0.602  1.00 28.56           N  \nATOM    318  CA  PRO B  52      67.675 -13.251  -1.808  1.00 29.66           C  \nATOM    319  C   PRO B  52      68.920 -14.101  -1.570  1.00 30.97           C  \nATOM    320  O   PRO B  52      69.668 -13.868  -0.627  1.00 30.33           O  \nATOM    321  CB  PRO B  52      67.926 -12.153  -2.834  1.00 29.39           C  \nATOM    322  CG  PRO B  52      68.278 -10.957  -2.029  1.00 29.52           C  \nATOM    323  CD  PRO B  52      67.455 -11.061  -0.766  1.00 29.01           C  \nHETATM  324  N   HYP B  53      69.137 -15.122  -2.409  1.00 32.39           N  \nHETATM  325  CA  HYP B  53      70.313 -15.983  -2.255  1.00 33.77           C  \nHETATM  326  C   HYP B  53      71.587 -15.166  -2.461  1.00 33.52           C  \nHETATM  327  O   HYP B  53      71.580 -14.170  -3.179  1.00 33.49           O  \nHETATM  328  CB  HYP B  53      70.134 -17.046  -3.346  1.00 34.10           C  \nHETATM  329  CG  HYP B  53      68.670 -16.990  -3.703  1.00 34.15           C  \nHETATM  330  CD  HYP B  53      68.286 -15.552  -3.529  1.00 33.61           C  \nHETATM  331  OD1 HYP B  53      67.916 -17.803  -2.797  1.00 35.88           O  \nATOM    332  N   GLY B  54      72.674 -15.586  -1.827  1.00 33.72           N  \nATOM    333  CA  GLY B  54      73.923 -14.868  -1.982  1.00 34.34           C  \nATOM    334  C   GLY B  54      74.474 -14.970  -3.393  1.00 35.06           C  \nATOM    335  O   GLY B  54      73.930 -15.701  -4.222  1.00 35.07           O  \nATOM    336  N   PRO B  55      75.555 -14.239  -3.702  1.00 35.44           N  \nATOM    337  CA  PRO B  55      76.139 
-14.292  -5.044  1.00 36.28           C  \nATOM    338  C   PRO B  55      76.893 -15.603  -5.253  1.00 36.34           C  \nATOM    339  O   PRO B  55      77.138 -16.350  -4.303  1.00 35.31           O  \nATOM    340  CB  PRO B  55      77.066 -13.084  -5.074  1.00 35.93           C  \nATOM    341  CG  PRO B  55      77.504 -12.939  -3.653  1.00 36.89           C  \nATOM    342  CD  PRO B  55      76.298 -13.323  -2.818  1.00 35.83           C  \nHETATM  343  N   HYP B  56      77.265 -15.903  -6.505  1.00 37.30           N  \nHETATM  344  CA  HYP B  56      77.994 -17.146  -6.780  1.00 37.41           C  \nHETATM  345  C   HYP B  56      79.368 -17.112  -6.139  1.00 36.70           C  \nHETATM  346  O   HYP B  56      79.955 -16.045  -5.992  1.00 37.04           O  \nHETATM  347  CB  HYP B  56      78.081 -17.183  -8.308  1.00 37.72           C  \nHETATM  348  CG  HYP B  56      77.043 -16.207  -8.777  1.00 37.27           C  \nHETATM  349  CD  HYP B  56      77.036 -15.133  -7.737  1.00 37.08           C  \nHETATM  350  OD1 HYP B  56      75.768 -16.849  -8.809  1.00 38.99           O  \nATOM    351  N   GLY B  57      79.874 -18.276  -5.749  1.00 37.69           N  \nATOM    352  CA  GLY B  57      81.190 -18.332  -5.140  1.00 40.68           C  \nATOM    353  C   GLY B  57      82.283 -18.077  -6.163  1.00 42.20           C  \nATOM    354  O   GLY B  57      81.991 -17.893  -7.343  1.00 41.04           O  \nATOM    355  N   PRO B  58      83.557 -18.043  -5.745  1.00 44.65           N  \nATOM    356  CA  PRO B  58      84.618 -17.802  -6.727  1.00 47.65           C  \nATOM    357  C   PRO B  58      84.710 -18.991  -7.685  1.00 50.32           C  \nATOM    358  O   PRO B  58      84.130 -20.045  -7.428  1.00 50.19           O  \nATOM    359  CB  PRO B  58      85.876 -17.654  -5.872  1.00 47.00           C  \nATOM    360  CG  PRO B  58      85.562 -18.391  -4.605  1.00 46.58           C  \nATOM    361  CD  PRO B  58      84.086 -18.225  -4.382  1.00 45.54     
      C  \nHETATM  362  N   HYP B  59      85.439 -18.840  -8.802  1.00 53.39           N  \nHETATM  363  CA  HYP B  59      86.187 -17.670  -9.260  1.00 55.01           C  \nHETATM  364  C   HYP B  59      85.376 -16.813 -10.229  1.00 56.07           C  \nHETATM  365  O   HYP B  59      84.155 -16.937 -10.310  1.00 56.38           O  \nHETATM  366  CB  HYP B  59      87.414 -18.288  -9.938  1.00 56.54           C  \nHETATM  367  CG  HYP B  59      87.105 -19.814 -10.038  1.00 55.92           C  \nHETATM  368  CD  HYP B  59      85.649 -19.958  -9.729  1.00 55.22           C  \nHETATM  369  OD1 HYP B  59      87.861 -20.514  -9.048  1.00 58.34           O  \nATOM    370  N   GLY B  60      86.067 -15.951 -10.968  1.00 56.86           N  \nATOM    371  CA  GLY B  60      85.395 -15.089 -11.923  1.00 58.19           C  \nATOM    372  C   GLY B  60      84.782 -15.860 -13.078  1.00 58.31           C  \nATOM    373  O   GLY B  60      85.534 -16.537 -13.809  1.00 58.29           O  \nATOM    374  OXT GLY B  60      83.547 -15.786 -13.257  1.00 58.82           O  \nTER     375      GLY B  60                                                      \nATOM    376  N   PRO C  61      23.056  11.379  26.782  1.00 60.93           N  \nATOM    377  CA  PRO C  61      23.468  10.032  26.324  1.00 61.00           C  \nATOM    378  C   PRO C  61      24.980   9.987  26.112  1.00 60.57           C  \nATOM    379  O   PRO C  61      25.627  11.026  25.965  1.00 60.97           O  \nATOM    380  CB  PRO C  61      22.749   9.738  25.017  1.00 60.54           C  \nATOM    381  CG  PRO C  61      22.355  11.123  24.529  1.00 60.80           C  \nATOM    382  CD  PRO C  61      22.346  12.106  25.714  1.00 60.89           C  \nHETATM  383  N   HYP C  62      25.566   8.779  26.106  1.00 59.44           N  \nHETATM  384  CA  HYP C  62      27.015   8.656  25.903  1.00 57.55           C  \nHETATM  385  C   HYP C  62      27.455   9.213  24.547  1.00 54.87           C  \nHETATM  386  O   HYP 
C  62      26.657   9.297  23.610  1.00 54.23           O  \nHETATM  387  CB  HYP C  62      27.272   7.151  26.016  1.00 58.94           C  \nHETATM  388  CG  HYP C  62      26.073   6.602  26.730  1.00 59.80           C  \nHETATM  389  CD  HYP C  62      24.928   7.467  26.303  1.00 60.22           C  \nHETATM  390  OD1 HYP C  62      26.257   6.707  28.148  1.00 60.21           O  \nATOM    391  N   GLY C  63      28.727   9.588  24.451  1.00 51.64           N  \nATOM    392  CA  GLY C  63      29.243  10.131  23.209  1.00 47.77           C  \nATOM    393  C   GLY C  63      29.541   9.061  22.175  1.00 44.62           C  \nATOM    394  O   GLY C  63      29.449   7.871  22.472  1.00 43.75           O  \nATOM    395  N   PRO C  64      29.896   9.454  20.943  1.00 42.37           N  \nATOM    396  CA  PRO C  64      30.198   8.469  19.899  1.00 41.59           C  \nATOM    397  C   PRO C  64      31.498   7.727  20.200  1.00 40.89           C  \nATOM    398  O   PRO C  64      32.272   8.142  21.067  1.00 40.04           O  \nATOM    399  CB  PRO C  64      30.297   9.307  18.626  1.00 41.80           C  \nATOM    400  CG  PRO C  64      30.685  10.667  19.103  1.00 41.29           C  \nATOM    401  CD  PRO C  64      30.043  10.838  20.460  1.00 41.86           C  \nHETATM  402  N   HYP C  65      31.748   6.612  19.497  1.00 39.48           N  \nHETATM  403  CA  HYP C  65      32.983   5.867  19.748  1.00 38.68           C  \nHETATM  404  C   HYP C  65      34.200   6.684  19.341  1.00 36.95           C  \nHETATM  405  O   HYP C  65      34.112   7.549  18.466  1.00 36.40           O  \nHETATM  406  CB  HYP C  65      32.840   4.611  18.881  1.00 38.51           C  \nHETATM  407  CG  HYP C  65      31.390   4.544  18.523  1.00 39.82           C  \nHETATM  408  CD  HYP C  65      30.930   5.970  18.456  1.00 39.72           C  \nHETATM  409  OD1 HYP C  65      30.679   3.854  19.555  1.00 40.35           O  \nATOM    410  N   GLY C  66      35.329   6.408  
19.983  1.00 35.16           N  \nATOM    411  CA  GLY C  66      36.545   7.109  19.642  1.00 32.95           C  \nATOM    412  C   GLY C  66      36.981   6.652  18.264  1.00 31.71           C  \nATOM    413  O   GLY C  66      36.413   5.707  17.707  1.00 30.08           O  \nATOM    414  N   PRO C  67      37.999   7.297  17.686  1.00 30.73           N  \nATOM    415  CA  PRO C  67      38.462   6.899  16.356  1.00 29.99           C  \nATOM    416  C   PRO C  67      39.271   5.607  16.430  1.00 28.63           C  \nATOM    417  O   PRO C  67      39.768   5.238  17.495  1.00 26.55           O  \nATOM    418  CB  PRO C  67      39.327   8.077  15.891  1.00 29.79           C  \nATOM    419  CG  PRO C  67      39.310   9.088  17.019  1.00 31.79           C  \nATOM    420  CD  PRO C  67      38.793   8.400  18.245  1.00 30.39           C  \nHETATM  421  N   HYP C  68      39.405   4.902  15.296  1.00 27.79           N  \nHETATM  422  CA  HYP C  68      40.169   3.654  15.275  1.00 27.08           C  \nHETATM  423  C   HYP C  68      41.621   3.909  15.667  1.00 26.37           C  \nHETATM  424  O   HYP C  68      42.174   4.980  15.385  1.00 24.84           O  \nHETATM  425  CB  HYP C  68      40.030   3.174  13.830  1.00 28.99           C  \nHETATM  426  CG  HYP C  68      38.783   3.860  13.322  1.00 30.59           C  \nHETATM  427  CD  HYP C  68      38.845   5.212  13.970  1.00 30.04           C  \nHETATM  428  OD1 HYP C  68      37.614   3.165  13.785  1.00 33.03           O  \nATOM    429  N   GLY C  69      42.230   2.937  16.336  1.00 24.45           N  \nATOM    430  CA  GLY C  69      43.609   3.094  16.742  1.00 24.74           C  \nATOM    431  C   GLY C  69      44.461   3.161  15.503  1.00 24.05           C  \nATOM    432  O   GLY C  69      44.032   2.705  14.448  1.00 25.41           O  \nATOM    433  N   ILE C  70      45.659   3.725  15.621  1.00 23.97           N  \nATOM    434  CA  ILE C  70      46.555   3.835  14.484  1.00 24.18           C  
\nATOM    435  C   ILE C  70      47.052   2.446  14.091  1.00 23.82           C  \nATOM    436  O   ILE C  70      47.065   1.525  14.908  1.00 25.04           O  \nATOM    437  CB  ILE C  70      47.776   4.736  14.807  1.00 25.75           C  \nATOM    438  CG1 ILE C  70      48.580   4.133  15.967  1.00 26.56           C  \nATOM    439  CG2 ILE C  70      47.304   6.163  15.127  1.00 23.09           C  \nATOM    440  CD1 ILE C  70      49.567   5.101  16.628  1.00 24.79           C  \nATOM    441  N   THR C  71      47.443   2.296  12.830  1.00 23.27           N  \nATOM    442  CA  THR C  71      47.947   1.018  12.342  1.00 22.37           C  \nATOM    443  C   THR C  71      49.294   0.742  13.003  1.00 22.42           C  \nATOM    444  O   THR C  71      50.068   1.668  13.271  1.00 21.10           O  \nATOM    445  CB  THR C  71      48.113   1.033  10.813  1.00 24.22           C  \nATOM    446  OG1 THR C  71      49.222   1.868  10.459  1.00 26.81           O  \nATOM    447  CG2 THR C  71      46.844   1.566  10.145  1.00 23.92           C  \nATOM    448  N   GLY C  72      49.559  -0.532  13.281  1.00 21.44           N  \nATOM    449  CA  GLY C  72      50.800  -0.906  13.930  1.00 21.10           C  \nATOM    450  C   GLY C  72      52.054  -0.661  13.118  1.00 21.49           C  \nATOM    451  O   GLY C  72      51.987  -0.393  11.920  1.00 23.81           O  \nATOM    452  N   ALA C  73      53.203  -0.752  13.779  1.00 19.14           N  \nATOM    453  CA  ALA C  73      54.482  -0.554  13.129  1.00 19.51           C  \nATOM    454  C   ALA C  73      54.705  -1.655  12.087  1.00 20.08           C  \nATOM    455  O   ALA C  73      54.027  -2.691  12.102  1.00 21.07           O  \nATOM    456  CB  ALA C  73      55.590  -0.588  14.159  1.00 16.07           C  \nATOM    457  N   ARG C  74      55.649  -1.424  11.183  1.00 20.66           N  \nATOM    458  CA  ARG C  74      55.964  -2.405  10.151  1.00 20.84           C  \nATOM    459  C   ARG C  74    
  56.818  -3.527  10.728  1.00 20.81           C  \nATOM    460  O   ARG C  74      57.649  -3.298  11.619  1.00 21.05           O  \nATOM    461  CB  ARG C  74      56.715  -1.748   8.995  1.00 20.94           C  \nATOM    462  CG  ARG C  74      57.264  -2.760   7.987  1.00 23.91           C  \nATOM    463  CD  ARG C  74      57.466  -2.148   6.618  1.00 25.23           C  \nATOM    464  NE  ARG C  74      57.546  -3.175   5.583  1.00 24.51           N  \nATOM    465  CZ  ARG C  74      58.100  -2.987   4.395  1.00 26.00           C  \nATOM    466  NH1 ARG C  74      58.622  -1.806   4.090  1.00 26.90           N  \nATOM    467  NH2 ARG C  74      58.127  -3.981   3.511  1.00 24.08           N  \nATOM    468  N   GLY C  75      56.619  -4.735  10.211  1.00 18.80           N  \nATOM    469  CA  GLY C  75      57.390  -5.870  10.677  1.00 20.11           C  \nATOM    470  C   GLY C  75      58.871  -5.634  10.473  1.00 21.81           C  \nATOM    471  O   GLY C  75      59.262  -4.770   9.697  1.00 20.88           O  \nATOM    472  N   LEU C  76      59.704  -6.377  11.192  1.00 22.29           N  \nATOM    473  CA  LEU C  76      61.142  -6.240  11.033  1.00 24.48           C  \nATOM    474  C   LEU C  76      61.527  -6.950   9.741  1.00 25.10           C  \nATOM    475  O   LEU C  76      60.714  -7.662   9.151  1.00 25.74           O  \nATOM    476  CB  LEU C  76      61.881  -6.885  12.208  1.00 25.39           C  \nATOM    477  CG  LEU C  76      61.615  -6.287  13.588  1.00 26.17           C  \nATOM    478  CD1 LEU C  76      62.457  -7.007  14.616  1.00 28.37           C  \nATOM    479  CD2 LEU C  76      61.940  -4.797  13.583  1.00 27.51           C  \nATOM    480  N   ALA C  77      62.764  -6.755   9.304  1.00 26.53           N  \nATOM    481  CA  ALA C  77      63.255  -7.397   8.087  1.00 27.38           C  \nATOM    482  C   ALA C  77      63.369  -8.903   8.305  1.00 27.00           C  \nATOM    483  O   ALA C  77      63.568  -9.365   9.423  1.00 
25.38           O  \nATOM    484  CB  ALA C  77      64.611  -6.816   7.696  1.00 27.95           C  \nATOM    485  N   GLY C  78      63.238  -9.668   7.229  1.00 27.21           N  \nATOM    486  CA  GLY C  78      63.335 -11.109   7.340  1.00 25.57           C  \nATOM    487  C   GLY C  78      64.769 -11.573   7.491  1.00 25.80           C  \nATOM    488  O   GLY C  78      65.690 -10.757   7.493  1.00 24.54           O  \nATOM    489  N   PRO C  79      64.992 -12.887   7.632  1.00 27.42           N  \nATOM    490  CA  PRO C  79      66.348 -13.428   7.781  1.00 27.71           C  \nATOM    491  C   PRO C  79      67.182 -13.274   6.515  1.00 29.05           C  \nATOM    492  O   PRO C  79      66.648 -13.096   5.422  1.00 27.83           O  \nATOM    493  CB  PRO C  79      66.120 -14.901   8.127  1.00 26.60           C  \nATOM    494  CG  PRO C  79      64.658 -15.023   8.439  1.00 26.25           C  \nATOM    495  CD  PRO C  79      63.969 -13.941   7.693  1.00 26.69           C  \nHETATM  496  N   HYP C  80      68.514 -13.303   6.655  1.00 32.05           N  \nHETATM  497  CA  HYP C  80      69.375 -13.168   5.477  1.00 32.85           C  \nHETATM  498  C   HYP C  80      69.121 -14.312   4.499  1.00 34.07           C  \nHETATM  499  O   HYP C  80      68.762 -15.421   4.906  1.00 33.65           O  \nHETATM  500  CB  HYP C  80      70.786 -13.204   6.057  1.00 34.17           C  \nHETATM  501  CG  HYP C  80      70.607 -12.833   7.504  1.00 33.98           C  \nHETATM  502  CD  HYP C  80      69.294 -13.412   7.899  1.00 33.49           C  \nHETATM  503  OD1 HYP C  80      70.561 -11.415   7.635  1.00 38.46           O  \nATOM    504  N   GLY C  81      69.288 -14.036   3.212  1.00 34.04           N  \nATOM    505  CA  GLY C  81      69.069 -15.068   2.217  1.00 36.51           C  \nATOM    506  C   GLY C  81      70.024 -16.234   2.382  1.00 37.32           C  \nATOM    507  O   GLY C  81      70.979 -16.148   3.154  1.00 37.99           O  \nATOM    
508  N   PRO C  82      69.794 -17.349   1.672  1.00 37.89           N  \nATOM    509  CA  PRO C  82      70.678 -18.513   1.778  1.00 38.50           C  \nATOM    510  C   PRO C  82      71.968 -18.307   0.976  1.00 38.15           C  \nATOM    511  O   PRO C  82      72.119 -17.307   0.279  1.00 37.99           O  \nATOM    512  CB  PRO C  82      69.833 -19.650   1.219  1.00 39.29           C  \nATOM    513  CG  PRO C  82      68.899 -18.983   0.253  1.00 39.91           C  \nATOM    514  CD  PRO C  82      68.689 -17.570   0.724  1.00 38.30           C  \nHETATM  515  N   HYP C  83      72.924 -19.240   1.085  1.00 38.84           N  \nHETATM  516  CA  HYP C  83      74.183 -19.112   0.342  1.00 38.56           C  \nHETATM  517  C   HYP C  83      73.976 -19.208  -1.172  1.00 39.05           C  \nHETATM  518  O   HYP C  83      73.097 -19.935  -1.640  1.00 38.33           O  \nHETATM  519  CB  HYP C  83      75.032 -20.268   0.874  1.00 39.04           C  \nHETATM  520  CG  HYP C  83      74.382 -20.661   2.164  1.00 39.63           C  \nHETATM  521  CD  HYP C  83      72.923 -20.443   1.932  1.00 39.63           C  \nHETATM  522  OD1 HYP C  83      74.829 -19.798   3.206  1.00 40.86           O  \nATOM    523  N   GLY C  84      74.790 -18.475  -1.927  1.00 38.63           N  \nATOM    524  CA  GLY C  84      74.682 -18.489  -3.376  1.00 38.82           C  \nATOM    525  C   GLY C  84      75.112 -19.809  -3.988  1.00 38.47           C  \nATOM    526  O   GLY C  84      75.573 -20.698  -3.274  1.00 38.21           O  \nATOM    527  N   PRO C  85      74.980 -19.967  -5.315  1.00 38.32           N  \nATOM    528  CA  PRO C  85      75.374 -21.216  -5.978  1.00 39.41           C  \nATOM    529  C   PRO C  85      76.891 -21.324  -6.145  1.00 39.86           C  \nATOM    530  O   PRO C  85      77.613 -20.347  -5.964  1.00 41.20           O  \nATOM    531  CB  PRO C  85      74.655 -21.141  -7.319  1.00 37.95           C  \nATOM    532  CG  PRO C  85      74.617 
-19.676  -7.615  1.00 38.26           C  \nATOM    533  CD  PRO C  85      74.469 -18.975  -6.278  1.00 37.98           C  \nHETATM  534  N   HYP C  86      77.392 -22.523  -6.477  1.00 41.44           N  \nHETATM  535  CA  HYP C  86      78.838 -22.708  -6.659  1.00 41.01           C  \nHETATM  536  C   HYP C  86      79.399 -21.797  -7.744  1.00 41.04           C  \nHETATM  537  O   HYP C  86      78.689 -21.424  -8.676  1.00 40.98           O  \nHETATM  538  CB  HYP C  86      78.970 -24.184  -7.038  1.00 40.16           C  \nHETATM  539  CG  HYP C  86      77.728 -24.827  -6.525  1.00 40.36           C  \nHETATM  540  CD  HYP C  86      76.652 -23.781  -6.674  1.00 40.96           C  \nHETATM  541  OD1 HYP C  86      77.895 -25.165  -5.142  1.00 40.57           O  \nATOM    542  N   GLY C  87      80.674 -21.441  -7.621  1.00 42.11           N  \nATOM    543  CA  GLY C  87      81.295 -20.582  -8.612  1.00 42.31           C  \nATOM    544  C   GLY C  87      81.459 -21.278  -9.954  1.00 43.17           C  \nATOM    545  O   GLY C  87      81.279 -22.493 -10.044  1.00 42.19           O  \nATOM    546  N   PRO C  88      81.794 -20.533 -11.021  1.00 43.57           N  \nATOM    547  CA  PRO C  88      81.974 -21.127 -12.353  1.00 45.51           C  \nATOM    548  C   PRO C  88      83.158 -22.093 -12.408  1.00 47.20           C  \nATOM    549  O   PRO C  88      84.171 -21.875 -11.742  1.00 47.87           O  \nATOM    550  CB  PRO C  88      82.172 -19.916 -13.271  1.00 44.82           C  \nATOM    551  CG  PRO C  88      82.642 -18.823 -12.370  1.00 44.51           C  \nATOM    552  CD  PRO C  88      82.009 -19.075 -11.026  1.00 44.13           C  \nHETATM  553  N   HYP C  89      83.043 -23.172 -13.209  1.00 47.91           N  \nHETATM  554  CA  HYP C  89      84.105 -24.178 -13.352  1.00 48.66           C  \nHETATM  555  C   HYP C  89      85.433 -23.593 -13.817  1.00 50.06           C  \nHETATM  556  O   HYP C  89      85.481 -22.837 -14.791  1.00 49.78     
      O  \nHETATM  557  CB  HYP C  89      83.543 -25.162 -14.379  1.00 49.47           C  \nHETATM  558  CG  HYP C  89      82.072 -24.955 -14.352  1.00 48.65           C  \nHETATM  559  CD  HYP C  89      81.871 -23.498 -14.038  1.00 48.12           C  \nHETATM  560  OD1 HYP C  89      81.489 -25.759 -13.322  1.00 48.62           O  \nATOM    561  N   GLY C  90      86.509 -23.954 -13.121  1.00 50.78           N  \nATOM    562  CA  GLY C  90      87.826 -23.460 -13.481  1.00 51.59           C  \nATOM    563  C   GLY C  90      88.353 -24.082 -14.761  1.00 52.27           C  \nATOM    564  O   GLY C  90      87.952 -25.223 -15.084  1.00 52.78           O  \nATOM    565  OXT GLY C  90      89.169 -23.430 -15.448  1.00 52.57           O  \nTER     566      GLY C  90                                                      \nHETATM  567  C   ACY B 401      53.590  -3.401  19.236  0.25 33.42           C  \nHETATM  568  OXT ACY B 401      54.060  -2.931  18.127  0.50 33.35           O  \nHETATM  569  CH3 ACY B 401      53.584  -4.893  19.218  0.25 33.71           C  \nHETATM  570  C   ACY B 403      52.901 -12.932   5.280  1.00 51.62           C  \nHETATM  571  O   ACY B 403      54.031 -13.382   5.481  1.00 52.05           O  \nHETATM  572  OXT ACY B 403      51.877 -13.716   5.072  1.00 51.53           O  \nHETATM  573  CH3 ACY B 403      52.545 -11.465   5.247  1.00 50.10           C  \nHETATM  574  C   ACY B 404      64.229 -15.952  -1.803  1.00 50.07           C  \nHETATM  575  O   ACY B 404      63.899 -15.009  -2.532  1.00 49.31           O  \nHETATM  576  OXT ACY B 404      63.363 -16.810  -1.330  1.00 49.88           O  \nHETATM  577  CH3 ACY B 404      65.630 -16.251  -1.362  1.00 49.01           C  \nHETATM  578  C   ACY B 405      53.574  -4.459  19.235  0.25 36.05           C  \nHETATM  579  OXT ACY B 405      52.529  -4.907  18.620  0.50 36.89           O  \nHETATM  580  CH3 ACY B 405      53.595  -2.967  19.224  0.25 38.07           C  \nHETATM  581  C   ACY 
C 402      47.148   4.911  10.111  1.00 34.55           C  \nHETATM  582  O   ACY C 402      48.353   4.918   9.861  1.00 34.17           O  \nHETATM  583  OXT ACY C 402      46.698   4.995  11.328  1.00 34.71           O  \nHETATM  584  CH3 ACY C 402      46.057   4.804   9.092  1.00 35.01           C  \nHETATM  585  O   HOH A 113      60.959 -11.339  -0.923  1.00 23.09           O  \nHETATM  586  O   HOH A 114      60.187  -7.637  -2.569  1.00 36.83           O  \nHETATM  587  O   HOH A 123      55.178 -12.425  14.624  1.00 29.58           O  \nHETATM  588  O   HOH A 126      46.251  -5.068  15.125  1.00 29.53           O  \nHETATM  589  O   HOH A 129      52.637 -13.191  14.636  1.00 23.37           O  \nHETATM  590  O   HOH A 134      58.540  -8.298   0.014  0.50 32.43           O  \nHETATM  591  O   HOH A 136      81.089 -12.275  -3.148  1.00 41.58           O  \nHETATM  592  O   HOH A 140      74.352 -13.231   5.407  1.00 35.47           O  \nHETATM  593  O   HOH A 142      44.802  -5.908  10.724  1.00 27.62           O  \nHETATM  594  O   HOH A 143      46.156  -5.554   7.949  1.00 38.27           O  \nHETATM  595  O   HOH A 144      38.223  -2.997  18.840  1.00 42.71           O  \nHETATM  596  O   HOH A 147      40.996  -0.909  12.919  1.00 34.35           O  \nHETATM  597  O   HOH A 150      44.056   0.891  11.656  1.00 37.09           O  \nHETATM  598  O   HOH A 157      60.452 -13.872  -0.935  1.00 42.47           O  \nHETATM  599  O   HOH A 160      85.152 -19.021  -0.540  1.00 58.01           O  \nHETATM  600  O   HOH A 162      88.699 -19.377  -2.750  1.00 39.22           O  \nHETATM  601  O   HOH A 163      51.312  -6.690   4.994  1.00 30.91           O  \nHETATM  602  O   HOH A 164      52.093  -7.941   2.138  1.00 42.09           O  \nHETATM  603  O   HOH A 168      67.998  -5.947   7.455  1.00 35.70           O  \nHETATM  604  O   HOH A 171      44.952  -7.121  14.253  1.00 36.30           O  \nHETATM  605  O   HOH A 175      36.358  15.036  
27.233  1.00 51.90           O  \nHETATM  606  O   HOH A 177      48.623  -6.002   8.070  1.00 36.58           O  \nHETATM  607  O   HOH A 179      70.861  -9.889  -4.343  1.00 30.35           O  \nHETATM  608  O   HOH A 182      89.016 -16.916  -5.688  1.00 41.25           O  \nHETATM  609  O   HOH A 185      40.166  -4.117  15.253  1.00 37.86           O  \nHETATM  610  O   HOH A 186      45.141  -1.027   7.030  1.00 40.61           O  \nHETATM  611  O   HOH A 189      58.539  -4.867  -0.004  0.50 37.84           O  \nHETATM  612  O   HOH A 191      74.688  -7.604  -1.925  1.00 38.48           O  \nHETATM  613  O   HOH A 196      35.898   7.341  28.304  1.00 37.86           O  \nHETATM  614  O   HOH A 206      55.170  -5.587   2.892  1.00 44.58           O  \nHETATM  615  O   HOH A 211      58.545 -15.074   0.000  0.50 46.07           O  \nHETATM  616  O   HOH B 101      55.506  -9.308  14.873  1.00 25.01           O  \nHETATM  617  O   HOH B 104      56.916 -12.482  12.494  1.00 26.68           O  \nHETATM  618  O   HOH B 105      60.952 -16.735   7.124  1.00 32.25           O  \nHETATM  619  O   HOH B 106      29.686  14.356  20.990  1.00 45.40           O  \nHETATM  620  O   HOH B 108      52.888  -8.790  16.078  1.00 24.06           O  \nHETATM  621  O   HOH B 111      56.186 -11.613   5.491  1.00 18.34           O  \nHETATM  622  O   HOH B 112      54.594  -1.823  22.148  1.00 27.83           O  \nHETATM  623  O   HOH B 116      55.853 -10.172   3.315  1.00 23.07           O  \nHETATM  624  O   HOH B 118      57.346 -13.770   8.495  1.00 27.15           O  \nHETATM  625  O   HOH B 119      63.802 -12.570  -0.995  1.00 28.10           O  \nHETATM  626  O   HOH B 122      51.962  -8.124   9.027  1.00 21.91           O  \nHETATM  627  O   HOH B 124      59.296   0.919  15.521  1.00 36.70           O  \nHETATM  628  O   HOH B 125      50.647  -2.056  19.584  1.00 21.63           O  \nHETATM  629  O   HOH B 127      51.587   0.563  20.387  1.00 25.61           O  
\nHETATM  630  O   HOH B 130      61.501 -14.132  10.703  1.00 25.63           O  \nHETATM  631  O   HOH B 131      56.912  -5.091  18.367  1.00 25.73           O  \nHETATM  632  O   HOH B 135      40.833  -0.450  20.693  1.00 39.31           O  \nHETATM  633  O   HOH B 137      43.183   9.398  17.375  1.00 26.44           O  \nHETATM  634  O   HOH B 138      43.069  11.342  21.850  1.00 31.49           O  \nHETATM  635  O   HOH B 139      60.233 -14.402   8.316  1.00 30.34           O  \nHETATM  636  O   HOH B 146      74.752 -20.893 -11.136  1.00 31.35           O  \nHETATM  637  O   HOH B 152      47.689  -4.111  19.254  1.00 32.30           O  \nHETATM  638  O   HOH B 155      54.055   0.969  21.054  1.00 30.42           O  \nHETATM  639  O   HOH B 158      61.145  -4.988  16.701  1.00 34.68           O  \nHETATM  640  O   HOH B 159      52.407  -8.067  18.646  1.00 35.64           O  \nHETATM  641  O   HOH B 169      59.538 -12.517  14.042  1.00 51.04           O  \nHETATM  642  O   HOH B 170      56.817   1.964  16.169  1.00 33.83           O  \nHETATM  643  O   HOH B 173      26.899  11.369  17.856  1.00 50.77           O  \nHETATM  644  O   HOH B 181      52.721   3.475  18.529  1.00 40.60           O  \nHETATM  645  O   HOH B 183      57.454  -9.160  16.393  1.00 37.45           O  \nHETATM  646  O   HOH B 187      65.206 -18.802   6.069  1.00 36.61           O  \nHETATM  647  O   HOH B 188      36.965  -1.873  24.325  1.00 50.52           O  \nHETATM  648  O   HOH B 190      73.718 -14.995  -8.615  1.00 38.43           O  \nHETATM  649  O   HOH B 197      50.297   4.137  20.510  1.00 44.88           O  \nHETATM  650  O   HOH B 198      40.123  -0.501  23.910  1.00 45.95           O  \nHETATM  651  O   HOH B 199      41.514   3.350  26.541  1.00 49.39           O  \nHETATM  652  O   HOH B 200      39.327  10.663  20.993  1.00 44.43           O  \nHETATM  653  O   HOH B 203      89.111 -21.906 -11.082  1.00 41.46           O  \nHETATM  654  O   HOH B 204    
  76.255 -18.244 -11.082  1.00 40.14           O  \nHETATM  655  O   HOH B 207      70.317 -13.738  -5.737  1.00 48.11           O  \nHETATM  656  O   HOH B 209      43.573   4.469  24.422  1.00 40.41           O  \nHETATM  657  O   HOH C 102      67.378  -9.309   9.320  1.00 54.44           O  \nHETATM  658  O   HOH C 103      61.148  -2.825   9.293  1.00 28.25           O  \nHETATM  659  O   HOH C 107      36.413   3.329  16.229  1.00 30.39           O  \nHETATM  660  O   HOH C 109      52.711  -0.515   9.176  1.00 29.99           O  \nHETATM  661  O   HOH C 110      58.212  -8.411  12.729  1.00 26.12           O  \nHETATM  662  O   HOH C 115      75.363 -25.664  -3.534  1.00 34.47           O  \nHETATM  663  O   HOH C 117      54.681   0.693   7.709  1.00 23.06           O  \nHETATM  664  O   HOH C 120      65.246  -8.826  11.102  1.00 30.79           O  \nHETATM  665  O   HOH C 121      57.158   1.070  11.017  1.00 19.70           O  \nHETATM  666  O   HOH C 128      33.788   2.545  16.074  1.00 27.68           O  \nHETATM  667  O   HOH C 132      63.412 -11.927  10.766  1.00 32.50           O  \nHETATM  668  O   HOH C 133      58.190   0.681   5.034  1.00 26.69           O  \nHETATM  669  O   HOH C 141      43.975   6.891  16.632  1.00 30.43           O  \nHETATM  670  O   HOH C 145      45.008   7.060  11.690  1.00 33.14           O  \nHETATM  671  O   HOH C 148      45.912   4.858  18.163  1.00 39.08           O  \nHETATM  672  O   HOH C 149      72.052 -22.701  -5.368  1.00 51.67           O  \nHETATM  673  O   HOH C 151      47.079   7.228  18.775  1.00 29.78           O  \nHETATM  674  O   HOH C 153      46.334   7.895  21.242  1.00 45.20           O  \nHETATM  675  O   HOH C 154      46.226   9.770  17.715  1.00 29.65           O  \nHETATM  676  O   HOH C 156      50.004   7.527  19.817  1.00 31.52           O  \nHETATM  677  O   HOH C 161      40.181   3.870   9.351  1.00 47.77           O  \nHETATM  678  O   HOH C 165      22.254   6.499  28.310  1.00 
38.02           O  \nHETATM  679  O   HOH C 166      79.728 -27.910  -7.547  1.00 44.49           O  \nHETATM  680  O   HOH C 167      80.557 -27.093 -10.768  1.00 58.92           O  \nHETATM  681  O   HOH C 172      41.965   9.040  13.963  1.00 43.32           O  \nHETATM  682  O   HOH C 174      65.484  -6.244  11.639  1.00 40.61           O  \nHETATM  683  O   HOH C 176      63.752  -3.847   9.732  1.00 39.00           O  \nHETATM  684  O   HOH C 178      64.297  -2.289  12.151  1.00 44.10           O  \nHETATM  685  O   HOH C 180      77.874 -20.921   3.369  1.00 48.91           O  \nHETATM  686  O   HOH C 184      89.584 -24.062 -18.035  1.00 34.85           O  \nHETATM  687  O   HOH C 192      67.467 -16.514  11.532  1.00 54.00           O  \nHETATM  688  O   HOH C 193      59.709  -1.612   1.352  1.00 35.61           O  \nHETATM  689  O   HOH C 194      42.679   6.343  13.315  1.00 42.05           O  \nHETATM  690  O   HOH C 195      49.422   6.079   7.758  1.00 43.48           O  \nHETATM  691  O   HOH C 201      60.808  -3.844   0.712  1.00 62.87           O  \nHETATM  692  O   HOH C 202      27.956   3.719  19.388  1.00 52.69           O  \nHETATM  693  O   HOH C 205      67.679 -17.888   4.298  1.00 43.44           O  \nHETATM  694  O   HOH C 208      71.531 -20.223  -4.178  1.00 42.68           O  \nHETATM  695  O   HOH C 210      56.087   1.146   3.643  1.00 50.59           O  \nCONECT    1    2    7                                                           \nCONECT    2    1    3    5                                                      \nCONECT    3    2    4    9                                                      \nCONECT    4    3                                                                \nCONECT    5    2    6                                                           \nCONECT    6    5    7    8                                                      \nCONECT    7    1    6                                                           \nCONECT    
8    6                                                                \nCONECT    9    3                                                                \nCONECT   15   20                                                                \nCONECT   20   15   21   26                                                      \nCONECT   21   20   22   24                                                      \nCONECT   22   21   23   28                                                      \nCONECT   23   22                                                                \nCONECT   24   21   25                                                           \nCONECT   25   24   26   27                                                      \nCONECT   26   20   25                                                           \nCONECT   27   25                                                                \nCONECT   28   22                                                                \nCONECT   34   39                                                                \nCONECT   39   34   40   45                                                      \nCONECT   40   39   41   43                                                      \nCONECT   41   40   42   47                                                      \nCONECT   42   41                                                                \nCONECT   43   40   44                                                           \nCONECT   44   43   45   46                                                      \nCONECT   45   39   44                                                           \nCONECT   46   44                                                                \nCONECT   47   41                                                                \nCONECT  109  114                                                                \nCONECT  114  109  115  120                                                      \nCONECT  115  114  116  118                
                                      \nCONECT  116  115  117  122                                                      \nCONECT  117  116                                                                \nCONECT  118  115  119                                                           \nCONECT  119  118  120  121                                                      \nCONECT  120  114  119                                                           \nCONECT  121  119                                                                \nCONECT  122  116                                                                \nCONECT  128  133                                                                \nCONECT  133  128  134  139                                                      \nCONECT  134  133  135  137                                                      \nCONECT  135  134  136  141                                                      \nCONECT  136  135                                                                \nCONECT  137  134  138                                                           \nCONECT  138  137  139  140                                                      \nCONECT  139  133  138                                                           \nCONECT  140  138                                                                \nCONECT  141  135                                                                \nCONECT  147  152                                                                \nCONECT  152  147  153  158                                                      \nCONECT  153  152  154  156                                                      \nCONECT  154  153  155  160                                                      \nCONECT  155  154                                                                \nCONECT  156  153  157                                                           \nCONECT  157  156  158  159                                                
      \nCONECT  158  152  157                                                           \nCONECT  159  157                                                                \nCONECT  160  154                                                                \nCONECT  166  171                                                                \nCONECT  171  166  172  177                                                      \nCONECT  172  171  173  175                                                      \nCONECT  173  172  174  179                                                      \nCONECT  174  173                                                                \nCONECT  175  172  176                                                           \nCONECT  176  175  177  178                                                      \nCONECT  177  171  176                                                           \nCONECT  178  176                                                                \nCONECT  179  173                                                                \nCONECT  187  192                                                                \nCONECT  192  187  193  198                                                      \nCONECT  193  192  194  196                                                      \nCONECT  194  193  195  200                                                      \nCONECT  195  194                                                                \nCONECT  196  193  197                                                           \nCONECT  197  196  198  199                                                      \nCONECT  198  192  197                                                           \nCONECT  199  197                                                                \nCONECT  200  194                                                                \nCONECT  206  211                                                                \nCONECT  211  206  212  
217                                                      \nCONECT  212  211  213  215                                                      \nCONECT  213  212  214  219                                                      \nCONECT  214  213                                                                \nCONECT  215  212  216                                                           \nCONECT  216  215  217  218                                                      \nCONECT  217  211  216                                                           \nCONECT  218  216                                                                \nCONECT  219  213                                                                \nCONECT  225  230                                                                \nCONECT  230  225  231  236                                                      \nCONECT  231  230  232  234                                                      \nCONECT  232  231  233  238                                                      \nCONECT  233  232                                                                \nCONECT  234  231  235                                                           \nCONECT  235  234  236  237                                                      \nCONECT  236  230  235                                                           \nCONECT  237  235                                                                \nCONECT  238  232                                                                \nCONECT  300  305                                                                \nCONECT  305  300  306  311                                                      \nCONECT  306  305  307  309                                                      \nCONECT  307  306  308  313                                                      \nCONECT  308  307                                                                \nCONECT  309  306  310                                  
                         \nCONECT  310  309  311  312                                                      \nCONECT  311  305  310                                                           \nCONECT  312  310                                                                \nCONECT  313  307                                                                \nCONECT  319  324                                                                \nCONECT  324  319  325  330                                                      \nCONECT  325  324  326  328                                                      \nCONECT  326  325  327  332                                                      \nCONECT  327  326                                                                \nCONECT  328  325  329                                                           \nCONECT  329  328  330  331                                                      \nCONECT  330  324  329                                                           \nCONECT  331  329                                                                \nCONECT  332  326                                                                \nCONECT  338  343                                                                \nCONECT  343  338  344  349                                                      \nCONECT  344  343  345  347                                                      \nCONECT  345  344  346  351                                                      \nCONECT  346  345                                                                \nCONECT  347  344  348                                                           \nCONECT  348  347  349  350                                                      \nCONECT  349  343  348                                                           \nCONECT  350  348                                                                \nCONECT  351  345                                                                
\nCONECT  357  362                                                                \nCONECT  362  357  363  368                                                      \nCONECT  363  362  364  366                                                      \nCONECT  364  363  365  370                                                      \nCONECT  365  364                                                                \nCONECT  366  363  367                                                           \nCONECT  367  366  368  369                                                      \nCONECT  368  362  367                                                           \nCONECT  369  367                                                                \nCONECT  370  364                                                                \nCONECT  378  383                                                                \nCONECT  383  378  384  389                                                      \nCONECT  384  383  385  387                                                      \nCONECT  385  384  386  391                                                      \nCONECT  386  385                                                                \nCONECT  387  384  388                                                           \nCONECT  388  387  389  390                                                      \nCONECT  389  383  388                                                           \nCONECT  390  388                                                                \nCONECT  391  385                                                                \nCONECT  397  402                                                                \nCONECT  402  397  403  408                                                      \nCONECT  403  402  404  406                                                      \nCONECT  404  403  405  410                                                      \nCONECT  405  404              
                                                  \nCONECT  406  403  407                                                           \nCONECT  407  406  408  409                                                      \nCONECT  408  402  407                                                           \nCONECT  409  407                                                                \nCONECT  410  404                                                                \nCONECT  416  421                                                                \nCONECT  421  416  422  427                                                      \nCONECT  422  421  423  425                                                      \nCONECT  423  422  424  429                                                      \nCONECT  424  423                                                                \nCONECT  425  422  426                                                           \nCONECT  426  425  427  428                                                      \nCONECT  427  421  426                                                           \nCONECT  428  426                                                                \nCONECT  429  423                                                                \nCONECT  491  496                                                                \nCONECT  496  491  497  502                                                      \nCONECT  497  496  498  500                                                      \nCONECT  498  497  499  504                                                      \nCONECT  499  498                                                                \nCONECT  500  497  501                                                           \nCONECT  501  500  502  503                                                      \nCONECT  502  496  501                                                           \nCONECT  503  501                                              
                  \nCONECT  504  498                                                                \nCONECT  510  515                                                                \nCONECT  515  510  516  521                                                      \nCONECT  516  515  517  519                                                      \nCONECT  517  516  518  523                                                      \nCONECT  518  517                                                                \nCONECT  519  516  520                                                           \nCONECT  520  519  521  522                                                      \nCONECT  521  515  520                                                           \nCONECT  522  520                                                                \nCONECT  523  517                                                                \nCONECT  529  534                                                                \nCONECT  534  529  535  540                                                      \nCONECT  535  534  536  538                                                      \nCONECT  536  535  537  542                                                      \nCONECT  537  536                                                                \nCONECT  538  535  539                                                           \nCONECT  539  538  540  541                                                      \nCONECT  540  534  539                                                           \nCONECT  541  539                                                                \nCONECT  542  536                                                                \nCONECT  548  553                                                                \nCONECT  553  548  554  559                                                      \nCONECT  554  553  555  557                                                      \nCONECT  555 
 554  556  561                                                      \nCONECT  556  555                                                                \nCONECT  557  554  558                                                           \nCONECT  558  557  559  560                                                      \nCONECT  559  553  558                                                           \nCONECT  560  558                                                                \nCONECT  561  555                                                                \nCONECT  567  568  569                                                           \nCONECT  568  567                                                                \nCONECT  569  567                                                                \nCONECT  570  571  572  573                                                      \nCONECT  571  570                                                                \nCONECT  572  570                                                                \nCONECT  573  570                                                                \nCONECT  574  575  576  577                                                      \nCONECT  575  574                                                                \nCONECT  576  574                                                                \nCONECT  577  574                                                                \nCONECT  578  579  580                                                           \nCONECT  579  578                                                                \nCONECT  580  578                                                                \nCONECT  581  582  583  584                                                      \nCONECT  582  581                                                                \nCONECT  583  581                                                                \nCONECT  584  581                            
                                    \nMASTER      278    0   26    0    0    0    8    6  692    3  227    9          \nEND                                                                             \n'
protein_to_pdb = {
    'insulin': '3I40',
    'collagen': '1BKV',
    'proteasome': '1YAR'
}
parts = []
for protein_name, pdb_id in protein_to_pdb.items():
    pdb_structure = fetch_protein_structure(pdb_id)
    v = py3Dmol.view(width=300, height=300)
    v.addModel(pdb_structure, 'pdb')
    v.setStyle({'cartoon': {'color': 'spectrum'}})
    v.zoomTo()
    parts.append(f'<div style="display:inline-block;margin:10px;vertical-align:top"><b>{protein_name}</b><br>{v._make_html()}</div>')

display(HTML(''.join(parts)))
(Interactive 3D structure viewers for insulin, collagen, and proteasome; the 3Dmol.js widgets are not rendered in this export.)

Let’s Play With a FASTA Sequence

# Human insulin precursor (UniProt accession P01308)
r = requests.get("https://rest.uniprot.org/uniprotkb/P01308.fasta", timeout=60)
r.raise_for_status()
# Strip the header line to get just the sequence
lines = r.text.strip().split("\n")
sequence = "".join(lines[1:])
print(sequence)
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
len(sequence)
110
amino_acids = [
    'R', 'H', 'K', 'D', 'E', 'S', 'T', 'N', 'Q', 'G', 'P', 'C', 'A', 'V', 'I', 'L', 'M', 'F', 'Y', 'W'
]
amino_acid_to_index = {
    amino_acid: index for index, amino_acid in enumerate(amino_acids)
}
tiny_protein = ['M', 'A', 'L', 'W', 'M']
tiny_protein_indices = [
    amino_acid_to_index[amino_acid] for amino_acid in tiny_protein
]
tiny_protein_indices
[16, 12, 15, 19, 16]

I’m more familiar with PyTorch, but I’m also playing around with JAX here, so the one-hot encoding is shown in both.

one_hot_encoded_sequence = jax.nn.one_hot(
    x=tiny_protein_indices, num_classes=len(amino_acids)
)
print(one_hot_encoded_sequence)
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
ohes = torch.nn.functional.one_hot(torch.tensor(tiny_protein_indices), num_classes=len(amino_acids))
print(ohes)
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]])
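For intuition, one-hot encoding is just row lookup in an identity matrix, and `argmax` undoes it. Here is a minimal NumPy sketch of that idea, assuming the same amino acid ordering as the list above (the variable names `indices`, `one_hot`, and `recovered` are made up for this illustration):

```python
import numpy as np

# Same ordering as the amino_acids list above, written as a string for brevity.
amino_acids = list("RHKDESTNQGPCAVILMFYW")
indices = [amino_acids.index(aa) for aa in "MALWM"]

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(amino_acids))[indices]

# argmax along the class axis recovers the original indices.
recovered = "".join(amino_acids[i] for i in one_hot.argmax(axis=1))
print(indices, one_hot.shape, recovered)
# [16, 12, 15, 19, 16] (5, 20) MALWM
```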
fig = sns.heatmap(
    ohes, square=True, cbar=False, cmap='inferno'
)
fig.set(xlabel='Amino Acid Index', ylabel='Protein Sequence')
[Text(0.5, 146.32222222222222, 'Amino Acid Index'),
 Text(50.722222222222214, 0.5, 'Protein Sequence')]

Embeddings and T-SNE

model_checkpoint = 'facebook/esm2_t33_650M_UR50D'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = EsmModel.from_pretrained(model_checkpoint)
vocab_to_index = tokenizer.get_vocab()
print(vocab_to_index)
EsmModel LOAD REPORT from: facebook/esm2_t33_650M_UR50D
Key                         | Status     | 
----------------------------+------------+-
lm_head.layer_norm.bias     | UNEXPECTED | 
lm_head.layer_norm.weight   | UNEXPECTED | 
lm_head.dense.bias          | UNEXPECTED | 
lm_head.dense.weight        | UNEXPECTED | 
lm_head.bias                | UNEXPECTED | 
esm.embeddings.position_ids | UNEXPECTED | 
pooler.dense.weight         | MISSING    | 
pooler.dense.bias           | MISSING    | 

Notes:
- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING   :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
{'<cls>': 0, '<pad>': 1, '<eos>': 2, '<unk>': 3, 'L': 4, 'A': 5, 'G': 6, 'V': 7, 'S': 8, 'E': 9, 'R': 10, 'T': 11, 'I': 12, 'D': 13, 'P': 14, 'K': 15, 'Q': 16, 'N': 17, 'F': 18, 'Y': 19, 'M': 20, 'H': 21, 'W': 22, 'C': 23, 'X': 24, 'B': 25, 'U': 26, 'Z': 27, 'O': 28, '.': 29, '-': 30, '<null_1>': 31, '<mask>': 32}
tokenizer('MALWM')
{'input_ids': [0, 20, 5, 4, 22, 20, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
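The ids above can be read back by inverting the vocabulary. The sketch below uses a hand-copied subset of the vocabulary printed earlier, and `decode` is a hypothetical helper rather than part of the tokenizer API. It shows that the tokenizer wraps the sequence in `<cls>` and `<eos>`, so residue i of the protein sits at position i + 1 of `input_ids`:

```python
# Subset of the ESM-2 vocabulary printed above (copied by hand for illustration).
vocab_subset = {'<cls>': 0, '<eos>': 2, 'L': 4, 'A': 5, 'M': 20, 'W': 22}
index_to_token = {index: token for token, index in vocab_subset.items()}

def decode(input_ids):
    """Hypothetical helper: map token ids back to their string tokens."""
    return [index_to_token[i] for i in input_ids]

decoded = decode([0, 20, 5, 4, 22, 20, 2])
print(decoded)
# ['<cls>', 'M', 'A', 'L', 'W', 'M', '<eos>']
```

This offset of one is why the masked-language-modeling code later in the notebook indexes position 30 for a mask placed at residue index 29.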
token_embeddings = model.get_input_embeddings().weight.detach().numpy()
token_embeddings.shape
(33, 1280)
len(vocab_to_index)
33
tsne = TSNE(n_components=2, random_state=42)
embeddings_tsne = tsne.fit_transform(token_embeddings)
embeddings_tsne_df = pd.DataFrame(
    embeddings_tsne, columns=['first_dim', 'second_dim']
)
embeddings_tsne_df.shape
(33, 2)
fig = sns.scatterplot(
    data=embeddings_tsne_df, x='first_dim', y='second_dim', s=50
)
fig.set_xlabel('First Dimension')
fig.set_ylabel('Second Dimension')
Text(0, 0.5, 'Second Dimension')

embeddings_tsne_df['token'] = list(vocab_to_index.keys())

token_annotation = {
    'hydrophobic': ['A', 'F', 'I', 'L', 'M', 'V', 'W', 'Y'],
    'polar uncharged': ['N', 'Q', 'S', 'T'],
    'negatively charged': ['D', 'E'],
    'positively charged': ['H', 'K', 'R'],
    'special amino acid': ['B', 'C', 'G', 'O', 'P', 'U', 'X', 'Z'],
    'special token': [
        '-',
        '.',
        '<cls>',
        '<eos>',
        '<mask>',
        '<null_1>',
        '<pad>',
        '<unk>'
    ]
}
embeddings_tsne_df['label'] = embeddings_tsne_df['token'].map(
    {t: label for label, tokens in token_annotation.items() for t in tokens}
)
fig = sns.scatterplot(
    data=embeddings_tsne_df,
    x='first_dim',
    y='second_dim',
    hue='label',
    style='label',
    s=50
)
fig.set_xlabel('First Dimension')
fig.set_ylabel('Second Dimension')
texts = [
    fig.text(point['first_dim'], point['second_dim'], point['token'])
    for _, point in embeddings_tsne_df.iterrows()
]
adjust_text(
    texts, expand=(1.5, 1.5), arrowprops=dict(arrowstyle='->', color='grey')
);

Masked Language Modeling

insulin_sequence = (
    'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
)

masked_insulin_sequence = (
    'MALWMRLLPLLALLALWGPDPAAAFVNQH<mask>CGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
)

masked_inputs = tokenizer(masked_insulin_sequence)['input_ids']
assert masked_inputs[30] == vocab_to_index['<mask>']
model_checkpoint = 'facebook/esm2_t30_150M_UR50D'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
masked_lm_model = EsmForMaskedLM.from_pretrained(model_checkpoint)
EsmForMaskedLM LOAD REPORT from: facebook/esm2_t30_150M_UR50D
Key                         | Status     |  | 
----------------------------+------------+--+-
esm.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
model_outputs = masked_lm_model(
    **tokenizer(text=masked_insulin_sequence, return_tensors='pt')
)
model_preds = model_outputs.logits
# shape will follow B, T, C convention
# like torch.Size([1, 112, 33])
print(model_preds.shape)
torch.Size([1, 112, 33])
# we grab the zeroth batch, 30th 'token', all 33 dimensions of the 'C' (channel) vector
mask_preds = model_preds[0, 30].detach().numpy()
print(mask_preds.shape)
(33,)
# ...softmax it
# (by the way, you could equally do
#  mask_probs = torch.nn.functional.softmax(torch.tensor(mask_preds), dim=-1))
mask_probs = jax.nn.softmax(mask_preds)
mask_probs.shape
(33,)
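To make the softmax step concrete, here is a small NumPy sketch of what it computes. The logits are made-up values rather than model output, and the `softmax` function is a plain reimplementation for illustration, not the JAX or PyTorch code path:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

toy_logits = np.array([2.0, 1.0, 0.1])  # made-up values, not model output
toy_probs = softmax(toy_logits)
print(toy_probs, toy_probs.sum())
```

The outputs are non-negative and sum to one, and the ordering of the logits is preserved, so the largest logit always gets the largest probability.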
letters = list(vocab_to_index.keys())
fig, ax = plt.subplots(figsize=(6, 4))
plt.bar(letters, mask_probs, color='grey')
plt.xticks(rotation=90)
plt.title('Model Probabilities for the Masked Amino Acid')
Text(0.5, 1.0, 'Model Probabilities for the Masked Amino Acid')
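A bar chart is handy, but sometimes you just want the top few candidates as text. The following is a minimal sketch with toy probabilities; `top_k_tokens` is a hypothetical helper introduced here, not something from the book or from `transformers`:

```python
def top_k_tokens(probs, tokens, k=3):
    """Hypothetical helper: return the k (token, probability) pairs with the highest probability."""
    ranked = sorted(zip(tokens, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

toy_tokens = ['L', 'A', 'G', 'V']
toy_probs = [0.70, 0.05, 0.05, 0.20]  # made-up values, not model output
best = top_k_tokens(toy_probs, toy_tokens, k=2)
print(best)
# [('L', 0.7), ('V', 0.2)]
```

In the notebook, something like `top_k_tokens(mask_probs.tolist(), letters)` should give the same kind of readout, assuming `letters` is in vocabulary index order as the printed vocabulary suggests.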

from transformers import PreTrainedModel, PreTrainedTokenizer
from matplotlib.figure import Figure

class MaskPredictor:
    """Predict masked amino acids using a protein language model."""

    def __init__(self, tokenizer: PreTrainedTokenizer, model: PreTrainedModel):
        self.tokenizer = tokenizer
        self.model = model

    def plot_predictions(self, sequence: str, mask_index: int) -> Figure:
        mask_probs = self.predict(sequence, mask_index)
        fig, _ = plt.subplots(figsize=(6, 4))
        plt.bar(list(self.tokenizer.get_vocab().keys()), mask_probs, color='grey')
        plt.xticks(rotation=90)
        plt.title(
            'Model Probabilities for the Masked Amino Acid\n'
            f"at Index={mask_index} (True Amino Acid = {sequence[mask_index]})"
        )
        return fig

    def predict(self, sequence: str, mask_index: int) -> jax.Array:
        """Return model probabilities for masked amino acid at a position."""
        masked_sequence = self.mask_sequence(sequence, mask_index)
        masked_inputs = self.tokenizer(masked_sequence, return_tensors='pt')
        model_outputs = self.model(**masked_inputs)
        mask_preds = model_outputs.logits[0, mask_index + 1].detach().numpy()
        mask_probs = jax.nn.softmax(mask_preds)
        return mask_probs

    @staticmethod
    def mask_sequence(sequence: str, mask_index: int) -> str:
        if mask_index < 0 or mask_index >= len(sequence):
            raise ValueError('Mask index outside of sequence range.')
        # Keep everything after the masked residue, not just the next character.
        return f"{sequence[:mask_index]}<mask>{sequence[mask_index + 1:]}"
MaskPredictor(tokenizer, model=masked_lm_model).plot_predictions(
    sequence=insulin_sequence, mask_index=26
);

Embedding Entire Proteins

import requests

base_url = "https://assets.deep-learning-for-biology.com"
file_path = "proteins/datasets/sequence_df_cco.csv"

r = requests.get(f"{base_url}/{file_path}", timeout=60)
r.raise_for_status()

with open("sequence_df_cco.csv", "wb") as f:
    f.write(r.content)

import pandas as pd
df = pd.read_csv("sequence_df_cco.csv")
df.head()
   EntryID                                           Sequence  taxonomyID        term aspect  Length
0   O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606  GO:0005622    CCO     258
1   O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606  GO:0031981    CCO     258
2   O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606  GO:0043229    CCO     258
3   O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606  GO:0043226    CCO     258
4   O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606  GO:0110165    CCO     258
protein_df = df[~df["term"].isin(["GO:0005575", "GO:0110165"])]
num_proteins = protein_df["EntryID"].nunique()
print(protein_df)
print(num_proteins)
       EntryID                                           Sequence  taxonomyID  \
0       O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606   
1       O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606   
2       O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606   
3       O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606   
5       O95231  MRLSSSPPRGPQQLSSFGSVDWLSQSSCSGPTHTPRPADFSLGSLP...        9606   
...        ...                                                ...         ...   
337549  E7ER32  MPPLKSPAAFHEQRRSLERARTEDYLKRKIRSRPERSELVRMHILE...        9606   
337550  E7ER32  MPPLKSPAAFHEQRRSLERARTEDYLKRKIRSRPERSELVRMHILE...        9606   
337551  E7ER32  MPPLKSPAAFHEQRRSLERARTEDYLKRKIRSRPERSELVRMHILE...        9606   
337552  E7ER32  MPPLKSPAAFHEQRRSLERARTEDYLKRKIRSRPERSELVRMHILE...        9606   
337553  E7ER32  MPPLKSPAAFHEQRRSLERARTEDYLKRKIRSRPERSELVRMHILE...        9606   

              term aspect  Length  
0       GO:0005622    CCO     258  
1       GO:0031981    CCO     258  
2       GO:0043229    CCO     258  
3       GO:0043226    CCO     258  
5       GO:0043231    CCO     258  
...            ...    ...     ...  
337549  GO:0005737    CCO     798  
337550  GO:0043227    CCO     798  
337551  GO:0031974    CCO     798  
337552  GO:0005634    CCO     798  
337553  GO:0005654    CCO     798  

[294731 rows x 6 columns]
21457
num_locations = protein_df.groupby("EntryID")["term"].nunique()
proteins_one_location = num_locations[num_locations == 1].index
protein_df = protein_df[protein_df["EntryID"].isin(proteins_one_location)]
protein_df
           EntryID                                           Sequence  taxonomyID        term aspect  Length
621         Q6N075  MLVTAYLAFVGLLASCLGLELSRCRAKPPGRACSNPSFLRFQLDFY...        9606  GO:0016020    CCO     450
3036        P01566  MALSFSLLMAVLVLSYKSICSLGCDLPQTHSLGNRRALILLGQMGR...        9606  GO:0005576    CCO     189
4648        Q86SQ6  MDLKTVLSLPRYPGEFLHPVVYACTAVMLLCLLASFVTYIVHQSAI...        9606  GO:0016020    CCO     560
4715        Q8IYM0  MEKDDPPQLVTPTSVKAIILRIEAAQLTRAQEDISTQLSDILDNVN...        9606  GO:0032991    CCO     893
5467        Q9NVL1  MAPEENAGSELLLQSFKRRFLAARALRSFRWQSLEAKLRDSSDSEL...        9606  GO:0032991    CCO     165
...            ...                                                ...         ...         ...    ...     ...
329726  A0A024RDS4  MDWGTLHTFIGGVNKHSTSIGKVWITVIFIFRVMILVVAAQEVWGD...        9606  GO:0030054    CCO     261
331823      Q5NV92  QLVLTQSPSASASLGASVKLTCTLSSGHSSYAIAWHQQQPEKGPRY...        9606  GO:0005576    CCO      99
331841      Q5NV81  QSVLTQPPSASGTPGQRVTISCSGSSSNIGSNTVNWYQQLPGTAPK...        9606  GO:0005576    CCO      99
332246      Q5NV68  QPVLTQPPSSSASPGESARLTCTLPSDINVGSYNIYWYQQKPGSPP...        9606  GO:0005576    CCO     104
335109      P78328  QFQFIQVAGRSGDKIFIGNVNNSGLKINLFDTPLETQYVRLEPIIC...        9606  GO:0005576    CCO      78

421 rows × 6 columns
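The filter above keeps only proteins annotated with exactly one distinct GO term. A toy sketch of the same `groupby` / `nunique` / `isin` pattern, using made-up protein ids and terms, makes the logic easier to check by eye:

```python
import pandas as pd

# Made-up annotations: P1 has two terms, P2 has one, P3 has two distinct terms.
toy_df = pd.DataFrame({
    "EntryID": ["P1", "P1", "P2", "P3", "P3", "P3"],
    "term": ["GO:A", "GO:B", "GO:A", "GO:C", "GO:C", "GO:B"],
})

# Count distinct terms per protein, then keep proteins with exactly one.
terms_per_protein = toy_df.groupby("EntryID")["term"].nunique()
single_term_ids = terms_per_protein[terms_per_protein == 1].index
filtered = toy_df[toy_df["EntryID"].isin(single_term_ids)]
print(filtered)
```

Only the `P2` row survives. Note that `nunique` counts distinct terms, so the repeated `GO:C` rows for `P3` count once.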

go_function_examples = {
    "extracellular": "GO:0005576",
    "membrane": "GO:0016020"
}

sequences_by_function = {}

min_length = 100
max_length = 500
num_samples = 20

for function, go_term in go_function_examples.items():
    proteins_with_function = protein_df[
        (protein_df["term"] == go_term)
        & (protein_df["Length"] >= min_length)
        & (protein_df["Length"] <= max_length)
    ]
    print(
        f"Found {len(proteins_with_function)} human proteins\n"
        f"with the cellular component '{function}' ({go_term}),\n"
        f"and {min_length}<=length<={max_length}.\n"
        f"Sampling {num_samples} proteins at random.\n"
    )
    sequences = list(
        proteins_with_function.sample(num_samples, random_state=42)["Sequence"]
    )
    sequences_by_function[function] = sequences
Found 164 human proteins
with the cellular component 'extracellular' (GO:0005576),
and 100<=length<=500.
Sampling 20 proteins at random.

Found 65 human proteins
with the cellular component 'membrane' (GO:0016020),
and 100<=length<=500.
Sampling 20 proteins at random.
sequences_by_function['extracellular'][0]
'MASPFALLMVLVVLSCKSSCSLGCDLPETHSLDNRRTLMLLAQMSRISPSSCLMDRHDFGFPQEEFDGNQFQKAPAISVLHELIQQIFNLFTTKDSSAAWDEDLLDKFCTELYQQLNDLEACVMQEERVGETPLMNADSILAVKKYFRRITLYLTEKKYSPCAWEVVRAEIMRSLSLSTNLQERLRRKE'
def get_device():
    return torch.device("mps" if torch.backends.mps.is_available()
                        else "cuda" if torch.cuda.is_available()
                        else "cpu")

def get_mean_embeddings(
    sequences: list[str],
    tokenizer: PreTrainedTokenizer,
    model: PreTrainedModel,
    device: torch.device | None = None
):
    if device is None:
        device = get_device()

    model_inputs = tokenizer(sequences, padding=True, return_tensors='pt')
    model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
    model = model.to(device)
    model.eval()

    with torch.no_grad():
        outputs = model(**model_inputs)
        mean_embeddings = outputs.last_hidden_state.mean(dim=1)
    return mean_embeddings.detach().cpu().numpy()
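One thing worth knowing about `get_mean_embeddings`: because the batch is padded, `mean(dim=1)` averages over padding positions too, so shorter sequences get diluted by pad-token states. A common alternative is to weight the mean by the attention mask. The NumPy sketch below illustrates that idea on toy arrays; `masked_mean` is a hypothetical helper and is not what the notebook actually runs:

```python
import numpy as np

def masked_mean(hidden_states, attention_mask):
    """Average token embeddings over real (non-padding) tokens only.

    hidden_states: (batch, tokens, channels)
    attention_mask: (batch, tokens), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # guard against division by zero
    return summed / counts

# Toy batch: 2 sequences, 3 token slots, 2 channels; the second sequence has one pad slot.
hidden = np.array([[[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]],
                   [[2.0, 2.0], [4.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = masked_mean(hidden, mask)
print(pooled)
```

Both rows pool to 3.0 here, whereas a plain mean would let the padded slot drag the second row far away. The attention mask still marks `<cls>` and `<eos>` as real tokens, so those would remain in the average unless excluded separately.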
model_checkpoint = 'facebook/esm2_t6_8M_UR50D'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = EsmModel.from_pretrained(model_checkpoint)
EsmModel LOAD REPORT from: facebook/esm2_t6_8M_UR50D
Key                         | Status     | 
----------------------------+------------+-
lm_head.layer_norm.bias     | UNEXPECTED | 
lm_head.layer_norm.weight   | UNEXPECTED | 
lm_head.dense.bias          | UNEXPECTED | 
lm_head.dense.weight        | UNEXPECTED | 
lm_head.bias                | UNEXPECTED | 
esm.embeddings.position_ids | UNEXPECTED | 
pooler.dense.weight         | MISSING    | 
pooler.dense.bias           | MISSING    | 

Notes:
- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING   :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
protein_embeddings = {
    loc: get_mean_embeddings(sequences_by_function[loc], tokenizer, model)
    for loc in ["extracellular", "membrane"]
}

labels, embeddings = [], []
for location, embedding in protein_embeddings.items():
    labels.extend([location] * embedding.shape[0])
    embeddings.append(embedding)
    print(f"{location}: {embedding.shape}")
extracellular: (20, 320)
membrane: (20, 320)
embeddings_tsne = TSNE(n_components=2, random_state=42).fit_transform(
    np.vstack(embeddings)
)
embeddings_tsne_df = pd.DataFrame(
    {
        "first_dimension": embeddings_tsne[:, 0],
        "second_dimension": embeddings_tsne[:, 1],
        "location": np.array(labels),
    }
)
fig = sns.scatterplot(
    data=embeddings_tsne_df,
    x="first_dimension",
    y="second_dimension",
    hue="location",
    style="location",
    s=50,
    alpha=0.7
)
plt.title("tSNE of Protein Embeddings")
fig.set_xlabel("First Dimension")
fig.set_ylabel("Second Dimension")
Text(0, 0.5, 'Second Dimension')

import requests

r = requests.get(
    "https://assets.deep-learning-for-biology.com/proteins/datasets/train_terms.tsv.zip",
    timeout=60
)
r.raise_for_status()

with open("train_terms.tsv.zip", "wb") as f:
    f.write(r.content)

labels = pd.read_csv("train_terms.tsv.zip", sep="\t", compression="infer")
print(labels)
            EntryID        term aspect
0        A0A009IHW8  GO:0008152    BPO
1        A0A009IHW8  GO:0034655    BPO
2        A0A009IHW8  GO:0072523    BPO
3        A0A009IHW8  GO:0044270    BPO
4        A0A009IHW8  GO:0006753    BPO
...             ...         ...    ...
5363858      X5L565  GO:0050649    MFO
5363859      X5L565  GO:0016491    MFO
5363860      X5M5N0  GO:0005515    MFO
5363861      X5M5N0  GO:0005488    MFO
5363862      X5M5N0  GO:0003674    MFO

[5363863 rows x 3 columns]
labels["term"].nunique()
31466
import obonet
import requests

r = requests.get(
    "https://current.geneontology.org/ontology/go-basic.obo",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=60
)
r.raise_for_status()

with open("go-basic.obo", "wb") as f:
    f.write(r.content)

def get_go_term_descriptions(
    store_path: str, obo_path: str = "go-basic.obo"
) -> pd.DataFrame:
    """Return a GO term -> description table, cached as a CSV at store_path."""
    if not os.path.exists(store_path):
        # Parse the ontology once and cache the result as a CSV.
        graph = obonet.read_obo(obo_path)
        # Use `term_id` rather than `id` to avoid shadowing the built-in.
        id_to_name = {
            term_id: data.get("name") for term_id, data in graph.nodes(data=True)
        }
        go_term_descriptions = pd.DataFrame(
            list(id_to_name.items()), columns=["term", "description"]
        )
        go_term_descriptions.to_csv(store_path, index=False)
    else:
        go_term_descriptions = pd.read_csv(store_path)
    return go_term_descriptions
go_term_descriptions = get_go_term_descriptions(
    "go_term_descriptions.csv"
)
print(go_term_descriptions)
             term                                        description
0      GO:0000001                          mitochondrion inheritance
1      GO:0000006  high-affinity zinc transmembrane transporter a...
2      GO:0000007  low-affinity zinc ion transmembrane transporte...
3      GO:0000009             alpha-1,6-mannosyltransferase activity
4      GO:0000010          heptaprenyl diphosphate synthase activity
...           ...                                                ...
38555  GO:7770052  endoplasmic reticulum-lysosome membrane contac...
38556  GO:7770053  nucleotide-binding leucine-rich repeat recepto...
38557  GO:7770054  tRNA(Val) (adenine(37)-N6)-methyltransferase a...
38558  GO:7770055                     intestinal lumen acidification
38559  GO:7770056  multicellular organismal-level extracellular f...

[38560 rows x 2 columns]
labels = labels.merge(go_term_descriptions, on="term")
labels
            EntryID        term aspect                                       description
0        A0A009IHW8  GO:0008152    BPO                                 metabolic process
1        A0A009IHW8  GO:0034655    BPO  nucleobase-containing compound catabolic process
2        A0A009IHW8  GO:0072523    BPO      purine-containing compound catabolic process
3        A0A009IHW8  GO:0006753    BPO            nucleoside phosphate metabolic process
4        A0A009IHW8  GO:1901292    BPO            nucleoside phosphate catabolic process
...             ...         ...    ...                                               ...
4874414      X5L565  GO:0050649    MFO          testosterone 6-beta-hydroxylase activity
4874415      X5L565  GO:0016491    MFO                           oxidoreductase activity
4874416      X5M5N0  GO:0005515    MFO                                   protein binding
4874417      X5M5N0  GO:0005488    MFO                                           binding
4874418      X5M5N0  GO:0003674    MFO                                molecular_function

4874419 rows × 4 columns
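The merge above is an inner join by default, so label rows whose `term` has no match in `go_term_descriptions` are dropped silently: the table shrinks from 5,363,863 rows to 4,874,419. For example, `GO:0044270` is in the first rows of the raw labels but appears to be gone after the merge. One plausible cause is that the training labels reference GO terms that are no longer present in the current `go-basic.obo` release, though that is an assumption and not something checked here. A small sketch on toy data shows how `how="left"` with `indicator=True` surfaces unmatched terms; the ID `GO:9999999` and the `toy_*` names are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for `labels` and `go_term_descriptions`.
toy_labels = pd.DataFrame(
    {
        "EntryID": ["P1", "P1", "P2"],
        "term": ["GO:0003674", "GO:9999999", "GO:0005488"],  # GO:9999999 is made up
    }
)
toy_descriptions = pd.DataFrame(
    {
        "term": ["GO:0003674", "GO:0005488"],
        "description": ["molecular_function", "binding"],
    }
)

# A left join keeps every label row; `_merge` records whether a match was found.
checked = toy_labels.merge(toy_descriptions, on="term", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "term"].unique().tolist()
print(unmatched)
```

Running the same check on the real `labels` table would list the terms responsible for the missing rows before deciding whether dropping them is acceptable.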

labels = labels[labels["aspect"] == "MFO"]
print(labels["description"].value_counts())
description
molecular_function                               78637
binding                                          57380
protein binding                                  47987
catalytic activity                               25324
heterocyclic compound binding                    12694
                                                 ...  
(S)-limonene 6-monooxygenase activity                1
tricyclene synthase activity                         1
acetylglutamate kinase regulator activity            1
tocopherol C-methyltransferase activity              1
L-lysine:pyruvate alpha-transaminase activity        1
Name: count, Length: 6887, dtype: int64
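The counts above are dominated by very general terms such as `molecular_function` and `binding`. That pattern is consistent with each protein being annotated with its specific terms together with their ancestors in the GO hierarchy: protein X5M5N0 above carries `protein binding`, `binding`, and `molecular_function`. A minimal sketch of walking that hierarchy follows, using a hand-built toy graph in place of the one `obonet.read_obo` returns. It assumes obonet's convention that edges point from child to parent, so `networkx.descendants` yields the more general terms; `toy_graph` and `ancestor_names` are hypothetical names.

```python
import networkx as nx

# Toy graph mimicking obonet's layout: edges run child -> parent, keyed by relation.
toy_graph = nx.MultiDiGraph()
toy_graph.add_node("GO:0003674", name="molecular_function")
toy_graph.add_node("GO:0005488", name="binding")
toy_graph.add_node("GO:0005515", name="protein binding")
toy_graph.add_edge("GO:0005515", "GO:0005488", key="is_a")
toy_graph.add_edge("GO:0005488", "GO:0003674", key="is_a")

def ancestor_names(graph, term):
    """Names of all more general terms reachable from `term`."""
    # With child -> parent edges, networkx "descendants" are the GO ancestors.
    return sorted(graph.nodes[t]["name"] for t in nx.descendants(graph, term))

print(ancestor_names(toy_graph, "GO:0005515"))
```

Because every annotated protein inherits the root term this way, the count for `molecular_function` effectively counts the proteins with any MFO annotation, which is why it tops the list.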