Overview
--------

atomium is a Python library for opening and saving .pdb, .cif and .mmtf files, and for presenting and manipulating the information they contain.

Loading Data
~~~~~~~~~~~~

While you can use atomium to create models from scratch and build an entirely *de novo* structure, in practice you will generally use it to load molecular data from an existing file:

>>> import atomium
>>> pdb1 = atomium.open('../1LOL.pdb')
>>> mmtf1 = atomium.open('/structures/glucose.mmtf')
>>> cif1 = atomium.open('/structures/1XDA.cif')
>>> pdb3 = atomium.open('./5CPA.pdb.gz')
>>> pdb2 = atomium.fetch('5XME.pdb')
>>> cif2 = atomium.fetch('5XME')

In the last two cases the file doesn't need to be saved locally: atomium downloads the structure with that code from the RCSB.

atomium uses the file extension you provide to decide how to parse the file. If there isn't one, or it doesn't recognise the extension, it peeks at the file contents and tries to guess whether they should be interpreted as .pdb, .cif or .mmtf.

Using Data
~~~~~~~~~~

Once you've got your :py:class:`.File` object, what can you do with it?

Annotation
##########

The :py:class:`.File` object contains meta information about the structure:

>>> pdb1.title
'CRYSTAL STRUCTURE OF OROTIDINE MONOPHOSPHATE DECARBOXYLASE COMPLEX WITH XMP'
>>> pdb1.deposition_date
datetime.date(2002, 5, 6)
>>> pdb1.keywords
['TIM BARREL', 'LYASE']
>>> pdb1.classification
'LYASE'
>>> pdb1.source_organism
'METHANOTHERMOBACTER THERMAUTOTROPHICUS STR. DELTA H'
>>> pdb1.resolution
1.9
>>> pdb1.rvalue
0.193
>>> pdb1.rfree
0.229

atomium doesn't currently parse *every* piece of information in these files, but there is more available than is shown above. See `the full API docs `_ for more details. In particular, you can access the processed intermediate mmCIF dictionary to get *any* attribute of these structures.

Models and Assembly
###################

All .pdb files contain one or more models: little universes, each containing a molecular scene.

>>> pdb1.model
>>> pdb1.models
(,)

Most files contain just one model; it's generally structures from NMR experiments that contain several. You can easily iterate through the models to get their individual metrics:

>>> for model in pdb2.models:
...     print(model.center_of_mass)

Each model contains the 'asymmetric unit': one or more (usually protein) chains arranged in space, which may not be how the molecule arranges itself in real life. It may simply be how the chains happened to pack together in the experiment. To create the 'real thing' from the asymmetric unit, you use **biological assemblies**.

Most .pdb files contain one or more biological assemblies: instructions for building a more realistic structure from the chains present. In atomium these are accessed using :py:attr:`~.File.assemblies`. In practice, what you need to know is that you can create a new model (separate from the existing one containing the asymmetric unit) as follows:

>>> pdb3 = atomium.fetch('1XDA')
>>> pdb3.model
>>> pdb3.generate_assembly(1)
>>> pdb3.generate_assembly(10)
>>> [pdb3.generate_assembly(n + 1) for n in range(len(pdb3.assemblies))]
[, , , , , , , , , , , ]

Here you load a structure with multiple possible assemblies, have a quick look at the asymmetric unit with 1,842 atoms, and then generate two individual biological assemblies, and finally all of them, by passing in their IDs.

Model Contents
##############

The basic structures within a model are chains, residues, ligands, and atoms.
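
The selection keywords used in the next examples follow a ``property__suffix`` pattern. The sketch below shows one way such keyword filtering could work; it is not atomium's code. The ``Atom`` dataclass and the ``matches`` and ``select`` helpers are hypothetical, and the suffix set (``lt``, ``le``, ``gt``, ``ge``, ``regex``) is an assumption based on the examples in this section.

```python
import operator
import re
from dataclasses import dataclass

# Hypothetical suffix table; atomium's actual set of suffixes may differ.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    # re.match anchors the pattern at the start of the value.
    "regex": lambda value, pattern: re.match(pattern, str(value)) is not None,
}


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an atom; frozen so it is hashable."""
    id: int
    name: str
    element: str
    mass: float


def matches(obj, **criteria):
    """True if obj satisfies every keyword, e.g. name='C1' or mass__gt=12."""
    for key, target in criteria.items():
        attribute, _, suffix = key.partition("__")
        value = getattr(obj, attribute)
        if suffix:
            if not COMPARATORS[suffix](value, target):
                return False
        elif value != target:
            return False
    return True


def select(structures, **criteria):
    """Return the set of structures that match all criteria."""
    return {s for s in structures if matches(s, **criteria)}


# Toy data: two carbons and two oxygens.
atoms = [
    Atom(1, "C1", "C", 12.0107),
    Atom(2, "O1", "O", 15.9994),
    Atom(3, "C2", "C", 12.0107),
    Atom(4, "O2", "O", 15.9994),
]
heavy = select(atoms, mass__gt=14)               # the two oxygens
not_carbon = select(atoms, name__regex="[^C]")   # names not starting with C
```

Returning a set mirrors the set-valued results in the examples, and unknown suffixes simply raise ``KeyError`` in this sketch.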

>>> pdb1.model.chains()
{, }
>>> pdb1.model.chain('B')
>>> pdb1.model.residues(name='TYR')
{, , , , , , , }
>>> pdb1.model.residues(name__regex='TYR|PRO')
{, , , , , , , , , , , , , <Residue PRO (B.1180)>, , , , , , , , , , , , }
>>> pdb1.model.chain('B').residue('B.1206')
>>> pdb1.model.chain('B').residue('B.1206').helix
True
>>> pdb1.model.ligands()
{, , , }
>>> pdb1.model.ligand(name='BU2').atoms()
{, , , , , }
>>> pdb1.model.ligand(name='BU2').atoms(mass__gt=12)
{, , , , , }
>>> pdb1.model.ligand(name='BU2').atoms(mass__gt=14)
{, }

The examples above demonstrate atomium's selection language. For the larger structures (:py:class:`.Model`, :py:class:`.Chain`, :py:class:`.Residue` and :py:class:`.Ligand`) you can pass in an ``id`` or ``name``, or search by regex pattern with ``id__regex`` or ``name__regex``. These queries also support a more powerful syntax: you can pass in *any* property, such as ``charge=1``, any comparison on a property, such as ``mass__lt=100``, or a regex on any property, such as ``name__regex='[^C]'``.

For pairwise comparisons, structures also have the :py:meth:`~.AtomStructure.pairwise_atoms` generator, which yields every unique pair of atoms in the structure. The number of pairs grows quadratically: a structure with n atoms has n(n - 1)/2 unique pairs, so a 5,000 atom PDB file has about 12.5 million.

Structures can also be measured, moved around, and compared with each other.
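
The measurements and movements in the next example rest on standard coordinate geometry. As a rough guide to what they compute, here is a pure-Python sketch using textbook definitions: a mass-weighted center of mass and radius of gyration, an RMSD over already-paired coordinates with no superposition step, and a rotation about the x axis through the origin. The helper functions are hypothetical and are not atomium's implementation, which may weight, pair, or center things differently.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of (x, y, z) coordinates."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )


def radius_of_gyration(coords, masses):
    """Mass-weighted root-mean-square distance from the center of mass."""
    com = center_of_mass(coords, masses)
    weighted = sum(m * math.dist(c, com) ** 2 for c, m in zip(coords, masses))
    return math.sqrt(weighted / sum(masses))


def rmsd(coords_a, coords_b):
    """RMSD of two equally long, already-paired coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rotate_x(point, angle):
    """Right-handed rotation of a point about the x axis (through the origin)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x, y * c - z * s, y * s + z * c)


def translate(point, dx, dy, dz):
    """Shift a point by the given offsets."""
    x, y, z = point
    return (x + dx, y + dy, z + dz)
```

For example, ``center_of_mass([(0, 0, 0), (4, 0, 0)], [1, 3])`` gives ``(3.0, 0.0, 0.0)``: the heavier atom pulls the center towards itself.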

>>> pdb1.model.ligand(id='B.2002').mass
351.1022
>>> pdb1.model.ligand(id='B.2002').formula
Counter({'C': 10, 'O': 9, 'N': 4, 'P': 1})
>>> pdb1.model.ligand(id='B.2002').nearby_atoms(2.8)
{, , }
>>> pdb1.model.ligand(id='B.2002').nearby_atoms(2.8, name='OD1')
{}
>>> pdb1.model.ligand(id='B.2002').nearby_residues(2.8)
{}
>>> pdb1.model.ligand(id='B.2002').nearby_structures(2.8, waters=True)
{, , }
>>> import math
>>> pdb1.model.ligand(id='B.2002').rotate(math.pi / 2, 'x')
>>> pdb1.model.ligand(id='B.2002').translate(10, 10, 15)
>>> pdb1.model.ligand(id='B.2002').center_of_mass
(-9.886734282781484, -42.558415679537184, 77.33400578435568)
>>> pdb1.model.ligand(id='B.2002').radius_of_gyration
3.6633506511540825
>>> pdb1.model.ligand(id='B.2002').rmsd_with(pdb1.model.ligand(id='A.2001'))
0.133255572356

Here we look at one of the ligands, identify its mass and molecular formula, find which atoms are within 2.8 Å of it and which residues (and, including waters, which structures) are within that same distance, rotate it and translate it through space, see where its new center of mass is, and finally get its RMSD with the other, similar ligand in the model.

Any operation that identifies nearby structures or atoms can be sped up, dramatically so for very large structures, by first calling :py:meth:`~.Model.optimise_distances` on the :py:class:`.Model`. This saves atomium from comparing every atom with every other atom each time a proximity check is made.

The :py:class:`.Atom` objects themselves have their own useful properties.
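
Atoms also answer proximity queries such as ``nearby_atoms``, which is where :py:meth:`~.Model.optimise_distances` pays off. How atomium does this internally is not described here; as an assumption, the sketch below uses a common technique, a uniform grid (cell list), in which a query point is only compared with points in its own cell and the 26 neighbouring cells. ``build_grid`` and ``nearby`` are hypothetical helpers, and ``nearby`` includes the query point itself if it is one of the indexed points.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(point, cell_size):
    """Integer grid cell containing a point."""
    return tuple(math.floor(c / cell_size) for c in point)


def build_grid(points, cell_size):
    """Map each grid cell to the indices of the points inside it."""
    grid = defaultdict(list)
    for index, point in enumerate(points):
        grid[cell_of(point, cell_size)].append(index)
    return grid


def nearby(points, grid, cell_size, query, cutoff):
    """Sorted indices of points within cutoff of query.

    Only the query's cell and its 26 neighbours are searched, which finds
    every match as long as cutoff <= cell_size.
    """
    if cutoff > cell_size:
        raise ValueError("cutoff must not exceed cell_size")
    cx, cy, cz = cell_of(query, cell_size)
    found = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        # .get avoids inserting empty cells into the defaultdict
        for index in grid.get((cx + dx, cy + dy, cz + dz), ()):
            if math.dist(points[index], query) <= cutoff:
                found.append(index)
    return sorted(found)
```

Building the grid is a one-off linear pass; after that, each query touches only a small neighbourhood instead of every atom in the model.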

>>> pdb1.model.atom(97)
>>> pdb1.model.atom(97).mass
12.0107
>>> pdb1.model.atom(97).anisotropy
[0, 0, 0, 0, 0, 0]
>>> pdb1.model.atom(97).bvalue
24.87
>>> pdb1.model.atom(97).location
(-12.739, 31.201, 43.016)
>>> pdb1.model.atom(97).distance_to(pdb1.model.atom(1))
26.18289982030257
>>> pdb1.model.atom(97).nearby_atoms(2)
{, , }
>>> pdb1.model.atom(97).is_metal
False
>>> pdb1.model.atom(97).structure
>>> pdb1.model.atom(97).chain

Chains are a bit different from other structures in that they are iterable and indexable, and they return their residues as a tuple rather than a set:

>>> pdb1.model.atom(97).chain
>>> pdb1.model.chain('A')
>>> len(pdb1.model.chain('A'))
204
>>> pdb1.model.chain('A')[10]
>>> pdb1.model.chain('A').residues()[:5]
(, , , , )
>>> pdb1.model.chain('A').sequence
'LRSRRVDVMDVMNRLILAMDLMNRDDALRVTGEVREYIDTVKIGYPLVLSEGMDIIAEFRKRFGCRIIADFKVADIPETNEKICRATFKAGADAIIVHGFPGADSVRACLNVAEEMGREVFLLTEMSHPGAEMFIQGAADEIARMGVDLGVKNYVGPSTRPERLSRLREIIGQDSFLISPGVGAQGGDPGETLRFADAIIVGRSIYLADNPAAAAAGIIESIKDLLIPE'

The sequence is the full sequence of the molecule as it exists in nature, so it can be longer than the chain itself: some residues may be missing from the model for practical reasons, such as not being resolved in the experiment.

Residues can derive their one-letter code and full name from their three-letter code, and are aware of their immediate neighbors:

>>> pdb1.model.residue('A.100')
>>> pdb1.model.residue('A.100').name
'PHE'
>>> pdb1.model.residue('A.100').code
'F'
>>> pdb1.model.residue('A.100').full_name
'phenylalanine'
>>> pdb1.model.residue('A.100').next
>>> pdb1.model.residue('A.100').previous

Saving Data
~~~~~~~~~~~

A model can be saved to file in any of the supported formats:

>>> model = pdb1.model
>>> model.save("new.cif")
>>> model.save("new.pdb")

Any structure can be saved in this way, so you can save chains or molecules to their own separate files if you wish.

>>> model.chain("A").save("chainA.pdb")
>>> model.chain("B").save("chainB.cif")
>>> model.ligand(name="XMP").save("ligand.mmtf")

Note that if the model you are saving was generated from a biological assembly, it will likely contain many duplicated IDs, so saving it to file may produce unexpected results.
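
The duplicated IDs come from how assemblies are built: a biological assembly is described as a set of symmetry operations (a rotation matrix plus a translation vector) applied to chains of the asymmetric unit. If each copy keeps the IDs of the chain it was made from, as it does in this sketch, the same IDs appear several times. The sketch uses plain ``(chain_id, atom_id, (x, y, z))`` tuples and the hypothetical helpers ``apply_operation`` and ``duplicated_ids`` rather than atomium objects.

```python
from collections import Counter


def apply_operation(atoms, rotation, translation):
    """Apply x' = R x + t to every atom, keeping the original IDs."""
    copies = []
    for chain_id, atom_id, point in atoms:
        new_point = tuple(
            sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
            for row in range(3)
        )
        copies.append((chain_id, atom_id, new_point))
    return copies


def duplicated_ids(atoms):
    """Sorted (chain_id, atom_id) pairs that occur more than once."""
    counts = Counter((chain_id, atom_id) for chain_id, atom_id, _ in atoms)
    return sorted(key for key, n in counts.items() if n > 1)


# A toy asymmetric unit: one chain, two atoms.
asymmetric_unit = [("A", 1, (1.0, 0.0, 0.0)), ("A", 2, (0.0, 1.0, 0.0))]
identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
two_fold_z = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))  # 180 degrees about z

# The assembly is the asymmetric unit plus its symmetry copy.
assembly = (
    apply_operation(asymmetric_unit, identity, (0.0, 0.0, 0.0))
    + apply_operation(asymmetric_unit, two_fold_z, (0.0, 0.0, 0.0))
)
```

Here ``duplicated_ids(assembly)`` reports both atoms of chain A, since each now appears twice. Giving each copied chain a new ID before saving is one way to avoid the clash.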