From 54dcfa6a66af92a0ee9b7eb6ead3a5adedbc92c4 Mon Sep 17 00:00:00 2001
From: josemduarte
Date: Wed, 27 Aug 2014 22:43:39 +0200
Subject: [PATCH 1/7] New tutorial chapter

---
 structure/crystal-contacts.md | 36 +++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 structure/crystal-contacts.md

diff --git a/structure/crystal-contacts.md b/structure/crystal-contacts.md
new file mode 100644
index 0000000..d399ae1
--- /dev/null
+++ b/structure/crystal-contacts.md
@@ -0,0 +1,36 @@
+# How to calculate all crystal contacts in a PDB structure
+
+## Why crystal contacts?
+
+A protein structure is determined by X-ray diffraction by producing a crystal - an infinite lattice of molecules - of the protein. Thus the end result of the diffraction experiment is a crystal lattice and not just a single molecule. However the PDB file only contains the coordinates of the Asymmetric Unit, defined as the minimum unit needed to reconstruct the full crystal using symmetry operators.
+
+[here](http://www.wwpdb.org/news/news_2013.html#22-May-2013)
+
+
+
+## Getting the set of unique contacts in the crystal lattice
+
+This code snippet will produce a list of all non-redundant interfaces present in the crystal lattice of PDB entry [1SMT](http://www.rcsb.org/pdb/explore.do?structureId=1SMT):
+
+```java
+    AtomCache cache = new AtomCache();
+
+    StructureIO.setAtomCache(cache);
+
+    Structure structure = StructureIO.getStructure("1SMT");
+
+    CrystalBuilder cb = new CrystalBuilder(structure);
+
+    StructureInterfaceList interfaces = cb.getUniqueInterfaces(6);
+
+    interfaces.calcAsas(3000, 1, -1);
+
+    // now interfaces are sorted by areas, we can get the largest interface in the crystal and look at its area
+    interfaces.get(1).getTotalArea();
+
+```
+
+An interface is defined here as any 2 chains with at least a pair of atoms within the given distance cutoff (6 A in the example above)
+
+
+
From f0ec3f23ef59c06514689c8a31185a7238f18bb7 Mon Sep 17 00:00:00 2001
From: Jose Manuel Duarte
Date: Wed, 27 Aug 2014 22:49:04 +0200
Subject: [PATCH 2/7] Update crystal-contacts.md

---
 structure/crystal-contacts.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/structure/crystal-contacts.md b/structure/crystal-contacts.md
index d399ae1..502ac7d 100644
--- a/structure/crystal-contacts.md
+++ b/structure/crystal-contacts.md
@@ -4,9 +4,6 @@
 
 A protein structure is determined by X-ray diffraction by producing a crystal - an infinite lattice of molecules - of the protein. Thus the end result of the diffraction experiment is a crystal lattice and not just a single molecule. However the PDB file only contains the coordinates of the Asymmetric Unit, defined as the minimum unit needed to reconstruct the full crystal using symmetry operators.
 
-[here](http://www.wwpdb.org/news/news_2013.html#22-May-2013)
-
-
 
 ## Getting the set of unique contacts in the crystal lattice
 
@@ -21,11 +18,15 @@ This code snippet will produce a list of all non-redundant interfaces present in
 
     CrystalBuilder cb = new CrystalBuilder(structure);
 
+    // 6 is the distance cutoff to consider 2 atoms in contact
     StructureInterfaceList interfaces = cb.getUniqueInterfaces(6);
+
+    System.out.println("The crystal contains "+interfaces.size()+" unique interfaces");
 
+    // this calculates the buried surface areas of all interfaces and sorts them by areas
     interfaces.calcAsas(3000, 1, -1);
 
-    // now interfaces are sorted by areas, we can get the largest interface in the crystal and look at its area
+    // we can get the largest interface in the crystal and look at its area
     interfaces.get(1).getTotalArea();
 
 ```
From d5f479e2cd3f1e09c4f6091e7d15257e37a082fe Mon Sep 17 00:00:00 2001
From: Jose Manuel Duarte
Date: Thu, 28 Aug 2014 09:37:43 +0200
Subject: [PATCH 3/7] Update crystal-contacts.md

---
 structure/crystal-contacts.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/structure/crystal-contacts.md b/structure/crystal-contacts.md
index 502ac7d..9fbd452 100644
--- a/structure/crystal-contacts.md
+++ b/structure/crystal-contacts.md
@@ -33,5 +33,6 @@ This code snippet will produce a list of all non-redundant interfaces present in
 
 An interface is defined here as any 2 chains with at least a pair of atoms within the given distance cutoff (6 A in the example above)
 
+See [DemoCrystalInterfaces](https://github.com/biojava/biojava/blob/master/biojava3-structure/src/main/java/demo/DemoCrystalInterfaces.java) for a fully working demo of the example above.
 
 
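Note on the interface definition carried as context in patch 3/7 ("any 2 chains with at least a pair of atoms within the given distance cutoff"): the rule can be sketched in plain Java without any BioJava types. The class name `InterfaceCutoffSketch`, the `double[][]` coordinate layout and the inclusive `<=` comparison are assumptions made purely for illustration. This is not how `CrystalBuilder` is implemented, and a real implementation would likely avoid the all-pairs loop.

```java
// Plain-JDK illustration only; none of these names come from BioJava.
final class InterfaceCutoffSketch {

    // Returns true if at least one atom of chain A lies within `cutoff`
    // (same units as the coordinates, e.g. Angstrom) of an atom of chain B.
    // Each atom is a {x, y, z} array. The inclusive comparison is an assumption.
    static boolean inContact(double[][] chainA, double[][] chainB, double cutoff) {
        double cutoffSq = cutoff * cutoff; // compare squared distances, no sqrt needed
        for (double[] a : chainA) {
            for (double[] b : chainB) {
                double dx = a[0] - b[0];
                double dy = a[1] - b[1];
                double dz = a[2] - b[2];
                if (dx * dx + dy * dy + dz * dz <= cutoffSq) {
                    return true; // one close pair is enough to call it an interface
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up coordinates: the closest pair is 5.5 apart (1.5 vs 7.0 on the x axis).
        double[][] chainA = { {0, 0, 0}, {1.5, 0, 0} };
        double[][] chainB = { {7, 0, 0}, {20, 0, 0} };
        System.out.println("cutoff 6: " + inContact(chainA, chainB, 6.0));
        System.out.println("cutoff 5: " + inContact(chainA, chainB, 5.0));
    }
}
```

With the made-up coordinates above, the two chains count as an interface at a cutoff of 6 but not at 5, which is the sense in which the cutoff passed to `getUniqueInterfaces` decides what gets reported.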
From af0f3966c15433059315d4222f1f3819ab47175c Mon Sep 17 00:00:00 2001
From: Jose Manuel Duarte
Date: Thu, 28 Aug 2014 09:49:34 +0200
Subject: [PATCH 4/7] Update crystal-contacts.md

---
 structure/crystal-contacts.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/structure/crystal-contacts.md b/structure/crystal-contacts.md
index 9fbd452..fcf47b3 100644
--- a/structure/crystal-contacts.md
+++ b/structure/crystal-contacts.md
@@ -1,8 +1,8 @@
-# How to calculate all crystal contacts in a PDB structure
+# How to find all crystal contacts in a PDB structure
 
 ## Why crystal contacts?
 
-A protein structure is determined by X-ray diffraction by producing a crystal - an infinite lattice of molecules - of the protein. Thus the end result of the diffraction experiment is a crystal lattice and not just a single molecule. However the PDB file only contains the coordinates of the Asymmetric Unit, defined as the minimum unit needed to reconstruct the full crystal using symmetry operators.
+A protein structure is determined by X-ray diffraction from a protein crystal, i.e. an infinite lattice of molecules. Thus the end result of the diffraction experiment is a crystal lattice and not just a single molecule. However the PDB file only contains the coordinates of the Asymmetric Unit, defined as the minimum unit needed to reconstruct the full crystal using symmetry operators.
 
 
 ## Getting the set of unique contacts in the crystal lattice
@@ -31,7 +31,12 @@ This code snippet will produce a list of all non-redundant interfaces present in
 
 ```
 
-An interface is defined here as any 2 chains with at least a pair of atoms within the given distance cutoff (6 A in the example above)
+An interface is defined here as any 2 chains with at least a pair of atoms within the given distance cutoff (6 A in the example above).
+
+The algorithm to find all unique interfaces in the crystal works roughly like this:
++ Reconstructs the full unit cell by applying the matrix operators of the corresponding space group to the Asymmetric Unit.
++ Searches all cells around the original one by applying crystal translations; if any 2 chains in that search are found to be in contact, the new contact is added to the final list.
++ The search is performed without repeating redundant symmetry operators, making sure that if a contact is found then it is a unique contact.
 
 See [DemoCrystalInterfaces](https://github.com/biojava/biojava/blob/master/biojava3-structure/src/main/java/demo/DemoCrystalInterfaces.java) for a fully working demo of the example above.
 
From 764bea3823cff54cad44fb3fa0806d452d4a3006 Mon Sep 17 00:00:00 2001
From: Jose Manuel Duarte
Date: Thu, 28 Aug 2014 10:12:25 +0200
Subject: [PATCH 5/7] Update crystal-contacts.md

---
 structure/crystal-contacts.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/structure/crystal-contacts.md b/structure/crystal-contacts.md
index fcf47b3..97ba9bb 100644
--- a/structure/crystal-contacts.md
+++ b/structure/crystal-contacts.md
@@ -2,7 +2,11 @@
 
 ## Why crystal contacts?
 
-A protein structure is determined by X-ray diffraction from a protein crystal, i.e. an infinite lattice of molecules. Thus the end result of the diffraction experiment is a crystal lattice and not just a single molecule. However the PDB file only contains the coordinates of the Asymmetric Unit, defined as the minimum unit needed to reconstruct the full crystal using symmetry operators.
+A protein structure is determined by X-ray diffraction from a protein crystal, i.e. an infinite lattice of molecules. Thus the end result of the diffraction experiment is a crystal lattice and not just a single molecule. However the PDB file only contains the coordinates of the Asymmetric Unit (AU), defined as the minimum unit needed to reconstruct the full crystal using symmetry operators.
+
+Looking at the AU alone is not enough to understand the crystal structure. For instance the biologically relevant assembly (known as the Biological Unit) can occur through a symmetry operator that can be found looking at the crystal contacts. See for instance [1M4N](http://www.rcsb.org/pdb/explore.do?structureId=1M4N): its biological unit is a dimer that happens through a 2-fold operator and is the largest interface found in the crystal.
+
+Looking at crystal contacts can also be important in order to assess the quality and reliability of the deposited PDB model: an AU can look perfectly fine but then upon reconstruction of the lattice the molecules can be clashing, which indicates that something is wrong in the model.
 
 
 ## Getting the set of unique contacts in the crystal lattice
From 817a56dec2b0cf9640dd24c4bf664df12cc13eae Mon Sep 17 00:00:00 2001
From: Jose Manuel Duarte
Date: Thu, 28 Aug 2014 10:28:58 +0200
Subject: [PATCH 6/7] New tutorial on contact maps

---
 structure/contact-map.md | 50 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 structure/contact-map.md

diff --git a/structure/contact-map.md b/structure/contact-map.md
new file mode 100644
index 0000000..3ed1214
--- /dev/null
+++ b/structure/contact-map.md
@@ -0,0 +1,50 @@
+# Finding contacts within a protein chain: contact maps
+
+Contacts are a useful tool to analyse protein structures. It simplifies the 3-Dimensional view of the structures into a 2-Dimensional set of contacts between its atoms or its residues. The representation of the contacts in a matrix is known as the contact map. Many protein structure analysis and prediction efforts are done by using contacts, see for instance
+
+## Getting the contact map of a protein chain
+
+This code snippet will produce the set of contacts between all C alpha atoms for chain A of PDB entry [1SMT](http://www.rcsb.org/pdb/explore.do?structureId=1SMT):
+
+```java
+    AtomCache cache = new AtomCache();
+    StructureIO.setAtomCache(cache);
+
+    Structure structure = StructureIO.getStructure("1SMT");
+
+    Chain chain = structure.getChainByPDB("A");
+
+    // we want contacts between Calpha atoms only
+    String[] atoms = {" CA "};
+    // the distance cutoff we use is 8A
+    AtomContactSet contacts = StructureTools.getAtomsInContact(chain, atoms, 8.0);
+
+    System.out.println("Total number of CA-CA contacts: "+contacts.size());
+
+
+```
+
+The algorithm to find the contacts uses geometric hashing without need to calculate a full distance matrix, thus it scales nicely.
+
+## Getting the contacts between two protein chains
+
+One can also find the contacting atoms between two protein chains. For instance the following code finds the contacts between the first 2 chains of PDB entry [1SMT](http://www.rcsb.org/pdb/explore.do?structureId=1SMT):
+
+```java
+    AtomCache cache = new AtomCache();
+    StructureIO.setAtomCache(cache);
+
+    Structure structure = StructureIO.getStructure("1SMT");
+
+    AtomContactSet contacts = StructureTools.getAtomsInContact(structure.getChain(0), structure.getChain(1), 5, false);
+
+    System.out.println("Total number of atom contacts: "+contacts.size());
+
+    // the list of atom contacts can be reduced to a list of contacts between groups:
+    GroupContactSet groupContacts = new GroupContactSet(contacts);
+```
+
+
+See [DemoContacts](https://github.com/biojava/biojava/blob/master/biojava3-structure/src/main/java/demo/DemoContacts.java) for a fully working demo of the example above.
+
+
From 4f3b37b9271609ee4579c5799edbae785aae30f3 Mon Sep 17 00:00:00 2001
From: Jose Manuel Duarte
Date: Thu, 28 Aug 2014 10:46:09 +0200
Subject: [PATCH 7/7] Update contact-map.md

---
 structure/contact-map.md | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/structure/contact-map.md b/structure/contact-map.md
index 3ed1214..f4127ac 100644
--- a/structure/contact-map.md
+++ b/structure/contact-map.md
@@ -1,6 +1,11 @@
-# Finding contacts within a protein chain: contact maps
+# Finding contacts between atoms in a protein: contact maps
 
-Contacts are a useful tool to analyse protein structures. It simplifies the 3-Dimensional view of the structures into a 2-Dimensional set of contacts between its atoms or its residues. The representation of the contacts in a matrix is known as the contact map. Many protein structure analysis and prediction efforts are done by using contacts, see for instance
+Contacts are a useful tool to analyse protein structures. They simplify the 3-Dimensional view of the structures into a 2-Dimensional set of contacts between its atoms or its residues. The representation of the contacts in a matrix is known as the contact map. Many protein structure analysis and prediction efforts are done by using contacts. For instance they can be useful for:
+
++ development of structural alignment algorithms [Holm 1993][] [Caprara 2004][]
++ automatic domain identification [Alexandrov 2003][] [Emmert-Streib 2007][]
++ structural modelling by extraction of contact-based empirical potentials [Benkert 2008][]
++ structure prediction via contact prediction from sequence information [Jones 2012][]
 
 ## Getting the contact map of a protein chain
 
@@ -36,7 +41,8 @@ One can also find the contacting atoms between two protein chains. For instance
 
     Structure structure = StructureIO.getStructure("1SMT");
 
-    AtomContactSet contacts = StructureTools.getAtomsInContact(structure.getChain(0), structure.getChain(1), 5, false);
+    AtomContactSet contacts =
+        StructureTools.getAtomsInContact(structure.getChain(0), structure.getChain(1), 5, false);
 
     System.out.println("Total number of atom contacts: "+contacts.size());
 
@@ -45,6 +51,13 @@ One can also find the contacting atoms between two protein chains. For instance
 ```
 
 
-See [DemoContacts](https://github.com/biojava/biojava/blob/master/biojava3-structure/src/main/java/demo/DemoContacts.java) for a fully working demo of the example above.
+See [DemoContacts](https://github.com/biojava/biojava/blob/master/biojava3-structure/src/main/java/demo/DemoContacts.java) for a fully working demo of the examples above.
+
+[Holm 1993]: http://www.biomedcentral.com/pubmed/8377180
+[Caprara 2004]: http://www.biomedcentral.com/pubmed/15072687
+[Alexandrov 2003]: http://www.biomedcentral.com/pubmed/12584135
+[Emmert-Streib 2007]: http://www.biomedcentral.com/pubmed/17608939
+[Benkert 2008]: http://www.biomedcentral.com/pubmed/17932912
+[Jones 2012]: http://www.ncbi.nlm.nih.gov/pubmed/22101153
 
 
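Note on the contact-map tutorial text, which says the contact search "uses geometric hashing without need to calculate a full distance matrix": one common way to get that behaviour is a uniform grid whose cell size equals the cutoff, so each point is only compared with points in its own and adjacent cells. The sketch below shows that generic technique in plain Java. The class `GridContactSketch`, its method names and the string cell keys are hypothetical and chosen for brevity; this is not BioJava's actual implementation, whose details are not shown in this series.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK illustration only; none of these names come from BioJava.
final class GridContactSketch {

    // Index of the cubic grid cell (edge length `size`) that contains a coordinate.
    static int cell(double coord, double size) {
        return (int) Math.floor(coord / size);
    }

    // Map key for a cell; a string keeps the sketch short.
    static String key(int x, int y, int z) {
        return x + "," + y + "," + z;
    }

    // Returns index pairs {i, j} with i < j whose points are within `cutoff` of each other.
    static List<int[]> contacts(double[][] pts, double cutoff) {
        // Step 1: hash every point into the cell that contains it.
        Map<String, List<Integer>> grid = new HashMap<>();
        for (int i = 0; i < pts.length; i++) {
            String k = key(cell(pts[i][0], cutoff), cell(pts[i][1], cutoff), cell(pts[i][2], cutoff));
            grid.computeIfAbsent(k, unused -> new ArrayList<>()).add(i);
        }
        // Step 2: compare each point only with its own cell and the 26 neighbouring cells.
        // With cell size == cutoff, no pair within the cutoff can be further apart than that.
        List<int[]> result = new ArrayList<>();
        double cutoffSq = cutoff * cutoff;
        for (int i = 0; i < pts.length; i++) {
            int cx = cell(pts[i][0], cutoff);
            int cy = cell(pts[i][1], cutoff);
            int cz = cell(pts[i][2], cutoff);
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dz = -1; dz <= 1; dz++) {
                        List<Integer> bucket = grid.get(key(cx + dx, cy + dy, cz + dz));
                        if (bucket == null) {
                            continue; // empty cell
                        }
                        for (int j : bucket) {
                            if (j <= i) {
                                continue; // report each pair once, skip self
                            }
                            double ex = pts[i][0] - pts[j][0];
                            double ey = pts[i][1] - pts[j][1];
                            double ez = pts[i][2] - pts[j][2];
                            if (ex * ex + ey * ey + ez * ez <= cutoffSq) {
                                result.add(new int[] {i, j});
                            }
                        }
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up coordinates: three points spaced 3.8 apart on a line, plus one far away.
        double[][] ca = { {0, 0, 0}, {3.8, 0, 0}, {7.6, 0, 0}, {30, 0, 0} };
        System.out.println("contacts within 8: " + contacts(ca, 8.0).size());
    }
}
```

For roughly uniform atom density the number of distance checks grows about linearly with the number of atoms, instead of quadratically as with a full distance matrix, which is the scaling benefit the tutorial text alludes to.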