diff --git a/structure/README.md b/structure/README.md
Lines changed: 8 additions & 6 deletions
diff --git a/structure/alignment.md b/structure/alignment.md
Lines changed: 14 additions & 5 deletions
diff --git a/structure/bioassembly.md b/structure/bioassembly.md
Lines changed: 225 additions & 0 deletions
diff --git a/structure/caching.md b/structure/caching.md
Lines changed: 10 additions & 10 deletions
@@ -1,7 +1,7 @@
 The Protein Structure Modules of BioJava
 =====================================================
 
-A tutorial for the protein structure modules of BioJava
+A tutorial for the protein structure modules of [BioJava](http://www.biojava.org)
 
 ## About
 <table>
@@ -43,13 +43,15 @@ Chapter 7 - [SEQRES and ATOM records](seqres.md), mapping to Uniprot (SIFTs)
 
 Chapter 8 - Protein [Structure Alignments](alignment.md)
 
-Chapter 9 - Biological Assemblies 
+Chapter 9 - [Biological Assemblies](bioassembly.md) 
 
-Chapter 10 - Protein Symmetry
+Chapter 10 - [External Databases](externaldb.md) like SCOP &amp; CATH
+
+Chapter 11 - Protein Symmetry
+
+Chapter 12 - Bonds
 
-Chapter 11 - Bonds
 
-Chapter 12 - [External Databases](externaldb.md) like SCOP &amp; CATH
 
 
 ### Author: 
@@ -67,6 +69,6 @@ doi: 10.1093/bioinformatics/bts494
 
 The content of this tutorial is available under the [CC-BY](http://creativecommons.org/licenses/by/3.0/) license.
 
-[view license](license.md)
+[view license](../license.md)
 
 
@@ -26,22 +26,31 @@ Before going the details how to use the algorithms programmatically, let's take
         AlignmentGui.getInstance();
 </pre>    
 
-shows this user interface:
+shows the following user interface. 
 
 ![Alignment GUI](img/alignment_gui.png)
 
+You can manually select protein chains, domains, or custom files to be aligned. Try to align 2hyn vs. 1zll. This will show the results in a graphical way, in 3D:
 
+![3D Alignment of PDB IDs 2hyn and 1zll](img/2hyn_1zll.png)
 
+and also in a 2D display that interacts with the 3D display:
 
-## Combinatorial Extension (CE)
+![2D Alignment of PDB IDs 2hyn and 1zll](img/alignmentpanel.png)
+
+The functionality to perform and visualize these alignments can of course also be used from your own code. Let's first have a look at the alignment algorithms:
+
+## The Alignment Algorithms
+
+### Combinatorial Extension (CE)
 
 The Combinatorial Extension (CE) algorithm was originally developed by [Shindyalov and Bourne in 1998](http://peds.oxfordjournals.org/content/11/9/739.short). 
 
-## Combinatorial Extension with Circular Permutation (CE-CP)
+### Combinatorial Extension with Circular Permutation (CE-CP)
 
-## FATCAT - rigid
+### FATCAT - rigid
 
-## FATCAR - flexible
+### FATCAT - flexible
 
 
 ## Acknowledgements
 
@@ -1,3 +1,228 @@
 Asymmetric Unit and Biological Assembly
 =======================================
 
+For many proteins, the asymmetric unit and the biological assembly are the same. However, there are quite a few proteins where they are not identical and, depending on what you are interested in, it might be important to work with the biological assembly instead of the asymmetric unit.
+
+## Asymmetric Unit
+
+The asymmetric unit is the smallest portion of a crystal structure to which symmetry operations can be applied in order to generate the complete unit cell (the crystal repeating unit). 
+
+A crystal asymmetric unit may contain:
+
+* one biological assembly
+* a portion of a biological assembly
+* multiple biological assemblies
+
+## Biological Assembly
+
+The biological assembly (also sometimes referred to as the biological unit) is the macromolecular assembly that has either been shown to be or is believed to be the functional form of the molecule. For example, the functional form of hemoglobin has four chains.
+
+The [StructureIO](http://www.biojava.org/docs/api/org/biojava3/structure/StructureIO.html) and [AtomCache](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/util/AtomCache.html) classes in BioJava provide access methods to work with either the asymmetric unit or the biological assembly.
+
+Let's load both representations of hemoglobin PDB ID [1HHO](http://www.rcsb.org/pdb/explore.do?structureId=1hho) and visualize them:
+
+```java
+    public static void main(String[] args){
+
+        try {
+            Structure asymUnit = StructureIO.getStructure("1hho");
+
+            showStructure(asymUnit);
+            
+            Structure bioAssembly = StructureIO.getBiologicalAssembly("1hho");
+            
+            showStructure(bioAssembly);
+            
+        } catch (Exception e){
+            e.printStackTrace();
+        }
+
+    }
+
+    public static void showStructure(Structure structure){
+
+        StructureAlignmentJmol jmolPanel = new StructureAlignmentJmol();
+
+        jmolPanel.setStructure(structure);
+
+        // send some commands to Jmol
+        jmolPanel.evalString("select * ; color chain;");            
+        jmolPanel.evalString("select *; spacefill off; wireframe off; cartoon on;  ");
+        jmolPanel.evalString("select ligands; cartoon off; wireframe 0.3; spacefill 0.5; color cpk;");
+
+    }
+```
+
+<table>
+    <tr>
+        <td>
+            The <b>asymmetric unit</b> of hemoglobin PDB ID <a href="http://www.rcsb.org/pdb/explore.do?structureId=1hho">1HHO</a>
+        </td>
+        <td>
+            The <b>biological assembly</b> of hemoglobin PDB ID <a href="http://www.rcsb.org/pdb/explore.do?structureId=1hho">1HHO</a>
+        </td>
+    </tr>
+    <tr>
+        <td>
+            <img src="img/1hho_asym.png"/>
+        </td>
+        <td>
+            <img src="img/1hho_biounit.png"/>
+        </td>
+    </tr>
+</table>
+
+As we can see, the two representations are quite different! When investigating protein interfaces or ligand binding, and for many other applications, you always want to work with the biological assembly.
+
+Here is another example, the bacteriophage GA protein capsid, PDB ID [1GAV](http://www.rcsb.org/pdb/explore.do?structureId=1gav):
+
+<table>
+    <tr>
+        <td>
+            The <b>asymmetric unit</b> of bacteriophage GA protein capsid PDB ID  <a href="http://www.rcsb.org/pdb/explore.do?structureId=1gav">1GAV</a>
+        </td>
+        <td>
+            The <b>biological assembly</b> of bacteriophage GA protein capsid PDB ID  <a href="http://www.rcsb.org/pdb/explore.do?structureId=1gav">1GAV</a>
+        </td>
+    </tr>
+    <tr>
+        <td>
+            <img src="img/1gav_asym.png"/>
+        </td>
+        <td>
+            <img src="img/1gav_biounit.png"/>
+        </td>
+    </tr>
+</table>
+
+## Re-creating Biological Assemblies
+
+Since biological assemblies can be accessed via the StructureIO interface, in principle there is no need to access the lower-level code in BioJava that re-creates biological assemblies. If you are interested in the gory details, here are a couple of pointers into the code. There are two ways to get to a biological assembly:
+
+A) The biological assembly is re-built: the atom coordinates of the asymmetric unit are transformed (rotated and translated) according to the instructions in the files. The information required to re-create the biological assemblies is available in both the PDB and mmCIF/PDBx files. 
+
+In PDB files the relevant transformations are stored in the *REMARK 350* records. For mmCIF/PDBx, the *_pdbx_struct_assembly* and *_pdbx_struct_oper_list* categories store the corresponding rules.
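
To make option A more concrete, here is a minimal sketch of what applying one such transformation to one atom position involves. It is not BioJava's implementation: the class `BiomtSketch`, the method `apply`, and the example operator are all hypothetical. The only assumption taken from the file format is that each operator is a 3x3 rotation plus a translation vector, laid out here as a 3x4 matrix in the same way as the BIOMT1 to BIOMT3 rows of *REMARK 350*.

```java
import java.util.Arrays;

public class BiomtSketch {

    /**
     * Applies one operator to one atom position.
     * op is a 3x4 matrix: columns 0-2 hold the rotation, column 3 the translation.
     */
    static double[] apply(double[][] op, double[] xyz) {
        double[] out = new double[3];
        for (int row = 0; row < 3; row++) {
            out[row] = op[row][0] * xyz[0]
                     + op[row][1] * xyz[1]
                     + op[row][2] * xyz[2]
                     + op[row][3];
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical operator: 180 degree rotation about the z axis,
        // followed by a shift of 10 along x.
        double[][] op = {
            {-1.0,  0.0, 0.0, 10.0},
            { 0.0, -1.0, 0.0,  0.0},
            { 0.0,  0.0, 1.0,  0.0}
        };
        double[] atom = {1.0, 2.0, 3.0};

        // prints [9.0, -2.0, 3.0]
        System.out.println(Arrays.toString(apply(op, atom)));
    }
}
```

Re-building an assembly amounts to repeating this for every atom of the selected chains and for every operator listed for the assembly, which is why large assemblies end up with many more atoms than the asymmetric unit.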
+
+B) There is also a pre-computed file available that contains an assembled version of a structure. This file can be parsed directly, without having to perform rotation operations on coordinates.
+
+BioJava contains utility classes to re-create biological assemblies for both PDB and mmCIF, as well as to parse the pre-computed file. The [BioUnitDataProvider](http://www.biojava.org/docs/api/org/biojava/bio/structure/quaternary/io/BioUnitDataProvider.html) interface defines what is required to re-build an assembly. The [BioUnitDataProviderFactory](http://www.biojava.org/docs/api/org/biojava/bio/structure/quaternary/io/BioUnitDataProviderFactory.html) lets you specify which of the BioUnitDataProviders is used.
+
+Take a look at the method getBiologicalAssembly() in [StructureIO](http://www.biojava.org/docs/api/org/biojava/bio/structure/io/StructureIO.html)  to see how the BioUnitDataProviders are used by the *BiologicalAssemblyBuilder*.
+
+## Memory consumption
+
+The example in the next section loads the structure of the PBCV-1 virus capsid (PDB ID [1M4X](http://www.rcsb.org/pdb/explore.do?structureId=1m4x)). It consists of 16 million atoms and has one of the largest, if not the largest, biological assemblies currently available in the PDB. Needless to say, it is important to increase the maximum heap size; otherwise there is no way to load it successfully. It requires a minimum of 9GB of RAM to load (measured on Java 1.7 on OSX). You can change the heap size by providing the following startup parameter (assuming you have 10G or more of RAM available on your system):
+<pre>
+    -Xmx10G 
+</pre>
+
+Note: when loading this structure with 9GB of memory, the Java VM spends a significant amount of time in garbage collection (GC). If you provide more RAM than the minimum requirement, then GC is triggered less often and the biological assembly loads faster.
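
Since a too-small heap only shows up late in a long load, it can help to check the configured limit up front. The following is a small sketch using only the standard `Runtime` API; the class `HeapCheck` and the helper `hasEnoughHeap` are hypothetical names, and the 9 GB figure is simply the minimum quoted in this section.

```java
public class HeapCheck {

    /** Returns true if the maximum heap is at least the required number of bytes. */
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // 9 GB, the minimum quoted above for loading 1M4X
        long required = 9L * 1024 * 1024 * 1024;

        // maxMemory() reflects the -Xmx setting; on some JVMs it can report
        // slightly less than the value passed on the command line.
        long max = Runtime.getRuntime().maxMemory();

        if (hasEnoughHeap(max, required)) {
            System.out.println("Heap limit looks sufficient: " + max / (1024 * 1024) + " MB");
        } else {
            System.err.println("Heap limit is only " + max / (1024 * 1024)
                    + " MB; consider restarting with e.g. -Xmx10G");
        }
    }
}
```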
+
+<table>
+    <tr>
+        <td>
+          <img src="img/1m4x_bio_r_250.jpg"/>
+        </td>       
+    </tr>
+    <tr>
+        <td>
+            The biological assembly of the PBCV-1 virus capsid. (image source: <a href="http://www.rcsb.org/pdb/explore.do?structureId=1m4x">RCSB</a>)
+        </td>
+    </tr>
+</table>
+
+## Low-level access to parsing pre-assembled biological assembly files
+
+To load the pre-assembled biological assembly file directly, one can tweak the low-level PDB file parser like this:
+
+```java
+
+    public static void main(String[] args){
+
+        // This loads the PBCV-1 virus capsid, one of the biggest biological assemblies, if not the biggest, in terms of number of atoms.
+        // The 1m4x.pdb1.gz file is 313 MB (compressed).
+        // This Structure requires a minimum of 9 GB of memory to load.
+
+        String pdbId = "1M4X";
+
+        Structure bigStructure = readStructure(pdbId,1);
+        
+        // let's take a look how much memory this consumes currently
+
+        Runtime r = Runtime.getRuntime();
+
+        // let's try to trigger the Java Garbage collector
+        r.gc();
+
+        System.out.println("Memory consumption after " + pdbId + 
+                " structure has been loaded into memory:");
+        
+        String mem = String.format("Total %dMB, Used %dMB, Free %dMB, Max %dMB", 
+                r.totalMemory() / 1048576,
+                (r.totalMemory() - r.freeMemory()) / 1048576, 
+                r.freeMemory() / 1048576,
+                r.maxMemory() / 1048576);
+
+        System.out.println(mem);
+                
+        System.out.println("# atoms: " + StructureTools.getNrAtoms(bigStructure));
+        
+    }
+    /** Load a specific biological assembly for a PDB entry
+     *  
+     * @param pdbId .. the PDB ID
+     * @param bioAssemblyId .. the first assembly has the bioAssemblyId 1
+     * @return a Structure object or null if something went wrong.
+     */
+    public static Structure  readStructure(String pdbId, int bioAssemblyId) {
+
+        // pre-computed files use lower case PDB IDs
+        pdbId = pdbId.toLowerCase();
+
+        // we need to tweak the FileParsing parameters a bit
+        FileParsingParameters p = new FileParsingParameters();
+
+        // some bio assemblies are large; we want an all-atom representation and avoid
+        // switching to a Calpha-only representation for large molecules
+        // note, this requires several GB of memory for some of the largest assemblies, such as 1M4X
+        p.setAtomCaThreshold(Integer.MAX_VALUE);
+
+        // parse remark 350 
+        p.setParseBioAssembly(true);
+
+        // The low level PDB file parser
+        PDBFileReader pdbreader = new PDBFileReader();
+
+        // we just need this to track where to store PDB files
+        // this checks the PDB_DIR property (and uses a tmp location if not set) 
+        AtomCache cache = new AtomCache();
+        pdbreader.setPath(cache.getPath());
+
+        pdbreader.setFileParsingParameters(p);
+
+        // download missing files
+        pdbreader.setAutoFetch(true);
+
+        pdbreader.setBioAssemblyId(bioAssemblyId);
+        pdbreader.setBioAssemblyFallback(false);
+
+        Structure structure = null;
+        try { 
+            structure = pdbreader.getStructureById(pdbId);
+            if ( bioAssemblyId > 0 )
+                structure.setBiologicalAssembly(true);
+            structure.setPDBCode(pdbId);
+        } catch (Exception e){
+            e.printStackTrace();
+            return null;
+        }
+        return structure;
+    }
+ ```
+
+
+## Further Reading
+
+The RCSB PDB web site has a great [tutorial on Biological Assemblies](http://www.rcsb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/bioassembly_tutorial.html).
@@ -8,16 +8,16 @@ The main class that provides this functionality is the [AtomCache](http://www.bi
 
 It is hidden inside the StructureIO class, that we already encountered earlier.
 
-<pre>
+```java
 	Structure structure = StructureIO.getStructure("4hhb");			
-</pre>
+```
 
 is the same as
 
-<pre>
+```java
 	AtomCache cache = new AtomCache();
 	cache.getStructure("4hhb");
-</pre>
+```
 
 
 ## Where are the files getting written to?
@@ -33,11 +33,11 @@ you can configure the AtomCache by setting the PDB_DIR system property
 
 An alternative is to hard-code the path in this way (but setting it as a property is better style)
 
-<pre>
+```java
 	AtomCache cache = new AtomCache();
 
 	cache.setPath("/path/to/pdb/files/");
-</pre>
+```
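
The lookup order described here (use the PDB_DIR system property if it is set, otherwise fall back to a temporary location) can be sketched with plain JDK calls. This is an illustration of the idea only, not the code inside AtomCache; the class `PdbDirSketch` and the method `resolvePdbDir` are hypothetical.

```java
import java.io.File;

public class PdbDirSketch {

    /** Resolves the directory for PDB files: PDB_DIR if set, else the JVM temp directory. */
    static String resolvePdbDir() {
        String dir = System.getProperty("PDB_DIR");
        if (dir == null || dir.isEmpty()) {
            // fall back to the platform's temporary directory
            dir = System.getProperty("java.io.tmpdir");
        }
        // normalize so that callers can append file names directly
        if (!dir.endsWith(File.separator)) {
            dir = dir + File.separator;
        }
        return dir;
    }

    public static void main(String[] args) {
        // Equivalent to starting the JVM with -DPDB_DIR=/path/to/pdb/files
        System.setProperty("PDB_DIR", "/path/to/pdb/files");
        System.out.println(resolvePdbDir());
    }
}
```

Setting the property on the command line keeps the location out of the source code, which is why it is the preferred style over a hard-coded path.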
 
 ## File Parsing Parameters
 
@@ -47,7 +47,7 @@ class is the main place to influence the level of detail and as a consequence th
 
 This example turns on the use of chemical components when loading a structure. (See also the [next chapter](chemcomp.md))
 
-<pre>
+```java
 	AtomCache cache = new AtomCache();
 
 	cache.setPath("/tmp/");
@@ -60,20 +60,20 @@ This example turns on the use of chemical components when loading a structure. (
 
 	Structure structure = StructureIO.getStructure("4hhb");			
 
-</pre>
+```
 
 ## Caching of other SCOP, CATH
 
 The AtomCache not only provides access to PDB, it can also fetch Structure representations of protein domains, as defined by SCOP and CATH.
 
-<pre>
+```java
 	// uses a SCOP domain definition
 	Structure domain1 = StructureIO.getStructure("d4hhba_");
 
 	// Get a specific protein chain, note: chain IDs are case sensitive, PDB IDs are not.
 	Structure chain1 = StructureIO.getStructure("4HHB.A");
 
-</pre>
+```
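
The note on case sensitivity can be illustrated with a small sketch that splits a name of the form `PDBID.CHAIN`. It is a hypothetical helper (`StructureNameSketch.split`), not how BioJava parses identifiers, and it deliberately handles only plain PDB IDs and the `PDBID.CHAIN` form, not SCOP or CATH domain names.

```java
import java.util.Locale;

public class StructureNameSketch {

    /**
     * Splits "4HHB.A" into {"4hhb", "A"}.
     * The PDB ID is lower-cased because it is case insensitive;
     * the chain ID is kept exactly as given because it is case sensitive.
     * The chain ID is null when the name has no ".chain" suffix.
     */
    static String[] split(String name) {
        int dot = name.indexOf('.');
        String pdbId = (dot < 0 ? name : name.substring(0, dot)).toLowerCase(Locale.ROOT);
        String chainId = dot < 0 ? null : name.substring(dot + 1);
        return new String[] { pdbId, chainId };
    }

    public static void main(String[] args) {
        String[] upper = split("4HHB.A");
        String[] lower = split("4hhb.a");

        // Same PDB entry, but "A" and "a" are different chain IDs
        System.out.println(upper[0] + " chain " + upper[1]);
        System.out.println(lower[0] + " chain " + lower[1]);
    }
}
```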
 
 There are quite a number of external database IDs that are supported here. See the 
 <a href="http://www.biojava.org/docs/api/org/biojava/bio/structure/align/util/AtomCache.html#getStructure(java.lang.String)">AtomCache documentation</a> for more details on the supported options.