| 1 | +Structure Alignment Data Model |
| 2 | +=== |
| 3 | + |
| 4 | +## AFPChain Data Model |
| 5 | + |
| 6 | +The `AFPChain` data structure was designed to store pairwise structural |
| 7 | +alignments. The class functions as a bean, and contains many variables |
| 8 | +used internally by the alignment algorithms implemented in BioJava. |
| 9 | + |
| 10 | +Some of the important stored variables are: |
| 11 | +* Algorithm Name |
| 12 | +* Optimal Alignment: described later. |
| 13 | +* Optimal RMSD: final and total RMSD value of the alignment. |
| 14 | +* TM-score |
| 15 | +* BlockRotationMatrix: rotation component of the superposition transformation. |
| 16 | +* BlockShiftVector: translation component of the superposition transformation. |
| 17 | + |
| 18 | +BioJava class: [org.biojava.nbio.structure.align.model.AFPChain](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/model/AFPChain.html) |
| 19 | + |
| 20 | +### The Optimal Alignment |
| 21 | + |
| 22 | +The residue equivalencies of the alignment (EQRs) are described in the optimal |
| 23 | +alignment variable, a three-dimensional array of integers, whose indices stand for: |
| 24 | + |
| 25 | +```java |
| 26 | + int[][][] optAln = afpChain.getOptAln(); |
| 27 | + int residue = optAln[block][chain][eqr]; |
| 28 | +``` |
| 29 | + |
| 30 | +* **block**: the blocks divide the alignment into different parts. The |
| 31 | +division can be due to non-topological rearrangements (e.g. circular |
| 32 | +permutations) or due to flexible parts (e.g. domain switch). There can |
| 33 | +be any number of blocks in a structural alignment, defined by the structure |
| 34 | +alignment algorithm. |
| 35 | +* **chain**: in a pairwise alignment there are only two chains, or structures. |
| 36 | +* **eqr**: EQR stands for equivalent residue position, i.e. the alignment |
| 37 | +position. There are as many positions (EQRs) in a block as the length of |
| 38 | +the alignment block, and their number is equal for any of the two chains in |
| 39 | +the same block. |
| 40 | + |
| 41 | +In each entry (combination of the three indices described above) an integer |
| 42 | +is stored, which corresponds to the residue index in the specified chain, i.e. |
| 43 | +the index in the Atom array of the chain. Within a block, the stored |
| 44 | +integers (residues) are always in increasing order. |
| 45 | + |
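The indexing scheme above can be illustrated without BioJava at all. The following is a minimal sketch that walks a hand-built array laid out like `optAln`; the class `OptAlnWalk` and its method are hypothetical helpers, not part of the library, and the sketch assumes every inner array is exactly as long as its block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: walks an optAln-style array.
final class OptAlnWalk {

    // Returns one "i-j" string per aligned position (EQR), block by block.
    static List<String> alignedPairs(int[][][] optAln) {
        List<String> pairs = new ArrayList<>();
        for (int block = 0; block < optAln.length; block++) {
            int[] first = optAln[block][0];  // residue indices in chain 0
            int[] second = optAln[block][1]; // residue indices in chain 1
            // Assumption: both arrays hold exactly the EQRs of this block.
            for (int eqr = 0; eqr < first.length; eqr++) {
                pairs.add(first[eqr] + "-" + second[eqr]);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two blocks; the second one restarts in chain 1, as a circular
        // permutation would. Indices increase inside each block.
        int[][][] optAln = {
            { {0, 1, 2}, {5, 6, 7} },
            { {10, 11}, {0, 1} }
        };
        System.out.println(alignedPairs(optAln)); // prints [0-5, 1-6, 2-7, 10-0, 11-1]
    }
}
```

Note that the residue indices only increase inside a block; across blocks the second chain jumps back to index 0, which is exactly the situation that a single sequential block could not describe.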
| 46 | +### Examples |
| 47 | + |
| 48 | +Some examples of how to get the basic properties of an `AFPChain`: |
| 49 | + |
| 50 | +```java |
| 51 | + afpChain.getAlgorithmName(); //Name of the algorithm that generated the alignment |
| 52 | + afpChain.getBlockNum(); //Number of blocks |
| 53 | + afpChain.getTMScore(); //TM-score |
| 54 | + afpChain.getTotalRmsdOpt(); //Optimal RMSD |
| 55 | + afpChain.getBlockRotationMatrix()[0]; //Rotation matrix of the first block |
| 56 | + afpChain.getBlockShiftVector()[0]; //Translation vector of the first block |
| 57 | +``` |
| 58 | + |
| 59 | +### Overview |
| 60 | + |
| 61 | +As an overview, the `AFPChain` data model: |
| 62 | + |
| 63 | +* Only supports **pairwise alignments**, i.e. two chains or structures aligned. |
| 64 | +* Can support **flexible alignments** and **non-topological alignments**. |
| 65 | +However, their combination (a flexible alignment with topological rearrangements) |
| 66 | +cannot be represented, because the blocks mean either one or the other. |
| 67 | +* Cannot support **non-sequential alignments**, or they would require a new block |
| 68 | +for each EQR, because sequentiality of the residues is assumed inside each block. |
| 69 | + |
| 70 | +## MultipleAlignment Data Model |
| 71 | + |
| 72 | +Since BioJava 4.1.0, a new data model is available to store structure alignments. |
| 73 | +The `MultipleAlignment` data structure is a general model that supports any of the |
| 74 | +following properties, and any combination: |
| 75 | + |
| 76 | +* **Multiple structures**: the model is no longer restricted to pairwise alignments. |
| 77 | +* **Non-topological alignments**: such as circular permutations or domain rearrangements. |
| 78 | +* **Flexible alignments**: parts of the alignment with different superposition |
| 79 | +transformation. |
| 80 | + |
| 81 | +In addition, the data structure is not limited in the number and types of scores |
| 82 | +it can store, because the scores are stored in a key:value fashion, as |
| 83 | +described later. |
| 84 | + |
| 85 | +BioJava class: [org.biojava.nbio.structure.align.multiple.MultipleAlignment](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/MultipleAlignment.html) |
| 86 | + |
| 87 | +### Object Hierarchy |
| 88 | + |
| 89 | +The biggest difference with `AFPChain` is that the `MultipleAlignment` data |
| 90 | +structure is object-oriented. |
| 91 | +The hierarchy of sub-objects is represented below: |
| 92 | + |
| 93 | +<pre> |
| 94 | +MultipleAlignmentEnsemble |
| 95 | + | |
| 96 | + MultipleAlignment(s) |
| 97 | + | |
| 98 | + BlockSet(s) |
| 99 | + | |
| 100 | + Block(s) |
| 101 | +</pre> |
| 102 | + |
| 103 | +* **MultipleAlignmentEnsemble**: the ensemble is the top level of the hierarchy. |
| 104 | +As a top level, it stores information regarding creation properties (algorithm, |
| 105 | +version, creation time, etc.), the structures involved in the alignment (Atoms, |
| 106 | +structure identifiers, etc.) and cached variables (atomic distance matrices). |
| 107 | +It contains a collection of `MultipleAlignment` that share the same properties |
| 108 | +stored in the ensemble. This construction allows the storage of alternative |
| 109 | +alignments inside the same data structure. |
| 110 | + |
| 111 | +* **MultipleAlignment**: the `MultipleAlignment` stores the core information of a |
| 112 | +multiple structure alignment. It is designed to be the return type of the multiple |
| 113 | +structure alignment algorithms. The object contains a collection of `BlockSet` and |
| 114 | +it is linked to its parent `MultipleAlignmentEnsemble`. |
| 115 | + |
| 116 | +* **BlockSet**: the `BlockSet` stores a flexible part of a multiple structure |
| 117 | +alignment. A flexible part needs the residue equivalencies involved, contained in |
| 118 | +a collection of `Block`, and a transformation matrix for every structure that |
| 119 | +describes the 3D superposition of all structures. It is linked to its parent |
| 120 | +`MultipleAlignment`. |
| 121 | + |
| 122 | +* **Block**: the `Block` stores the aligned positions (equivalent residues) of a |
| 123 | +`BlockSet` that are in sequentially increasing order. Each `Block` represents a |
| 124 | +sequential part of a non-topological alignment, if more than one `Block` is present. |
| 125 | +It is linked to its parent `BlockSet`. |
| 126 | + |
| 127 | +### The Optimal Alignment |
| 128 | + |
| 129 | +In the `MultipleAlignment` data structure the aligned residues are stored in a |
| 130 | +double List for every `Block`. The indices of the double List are the following: |
| 131 | + |
| 132 | +```java |
| 133 | + List<List<Integer>> optAln = block.getAlignRes(); |
| 134 | + Integer residue = optAln.get(chain).get(eqr); |
| 135 | +``` |
| 136 | + |
| 137 | +The indices mean the same as in the optimal alignment of the `AFPChain`. As a |
| 138 | +reminder: |
| 139 | + |
| 140 | +* **chain**: chain or structure index. |
| 141 | +* **eqr**: EQR stands for equivalent residue position, i.e. the alignment |
| 142 | +position. There are as many positions (EQRs) in a block as the length of |
| 143 | +the alignment block, and their number is equal for any of the chains in |
| 144 | +the same block. |
| 145 | + |
| 146 | +As in `AFPChain`, each entry (combination of the two indices described above) |
| 147 | +is an Integer that corresponds to the residue index in the specified chain, i.e. |
| 148 | +the index in the Atom array of the chain. Take care when reading these entries, |
| 149 | +because a `MultipleAlignment` can contain gaps, which are represented as `null` |
| 150 | +in the List entries. |
| 151 | + |
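Because gaps are stored as `null`, unboxing an entry directly into an `int` can throw a `NullPointerException`. The following is a minimal sketch of a gap-aware traversal over a hand-built double List with the same layout; the class `GapAwareColumns` and its method are hypothetical helpers, not part of BioJava, and the sketch assumes all chains have the same number of positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BioJava.
final class GapAwareColumns {

    // Returns the alignment positions (EQRs) where no chain has a gap (null).
    static List<Integer> ungappedColumns(List<List<Integer>> alnRes) {
        List<Integer> columns = new ArrayList<>();
        if (alnRes.isEmpty()) {
            return columns;
        }
        int length = alnRes.get(0).size(); // assumption: same length for every chain
        for (int eqr = 0; eqr < length; eqr++) {
            boolean complete = true;
            for (List<Integer> chain : alnRes) {
                if (chain.get(eqr) == null) { // null marks a gap in this chain
                    complete = false;
                    break;
                }
            }
            if (complete) {
                columns.add(eqr);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Three chains, four alignment positions, two gaps.
        List<List<Integer>> alnRes = Arrays.asList(
            Arrays.asList(3, 4, 5, 6),
            Arrays.asList(10, null, 12, 13),
            Arrays.asList(null, 8, 9, 10));
        System.out.println(ungappedColumns(alnRes)); // prints [2, 3]
    }
}
```

Comparing against `null` before using the value keeps the entries as `Integer` objects and avoids implicit unboxing.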
| 152 | +### Alignment Scores |
| 153 | + |
| 154 | +All the objects in the hierarchy levels implement the `ScoresCache` interface. |
| 155 | +This interface allows the storage of any number of scores as a key:value set. |
| 156 | +The key is a `String` that describes the score and is used to retrieve it later, |
| 157 | +and the value is a double with the calculated score. The interface has only |
| 158 | +two methods: `putScore` and `getScore`. |
| 159 | + |
| 160 | +The following lines of code are an example of how to manipulate scores |
| 161 | +on a `MultipleAlignment`: |
| 162 | + |
| 163 | +```java |
| 164 | + //Put a score into the alignment and get it back |
| 165 | + alignment.putScore("myRMSD", 1.234); |
| 166 | + double myRMSD = alignment.getScore("myRMSD"); |
| 167 | + |
| 168 | + BlockSet bs = alignment.getBlockSets().get(0); |
| 169 | + //The same can be done for BlockSets |
| 170 | + bs.putScore("bsRMSD", 1.234); |
| 171 | + double bsRMSD = bs.getScore("bsRMSD"); |
| 172 | +``` |
| 173 | + |
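To make the key:value idea concrete, here is a toy stand-in backed by a `HashMap`. It is only a sketch: the class `ToyScores` is hypothetical, it is not the BioJava `ScoresCache` interface, and how the library stores scores internally or reports a missing key is not stated here. The sketch simply returns `null` for an unknown score name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the key:value idea; NOT the BioJava ScoresCache interface.
final class ToyScores {

    private final Map<String, Double> scores = new HashMap<>();

    // Stores a score under a descriptive name, replacing any previous value.
    void putScore(String name, double value) {
        scores.put(name, value);
    }

    // Returns the stored score, or null if nothing was stored under that name.
    Double getScore(String name) {
        return scores.get(name);
    }

    public static void main(String[] args) {
        ToyScores alignment = new ToyScores();
        alignment.putScore("myRMSD", 1.234);
        System.out.println(alignment.getScore("myRMSD"));  // prints 1.234
        System.out.println(alignment.getScore("unknown")); // prints null
    }
}
```

Since scores are looked up by name, any level of the hierarchy can carry scores that were not anticipated when the data model was written, as long as writer and reader agree on the key.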
| 174 | +### Manipulating Multiple Alignments |
| 175 | + |
| 176 | +Some classes are designed to contain utility methods for manipulating a `MultipleAlignment` object. |
| 177 | +The most important ones are enumerated and briefly described below: |
| 178 | + |
| 179 | +* [MultipleAlignmentScorer](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentScorer.html): contains frequent names for scores and methods to calculate them. |
| 180 | + |
| 181 | +* [MultipleAlignmentTools](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentTools.html): contains helper methods, such as calculating the sequence alignment, transforming the atom arrays of the structures, or calculating aligned residue distances between all structures. |
| 182 | + |
| 183 | +* [MultipleAlignmentWriter](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentWriter.html): contains methods to generate different types of String outputs of the alignment, e.g. FASTA, XML, FatCat. |
| 184 | + |
| 185 | +* [MultipleSuperimposer](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleSuperimposer.html): interface for implementations that calculate the structure superpositions of the alignment. Some examples of implementations are the ReferenceSuperimposer (superimposes all the structures to a reference) and the CoreSuperimposer (only uses EQRs present in all structures, without gaps, to superimpose them). |
| 186 | + |
| 187 | +* [MultipleAlignmentXMLParser](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/xml/MultipleAlignmentXMLParser.html): contains a method to create a `MultipleAlignment` object from an XML file representation. |
| 188 | + |
| 189 | +### Overview |
| 190 | + |
| 191 | +As an overview, the `MultipleAlignment` data model: |
| 192 | + |
| 193 | +* Supports any number of aligned structures, **multiple structures**. |
| 194 | +* Can support **flexible alignments** and **non-topological alignments**, |
| 195 | +and any of their combinations (e.g. a flexible alignment with topological |
| 196 | +rearrangements). |
| 197 | +* Cannot support **non-sequential alignments**, or they would require a new |
| 198 | +`Block` for each EQR, because sequentiality of the residues is a requirement |
| 199 | +for each `Block`. |
| 200 | +* Can store **any score** in any of the four object hierarchy levels, making it |
| 201 | +easy to adapt to new requirements and algorithms. |
| 202 | + |
| 203 | +For more examples and information about the `MultipleAlignment` data structure, |
| 204 | +see the Demo package in the biojava-structure module or look through the interface |
| 205 | +files, where the javadoc explanations can be found. |
| 206 | + |
| 207 | +## Conversion between Data Models |
| 208 | + |
| 209 | +The conversion from an `AFPChain` to a `MultipleAlignment` is possible through the |
| 210 | +ensemble constructor. An example of how to do it programmatically is below: |
| 211 | + |
| 212 | +```java |
| 213 | + AFPChain afpChain; //The pairwise alignment to convert |
| 214 | + Atom[] chain1; //Atom array of the first structure |
| 215 | + Atom[] chain2; //Atom array of the second structure |
| 216 | + boolean flexible = false; |
| 217 | + MultipleAlignmentEnsemble ensemble = new MultipleAlignmentEnsembleImpl(afpChain, chain1, chain2, flexible); |
| 218 | + MultipleAlignment converted = ensemble.getMultipleAlignments().get(0); |
| 219 | +``` |
| 220 | + |
| 221 | +There is no method to convert from a `MultipleAlignment` to an `AFPChain`, because |
| 222 | +the first representation supports any number of structures, while the second |
| 223 | +supports only pairwise alignments. However, the conversion can be done with a few |
| 224 | +lines of code if needed: instantiate a new `AFPChain` and copy, one by one, the |
| 225 | +properties of the `MultipleAlignment` that it can represent. |
| 226 | + |
| 227 | +=== |
| 228 | + |
| 229 | +Go back to [Chapter 8 : Structure Alignments](alignment.md). |