|
| 1 | +Structure Alignment Data Model |
| 2 | +=== |
| 3 | + |
| 4 | +## AFPChain Data Model |
| 5 | + |
| 6 | +The `AFPChain` data structure was designed to store pairwise structural |
| 7 | +alignments. The class functions as a bean, and contains many variables |
| 8 | +used internally by the alignment algorithms implemented in BioJava. |
| 9 | + |
| 10 | +### The Optimal Alignment |
| 11 | + |
| 12 | +The residue equivalencies of the alignment (EQRs) are described in the optimal |
| 13 | +alignment variable, a triple array of integers, where the indices stand for: |
| 14 | + |
| 15 | +```java |
| 16 | + int[][][] optAln = afpChain.getOptAln(); |
| 17 | + int residue = optAln[block][chain][eqr]; |
| 18 | +``` |
| 19 | + |
| 20 | +* **block**: the blocks divide the alignment into different parts. The |
| 21 | +division can be due to non-topological rearrangements (e.g. circular |
| 22 | +permutations) or due to flexible parts (e.g. domain switch). There can |
| 23 | +be any number of blocks in a structural alignment, defined by the structure |
| 24 | +alignment algorithm. |
| 25 | + |
| 26 | +* **chain**: in a pairwise alignment there are only two chains, or structures. |
| 27 | + |
| 28 | +* **eqr**: EQR stands for equivalent residue position, i.e. the alignment |
| 29 | +position. There are as many positions (EQRs) in a block as the length of |
| 30 | +the alignment block, and their number is equal for any of the two chains in |
| 31 | +the same block. |
| 32 | + |
| 33 | +In each entry (combination of the three indices described above) an integer |
| 34 | +is stored, which corresponds to the residue index in the specified chain, i.e. |
| 35 | +the index in the Atom array of the chain. Within the same block, the stored |
| 36 | +integers (residues) are always in increasing order. |
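
The indexing scheme described above can be sketched without any BioJava dependency. The class name `OptAlnSketch`, its helper methods and the residue indices are made up for illustration; in real code the array would come from `afpChain.getOptAln()`.

```java
class OptAlnSketch {

    /** Hypothetical alignment: two blocks, as in a circular permutation. */
    static int[][][] exampleOptAln() {
        return new int[][][] {
            // block 0: chain 0 residues, chain 1 residues
            { {0, 1, 2, 3}, {50, 51, 52, 53} },
            // block 1: chain 0 residues, chain 1 residues
            { {60, 61, 62}, {5, 6, 7} }
        };
    }

    /** Counts the aligned positions (EQRs) over all blocks. */
    static int countEqrs(int[][][] optAln) {
        int total = 0;
        for (int[][] block : optAln) {
            // Both chains hold the same number of EQRs within a block,
            // so the length of chain 0 is enough.
            total += block[0].length;
        }
        return total;
    }

    /** Checks that residue indices strictly increase inside every block. */
    static boolean isSequentialWithinBlocks(int[][][] optAln) {
        for (int[][] block : optAln) {
            for (int[] chain : block) {
                for (int eqr = 1; eqr < chain.length; eqr++) {
                    if (chain[eqr] <= chain[eqr - 1]) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```

In this made-up example the residue indices of the second chain decrease from one block to the next (50..53, then 5..7), while inside each block they stay in increasing order, which is the property `isSequentialWithinBlocks` checks.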
| 37 | + |
| 38 | +### Example |
| 39 | + |
| 40 | + |
| 41 | + |
| 42 | +### Overview |
| 43 | + |
| 44 | +As an overview, the `AFPChain` data model: |
| 45 | + |
| 46 | +* Only supports **pairwise alignments**, i.e. two chains or structures aligned. |
| 47 | +* Can support **flexible alignments** and **non-topological alignments**. |
| 48 | +However, their combination (a flexible alignment with topological rearrangements) |
| 49 | +cannot be represented, because the blocks mean either one or the other. |
| 50 | +* Cannot support **non-sequential alignments** in practice: they would require a new block |
| 51 | +for each EQR, because sequentiality of the residues is assumed inside each block. |
| 52 | + |
| 53 | +## MultipleAlignment Data Model |
| 54 | + |
| 55 | +Since BioJava 4.1.0, a new data model is available to store structure alignments. |
| 56 | +The `MultipleAlignment` data structure is a general model that supports any of the |
| 57 | +following properties, and any combination: |
| 58 | + |
| 59 | +* **Multiple structures**: the model is no longer restricted to pairwise alignments. |
| 60 | +* **Non-topological alignments**: such as circular permutations or domain rearrangements. |
| 61 | +* **Flexible alignments**: parts of the alignment with different superposition |
| 62 | +transformation. |
| 63 | + |
| 64 | +In addition, the data structure is not limited in the number and types of scores |
| 65 | +it can store, because the scores are stored in a key:value fashion, as |
| 66 | +described later. |
| 67 | + |
| 68 | +### Object Hierarchy |
| 69 | + |
| 70 | +The biggest difference with `AFPChain` is that the `MultipleAlignment` data |
| 71 | +structure is object oriented. |
| 72 | +The hierarchy of sub-objects is represented below: |
| 73 | + |
| 74 | +<pre> |
| 75 | +MultipleAlignmentEnsemble |
| 76 | + | |
| 77 | + MultipleAlignment(s) |
| 78 | + | |
| 79 | + BlockSet(s) |
| 80 | + | |
| 81 | + Block(s) |
| 82 | +</pre> |
| 83 | + |
| 84 | +* **MultipleAlignmentEnsemble**: the ensemble is the top level of the hierarchy. |
| 85 | +As a top level, it stores information regarding creation properties (algorithm, |
| 86 | +version, creation time, etc.) and the structures involved in the alignment (Atoms, |
| 87 | +structure identifiers, etc.). It contains a collection of `MultipleAlignment` that |
| 88 | +share the same properties stored in the ensemble. This construction allows the |
| 89 | +storage of alternative alignments inside the same data structure. |
| 90 | + |
| 91 | +* **MultipleAlignment**: the `MultipleAlignment` stores the core information of a |
| 92 | +multiple structure alignment. It is designed to be the return type of the multiple |
| 93 | +structure alignment algorithms. The object contains a collection of `BlockSet` and |
| 94 | +it is linked to its parent `MultipleAlignmentEnsemble`. |
| 95 | + |
| 96 | +* **BlockSet**: the `BlockSet` stores a flexible part of a multiple structure |
| 97 | +alignment. A flexible part needs the residue equivalencies involved, contained in |
| 98 | +a collection of `Block`, and a transformation matrix for every structure that |
| 99 | +describes the 3D superposition of all structures. It is linked to its parent |
| 100 | +`MultipleAlignment`. |
| 101 | + |
| 102 | +* **Block**: the `Block` stores the aligned positions (equivalent residues) of a |
| 103 | +`BlockSet` that are in sequentially increasing order. Each `Block` represents a |
| 104 | +sequential part of a non-topological alignment, if more than one `Block` is present. |
| 105 | +It is linked to its parent `BlockSet`. |
| 106 | + |
| 107 | +### The Optimal Alignment |
| 108 | + |
| 109 | +In the `MultipleAlignment` data structure the aligned residues are stored in a |
| 110 | +double List for every `Block`. The indices of the double List are the following: |
| 111 | + |
| 112 | +```java |
| 113 | + List<List<Integer>> optAln = block.getAlnRes(); |
| 114 | + Integer residue = optAln.get(chain).get(eqr); |
| 115 | +``` |
| 116 | + |
| 117 | +The indices have the same meaning as in the optimal alignment of the `AFPChain`. |
| 118 | +As a reminder: |
| 119 | + |
| 120 | +* **chain**: chain or structure index. |
| 121 | + |
| 122 | +* **eqr**: EQR stands for equivalent residue position, i.e. the alignment |
| 123 | +position. There are as many positions (EQRs) in a block as the length of |
| 124 | +the alignment block, and their number is equal for any of the chains in |
| 125 | +the same block. |
| 126 | + |
| 127 | +As in `AFPChain`, each entry (combination of the two indices described above) |
| 128 | +is an Integer that corresponds to the residue index in the specified chain, i.e. |
| 129 | +the index in the Atom array of the chain. Code that reads these entries must |
| 130 | +check for `null`, because a `MultipleAlignment` can contain gaps, which are |
| 131 | +represented as `null` in the List entries. |
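
A minimal sketch of null-safe access to such a nested List, written without any BioJava dependency. The class name `AlnResSketch`, its helper methods and the residue indices are made up for illustration; in real code the List would come from `block.getAlnRes()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AlnResSketch {

    /** Hypothetical block with three chains; null marks a gap. */
    static List<List<Integer>> exampleBlock() {
        return Arrays.asList(
            Arrays.asList(0, 1, 2, 3),
            Arrays.asList(10, null, 11, 12),
            Arrays.asList(null, 20, 21, 22));
    }

    /** Number of non-gap residues in one chain of a block. */
    static int countAligned(List<Integer> chain) {
        int aligned = 0;
        for (Integer residue : chain) {
            // Compare against null before any unboxing to avoid a
            // NullPointerException on gap entries.
            if (residue != null) {
                aligned++;
            }
        }
        return aligned;
    }

    /** Alignment positions (EQR indices) where no chain has a gap. */
    static List<Integer> gaplessColumns(List<List<Integer>> alnRes) {
        List<Integer> columns = new ArrayList<>();
        if (alnRes.isEmpty()) {
            return columns;
        }
        int length = alnRes.get(0).size();
        for (int eqr = 0; eqr < length; eqr++) {
            boolean complete = true;
            for (List<Integer> chain : alnRes) {
                if (chain.get(eqr) == null) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                columns.add(eqr);
            }
        }
        return columns;
    }
}
```

Keeping the entries as `Integer` rather than assigning them to `int` is deliberate: auto-unboxing a gap entry would throw at runtime.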
| 132 | + |
| 133 | +### Alignment Scores |
| 134 | + |
| 135 | + |
| 136 | + |
| 137 | +### Example |
| 138 | + |
| 139 | +### Overview |
| 140 | + |
| 141 | +## Conversion between Data Models |
| 142 | + |
| 143 | +The conversion from an `AFPChain` to a `MultipleAlignment` is possible through the |
| 144 | +ensemble constructor. An example of how to do it programmatically is below: |
| 145 | + |
| 146 | + |
| 147 | +There is no method to convert from a `MultipleAlignment` to an `AFPChain`, because |
| 148 | +the former supports any number of structures, while the latter supports only |
| 149 | +pairwise alignments. However, the conversion can be done with a few lines of code |
| 150 | +if needed: instantiate a new `AFPChain` and copy, one by one, the properties of the |
| 151 | +`MultipleAlignment` that it can represent. |