Commit f095f28

Explain MultipleAlignment object hierarchy
1 parent 30a0202 commit f095f28

2 files changed

Lines changed: 151 additions & 73 deletions

structure/alignment-data-model.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
1+
Structure Alignment Data Model
===
3+
4+
## AFPChain Data Model
5+
6+
The `AFPChain` data structure was designed to store pairwise structural
7+
alignments. The class functions as a bean, and contains many variables
8+
used internally by the alignment algorithms implemented in BioJava.
9+
10+
### The Optimal Alignment
11+
12+
The residue equivalencies of the alignment (EQRs) are described in the optimal
13+
alignment variable, a three-dimensional array of integers, where the indices stand for:
14+
15+
```java
int[][][] optAln = afpChain.getOptAln();
int residue = optAln[block][chain][eqr];
```
19+
20+
* **block**: the blocks divide the alignment into different parts. The
21+
division can be due to non-topological rearrangements (e.g. circular
22+
permutations) or due to flexible parts (e.g. domain switch). There can
23+
be any number of blocks in a structural alignment, defined by the structure
24+
alignment algorithm.
25+
26+
* **chain**: in a pairwise alignment there are only two chains, or structures.
27+
28+
* **eqr**: EQR stands for equivalent residue position, i.e. the alignment
29+
position. There are as many positions (EQRs) in a block as the length of
30+
the alignment block, and that number is the same for both chains in
31+
the same block.
32+
33+
In each entry (combination of the three indices described above) an integer
34+
is stored, which corresponds to the residue index in the specified chain, i.e.
35+
the index in the Atom array of the chain. Within a single block, the stored
36+
integers (residues) are always in increasing order.
37+
38+
### Example
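
As a minimal sketch of the indexing described above, the following standalone snippet walks a hand-written `optAln` array and lists every aligned residue pair. The data is hypothetical (a small circular permutation split into two blocks), and the class and method names are illustrative only. The block lengths are passed as a separate array on the assumption that the inner arrays may be longer than the aligned region, which is why `AFPChain` also exposes `getOptLen()`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not part of BioJava.
class OptAlnExample {

    /** Lists every aligned pair of optAln[block][chain][eqr] as "block:res1-res2". */
    static List<String> alignedPairs(int[][][] optAln, int[] optLen) {
        List<String> pairs = new ArrayList<>();
        for (int block = 0; block < optAln.length; block++) {
            for (int eqr = 0; eqr < optLen[block]; eqr++) {
                int res1 = optAln[block][0][eqr]; // index into the Atom array of chain 1
                int res2 = optAln[block][1][eqr]; // index into the Atom array of chain 2
                pairs.add(block + ":" + res1 + "-" + res2);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical circular permutation: two blocks, each increasing in both chains.
        int[][][] optAln = {
            { {0, 1, 2}, {5, 6, 7} },
            { {5, 6},    {0, 1}    }
        };
        int[] optLen = {3, 2};
        System.out.println(alignedPairs(optAln, optLen));
        // prints [0:0-5, 0:1-6, 0:2-7, 1:5-0, 1:6-1]
    }
}
```

With a real alignment, the same loops would run over the arrays returned by `afpChain.getOptAln()` and `afpChain.getOptLen()`.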
39+
40+
41+
42+
### Overview
43+
44+
As an overview, the `AFPChain` data model:
45+
46+
* Only supports **pairwise alignments**, i.e. two chains or structures aligned.
47+
* Can support **flexible alignments** and **non-topological alignments**.
48+
However, their combination (a flexible alignment with topological rearrangements)
49+
cannot be represented, because a block boundary means either one or the other.
50+
* Cannot support **non-sequential alignments** compactly: they would require a new block
51+
for each EQR, because sequentiality of the residues is assumed inside each block.
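
The last point can be made concrete with a small sketch. Assuming the aligned pairs are given in alignment order as two parallel arrays (a simplification chosen for this example, not the `AFPChain` layout), the hypothetical helper below counts how many blocks are needed so that both chains are strictly increasing inside every block.

```java
// Illustrative helper, not part of BioJava.
class SequentialBlocks {

    /** Counts the contiguous runs in which both chains are strictly increasing. */
    static int countBlocks(int[] chain1, int[] chain2) {
        if (chain1.length != chain2.length) {
            throw new IllegalArgumentException("Both chains need the same number of EQRs");
        }
        if (chain1.length == 0) {
            return 0;
        }
        int blocks = 1;
        for (int i = 1; i < chain1.length; i++) {
            boolean increasing = chain1[i] > chain1[i - 1] && chain2[i] > chain2[i - 1];
            if (!increasing) {
                blocks++; // sequentiality is broken here, so a new block has to start
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Circular permutation: one break, so two blocks are enough.
        System.out.println(countBlocks(new int[] {0, 1, 2, 5, 6}, new int[] {5, 6, 7, 0, 1})); // prints 2
        // Fully reversed second chain: every EQR ends up in its own block.
        System.out.println(countBlocks(new int[] {0, 1, 2, 3}, new int[] {9, 7, 5, 3}));       // prints 4
    }
}
```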
52+
53+
## MultipleAlignment Data Model
54+
55+
Since BioJava 4.1.0, a new data model is available to store structure alignments.
56+
The `MultipleAlignment` data structure is a general model that supports any of the
57+
following properties, and any combination:
58+
59+
* **Multiple structures**: the model is no longer restricted to pairwise alignments.
60+
* **Non-topological alignments**: such as circular permutations or domain rearrangements.
61+
* **Flexible alignments**: parts of the alignment with different superposition
62+
transformations.
63+
64+
In addition, the data structure is not limited in the number and types of scores
65+
it can store, because the scores are stored in a key:value fashion, as will be
66+
described later.
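
To illustrate what key:value score storage means in practice, here is a minimal standalone sketch. The `ScoreStore` class, its method names, and the score values are assumptions made for this example and do not reproduce the BioJava interface.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical score container: any number of named scores, no fixed fields.
class ScoreStore {

    private final Map<String, Double> scores = new TreeMap<>();

    void putScore(String name, double value) {
        scores.put(name, value);
    }

    /** Returns the stored score, or null if no score with that name exists. */
    Double getScore(String name) {
        return scores.get(name);
    }

    Set<String> names() {
        return scores.keySet();
    }

    public static void main(String[] args) {
        ScoreStore store = new ScoreStore();
        store.putScore("RMSD", 2.1);      // made-up values for illustration
        store.putScore("TM-score", 0.75);
        System.out.println(store.names());              // prints [RMSD, TM-score]
        System.out.println(store.getScore("coverage")); // prints null
    }
}
```

Because the keys are plain strings, a new score type can be added by an algorithm without changing the data structure.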
67+
68+
### Object Hierarchy
69+
70+
The biggest difference from `AFPChain` is that the `MultipleAlignment` data
71+
structure is object-oriented.
72+
The hierarchy of sub-objects is represented below:
73+
74+
<pre>
MultipleAlignmentEnsemble
   |
MultipleAlignment(s)
   |
BlockSet(s)
   |
Block(s)
</pre>
83+
84+
* **MultipleAlignmentEnsemble**: the ensemble is the top level of the hierarchy.
85+
As a top level, it stores information regarding creation properties (algorithm,
86+
version, creation time, etc.) and the structures involved in the alignment (Atoms,
87+
structure identifiers, etc.). It contains a collection of `MultipleAlignment` objects that
88+
share the same properties stored in the ensemble. This construction allows the
89+
storage of alternative alignments inside the same data structure.
90+
91+
* **MultipleAlignment**: the `MultipleAlignment` stores the core information of a
92+
multiple structure alignment. It is designed to be the return type of the multiple
93+
structure alignment algorithms. The object contains a collection of `BlockSet` objects and
94+
it is linked to its parent `MultipleAlignmentEnsemble`.
95+
96+
* **BlockSet**: the `BlockSet` stores a flexible part of a multiple structure
97+
alignment. A flexible part needs the residue equivalencies involved, contained in
98+
a collection of `Block` objects, and a transformation matrix for every structure that
99+
describes the 3D superposition of all structures. It is linked to its parent
100+
`MultipleAlignment`.
101+
102+
* **Block**: the `Block` stores the aligned positions (equivalent residues) of a
103+
`BlockSet` that are in sequentially increasing order. Each `Block` represents a
104+
sequential part of a non-topological alignment, if more than one `Block` is present.
105+
It is linked to its parent `BlockSet`.
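
The containment and parent links described in this list can be sketched with four small classes. These are hypothetical stand-ins written for illustration, not the BioJava interfaces, and they omit everything except the parent/child structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the four levels; these are NOT the BioJava classes.
class HierarchySketch {

    static class Ensemble {
        String algorithmName; // creation properties live at the top level
        final List<Alignment> alignments = new ArrayList<>();

        Alignment addAlignment() {
            Alignment a = new Alignment(this);
            alignments.add(a);
            return a;
        }
    }

    static class Alignment {
        final Ensemble parent;
        final List<BlockSet> blockSets = new ArrayList<>();

        Alignment(Ensemble parent) { this.parent = parent; }

        BlockSet addBlockSet() {
            BlockSet bs = new BlockSet(this);
            blockSets.add(bs);
            return bs;
        }
    }

    static class BlockSet {
        final Alignment parent;
        final List<Block> blocks = new ArrayList<>();
        // A real implementation would also hold one transformation per structure.

        BlockSet(Alignment parent) { this.parent = parent; }

        Block addBlock() {
            Block b = new Block(this);
            blocks.add(b);
            return b;
        }
    }

    static class Block {
        final BlockSet parent;
        final List<List<Integer>> alignedResidues = new ArrayList<>();

        Block(BlockSet parent) { this.parent = parent; }
    }

    /** Walks the hierarchy top-down and counts all blocks. */
    static int countBlocks(Ensemble ensemble) {
        int n = 0;
        for (Alignment a : ensemble.alignments) {
            for (BlockSet bs : a.blockSets) {
                n += bs.blocks.size();
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Ensemble ensemble = new Ensemble();
        ensemble.algorithmName = "example-algorithm"; // placeholder value
        Alignment alignment = ensemble.addAlignment();
        alignment.addBlockSet().addBlock();           // first flexible part, one block
        BlockSet second = alignment.addBlockSet();    // second flexible part, two blocks
        second.addBlock();
        second.addBlock();
        System.out.println(countBlocks(ensemble));    // prints 3
    }
}
```

Keeping a parent reference at every level lets a `Block` reach the ensemble-wide data (such as the atoms) without duplicating it.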
106+
107+
### The Optimal Alignment
108+
109+
In the `MultipleAlignment` data structure the aligned residues are stored in a
110+
nested List for every `Block`. The indices of the nested List are the following:
111+
112+
```java
List<List<Integer>> optAln = block.getAlnRes();
Integer residue = optAln.get(chain).get(eqr);
```
116+
117+
The indices have the same meaning as in the optimal alignment of the `AFPChain`. As a reminder:
119+
120+
* **chain**: chain or structure index.
121+
122+
* **eqr**: EQR stands for equivalent residue position, i.e. the alignment
123+
position. There are as many positions (EQRs) in a block as the length of
124+
the alignment block, and their number is equal for any of the chains in
125+
the same block.
126+
127+
As in `AFPChain`, each entry (combination of the two indices described above)
128+
is an Integer that corresponds to the residue index in the specified chain, i.e.
129+
the index in the Atom array of the chain. Caution has to be taken in the code,
130+
because a `MultipleAlignment` can contain gaps, which are represented as `null`
131+
in the List entries.
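
A short sketch of gap-aware iteration follows. It uses a hand-written nested List with hypothetical residue indices and a helper name invented for this example. Note that assigning a gap entry directly to an `int` would throw a `NullPointerException` on unboxing, so each entry is checked for `null` first.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper, not part of BioJava.
class GapAwareIteration {

    /** Counts the alignment columns (EQRs) in which no structure has a gap. */
    static int countGaplessColumns(List<List<Integer>> alnRes) {
        if (alnRes.isEmpty()) {
            return 0;
        }
        int length = alnRes.get(0).size(); // all chains have the same number of EQRs
        int count = 0;
        for (int eqr = 0; eqr < length; eqr++) {
            boolean gapless = true;
            for (List<Integer> chain : alnRes) {
                if (chain.get(eqr) == null) { // null marks a gap in this structure
                    gapless = false;
                    break;
                }
            }
            if (gapless) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Three structures, four columns; Arrays.asList is used because it accepts null.
        List<List<Integer>> alnRes = Arrays.asList(
                Arrays.asList(10, 11, 12, 13),
                Arrays.asList(20, null, 22, 23),
                Arrays.asList(30, 31, null, 33));
        System.out.println(countGaplessColumns(alnRes)); // prints 2
    }
}
```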
132+
133+
### Alignment Scores
134+
135+
136+
137+
### Example
138+
139+
### Overview
140+
141+
## Conversion between Data Models
142+
143+
The conversion from an `AFPChain` to a `MultipleAlignment` is possible through the
144+
ensemble constructor. An example of how to do it programmatically is below:
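
The ensemble constructor call itself is not reproduced here. As a library-independent illustration of what such a conversion has to do with the residue equivalencies, the sketch below reshapes a pairwise `optAln[block][chain][eqr]` array into one nested List per block. The class name, the method name, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not part of BioJava.
class PairwiseToBlocks {

    /** Converts optAln[block][chain][eqr] into one List<List<Integer>> per block. */
    static List<List<List<Integer>>> toBlocks(int[][][] optAln, int[] optLen) {
        List<List<List<Integer>>> blocks = new ArrayList<>();
        for (int b = 0; b < optAln.length; b++) {
            List<List<Integer>> block = new ArrayList<>();
            for (int chain = 0; chain < 2; chain++) { // a pairwise alignment has two chains
                List<Integer> residues = new ArrayList<>();
                for (int eqr = 0; eqr < optLen[b]; eqr++) {
                    residues.add(optAln[b][chain][eqr]);
                }
                block.add(residues);
            }
            blocks.add(block);
        }
        return blocks;
    }

    public static void main(String[] args) {
        int[][][] optAln = {
            { {0, 1, 2}, {5, 6, 7} },
            { {5, 6},    {0, 1}    }
        };
        int[] optLen = {3, 2};
        System.out.println(toBlocks(optAln, optLen));
        // prints [[[0, 1, 2], [5, 6, 7]], [[5, 6], [0, 1]]]
    }
}
```

A pairwise alignment never needs `null` entries here, because only aligned pairs are stored in `optAln`.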
145+
146+
147+
There is no method to convert from a `MultipleAlignment` to an `AFPChain`, because
148+
the former supports any number of structures, while the latter
supports only pairwise alignments. However, the conversion can be done with some
150+
lines of code if needed (instantiate a new `AFPChain` and copy one by one the
151+
properties that can be represented from the `MultipleAlignment`).
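
A rough sketch of the residue-copying step of such a manual conversion is shown below. It works on plain nested Lists rather than BioJava objects, and it assumes that columns containing a gap are simply dropped, since a pairwise `optAln` array stores only aligned pairs. All names and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper, not part of BioJava.
class BlocksToPairwise {

    /** Builds optAln[block][chain][eqr] from two-structure blocks, skipping gap columns. */
    static int[][][] toOptAln(List<List<List<Integer>>> blocks) {
        int[][][] optAln = new int[blocks.size()][2][];
        for (int b = 0; b < blocks.size(); b++) {
            List<List<Integer>> block = blocks.get(b);
            if (block.size() != 2) {
                throw new IllegalArgumentException("Only two structures can be represented");
            }
            List<Integer> first = new ArrayList<>();
            List<Integer> second = new ArrayList<>();
            for (int eqr = 0; eqr < block.get(0).size(); eqr++) {
                Integer r1 = block.get(0).get(eqr);
                Integer r2 = block.get(1).get(eqr);
                if (r1 != null && r2 != null) { // keep only columns without gaps
                    first.add(r1);
                    second.add(r2);
                }
            }
            optAln[b][0] = first.stream().mapToInt(Integer::intValue).toArray();
            optAln[b][1] = second.stream().mapToInt(Integer::intValue).toArray();
        }
        return optAln;
    }

    public static void main(String[] args) {
        List<List<List<Integer>>> blocks = Arrays.asList(
                Arrays.asList(Arrays.asList(0, 1, 2), Arrays.asList(5, null, 7)),
                Arrays.asList(Arrays.asList(5, 6), Arrays.asList(0, 1)));
        System.out.println(Arrays.deepToString(toOptAln(blocks)));
        // prints [[[0, 2], [5, 7]], [[5, 6], [0, 1]]]
    }
}
```

Scores, block lengths, and transformation data would have to be copied separately, and anything beyond two structures has no counterpart in the pairwise model.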

structure/alignmentcode.md

Lines changed: 0 additions & 73 deletions
This file was deleted.
