README.md: 3 additions & 4 deletions (+3 -4)
@@ -6,7 +6,7 @@ A brief introduction into [BioJava](https://github.com/biojava/biojava).
 
 The goal of this tutorial is to provide an educational introduction into some of the features that are provided by BioJava.
 
-At the moment this tutorial is still under development. Please check the [BioJava Cookbook](http://biojava.org/wiki/BioJava:CookBook3.0) for a more comprehensive collection of many examples of what is possible with BioJava and how to do things.
+At the moment this tutorial is still under development. Please check the [BioJava Cookbook](http://biojava.org/wiki/BioJava:CookBook3.0) for a more comprehensive collection of examples about what is possible with BioJava and how to do things.
 
 ## Index
 
@@ -16,10 +16,9 @@ Book 1: [The Core module](core/README.md), basic working with sequences.
 
 Book 2: [The Alignment module](alignment/README.md), pairwise and multiple alignments of protein sequences.
 
-Book 3: [The Protein Structure modules](structure/README.md), everything related to working with 3D structures.
-
-Book 4: [The Genomics Module](genomics/README.md), working with genomic data
+Book 3: [The Structure modules](structure/README.md), everything related to working with 3D structures.
 
+Book 4: [The Genomics Module](genomics/README.md), working with genomic data.
The biggest difference with `AFPChain` is that the `MultipleAlignment` data
@@ -167,8 +171,20 @@ on a `MultipleAlignment`:
 double bsRMSD = alignment.getScore("bsRMSD");
 ```
 
-Methods and names for some frequent scores are located in a util class called
-`MultipleAlignmentScorer`.
+### Manipulating Multiple Alignments
+
+Some classes are designed to contain utility methods for manipulating a `MultipleAlignment` object.
+The most important ones are enumerated and briefly described below:
+
+* [MultipleAlignmentScorer](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentScorer.html): contains the names of frequently used scores and methods to calculate them.
+
+* [MultipleAlignmentTools](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentTools.html): contains helper methods, for example to calculate the sequence alignment, to transform the atom arrays of the structures, or to calculate the distances between aligned residues in all structures.
+
+* [MultipleAlignmentWriter](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleAlignmentWriter.html): contains methods to generate different types of String outputs of the alignment, e.g. FASTA, XML, FatCat.
+
+* [MultipleSuperimposer](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/util/MultipleSuperimposer.html): interface for implementations that calculate the structure superpositions of the alignment. Some examples of implementations are the ReferenceSuperimposer (superimposes all the structures to a reference) and the CoreSuperimposer (only uses EQRs present in all structures, without gaps, to superimpose them).
+
+* [MultipleAlignmentXMLParser](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/xml/MultipleAlignmentXMLParser.html): contains a method to create a `MultipleAlignment` object from an XML file representation.
 
 ### Overview
 
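The CoreSuperimposer entry in the list above mentions using only the EQRs that are present in all structures, without gaps. The following is a minimal plain-Java sketch of that column selection, not the BioJava implementation. It assumes, for illustration only, that an alignment block is stored as one list of residue indices per structure, with `null` marking a gap; the class and method names (`CoreColumnsSketch`, `coreColumns`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: selects the gap-free ("core") columns of an alignment. */
final class CoreColumnsSketch {

    /**
     * Returns the indices of the columns in which every structure has an aligned residue.
     * alignRes.get(s).get(c) is assumed to be the residue index of structure s in
     * column c, or null if structure s has a gap in that column.
     */
    static List<Integer> coreColumns(List<List<Integer>> alignRes) {
        List<Integer> core = new ArrayList<>();
        if (alignRes.isEmpty()) {
            return core;
        }
        int length = alignRes.get(0).size();
        for (int col = 0; col < length; col++) {
            boolean gapFree = true;
            for (List<Integer> row : alignRes) {
                if (row.get(col) == null) { // a gap in any structure excludes the column
                    gapFree = false;
                    break;
                }
            }
            if (gapFree) {
                core.add(col);
            }
        }
        return core;
    }

    public static void main(String[] args) {
        // Three structures, four alignment columns; null marks a gap.
        List<List<Integer>> alignRes = Arrays.asList(
                Arrays.asList(10, 11, 12, null),
                Arrays.asList(5, null, 7, 8),
                Arrays.asList(1, 2, 3, 4));
        System.out.println("Core columns: " + coreColumns(alignRes));
    }
}
```

Only the atoms in the returned columns would then be passed to the superposition step, so that every structure contributes the same number of equivalent positions.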
@@ -207,3 +223,7 @@ the first representation supports any number of structures, while the second is
 only supporting pairwise alignments. However, the conversion can be done with some
 lines of code if needed (instantiate a new `AFPChain` and copy one by one the
 properties that can be represented from the `MultipleAlignment`).
+
+===
+
+Go back to [Chapter 8 : Structure Alignments](alignment.md).
-A **structural alignment** attempts to establish equivalences between two or more polymer structures based on their shape and three-dimensional conformation. In contrast to simple structural superposition (see below), where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions.
-
-**Structural alignment** is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. **Structural alignment** can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be exercised when using the results as evidence for shared evolutionary ancestry, because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.
-
-**Structural alignment** of other biological structures can also be made in BioJava. For example, nucleic acids can
-be structurally aligned to find common structural motifs, independent of sequence similarity. This is especially
-important for RNAs, because their 3D structure arrangement is important for their function.
+A **structural alignment** attempts to establish equivalences between two or
+more polymer structures based on their shape and three-dimensional conformation.
+In contrast to simple structural superposition (see below), where at least some
+equivalent residues of the two structures are known, structural alignment requires
+no a priori knowledge of equivalent positions.
+
+A **structural alignment** is a valuable tool for the comparison of proteins with
+low sequence similarity, where evolutionary relationships between proteins cannot
+be easily detected by standard sequence alignment techniques. Therefore, a
+**structural alignment** can be used to imply evolutionary relationships between
+proteins that share very little common sequence. However, caution should be exercised
+when using the results as evidence for shared evolutionary ancestry, because of the
+possible confounding effects of convergent evolution by which multiple unrelated amino
+acid sequences converge on a common tertiary structure.
+
+A **structural alignment** of other biological polymers can also be made in BioJava.
+For example, nucleic acids can be structurally aligned to find common structural motifs,
+independent of sequence similarity. This is especially important for RNAs, because their
+3D structure arrangement is important for their function.
 
 For more info see the Wikipedia article on [structure alignment](http://en.wikipedia.org/wiki/Structural_alignment).
 
 ## Alignment Algorithms supported by BioJava
 
 BioJava comes with a number of algorithms for aligning structures. The following
 five options are displayed by default in the graphical user interface (GUI),
-although others can be accessed programmatically using the methods in
Explore the coloring options in the *Edit* menu, and through the *View* menu for
+alternative representations of the alignment.
+
+The functionality to perform and visualize these alignments can also be
+used from your own code. Let's first have a look at the alignment algorithms.
 
 ## Pairwise Alignment Algorithms
 
@@ -175,9 +227,33 @@ interface.
 
 ## Multiple Structure Alignment
 
-Since BioJava 4.1.0, multiple structure alignments can be generated.
+This Java implementation of multiple structure alignments, named MultipleMC, is based on the original CE-MC implementation by [Guda C, Scheeff ED, Bourne PE & Shindyalov IN in 2001](http://psb.stanford.edu/psb-online/proceedings/psb01/abstracts/p275.html).
+The idea remains unchanged: perform **all-to-all pairwise alignments** of the structures, choose the
+**reference** as the most similar structure to all others and run a **Monte Carlo optimization** of
+the multiple residue equivalencies (EQRs) to minimize a score function that depends on the inter-residue
+distances.
+
+Although the main idea is the same as in the original algorithm, some details of the implementation have
+been changed in the BioJava version. They are described in the main class, but as a summary:
+
+1. It accepts **any pairwise alignment** algorithm (instead of being attached to CE), so any
+of the algorithms described before is suitable for generating a seed for optimization. Note that
+this property allows *non-topological* and *flexible* multiple structure alignments, always restricted
+by the pairwise alignment algorithm limitations.
+2. The **moves** in the Monte Carlo optimization have been simplified to 3, instead of 4.
+3. A **new move** to insert and delete individual gaps has been added.
+4. The scoring function has been modified to a **continuous** function, maintaining the properties that the authors described.
+5. The **probability function** is normalized in synchronization with the optimization progression, to improve the convergence to a score maximum after some random exploration of the multidimensional space.
+
+The algorithm performs similarly to other multiple structure alignment algorithms for most protein families.
+The parameters both for the pairwise aligner and the MC optimization can have an impact on the final result. There is no single best set of parameters, because they usually depend on the specific case. Thus, trying some parameter combinations, keeping in mind the effect they produce in the score function, is a good practice when doing structure alignments.
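
Two of the steps described above, choosing the reference from the all-to-all pairwise results and accepting Monte Carlo moves with a probability that tightens as the optimization progresses, can be sketched in plain Java. This is an illustration under stated assumptions and not the MultipleMC code: it assumes that similarity is measured by the mean pairwise RMSD, that the score is maximized, and that acceptance follows a Metropolis-style rule with a linearly decreasing temperature. The names `McOptimizationSketch`, `chooseReference`, `acceptanceProbability` and the parameter `t0` are hypothetical.

```java
import java.util.Random;

/** Illustrative sketch only; not the BioJava MultipleMC implementation. */
final class McOptimizationSketch {

    /**
     * Picks the structure with the lowest mean pairwise RMSD to all others.
     * rmsd[i][j] is assumed to hold the RMSD of the pairwise alignment of i and j.
     */
    static int chooseReference(double[][] rmsd) {
        int n = rmsd.length;
        if (n == 0) {
            throw new IllegalArgumentException("At least one structure is required");
        }
        int best = 0;
        double bestSum = Double.POSITIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sum += rmsd[i][j];
                }
            }
            // Comparing sums equals comparing means: every row has n - 1 terms.
            if (sum < bestSum) {
                bestSum = sum;
                best = i;
            }
        }
        return best;
    }

    /**
     * Probability of accepting a move that changes the (maximized) score by delta.
     * Assumed rule: improvements are always accepted; worsening moves are accepted
     * with probability exp(delta / T), where T falls linearly from t0 to 0.
     */
    static double acceptanceProbability(double delta, int step, int maxSteps, double t0) {
        if (maxSteps <= 0 || t0 <= 0.0) {
            throw new IllegalArgumentException("maxSteps and t0 must be positive");
        }
        if (delta >= 0.0) {
            return 1.0;
        }
        double progress = Math.min(1.0, (double) step / maxSteps);
        double temperature = t0 * (1.0 - progress);
        if (temperature <= 0.0) {
            return 0.0; // end of the run: only improvements are accepted
        }
        return Math.exp(delta / temperature);
    }

    public static void main(String[] args) {
        double[][] rmsd = {
            {0.0, 1.2, 3.5},
            {1.2, 0.0, 1.4},
            {3.5, 1.4, 0.0}
        };
        System.out.println("Reference index: " + chooseReference(rmsd));

        Random rnd = new Random(42);
        for (int step = 0; step <= 100; step += 50) {
            double p = acceptanceProbability(-5.0, step, 100, 10.0);
            boolean accepted = rnd.nextDouble() < p;
            System.out.printf("step %d: p = %.4f, accepted = %b%n", step, p, accepted);
        }
    }
}
```

With such a rule, early iterations explore freely because worsening moves are often accepted, while late iterations behave almost greedily. This is one reason the optimization parameters can change the final alignment.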