Commit b6b0a01

committed
Description of the MultipleMC algorithm
1 parent b1b8204 commit b6b0a01

1 file changed

Lines changed: 64 additions & 11 deletions

File tree

structure/alignment.md

Lines changed: 64 additions & 11 deletions
@@ -1,4 +1,4 @@
1-
Structure Alignment
1+
Structure Alignments
22
===========================
33

44
## What is a Structure Alignment?
@@ -227,9 +227,33 @@ interface.
227227

228228
## Multiple Structure Alignment
229229

230-
Since BioJava 4.1.0, multiple structure alignments can be generated.
230+
This Java implementation for multiple structure alignments, named MultipleMC, is based on the original CE-MC implementation by [Guda C, Scheeff ED, Bourne PE & Shindyalov IN in 2001](http://psb.stanford.edu/psb-online/proceedings/psb01/abstracts/p275.html)
231+
[![pubmed](http://img.shields.io/badge/in-pubmed-blue.svg?style=flat)](http://www.ncbi.nlm.nih.gov/pubmed/11262947).
231232

232-
## PDB-wide database searches
233+
The idea remains unchanged: perform **all-to-all pairwise alignments** of the structures, choose the
234+
**reference** as the most similar structure to all others and run a **Monte Carlo optimization** of
235+
the multiple residue equivalencies (EQRs) to optimize a score function that depends on the inter-residue
236+
distances.
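
The reference-selection step can be illustrated with a minimal sketch. It is not the BioJava code: the class name `ReferenceSelectionSketch`, the method `chooseReference`, and the use of mean pairwise RMSD as the similarity measure are assumptions made for illustration only.

```java
/** Hypothetical illustration of choosing a reference structure; not part of BioJava. */
class ReferenceSelectionSketch {

    /**
     * Returns the index of the structure with the lowest mean distance
     * to all other structures, or -1 if the matrix is empty.
     * Assumption: similarity is measured by a symmetric pairwise RMSD matrix.
     */
    static int chooseReference(double[][] pairwiseRmsd) {
        int best = -1;
        double bestMean = Double.POSITIVE_INFINITY;
        for (int i = 0; i < pairwiseRmsd.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < pairwiseRmsd.length; j++) {
                if (i != j) {
                    sum += pairwiseRmsd[i][j];
                }
            }
            double mean = sum / Math.max(1, pairwiseRmsd.length - 1);
            if (mean < bestMean) {
                bestMean = mean;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up RMSD values from an all-to-all pairwise alignment of three structures
        double[][] rmsd = {
            {0.0, 1.2, 3.0},
            {1.2, 0.0, 1.5},
            {3.0, 1.5, 0.0}
        };
        System.out.println("Reference index: " + chooseReference(rmsd));
    }
}
```

In this toy matrix the second structure (index 1) has the lowest mean RMSD to the others, so it would seed the multiple alignment.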
237+
238+
Although the main idea is the same as in the original algorithm, some details of the implementation have
239+
been changed in the BioJava version. They are described in the main class, but as a summary:
240+
241+
1. It accepts **any pairwise alignment** algorithm (instead of being attached to CE), so any
242+
of the algorithms described before is suitable for generating a seed for optimization. Note that
243+
this property allows *non-topological* and *flexible* multiple structure alignments, always restricted
244+
by the pairwise alignment algorithm limitations.
245+
2. The **moves** in the Monte Carlo optimization have been simplified to 3, instead of 4.
246+
3. A **new move** to insert and delete individual gaps has been added.
247+
4. The scoring function has been modified to a **continuous** function, maintaining the properties that the authors described.
248+
5. The **probability function** is normalized in synchronization with the optimization progression, to improve the convergence into a score maximum after some random exploration of the multidimensional space.
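
The idea behind point 5 can be sketched as follows. This is a generic Monte Carlo acceptance rule written for illustration, not the function used in `MultipleMcMain`: the exponential decay with the score loss, the linear normalization factor, the toy score function, and all names are assumptions.

```java
import java.util.Random;

/** Hypothetical illustration of a progress-normalized acceptance rule; not part of BioJava. */
class McAcceptanceSketch {

    /**
     * Probability of accepting a move from oldScore to newScore at iteration
     * step out of maxSteps. Improvements are always accepted. Worse moves are
     * accepted with a probability that decays with the score loss (assumed
     * exponential) and is scaled by a factor that falls linearly from 1 to 0.
     */
    static double acceptanceProbability(double oldScore, double newScore, int step, int maxSteps) {
        if (newScore >= oldScore) {
            return 1.0;
        }
        double loss = oldScore - newScore;
        double raw = Math.exp(-loss);
        double progress = 1.0 - (double) step / maxSteps; // normalization factor
        return Math.max(0.0, Math.min(1.0, raw * progress));
    }

    /** Toy score with a single maximum at x = 3. */
    static double score(double x) {
        return -(x - 3.0) * (x - 3.0);
    }

    /** Random exploration early on, then convergence toward the score maximum. */
    static double optimize(long seed, int maxSteps) {
        Random random = new Random(seed);
        double x = 0.0;
        double current = score(x);
        for (int step = 0; step < maxSteps; step++) {
            double candidate = x + (random.nextDouble() - 0.5);
            double candidateScore = score(candidate);
            double p = acceptanceProbability(current, candidateScore, step, maxSteps);
            if (random.nextDouble() < p) {
                x = candidate;
                current = candidateScore;
            }
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println("Final x: " + optimize(42L, 10000));
    }
}
```

Early in the run the normalization factor is close to 1, so worse moves are accepted often and the search explores. Near the end it approaches 0, so only improving moves survive and the search settles into a maximum.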
249+
250+
The algorithm performs similarly to other multiple structure alignment algorithms for most protein families.
251+
The parameters of both the pairwise aligner and the MC optimization can affect the final result. No single set of parameters works for every case, so it is good practice to try several combinations while keeping in mind how each one affects the score function.
252+
253+
BioJava class: [org.biojava.nbio.structure.align.multiple.mc.MultipleMcMain](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/multiple/mc/MultipleMcMain.html)
255+
256+
## PDB-wide Database Searches
233257

234258
The Alignment GUI also provides functionality for PDB-wide structural searches.
235259
This systematically compares a structure against a non-redundant set of all
@@ -265,10 +289,10 @@ the `PDB_DIR` environmental variable. This operation sped up the search from
265289
about 30 hours to less than 4 hours.
266290

267291

268-
## Creating alignments programmatically
292+
## Creating Alignments Programmatically
269293

270-
The various structure alignment algorithms in BioJava implement the
271-
`StructureAlignment` interface, and are normally accessed through
294+
The **pairwise structure alignment** algorithms in BioJava implement the
295+
`StructureAlignment` interface, and are usually accessed through
272296
`StructureAlignmentFactory`. Here's an example of how to create a CE-CP
273297
alignment and print some information about it.
274298

@@ -294,13 +318,43 @@ To display the alignment using Jmol, use:
294318

295319
```java
296320
GuiWrapper.display(afpChain, ca1, ca2);
297-
// Or StructureAlignmentDisplay.display(afpChain, ca1, ca2);
321+
// Or using the biojava-structure-gui module
322+
StructureAlignmentDisplay.display(afpChain, ca1, ca2);
298323
```
299324

300325
Note that these require that you include the structure-gui package and the Jmol
301326
binary in the classpath at runtime.
302327

303-
## Command-line tools
328+
For creating **multiple structure alignments**, the code is a little bit different, because the
329+
returned data structure and the number of input structures are different. Here is an
330+
example of how to create and display a multiple alignment:
331+
332+
```java
333+
//Specify the structures to align: some ASP-proteinases
334+
List<String> names = Arrays.asList("3app", "4ape", "5pep", "1psn", "4cms", "1bbs.A", "1smr.A");
335+
336+
//Load the CA atoms of the structures
337+
AtomCache cache = new AtomCache();
338+
List<Atom[]> atomArrays = new ArrayList<Atom[]>();
339+
for (String name:names) {
340+
atomArrays.add(cache.getAtoms(name));
341+
}
342+
343+
//Generate the multiple alignment algorithm with the chosen pairwise algorithm
344+
StructureAlignment pairwise = StructureAlignmentFactory.getAlgorithm(CeMain.algorithmName);
345+
MultipleMcMain multiple = new MultipleMcMain(pairwise);
346+
347+
//Perform the alignment
348+
MultipleAlignment result = multiple.align(atomArrays);
349+
350+
//Output the FASTA sequence alignment
351+
System.out.println(MultipleAlignmentWriter.toFASTA(result));
352+
353+
//Display the results in a 3D view
354+
MultipleAlignmentDisplay.display(result);
355+
```
356+
357+
## Command-Line Tools
304358

305359
Many of the alignment algorithms are available in the form of command line
306360
tools. These can be accessed through the main methods of the StructureAlignment
@@ -317,8 +371,7 @@ alignments in batch mode, or full database searches. Some additional parameters
317371
are available which are not exposed in the GUI, such as outputting results to a
318372
file in various formats.
319373

320-
321-
## See Also
374+
## Alignment Data Model
322375

323376
For details about the structure alignment data models in BioJava, see [Structure Alignment Data Model](alignment-data-model.md).
324377

@@ -332,7 +385,7 @@ Thanks to P. Bourne, Yuzhen Ye and A. Godzik for granting permission to freely u
332385

333386
Navigation:
334387
[Home](../README.md)
335-
| [Book 3: The Protein Structure modules](README.md)
388+
| [Book 3: The Structure modules](README.md)
336389
| Chapter 8 : Structure Alignments
337390

338391
Prev: [Chapter 7 : SEQRES and ATOM records](seqres.md)
