Commit 4e40f5e

Adding Command-line tools section
Also moved up the database search section & fixed some spellings.
1 parent 051135a commit 4e40f5e

1 file changed

Lines changed: 62 additions & 43 deletions

structure/alignment.md
Original file line number | Diff line number | Diff line change
@@ -28,25 +28,30 @@ in 3D. See below for descriptions of the algorithms.
2828

2929
## Alignment User Interface
3030

31-
Before going the details how to use the algorithms programmatically, let's take a look at the user interface that cames with the *biojava-structure-gui* module.
31+
Before going the details how to use the algorithms programmatically, let's take
32+
a look at the user interface that cames with the *biojava-structure-gui* module.
3233

33-
<pre>
34-
AlignmentGui.getInstance();
35-
</pre>
34+
```java
35+
AlignmentGui.getInstance();
36+
```
3637

37-
shows the following user interface.
38+
This code shows the following user interface:
3839

3940
![Alignment GUI](img/alignment_gui.png)
4041

41-
You can manually select protein chains, domains, or custom files to be aligned. Try to align 2hyn vs. 1zll. This will show the results in a graphical way, in 3D:
42+
You can manually select protein chains, domains, or custom files to be aligned.
43+
Try to align 2hyn vs. 1zll. This will show the results in a graphical way, in
44+
3D:
4245

4346
![3D Alignment of PDB IDs 2hyn and 1zll](img/2hyn_1zll.png)
4447

4548
and also a 2D display, that interacts with the 3D display
4649

4750
![2D Alignment of PDB IDs 2hyn and 1zll](img/alignmentpanel.png)
4851

49-
The functionality to perform and visualize these alignments can of course be used also from your own code. Let's first have a look at the alignment algorithms:
52+
The functionality to perform and visualize these alignments can of course be
53+
used also from your own code. Let's first have a look at the alignment
54+
algorithms.
5055

5156
## The Alignment Algorithms
5257

@@ -60,7 +65,7 @@ structure, and then combining those to try to align the most residues possible
6065
while keeping the overall RMSD of the superposition low.
6166

6267
CE is a rigid-body alignment algorithm, which means that the structures being
63-
compared are kept fixed during superpositon. In some cases it may be desirable
68+
compared are kept fixed during superposition. In some cases it may be desirable
6469
to break large proteins up into domains prior to aligning them (by manually
6570
inputing a subrange, using the [SCOP or CATH databases](externaldb.md), or by
6671
decomposing the protein automatically using the [Protein Domain
@@ -77,7 +82,7 @@ related by a circular permutation, the N-terminal part of one protein is related
7782
to the C-terminal part of the other, and vice versa. CE-CP allows circularly
7883
permuted proteins to be compared. For more information on circular
7984
permutations, see the
80-
[wikipedia](http://en.wikipedia.org/wiki/Circular_permutation_in_proteins) or
85+
[Wikipedia](http://en.wikipedia.org/wiki/Circular_permutation_in_proteins) or
8186
[Molecule of the
8287
Month](http://www.pdb.org/pdb/101/motm.do?momID=124&evtc=Suggest&evta=Moleculeof%20the%20Month&evtl=TopBar)
8388
articles.
@@ -140,7 +145,7 @@ The following methods are not presented in the user interface by default:
140145

141146
* [BioJavaStructureAlignment](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/BioJavaStructureAlignment.html)
142147
A structure-based alignment method able of returning multiple alternate
143-
alignments. It was writen by Andreas Prlic and based on the PSC++ algorithm
148+
alignments. It was written by Andreas Prli&#263; and based on the PSC++ algorithm
144149
provided by Peter Lackner.
145150
* [CeSideChainMain](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/ce/CeSideChainMain.html)
146151
A variant of CE using CB-CB distances, which sometimes improves alignments in
@@ -152,6 +157,40 @@ Additional methods can be added by implementing the
152157
[StructureAlignment](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/StructureAlignment.html)
153158
interface.
154159

160+
## PDB-wide database searches
161+
162+
The Alignment GUI also provides functionality for PDB-wide structural searches.
163+
This systematically compares a structure against a non-redundant set of all
164+
other structures in the PDB at either a chain or a domain level. Representatives
165+
are selected using the RCSB's clustering of proteins with 40% sequence identity,
166+
as described
167+
[here](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp).
168+
Domains are selected using either SCOP (when available) or the
169+
ProteinDomainParser algorithm.
170+
171+
![Database Search GUI](img/database_search.png)
172+
173+
To perform a database search, select the 'Database Search' tab, then choose a
174+
query structure based on PDB ID, SCOP domain id, or from a custom file. The
175+
output directory will be used to store results. These consist of individual
176+
alignments in compressed XML format, as well as a tab-delimited file of
177+
similarity scores and statistics. The statistics are displayed in an interactive
178+
results table, which allows the alignments to be sorted. The 'Align' column
179+
allows individual alignments to be visualized with the alignment GUI.
180+
181+
![Database Search Results](img/database_search_results.png)
182+
183+
Be aware that this process can be very time consuming. Before
184+
starting a manual search, it is worth considering whether a pre-computed result
185+
may be available online, for instance for
186+
[FATCAT-rigid](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp)
187+
or [DALI](http://ekhidna.biocenter.helsinki.fi/dali/start). For custom files or
188+
specific domains, a few optimizations can reduce the time for a database search.
189+
Downloading PDB files is a considerable bottleneck. This can be solved by
190+
downloading all PDB files from the [FTP
191+
server](ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/) and setting
192+
the `PDB_DIR` environmental variable. This operation sped up the search from
193+
about 30 hours to less than 4 hours.
155194

156195

157196
## Creating alignments programmatically
@@ -186,45 +225,25 @@ GuiWrapper.display(afpChain, ca1, ca2);
186225
// Or StructureAlignmentDisplay.display(afpChain, ca1, ca2);
187226
```
188227

189-
Note that these require that you include the structure-gui package and the jmol
228+
Note that these require that you include the structure-gui package and the jMol
190229
binary in the classpath at runtime.
191230

192231
## Command-line tools
193232

194-
## PDB-wide database searches
195-
196-
The Alignment GUI also provides functionality for PDB-wide structural searches.
197-
This systematically compares a structure against a non-redundant set of all
198-
other structures in the PDB at either a chain or a domain level. Representatives
199-
are selected using the RCSB's clustering of proteins with 40% sequence identity,
200-
as described
201-
[here](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp).
202-
Domains are selected using either SCOP (when available) or the
203-
ProteinDomainParser algorithm.
204-
205-
![Database Search GUI](img/database_search.png)
206-
207-
To perform a database search, select the 'Database Search' tab, then choose a
208-
query structure based on PDB ID, SCOP domain id, or from a custom file. The
209-
output directory will be used to store results. These consist of individual
210-
alignments in compressed XML format, as well as a tab-delimited file of
211-
similarity scores and statistics. The statistics are displayed in an interactive
212-
results table, which allows the alignments to be sorted. The 'Align' column
213-
allows individual alignments to be visualized with the alignment GUI.
233+
Many of the alignment algorithms are available in the form of command line
234+
tools. These can be accessed through the main methods of the StructureAlignment
235+
classes. Tar bundles are also available with scripts for running
236+
[CE and FATCAT](http://source.rcsb.org/jfatcatserver/download.jsp).
214237

215-
![Database Search Results](img/database_search_results.png)
238+
Example:
239+
```bash
240+
runCE.sh -pdb1 4hhb.A -pdb2 4hhb.B -show3d
241+
```
216242

217-
Be aware that this process can be very time consuming. Before
218-
starting a manual search, it is worth considering whether a pre-computed result
219-
may be available online, for instance for
220-
[FATCAT-rigid](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp)
221-
or [DALI](http://ekhidna.biocenter.helsinki.fi/dali/start). For custom files or
222-
specific domains, a few optimizations can reduce the time for a database search.
223-
Downloading PDB files is a considerable bottleneck. This can be solved by
224-
downloading all PDB files from the [FTP
225-
server](ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/) and setting
226-
the `PDB_DIR` environmental variable. This operation sped up the search from
227-
about 30 hours to less than 4 hours.
243+
Using the command line tool it is possible to run pairwise alignments, several
244+
alignments in batch mode, or full database searches. Some additional parameters
245+
are available which are not exposed in the GUI, such as outputting results to a
246+
file in various formats.
228247

229248

230249
## Acknowledgements
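
Note on the `PDB_DIR` advice in the moved database-search section: the steps it describes could be sketched roughly as below. This is only an illustration under stated assumptions, not part of the commit. The `$HOME/pdb` location is hypothetical, the script is assumed to be on the `PATH` (it may need to be run as `./runCE.sh` from the unpacked bundle), and it is assumed that the command-line scripts honor `PDB_DIR` in the same way as the GUI search.

```bash
#!/usr/bin/env bash
# Sketch only: point the tools at a local PDB mirror before running an alignment.
# "$HOME/pdb" is a hypothetical location, not a path taken from the commit.
export PDB_DIR="${PDB_DIR:-$HOME/pdb}"
mkdir -p "$PDB_DIR"
echo "Using PDB_DIR=$PDB_DIR"

# Run the example from the diff only if the script from the CE/FATCAT bundle
# is available on the PATH (an assumption about the local setup).
if command -v runCE.sh >/dev/null 2>&1; then
    runCE.sh -pdb1 4hhb.A -pdb2 4hhb.B -show3d
else
    echo "runCE.sh not found; skipping alignment" >&2
fi
```

Keeping an already-set `PDB_DIR` (via the `${PDB_DIR:-...}` default) avoids overriding a mirror that was configured elsewhere.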
