structure/bioassembly.md
120 additions & 1 deletion
@@ -72,7 +72,7 @@ Let's load both representations of hemoglobin PDB ID [1HHO](http://www.rcsb.org/
72
72
</tr>
73
73
</table>
74
74
75
-
As we can see, the two representations are quite different! When investigating protein interfaces, ligand binding and for many other applications, you always want to work with the biological assemblies!
75
+
As we can see, the two representations are quite different! When investigating protein interfaces, ligand binding and for many other applications, you always want to work with the biological assemblies.
76
76
77
77
Here is another example, the bacteriophage GA protein capsid, PDB ID [1GAV](http://www.rcsb.org/pdb/explore.do?structureId=1gav)
78
78
@@ -95,6 +95,125 @@ Here another example, the bacteriophave GA protein capsid PDB ID [1GAV](http://w
95
95
</tr>
96
96
</table>
97
97
98
+
## Re-creating Biological Assemblies
99
+
100
+
Since biological assemblies can be accessed via the StructureIO interface, there is in principle no need to use the lower-level BioJava code that re-creates biological assemblies. If you are interested in the gory details, here are a couple of pointers into the code. There are two ways to obtain a biological assembly:
101
+
102
+
A) The biological assembly is re-built: the atom coordinates of the asymmetric unit are transformed (rotated and translated) according to the instructions in the files. The information required to re-create the biological assemblies is available in both the PDB and mmCIF/PDBx files.
103
+
104
+
In PDB files the relevant transformations are stored in the *REMARK 350* records. For mmCIF/PDBx, the *_pdbx_struct_assembly* and *_pdbx_struct_oper_list* categories store the corresponding rules.
105
+
106
+
B) There is also a pre-computed file available that contains an assembled version of a structure. This file can be parsed directly, without having to perform rotation operations on coordinates.
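To make option A more concrete, here is a minimal sketch of what applying the *REMARK 350* operators involves. It does not use BioJava and is not how BioJava implements this: `BiomtSketch`, `parseBiomt` and `apply` are hypothetical names, the parser assumes whitespace-separated `BIOMT1`..`BIOMT3` rows, and the numbers in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of BioJava): applies REMARK 350 BIOMT operators to a point. */
final class BiomtSketch {

    /** Collects each BIOMT1..BIOMT3 triple into one 3x4 matrix: [rotation | translation]. */
    static List<double[][]> parseBiomt(List<String> remarkLines) {
        List<double[][]> operators = new ArrayList<>();
        double[][] current = null;
        for (String line : remarkLines) {
            String[] t = line.trim().split("\\s+");
            // Assumed token layout: REMARK 350 BIOMTn <serial> r1 r2 r3 t
            if (t.length != 8 || !t[0].equals("REMARK") || !t[1].equals("350")
                    || !t[2].matches("BIOMT[123]")) {
                continue; // skip all other REMARK 350 lines
            }
            int row = t[2].charAt(5) - '1'; // BIOMT1 -> row 0, BIOMT2 -> row 1, BIOMT3 -> row 2
            if (row == 0) {
                current = new double[3][4]; // a BIOMT1 row starts a new operator
                operators.add(current);
            }
            if (current == null) {
                continue; // BIOMT2/3 without a preceding BIOMT1: ignore
            }
            for (int c = 0; c < 4; c++) {
                current[row][c] = Double.parseDouble(t[4 + c]);
            }
        }
        return operators;
    }

    /** Returns R * p + t for a 3x4 operator m and a point p = (x, y, z). */
    static double[] apply(double[][] m, double[] p) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3];
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up records: the identity, plus a 180 degree rotation about z with a shift along x.
        List<String> remark350 = Arrays.asList(
            "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
            "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
            "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
            "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000       10.00000",
            "REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
            "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000");
        double[] atom = {1.0, 2.0, 3.0};
        // Each operator yields one copy of the atom in the assembly.
        for (double[][] op : parseBiomt(remark350)) {
            System.out.println(Arrays.toString(apply(op, atom)));
        }
    }
}
```

Re-building an assembly this way amounts to repeating `apply` for every atom of the selected chains and every operator, which is why the assembly can be many times larger than the asymmetric unit.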
107
+
108
+
BioJava contains utility classes to re-create biological assemblies for both PDB and mmCIF, as well as to parse the pre-computed file. The [BioUnitDataProvider](http://www.biojava.org/docs/api/org/biojava/bio/structure/quaternary/io/BioUnitDataProvider.html) interface defines what is required to re-build an assembly. The [BioUnitDataProviderFactory](http://www.biojava.org/docs/api/org/biojava/bio/structure/quaternary/io/BioUnitDataProviderFactory.html) lets you specify which BioUnitDataProvider is used.
109
+
110
+
Take a look at the method getBiologicalAssembly() in [StructureIO](http://www.biojava.org/docs/api/org/biojava/bio/structure/io/StructureIO.html) to see how the BioUnitDataProviders are used by the *BiologicalAssemblyBuilder*.
111
+
112
+
## Memory consumption
113
+
114
+
This next example loads the structure of the PBCV-1 virus capsid (PDB ID [1M4X](http://www.rcsb.org/pdb/explore.do?structureId=1m4x)). Its biological assembly is one of the largest, if not the largest, currently available in the PDB: it consists of 16 million atoms!
115
+
116
+
<table>
117
+
<tr>
118
+
<td>
119
+
<img src="img/1m4x_bio_r_250.jpg"/>
120
+
</td>
121
+
</tr>
122
+
<tr>
123
+
<td>
124
+
The biological assembly of the PBCV-1 virus capsid. (image source: <a href="http://www.rcsb.org/pdb/explore.do?structureId=1m4x">RCSB</a>)
125
+
</td>
126
+
</tr>
127
+
</table>
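To get a feel for how much heap a loaded structure occupies, the JVM can be asked directly through `java.lang.Runtime`. The following is a minimal sketch under stated assumptions: `MemoryReport`, `usedBytes` and `describe` are hypothetical names (not BioJava API), and a plain coordinate array stands in for a real structure. The numbers are approximate, because `gc()` is only a hint to the JVM.

```java
/** Hypothetical helper (not part of BioJava): reports JVM heap usage in MB. */
final class MemoryReport {

    private static final long MB = 1024L * 1024L;

    /** Heap currently in use: allocated heap minus the free part of it. */
    static long usedBytes(Runtime r) {
        return r.totalMemory() - r.freeMemory();
    }

    /** One-line summary of total, used, free and maximum heap, in MB. */
    static String describe(Runtime r) {
        return String.format("Total %dMB, Used %dMB, Free %dMB, Max %dMB",
                r.totalMemory() / MB,
                usedBytes(r) / MB,
                r.freeMemory() / MB,
                r.maxMemory() / MB);
    }

    public static void main(String[] args) {
        Runtime r = Runtime.getRuntime();

        r.gc(); // request a collection so the baseline is less noisy (not guaranteed to run)
        long before = usedBytes(r);

        // Stand-in for a loaded structure: one million (x, y, z) coordinate triples.
        double[][] coords = new double[1_000_000][3];

        r.gc();
        long after = usedBytes(r);

        System.out.println(describe(r));
        System.out.println("Approximate growth: " + (after - before) / MB
                + " MB for " + coords.length + " points");
    }
}
```

If the maximum heap reported here is smaller than what a structure needs, the JVM has to be started with a larger limit, for example via the standard `-Xmx` option.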
128
+
129
+
To load the pre-assembled biological assembly file directly, one can tweak the low-level PDB file parser like this:
130
+
131
+
<pre>
132
+
public static void main(String[] args){
133
+
134
+
public static void main(String[] args){
135
+
136
+
// This loads the PBCV-1 virus capsid, one of the biggest biological assemblies, if not the biggest, in terms of number of atoms.
137
+
// The 1m4x.pdb1.gz file is 313 MB (compressed)
138
+
// Loading this Structure requires about 8 GB of memory.
139
+
140
+
String pdbId = "1M4X";
141
+
142
+
Structure bigStructure = readStructure(pdbId,1);
143
+
144
+
// let's take a look at how much memory this currently consumes
145
+
146
+
Runtime r = Runtime.getRuntime();
147
+
148
+
// let's try to trigger the Java garbage collector (gc() is only a request)
149
+
r.gc();
150
+
151
+
System.out.println("Memory consumption after " + pdbId +
152
+
" structure has been loaded into memory:");
153
+
154
+
String mem = String.format("Total %dMB, Used %dMB, Free %dMB, Max %dMB",