
Commit 6916925

author
Andreas Prlic
committed
small improvements
1 parent 0814e48 commit 6916925

2 files changed

Lines changed: 120 additions & 1 deletion


structure/bioassembly.md

Lines changed: 120 additions & 1 deletion
@@ -72,7 +72,7 @@ Let's load both representations of hemoglobin PDB ID [1HHO](http://www.rcsb.org/
7272
</tr>
7373
</table>
7474

75-
As we can see, the two representations are quite different! When investigating protein interfaces, ligand binding and for many other applications, you always want to work with the biological assemblies!
75+
As we can see, the two representations are quite different! When investigating protein interfaces, ligand binding and for many other applications, you always want to work with the biological assemblies.
7676

7777
Here is another example, the bacteriophage GA protein capsid PDB ID [1GAV](http://www.rcsb.org/pdb/explore.do?structureId=1gav)
7878

@@ -95,6 +95,125 @@ Here another example, the bacteriophave GA protein capsid PDB ID [1GAV](http://w
9595
</tr>
9696
</table>
9797

98+
## Re-creating Biological Assemblies
99+
100+
Since biological assemblies can be accessed via the StructureIO interface, there is in principle no need to use the lower-level BioJava code that re-creates biological assemblies. If you are interested in the gory details, here are a couple of pointers into the code. There are two ways to obtain a biological assembly:
101+
102+
A) The biological assembly is re-built: the atom coordinates of the asymmetric unit are rotated and translated according to the instructions in the files. The information required to re-create the biological assemblies is available in both the PDB and mmCIF/PDBx files.
103+
104+
In PDB files the relevant transformations are stored in the *REMARK 350* records. For mmCIF/PDBx, the *_pdbx_struct_assembly* and *_pdbx_struct_oper_list* categories store the corresponding rules.
105+
106+
B) There is also a pre-computed file available that contains an assembled version of a structure. This file can be parsed directly, without having to perform rotation operations on coordinates.
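The coordinate transformation behind option A can be sketched in plain Java. This is a minimal illustration, not BioJava's actual implementation: the class `AssemblyOperatorSketch` and its `apply` method are hypothetical names, and the sketch assumes each assembly operator is given as a 3x3 rotation matrix plus a translation vector, which is how such operators are commonly written.

```java
// Hypothetical helper, not part of BioJava: applies one assembly operator
// (a rotation matrix plus a translation vector) to a set of atom coordinates.
final class AssemblyOperatorSketch {

    /** Returns new coordinates x' = R * x + t for every input point. */
    static double[][] apply(double[][] rotation, double[] translation, double[][] coords) {
        double[][] out = new double[coords.length][3];
        for (int a = 0; a < coords.length; a++) {
            for (int i = 0; i < 3; i++) {
                // start from the translation component, then add the rotated part
                double v = translation[i];
                for (int j = 0; j < 3; j++) {
                    v += rotation[i][j] * coords[a][j];
                }
                out[a][i] = v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Example operator: 180 degree rotation about the z axis, shifted by 10 along x.
        double[][] rot = { {-1, 0, 0}, {0, -1, 0}, {0, 0, 1} };
        double[] trans = { 10, 0, 0 };
        double[][] atoms = { {1, 2, 3} };

        double[][] moved = apply(rot, trans, atoms);
        System.out.println(java.util.Arrays.toString(moved[0])); // prints [9.0, -2.0, 3.0]
    }
}
```

Re-building an assembly then amounts to applying every operator listed for that assembly to a copy of the relevant chains and collecting the transformed copies. The input coordinates are left untouched so that the asymmetric unit can be reused for the next operator.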
107+
108+
BioJava contains utility classes to re-create biological assemblies for both PDB and mmCIF, as well as to parse the pre-computed file. The [BioUnitDataProvider](http://www.biojava.org/docs/api/org/biojava/bio/structure/quaternary/io/BioUnitDataProvider.html) interface defines what is required to re-build an assembly. The [BioUnitDataProviderFactory](http://www.biojava.org/docs/api/org/biojava/bio/structure/quaternary/io/BioUnitDataProviderFactory.html) lets you specify which of the BioUnitDataProviders is used.
109+
110+
Take a look at the method getBiologicalAssembly() in [StructureIO](http://www.biojava.org/docs/api/org/biojava/bio/structure/io/StructureIO.html) to see how the BioUnitDataProviders are used by the *BiologicalAssemblyBuilder*.
111+
112+
## Memory consumption
113+
114+
This next example loads the structure of the PBCV-1 virus capsid (PDB ID [1M4X](http://www.rcsb.org/pdb/explore.do?structureId=1m4x)). It has one of the largest biological assemblies, if not the largest, currently available in the PDB. It consists of 16 million atoms!
115+
116+
<table>
117+
<tr>
118+
<td>
119+
<img src="img/1m4x_bio_r_250.jpg"/>
120+
</td>
121+
</tr>
122+
<tr>
123+
<td>
124+
The biological assembly of the PBCV-1 virus capsid. (image source: <a href="http://www.rcsb.org/pdb/explore.do?structureId=1m4x">RCSB</a>)
125+
</td>
126+
</tr>
127+
</table>
128+
129+
To load the pre-assembled biological assembly file directly, one can tweak the low-level PDB file parser like this:
130+
131+
<pre>
// Imports assume the BioJava 3.x package layout used by the API links above.
import org.biojava.bio.structure.Structure;
import org.biojava.bio.structure.StructureTools;
import org.biojava.bio.structure.align.util.AtomCache;
import org.biojava.bio.structure.io.FileParsingParameters;
import org.biojava.bio.structure.io.PDBFileReader;

// The class name is only a placeholder so that the example compiles.
public class LoadLargeBioAssembly {

    public static void main(String[] args) {

        // This loads the PBCV-1 virus capsid, one of the biggest biological
        // assemblies, if not the biggest, in terms of number of atoms.
        // The 1m4x.pdb1.gz file has 313 MB (compressed).
        // Loading this Structure requires about 8 GB of memory.

        String pdbId = "1M4X";

        Structure bigStructure = readStructure(pdbId, 1);

        // let's take a look at how much memory this currently consumes
        Runtime r = Runtime.getRuntime();

        // let's try to trigger the Java garbage collector
        r.gc();

        System.out.println("Memory consumption after " + pdbId +
                " structure has been loaded into memory:");

        String mem = String.format("Total %dMB, Used %dMB, Free %dMB, Max %dMB",
                r.totalMemory() / 1048576,
                (r.totalMemory() - r.freeMemory()) / 1048576,
                r.freeMemory() / 1048576,
                r.maxMemory() / 1048576);

        System.out.println(mem);

        System.out.println("# atoms: " + StructureTools.getNrAtoms(bigStructure));
    }

    /** Load a specific biological assembly for a PDB entry
     *
     * @param pdbId .. the PDB ID
     * @param bioAssemblyId .. the first assembly has the bioAssemblyId 1
     * @return a Structure object or null if something went wrong.
     */
    public static Structure readStructure(String pdbId, int bioAssemblyId) {

        // pre-computed files use lower case PDB IDs
        pdbId = pdbId.toLowerCase();

        // we need to tweak the file parsing parameters a bit
        FileParsingParameters p = new FileParsingParameters();

        // some bio assemblies are large, we want an all-atom representation and avoid
        // switching to a Calpha-only representation for large molecules
        // note, this requires several GB of memory for some of the largest assemblies, such as 1M4X
        p.setAtomCaThreshold(Integer.MAX_VALUE);

        // parse remark 350
        p.setParseBioAssembly(true);

        // the low-level PDB file parser
        PDBFileReader pdbreader = new PDBFileReader();

        // we just need this to track where to store PDB files
        // this checks the PDB_DIR property (and uses a tmp location if not set)
        AtomCache cache = new AtomCache();
        pdbreader.setPath(cache.getPath());

        pdbreader.setFileParsingParameters(p);

        // download missing files
        pdbreader.setAutoFetch(true);

        pdbreader.setBioAssemblyId(bioAssemblyId);
        pdbreader.setBioAssemblyFallback(false);

        Structure structure = null;
        try {
            structure = pdbreader.getStructureById(pdbId);
            if (bioAssemblyId > 0) {
                structure.setBiologicalAssembly(true);
            }
            structure.setPDBCode(pdbId);
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
        return structure;
    }
}
</pre>
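Because an assembly of this size needs about 8 GB of memory, it can help to check the configured maximum heap before attempting the load. The following is a minimal sketch using only the standard `Runtime` API. The class `HeapCheckSketch` and the method `isHeapSufficient` are hypothetical names, and the 8 GB threshold is simply the figure quoted in the example above, not a measured requirement.

```java
// Hypothetical pre-flight check, not part of BioJava.
final class HeapCheckSketch {

    static final long MB = 1024L * 1024L;

    /** True if the maximum heap is at least as large as the estimated requirement. */
    static boolean isHeapSufficient(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // assumption: roughly 8 GB, the figure quoted above for 1M4X
        long required = 8L * 1024L * MB;
        long max = Runtime.getRuntime().maxMemory();

        if (isHeapSufficient(max, required)) {
            System.out.println("Max heap of " + (max / MB) + " MB should be enough.");
        } else {
            System.err.println("Max heap is only " + (max / MB) + " MB, about "
                    + (required / MB) + " MB are needed. Consider raising -Xmx.");
        }
    }
}
```

Keeping the comparison in a separate method that takes both values as arguments makes the check easy to reuse with other size estimates.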
216+
98217

99218
## Further Reading
100219

structure/img/1m4x_bio_r_250.jpg

21.1 KB