-
Notifications
You must be signed in to change notification settings - Fork 395
Fix #703: Recover from empty structure files in PDB_CACHE_DIR #774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
5c1a335
4785d57
c3cec91
b207d34
6ea9ae7
62fb2a4
f5b325f
13f267c
c1c3c30
0c10a2e
08ab3e1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
The main change hasn't been implemented, so we want tests to fail. However, the tests exposed some NPE and IO exceptions. These are now fixed, so the tests fail in the expected manner. - Use ATP ligand, which is not covered by the ReducedChemCompProvider - Use the ReducedChemCompProvider as a fallback consistently, preventing null chemComp - Defensive parsing in SimpleMMcifConsumer - Robust test, doesn't require internet!
- Loading branch information
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -85,6 +85,8 @@ public class DownloadChemCompProvider implements ChemCompProvider { | |
| protectedIDs.add("AUX"); | ||
| protectedIDs.add("NUL"); | ||
| } | ||
|
|
||
| private static ChemCompProvider fallback = null; // Fallback provider if the download fails | ||
|
|
||
| /** by default we will download only some of the files. User has to request that all files should be downloaded... | ||
| * | ||
|
|
@@ -272,7 +274,10 @@ public ChemComp getChemComp(String recordName) { | |
|
|
||
| ChemComp chemComp = dict.getChemComp(recordName); | ||
|
|
||
| return chemComp; | ||
| // May be null if the file was corrupt. Fall back on ReducedChemCompProvider in that case | ||
| if(chemComp != null) { | ||
| return chemComp; | ||
| } | ||
|
|
||
| } catch (IOException e) { | ||
|
|
||
|
|
@@ -296,9 +301,11 @@ public ChemComp getChemComp(String recordName) { | |
|
|
||
| // see https://github.com/biojava/biojava/issues/315 | ||
| // probably a network error happened. Try to use the ReducedChemCOmpProvider | ||
| ReducedChemCompProvider reduced = new ReducedChemCompProvider(); | ||
| if( fallback == null) { | ||
| fallback = new ReducedChemCompProvider(); | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. creating a ReducedChemCompProvider is cheap (and idempotent), but let's cache it anyways. |
||
| } | ||
|
|
||
| return reduced.getChemComp(recordName); | ||
| return fallback.getChemComp(recordName); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a warning logged when the fallback is used? That'd be important, so that users are aware and can investigate if there's something wrong in their code or file system
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea. |
||
|
|
||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -328,7 +328,13 @@ public void newAtomSite(AtomSite atom) { | |
|
|
||
| String recordName = atom.getGroup_PDB(); | ||
| String residueNumberS = atom.getAuth_seq_id(); | ||
| Integer residueNrInt = Integer.parseInt(residueNumberS); | ||
| Integer residueNrInt; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you handle this in a separate pull request? as per discussion in #775 |
||
| if(residueNumberS != null) { | ||
| residueNrInt = Integer.parseInt(residueNumberS); | ||
| } else { | ||
| String label_seq_id = atom.getLabel_seq_id(); | ||
| residueNrInt = Integer.parseInt(label_seq_id); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should discuss this further. I added it because the atp.cif.gz file I added (from pymol) has label_seq_id but not auth_seq_id. It seems reasonable to me to fall back on the sequential numbering if the authors don't specify a custom numbering. Note that
Ideally BioJava would use
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's another potential issue here, which is that the seq_id is stored as a
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I started #775 to continue this discussion. |
||
| } | ||
|
|
||
| // the 3-letter name of the group: | ||
| String groupCode3 = atom.getLabel_comp_id(); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,10 +29,10 @@ | |
| import java.io.File; | ||
| import java.io.FileOutputStream; | ||
| import java.io.IOException; | ||
| import java.net.URL; | ||
| import java.nio.file.Files; | ||
| import java.nio.file.Path; | ||
| import java.nio.file.Paths; | ||
| import java.nio.file.attribute.FileAttribute; | ||
| import java.text.ParseException; | ||
| import java.text.SimpleDateFormat; | ||
| import java.util.Date; | ||
|
|
@@ -53,8 +53,9 @@ | |
| import org.biojava.nbio.structure.io.LocalPDBDirectory; | ||
| import org.biojava.nbio.structure.io.LocalPDBDirectory.FetchBehavior; | ||
| import org.biojava.nbio.structure.io.LocalPDBDirectory.ObsoleteBehavior; | ||
| import org.biojava.nbio.structure.io.util.FileDownloadUtils; | ||
| import org.biojava.nbio.structure.io.MMCIFFileReader; | ||
| import org.biojava.nbio.structure.io.mmcif.model.ChemComp; | ||
| import org.biojava.nbio.structure.io.util.FileDownloadUtils; | ||
| import org.biojava.nbio.structure.scop.ScopDatabase; | ||
| import org.biojava.nbio.structure.scop.ScopFactory; | ||
| import org.junit.After; | ||
|
|
@@ -364,17 +365,34 @@ public void testEmptyChemComp() throws IOException, StructureException { | |
| cache.setUseMmCif(true); | ||
|
|
||
| // Create an empty chemcomp | ||
| Path sub = tmpCache.resolve(Paths.get("chemcomp", "ALA.cif.gz")); | ||
| Files.createDirectories(sub.getParent()); | ||
| Files.createFile(sub); | ||
| assertTrue(Files.exists(sub)); | ||
| assertEquals(0, Files.size(sub)); | ||
| Path chemCompCif = tmpCache.resolve(Paths.get("chemcomp", "ATP.cif.gz")); | ||
| Files.createDirectories(chemCompCif.getParent()); | ||
| Files.createFile(chemCompCif); | ||
| assertTrue(Files.exists(chemCompCif)); | ||
| assertEquals(0, Files.size(chemCompCif)); | ||
|
|
||
| // Copy stub file into place | ||
| Path testCif = tmpCache.resolve(Paths.get("data", "structures", "divided", "mmCIF", "ab","1abc.cif.gz")); | ||
| Files.createDirectories(testCif.getParent()); | ||
| URL resource = AtomCacheTest.class.getResource("/atp.cif.gz"); | ||
| File src = new File(resource.getPath()); | ||
| FileDownloadUtils.copy(src, testCif.toFile()); | ||
|
|
||
| Structure s = cache.getStructure("1A4W"); | ||
| // Load structure | ||
| Structure s = cache.getStructure("1ABC"); | ||
|
|
||
| assertNotNull(s); | ||
|
|
||
| Group g = s.getChainByPDB("A").getAtomGroup(0); | ||
| assertTrue(g.getPDBName().equals("ATP")); | ||
|
|
||
| // should be unknown | ||
| ChemComp chem = g.getChemComp(); | ||
| assertNotNull(chem); | ||
| assertTrue(chem.getAtoms().size() > 0); | ||
| assertEquals("NON-POLYMER", chem.getType()); | ||
| } finally { | ||
| FileDownloadUtils.deleteDirectory(tmpCache); | ||
| // FileDownloadUtils.deleteDirectory(tmpCache); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why commented?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed |
||
| } | ||
| } | ||
|
|
||
|
|
@@ -387,6 +405,7 @@ public void testEmptyChemComp() throws IOException, StructureException { | |
| */ | ||
| @Test | ||
| public void testEmptyGZChemComp() throws IOException, StructureException { | ||
|
|
||
| Path tmpCache = Paths.get(System.getProperty("java.io.tmpdir"),"BIOJAVA_TEST_CACHE"); | ||
| logger.info("Testing AtomCache at {}", tmpCache.toString()); | ||
| System.setProperty(UserConfiguration.PDB_DIR, tmpCache.toString()); | ||
|
|
@@ -400,7 +419,7 @@ public void testEmptyGZChemComp() throws IOException, StructureException { | |
| cache.setUseMmCif(true); | ||
|
|
||
| // Create an empty chemcomp | ||
| Path sub = tmpCache.resolve(Paths.get("chemcomp", "ALA.cif.gz")); | ||
| Path sub = tmpCache.resolve(Paths.get("chemcomp", "ATP.cif.gz")); | ||
| Files.createDirectories(sub.getParent()); | ||
| try(GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream(sub.toFile()))) { | ||
| // don't write anything | ||
|
|
@@ -409,12 +428,28 @@ public void testEmptyGZChemComp() throws IOException, StructureException { | |
| assertTrue(Files.exists(sub)); | ||
| assertTrue(0 < Files.size(sub)); | ||
|
|
||
| Structure s = cache.getStructure("1A4W"); | ||
| // Copy stub file into place | ||
| Path testCif = tmpCache.resolve(Paths.get("data", "structures", "divided", "mmCIF", "ab","1abc.cif.gz")); | ||
| Files.createDirectories(testCif.getParent()); | ||
| URL resource = AtomCacheTest.class.getResource("/atp.cif.gz"); | ||
| File src = new File(resource.getPath()); | ||
| FileDownloadUtils.copy(src, testCif.toFile()); | ||
|
|
||
| // Load structure | ||
| Structure s = cache.getStructure("1ABC"); | ||
|
|
||
| assertNotNull(s); | ||
|
|
||
| Group g = s.getChainByPDB("A").getAtomGroup(0); | ||
| assertTrue(g.getPDBName().equals("ATP")); | ||
|
|
||
| // should be unknown | ||
| ChemComp chem = g.getChemComp(); | ||
| assertNotNull(chem); | ||
| assertTrue(chem.getAtoms().size() > 0); | ||
| assertEquals("NON-POLYMER", chem.getType()); | ||
| } finally { | ||
| FileDownloadUtils.deleteDirectory(tmpCache); | ||
| // FileDownloadUtils.deleteDirectory(tmpCache); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why commented? Is cleaning up not needed here? If not needed please remove the try/finally
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed. Cleaning up /tmp is polite, but makes it harder to debug failing tests. |
||
| } | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a small change in behavior. If we can't read the downloaded chemcomp file, we now return a stub chemcomp rather than NULL. Previously the stub was returned for corrupt GZIP files (which threw an error) but not for malformed cif contents (which just returned null).