Skip to content
Merged
Prev Previous commit
Next Next commit
Make tests for #703 fail with an assertion
The main change hasn't been implemented, so we want tests to fail.
However, the tests exposed some NPE and IO exceptions. These are now
fixed, so the tests fail in the expected manner.

- Use ATP ligand, which is not covered by the ReducedChemCompProvider
- Use the ReducedChemCompProvider as a fallback consistently, preventing
  null chemComp
- Defensive parsing in SimpleMMcifConsumer
- Robust test, doesn't require internet!
  • Loading branch information
sbliven committed Jun 8, 2018
commit b207d34bfe946d4eb8eccfb3483cf97b2b8bb769
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,8 @@ public class DownloadChemCompProvider implements ChemCompProvider {
protectedIDs.add("AUX");
protectedIDs.add("NUL");
}

private static ChemCompProvider fallback = null; // Fallback provider if the download fails

/** by default we will download only some of the files. User has to request that all files should be downloaded...
*
Expand Down Expand Up @@ -272,7 +274,10 @@ public ChemComp getChemComp(String recordName) {

ChemComp chemComp = dict.getChemComp(recordName);

return chemComp;
// May be null if the file was corrupt. Fall back on ReducedChemCompProvider in that case
if(chemComp != null) {
return chemComp;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a small change in behavior. If we can't read the downloaded chemcomp file, we now return a stub chemcomp rather than NULL. Previously the stub was returned for corrupt GZIP files (which threw an error) but not for malformed cif contents (which just returned null).

}

} catch (IOException e) {

Expand All @@ -296,9 +301,11 @@ public ChemComp getChemComp(String recordName) {

// see https://github.com/biojava/biojava/issues/315
// probably a network error happened. Try to use the ReducedChemCOmpProvider
ReducedChemCompProvider reduced = new ReducedChemCompProvider();
if( fallback == null) {
fallback = new ReducedChemCompProvider();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating a ReducedChemCompProvider is cheap (and idempotent), but let's cache it anyways.

}

return reduced.getChemComp(recordName);
return fallback.getChemComp(recordName);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a warning logged when the fallback is used? That'd be important, so that users are aware and can investigate if there's something wrong in their code or file system

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.


}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -328,7 +328,13 @@ public void newAtomSite(AtomSite atom) {

String recordName = atom.getGroup_PDB();
String residueNumberS = atom.getAuth_seq_id();
Integer residueNrInt = Integer.parseInt(residueNumberS);
Integer residueNrInt;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you handle this in a separate pull request? as per discussion in #775

if(residueNumberS != null) {
residueNrInt = Integer.parseInt(residueNumberS);
} else {
String label_seq_id = atom.getLabel_seq_id();
residueNrInt = Integer.parseInt(label_seq_id);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label_seq_id is not the same as residue number. This can introduce many problems.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss this further.

I added it because the atp.cif.gz file I added (from pymol) has label_seq_id but not auth_seq_id. It seems reasonable to me to fall back on the sequential numbering if the authors don't specify a custom numbering.

Note that auth_seq_id is optional but label_seq_id is required. From the spec it sounds to me like label_seq_id is a good fallback for ResidueNumber:

_atom_site.auth_seq_id:

An alternative identifier for _atom_site.label_seq_id that
may be provided by an author in order to match the identification
used in the publication that describes the structure.

Ideally BioJava would use Group.getId() rather then getResidueNumber() everywhere. However, many things still use residue numbers, so setting a default seems prudent.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another potential issue here, which is that the seq_id is stored as a long while ResidueNumber contains an Integer. I don't think there are any files with >2billion groups per entity, but if we hit it then the ResidueNumber would throw a NumberFormatException here. I think that's fine.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started #775 to continue this discussion.

}

// the 3-letter name of the group:
String groupCode3 = atom.getLabel_comp_id();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,10 +29,10 @@
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileAttribute;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
Expand All @@ -53,8 +53,9 @@
import org.biojava.nbio.structure.io.LocalPDBDirectory;
import org.biojava.nbio.structure.io.LocalPDBDirectory.FetchBehavior;
import org.biojava.nbio.structure.io.LocalPDBDirectory.ObsoleteBehavior;
import org.biojava.nbio.structure.io.util.FileDownloadUtils;
import org.biojava.nbio.structure.io.MMCIFFileReader;
import org.biojava.nbio.structure.io.mmcif.model.ChemComp;
import org.biojava.nbio.structure.io.util.FileDownloadUtils;
import org.biojava.nbio.structure.scop.ScopDatabase;
import org.biojava.nbio.structure.scop.ScopFactory;
import org.junit.After;
Expand Down Expand Up @@ -364,17 +365,34 @@ public void testEmptyChemComp() throws IOException, StructureException {
cache.setUseMmCif(true);

// Create an empty chemcomp
Path sub = tmpCache.resolve(Paths.get("chemcomp", "ALA.cif.gz"));
Files.createDirectories(sub.getParent());
Files.createFile(sub);
assertTrue(Files.exists(sub));
assertEquals(0, Files.size(sub));
Path chemCompCif = tmpCache.resolve(Paths.get("chemcomp", "ATP.cif.gz"));
Files.createDirectories(chemCompCif.getParent());
Files.createFile(chemCompCif);
assertTrue(Files.exists(chemCompCif));
assertEquals(0, Files.size(chemCompCif));

// Copy stub file into place
Path testCif = tmpCache.resolve(Paths.get("data", "structures", "divided", "mmCIF", "ab","1abc.cif.gz"));
Files.createDirectories(testCif.getParent());
URL resource = AtomCacheTest.class.getResource("/atp.cif.gz");
File src = new File(resource.getPath());
FileDownloadUtils.copy(src, testCif.toFile());

Structure s = cache.getStructure("1A4W");
// Load structure
Structure s = cache.getStructure("1ABC");

assertNotNull(s);

Group g = s.getChainByPDB("A").getAtomGroup(0);
assertTrue(g.getPDBName().equals("ATP"));

// should be unknown
ChemComp chem = g.getChemComp();
assertNotNull(chem);
assertTrue(chem.getAtoms().size() > 0);
assertEquals("NON-POLYMER", chem.getType());
} finally {
FileDownloadUtils.deleteDirectory(tmpCache);
// FileDownloadUtils.deleteDirectory(tmpCache);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why commented?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
}

Expand All @@ -387,6 +405,7 @@ public void testEmptyChemComp() throws IOException, StructureException {
*/
@Test
public void testEmptyGZChemComp() throws IOException, StructureException {

Path tmpCache = Paths.get(System.getProperty("java.io.tmpdir"),"BIOJAVA_TEST_CACHE");
logger.info("Testing AtomCache at {}", tmpCache.toString());
System.setProperty(UserConfiguration.PDB_DIR, tmpCache.toString());
Expand All @@ -400,7 +419,7 @@ public void testEmptyGZChemComp() throws IOException, StructureException {
cache.setUseMmCif(true);

// Create an empty chemcomp
Path sub = tmpCache.resolve(Paths.get("chemcomp", "ALA.cif.gz"));
Path sub = tmpCache.resolve(Paths.get("chemcomp", "ATP.cif.gz"));
Files.createDirectories(sub.getParent());
try(GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream(sub.toFile()))) {
// don't write anything
Expand All @@ -409,12 +428,28 @@ public void testEmptyGZChemComp() throws IOException, StructureException {
assertTrue(Files.exists(sub));
assertTrue(0 < Files.size(sub));

Structure s = cache.getStructure("1A4W");
// Copy stub file into place
Path testCif = tmpCache.resolve(Paths.get("data", "structures", "divided", "mmCIF", "ab","1abc.cif.gz"));
Files.createDirectories(testCif.getParent());
URL resource = AtomCacheTest.class.getResource("/atp.cif.gz");
File src = new File(resource.getPath());
FileDownloadUtils.copy(src, testCif.toFile());

// Load structure
Structure s = cache.getStructure("1ABC");

assertNotNull(s);

Group g = s.getChainByPDB("A").getAtomGroup(0);
assertTrue(g.getPDBName().equals("ATP"));

// should be unknown
ChemComp chem = g.getChemComp();
assertNotNull(chem);
assertTrue(chem.getAtoms().size() > 0);
assertEquals("NON-POLYMER", chem.getType());
} finally {
FileDownloadUtils.deleteDirectory(tmpCache);
// FileDownloadUtils.deleteDirectory(tmpCache);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why commented? Is cleaning up not needed here?

If not needed please remove the try/finally

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Cleaning up /tmp is polite, but makes it harder to debug failing tests.

}
}

Expand Down
Binary file added biojava-structure/src/test/resources/atp.cif.gz
Binary file not shown.