Add protein alias and gene alias to uniprot extract#568

Merged
emckee2006 merged 4 commits into
biojava:masterfrom
emckee2006:master
Aug 25, 2016

Conversation

@emckee2006

Contributor

No description provided.

emckee2006 and others added 4 commits May 12, 2016 02:11
Catch back up with biojava/master
Catch back up with biojava/master
# Conflicts:
#	biojava-structure/src/main/java/org/biojava/nbio/structure/io/PDBFileParser.java
@emckee2006 emckee2006 merged commit aaaa2e8 into biojava:master Aug 25, 2016
@andreasprlic

Member

Just to add: for a complete representation of UniProt in a Java datamodel, see here:

https://github.com/rcsb/uniprot-or-mapping

@emckee2006

Contributor Author

Doesn't that only work if one is persisting it in a database in that format?

@andreasprlic

Member

You can just read from UniProt XML, without a DB.

@andreasprlic

Member

Just added some more documentation there, with an example of how to read a UniProt XML file into a Java object. For example, this can be done:

            URL u = UniProtTools.getURLforXML(accession);
            InputStream inStream = u.openStream();
            Uniprot up = UniProtTools.readUniProtFromInputStream(inStream);
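
For readers without the uniprot-or-mapping classes on the classpath, here is a minimal sketch of pulling protein and gene aliases out of a UniProt-style XML entry using only the JDK's StAX parser. It is not the code from this pull request and not the `UniProtTools` API. The class name, the sample entry, and the choice to treat `alternativeName`/`fullName` and gene `name` elements of type `synonym` as "aliases" are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of BioJava or uniprot-or-mapping.
final class UniProtAliasSketch {

    // Hand-written, made-up entry that mimics the UniProt XML element layout.
    static final String SAMPLE =
            "<uniprot xmlns=\"http://uniprot.org/uniprot\"><entry>"
            + "<accession>X00001</accession>"
            + "<protein>"
            + "<recommendedName><fullName>Example protein</fullName></recommendedName>"
            + "<alternativeName><fullName>Example alias</fullName></alternativeName>"
            + "</protein>"
            + "<gene><name type=\"primary\">EXA1</name>"
            + "<name type=\"synonym\">EXA1B</name></gene>"
            + "</entry></uniprot>";

    /** Returns alternative protein names under "protein" and gene synonyms under "gene". */
    static Map<String, List<String>> extractAliases(String xml) {
        List<String> proteinAliases = new ArrayList<>();
        List<String> geneAliases = new ArrayList<>();
        boolean inAlternativeName = false;
        boolean inGene = false;
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getLocalName() ignores the namespace, so the xmlns does not matter here.
                    String name = r.getLocalName();
                    if (name.equals("alternativeName")) {
                        inAlternativeName = true;
                    } else if (name.equals("gene")) {
                        inGene = true;
                    } else if (inAlternativeName && name.equals("fullName")) {
                        proteinAliases.add(r.getElementText());
                    } else if (inGene && name.equals("name")
                            && "synonym".equals(r.getAttributeValue(null, "type"))) {
                        geneAliases.add(r.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("alternativeName")) {
                        inAlternativeName = false;
                    } else if (name.equals("gene")) {
                        inGene = false;
                    }
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("protein", proteinAliases);
        result.put("gene", geneAliases);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extractAliases(SAMPLE));
    }
}
```

A full object mapping such as the one in uniprot-or-mapping is the better fit when most of the record is needed; a hand-rolled reader like this only makes sense for a couple of fields.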

@emckee2006

Contributor Author

Are there any examples of how to do that? That may be easier than using the proxy
reader. Currently I download the full tar.gz from UniProt, read through it
record by record, and pass each record to the proxy reader:

            UniprotProxySequenceReader upsr = UniprotProxySequenceReader.parseUniprotXMLString(
                    " " + xml.toString() + " ",
                    new AminoAcidCompoundSet());

I could see an advantage to having it all in our DB, though. However, we
are an Oracle shop and would have to create the schema ahead of time (and
change the Hibernate settings appropriately). Does it also remove obsolete
entries and update entries which have changed since the last load, or does
it just load each record once?
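
The record-by-record pass over the download described above could be sketched as follows, using only the JDK. This assumes the input is a gzip-compressed XML file rather than a tar archive (reading a tar would need an additional library), and the class and method names are hypothetical. The sample data is made up and compressed in memory so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of BioJava or uniprot-or-mapping.
final class UniProtStreamSketch {

    /** Streams a gzipped UniProt-style XML document and returns the first accession of each entry. */
    static List<String> primaryAccessions(InputStream gzipped) {
        List<String> accessions = new ArrayList<>();
        try (InputStream in = new GZIPInputStream(gzipped)) {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            boolean needAccession = false;
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = r.getLocalName();
                if (name.equals("entry")) {
                    // A new record starts; its first accession is the primary one.
                    needAccession = true;
                } else if (needAccession && name.equals("accession")) {
                    accessions.add(r.getElementText());
                    needAccession = false;
                }
            }
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML stream", e);
        }
        return accessions;
    }

    /** Compresses text in memory so the demo needs no file on disk. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Runs the reader over two made-up entries. */
    static List<String> demo() {
        String xml = "<uniprot>"
                + "<entry><accession>X00001</accession><accession>X00009</accession></entry>"
                + "<entry><accession>X00002</accession></entry>"
                + "</uniprot>";
        return primaryAccessions(new ByteArrayInputStream(gzip(xml)));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because StAX pulls one event at a time, memory use stays flat regardless of file size, which is the main reason to prefer it over building a string per record.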

@andreasprlic

Member

We can reload all of UniProt (and a small subset of TrEMBL) overnight, so we never bothered with writing an incremental update strategy.

@emckee2006

Contributor Author

So you just tend to dump it all and reload, with some sort of script which
truncates the tables?

@emckee2006

Contributor Author

Also, does everything from the UniProt record currently go into the database,
or just some of it?

@andreasprlic

Member

  • We just start with an empty database, let Hibernate create all the tables, and then we can load whatever we want (see the LoadMissing class). We have not tried to load all of TrEMBL yet, but it works fine for all of SwissProt plus the subset of TrEMBL that is linked to PDB.
  • Yes, it is a complete mapping of all of the UniProt data into the database/Java data model.

We should move this thread over to the other project :-)

@emckee2006

Contributor Author

How do we do that?
Is there a way to get a dump of the schema so I can create it locally?
Does that schema change from time to time?

@andreasprlic

Member

The UniProt schema changes quite regularly. The project has a built-in schema version that is used as a default (see the file uniprot.xsd in the resources folder). I'll set up a documentation page for how to run a database load.

@emckee2006

Contributor Author

So it's not feasible to have the DB schema beforehand and always just
truncate and load the DB?

@andreasprlic

andreasprlic commented Aug 25, 2016

Member

No, the DB schema gets generated at compile time (the data model is auto-generated from the XML schema), so it is also available beforehand. That is why the database is, in principle, not needed, and the code also works when only UniProt XML files are available. Truncating an existing DB and just loading is possible.

@andreasprlic

andreasprlic commented Aug 25, 2016

Member

Here is some documentation on how to load a DB:
https://github.com/rcsb/uniprot-or-mapping/blob/master/loaddatabase.md
