diff --git a/README.md b/README.md
Lines changed: 6 additions & 0 deletions
diff --git a/genomics/README.md b/genomics/README.md
Lines changed: 61 additions & 0 deletions
diff --git a/genomics/chromosomeposition.md b/genomics/chromosomeposition.md
Lines changed: 41 additions & 0 deletions
diff --git a/genomics/genebank.md b/genomics/genebank.md
Lines changed: 99 additions & 0 deletions
diff --git a/genomics/genenames.md b/genomics/genenames.md
Lines changed: 50 additions & 0 deletions
diff --git a/genomics/gff.md b/genomics/gff.md
Lines changed: 52 additions & 0 deletions
diff --git a/genomics/img/genomics.png b/genomics/img/genomics.png
12.2 KB
diff --git a/genomics/installation.md b/genomics/installation.md
Lines changed: 52 additions & 0 deletions
@@ -11,8 +11,14 @@ At the moment this tutorial is still under development. Please check  the [BioJa
 
 ## Index
 
+Quick [Installation](installation.md)
+
 Book 1: [The Protein Structure modules](structure/README.md)
 
+Book 2: [The Genomics Module](genomics/README.md)
+
+Book 3: Alignments
+
 
 ## License
 
 
@@ -0,0 +1,61 @@
+The BioJava - Genomics Module
+=====================================================
+
+A tutorial for the genomics module of [BioJava](http://www.biojava.org)
+
+## About
+<table>
+    <tr>
+        <td>
+            <img src="img/genomics.png"/>
+        </td>
+        <td>
+            The <i>genome</i> module of BioJava provides an API that lets you
+            <ul>
+                <li>Parse popular file formats used in genomics</li>
+                <li>Convert from one file format to another</li>
+                <li>Translate DNA sequences into protein sequences</li>                
+            </ul>
+
+        </td>
+    </tr>
+</table>   
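
The last capability in the list above, translating DNA into protein, boils down to reading the sequence three bases at a time and looking each codon up in a genetic code table. The following is a minimal, standalone sketch of that idea and not the BioJava API: `CodonTranslator` is a hypothetical class name, and the sketch assumes the standard genetic code, reading frame 1 on the forward strand, and `X` for any codon containing a base other than A, C, G or T.

```java
// Hypothetical helper for illustration only; BioJava ships its own translation classes.
final class CodonTranslator {

    // Base order used to index the table below.
    private static final String BASES = "TCAG";

    // Standard genetic code: one amino acid per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Translates complete codons in frame 1; a trailing partial codon is ignored.
    static String translate(String dna) {
        String seq = dna.toUpperCase();
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int first = BASES.indexOf(seq.charAt(i));
            int second = BASES.indexOf(seq.charAt(i + 1));
            int third = BASES.indexOf(seq.charAt(i + 2));
            if (first < 0 || second < 0 || third < 0) {
                protein.append('X'); // ambiguous or unknown base in this codon
            } else {
                protein.append(AMINO_ACIDS.charAt(16 * first + 4 * second + third));
            }
        }
        return protein.toString();
    }
}
```

Calling `CodonTranslator.translate("ATGGCCTAA")` yields `MA*`, where `*` marks the stop codon.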
+
+## Index
+
+This tutorial is split into several chapters.
+
+Chapter 1 - Quick [Installation](installation.md)
+
+Chapter 2 - Reading [gene names information](genenames.md) from genenames.org
+
+Chapter 3 - Reading [chromosomal positions](chromosomeposition.md) for genes (UCSC's refFlat.txt.gz)
+
+Chapter 4 - Reading [GTF and GFF files](gff.md)
+
+Chapter 5 - Reading and writing a [Genbank](genebank.md) file
+
+Chapter 6 - Reading [karyotype (cytoband)](karyotype.md) files
+
+Chapter 7 - Reading UCSC's .2bit files
+
+
+
+### Author: 
+
+[Andreas Prli&#263;](https://github.com/andreasprlic)
+
+## Please cite
+
+**BioJava: an open-source framework for bioinformatics in 2012**<br/>
+*Andreas Prlic; Andrew Yates; Spencer E. Bliven; Peter W. Rose; Julius Jacobsen; Peter V. Troshin; Mark Chapman; Jianjiong Gao; Chuan Hock Koh; Sylvain Foisy; Richard Holland; Gediminas Rimsa; Michael L. Heuer; H. Brandstatter-Muller; Philip E. Bourne; Scooter Willis* <br/>
+[Bioinformatics (2012) 28 (20): 2693-2695.](http://bioinformatics.oxfordjournals.org/content/28/20/2693.abstract) <br/>
+doi: 10.1093/bioinformatics/bts494
+
+## License
+
+The content of this tutorial is available under the [CC-BY](http://creativecommons.org/licenses/by/3.0/) license.
+
+[view license](../license.md)
+
+
@@ -0,0 +1,41 @@
+Parse Chromosomal Information of Genes
+======================================
+
+BioJava contains a parser for the [refFlat.txt.gz](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz) file
+from the UCSC genome browser, which contains a mapping of gene names to chromosome positions.
+
+
+```java
+try {
+    List<GeneChromosomePosition> genePositions = GeneChromosomePositionParser.getChromosomeMappings();
+    System.out.println("got " + genePositions.size() + " gene positions");
+
+    for (GeneChromosomePosition pos : genePositions) {
+        if (pos.getGeneName().equals("FOLH1")) {
+            System.out.println(pos);
+            break;
+        }
+    }
+} catch (Exception e) {
+    e.printStackTrace();
+}
+```
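
To give a feel for what the parser has to deal with, here is a rough sketch that splits a single refFlat line by hand. It is independent of BioJava: `RefFlatRecord` is a hypothetical class, and the sketch assumes the UCSC refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with comma-terminated exon lists. Only a few of the columns are kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one refFlat line; not a BioJava class.
final class RefFlatRecord {
    final String geneName;
    final String transcriptName;
    final String chromosome;
    final char strand;
    final int txStart;
    final int txEnd;
    final List<Integer> exonStarts;

    private RefFlatRecord(String geneName, String transcriptName, String chromosome,
                          char strand, int txStart, int txEnd, List<Integer> exonStarts) {
        this.geneName = geneName;
        this.transcriptName = transcriptName;
        this.chromosome = chromosome;
        this.strand = strand;
        this.txStart = txStart;
        this.txEnd = txEnd;
        this.exonStarts = exonStarts;
    }

    // Splits one tab-separated line, assuming the 11-column refFlat layout.
    static RefFlatRecord parse(String line) {
        String[] f = line.split("\t");
        if (f.length < 11) {
            throw new IllegalArgumentException("expected 11 tab-separated columns, got " + f.length);
        }
        // Exon starts are stored as "start1,start2,...," with a trailing comma.
        List<Integer> starts = new ArrayList<Integer>();
        for (String s : f[9].split(",")) {
            if (!s.isEmpty()) {
                starts.add(Integer.parseInt(s));
            }
        }
        return new RefFlatRecord(f[0], f[1], f[2], f[3].charAt(0),
                Integer.parseInt(f[4]), Integer.parseInt(f[5]), starts);
    }
}
```

For a made-up line such as `GENE1<TAB>NM_000000<TAB>chr1<TAB>+<TAB>100<TAB>500<TAB>150<TAB>450<TAB>2<TAB>100,300,<TAB>200,500,` (illustrative values, not real coordinates), `parse` returns a record with gene name `GENE1` on `chr1` and two exon starts.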
+
+If a local copy of the file is available, it can be provided as an input stream:
+
+
+```java
+// note the three slashes: with only two, "local" would be parsed as a host name
+URL url = new URL("file:///local/copy/of/file");
+
+InputStreamProvider prov = new InputStreamProvider();
+InputStream inStream = prov.getInputStream(url);
+
+GeneChromosomePositionParser.getChromosomeMappings(inStream);
+```
@@ -0,0 +1,99 @@
+Reading and writing a Genbank file
+==================================
+
+There are several ways to read a Genbank file.
+
+## Method 1: Read a Genbank file using the GenbankProxySequenceReader
+
+```java
+// The reader looks up the record by accession ID; "/tmp" is the directory used to cache the file
+GenbankProxySequenceReader<AminoAcidCompound> genbankProteinReader =
+        new GenbankProxySequenceReader<AminoAcidCompound>("/tmp", "NP_000257", AminoAcidCompoundSet.getAminoAcidCompoundSet());
+ProteinSequence proteinSequence = new ProteinSequence(genbankProteinReader);
+genbankProteinReader.getHeaderParser().parseHeader(genbankProteinReader.getHeader(), proteinSequence);
+System.out.println("Sequence" + "(" + proteinSequence.getAccession() + "," + proteinSequence.getLength() + ")="
+        + proteinSequence.getSequenceAsString().substring(0, 10) + "...");
+
+GenbankProxySequenceReader<NucleotideCompound> genbankDNAReader =
+        new GenbankProxySequenceReader<NucleotideCompound>("/tmp", "NM_001126", DNACompoundSet.getDNACompoundSet());
+DNASequence dnaSequence = new DNASequence(genbankDNAReader);
+genbankDNAReader.getHeaderParser().parseHeader(genbankDNAReader.getHeader(), dnaSequence);
+System.out.println("Sequence" + "(" + dnaSequence.getAccession() + "," + dnaSequence.getLength() + ")="
+        + dnaSequence.getSequenceAsString().substring(0, 10) + "...");
+
+```
+
+
+## Method 2: Read a Genbank file using GenbankReaderHelper
+
+```java
+	File dnaFile = new File("src/test/resources/NM_000266.gb");
+	File protFile = new File("src/test/resources/BondFeature.gb");
+
+	LinkedHashMap<String, DNASequence> dnaSequences = GenbankReaderHelper.readGenbankDNASequence( dnaFile );
+	for (DNASequence sequence : dnaSequences.values()) {
+		System.out.println( sequence.getSequenceAsString() );
+	}
+
+	LinkedHashMap<String, ProteinSequence> protSequences = GenbankReaderHelper.readGenbankProteinSequence(protFile);
+	for (ProteinSequence sequence : protSequences.values()) {
+		System.out.println( sequence.getSequenceAsString() );
+	}
+
+```
+
+## Method 3: Read a Genbank file using the GenbankReader Object
+
+```java
+	// dnaFile, protFile, dnaSequences and protSequences are the variables declared in Method 2
+	FileInputStream is = new FileInputStream(dnaFile);
+	GenbankReader<DNASequence, NucleotideCompound> dnaReader = new GenbankReader<DNASequence, NucleotideCompound>(
+	        is,
+	        new GenericGenbankHeaderParser<DNASequence,NucleotideCompound>(),
+	        new DNASequenceCreator(DNACompoundSet.getDNACompoundSet())
+	);
+	dnaSequences = dnaReader.process();
+	is.close();
+	System.out.println(dnaSequences);
+
+	is = new FileInputStream(protFile);
+	GenbankReader<ProteinSequence, AminoAcidCompound> protReader = new GenbankReader<ProteinSequence, AminoAcidCompound>(
+	        is,
+	        new GenericGenbankHeaderParser<ProteinSequence,AminoAcidCompound>(),
+	        new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet())
+	);
+	protSequences = protReader.process();
+	is.close();
+	System.out.println(protSequences);
+
+```
+
+
+# Write a Genbank file
+
+
+Use the GenbankWriterHelper to write DNA sequences into a Genbank file.
+
+```java
+
+// First, let's read some DNA sequences from a Genbank file
+File dnaFile = new File("src/test/resources/NM_000266.gb");
+LinkedHashMap<String, DNASequence> dnaSequences = GenbankReaderHelper.readGenbankDNASequence(dnaFile);
+ArrayList<DNASequence> seqs = new ArrayList<DNASequence>(dnaSequences.values());
+
+// Now that we have some DNA sequence data, the next step is to write it
+ByteArrayOutputStream fragwriter = new ByteArrayOutputStream();
+GenbankWriterHelper.writeNucleotideSequence(fragwriter, seqs,
+        GenbankWriterHelper.LINEAR_DNA);
+
+// The fragwriter object now contains a string representation in the Genbank format,
+// which you could write to a file or print to the console
+System.out.println(fragwriter.toString());
+
+```
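
The text produced by the writer follows the Genbank flat-file layout: header lines, a feature table, and then the sequence itself between an `ORIGIN` line and a closing `//` line. As a rough illustration of that last part, the sketch below pulls the raw sequence out of such text using only the standard library. `OriginExtractor` is a hypothetical helper, not part of BioJava, and it assumes a single record with numbered, space-separated sequence blocks; for real files the readers shown above are the better choice.

```java
// Hypothetical helper for illustration only; it handles a single, well-formed record.
final class OriginExtractor {

    // Returns the letters found between the ORIGIN line and the "//" terminator.
    static String extractSequence(String genbankText) {
        StringBuilder sequence = new StringBuilder();
        boolean inOrigin = false;
        for (String line : genbankText.split("\\r?\\n")) {
            if (line.startsWith("ORIGIN")) {
                inOrigin = true;      // sequence lines start after this marker
                continue;
            }
            if (line.startsWith("//")) {
                break;                // end of the record
            }
            if (inOrigin) {
                // Sequence lines look like "        1 acgtacgtac gt": drop digits and spaces.
                for (char c : line.toCharArray()) {
                    if (Character.isLetter(c)) {
                        sequence.append(c);
                    }
                }
            }
        }
        return sequence.toString();
    }
}
```

Passing `fragwriter.toString()` from the example above to `OriginExtractor.extractSequence` would be one way to check by eye that the written sequence matches the one that was read.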
@@ -0,0 +1,50 @@
+Parse Gene Name Information
+===========================
+
+The following code parses [a file from the www.genenames.org](http://www.genenames.org/cgi-bin/download?title=HGNC+output+data&hgnc_dbtag=on&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_prev_sym&col=gd_prev_name&col=gd_aliases&col=gd_pub_chrom_map&col=gd_pub_acc_ids&col=md_mim_id&col=gd_pub_refseq_ids&col=md_ensembl_id&col=md_prot_id&col=gd_hgnc_id&status=Approved&status_opt=2&where=((gd_pub_chrom_map%20not%20like%20%27%patch%%27%20and%20gd_pub_chrom_map%20not%20like%20%27%ALT_REF%%27)%20or%20gd_pub_chrom_map%20IS%20NULL)%20and%20gd_locus_group%20%3d%20%27protein-coding%20gene%27&order_by=gd_app_sym_sort&format=text&limit=&submit=submit&.cgifields=&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag)
+website that contains a mapping of human gene names to other databases.
+
+
+```java
+/**
+ * Parses a file from the genenames website.
+ *
+ * @param args not used
+ */
+public static void main(String[] args) {
+    try {
+        // getGeneNames() returns a list of beans that contain key-value pairs for each gene name
+        List<GeneName> geneNames = GeneNamesParser.getGeneNames();
+        System.out.println("got " + geneNames.size() + " gene names");
+
+        for (GeneName g : geneNames) {
+            if (g.getApprovedSymbol().equals("FOLH1")) {
+                System.out.println(g);
+            }
+        }
+    } catch (Exception e) {
+        e.printStackTrace();
+    }
+}
+```
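
Conceptually, the downloaded file is a tab-delimited table whose first line names the columns, and each following line becomes one set of key-value pairs. The sketch below shows that general idea with plain maps. It is not how `GeneNamesParser` is implemented: `TabDelimitedTable` is a hypothetical helper, and the column names in the usage note are placeholders rather than a guaranteed match for the real download.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of BioJava.
final class TabDelimitedTable {

    // Uses the first line as column names and maps every later line to name -> value.
    static List<Map<String, String>> parse(String text) {
        String[] lines = text.split("\\r?\\n");
        String[] header = lines[0].split("\t", -1);
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = lines[i].split("\t", -1);
            Map<String, String> row = new LinkedHashMap<String, String>();
            for (int c = 0; c < header.length; c++) {
                // Missing trailing columns are treated as empty strings.
                row.put(header[c], c < values.length ? values[c] : "");
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With the placeholder input `"Approved Symbol\tChromosome\nGENE1\t1p36\n"`, the result is a single row in which `get("Approved Symbol")` returns `GENE1`.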
+
+If you have a local copy of this file, then you can just provide an input stream for it:
+
+```java
+URL url = new URL("file:///local/copy/of/file");
+
+InputStreamProvider prov = new InputStreamProvider();
+InputStream inStream = prov.getInputStream(url);
+
+GeneNamesParser.getGeneNames(inStream);
+```
@@ -0,0 +1,52 @@
+Reading GFF files
+=================
+
+The biojava3-genome library leverages the sequence relationships in biojava3-core to read GTF, GFF2 and GFF3 files and
+write GFF3 files. The file formats themselves are well defined, but what gets written into a file is very
+flexible. We currently support reading GFF files generated by the open source gene prediction applications
+GeneID, GeneMark and GlimmerHMM. Each prediction algorithm uses a different ontology to describe coding sequences,
+exons, and start or stop codons, which makes it difficult to write a general-purpose GFF parser that can create biologically
+meaningful objects. If your application simply loads a GFF file and draws a colored glyph, you don't need to
+worry about the ontology used. Otherwise, it is easier to support the popular gene prediction algorithms by writing a parser that
+is aware of each gene prediction application's ontology.
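
Whatever ontology a tool uses, the line layout is shared. A GFF3 line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), and the ontology differences show up mainly in the type column and in the attribute keys. The following sketch splits one such line with the standard library only. `Gff3Feature` is a hypothetical class, not the BioJava parser, and it skips details such as URL-unescaping and comment lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one GFF3 line; not a BioJava class.
final class Gff3Feature {
    final String seqId;
    final String source;
    final String type;
    final int start;
    final int end;
    final char strand;
    final Map<String, String> attributes;

    private Gff3Feature(String seqId, String source, String type, int start, int end,
                        char strand, Map<String, String> attributes) {
        this.seqId = seqId;
        this.source = source;
        this.type = type;
        this.start = start;
        this.end = end;
        this.strand = strand;
        this.attributes = attributes;
    }

    // Parses a single feature line; expects exactly nine tab-separated columns.
    static Gff3Feature parse(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 9) {
            throw new IllegalArgumentException("expected 9 tab-separated columns, got " + f.length);
        }
        // Column 9 holds "key=value" pairs separated by semicolons.
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        for (String pair : f[8].split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                attrs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return new Gff3Feature(f[0], f[1], f[2], Integer.parseInt(f[3]),
                Integer.parseInt(f[4]), f[6].charAt(0), attrs);
    }
}
```

A tool-aware parser would then branch on `type` and on attributes such as `Parent` to assemble exons and coding sequences into genes; the values a given predictor writes there are exactly what differs between GeneID, GeneMark and GlimmerHMM output.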
+
+
+The following code example takes a 454 scaffold file that was used by GeneMark to predict genes and returns a
+collection of ChromosomeSequences. Each chromosome sequence maps to a named entry in the FASTA file and
+contains N gene sequences. The gene sequences can be on the + or - strand, with frame shifts and multiple transcripts.
+
+Passing the collection of ChromosomeSequences to GeneFeatureHelper.getProteinSequences returns all protein
+sequences. You can then write the protein sequences to a FASTA file.
+
+```java
+
+    LinkedHashMap<String, ChromosomeSequence> chromosomeSequenceList = GeneFeatureHelper.loadFastaAddGeneFeaturesFromGeneMarkGTF(new File("454Scaffolds.fna"), new File("genemark_hmm.gtf"));
+    LinkedHashMap<String, ProteinSequence> proteinSequenceList = GeneFeatureHelper.getProteinSequences(chromosomeSequenceList.values());
+    FastaWriterHelper.writeProteinSequence(new File("genemark_proteins.faa"), proteinSequenceList.values());
+```
+
+You can also output the gene sequences to a FASTA file in which the coding regions are upper case and the non-coding regions are lower case.
+
+```java
+    LinkedHashMap<String, GeneSequence> geneSequenceHashMap = GeneFeatureHelper.getGeneSequences(chromosomeSequenceList.values());
+    Collection<GeneSequence> geneSequences = geneSequenceHashMap.values();
+    FastaWriterHelper.writeGeneSequence(new File("genemark_genes.fna"), geneSequences, true);
+
+```
+
+You can easily write out a gff3 view of a ChromosomeSequence with the following code.
+
+```java
+    FileOutputStream fo = new FileOutputStream("genemark.gff3");
+    GFF3Writer gff3Writer = new GFF3Writer();
+    gff3Writer.write(fo, chromosomeSequenceList);
+    fo.close();
+```
+
+The chromosome sequence becomes the middle layer that represents the essence of what is mapped in a GTF, GFF2 or
+GFF3 file. This makes it fairly easy to write code that converts from GTF to GFF3 or from GFF2 to GTF. The challenge
+is picking the correct ontology for writing into the GTF or GFF2 formats. You could use the feature names used by a
+specific gene prediction program or the features supported by your favorite genome browser. We would like to provide a
+complete set of Java classes for these conversions, and the list of supported gene prediction applications and
+genome browsers will grow based on end-user requests.
+
@@ -0,0 +1,52 @@
+## Quick Installation
+
+First, a quick paragraph on how to get access to BioJava.
+
+BioJava is open source and you can get the code from [GitHub](https://github.com/biojava/biojava). However, it is usually easier to pull it in through Maven:
+
+BioJava uses [Maven](http://maven.apache.org/) as a build and distribution system. If you are new to Maven, take a look at the [Getting Started with Maven](http://maven.apache.org/guides/getting-started/index.html) guide.
+
+Currently, we provide a BioJava-specific Maven repository at [http://biojava.org/download/maven/](http://biojava.org/download/maven/).
+
+You can add the BioJava repository by adding the following XML to your project pom.xml file:
+
+```xml
+        <repositories>
+            ...
+            <repository>
+                <id>biojava-maven-repo</id>
+                <name>BioJava repository</name>
+                <url>http://www.biojava.org/download/maven/</url>           
+            </repository>
+        </repositories>
+```
+
+We are currently in the process of changing our distribution to Maven Central, which would not even require this configuration step.
+
+Next, declare the genomics module as a dependency in the same pom.xml file:
+
+```xml
+        <dependencies>
+                ...
+
+                 <!-- This imports the latest version of BioJava genomics module -->
+                <dependency>
+
+                        <groupId>org.biojava</groupId>
+                        <artifactId>biojava3-genome</artifactId>
+                        <version>3.0.8</version>
+                        <!-- note: the genomics module depends on the BioJava-core module and will import it automatically -->
+                </dependency>
+
+
+                <!-- other biojava jars as needed -->
+
+        </dependencies> 
+```
+
+If you run 
+
+<pre>
+    mvn package
+</pre>
+
+ on your project, the BioJava dependencies will be automatically downloaded and installed for you.
+