Variation APIs that bring together Ensembl Variation, VCF file format, GFF3+GVF file format, samtools, Picard, GATK, etc.
Several similar file specifications exist for dealing with sequence variation, including VCF, GFF3 and GVF.
Some support for these file specifications is already present in various bioinformatics libraries (BioJava3 itself already provides GFF3 support). It would be desirable to pull these together behind a common set of APIs in BioJava3.
Approach
- Consider existing open source VCF and GVF implementations (Genome Analysis Toolkit (GATK), VCFtools, Picard, GVF-Parser, etc.)
- Design APIs for common entities (Allele, Genotype, Haplotype, etc.)
- Create adaptors to third-party implementations or implement support directly in BioJava3
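To make the approach above more concrete, here is a minimal sketch of what a common entity plus an adaptor could look like. All type names (`Allele`, `Variant`, `VariantReader`, `SimpleVcfLineReader`) are hypothetical and are not existing BioJava classes. The reader assumes tab-delimited VCF data lines and reads only the fixed CHROM, POS, REF and ALT columns; a real adaptor would more likely delegate to an existing parser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical common entity: one allele observed at a site. */
final class Allele {
    final String bases;
    final boolean reference; // true if this is the reference allele

    Allele(String bases, boolean reference) {
        this.bases = bases;
        this.reference = reference;
    }
}

/** Hypothetical common entity: a variant site with its alleles. */
final class Variant {
    final String contig;
    final int position; // 1-based, as in VCF
    final Allele ref;
    final List<Allele> alts;

    Variant(String contig, int position, Allele ref, List<Allele> alts) {
        this.contig = contig;
        this.position = position;
        this.ref = ref;
        this.alts = Collections.unmodifiableList(new ArrayList<>(alts));
    }
}

/** Hypothetical format-neutral API that each adaptor would implement. */
interface VariantReader {
    List<Variant> read(List<String> lines);
}

/** Illustrative adaptor for VCF-style lines (assumption: tab-delimited). */
final class SimpleVcfLineReader implements VariantReader {
    @Override
    public List<Variant> read(List<String> lines) {
        List<Variant> variants = new ArrayList<>();
        for (String line : lines) {
            // Skip blank lines, meta-information (##) and the header (#CHROM).
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length < 5) {
                throw new IllegalArgumentException(
                        "Expected at least 5 tab-separated columns: " + line);
            }
            // Column 5 (ALT) is comma-separated; "." means no alternate allele.
            List<Allele> alts = new ArrayList<>();
            if (!fields[4].equals(".")) {
                for (String alt : fields[4].split(",")) {
                    alts.add(new Allele(alt, false));
                }
            }
            variants.add(new Variant(fields[0], Integer.parseInt(fields[1]),
                    new Allele(fields[3], true), alts));
        }
        return variants;
    }
}
```

A GVF adaptor could implement the same `VariantReader` interface, so calling code would depend only on the shared entities and not on a particular file format. Genotype and Haplotype are left out of the sketch to keep it short.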
Suggested for GSoC 2013