Difference between revisions of "GBrowse Adaptors"

m
m (Email Threads: Fixing link)
 
(4 intermediate revisions by 3 users not shown)
Line 9: Line 9:
  ! Cons
  |-
- | {{BPM|Bio::DB::SeqFeature::Store}}
+ | {{BPM|Bio::DB::SeqFeature::Store}} (use bp_seqfeature_load.pl)
  | [[MySQL]], [[PostgreSQL]], SQLite, BerkeleyDB
  | Many and growing fast.
Line 15: Line 15:
  | Developed for use with [[GFF3]]; about 2X slower than Bio::DB::GFF to load a database
  |-
- | {{BPM|Bio::DB::GFF}}
+ | {{BPM|Bio::DB::GFF}} (use bp_load_gff.pl, bp_bulk_load_gff.pl, bp_fast_load_gff.pl)
  | A [[Glossary#Database Management System|relational database server]]: [[MySQL]], [[PostgreSQL]], Oracle, or BerkeleyDB
  | Lots! (Especially [[MySQL]])
Line 21: Line 21:
  | Does not work well with [[GFF3]] formatted data
  |-
- | Bio::DB::Sam (available from CPAN)
+ | {{CPAN|Bio::DB::Sam}} (available from CPAN)
  | [http://samtools.sourceforge.net/ SAMtools]
  | Growing (particularly with GBrowse2)
Line 27: Line 27:
  | Difficult to use with GBrowse 1.70
  |-
- | Bio::DB::Das::Chado (available from CPAN)
+ | {{CPAN|Bio::DB::BigWig}} and {{CPAN|Bio::DB::BigWigSet}} (available from CPAN)
+ | [http://genome.ucsc.edu/FAQ/FAQformat.html UCSC Formats]
+ | Growing (particularly with GBrowse2)
+ | Very fast access to data in [http://genome.ucsc.edu/FAQ/FAQformat.html#format6.1 bigWig] format
+ | Difficult to use with GBrowse 1.70
+ |-
+ | {{CPAN|Bio::DB::BigBed}} (available from CPAN)
+ | [http://genome.ucsc.edu/FAQ/FAQformat.html UCSC Formats]
+ | Growing (particularly with GBrowse2)
+ | Very fast access to data in [http://genome.ucsc.edu/FAQ/FAQformat.html#format1.5 bigBed] format
+ | Difficult to use with GBrowse 1.70
+ |-
+ | {{CPAN|Bio::DB::Das::Chado}} (available from CPAN)
  | [[PostgreSQL]] and a [[Chado]] [[Glossary#Database Schema|schema]]
  | Relatively few due to the specialized nature of Chado
Line 33: Line 45:
  | Slow compared to Bio::DB::GFF
  |-
- | Bio::DB::Das::BioSQL (available from CPAN)
+ | {{CPAN|Bio::DB::Das::BioSQL}} (available from CPAN)
  | [[MySQL]] and a [[BioSQL]] schema
  | Relatively few due to the small number of BioSQL users
Line 55: Line 67:

  There have been some useful email threads on adaptor choices and tradeoffs.
- * {{NabbleThreadLink|Memory-Database-td862590.html#a862590|Memory Database}}, 2010/06
+ * {{NabbleThreadLink|Memory-Database-td862590.html|Memory Database}}, 2010/06

  [[Category:GBrowse]]

Latest revision as of 16:22, 7 August 2012

GBrowse has a flexible adaptor system (yes, it is spelled "adaptor", not "adapter") that lets it run on top of various types of databases and data sources. A common question is "which adaptor should I be using?" This page attempts to answer that question.

{|
! Adaptor
! Other required software
! Roughly how many users
! Pros
! Cons
|-
| Bio::DB::SeqFeature::Store (use bp_seqfeature_load.pl)
| MySQL, PostgreSQL, SQLite, BerkeleyDB
| Many and growing fast.
| Roughly 4X faster than Bio::DB::GFF for the same data; designed to work with GFF3
| Developed for use with GFF3; about 2X slower than Bio::DB::GFF to load a database
|-
| Bio::DB::GFF (use bp_load_gff.pl, bp_bulk_load_gff.pl, bp_fast_load_gff.pl)
| A relational database server: MySQL, PostgreSQL, Oracle, or BerkeleyDB
| Lots! (Especially MySQL)
| Quite fast; large user base; Have to use this if your data is in the (now deprecated) GFF2 format.
| Does not work well with GFF3 formatted data
|-
| Bio::DB::Sam (available from CPAN)
| SAMtools
| Growing (particularly with GBrowse2)
| Very fast access to NextGen sequencing data
| Difficult to use with GBrowse 1.70
|-
| Bio::DB::BigWig and Bio::DB::BigWigSet (available from CPAN)
| UCSC Formats
| Growing (particularly with GBrowse2)
| Very fast access to data in bigWig format
| Difficult to use with GBrowse 1.70
|-
| Bio::DB::BigBed (available from CPAN)
| UCSC Formats
| Growing (particularly with GBrowse2)
| Very fast access to data in bigBed format
| Difficult to use with GBrowse 1.70
|-
| Bio::DB::Das::Chado (available from CPAN)
| PostgreSQL and a Chado schema
| Relatively few due to the specialized nature of Chado
| Allows 'live' viewing of the features in a Chado database
| Slow compared to Bio::DB::GFF
|-
| Bio::DB::Das::BioSQL (available from CPAN)
| MySQL and a BioSQL schema
| Relatively few due to the small number of BioSQL users
| Allows 'live' viewing of the features in a BioSQL database
| Slow compared to Bio::DB::GFF
|-
| Memory (ie, flat file database using either Bio::DB::GFF or SeqFeature::Store)
| None
| For real servers, none
| Easy for rapid development and testing
| Very slow for more than a few thousand features
|-
| LuceGene
| Lucene (searches indexed flat files)
| Relatively few
|}
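
As a rough illustration of where the adaptor choice shows up in practice, the sketch below shows two GBrowse2-style database stanzas that use Bio::DB::SeqFeature::Store, once backed by MySQL and once with the in-memory backend. The stanza names, the database name `mydb`, and the directory path are hypothetical placeholders, and the exact `db_args` options depend on the adaptor and the GBrowse version, so treat this as a starting point, not a definitive configuration.

```ini
# Hypothetical stanza: Bio::DB::SeqFeature::Store on top of a MySQL database.
# "example" and "mydb" are placeholder names, not taken from this page.
[example:database]
db_adaptor = Bio::DB::SeqFeature::Store
db_args    = -adaptor DBI::mysql
             -dsn     dbi:mysql:database=mydb

# Hypothetical stanza: the same adaptor with the in-memory (flat file) backend,
# which reads GFF3 files from a directory. Suitable for development and testing
# only, since it becomes very slow beyond a few thousand features.
[example_memory:database]
db_adaptor = Bio::DB::SeqFeature::Store
db_args    = -adaptor memory
             -dir     /path/to/gff3/files
```

Switching between backends in this sketch only changes `db_args`; the `db_adaptor` line changes when moving to a different adaptor from the table, such as Bio::DB::GFF or Bio::DB::Sam.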

Email Threads

There have been some useful email threads on adaptor choices and tradeoffs.
* Memory Database, 2010/06