Difference between revisions of "GBrowse Adaptors"

From GMOD

Latest revision as of 16:22, 7 August 2012

[[GBrowse]] has a flexible adaptor system (yes, it is spelled that way, not "adapter") that lets it run on top of various types of databases and other data sources. A common question is "Which adaptor should I be using?" This page attempts to answer that question.

{| class="wikitable"
! Adaptor
! Other required software
! Roughly how many users
! Pros
! Cons
|-
| {{BPM|Bio::DB::SeqFeature::Store}} (use bp_seqfeature_load.pl)
| [[MySQL]], [[PostgreSQL]], SQLite, BerkeleyDB
| Many and growing fast
| Roughly 4X faster than Bio::DB::GFF for the same data; designed to work with [[GFF3]]
| Developed for use with [[GFF3]]; about 2X slower than Bio::DB::GFF to load a database
|-
| {{BPM|Bio::DB::GFF}} (use bp_load_gff.pl, bp_bulk_load_gff.pl, bp_fast_load_gff.pl)
| A [[Glossary#Database Management System|relational database server]]: [[MySQL]], [[PostgreSQL]], Oracle, or BerkeleyDB
| Lots! (Especially [[MySQL]])
| Quite fast; large user base; required if your data is in the (now deprecated) [[GFF2]] format
| Does not work well with [[GFF3]]-formatted data
|-
| {{CPAN|Bio::DB::Sam}} (available from CPAN)
| [http://samtools.sourceforge.net/ SAMtools]
| Growing (particularly with GBrowse2)
| Very fast access to NextGen sequencing data
| Difficult to use with GBrowse 1.70
|-
| {{CPAN|Bio::DB::BigWig}} and {{CPAN|Bio::DB::BigWigSet}} (available from CPAN)
| [http://genome.ucsc.edu/FAQ/FAQformat.html UCSC Formats]
| Growing (particularly with GBrowse2)
| Very fast access to data in [http://genome.ucsc.edu/FAQ/FAQformat.html#format6.1 bigWig] format
| Difficult to use with GBrowse 1.70
|-
| {{CPAN|Bio::DB::BigBed}} (available from CPAN)
| [http://genome.ucsc.edu/FAQ/FAQformat.html UCSC Formats]
| Growing (particularly with GBrowse2)
| Very fast access to data in [http://genome.ucsc.edu/FAQ/FAQformat.html#format1.5 bigBed] format
| Difficult to use with GBrowse 1.70
|-
| {{CPAN|Bio::DB::Das::Chado}} (available from CPAN)
| [[PostgreSQL]] and a [[Chado]] [[Glossary#Database Schema|schema]]
| Relatively few due to the specialized nature of Chado
| Allows 'live' viewing of the features in a Chado database
| Slow compared to Bio::DB::GFF
|-
| {{CPAN|Bio::DB::Das::BioSQL}} (available from CPAN)
| [[MySQL]] and a [[BioSQL]] schema
| Relatively few due to the small number of BioSQL users
| Allows 'live' viewing of the features in a BioSQL database
| Slow compared to Bio::DB::GFF
|-
| Memory (i.e., flat file database using either Bio::DB::GFF or Bio::DB::SeqFeature::Store)
| None
| For real servers, none
| Easy for rapid development and testing
| Very slow for more than a few thousand features
|-
| [[LuceGene]]
| Lucene (searches indexed flat files)
| Relatively few
|
|
|}
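
To illustrate how the choice of adaptor shows up in practice, here is a minimal sketch of a GBrowse 2 style database stanza that selects Bio::DB::SeqFeature::Store with a MySQL backend. The data source name (volvox), the DSN, and the user name are placeholders and do not come from this page. The options that are accepted depend on your GBrowse and BioPerl versions, so treat this as an assumption to check against the documentation for your installation.

```ini
# Hypothetical data source named "volvox"; every value below is a placeholder.
[volvox:database]
# The adaptor class chosen from the table above.
db_adaptor = Bio::DB::SeqFeature::Store
# Arguments handed to the adaptor; here, a MySQL-backed feature store.
db_args    = -adaptor DBI::mysql
             -dsn     dbi:mysql:volvox
             -user    nobody
```

Under the same assumptions, switching to another adaptor from the table would mean changing db_adaptor (for example to Bio::DB::GFF) and supplying the db_args that adaptor expects.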

== Email Threads ==

There have been some useful email threads on adaptor choices and tradeoffs.

* {{NabbleThreadLink|Memory-Database-td862590.html|Memory Database}}, 2010/06