Difference between revisions of "GBrowse Adaptors"
From GMOD
(fixing a few links) |
m (→Email Threads: Fixing link) |
||
(14 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
− | GBrowse has | + | [[GBrowse]] has a flexible adaptor (yes, it is spelled that way and is not "adapter") system for running off various types of databases/sources. A common question is "which adaptor should I be using?" This attempts to answer that question. |
− | + | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 10: | Line 9: | ||
! Cons | ! Cons | ||
|- | |- | ||
− | | {{BPM|Bio::DB::GFF}} | + | | {{BPM|Bio::DB::SeqFeature::Store}} (use bp_seqfeature_load.pl) |
+ | | [[MySQL]], [[PostgreSQL]], SQLite, BerkeleyDB | ||
+ | | Many and growing fast. | ||
+ | | Roughly 4X faster than Bio::DB::GFF for the same data; designed to work with [[GFF3]] | ||
+ | | Developed for use with [[GFF3]]; about 2X slower than Bio::DB::GFF to load a database | ||
+ | |- | ||
+ | | {{BPM|Bio::DB::GFF}} (use bp_load_gff.pl, bp_bulk_load_gff.pl, bp_fast_load_gff.pl) | ||
| A [[Glossary#Database Management System|relational database server]]: [[MySQL]], [[PostgreSQL]], Oracle, or BerkeleyDB | | A [[Glossary#Database Management System|relational database server]]: [[MySQL]], [[PostgreSQL]], Oracle, or BerkeleyDB | ||
| Lots! (Especially [[MySQL]]) | | Lots! (Especially [[MySQL]]) | ||
− | | Quite fast; large user base | + | | Quite fast; large user base; Have to use this if your data is in the (now deprecated) [[GFF2]] format. |
− | | | + | | Does not work well with [[GFF3]] formatted data |
|- | |- | ||
− | | {{ | + | | {{CPAN|Bio::DB::Sam}} (available from CPAN) |
− | | [ | + | | [http://samtools.sourceforge.net/ SAMtools] |
− | | | + | | Growing (particularly with GBrowse2) |
− | | | + | | Very fast access to NextGen sequencing data |
− | | | + | | Difficult to use with GBrowse 1.70 |
|- | |- | ||
− | | Bio::DB:: | + | | {{CPAN|Bio::DB::BigWig}} and {{CPAN|Bio::DB::BigWigSet}} (available from CPAN) |
+ | | [http://genome.ucsc.edu/FAQ/FAQformat.html UCSC Formats] | ||
+ | | Growing (particularly with GBrowse2) | ||
+ | | Very fast access to data in [http://genome.ucsc.edu/FAQ/FAQformat.html#format6.1 bigWig] format | ||
+ | | Difficult to use with GBrowse 1.70 | ||
+ | |- | ||
+ | | {{CPAN|Bio::DB::BigBed}} (available from CPAN) | ||
+ | | [http://genome.ucsc.edu/FAQ/FAQformat.html UCSC Formats] | ||
+ | | Growing (particularly with GBrowse2) | ||
+ | | Very fast access to data in [http://genome.ucsc.edu/FAQ/FAQformat.html#format1.5 bigBed] format | ||
+ | | Difficult to use with GBrowse 1.70 | ||
+ | |- | ||
+ | | {{CPAN|Bio::DB::Das::Chado}} (available from CPAN) | ||
| [[PostgreSQL]] and a [[Chado]] [[Glossary#Database Schema|schema]] | | [[PostgreSQL]] and a [[Chado]] [[Glossary#Database Schema|schema]] | ||
| Relatively few due to the specialized nature of Chado | | Relatively few due to the specialized nature of Chado | ||
Line 28: | Line 45: | ||
| Slow compared to Bio::DB::GFF | | Slow compared to Bio::DB::GFF | ||
|- | |- | ||
− | | Bio::DB::Das::BioSQL ( | + | | {{CPAN|Bio::DB::Das::BioSQL}} (available from CPAN) |
| [[MySQL]] and a [[BioSQL]] schema | | [[MySQL]] and a [[BioSQL]] schema | ||
| Relatively few due to the small number of BioSQL users | | Relatively few due to the small number of BioSQL users | ||
Line 34: | Line 51: | ||
| Slow compared to Bio::DB::GFF | | Slow compared to Bio::DB::GFF | ||
|- | |- | ||
− | | Memory (ie, flat file database) | + | | Memory (ie, flat file database using either Bio::DB::GFF or SeqFeature::Store) |
| None | | None | ||
| For real servers, none | | For real servers, none | ||
Line 40: | Line 57: | ||
| Very slow for more than a few thousand features | | Very slow for more than a few thousand features | ||
|- | |- | ||
− | | [[ | + | | [[LuceGene]] |
| Lucene (searches indexed flat files) | | Lucene (searches indexed flat files) | ||
| Relatively few | | Relatively few | ||
Line 46: | Line 63: | ||
| | | | ||
|} | |} | ||
+ | |||
+ | == Email Threads == | ||
+ | |||
+ | There have been some useful email threads on adaptor choices and tradeoffs. | ||
+ | * {{NabbleThreadLink|Memory-Database-td862590.html|Memory Database}}, 2010/06 | ||
[[Category:GBrowse]] | [[Category:GBrowse]] | ||
[[Category:DAS]] | [[Category:DAS]] | ||
+ | [[Category:BioPerl]] | ||
+ | [[Category:Chado]] | ||
+ | [[Category:LuceGene]] | ||
+ | [[Category:MySQL]] | ||
+ | [[Category:PostgreSQL]] |
Latest revision as of 16:22, 7 August 2012
GBrowse has a flexible adaptor system (yes, it is spelled that way, not "adapter") for running on various types of databases and data sources. A common question is "Which adaptor should I be using?" This page attempts to answer that question.
Adaptor | Other required software | Roughly how many users | Pros | Cons |
---|---|---|---|---|
Bio::DB::SeqFeature::Store (use bp_seqfeature_load.pl) | MySQL, PostgreSQL, SQLite, BerkeleyDB | Many and growing fast | Roughly 4X faster than Bio::DB::GFF for the same data; designed to work with GFF3 | Developed specifically for GFF3; loading a database takes about 2X longer than with Bio::DB::GFF |
Bio::DB::GFF (use bp_load_gff.pl, bp_bulk_load_gff.pl, bp_fast_load_gff.pl) | A relational database server: MySQL, PostgreSQL, Oracle, or BerkeleyDB | Lots! (Especially MySQL) | Quite fast; large user base; required if your data is in the (now deprecated) GFF2 format | Does not work well with GFF3-formatted data |
Bio::DB::Sam (available from CPAN) | SAMtools | Growing (particularly with GBrowse2) | Very fast access to next-generation sequencing data | Difficult to use with GBrowse 1.70 |
Bio::DB::BigWig and Bio::DB::BigWigSet (available from CPAN) | UCSC Formats | Growing (particularly with GBrowse2) | Very fast access to data in bigWig format | Difficult to use with GBrowse 1.70 |
Bio::DB::BigBed (available from CPAN) | UCSC Formats | Growing (particularly with GBrowse2) | Very fast access to data in bigBed format | Difficult to use with GBrowse 1.70 |
Bio::DB::Das::Chado (available from CPAN) | PostgreSQL and a Chado schema | Relatively few due to the specialized nature of Chado | Allows 'live' viewing of the features in a Chado database | Slow compared to Bio::DB::GFF |
Bio::DB::Das::BioSQL (available from CPAN) | MySQL and a BioSQL schema | Relatively few due to the small number of BioSQL users | Allows 'live' viewing of the features in a BioSQL database | Slow compared to Bio::DB::GFF |
Memory (i.e., a flat-file database using either Bio::DB::GFF or Bio::DB::SeqFeature::Store) | None | None for production servers | Easy for rapid development and testing | Very slow for more than a few thousand features |
LuceGene | Lucene (searches indexed flat files) | Relatively few | | |
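As a rough illustration of how a choice from the table above shows up in practice, a GBrowse configuration file names the adaptor with db_adaptor and passes its connection options in db_args. The fragment below is a minimal sketch in the GBrowse 1.x style, assuming the in-memory Bio::DB::SeqFeature::Store adaptor; the directory, database name, and user are hypothetical placeholders, and the exact options depend on the adaptor and GBrowse version in use.

```ini
[GENERAL]
description = Example database (hypothetical)

# In-memory adaptor: reads flat files from a directory; suited to
# development and testing with a few thousand features at most.
db_adaptor  = Bio::DB::SeqFeature::Store
db_args     = -adaptor memory
              -dir     /path/to/gff3/files

# Assumed alternative for larger data sets: the same adaptor backed by MySQL,
# after loading the data with bp_seqfeature_load.pl.
# db_adaptor = Bio::DB::SeqFeature::Store
# db_args    = -adaptor DBI::mysql
#              -dsn     dbi:mysql:database=example_db
#              -user    example_user
```

Moving from the memory adaptor to a relational back end should, under these assumptions, only require changing db_args; the rest of the configuration file can stay the same.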
Email Threads
There have been some useful email threads on adaptor choices and tradeoffs.
- Memory Database, 2010/06