NOTE: We are working on migrating this site away from MediaWiki, so editing pages will be disabled for now.
GBrowse syn Database
From GMOD
Schema
- The alignment database is very simple; it has a tables for all reciprocal 'hits', or alignment features and a table for (optional) 1:1 coordinate maps
- The alignments table contains coordinate information and also support cigar-line representations and the alignment to facilitate future reconstruction of the alignment within GBrowse_syn.
Loading
- The starting point for loading the alignment database is a CLUSTALW format multiple sequence alignment
- The source of the alignment data is up to you but the supported format for entry is currently CLUSTALW
Clustal alignment format
CLUSTAL W(1.81) multiple sequence alignment c_briggsae-chrII(+)/43862-46313 ATGAGCTTCCACAAAAGCATGAGCTTTCTCAGCTTCTGCCACATCAGCATTCAAATGATC c_remanei-Crem_Contig172(-)/123228-124941 ATGAGCCTCTACAACCGCATGATTCTTTTCAGCCTCTGCCACGTCCGCATTCAAATGCTC c_brenneri-Cbre_Contig60(+)/627772-630087 ATGAGCCTCCACAACAGCATGATTTTTCTCGGCTTCCGCCACATCCGCATTCAAATGATC c_elegans-II(+)/9706834-9708803 ATGAGCCTCTACTACAGCATGATTCTTCTCAGCTTCTGCAACGTCAGCATTCAGATGATC ****** ** ** * ****** ** ** ** ** ** ** ** ******* *** ** c_briggsae-chrII(+)/43862-46313 CGCACAAATATGATGCACAAATCCACAACCTAAAGCATCTCCGATAACGTTGACCGAAGT c_remanei-Crem_Contig172(-)/123228-124941 AGCACAAATGTAATGAACGAATCCGCATCCCAACGCATCGCCAATCACATTCACAGATGT c_brenneri-Cbre_Contig60(+)/627772-630087 CGCACAAATGTAGTGGACAAATCCGCATCCCAAAGCGTCTCCGATAACATTTACCGAAGT c_elegans-II(+)/9706834-9708803 TGCACAAATGTGATGAACGAATCCACATCCCAATGCATCACCGATCACATTGACAGATGT ******** * ** ** ***** ** ** ** ** ** ** ** ** ** ** ** ** c_briggsae-chrII(+)/43862-46313 CCGGAGTCGATCCCTGAAT----------------------------------------- c_remanei-Crem_Contig172(-)/123228-124941 ACGAAGTCGGTCCCTATAAGGTATGATTTTATATGA----TGTACCATAAGGAAATAGTC c_brenneri-Cbre_Contig60(+)/627772-630087 ACGAAGTCGATCCCTGAAA---------TCAGATGAGCGGTTGACCA---GAGAACAACC c_elegans-II(+)/9706834-9708803 ACGAAGTCGGTCCCTGAAC--AATTATTT----TGA----TATA---GAAAGAAACGGTA ** ***** ***** *
NOTE: The sequence naming convention "species-seqid(strand)/start-end" shown in the above example is essential for the data to be loaded correctly
Database Loading Scripts
1) split large clustal files into one alignment/file with the script split_clustal.pl