GBrowse karyotype ideogram.pl

From GMOD
Revision as of 14:25, 5 May 2011 by DanBolser (Talk | contribs)

Jump to: navigation, search

How to get cytoband data for the ideogram glyph of GBrowse_karyotype

perl script

  • Cytoband data for the human genome and also for other species is in the ensembl database.
  • Below is a perl script that can be used to grab the cytoband data from ensembl's public ftp server at ensembldb.ensembl.org.
  • It is included with the GBrowse_karyotype distribution

<perl>

  1. !/usr/bin/perl -w

use strict; use DBI;

my $database = shift;

unless ($database) {

 print "No database specified: Usage: ./get_ensembl_cytoband_data.pl database\n";
 print "This is a list of ensembl databases\n";
 open IN, "mysql -uanonymous -hensembldb.ensembl.org -e 'show databases' | grep core | grep -v 'expression' |";
 my @string;
 while (<IN>) {
   chomp;
   push @string, $_;
   if (@string == 4) {
     print join("\t", @string), "\n";
     @string = ();
   }
 }
 print join("\t", @string), "\n" if @string;
 exit;

}

my $host = 'ensembldb.ensembl.org'; my $query = 'SELECT name,seq_region_start,seq_region_end,band,stain

FROM seq_region,karyotype
WHERE seq_region.seq_region_id = karyotype.seq_region_id;';


my $dbh = DBI->connect( "dbi:mysql:$database:$host", 'anonymous' )

   or die DBI->errstr;

my $sth = $dbh->prepare($query) or die $dbh->errstr; $sth->execute or die $sth->errstr;

my ($cent_start,$prev_chr,$chr_end,$segments,$gff,$chroms); my $chr_start = 1; while (my @band = $sth->fetchrow_array ) {

 my ($chr,$start,$end,$band,$stain) = @band;
 my $class = 'Chromosome';
 my $method;
 $chr =~ s/chr//;
 if ($stain eq 'acen' && !$cent_start) {
   $cent_start = $start;
   next;
 }
 elsif ($cent_start) {
   $method = 'centromere';
   $band   = "$chr\_cent";
   $start  = $cent_start;
   $stain  = ;
   $cent_start = 0;
 }
 else {
   $method = 'chromosome_band';
 }
 $gff .= join("\t", $chr, 'ensembl', lc $method, $start, $end,
              qw/. . ./,qq{Parent=$chr;Name=$band;Alias=$band});
 $gff .= $stain ? ";stain=$stain\n" : "\n";
 if ($prev_chr && $prev_chr !~ /$chr/) {
   $segments .= "\#\#sequence-region $prev_chr $chr_start $chr_end\n";
   $chroms   .=join( "\t",
                     $prev_chr,
                     qw/ensembl chromosome/,
                     $chr_start,
                     $chr_end,
                     qw/. . ./,
                     "ID=$prev_chr")."\n";
   $chr_start = 1;
 }
 $prev_chr = $chr;
 $chr_end  = $end;

}

if (!$gff) {

 print "\nSorry, there are no cytoband data for $database\n\n";
 exit;

}

$segments .= "\#\#sequence-region $prev_chr $chr_start $chr_end\n"; $chroms .=join( "\t",

                 $prev_chr,
                 qw/ensembl chromosome/,
                 $chr_start,
                 $chr_end,
                 qw/. . ./,
                 "ID=$prev_chr")."\n";

print "##gff-version 3\n"; print "#Source ENSEMBL database: $database\n"; print $segments,$chroms,$gff;

</perl>

Other information

  • Currently ideograms for human, rat and mouse are available
  • To see the current database list, try the command:
mysql -uanonymous -hensembldb.ensembl.org -e 'show databases' \
| grep core | grep 'sapiens\|rattus\|mus' | grep -v 'expression'