This glossary explains terms that
- are specific to the GMOD project, or
- are computing terms that are used in the GMOD project.
This glossary does not define biology terms.
CVS is the source code control system used by most of GMOD. Source code control systems, also known as revision control or version control systems, record changes to computer files.
A database can be any set of organized data that is readable by a computer. It can range from an implementation of a database schema in a particular database management system to plain files that have a defined format.
Database Management System
Database management systems (DBMSs) are software systems that manage data. PostgreSQL, MySQL, Oracle and Sybase are all examples of DBMSs. A DBMS is a container for databases: it is the system that manages databases, and it is distinct from the data it manages.
Most DBMSs are relational, meaning they represent data as tables of rows and columns. All DBMSs that GMOD is concerned with are relational, so GMOD uses the terms database management system and relational database management system (RDBMS) interchangeably.
A database schema is the design of a particular database, independent of its contents. Chado is an example of a database schema. Designs (like Chado) can be reused across multiple databases.
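To make the difference between a schema and a database concrete, here is a minimal Python sketch using the standard-library sqlite3 module, with SQLite standing in as the DBMS. The `feature` table is a hypothetical, heavily simplified table, not the real Chado DDL, and the gene names are made up. The same schema definition is applied to two separate databases, which then hold different contents.

```python
import sqlite3

# Hypothetical, heavily simplified schema (NOT the real Chado DDL).
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    seqlen     INTEGER
);
"""

def create_database(path: str) -> sqlite3.Connection:
    """Create a new database and apply the shared schema to it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

# One schema, two independent databases (in-memory here for simplicity).
fly_db = create_database(":memory:")
worm_db = create_database(":memory:")

insert = "INSERT INTO feature (uniquename, seqlen) VALUES (?, ?)"
fly_db.execute(insert, ("geneA", 1200))
worm_db.execute(insert, ("geneB", 800))
worm_db.execute(insert, ("geneC", 450))

# Same design, different contents.
fly_count = fly_db.execute("SELECT COUNT(*) FROM feature").fetchone()[0]
worm_count = worm_db.execute("SELECT COUNT(*) FROM feature").fetchone()[0]
print(fly_count, worm_count)  # 1 2
```

Keeping the schema in one place (`SCHEMA`) is what lets the design be reused: every database created from it has the same tables, whatever data is later loaded.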
Gene Finder Format
General Feature Format
If you get into the more technical side of GMOD, such as loading databases, you will come across the term GFF. It refers to a tab-delimited file format for storing sequence annotations (curiously, the acronym has two different expansions: Gene Finder Format and General Feature Format). Here is an example:
test.fa RepeatMasker similarity 238 289 15.4 + . Target "Motif:(TA)n" 2 53
The line above describes a match to the sequence motif (TA)n on the sequence "test.fa", where the match runs from position 238 to position 289 on the "+" strand and corresponds to positions 2 to 53 of the motif.
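The columns of the example line can be pulled apart with a few lines of Python. This is a minimal sketch that assumes the nine tab-separated columns that GFF data lines have, and leaves the ninth column (here in the older GFF2-style `Target` syntax) as an unparsed string; it does not handle comments, directives, or GFF3 attribute syntax. The example line is rebuilt with real tab characters because the tabs are not visible in the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GffFeature:
    seqid: str
    source: str
    feature_type: str
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column is "."
    strand: str
    frame: str
    attributes: str          # ninth column, left unparsed in this sketch

def parse_gff_line(line: str) -> GffFeature:
    """Split one GFF data line into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, frame, attrs = fields
    return GffFeature(
        seqid=seqid,
        source=source,
        feature_type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        frame=frame,
        attributes=attrs,
    )

# The example line from the text, rebuilt with real tab characters.
line = "\t".join([
    "test.fa", "RepeatMasker", "similarity", "238", "289",
    "15.4", "+", ".", 'Target "Motif:(TA)n" 2 53',
])
feature = parse_gff_line(line)
print(feature.seqid, feature.start, feature.end, feature.strand)
# test.fa 238 289 +
print(feature.end - feature.start + 1)  # 52, because both ends are inclusive
```

The 52-base length on test.fa matches the 52 motif positions (2 to 53) named in the `Target` column.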
One encounters GFF files frequently in the GMOD world. GFF is used as an interchange format, so a script or an application may create GFF as output and another script or application may load this GFF into a database. Or the GFF file may be the database itself: there are ways to create databases directly from GFF files, though these work well only with smaller data sets. See GFF for more information.
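The interchange pattern described above, where one program writes GFF and another loads it into a database, can be sketched with Python's standard-library sqlite3 module. This is a toy illustration under stated assumptions: the `gff_feature` table is hypothetical, the second GFF line is made up, only the first seven columns are stored, and real GMOD loaders for schemas such as Chado do far more work than this.

```python
import io
import sqlite3
from typing import Iterable

# Hypothetical GFF produced by some upstream program: the example line from
# the text plus a second, made-up feature on the "-" strand.
GFF_TEXT = "\n".join([
    "##gff-version 2",
    "\t".join(["test.fa", "RepeatMasker", "similarity", "238", "289",
               "15.4", "+", ".", 'Target "Motif:(TA)n" 2 53']),
    "\t".join(["test.fa", "RepeatMasker", "similarity", "900", "960",
               "8.1", "-", ".", 'Target "Motif:(CA)n" 1 61']),
]) + "\n"

def load_gff(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Insert GFF data lines into the hypothetical gff_feature table."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        score = None if cols[5] == "." else float(cols[5])
        rows.append((cols[0], cols[1], cols[2],
                     int(cols[3]), int(cols[4]), score, cols[6]))
    conn.executemany("INSERT INTO gff_feature VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gff_feature (seqid TEXT, source TEXT, feature_type TEXT, "
    "start_pos INTEGER, end_pos INTEGER, score REAL, strand TEXT)"
)
loaded = load_gff(conn, io.StringIO(GFF_TEXT))

# Once loaded, the annotations can be queried with SQL.
plus_strand = conn.execute(
    "SELECT start_pos, end_pos FROM gff_feature "
    "WHERE seqid = ? AND strand = ? ORDER BY start_pos",
    ("test.fa", "+"),
).fetchall()
print(loaded, plus_strand)  # 2 [(238, 289)]
```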
Java is arguably the world's most popular programming language, but for command-line work on Unix it is less popular than Perl. In GMOD it is encountered primarily as a language for building user interfaces (e.g. Apollo).
- Category:Java - GMOD pages tagged as related to Java.
Middleware is software that connects other software components so they can communicate with each other. You can think of it as project plumbing. Like plumbing, it is hard to do well, and people take it for granted until it stops working.
- Category:Middleware - List of GMOD pages tagged as related to middleware.
See Operating System.
Perl is the programming language most used in the bioinformatics realm, and it is the language most used by GMOD developers. It is well suited to text and data processing and has an extensive open source library (CPAN), so a great deal of ready-made functionality is available. Many GMOD components use BioPerl, a bioinformatics toolkit written in Perl.
Some parts of GMOD, like GBrowse, can be extended or customized using Perl, and beginner-level Perl skills are sufficient for this work.
- Perl Home Page
- Perl's open source library repository.
- Category:Perl - GMOD pages tagged as related to Perl.
Relational Database Management System
See Database Schema
SQL is a standard query language used with relational database management systems (RDBMSs). It is used to update and retrieve data in a database.
SQL is broadly similar across DBMSs but varies in many details from one DBMS to another.
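The two jobs named above, updating and retrieving data, look like this in SQL. The sketch runs the statements through Python's standard-library sqlite3 module, so SQLite is the DBMS here; the `organism` table and its rows are hypothetical, and other DBMSs may differ in details such as type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A small hypothetical table.
cur.execute(
    "CREATE TABLE organism ("
    "organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT, common_name TEXT)"
)

# Add rows (INSERT), passing values through ? placeholders.
cur.executemany(
    "INSERT INTO organism (genus, species, common_name) VALUES (?, ?, ?)",
    [("Drosophila", "melanogaster", "fruit fly"),
     ("Caenorhabditis", "elegans", None)],
)

# Change existing data (UPDATE).
cur.execute(
    "UPDATE organism SET common_name = ? WHERE genus = ? AND species = ?",
    ("worm", "Caenorhabditis", "elegans"),
)
conn.commit()

# Read data back (SELECT).
rows = cur.execute(
    "SELECT genus, species, common_name FROM organism ORDER BY genus"
).fetchall()
for row in rows:
    print(row)
# ('Caenorhabditis', 'elegans', 'worm')
# ('Drosophila', 'melanogaster', 'fruit fly')
```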
XML is an acronym for eXtensible Markup Language, a data format used primarily for sharing data. It looks similar to HTML but has a much stricter syntax: for example, every element must be properly closed.
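XML's stricter syntax is easy to see with Python's standard-library `xml.etree.ElementTree` parser: a well-formed document parses, while a fragment that HTML browsers would tolerate (an unclosed `<br>` tag) is rejected. The element names below are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

well_formed = '<feature type="gene"><name>geneA</name><start>238</start></feature>'
root = ET.fromstring(well_formed)
print(root.tag, root.get("type"), root.find("name").text)  # feature gene geneA

# Acceptable to an HTML browser, but not XML: <br> is never closed.
html_style = "<p>line one<br>line two</p>"
print(is_well_formed(html_style))                      # False
print(is_well_formed("<p>line one<br/>line two</p>"))  # True
```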