MotifFinder.pm
Revision as of 23:51, 5 April 2010
MotifFinder.pm is a GBrowse plugin written by Xiaoqi Shi. It finds sequence-specific motifs using Position Weight Matrices (PWMs)
and displays the results graphically as tracks in the genome browser. Please feel free to contact the author for help or more information.
- Follow this link for background reading on Position Weight Matrices.
How to use the MotifFinder plugin
MotifFinder parameters
- Reasonable default options are provided for each parameter.
- Threshold: a cutoff score between 0.8 and 1 is recommended.
- Background Probability: should be entered in A, C, G, T order.
- Indel Size: currently only small indels (length under 6) can be handled.
Position Frequency Matrices
Existing PFMs are loaded from the file 'matrices.txt' in the GBrowse configuration directory; most of them are curated PFMs from published studies.
Click here for a list of all the available PFMs from WormBase
You can also add your own PFMs to the toggle section "Paste PFMs Here" in FASTA-style format, with the rows arranged in A, C, G, T order.
e.g.
>name of the matrix
0 1 1 1 1 23 0 0 1 7 0 0 19
10 18 1 13 14 2 20 0 17 0 7 16 0
2 4 24 1 0 0 0 26 8 2 0 10 7
14 3 0 11 11 1 6 0 0 17 19 0 0
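The following minimal Python parser makes the expected input concrete. It is a sketch and not the plugin's own Perl parsing code. It assumes one '>' header per matrix, followed by exactly four whitespace-separated rows of counts in A, C, G, T order. The function name parse_pfms is hypothetical.

```python
from typing import Dict, List


def parse_pfms(text: str) -> Dict[str, List[List[float]]]:
    """Parse '>name' headers, each followed by four count rows (A, C, G, T)."""
    pfms: Dict[str, List[List[float]]] = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines between matrices
        if line.startswith(">"):
            name = line[1:].strip()
            pfms[name] = []
        elif name is None:
            raise ValueError("count row found before any '>' header")
        else:
            pfms[name].append([float(x) for x in line.split()])
    # Every matrix must have exactly one row per base, all of equal length.
    for matrix_name, rows in pfms.items():
        if len(rows) != 4:
            raise ValueError(f"{matrix_name}: expected 4 rows (A C G T), got {len(rows)}")
        if len({len(row) for row in rows}) != 1:
            raise ValueError(f"{matrix_name}: rows have different lengths")
    return pfms


example = """>name of the matrix
0 1 1 1 1 23 0 0 1 7 0 0 19
10 18 1 13 14 2 20 0 17 0 7 16 0
2 4 24 1 0 0 0 26 8 2 0 10 7
14 3 0 11 11 1 6 0 0 17 19 0 0
"""

pfm = parse_pfms(example)["name of the matrix"]
print(len(pfm[0]))                      # 13 positions
print({sum(col) for col in zip(*pfm)})  # {26.0}: every column totals 26
```

Checking that every column has the same total is a cheap sanity test for pasted matrices. In the example matrix, each of the 13 positions sums to 26.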
Indel detection
Users can search for sequence motifs that contain indels up to a certain length (set by the Indel Size parameter). This feature has not been fully tested and awaits future improvement.
How is the motif predicted?
The problem is to find occurrences of known patterns (represented by position matrices) in new sequences.
Calculate similarity score
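This page does not spell out the scoring formula. One common choice, and one that fits the recommended 0.8 to 1 threshold, works in two steps. First, convert each PFM into a log-odds PWM using the background probabilities. Second, rescale the raw score of a word so that 1 is the best possible match and 0 the worst. The Python sketch below assumes exactly that approach. The pseudocount of 0.25 and the names pfm_to_pwm and relative_score are hypothetical and are not taken from MotifFinder.pm.

```python
import math
from typing import List, Sequence

BASES = "ACGT"


def pfm_to_pwm(pfm: List[List[float]], background: Sequence[float],
               pseudocount: float = 0.25) -> List[List[float]]:
    """Turn a count matrix (rows A, C, G, T) into log2-odds weights."""
    length = len(pfm[0])
    pwm = [[0.0] * length for _ in range(4)]
    for j in range(length):
        # Pseudocounts keep unseen bases from producing log(0).
        total = sum(pfm[b][j] for b in range(4)) + 4 * pseudocount
        for b in range(4):
            freq = (pfm[b][j] + pseudocount) / total
            pwm[b][j] = math.log2(freq / background[b])
    return pwm


def relative_score(pwm: List[List[float]], word: str) -> float:
    """Raw log-odds score of word, rescaled so 0 = worst and 1 = best possible."""
    cols = range(len(pwm[0]))
    raw = sum(pwm[BASES.index(base)][j] for j, base in enumerate(word.upper()))
    best = sum(max(pwm[b][j] for b in range(4)) for j in cols)
    worst = sum(min(pwm[b][j] for b in range(4)) for j in cols)
    if best == worst:
        raise ValueError("matrix cannot discriminate between words")
    return (raw - worst) / (best - worst)


# Hypothetical 3-position matrix whose consensus is "ACG".
pfm = [[3, 0, 0], [0, 3, 0], [0, 0, 3], [0, 0, 0]]
pwm = pfm_to_pwm(pfm, background=[0.25, 0.25, 0.25, 0.25])
print(relative_score(pwm, "ACG"))  # 1.0
print(relative_score(pwm, "TTT"))  # 0.0
```

Because the rescaled score is independent of motif length, a single cutoff such as 0.85 has a similar meaning across matrices of different sizes.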
Algorithms
- Backtrack: use a recursive function to build all possible motifs, terminating a branch of the recursion when an intermediate score threshold is not reached.
- Brute force: calculate the similarity score across the whole region using a sliding window the size of the motif.
This program uses a combined strategy, choosing between the two methods above depending on the motif length and the cutoff score.
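The following Python sketch illustrates the two methods and the choice between them, under stated assumptions. It is not the plugin's Perl code.

- Scores are raw log-odds sums, compared against a cutoff given on a 0 to 1 scale.
- The backtracking variant enumerates every word that can still reach the cutoff, then looks up each sequence window in that set.
- The switch-over limits (max_backtrack_len=8, min_backtrack_cutoff=0.85) are hypothetical, because the page does not state the actual rule.
- The 4-column PWM in the example is made up.

```python
from typing import List, Set, Tuple

BASES = "ACGT"
Pwm = List[List[float]]  # rows A, C, G, T; one column per motif position


def score_bounds(pwm: Pwm) -> Tuple[float, float]:
    """Lowest and highest raw score any word can get."""
    cols = range(len(pwm[0]))
    worst = sum(min(pwm[b][j] for b in range(4)) for j in cols)
    best = sum(max(pwm[b][j] for b in range(4)) for j in cols)
    return worst, best


def raw_cutoff(pwm: Pwm, cutoff: float) -> float:
    """Translate a relative cutoff in [0, 1] into a raw log-odds threshold."""
    worst, best = score_bounds(pwm)
    return worst + cutoff * (best - worst)


def brute_force(pwm: Pwm, seq: str, cutoff: float) -> List[int]:
    """Slide a motif-sized window along seq and score every position."""
    seq, length, target = seq.upper(), len(pwm[0]), raw_cutoff(pwm, cutoff)
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other symbols
        raw = sum(pwm[BASES.index(base)][j] for j, base in enumerate(window))
        if raw >= target:
            hits.append(start)
    return hits


def backtrack_words(pwm: Pwm, cutoff: float) -> Set[str]:
    """Recursively build every word scoring >= cutoff, pruning hopeless prefixes."""
    length, target = len(pwm[0]), raw_cutoff(pwm, cutoff)
    # best_rest[j] = best score still obtainable from positions j..end
    best_rest = [0.0] * (length + 1)
    for j in range(length - 1, -1, -1):
        best_rest[j] = best_rest[j + 1] + max(pwm[b][j] for b in range(4))
    words: Set[str] = set()

    def extend(prefix: str, partial: float) -> None:
        j = len(prefix)
        if j == length:
            if partial >= target:
                words.add(prefix)
            return
        # Terminate this branch when even the best suffix cannot reach the
        # cutoff (the small tolerance guards against rounding).
        if partial + best_rest[j] < target - 1e-9:
            return
        for b, base in enumerate(BASES):
            extend(prefix + base, partial + pwm[b][j])

    extend("", 0.0)
    return words


def backtrack(pwm: Pwm, seq: str, cutoff: float) -> List[int]:
    """Look up each window in the precomputed set of high-scoring words."""
    seq, length = seq.upper(), len(pwm[0])
    words = backtrack_words(pwm, cutoff)
    return [s for s in range(len(seq) - length + 1) if seq[s:s + length] in words]


def find_motif(pwm: Pwm, seq: str, cutoff: float, max_backtrack_len: int = 8,
               min_backtrack_cutoff: float = 0.85) -> List[int]:
    """Hypothetical combined strategy: backtrack only for short, strict motifs."""
    if len(pwm[0]) <= max_backtrack_len and cutoff >= min_backtrack_cutoff:
        return backtrack(pwm, seq, cutoff)
    return brute_force(pwm, seq, cutoff)


# Made-up log-odds PWM whose consensus is "TATA".
pwm = [
    [-1.0, 2.0, -1.0, 2.0],    # A
    [-1.0, -1.0, -1.0, -1.0],  # C
    [-1.0, -1.0, -1.0, -1.0],  # G
    [2.0, -1.0, 2.0, -1.0],    # T
]
seq = "GGTATACCTATAGG"
print(find_motif(pwm, seq, 0.9))  # [2, 8]
```

Backtracking pays off when few words pass the cutoff. A short motif with a strict threshold prunes most of the 4^L search tree, so the word set stays small and each window costs a single set lookup. Long motifs or lenient cutoffs let the word set grow toward 4^L, and at that point scoring each window directly is cheaper.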