Difference between revisions of "Textpresso"

From GMOD
Line 13: Line 13:
 
|language=Perl
 
|audience=public
 
 +
|value=yes=Yes
 
|logo=TextpressoLogo.jpg
 
|contact=[mailto:mueller@caltech.edu Hans-Michael Müller]
 
Line 32: Line 33:
 
|doc=* [http://www.textpresso.org/celegans/misc/Textpresso-2.0-documentation/ Installation Guide]
 
* [http://textpresso-www.caltech.edu/cgi-bin/celegans/user_guide User Guide].
 
}}
 
[[File:TextpressoLogo.jpg|center|Textpresso]]
 
 
 
{{ComponentBox
 
|{{ComponentBoxStatus}}
 
|{{ComponentBoxSectionHeader|Resources}}
 
*[http://www.textpresso.org/ Home Page]
 
*[http://www.textpresso.org/cgi-bin/celegans/user_guide User Guide]
 
*[http://www.textpresso.org/downloads.html Download]
 
| | | | |}}
 
 
'''Textpresso''' is an information extraction and processing (text mining) package for biological literature whose capabilities go far beyond those of a simple keyword search engine. Its two key elements are a collection of full-text scientific articles split into individual sentences, and a set of semantic categories with which the database of articles and individual sentences can be searched. The full text is extracted from PDFs, and additional bibliographic information obtained from other citation databases can be processed as well. [http://ilex.caltech.edu/trac/alere/ Alere] is a package of scripts that can be used to construct a corpus (retrieve articles) for use with '''Textpresso'''. Textpresso is supported by grant # HG004090 from the National Human Genome Research Institute at the US National Institutes of Health.
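
The two elements described above (sentence-level full text and semantic categories) can be illustrated with a toy sketch. This is not Textpresso's code, which is written in Perl. The category lexicon, the naive sentence splitter, and all function names below are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical category lexicon; a real one would be far larger.
CATEGORIES = {
    "gene": {"lin-12", "glp-1"},
    "regulation": {"regulates", "represses", "activates"},
}


def split_sentences(text):
    """Naive splitter: break at whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_sentence(sentence):
    """Return the names of all categories with a term in the sentence."""
    words = set(re.findall(r"[\w-]+", sentence.lower()))
    return {name for name, terms in CATEGORIES.items() if words & terms}


def build_index(articles):
    """Map each category name to a list of (article_id, sentence) pairs."""
    index = defaultdict(list)
    for article_id, text in articles.items():
        for sentence in split_sentences(text):
            for category in tag_sentence(sentence):
                index[category].append((article_id, sentence))
    return index


def search(index, categories):
    """Return the sentences tagged with every requested category."""
    hit_sets = [set(index.get(c, [])) for c in categories]
    return sorted(set.intersection(*hit_sets)) if hit_sets else []
```

Searching for a combination of categories, rather than for literal keywords, is what separates this kind of lookup from a plain keyword search: a query for "gene" and "regulation" returns only sentences that mention a term from both categories.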
 
 
== Versions ==
 
 
=== Textpresso 1 & 2 ===
 
 
Textpresso was initially developed by Hans-Michael Müller, Eimear Kenny and Paul W. Sternberg, with contributions from Juancarlos Chan and David Chen. Textpresso 2.0 was developed by Hans-Michael Müller with contributions from Arun Rangarajan and Tracy K. Teal. Textpresso is part of [[:Category:WormBase|WormBase]] at the California Institute of Technology, California.
 
  
 
=== Textpresso 2 Extensions ===
 
Line 63: Line 45:
  
 
These extensions were written by Nathan Liles of the [[User:JimHu|Hu Lab]] at Texas A&M University. Nathan [[:Image:Jan2010Testpresso.pdf|presented this work]] at the [[January 2010 GMOD Meeting]].  The Textpresso team plans to fold these extensions back into the main Textpresso code base in the future.
 
+
}}
==Demo & Screenshots==
+
 
+
Please visit the live main site at [http://www.textpresso.org www.textpresso.org] for examples and screenshots.
+
 
+
 
+
==Requirements==
+
 
+
The package is designed for [[:Category:Linux|Linux]] operating systems and has been tested on Intel x86-based hardware. The minimum disk space required is around 6 GB per 1000 full-text papers: half is used by the publicly (via WWW) accessible database, and the other half is needed for database preparation and maintenance. If necessary, the latter can be reduced.
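
As a rough planning aid, the disk figure quoted above can be turned into a small calculation. This sketch assumes that space scales linearly with the number of papers, which the text does not state. The function and constant names are hypothetical.

```python
GB_PER_1000_PAPERS = 6.0  # approximate figure quoted in the text
PUBLIC_FRACTION = 0.5     # "half" is used by the public database


def estimate_disk_gb(n_papers):
    """Estimate disk usage in GB, assuming linear scaling (an assumption)."""
    total = GB_PER_1000_PAPERS * n_papers / 1000
    public = total * PUBLIC_FRACTION
    return {
        "total_gb": total,
        "public_db_gb": public,
        "maintenance_gb": total - public,
    }
```

For example, under these assumptions a corpus of 25,000 papers would need about 150 GB, of which roughly 75 GB serves the public database.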
+
 
+
* A web server such as Apache must be installed, and an Internet connection should be available.
+
* Perl 5.6.1 or higher must be installed, along with the most common Perl packages.
+
* The installation script requires bash
+
* {{CPAN|XML::Checker::Parser}}
+
* {{CPAN|XML::DOM::Parser}}
+
* {{CPAN|XML::XQL::DOM}}
+
* {{CPAN|Mail::Mailer}} (in MailTools-1.58)
+
* {{CPAN|PDF::Create}} (in PDF-Create).
+
* If the model organism database is based on ACeDB then {{CPAN|AcePerl}} is required
+
* Xpdf (http://www.foolabs.com/xpdf/), which provides the pdftotext converter
+
* RBT, a part-of-speech tagger developed by Eric Brill ([http://research.microsoft.com/~brill/blog.htm blog], [http://research.microsoft.com/~brill/ homepage], ''deprecated''). RBT no longer seems to be available from JHU. A copy appears to be available from [http://www.cst.dk/download/tagger/RBT1_14.tar.Z Københavns Universitet] (not downloaded or verified). RBT is distributed free of charge under a license of the Massachusetts Institute of Technology and the University of Pennsylvania. If you want to recompile either of the packages, you additionally need a C compiler.
+
 
+
This package has been tested with Red Hat Linux 9.0 (http://www.redhat.com) and Debian Linux 3.1 (http://www.debian.org). Both work with a 2.4.20 kernel or higher.
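
Before running the installation script, it can help to check that the prerequisites listed above are present. The following is a minimal sketch and is not part of the Textpresso distribution. The function names and the example module list are assumptions. Note that <code>perl -MName -e 1</code> only succeeds for names that correspond to a loadable module file, so a class that lives inside a larger module may need to be checked through its parent (for example <code>XML::DOM</code>, which is an assumption about how that package is laid out).

```python
import shutil
import subprocess

# Executables named in the requirements list.
REQUIRED_TOOLS = ["perl", "bash", "pdftotext"]

# Example module names only (an assumption); adjust to your installation.
EXAMPLE_PERL_MODULES = ["XML::DOM", "XML::XQL::DOM", "PDF::Create"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def perl_module_available(module, run=subprocess.run):
    """Return True if `perl -M<module> -e 1` exits with status 0."""
    try:
        result = run(
            ["perl", f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # perl itself is not installed
        return False
    return result.returncode == 0


def missing_perl_modules(modules, available=perl_module_available):
    """Return the modules that Perl fails to load."""
    return [module for module in modules if not available(module)]
```

Calling <code>missing_tools(REQUIRED_TOOLS)</code> and <code>missing_perl_modules(EXAMPLE_PERL_MODULES)</code> returns two lists; both should be empty on a machine that is ready for installation. The <code>which</code>, <code>run</code> and <code>available</code> parameters exist only so the checks can be exercised without touching the real system.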
+
 
+
==Documentation==
+
 
+
* [http://www.textpresso.org/celegans/misc/Textpresso-2.0-documentation/ Installation Guide]
+
* [http://textpresso-www.caltech.edu/cgi-bin/celegans/user_guide User Guide].
+
 
+
==Contact==
+
 
+
Hans-Michael Müller, mueller (at) caltech.edu
+
 
+
==Downloads==
+
 
+
[http://www.textpresso.org/downloads.html Download from textpresso.org]
+
 
+
 
[[Category:GMOD Components]]
 
[[Category:Textpresso]]
 

Revision as of 17:54, 17 October 2013

Facts about "Textpresso"
* Available on platform: web
* Has URL: http://textpresso.org, http://textpresso.org/downloads.html, http://textpresso-www.caltech.edu/cgi-bin/celegans/user_guide, http://whis.caltech.edu/textpresso/, http://textpresso.yeastgenome.org/textpresso/ and http://www.textpresso.org/celegans/
* Has description: Textpresso is an information extracting and processing (text mining) package for biological literature whose capabilities go far beyond that of a simple keyword search engine. The two key elements are the collection of the full text of scientific articles split into individual sentences, and the implementation of semantic categories, for which a database of articles and individual sentences can be searched. The source of the full text articles are PDFs, and additional bibliographical information that is obtained from other citation databases can be processed as well. Alere is a package of scripts that can be used to construct a corpus (retrieve articles) for use with Textpresso. Textpresso is supported by a grant from the National Human Genome Research Institute at the US National Institutes of Health # HG004090.
* Has development status: active
* Has input format: Plain text, PDF and html
* Has licence: Modified GPL
* Has logo: TextpressoLogo.jpg
* Has output format: XML and text
* Has software maturity status: mature
* Has support status: active
* Has title: Textpresso User Guide, Textpresso for Sea Urchin, Textpresso for S. cerevisiae and Textpresso for C. elegans
* Has topic: Textpresso
* Is open source: Caveats apply
* Link type: website, download, documentation and wild URL
* Tool functionality or classification: Literature tools and Text mining
* Written in language: Perl
* Has subobject: Textpresso#http://textpresso.org, Textpresso#http://textpresso.org/downloads.html, Textpresso#http://textpresso-www.caltech.edu/cgi-bin/celegans/user_guide, Textpresso#http://whis.caltech.edu/textpresso/, Textpresso#http://textpresso.yeastgenome.org/textpresso/ and Textpresso#http://www.textpresso.org/celegans/