Difference between revisions of "Textpresso"

From GMOD
Revision as of 15:48, 9 July 2008

Description

Textpresso is a text mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Its two key elements are a collection of full-text scientific articles split into individual sentences, and a set of semantic categories that can be searched for in a database of articles and individual sentences. The full text is taken from PDFs, and additional bibliographic information obtained from other citation databases can be processed as well. Alere (http://ilex.caltech.edu/trac/alere/) is a package of scripts that can be used to retrieve articles for use with Textpresso.
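The idea of searching individual sentences by keyword and semantic category can be illustrated with a small Python sketch. This is not Textpresso's implementation: the category names, their term lists, the naive sentence splitter and the function names are all assumptions made for the example.

```python
import re

# Hypothetical categories: each name maps to a small set of lowercase terms.
# Real category lexica are far larger; these are illustration only.
CATEGORIES = {
    "gene": {"lin-12", "daf-16"},
    "regulation": {"represses", "activates", "regulates"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_categories(sentence):
    """Return the names of all categories with at least one term in the sentence."""
    words = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
    return {name for name, terms in CATEGORIES.items() if words & terms}


def search(text, keyword, required_categories):
    """Return sentences containing the keyword and every required category."""
    hits = []
    for sentence in split_sentences(text):
        has_keyword = keyword.lower() in sentence.lower()
        if has_keyword and required_categories <= sentence_categories(sentence):
            hits.append(sentence)
    return hits


if __name__ == "__main__":
    sample = "daf-16 regulates lifespan. The weather was nice. lin-12 is a receptor."
    print(search(sample, "lifespan", {"gene", "regulation"}))
```

Matching at the sentence level means a hit requires the keyword and the category terms to occur together in one sentence, which is stricter than matching anywhere in the article.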


Demo & Screenshots

Please visit the live main site at www.textpresso.org for examples and screenshots.


Requirements

The package is designed for Linux operating systems and has been tested on Intel x86-based hardware. The minimum disk space required is around 6GB per 1000 full-text papers. Half of it is used by the publicly (via WWW) accessible database, while the other half is needed for database preparation and maintenance. If necessary, the latter can be reduced.
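As a rough planning aid, the disk space figures above can be turned into a small Python calculation. The function and constant names are hypothetical, and the sketch assumes the requirement scales linearly with the number of papers, which the text does not state explicitly.

```python
# Figures taken from the requirements text above.
GB_PER_1000_PAPERS = 6.0  # approximate total per 1000 full-text papers
PUBLIC_FRACTION = 0.5     # share used by the web-accessible database


def estimate_disk_gb(num_papers):
    """Estimate disk usage in GB, assuming linear scaling with corpus size."""
    total = num_papers / 1000 * GB_PER_1000_PAPERS
    public = total * PUBLIC_FRACTION
    return {
        "total_gb": total,
        "public_db_gb": public,
        "maintenance_gb": total - public,  # preparation and maintenance share
    }


if __name__ == "__main__":
    print(estimate_disk_gb(25000))
```

Under these assumptions, a corpus of 25,000 papers would need about 150GB in total, of which the maintenance half can be reduced if necessary.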

RBT is distributed free of charge under a license of the Massachusetts Institute of Technology and the University of Pennsylvania. If you want to recompile either of the packages, you additionally need a C compiler.

This package has been tested with the Red Hat Linux 9.0 distribution (http://www.redhat.com) and Debian Linux 3.1 (http://www.debian.org). Both work with kernel 2.4.20 or higher.

Documentation

Installation instructions can be found in the file TextpressoManual.pdf, which is included in the compressed tar package.

A user guide is available online.


Contact

Hans-Michael Muller, mueller (at) caltech.edu


Downloads

http://www.textpresso.org/textpresso/downloads.html