Manatee

Manatee is a corpus manager, a software tool for text processing. A corpus is understood here as a huge collection of texts in electronic form. It serves as a resource of empirical language data, i.e. words, their meanings and the contexts they occur in. Corpora can be employed in many fields of linguistics (morphology, syntax, semantics, stylistics, sociolinguistics, etc.), and corpus managers are the primary tools for exploring them.

The system consists of two parts: the server (manateesrv) and the graphical user interface (GUI) client, Bonito. From the download page you can get either the complete Manatee package (server and client) or the client alone.

Together they form a complete solution for text corpus management.

System requirements

Server
Linux (more platforms are in preparation). Hardware requirements depend on the size of the corpora and their annotation.
Client
UNIX + X Window System, Windows 95/98/2000/NT, Macintosh 

The new version of the client is a web-based application that can be used with any modern web browser. In this setup, the server runs as a CGI script on a web server.


Pre-encoded SUSANNE Corpus

If you would like to try Manatee right away, you can download the pre-encoded SUSANNE Corpus from the download page. From its documentation:
The SUSANNE Corpus was created, with the sponsorship of the Economic and Social Research Council (UK) at the University of Sussex, as part of the process of developing a comprehensive language-engineering-oriented taxonomy and annotation scheme for the (logical and surface) grammar of English. The Corpus itself comprises an approximately 130,000-word subset of the Brown Corpus of American English, annotated in accordance with the SUSANNE scheme.

The corpus was built from the original SUSANNE (Release 5) sources (see http://www.grsampson.net/) using a transformation script that made the following simplifications:

  • elimination of reference and parse columns
  • translation of character references into ISO-Latin-1 characters
  • creation of headline, paragraph and document boundaries
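The transformation script itself is not reproduced on this page. As a rough illustration only, the Python sketch below shows how column elimination, character-reference translation and document boundaries could be handled. It assumes SUSANNE-style input of six tab-separated columns (reference, status, wordtag, word, lemma, parse), character references written as `<name>`, and a one-token-per-line ("vertical") output with `<doc>` tags. The entity table `CHAR_REFS`, the functions `translate_refs` and `to_vertical`, the output column order and the sample lines are all hypothetical. Headline and paragraph boundaries are left out because they depend on markup details not described here.

```python
import re

# Hypothetical subset of a reference-to-Latin-1 table (assumption: the real
# entity names and the full table used by the original script are not shown).
CHAR_REFS = {"eacute": "\u00e9", "agrave": "\u00e0", "ccedil": "\u00e7"}

REF_PATTERN = re.compile(r"<([a-z]+)>")


def translate_refs(text):
    """Replace known <name> references with Latin-1 characters; keep unknown ones."""
    return REF_PATTERN.sub(lambda m: CHAR_REFS.get(m.group(1), m.group(0)), text)


def to_vertical(lines):
    """Turn six-column, tab-separated lines into vertical text.

    Assumed columns: reference, status, wordtag, word, lemma, parse.
    The reference and parse columns are dropped, and a new <doc> element
    is opened whenever the part of the reference before ':' changes.
    """
    out = []
    current_doc = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip malformed lines
        ref, status, wordtag, word, lemma, _parse = fields
        doc_id = ref.split(":", 1)[0]
        if doc_id != current_doc:
            if current_doc is not None:
                out.append("</doc>")
            out.append(f'<doc id="{doc_id}">')
            current_doc = doc_id
        # Hypothetical output order: word, wordtag, lemma, status.
        out.append("\t".join([translate_refs(word), wordtag, translate_refs(lemma), status]))
    if current_doc is not None:
        out.append("</doc>")
    return out


# Made-up sample lines in the assumed input layout.
sample = [
    "A01:0010a\t-\tAT\tThe\tthe\t[O[S[Nns:s.",
    "A01:0010b\t-\tNN1c\tcaf<eacute>\tcaf<eacute>\t.Nns:s]",
    "A02:0010a\t-\tAT\tA\ta\t[O[S[Ns.",
]
for row in to_vertical(sample):
    print(row)
```

Running it prints a `<doc id="A01">` element with two token lines (the second rendered as `café`), followed by a separate `<doc id="A02">` element, each closed with `</doc>`.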