The Sherlock Network Search Engine

Sherlock is a universal, extensible system for collecting documents distributed across the network (e.g., on the World-Wide Web), indexing them, and offering full-text search capabilities.

The system is still under development and is occasionally run for experimental purposes. The modules finished so far are:

gatherd
Information gathering supervisor (starts all download and analysis modules of the system and controls the object queue).
httpget
Downloads files via HTTP.
fileget
Processes locally accessible files.
htmlchew
Analyses HTML documents and extracts data from them.
textchew
Analyses plain ASCII text documents and extracts data from them.
gived
Object information distribution daemon.
objget
Client for gived.
dbuild
Database builder for the search engine.
sherlockd
The full-text search engine.
scgi
WWW interface for the search engine.
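
To make the division of labour among the modules above more concrete, here is a minimal Python sketch of the same gather, analyse, build, search flow. It is only an illustration under stated assumptions, not Sherlock's actual code or data formats: the function names (analyse_html, build_index, search), the in-memory "gathered" dictionary, the example.org URLs, and the AND-only query semantics are all hypothetical, and a plain inverted index stands in for whatever database dbuild really produces.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document (analysis-stage stand-in)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def analyse_html(html):
    """Hypothetical analysis step: strip markup and return the plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Hypothetical database-building step: map each word to the URLs containing it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Hypothetical search step: return URLs containing every query word (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Stand-in for gathered objects; a real gatherer would download these.
gathered = {
    "http://example.org/a.html":
        "<html><body><h1>Network search</h1><p>Full-text indexing.</p></body></html>",
    "http://example.org/b.html":
        "<html><body><p>Gathering documents from the network.</p></body></html>",
}

documents = {url: analyse_html(html) for url, html in gathered.items()}
index = build_index(documents)
print(sorted(search(index, "network")))
```

Keeping the stages as separate functions mirrors the separation into independent programs described above: each stage only depends on the output of the previous one, so any of them could be replaced without touching the rest.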

New modules will probably appear soon.

The current version can be downloaded here and sometimes runs here.

If you want to help with this project, suggest extensions, or report bugs, feel free to contact the author.


Last modification 28. 7. 1997 by Martin Mares