MAN page from Mandriva 2010 webgrep-2.12-5mdv2010.0.i586.rpm
Section: User Commands (1)
Updated: Feb 1999
webfgrep - a poor man's web search engine.
webfgrep [-ahist] [-p prefix] -- [key1,...] html-files
webfgrep uses memory mapped file access and can therefore search a large number of html pages in a short time. With webfgrep and a cgi-bin front-end it is possible to build a fast web search engine for small web sites with about 1Mb of html pages. You can specify up to 3 keywords. A web page matches only when it contains all of the given keys.
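The matching rule described above (a page is reported only if every key occurs in it) can be sketched with standard tools. The following is only an illustration built on grep -F, not webfgrep's own implementation: it does not use memory mapped access, and the function name match_all is hypothetical.

```shell
#!/bin/sh
# match_all KEY1,KEY2,... FILE...
# Hypothetical stand-in for the "all keys must occur" rule, using grep -F
# (fixed-string, case-sensitive substring search). Keys are assumed to be
# plain words without shell wildcard characters.
match_all() {
    keys=$1
    shift
    for f in "$@"; do
        ok=1
        old_ifs=$IFS
        IFS=,                      # split the key list on commas
        for k in $keys; do
            # -q: no output, exit status only; --: end of grep options
            grep -qF -- "$k" "$f" || ok=0
        done
        IFS=$old_ifs
        if [ "$ok" -eq 1 ]; then
            printf '%s\n' "$f"     # every key was found in this file
        fi
    done
    return 0
}
```

For example, match_all guido,File *.html prints the names of the files that contain both strings.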
Please note that you must consider a number of important security issues when writing a cgi-bin front-end. The minimum security is to escape all non-word characters ([^A-Z_a-z0-9]) before passing the search keys to the webfgrep command line. A better approach is to remove any "garbage characters" and use the -s option to feed the user input directly to webfgrep, so that this data never passes through the shell.
Two sample cgi-bin front-ends are provided in the distribution of webfgrep. The sample cgi-bins are designed for searching English web pages but can easily be modified to also search web pages based on other character sets. You mainly need to address the issue of how characters that are specific to your language are represented in html format.
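The "remove garbage characters, then use -s" approach might look roughly like the sketch below. It is not one of the sample cgi-bins from the distribution. The function names sanitize_keys and search_site are hypothetical, and it is an assumption that -s expects the keys on stdin in the same comma-separated form as on the command line and that the -- separator is still given before the file list.

```shell
#!/bin/sh
# Hypothetical helper: keep only word characters ([A-Za-z0-9_]) and the
# comma that separates keys; every other byte is deleted.
sanitize_keys() {
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9_,'
}

# Hypothetical wrapper: search_site RAW_KEYS FILE...
# The cleaned keys are written to webfgrep's stdin (-s), so they never
# appear on a command line that the shell has to parse.
# Assumption: -s reads "key1,key2,..." from stdin.
search_site() {
    keys=$(sanitize_keys "$1")
    shift
    printf '%s\n' "$keys" | webfgrep -s -- "$@"
}
```

A front-end would call something like search_site "$user_input" *.html, where user_input stands for the decoded form field.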
- -a: Anchor search. Matches whole words only, not substrings.
- -h: Prints a little help/usage information.
- -i: Case-insensitive search (works only with ISO-8859-1 character sets).
- -p prefix: Path prefix to add when displaying the result.
- -s: Read the keys from stdin rather than from the command line.
- -t: Text output (default is html).
Search for the complete words guido and File in all web pages in the current directory. Only web pages that contain both words match:
webfgrep -a -p http://some.hostname.com/ -- guido,File *.html
Search all html files below /home/http/html for the substring linux:
(cd /home/http/html;webfgrep -p http://some.hostname.com/ -- linux `find . -name '*.htm*' -print`)
No known bugs.
Guido Socher (guido.sAATTwriteme.com)