MAN page from PLD mnogosearch-3.2.25-1.i386.rpm
Section: mnoGoSearch 3.2 reference manual (1)
Updated: 23 December 2002
indexer - indexing WWW space.
indexer -C [-R] [-t tag] [-u pattern] [-s status] [-y content-type] [configfile]
indexer -S [-R] [-t tag] [-u pattern] [-s status] [-y content-type] [configfile]
indexer -I [-R] [-t tag] [-u pattern] [-s status] [-y content-type] [configfile]
indexer is a part of mnoGoSearch, a search engine. The purpose of indexer is to walk through HTTP, HTTPS, FTP and NEWS servers as well as the local file system, recursively fetching all documents and storing metadata about them in an SQL or built-in database. Since every document is referenced by its corresponding URL, the metadata collected by indexer is used later in the search process.
The behaviour of indexer is controlled mainly via the configuration file indexer.conf(5), which it reads on startup. There is a compiled-in default for the configuration file name and location, so you don't need to specify it every time you run indexer, but you can specify an alternative configuration file as the last argument.
indexer supports HTML-formatted (text/html MIME type), XML-formatted (text/xml MIME type) and plain text (text/plain MIME type) documents. Support for other data types is provided by external programs called "parsers". A parser should read data of some type from stdin and write text/html or text/plain data to stdout. See indexer.conf(5) for details.
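The stdin-to-stdout contract for parsers can be sketched as a small shell filter. This is only an illustration: the function name parse_gzip_text and the choice of gzip-compressed text as the input type are assumptions made for this example, and how a parser is registered is described in indexer.conf(5), not here.

```shell
# Hypothetical parser for gzip-compressed plain text:
# reads the compressed bytes on stdin, writes text/plain on stdout.
parse_gzip_text() {
    gzip -dc
}

# Usage: feed a compressed document through the parser.
printf 'hello from a compressed file\n' | gzip -c | parse_gzip_text
```

Any program that behaves like this filter, whatever it is written in, fits the description of a parser given above.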
You may run indexer regularly from cron(8) to keep metadata up-to-date.
indexer is also used to manipulate the database. It may be used to clear some data from the database, to output some statistics and to calculate popularity ranking.
- -a Reindex all documents even if not expired.
By default indexer reindexes only those documents that are "expired", i.e. the time since their last reindexing is greater than "Period" from the indexer.conf(5) file. This option disables the feature, so all documents will be reindexed regardless of their state. To achieve this, indexer first marks all URLs as "expired". This has the following side effect: if you start indexer -a and then terminate it (for example, by pressing Ctrl-C) and start it again, all URLs will still be considered "expired" and will be reindexed again.
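The expiry rule described above amounts to a single comparison. The sketch below is not indexer's actual code: the function name is_expired is hypothetical, and expressing all three values in seconds is an assumption made for the illustration.

```shell
# is_expired LAST NOW PERIOD
# Succeeds when the time since the last reindexing (NOW - LAST)
# is greater than PERIOD. All values are assumed to be in seconds.
is_expired() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

# Usage: a document last indexed 8 days ago, with a 7-day period.
last=0
now=$(( 8 * 24 * 60 * 60 ))
period=$(( 7 * 24 * 60 * 60 ))
if is_expired "$last" "$now" "$period"; then
    echo "expired: will be reindexed"
else
    echo "fresh: skipped unless all URLs are marked expired"
fi
```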
- This option forces indexer to reindex documents even if their content has not been modified. This is achieved by disabling the If-Modified-Since HTTP header and the MD5 hash check. This is useful if you have changed some Allow, Disallow, MaxHops or other directives in your indexer.conf(5) file. In that case there will be a different set of rules for storing document URLs and therefore a different set of URLs. To discover those URLs, even unchanged documents need to be reindexed.
- -n number Reindex only the given number of URLs and exit.
- seconds Limit indexing time to the given number of seconds.
- -e Reindex most expired documents first. This option forces the list of documents to reindex to be sorted by last reindexing time, so the most "expired" documents are reindexed first. You may or may not notice a minor delay with this option, but at least in theory it should slow indexer down a bit.
The combination of -e and -n number can be useful. For example, you can use indexer -e -n 100 to reindex just the 100 most expired documents.
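The -e and -n combination can be wrapped in a small helper. The function name reindex_most_expired and the RUN variable are inventions of this sketch: while RUN is unset the command is only printed, and setting RUN to an empty string makes the helper actually invoke indexer.

```shell
# reindex_most_expired COUNT
# Dry run by default: prints the command instead of executing it.
# Set RUN= (empty) in the environment to really run indexer.
reindex_most_expired() {
    ${RUN-echo} indexer -e -n "$1"
}

# Usage: reindex the 100 most expired documents.
reindex_most_expired 100
```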
- Quick startup. This mode is useful if you haven't added or modified Server commands. indexer will not insert URLs given in Server commands into the database, which speeds up startup somewhat.
- Skip locking (this option affects MySQL and PostgreSQL only).
- Insert new URLs. New URLs must be specified using the -u or -f options.
- seconds Specifies the time in seconds to pause after each URL.
- Turns off warnings before clearing database.
- Index documents with less depth (hops value) first.
- Do not try to reduce remote servers' load by randomising the URL fetch list before indexing (recommended for a very large number of URLs).
- Block starting more than one indexer instance.
- number Run number threads, if a multithreaded mnoGoSearch version was compiled.
- Calculate popularity rank before program exit.
- -t tag
- -u pattern
- -s status
- -g category
- -y content-type
- Set URL filters on tag, pattern, status, category and content-type respectively.
tag is a server tag that you can set arbitrarily in the config file indexer.conf(5).
pattern is an SQL LIKE wildcard for URLs. In short, underscore (_) means "any single character", per cent (%) means "any sequence of characters", and the comparison is case insensitive. For example, indexer -u %izhcom.ru% will reindex all documents whose URLs contain the string "izhcom.ru".
status is a filter on the document's HTTP status obtained during the last reindexing. For example, -s 0 is a filter for all documents that have not been indexed before, -s 200 is a filter for all documents that were retrieved with the "HTTP 200 Ok" status, and -s 301 is a filter for all documents that were retrieved with the "HTTP 301 Redirect" status. See the HTTP protocol specifications for details on HTTP status codes and their respective meanings.
category is a filter for documents that match a specific category. Categories are almost like tags, but nested.
content-type is a filter for documents with the given MIME Content-Type.
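To get a feel for which URLs a pattern given with -u selects, the LIKE wildcards can be imitated with shell globs. This is only an approximation for experimenting: the real comparison is done by the database, the function name like_match is hypothetical, and URLs or patterns containing the glob characters *, ? or [ are not handled.

```shell
# like_match URL PATTERN
# Succeeds if URL matches the SQL LIKE style PATTERN.
like_match() {
    # Lower-case both sides to imitate the case-insensitive comparison.
    url=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Translate LIKE wildcards into shell glob wildcards: % -> *, _ -> ?
    pat=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr '%_' '*?')
    case $url in
        $pat) return 0 ;;
        *)    return 1 ;;
    esac
}

# Usage: check two URLs against the pattern from the example above.
for u in http://www.IZHCOM.ru/index.html http://www.udm.net/; do
    if like_match "$u" '%izhcom.ru%'; then
        echo "match: $u"
    else
        echo "no match: $u"
    fi
done
```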
You can freely combine any number of -t, -u, -s, -g and -y options. Filters of the same class (tag, pattern, status) are combined using logical OR, and filters of different classes are combined using logical AND. That means that if you type indexer -u %izhcom.ru% -u %udm.net% -t 1 -s 200, the documents to index will be those with tag 1 and HTTP status 200 whose URLs contain the string "izhcom.ru" or "udm.net".
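When the list of URL patterns comes from a script, the repeated -u options can be built up programmatically. In the sketch below the function name reindex_filtered and the RUN variable are assumptions of the example: while RUN is unset the command line is only printed, and setting RUN to an empty string executes it. The patterns are quoted so the shell passes the % characters through unchanged.

```shell
# reindex_filtered TAG STATUS PATTERN...
# Builds one -u option per pattern (combined with OR by indexer),
# plus a tag and a status filter (combined with the patterns by AND).
reindex_filtered() {
    tag=$1
    status=$2
    shift 2
    n=$#
    # Append "-u PATTERN" for each pattern, then drop the originals.
    for pattern in "$@"; do
        set -- "$@" -u "$pattern"
    done
    shift "$n"
    ${RUN-echo} indexer "$@" -t "$tag" -s "$status"
}

# Usage: the same selection as in the example above.
reindex_filtered 1 200 '%izhcom.ru%' '%udm.net%'
```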
- -f filename Read URLs to be indexed/inserted/cleared from a file (with the -a or -C option, it supports the SQL LIKE wildcard '%'; it has no effect when combined with the -m option).
- -f - Use STDIN instead of a file to read the URL list.
- Do not log to stdout/stderr.
- level Verbose level, can be set to 0-5.
- -C Clear databases.
This will erase data previously collected by indexer from the mnoGoSearch databases. You can use the options -t, -u and -s described above to select what you want to delete.
WARNING: Use this option with extreme caution!
- -S Show statistics.
This option outputs brief statistics on how many documents there are in the database, their HTTP status, and how many documents are expired. You can use the options -t, -u and -s described above to select which documents you want statistics on.
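Because the statistics and clear modes accept the same filters, one cautious workflow is to inspect a selection with -S before erasing it with -C. The helper names show_selection and clear_selection and the RUN variable are assumptions of this sketch: while RUN is unset the commands are only printed, and setting RUN to an empty string executes them.

```shell
# show_selection PATTERN: statistics for URLs matching PATTERN.
show_selection() {
    ${RUN-echo} indexer -S -u "$1"
}

# clear_selection PATTERN: erase collected data for the same URLs.
clear_selection() {
    ${RUN-echo} indexer -C -u "$1"
}

# Usage: review the statistics first, then clear the same selection.
show_selection '%izhcom.ru%'
clear_selection '%izhcom.ru%'
```

Keeping the two steps in separate functions leaves room to check the statistics output before anything is deleted.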
- -I Show referrers.
This option shows you the referrers of URLs or, in other words, all hyperlinks from the document. You can use the options -t, -u and -s described above to select which documents you want to show referrers for.
- Shows help screen with brief overall description of indexer options.
If you think you've found a bug in indexer, please report it to the mnoGoSearch bug report system at http://www.mnogosearch.org/bugs/ (please post in English only).
Copyright © 1998 - 2004 Lavtech.Com Corp. (http://www.mnogosearch.org/).
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- SEE ALSO