j  � ht://Dig: htsearch� � 0  

htsearch



W ht://Dig Copyright © 1995-2002 The ht://Dig Group
8 Please see the file COPYING for license information.




 Search Method Used



> The way htsearch performs it search and applies its rankingA rules are fairly complicated. This is an attempt at explaining7 in global terms what goes on when htsearch searches.



@ htsearch gets a list of words from the HTML form that invoked> it. If htsearch was invoked with boolean expression parsing? enabled, it will do a quick syntax check on the input words.? If there are syntax errors, it will display the syntax error" file that is specified with the? syntax_error_file attribute.



> If the boolean parser was not enabled, the list of words is? converted into a boolean expression by putting either "and"s: or "or"s between the words. (This depends on the search type.)



? In both cases, each of the words in the list is now expanded9 using the search algorithms that were specified in the= search_algorithm? attribute. For example, the endings algorithm will convert a@ word like "person" into "person or persons". In this fashion,= all the specified algorithms are used on each of the words. and the result is a new boolean expression.



? The next step is to perform database lookups on the words in> the expression. The result of these lookups are then passed$ to the boolean expression parser.



> The boolean expression parser is a simple recursive descent: parser with an operand stack. It knows how to deal with? "not", "and", "or" and parenthesis. The result of the parser" will be one set of matches.
A Note that the operator "not" is used as the word 'without' and> is binary: You can not write "cat and not dog" or just "not( dog" but you can write "cat not dog".



@ At this point, the matches are ranked. The rank of a match is> determined by the weight of the words that caused the match@ and the weight of the algorithm that generated the word. Word< weights are generally determined by the importance of the; word in a document. For example, words in the title of a> document have a much higher weight than words at the bottom of the document.



@ Finally, when the document ranks have been determined and the< documents sorted, the resulting matches are displayed. If= paged output is required, only a subset of all the matches will be displayed.


+Last modified: $Date: 2002/01/27 05:33:20 $ ÿÿ