ht://Dig: Configuration

 Configuration



ht://Dig Copyright © 1995-2002 The ht://Dig Group
Please see the file COPYING for license information.




ht://Dig requires a configuration file and several HTML files to operate correctly. Fortunately, when ht://Dig is installed, a very reasonable configuration is created, and in most cases only minor modifications to the files are necessary.

Below, we will use the variables that were set in CONFIG to designate specific paths.



 Standard files:




${CONFIG_DIR}/htdig.conf



This is the main runtime configuration file for all programs that make up ht://Dig. The file is fully described in the Configuration file manual.



When ht://Dig is installed, several attributes will be customized to your particular environment, but for reference, here is a sample copy of what it can look like:


#
# Example config file for ht://Dig.
#
# This configuration file is used by all the programs that make up ht://Dig.
# Please refer to the attribute reference manual for more details on what
# can be put into this file.  (http://www.htdig.org/confindex.html)
# Note that most attributes have very reasonable default values so you
# really only have to add attributes here if you want to change the defaults.
#
# What follows are some of the common attributes you might want to change.
#

#
# Specify where the database files need to go.  Make sure that there is
# plenty of free disk space available for the databases.  They can get
# pretty big.
#
database_dir:           /opt/www/htdig/db

#
# This specifies the URL where the robot (htdig) will start.  You can specify
# multiple URLs here.  Just separate them by some whitespace.
# The example here will cause the ht://Dig homepage and related pages to be
# indexed.
# You could also index all the URLs in a file like so:
# start_url:           `${common_dir}/start.url`
#
start_url:              http://www.htdig.org/

#
# This attribute limits the scope of the indexing process.  The default is to
# set it to the same as the start_url above.  This way only pages that are on
# the sites specified in the start_url attribute will be indexed and it will
# reject any URLs that go outside of those sites.
#
# Keep in mind that the value for this attribute is just a list of string
# patterns. As long as URLs contain at least one of the patterns it will be
# seen as part of the scope of the index.
#
limit_urls_to:          ${start_url}

#
# If there are particular pages that you definitely do NOT want to index, you
# can use the exclude_urls attribute.  The value is a list of string patterns.
# If a URL matches any of the patterns, it will NOT be indexed.  This is
# useful to exclude things like virtual web trees or database accesses.  By
# default, all CGI URLs will be excluded.  (Note that the /cgi-bin/ convention
# may not work on your web server.  Check the path prefix used on your web
# server.)
#
exclude_urls:           /cgi-bin/ .cgi

#
# Since ht://Dig does not (and cannot) parse every document type, this
# attribute is a list of strings (extensions) that will be ignored during
# indexing. These are *only* checked at the end of a URL, whereas
# exclude_url patterns are matched anywhere.
#
bad_extensions:         .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
        .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css

#
# The string htdig will send in every request to identify the robot.  Change
# this to your email address.
#
maintainer:             unconfigured@htdig.searchengine.maintainer

#
# The excerpts that are displayed in long results rely on stored information
# in the index databases.  The compiled default only stores 512 characters of
# text from each document (this excludes any HTML markup...)  If you plan on
# using the excerpts you probably want to make this larger.  The only concern
# here is that more disk space is going to be needed to store the additional
# information.  Since disk space is cheap (! :-)) you might want to set this
# to a value so that a large percentage of the documents that you are going
# to be indexing are stored completely in the database.  At SDSU we found
# that by setting this value to about 50k the index would get 97% of all
# documents completely and only 3% was cut off at 50k.  You probably want to
# experiment with this value.
# Note that if you want to set this value low, you probably want to set the
# excerpt_show_top attribute to false so that the top excerpt_length characters
# of the document are always shown.
#
max_head_length:        10000

#
# To limit network connections, ht://Dig will only pull up to a certain limit
# of bytes. This prevents the indexing from dying because the server keeps
# sending information. However, several FAQs happen because people have files
# bigger than the default limit of 100KB. This sets the default a bit higher.
# (see <http://www.htdig.org/FAQ.html> for more)
#
max_doc_size:           200000

#
# Most people expect some sort of excerpt in results. By default, if the
# search words aren't found in context in the stored excerpt, htsearch shows
# the text defined in the no_excerpt_text attribute:
# (None of the search words were found in the top of this document.)
# This attribute instead will show the top of the excerpt.
#
no_excerpt_show_top:    true

#
# Depending on your needs, you might want to enable some of the fuzzy search
# algorithms.  There are several to choose from and you can use them in any
# combination you feel comfortable with.  Each algorithm will get a weight
# assigned to it so that in combinations of algorithms, certain algorithms get
# preference over others.  Note that the weights only affect the ranking of
# the results, not the actual searching.
# The available algorithms are:
#       accents
#       exact
#       endings
#       metaphone
#       prefix
#       soundex
#       substring
#       synonyms
# By default only the "exact" algorithm is used with weight 1.
# Note that if you are going to use the endings, metaphone, soundex, accents,
# or synonyms algorithms, you will need to run htfuzzy to generate
# the databases they use.
#
search_algorithm:       exact:1 synonyms:0.5 endings:0.1

#
# The following are the templates used in the builtin search results
# The default is to use compiled versions of these files, which produces
# slightly faster results. However, uncommenting these lines makes it
# very easy to change the format of search results.
# See <http://www.htdig.org/hts_templates.html> for more details.
#
# template_map: Long long ${common_dir}/long.html \
#               Short short ${common_dir}/short.html
# template_name: long

#
# The following are used to change the text for the page index.
# The defaults are just boring text numbers.  These images spice
# up the result pages quite a bit.  (Feel free to do whatever, though)
#
next_page_text:         <img src="/htdig/buttonr.gif" border="0" align="middle" width="30" height="30" alt="next">
no_next_page_text:
prev_page_text:         <img src="/htdig/buttonl.gif" border="0" align="middle" width="30" height="30" alt="prev">
no_prev_page_text:
page_number_text:       '<img src="/htdig/button1.gif" border="0" align="middle" width="30" height="30" alt="1">' \
                        '<img src="/htdig/button2.gif" border="0" align="middle" width="30" height="30" alt="2">' \
                        '<img src="/htdig/button3.gif" border="0" align="middle" width="30" height="30" alt="3">' \
                        '<img src="/htdig/button4.gif" border="0" align="middle" width="30" height="30" alt="4">' \
                        '<img src="/htdig/button5.gif" border="0" align="middle" width="30" height="30" alt="5">' \
                        '<img src="/htdig/button6.gif" border="0" align="middle" width="30" height="30" alt="6">' \
                        '<img src="/htdig/button7.gif" border="0" align="middle" width="30" height="30" alt="7">' \
                        '<img src="/htdig/button8.gif" border="0" align="middle" width="30" height="30" alt="8">' \
                        '<img src="/htdig/button9.gif" border="0" align="middle" width="30" height="30" alt="9">' \
                        '<img src="/htdig/button10.gif" border="0" align="middle" width="30" height="30" alt="10">'
#
# To make the current page stand out, we will put a border around the
# image for that page.
#
no_page_number_text:    '<img src="/htdig/button1.gif" border="2" align="middle" width="30" height="30" alt="1">' \
                        '<img src="/htdig/button2.gif" border="2" align="middle" width="30" height="30" alt="2">' \
                        '<img src="/htdig/button3.gif" border="2" align="middle" width="30" height="30" alt="3">' \
                        '<img src="/htdig/button4.gif" border="2" align="middle" width="30" height="30" alt="4">' \
                        '<img src="/htdig/button5.gif" border="2" align="middle" width="30" height="30" alt="5">' \
                        '<img src="/htdig/button6.gif" border="2" align="middle" width="30" height="30" alt="6">' \
                        '<img src="/htdig/button7.gif" border="2" align="middle" width="30" height="30" alt="7">' \
                        '<img src="/htdig/button8.gif" border="2" align="middle" width="30" height="30" alt="8">' \
                        '<img src="/htdig/button9.gif" border="2" align="middle" width="30" height="30" alt="9">' \
                        '<img src="/htdig/button10.gif" border="2" align="middle" width="30" height="30" alt="10">'
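The comments in the sample describe three URL checks: limit_urls_to (the URL must contain at least one pattern), exclude_urls (a pattern anywhere in the URL rejects it) and bad_extensions (checked only at the end of the URL). The following Python sketch restates those rules so they can be tried out. It is not ht://Dig's own code: the function name is_indexable is hypothetical, and plain case-sensitive substring matching is an assumption.

```python
def is_indexable(url, limit_urls_to, exclude_urls, bad_extensions):
    """Apply the three checks described in the sample config comments.

    Hypothetical helper for illustration only; it does not reproduce
    ht://Dig's actual matching code.
    """
    # limit_urls_to: the URL must contain at least one of the patterns.
    if not any(pattern in url for pattern in limit_urls_to):
        return False
    # exclude_urls: a pattern found anywhere in the URL rejects it.
    if any(pattern in url for pattern in exclude_urls):
        return False
    # bad_extensions: only the end of the URL is checked.
    if any(url.endswith(ext) for ext in bad_extensions):
        return False
    return True


# Values taken from the sample configuration (bad_extensions shortened).
limit = ["http://www.htdig.org/"]
exclude = ["/cgi-bin/", ".cgi"]
bad = [".gif", ".zip", ".css"]

print(is_indexable("http://www.htdig.org/attrs.html", limit, exclude, bad))        # True
print(is_indexable("http://www.htdig.org/cgi-bin/htsearch", limit, exclude, bad))  # False
print(is_indexable("http://www.htdig.org/htdig.gif", limit, exclude, bad))         # False
print(is_indexable("http://example.com/index.html", limit, exclude, bad))          # False
```

Because limit_urls_to and exclude_urls are described as substring patterns rather than prefixes, a short pattern such as .cgi also rejects URLs that merely contain that text somewhere in the path.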

${SEARCH_DIR}/search.html



This is the default search form. It is an example interface to the search engine, htsearch. The file contains a form whose action is a call to htsearch. htsearch reads several form variables; more about those can be found in the htsearch documentation.

An example file can be as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>ht://Dig WWW Search</title>
</head>
<body bgcolor="#eef7ff">
<h1>
<a href="http://www.htdig.org"><IMG SRC="/htdig/htdig.gif" align="bottom" alt="ht://Dig" border="0"></a>
WWW Site Search</h1>
<hr noshade size="4">
This search will allow you to search the contents of
all the publicly available WWW documents at this site.
<br>
<p>
<form method="post" action="/cgi-bin/htsearch">
<font size="-1">
Match: <select name="method">
<option value="and">All
<option value="or">Any
<option value="boolean">Boolean
</select>
Format: <select name="format">
<option value="builtin-long">Long
<option value="builtin-short">Short
</select>
Sort by: <select name="sort">
<option value="score">Score
<option value="time">Time
<option value="title">Title
<option value="revscore">Reverse Score
<option value="revtime">Reverse Time
<option value="revtitle">Reverse Title
</select>
</font>
<input type="hidden" name="config" value="htdig">
<input type="hidden" name="restrict" value="">
<input type="hidden" name="exclude" value="">
<br>
Search:
<input type="text" size="30" name="words" value="">
<input type="submit" value="Search">
</form>
<hr noshade size="4">
</body>
</html>
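The form above passes the fields method, format, sort, config, restrict, exclude and words to htsearch. As a rough illustration of what such a request carries, the Python sketch below assembles the same fields into a query string. The helper name build_search_url is hypothetical, and sending the fields as a GET query string (rather than the POST used by the sample form) is an assumption made for readability.

```python
from urllib.parse import urlencode


def build_search_url(words, method="and", fmt="builtin-long", sort="score",
                     config="htdig", base="/cgi-bin/htsearch"):
    """Assemble the sample form's fields into a query string.

    Hypothetical helper: field names and default values are copied from
    the sample search.html; the GET-style URL is an assumption.
    """
    params = [
        ("method", method),    # and / or / boolean
        ("format", fmt),       # builtin-long / builtin-short
        ("sort", sort),        # score, time, title, or their rev* variants
        ("config", config),    # hidden field in the sample form
        ("restrict", ""),      # hidden field, empty in the sample form
        ("exclude", ""),       # hidden field, empty in the sample form
        ("words", words),      # the text the user typed
    ]
    return base + "?" + urlencode(params)


print(build_search_url("cat dog"))
# /cgi-bin/htsearch?method=and&format=builtin-long&sort=score&config=htdig&restrict=&exclude=&words=cat+dog
```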



${COMMON_DIR}/header.html



This file is output before any of the search results are produced in a search. It can be customized to reflect your particular web look-and-feel, for example. Note that this file is only the top part of the full HTML document that is produced when search results are displayed. This means that it should start with the proper HTML introductory tags and title.

This file is not simply copied. Instead, the search engine looks for special variables inside the file and replaces them with the appropriate values for the particular search. For more details on the use of these variables, consult the htsearch templates documentation.
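To illustrate the kind of replacement described here, the following Python sketch expands the two variable forms that appear in the installed templates, $(NAME) and $&(NAME). It is a simplified stand-in and not htsearch's implementation: the function name expand_template is hypothetical, treating $&(NAME) as the HTML-escaped form of the value is an assumption, and so is replacing unknown variables with an empty string.

```python
import html
import re

# Matches $(NAME) and $&(NAME); the optional "&" is captured separately.
_VARIABLE = re.compile(r"\$(&?)\((\w+)\)")


def expand_template(text, values):
    """Replace template variables in `text` with entries from `values`.

    Hypothetical helper for illustration only.
    """
    def replace(match):
        escape, name = match.group(1), match.group(2)
        value = values.get(name, "")  # assumption: unknown names become empty
        # Assumption: the "&" form HTML-escapes the value before inserting it.
        return html.escape(value) if escape else value

    return _VARIABLE.sub(replace, text)


line = "<title>Search results for '$&(WORDS)'</title> $(MATCHES) matches"
print(expand_template(line, {"WORDS": "cats & <dogs>", "MATCHES": "12"}))
# <title>Search results for 'cats &amp; &lt;dogs&gt;'</title> 12 matches
```

Escaping matters for values such as the search words, which come from user input and are placed directly into HTML text and attribute values.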

Below is the default header.html file that gets installed. Note that it contains a form to allow the user to refine the search.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Search results for '$&(WORDS)'</title></head>
<body bgcolor="#eef7ff">
<h2><img src="/htdig/htdig.gif" alt="ht://Dig">
Search results for '$&(LOGICAL_WORDS)'</h2>
<hr noshade size="4">
<form method="get" action="$(CGI)">
<font size="-1">
<input type="hidden" name="config" value="$&(CONFIG)">
<input type="hidden" name="restrict" value="$&(RESTRICT)">
<input type="hidden" name="exclude" value="$&(EXCLUDE)">
Match: $(METHOD)
Format: $(FORMAT)
Sort by: $(SORT)
<br>
Refine search:
<input type="text" size="30" name="words" value="$&(WORDS)">
<input type="submit" value="Search">
</font>
</form>
<hr noshade size="1">
<strong>Documents $(FIRSTDISPLAYED) - $(LASTDISPLAYED) of $(MATCHES) matches.
More <img src="/htdig/star.gif" alt="*">'s indicate a better match.
</strong>
<hr noshade size="1">



${COMMON_DIR}/footer.html



This file is output after all the search results have been displayed. The same rules apply as for header.html, except that this file is supposed to contain all the ending HTML tags.



Below is the default footer.html file that gets installed. Note that it contains the page navigation links.

$(PAGEHEADER)
$(PREVPAGE) $(PAGELIST) $(NEXTPAGE)
<hr noshade size="4">
<a href="http://www.htdig.org/">
<img src="/htdig/htdig.gif" border="0" alt="ht://Dig">ht://Dig $(VERSION)</a>
</body></html>

${COMMON_DIR}/wrapper.html



This file may be used in place of the header.html and footer.html files above. It is simply the concatenation of these two files, with the pseudo-variable $(HTSEARCH_RESULTS) as a separator for the header and footer sections. All the same header.html and footer.html rules apply to this file. To make this file override the header and footer files above, you must define the search_results_wrapper attribute.
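Since the wrapper is described as header and footer joined by $(HTSEARCH_RESULTS), splitting it back into its two halves is straightforward. The Python sketch below shows the idea; the function name split_wrapper is hypothetical, and raising an error when the marker is missing is an assumption rather than documented htsearch behaviour.

```python
def split_wrapper(wrapper_text, marker="$(HTSEARCH_RESULTS)"):
    """Split a wrapper template into (header, footer) at the marker.

    Hypothetical helper for illustration; it only mirrors the
    description of wrapper.html given in the text.
    """
    header, found, footer = wrapper_text.partition(marker)
    if not found:
        # Assumption: a wrapper without the marker is treated as an error.
        raise ValueError("wrapper is missing " + marker)
    return header, footer


wrapper = (
    "<html><head><title>Results</title></head><body>\n"
    "$(HTSEARCH_RESULTS)\n"
    "</body></html>\n"
)
header, footer = split_wrapper(wrapper)
print(repr(header))  # '<html><head><title>Results</title></head><body>\n'
print(repr(footer))  # '\n</body></html>\n'
```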


${COMMON_DIR}/nomatch.html



If a search produces no matches, this file is displayed. All the relevant variables will be replaced as in the header.html and footer.html files. The default nomatch.html is little more than header.html and footer.html appended:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>No match for '$&(LOGICAL_WORDS)'</title></head>
<body bgcolor="#eef7ff">
<h1><img src="/htdig/htdig.gif" alt="ht://Dig">
Search results</h1>
<hr noshade size="4">
<h2>No matches were found for '$&(LOGICAL_WORDS)'</h2>
<p>
Check the spelling of the search word(s) you used.
If the spelling is correct and you only used one word,
try using one or more similar search words with "<strong>Any</strong>."
</p><p>
If the spelling is correct and you used more than one
word with "<strong>Any</strong>," try using one or more similar search
words with "<strong>Any</strong>."</p><p>
If the spelling is correct and you used more than one
word with "<strong>All</strong>," try using one or more of the same words
with "<strong>Any</strong>."</p>
<hr noshade size="4">
<form method="get" action="$(CGI)">
<font size="-1">
<input type="hidden" name="config" value="$&(CONFIG)">
<input type="hidden" name="restrict" value="$&(RESTRICT)">
<input type="hidden" name="exclude" value="$&(EXCLUDE)">
Match: $(METHOD)
Format: $(FORMAT)
Sort by: $(SORT)
<br>
Refine search:
<input type="text" size="30" name="words" value="$&(WORDS)">
<input type="submit" value="Search">
</font>
</form>
<hr noshade size="4">
<a href="http://www.htdig.org/">
<img src="/htdig/htdig.gif" border="0" alt="ht://Dig">ht://Dig $(VERSION)</a>
</body></html>



${COMMON_DIR}/syntax.html



If a boolean expression search causes a syntax error, this file will be displayed.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Error in Boolean search for '$&(WORDS)'</title></head>
<body bgcolor="#eef7ff">
<h1><img src="/htdig/htdig.gif" alt="ht://Dig">
Error in Boolean search for '$&(LOGICAL_WORDS)'</h1>
<hr noshade size="4">
Boolean expressions need to be 'correct' in order for the search
system to use them.
The expression you entered has errors in it.<p>
Examples of correct expressions are: <strong>cat and dog</strong>, <strong>cat
not dog</strong>, <strong>cat or (dog not nose)</strong>.<br>Note that
the operator <strong>not</strong> has the meaning of 'without'.
<blockquote><strong>
$(SYNTAXERROR)
</strong></blockquote>
<hr noshade size="4">
<form method="get" action="$(CGI)">
<font size="-1">
<input type="hidden" name="config" value="$&(CONFIG)">
<input type="hidden" name="restrict" value="$&(RESTRICT)">
<input type="hidden" name="exclude" value="$&(EXCLUDE)">
Match: $(METHOD)
Format: $(FORMAT)
Sort: $(SORT)
<br>
Refine search:
<input type="text" size="30" name="words" value="$&(WORDS)">
<input type="submit" value="Search">
</font>
</form>
<hr noshade size="4">
<a href="http://www.htdig.org/">
<img src="/htdig/htdig.gif" border="0" alt="ht://Dig">ht://Dig $(VERSION)</a>
</body></html>

Last modified: $Date: 2002/01/27 05:33:20 $