ht://Dig: Bug Reporting

 Bug Reporting



ht://Dig Copyright © 1995-2002 The ht://Dig Group
Please see the file COPYING for license information.




If you are having problems or have suggestions for ht://Dig, feel free to fill out a bug report form. Before you do this, please do the following:

 

If you've done this, you can easily submit a bug report or a feature request through the bug database.


Last modified: $Date: 2002/01/27 05:33:20 $



# ... to go. Make sure that there is
# plenty of free disk space available for the databases. They can get
# pretty big.
#
database_dir: /opt/www/htdig/db
#
# This specifies the URL where the robot (htdig) will start. You can specify
# multiple URLs here. Just separate them by some whitespace.
# The example here will cause the ht://Dig homepage and related pages to be
# indexed.
# You could also index all the URLs in a file like so:
# start_url: `${common_dir}/start.url`
#
start_url: http://www.htdig.org/
#
# This attribute limits the scope of the indexing process. The default is to
# set it to the same as the start_url above. This way only pages that are on
# the sites specified in the start_url attribute will be indexed and it will
# reject any URLs that go outside of those sites.
#
# Keep in mind that the value for this attribute is just a list of string
# patterns. As long as URLs contain at least one of the patterns it will be
# seen as part of the scope of the index.
#
limit_urls_to: ${start_url}
#
# If there are particular pages that you definitely do NOT want to index, you
# can use the exclude_urls attribute. The value is a list of string patterns.
# If a URL matches any of the patterns, it will NOT be indexed. This is
# useful to exclude things like virtual web trees or database accesses. By
# default, all CGI URLs will be excluded. (Note that the /cgi-bin/ convention
# may not work on your web server. Check the path prefix used on your web
# server.)
#
exclude_urls: /cgi-bin/ .cgi
#
# Since ht://Dig does not (and cannot) parse every document type, this
# attribute is a list of strings (extensions) that will be ignored during
# indexing. These are *only* checked at the end of a URL, whereas
# exclude_urls patterns are matched anywhere.
#
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
    .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css
#
# The string htdig will send in every request to identify the robot. Change
# this to your email address.
#
maintainer: unconfigured@htdig.searchengine.maintainer
#
# The excerpts that are displayed in long results rely on stored information
# in the index databases. The compiled default only stores 512 characters of
# text from each document (this excludes any HTML markup...) If you plan on
# using the excerpts you probably want to make this larger. The only concern
# here is that more disk space is going to be needed to store the additional
# information. Since disk space is cheap (! :-)) you might want to set this
# to a value so that a large percentage of the documents that you are going
# to be indexing are stored completely in the database. At SDSU we found
# that by setting this value to about 50k the index would get 97% of all
# documents completely and only 3% was cut off at 50k. You probably want to
# experiment with this value.
# Note that if you want to set this value low, you probably want to set the
# excerpt_show_top attribute to true so that the top excerpt_length characters
# of the document are always shown.
#
max_head_length: 10000
#
# To limit network connections, ht://Dig will only pull up to a certain limit
# of bytes. This prevents the indexing from dying because the server keeps
# sending information. However, several FAQ entries arise because people have
# files bigger than the default limit of 100KB. This sets the default a bit
# higher. (See <http://www.htdig.org/FAQ.html> for more.)
#
max_doc_size: 200000
#
# Most people expect some sort of excerpt in results. By default, if the
# search words aren't found in context in the stored excerpt, htsearch shows
# the text defined in the no_excerpt_text attribute:
# (None of the search words were found in the top of this document.)
# This attribute instead will show the top of the excerpt.
#
no_excerpt_show_top: true
#
# Depending on your needs, you might want to enable some of the fuzzy search
# algorithms. There are several to choose from and you can use them in any
# combination you feel comfortable with. Each algorithm will get a weight
# assigned to it so that in combinations of algorithms, certain algorithms get
# preference over others. Note that the weights only affect the ranking of
# the results, not the actual searching.
# The available algorithms are:
#   accents
#   exact
#   endings
#   metaphone
#   prefix
#   soundex
#   substring
#   synonyms
# By default only the "exact" algorithm is used with weight 1.
# Note that if you are going to use the endings, metaphone, soundex, accents,
# or synonyms algorithms, you will need to run htfuzzy to generate
# the databases they use.
#
search_algorithm: exact:1 synonyms:0.5 endings:0.1
#
# The following are the templates used in the builtin search results.
# The default is to use compiled versions of these files, which produces
# slightly faster results. However, uncommenting these lines makes it
# very easy to change the format of search results.
# See <http://www.htdig.org/hts_templates.html> for more details.
#
# template_map: Long long ${common_dir}/long.html \
#               Short short ${common_dir}/short.html
# template_name: long
#
# The following are used to change the text for the page index.
# The defaults are just boring text numbers. These images spice
# up the result pages quite a bit. (Feel free to do whatever, though.)
#
next_page_text: <img src="/htdig/buttonr.gif" border="0" align="middle" width="30" height="30" alt="next">
no_next_page_text:
prev_page_text: <img src="/htdig/buttonl.gif" border="0" align="middle" width="30" height="30" alt="prev">
no_prev_page_text:
page_number_text: '<img src="/htdig/button1.gif" border="0" align="middle" width="30" height="30" alt="1">' \
    '<img src="/htdig/button2.gif" border="0" align="middle" width="30" height="30" alt="2">' \
    '<img src="/htdig/button3.gif" border="0" align="middle" width="30" height="30" alt="3">' \
    '<img src="/htdig/button4.gif" border="0" align="middle" width="30" height="30" alt="4">' \
    '<img src="/htdig/button5.gif" border="0" align="middle" width="30" height="30" alt="5">' \
    '<img src="/htdig/button6.gif" border="0" align="middle" width="30" height="30" alt="6">' \
    '<img src="/htdig/button7.gif" border="0" align="middle" width="30" height="30" alt="7">' \
    '<img src="/htdig/button8.gif" border="0" align="middle" width="30" height="30" alt="8">' \
    '<img src="/htdig/button9.gif" border="0" align="middle" width="30" height="30" alt="9">' \
    '<img src="/htdig/button10.gif" border="0" align="middle" width="30" height="30" alt="10">'
#
# To make the current page stand out, we will put a border around the
# image for that page.
#
no_page_number_text: '<img src="/htdig/button1.gif" border="2" align="middle" width="30" height="30" alt="1">' \
    '<img src="/htdig/button2.gif" border="2" align="middle" width="30" height="30" alt="2">' \
    '<img src="/htdig/button3.gif" border="2" align="middle" width="30" height="30" alt="3">' \
    '<img src="/htdig/button4.gif" border="2" align="middle" width="30" height="30" alt="4">' \
    '<img src="/htdig/button5.gif" border="2" align="middle" width="30" height="30" alt="5">' \
    '<img src="/htdig/button6.gif" border="2" align="middle" width="30" height="30" alt="6">' \
    '<img src="/htdig/button7.gif" border="2" align="middle" width="30" height="30" alt="7">' \
    '<img src="/htdig/button8.gif" border="2" align="middle" width="30" height="30" alt="8">' \
    '<img src="/htdig/button9.gif" border="2" align="middle" width="30" height="30" alt="9">' \
    '<img src="/htdig/button10.gif" border="2" align="middle" width="30" height="30" alt="10">'
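The three scoping attributes above differ mainly in where a pattern may match. The Python sketch below models the rules exactly as the comments state them:

- a URL is in scope only if it contains at least one limit_urls_to pattern;
- it is rejected if any exclude_urls pattern appears anywhere in it;
- it is rejected if it ends with any bad_extensions entry.

The function name url_in_scope is hypothetical. This is a simplified model for illustration, not htdig's implementation, which may normalise URLs or handle case differently.

```python
from typing import List


def url_in_scope(url: str,
                 limit_urls_to: List[str],
                 exclude_urls: List[str],
                 bad_extensions: List[str]) -> bool:
    """Hypothetical model of the scoping rules described in htdig.conf.

    It mirrors the config comments only; it is not htdig's own code.
    """
    # limit_urls_to: the URL must contain at least one pattern, anywhere.
    if not any(pattern in url for pattern in limit_urls_to):
        return False
    # exclude_urls: any pattern found anywhere in the URL rejects it.
    if any(pattern in url for pattern in exclude_urls):
        return False
    # bad_extensions: compared only against the end of the URL.
    if any(url.endswith(ext) for ext in bad_extensions):
        return False
    return True


if __name__ == "__main__":
    limits = ["http://www.htdig.org/"]   # limit_urls_to: ${start_url}
    excludes = ["/cgi-bin/", ".cgi"]     # exclude_urls from the example
    bad = [".gif", ".zip", ".tar"]       # a subset of bad_extensions
    for u in ["http://www.htdig.org/FAQ.html",
              "http://www.htdig.org/cgi-bin/htsearch",
              "http://www.htdig.org/htdig.gif",
              "http://www.htdig.org/logo.gif.html",
              "http://www.example.com/index.html"]:
        print(u, url_in_scope(u, limits, excludes, bad))
```

The logo.gif.html URL is kept because ".gif" appears in it but not at the end, so bad_extensions does not apply. The same string listed in exclude_urls would have rejected it.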
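The comments on max_head_length, excerpt_length and no_excerpt_show_top describe a fallback chain for excerpts. The Python sketch below models that chain roughly:

1. Only the first max_head_length characters are stored.
2. If a search word occurs in them, the excerpt is taken from there.
3. If no word occurs, either the top of the stored text or no_excerpt_text is shown.

The function name is hypothetical. For simplicity the excerpt starts at the first matching word; htsearch's real placement of context differs. The default excerpt_length of 300 is an assumed value, and the other defaults echo the example settings above.

```python
from typing import List

NO_EXCERPT = "(None of the search words were found in the top of this document.)"


def choose_excerpt(text: str,
                   words: List[str],
                   max_head_length: int = 10000,
                   excerpt_length: int = 300,
                   no_excerpt_show_top: bool = True,
                   no_excerpt_text: str = NO_EXCERPT) -> str:
    """Hypothetical, simplified model of htsearch's excerpt choice."""
    # Only the first max_head_length characters of text are stored.
    stored = text[:max_head_length]
    lowered = stored.lower()
    # Earliest position of any search word within the stored text.
    hits = [lowered.find(w.lower()) for w in words]
    hits = [pos for pos in hits if pos >= 0]
    if hits:
        start = min(hits)
        return stored[start:start + excerpt_length]
    # No word found in context: show the top, or the fallback message.
    if no_excerpt_show_top:
        return stored[:excerpt_length]
    return no_excerpt_text


if __name__ == "__main__":
    doc = "ht://Dig is a complete indexing and searching system."
    print(choose_excerpt(doc, ["searching"], excerpt_length=20))
    print(choose_excerpt(doc, ["searching"], max_head_length=10, excerpt_length=8))
    print(choose_excerpt(doc, ["searching"], max_head_length=10,
                         no_excerpt_show_top=False))
```

The last two calls show the trade-off the comments describe. With a small max_head_length, a word that occurs later in the document is never found in context.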

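The search_algorithm value is a whitespace-separated list of name:weight pairs, and the comments stress that weights affect only ranking, not which documents match. The Python sketch below parses the value and illustrates that point with a hypothetical scoring model, not htsearch's real formula. In this model each match's base score is multiplied by the weight of the algorithm that produced it, and a document keeps its best weighted score. Both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_search_algorithm(value: str) -> Dict[str, float]:
    """Parse a value such as 'exact:1 synonyms:0.5 endings:0.1'."""
    weights: Dict[str, float] = {}
    for item in value.split():
        name, weight = item.split(":")
        weights[name] = float(weight)
    return weights


def rank(matches: List[Tuple[str, float, str]],
         weights: Dict[str, float]) -> List[str]:
    """Hypothetical scoring model, not htsearch's real formula.

    matches holds (document, base_score, algorithm) tuples. Each score is
    scaled by its algorithm's weight and a document keeps its best score.
    """
    best: Dict[str, float] = {}
    for doc, score, algorithm in matches:
        weighted = score * weights[algorithm]
        best[doc] = max(best.get(doc, 0.0), weighted)
    # Sorting only reorders documents; none are added or dropped.
    return sorted(best, key=best.get, reverse=True)


if __name__ == "__main__":
    weights = parse_search_algorithm("exact:1 synonyms:0.5 endings:0.1")
    matches = [("a.html", 10.0, "endings"),
               ("b.html", 4.0, "exact"),
               ("c.html", 6.0, "synonyms")]
    print(weights)
    print(rank(matches, weights))
    print(rank(matches, {"exact": 1.0, "synonyms": 1.0, "endings": 1.0}))
```

Changing the weights reverses the order of the three documents but returns the same three, which is the behaviour the config comment describes.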
${SEARCH_DIR}/search.html



This is the default search form. It is an example interface to the search engine, htsearch. The file contains a form whose action is a call to htsearch. There are several form variables which htsearch will use. More about those can be found in the htsearch documentation.


An example file can look as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>ht://Dig WWW Search</title>
</head>
<body bgcolor="#eef7ff">
<h1>
<a href="http://www.htdig.org"><IMG SRC="/htdig/htdig.gif" align="bottom" alt="ht://Dig" border="0"></a>
WWW Site Search</h1>
<hr noshade size="4">
This search will allow you to search the contents of
all the publicly available WWW documents at this site.
<br>
<p>
<form method="post" action="/cgi-bin/htsearch">
<font size="-1">
Match: <select name="method">
<option value="and">All
<option value="or">Any
<option value="boolean">Boolean
</select>
Format: <select name="format">
<option value="builtin-long">Long
<option value="builtin-short">Short
</select>
Sort by: <select name="sort">
<option value="score">Score
<option value="time">Time
<option value="title">Title
<option value="revscore">Reverse Score
<option value="revtime">Reverse Time
<option value="revtitle">Reverse Title
</select>
</font>
<input type="hidden" name="config" value="htdig">
<input type="hidden" name="restrict" value="">
<input type="hidden" name="exclude" value="">
<br>
Search:
<input type="text" size="30" name="words" value="">
<input type="submit" value="Search">
</form>
<hr noshade size="4">
</body>
</html>
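htsearch reads config, restrict, exclude, method, format, sort and words as ordinary CGI form variables. The same search can therefore be expressed as a query string; the default header.html form uses method="get" for this. The Python sketch below builds such a URL from the variables in the example form.

Some details are assumptions of the sketch. The function name htsearch_url is hypothetical. The action path is copied from the example and may differ on your server. Restricting choices to the form's option values is a design choice here, and htsearch itself may accept other values.

```python
from urllib.parse import urlencode

# Option values offered by the example search.html form.
METHODS = {"and", "or", "boolean"}
FORMATS = {"builtin-long", "builtin-short"}
SORTS = {"score", "time", "title", "revscore", "revtime", "revtitle"}


def htsearch_url(words: str, method: str = "and", fmt: str = "builtin-long",
                 sort: str = "score", config: str = "htdig",
                 restrict: str = "", exclude: str = "",
                 action: str = "/cgi-bin/htsearch") -> str:
    """Build a GET URL carrying the same variables as search.html."""
    if method not in METHODS or fmt not in FORMATS or sort not in SORTS:
        raise ValueError("value not offered by the example form")
    params = {"config": config, "restrict": restrict, "exclude": exclude,
              "method": method, "format": fmt, "sort": sort, "words": words}
    # urlencode percent-encodes each value (spaces become '+') and joins with '&'.
    return action + "?" + urlencode(params)


if __name__ == "__main__":
    print(htsearch_url("bug report", method="or", sort="revtime"))
```
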



${COMMON_DIR}/header.html



This file is output before any of the search results are produced in a search. It can be customized to reflect your particular web look-and-feel, for example. Note that this file is only the top part of the full HTML document that is produced when search results are displayed. This means that it should start with the proper HTML introductory tags and title.


This file is not simply copied. Instead, the search engine looks for special variables inside the file. These variables are replaced with the appropriate values for the particular search. For more details on the use of these variables, consult the htsearch templates documentation.


Below is the default header.html file that gets installed. Note that it contains a form to allow the user to refine the search.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Search results for '$&(WORDS)'</title></head>
<body bgcolor="#eef7ff">
<h2><img src="/htdig/htdig.gif" alt="ht://Dig">
Search results for '$&(LOGICAL_WORDS)'</h2>
<hr noshade size="4">
<form method="get" action="$(CGI)">
<font size="-1">
<input type="hidden" name="config" value="$&(CONFIG)">
<input type="hidden" name="restrict" value="$&(RESTRICT)">
<input type="hidden" name="exclude" value="$&(EXCLUDE)">
Match: $(METHOD)
Format: $(FORMAT)
Sort by: $(SORT)
<br>
Refine search:
<input type="text" size="30" name="words" value="$&(WORDS)">
<input type="submit" value="Search">
</font>
</form>
<hr noshade size="1">
<strong>Documents $(FIRSTDISPLAYED) - $(LASTDISPLAYED) of $(MATCHES) matches.
More <img src="/htdig/star.gif" alt="*">'s indicate a better match.
</strong>
<hr noshade size="1">
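To make the variable substitution concrete, here is a small Python sketch of a template expander. It rests on three assumptions:

- Following the htsearch templates documentation, $(NAME) inserts a value as-is, while $&(NAME) inserts it HTML-escaped. That is why the header uses the $& form in the title and inside attribute values.
- Undefined variables expand to an empty string.
- The expander handles only these two forms and ignores any other variable syntax htsearch may support.

The function expand_template is hypothetical.

```python
import html
import re
from typing import Dict

# Matches $(NAME) and $&(NAME); group 1 is "&" for the escaped form.
_VARIABLE = re.compile(r"\$(&?)\((\w+)\)")


def expand_template(template: str, variables: Dict[str, str]) -> str:
    """Hypothetical expander for the $(NAME) and $&(NAME) forms only."""
    def substitute(match):
        escaped, name = match.group(1), match.group(2)
        # Assumption: undefined variables expand to an empty string.
        value = variables.get(name, "")
        # $&(NAME): escape &, <, > and quotes so the value is safe in HTML.
        return html.escape(value, quote=True) if escaped else value
    return _VARIABLE.sub(substitute, template)


if __name__ == "__main__":
    header = ("<title>Search results for '$&(WORDS)'</title>\n"
              '<form method="get" action="$(CGI)">\n'
              '<input type="text" name="words" value="$&(WORDS)">')
    print(expand_template(header, {"WORDS": 'htdig "config"',
                                   "CGI": "/cgi-bin/htsearch"}))
```

Escaping matters most in value="$&(WORDS)". An unescaped double quote typed by the user would otherwise end the attribute early.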



${COMMON_DIR}/footer.html



This file is output after all the search results have been displayed. All the same header.html rules apply to this file, except that it is supposed to contain all the ending HTML tags.



Below is the default footer.html file that gets installed. Note that it contains the page navigation links.

$(PAGEHEADER)
$(PREVPAGE) $(PAGELIST) $(NEXTPAGE)
<hr noshade size="4">
<a href="http://www.htdig.org/">
<img src="/htdig/htdig.gif" border="0" alt="ht://Dig">ht://Dig $(VERSION)</a>
</body></html>
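The footer's navigation variables draw on the page_number_text, no_page_number_text, prev_page_text, next_page_text and no_* attributes from the htdig.conf example. The Python sketch below models how $(PAGELIST), $(PREVPAGE) and $(NEXTPAGE) might be assembled, under these assumptions:

- The current page uses no_page_number_text without a link.
- Other pages link through a caller-supplied page_url function. This is hypothetical; the real link format is htsearch's own.
- Pages beyond the configured list fall back to plain numbers.

It is an illustration, not htsearch's code.

```python
from typing import Callable, List, Tuple


def _text_for(texts: List[str], n: int) -> str:
    # Assumption: past the end of the configured list, use the bare number.
    return texts[n - 1] if n <= len(texts) else str(n)


def page_list(current: int, total: int,
              page_number_text: List[str],
              no_page_number_text: List[str],
              page_url: Callable[[int], str]) -> str:
    """Hypothetical model of how $(PAGELIST) is assembled."""
    parts = []
    for n in range(1, total + 1):
        if n == current:
            # Current page: the bordered image, shown without a link.
            parts.append(_text_for(no_page_number_text, n))
        else:
            label = _text_for(page_number_text, n)
            parts.append(f'<a href="{page_url(n)}">{label}</a>')
    return " ".join(parts)


def prev_next(current: int, total: int,
              prev_page_text: str, no_prev_page_text: str,
              next_page_text: str, no_next_page_text: str,
              page_url: Callable[[int], str]) -> Tuple[str, str]:
    """Hypothetical model of $(PREVPAGE) and $(NEXTPAGE)."""
    prev = (f'<a href="{page_url(current - 1)}">{prev_page_text}</a>'
            if current > 1 else no_prev_page_text)
    nxt = (f'<a href="{page_url(current + 1)}">{next_page_text}</a>'
           if current < total else no_next_page_text)
    return prev, nxt


if __name__ == "__main__":
    def url(n: int) -> str:
        return f"/cgi-bin/htsearch?page={n}"  # hypothetical link format

    print(page_list(2, 3, ["1", "2", "3"], ["[1]", "[2]", "[3]"], url))
    print(prev_next(1, 3, "prev", "", "next", "", url))
```

With the example configuration, no_prev_page_text and no_next_page_text are empty, so the arrows simply disappear on the first and last pages.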


${COMMON_DIR}/wrapper.html



This file may be used in place of the header.html and footer.html files above. It i