duct to index PDF files?
1.14. Why do you have all those SourceForge logos on your website?
1.15. My question isn't answered here. Where should I go for help?
1.16. Why do the developers get annoyed when I e-mail questions directly to them rather than the mailing list?

2. Getting ht://Dig

2.1. What's the latest version of ht://Dig?
2.2. Are there binary distributions of ht://Dig?
2.3. Are there mirror sites for ht://Dig?
2.4. Is ht://Dig available by ftp?
2.5. Are patches around to upgrade between versions?
2.6. Is there a Windows 95/98/2000/NT version of ht://Dig?
2.7. Where can I find the documentation for my version of ht://Dig?

3. Compiling

3.1. When I compile ht://Dig I get an error about libht.a.
3.2. I get an error about -lg
3.3. I'm compiling on Digital Unix and I get messages about "unresolved" and "db_open."
3.4. I'm compiling on FreeBSD and I get lots of messages about '___error' being unresolved.
3.5. I'm compiling on HP/UX and I get a complaint about "Large Files not supported."
3.6. I'm compiling on Solaris and when I run the programs I get complaints about not finding libstdc++.
3.7. I'm compiling on IRIX and I'm having database problems when I run the program.

4. Configuration

4.1. How come I can't index my site?
4.2. How can I change the output format of htsearch?
4.3. How do I index pages that start with '~'?
4.4. Can I use multiple databases?
4.5. OK, I can use multiple databases. Can I merge them into one?
4.6. Wow, ht://Dig eats up a lot of disk space. How can I cut down?
4.7. Can I use SSI or other CGIs in my htsearch results?
4.8. How do I index Word, Excel, PowerPoint or PostScript documents?
4.9. How do I index PDF files?
4.10. How do I index documents in other languages?
4.11. How do I get rotating banner ads in search results?
4.12. How do I index numbers in documents?
4.13. How can I call htsearch from a hypertext link, rather than from a search form?
4.14. How do I restrict a search to only meta keywords entries in documents?
4.15. Can I use meta tags to prevent htdig from indexing certain files?
4.16. How do I get htsearch to use the star image in a different directory than the default /htdig?
4.17. How do I get htdig or htsearch to rewrite URLs in the search results?
4.18. What are all the options in htdig.conf, and are there others?
4.19. How do I get more than 10 pages of 10 search results from htsearch?
4.20. How do I restrict a search to only certain subdirectories or documents?
4.21. How can I allow people to search while the index is updating?
4.22. How can I get htdig to ignore the robots.txt file or meta robots tags?
4.23. How can I get htdig not to index some directories, but still follow links?
4.24. How can I get rid of duplicates in search results?

5. Troubleshooting

5.1. I can't seem to index more than X documents in a directory.
5.2. I can't index PDF files.
5.3. When I run "rundig," I get a message about "DATABASE_DIR" not being found.
5.4. When I run htmerge, it stops with an "out of diskspace" message.
5.5. I have problems running rundig from cron under Linux.
5.6. When I run htmerge, it stops with an "Unexpected file type" message.
5.7. When I run htsearch, I get lots of Internal Server Errors (#500).
5.8. I'm having problems with indexing words with accented characters.
5.9. When I run htmerge, it stops with a "Word sort failed" message.
5.10. When htsearch has a lot of matches, it runs extremely slowly.
5.11. When I run htsearch, it gives me a count of matches, but doesn't list the matching documents.
5.12. I can't seem to index documents with names like left_index.html with htdig.
5.13. I get Premature End of Script Headers errors when running htsearch.
5.14. I get Segmentation faults when running htdig, htsearch or htfuzzy.
5.15. Why does htdig 3.1.3 mangle URL parameters that contain bare "&" characters?
5.16. When I run htmerge, it stops with an "Unable to open word list file '.../db.wordlist'" message.
5.17. When using Netscape, htsearch always returns the "No match" page.
5.18. Why doesn't htdig follow links to other pages in JavaScript code?
5.19. When I run htsearch from the web server, it returns a bunch of binary data.
5.20. Why are the betas of 3.2 so slow at indexing?
5.21. Why does htsearch use ";" instead of "&" to separate URL parameters for the page buttons?
5.22. Why does htsearch show the "&" character as "&amp;" in search results?
5.23. I get Internal Server or Unrecognized character errors when running htsearch.
5.24. I took some settings out of my htdig.conf but they're still set.
5.25. When I run htdig on my site, it misses entire directories.
5.26. What do all the numbers and symbols in the htdig -v output mean?
5.27. Why is htdig rejecting some of the links in my documents?
5.28. When I run htdig or htmerge, I get a "DB2 problem...: missing or empty key value specified" message.
5.29. When I run htdig on my site, it seems to go on and on without ending.

Answers

1. General

1.1. Can I search the internet with ht://Dig?

No, ht://Dig is a system for indexing and searching a finite (not necessarily small) set of sites or an intranet. It is not meant to replace any of the many internet-wide search engines.

1.2. Can I index the internet with ht://Dig?

No, as above, ht://Dig is not meant as an internet-wide search engine. While there is theoretically nothing to stop you from indexing as much as you wish, practical considerations (e.g. time, disk space and memory) will limit this.

1.3. What's the difference between htdig and ht://Dig?

The complete ht://Dig package consists of several programs, one of which is called "htdig." This program performs the "digging" or indexing of the web pages. Of course, an index doesn't do you much good without a program to sort it, search through it, etc.

1.4. I sent mail to Andrew or Geoff or Gilles, but I never got a response!

Andrew no longer does much work on ht://Dig. He has started a company, Contigo Software, and is quite busy with it. To contact any of the current developers, send mail to <htdig-dev>. This list is intended primarily for the discussion of current and future development of the software.

Geoff and Gilles are currently the maintainers of ht://Dig, but they are both volunteers. So while they do read all the e-mail they receive, they may not respond immediately. Questions about ht://Dig in general, and especially questions or requests for help in configuring the software, should be posted to the <htdig-general> mailing list. When posting a followup to a message on the list, you should use the "reply to all" or "group reply" feature of your mail program, to make sure the mailing list address is included in the reply, rather than replying only to the author of the message. See also question 1.16 and the mailing list page.

1.5. I sent a question to the mailing list but I never got a response!

Development of ht://Dig is done by volunteers. Since we all have other jobs, it may take a while before someone gets back to you. Please be patient and don't hound the volunteers with direct or repeated requests. If you don't get a response after 3 or 4 days, then a reminder may help. See also question 1.16.

1.6. I have a great idea/patch for ht://Dig!

Great! Development of ht://Dig continues through suggestions and improvements from users. If you have an idea (or even better, a patch), please send it to the ht://Dig mailing list so others can use it. For suggestions on how to submit patches, please check the Guidelines for Patch Submissions. If you'd like to make a feature request, you can do so through the ht://Dig bug database.

1.7. Is ht://Dig Y2K compliant?

ht://Dig should be Y2K compliant, since it never stores dates as two-digit years. Under ht://Dig's copyright (GPL), there is no warranty whatsoever, to the extent permitted by law. If you would like an iron-clad, legally binding guarantee, feel free to check the source code itself. Versions prior to 3.1.2 did have a problem parsing the Last-Modified header returned by the HTTP server, which causes incorrect dates to be stored for documents modified after February 28, 2000 (yes, it didn't recognize 2000 as a leap year). Versions prior to 3.1.5 didn't correctly handle servers that return two-digit years in the Last-Modified header for years after 1999. These problems are fixed in the current release. If you discover something else, please let us know!
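Both of these old bugs come down to date arithmetic. The following is not ht://Dig's code; it is a minimal Python sketch of the two rules involved. The first is the Gregorian leap-year rule, under which 2000 is a leap year because it is divisible by 400. The second is one reasonable way to expand a two-digit year, assumed here to be the 50-year window that RFC 7231 prescribes for rfc850-date timestamps. The function names are hypothetical.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def expand_two_digit_year(yy: int, current_year: int) -> int:
    """Map a two-digit year (0-99) to a four-digit year (hypothetical helper).

    Sketch of the RFC 7231 rule for rfc850-date values: pick the year ending
    in `yy` that lies in the window (current_year - 50, current_year + 50],
    so a date that would be more than 50 years in the future is taken as
    the most recent past year with the same last two digits.
    """
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year must be in 0..99")
    year = current_year - current_year % 100 + yy
    if year > current_year + 50:
        year -= 100  # too far in the future: use the previous century
    elif year <= current_year - 50:
        year += 100  # too far in the past: the next century is closer
    return year


if __name__ == "__main__":
    print(is_leap_year(2000))               # True: 2000 is divisible by 400
    print(is_leap_year(1900))               # False: century not divisible by 400
    print(expand_two_digit_year(99, 2000))  # 1999
    print(expand_two_digit_year(0, 2000))   # 2000
```

A parser that assumes every two-digit year means 19xx would turn "00" into 1900. The windowed expansion avoids that without hard-coding a century.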

1.8. I think I found a bug. What should I do?

Well, there are probably bugs out there. You have two options for bug-reporting. You can either mail the ht://Dig mailing list at <htdig-general@lists.sourceforge.net> or, better yet, report it to the bug database, which ensures it won't become lost amongst all of the other mail on the list. Please try to include as much information as possible, including the version of ht://Dig, the OS, and anything else that might be helpful. Often, running the programs with one "-v" or more (e.g. "-vvv") gives useful debugging information. If you are unsure whether the problem is a bug or a configuration problem, you should discuss the problem on <htdig-general> (after carefully reading the FAQ and searching the mail archive and patch archive, of course) to sort out what it is. The mailing list has a wider audience, so you're more likely to get help with configuration problems there than by reporting them to the bug database.

Whether reporting problems to the bug database or mailing list, we cannot stress enough the importance of always indicating which version of ht://Dig you are running. There are still a lot of users, ISPs and software distributors using older versions, and there have been a lot of bug fixes and new features added in recent versions. Knowing which version you're running is absolutely essential in helping to find a solution. If you're unsure if your version is current, or what fixes and features have been added in more recent versions, please see the release notes. See also question 2.1.

1.9. Does ht://Dig support phrase or near matching?

Phrase searching has been added for the 3.2 release, which is currently in the beta phase (3.2.0b3 as of this writing). Near or proximity matching will probably be added in a future beta.

1.10. What are the practical and/or theoretical limits of ht://Dig?

The code itself doesn't put any real limit on the number of pages. There are several sites in the hundreds of thousands of pages. As for practical limits, it depends a lot on how many pages you plan on indexing. Some operating systems limit files to 2 GB in size, which can become a problem with a