
Berkeley DB Reference Guide: Access Methods




Selecting a cache size



The size of the cache used for the underlying database can be specified as part of the db_open call to open the database, specifically by setting the db_cachesize element of the DB_INFO structure.

Choosing a cache size is, unfortunately, an art. Your cache must be at least large enough for your working set plus some overlap for unexpected situations.

When using the Btree access method, you must have a cache big enough for the minimum working set for a single access. This will include a root page, one or more internal pages (depending on the depth of your tree), and a leaf page. If your cache is any smaller than that, each new page will force out the least-recently-used page, and you'll end up requesting the root page of the tree anew on each database request.
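As a rough illustration of the minimum working set described above, the following sketch multiplies the tree depth by the page size. The function name min_btree_cache_bytes is hypothetical and not part of Berkeley DB, and the estimate ignores any bookkeeping overhead the cache itself may need.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a Berkeley DB function): bytes needed to hold
 * one page from every level of the tree, i.e. the root page, the internal
 * pages on the search path, and one leaf page.
 */
static size_t min_btree_cache_bytes(unsigned levels, size_t page_size)
{
    assert(levels >= 1 && page_size > 0);
    return (size_t)levels * page_size;
}
```

For a three-level tree with 8KB pages, min_btree_cache_bytes(3, 8192) gives 24,576 bytes. This is a floor, not a recommendation.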

If your keys are of moderate size (a few tens of bytes) and your pages are on the order of 4K to 8K, most Btrees will be only three levels (e.g., if we assume 20 byte keys with 20 bytes of data associated with each key, an 8KB page can hold roughly 400 keys or 200 key/data pairs, so a fully populated three level Btree will hold 32 million key/data pairs, and a tree with only a 50% page-fill factor will still hold 16 million key/data pairs). We rarely expect trees to exceed five levels, although Berkeley DB will support trees up to 255 levels.
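The arithmetic behind the 32 million figure can be sketched as follows. The helper names entries_per_page and btree_capacity are hypothetical, and the calculation assumes every page is completely full and ignores page headers and per-item overhead, which is why the text rounds 409 keys down to roughly 400.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: items that fit on a page, ignoring all overhead. */
static uint64_t entries_per_page(uint64_t page_size, uint64_t entry_size)
{
    assert(entry_size > 0);
    return page_size / entry_size;
}

/*
 * Hypothetical helper: key/data pairs held by a fully populated Btree.
 * fanout     = keys per internal page (child pointers per page)
 * leaf_pairs = key/data pairs per leaf page
 * A one-level tree is a single leaf page.
 */
static uint64_t btree_capacity(unsigned levels, uint64_t fanout,
                               uint64_t leaf_pairs)
{
    uint64_t leaves = 1;
    unsigned i;

    assert(levels >= 1);
    for (i = 1; i < levels; i++)
        leaves *= fanout;       /* each level above the leaves multiplies */
    return leaves * leaf_pairs;
}
```

With the rounded figures from the text, btree_capacity(3, 400, 200) is 400 * 400 * 200 = 32,000,000 key/data pairs.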

Even a small initial increase of the size of the cache over the minimum working set gives you a good return, as this allows you to cache more internal pages, which are more likely to be accessed in a Btree request than any single leaf page.

Regardless, the rule of thumb is that cache is good, and more cache is better. Generally, applications benefit from increasing the cache size up to a point, beyond which further increases yield little or no improvement in performance. When this point is reached, one of two things has happened: either the cache is large enough that the application almost never has to retrieve information from disk, or your application is doing truly random accesses, so increasing the size of the cache doesn't significantly increase the odds of finding the next requested information in the cache. The latter is fairly rare; almost all applications show some form of locality of reference.

For example, even if your accesses are truly random within a Btree, your access pattern will favor internal pages over leaf pages, so your cache should be large enough to hold all internal pages. You will then need at most one I/O per operation, to retrieve the appropriate leaf page.
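To get a feel for how much cache "all internal pages" implies, the sketch below counts the non-leaf pages of a fully packed tree. The names btree_internal_pages and internal_cache_bytes are hypothetical, and the model (full pages, no overhead) is a simplifying assumption rather than a description of Berkeley DB's actual page layout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of non-leaf pages (root included) in a
 * fully packed Btree holding `pairs` key/data pairs.
 */
static uint64_t btree_internal_pages(uint64_t pairs, uint64_t leaf_pairs,
                                     uint64_t fanout)
{
    uint64_t pages, total = 0;

    assert(leaf_pairs > 0 && fanout >= 2);
    pages = (pairs + leaf_pairs - 1) / leaf_pairs;   /* leaf pages */
    while (pages > 1) {
        pages = (pages + fanout - 1) / fanout;       /* next level up */
        total += pages;
    }
    return total;
}

/* Hypothetical helper: cache bytes needed to hold every internal page. */
static uint64_t internal_cache_bytes(uint64_t pairs, uint64_t leaf_pairs,
                                     uint64_t fanout, uint64_t page_size)
{
    return btree_internal_pages(pairs, leaf_pairs, fanout) * page_size;
}
```

Under these assumptions, 32 million pairs with 200 pairs per leaf and 400 keys per internal page need 401 internal pages, or 3,284,992 bytes with 8KB pages. A real tree with partly filled pages will need more.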

Finally, you can use the db_stat utility to monitor the effectiveness of your cache. The following output is excerpted from the output of that utility's -m option:



The statistics for this cache show that there have been 4,273 requests of the cache, and only 116 of those requests required an I/O from disk. This means that the cache is working well, yielding a 97% cache hit rate. The db_stat utility will present these statistics both for the cache as a whole and for each file within the cache separately.
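The 97% figure follows directly from the two counts quoted above. A minimal sketch of the calculation, using a hypothetical helper named cache_hit_rate that is not part of db_stat or the Berkeley DB API:

```c
#include <assert.h>

/*
 * Hypothetical helper: fraction of cache requests satisfied without disk
 * I/O. `requests` is the total number of cache requests and `misses` is
 * the number that required a read from disk.
 */
static double cache_hit_rate(unsigned long requests, unsigned long misses)
{
    assert(misses <= requests);
    if (requests == 0)
        return 0.0;             /* no requests yet: report 0 by convention */
    return 1.0 - (double)misses / (double)requests;
}
```

For 4,273 requests and 116 misses, the result is a little under 0.973, which rounds to the 97% hit rate given in the text.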
