Berkeley DB: Berkeley DB Access Method Specific Initialization

Berkeley DB Access Method Specific Initialization




Access method specific information is provided to db_open using the DB_INFO data argument. The DB_INFO structure has a large number of fields, most specific to a single access method, although a few are shared. The fields that are common to all access methods are listed here; those specific to an individual access method are described below. No reference to the DB_INFO structure is maintained by Berkeley DB, so it is possible to discard it as soon as the db_open call returns.

In order to ensure compatibility with future releases of Berkeley DB, all fields of the DB_INFO structure should be initialized to 0 before the structure is used. Do this by declaring the structure external or static, or by calling the C library function bzero(3) or memset(3).
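The two initialization approaches can be sketched as follows. This is a minimal illustration, not the library's declaration: struct info_sketch is a hypothetical stand-in that carries only the three common scalar fields named in this section, and make_info is an invented helper. Real code would use the DB_INFO type from db.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for DB_INFO (assumption): only the common
 * scalar fields described in this section are included. */
struct info_sketch {
    size_t db_cachesize;
    int    db_lorder;
    size_t db_pagesize;
};

/* Approach 1: static storage is zero-initialized by the C language. */
static struct info_sketch static_info;

/* Approach 2: clear an automatic (stack) structure with memset(3),
 * then set only the fields that should differ from the defaults. */
static struct info_sketch make_info(size_t pagesize)
{
    struct info_sketch info;

    memset(&info, 0, sizeof(info));
    info.db_pagesize = pagesize;

    /* Fields that were not set explicitly are still 0. */
    assert(info.db_cachesize == 0 && info.db_lorder == 0);
    return info;
}
```

Clearing the whole structure first means that fields added in a later release would start out as 0 without any change to the calling code.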

If possible, defaults appropriate for the system are used for the DB_INFO fields if dbinfo is NULL or any fields of the DB_INFO structure are set to 0. The following DB_INFO fields may be initialized before calling db_open:


size_t db_cachesize;
A suggested maximum size of the memory pool cache, in bytes. If db_cachesize is 0, an appropriate default is used. It is an error to specify both the mp_info field and a non-zero db_cachesize.

Note that the cache should hold no fewer than 10 pages, and the access methods will fail if an insufficiently large cache is specified. In addition, for applications that exhibit strong locality in their data access patterns, increasing the size of the cache can significantly improve application performance.

For information on tuning the Berkeley DB cache size, see Selecting a cache size.
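The 10-page minimum can be checked before calling db_open. The following is a rough sketch under the assumption that the page count is simply the cache size divided by the page size; cache_large_enough and SKETCH_MIN_CACHE_PAGES are hypothetical names, and the library may account for additional overhead.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum number of pages the cache should hold, per the text above. */
#define SKETCH_MIN_CACHE_PAGES 10

/* Return 1 if a cache of cachesize bytes holds at least 10 pages of
 * pagesize bytes, 0 otherwise. Both values must be known (non-zero). */
static int cache_large_enough(size_t cachesize, size_t pagesize)
{
    assert(pagesize > 0);
    return cachesize / pagesize >= SKETCH_MIN_CACHE_PAGES;
}
```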

int db_lorder;
The byte order for integers in the stored database metadata. The number should represent the order as an integer, for example, big endian order is the number 4,321, and little endian order is the number 1,234. If db_lorder is 0, the host order of the machine where the Berkeley DB library was compiled is used.

The value of db_lorder is ignored except when databases are being created. If a database already exists, the byte order it uses is determined when the file is read.

The access methods provide no guarantees about the byte ordering of the application data stored in the database, and applications are responsible for maintaining any necessary ordering.
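As an illustration of both points, the sketch below detects the host byte order using the 1234/4321 convention and shows one way an application might store its own integers in a fixed order. The function names host_lorder and put_be32 are hypothetical and are not part of Berkeley DB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1234 on a little endian host and 4321 on a big endian host,
 * matching the db_lorder convention described above. */
static int host_lorder(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
    return first == 1 ? 1234 : 4321;
}

/* Store v most-significant byte first, regardless of host order, so
 * that application data reads back identically on any machine. */
static void put_be32(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```

Big endian integers of equal width also sort numerically under a bytewise comparison, which can matter when they are used as keys.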

size_t db_pagesize;
The size of the pages used to hold items in the database, in bytes. The minimum page size is 512 bytes and the maximum page size is 64K bytes. If db_pagesize is 0, a page size is selected based on the underlying filesystem I/O block size. The selected size has a lower limit of 512 bytes and an upper limit of 16K bytes.

For information on tuning the Berkeley DB page size, see Selecting a page size.
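The limits above can be expressed as a small sketch. It assumes the default is the filesystem block size clamped to the stated range, which is a simplification and not necessarily how the library selects the value; default_pagesize and pagesize_in_range are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MIN_PAGESIZE      512u            /* lower limit, both cases */
#define SKETCH_MAX_DEFAULT_SIZE  (16u * 1024u)   /* upper limit when db_pagesize is 0 */
#define SKETCH_MAX_PAGESIZE      (64u * 1024u)   /* upper limit for explicit sizes */

/* Assumed selection rule: clamp the filesystem I/O block size into
 * the 512 byte to 16K byte range used for default page sizes. */
static size_t default_pagesize(size_t fs_blksize)
{
    size_t result = fs_blksize;

    if (result < SKETCH_MIN_PAGESIZE)
        result = SKETCH_MIN_PAGESIZE;
    if (result > SKETCH_MAX_DEFAULT_SIZE)
        result = SKETCH_MAX_DEFAULT_SIZE;
    assert(result >= SKETCH_MIN_PAGESIZE && result <= SKETCH_MAX_DEFAULT_SIZE);
    return result;
}

/* Return 1 if an explicitly requested page size is within 512..64K. */
static int pagesize_in_range(size_t pagesize)
{
    return pagesize >= SKETCH_MIN_PAGESIZE && pagesize <= SKETCH_MAX_PAGESIZE;
}
```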

int (*dup_compare)(const DBT *, const DBT *);
Specify a duplicate comparison function. It must return an integer less than, equal to, or greater than zero if the first key argument is considered to be respectively less than, equal to, or greater than the second key argument. The same comparison function must be used on a given tree every time it is opened. See DB_DUPSORT for more information.

void *(*db_malloc)(size_t);
The flag DB_DBT_MALLOC, when specified in the DBT structure, will cause the Berkeley DB library to allocate memory which then becomes the responsibility of the calling application.

On systems where there may be multiple library versions of malloc (notably Windows NT), specifying the DB_DBT_MALLOC flag will fail because the Berkeley DB library will allocate memory from a different heap than the application will use to free it. To avoid this problem, the db_malloc field should be set to point to the application's allocation routine. If db_malloc is non-NULL, it will be used to allocate the memory returned when the DB_DBT_MALLOC flag is set. The db_malloc function must match the calling conventions of the malloc(3) library routine.



Btree



The following additional fields and flags may be initialized in the DB_INFO structure before calling db_open, when using the Btree access method:


int (*bt_compare)(const DBT *, const DBT *);
The bt_compare function is the key comparison function. It must return an integer less than, equal to, or greater than zero if the first key argument is considered to be respectively less than, equal to, or greater than the second key argument. The same comparison function must be used on a given tree every time it is opened.

The data and size fields of the DBT are the only fields that may be used for the purposes of this comparison.

If bt_compare is NULL, the keys are compared lexically, with shorter keys collating before longer keys.
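A comparison function following these rules might look like the sketch below. It reproduces the lexical ordering described for the NULL case and is not the library's own code. DBT_sketch is a hypothetical stand-in that carries only the data and size fields; a real function would take const DBT * arguments as declared in db.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DBT (assumption): only the two fields a
 * comparison function is permitted to use. */
typedef struct {
    void     *data;
    uint32_t  size;
} DBT_sketch;

/* Lexical comparison: compare the common prefix bytewise; if it is
 * equal, the shorter key collates before the longer key. */
static int lexical_compare(const DBT_sketch *a, const DBT_sketch *b)
{
    uint32_t len;
    int cmp;

    assert(a != NULL && b != NULL);
    len = a->size < b->size ? a->size : b->size;
    cmp = len == 0 ? 0 : memcmp(a->data, b->data, len);
    if (cmp != 0)
        return cmp;
    return (a->size > b->size) - (a->size < b->size);
}
```

Because the result depends only on the bytes and their length, the same function gives the same ordering every time the tree is opened, as the text requires.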

u_int32_t bt_minkey;
The minimum number of keys that will be stored on any single page. This value is used to determine which keys will be stored on overflow pages, i.e. if a key or data item is larger than the pagesize divided by the bt_minkey value, it will be stored on overflow pages instead of in the page itself. The bt_minkey value specified must be at least 2; if bt_minkey is 0, a value of 2 is used.
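The overflow rule can be illustrated with a small calculation. This sketch applies the rule exactly as worded above (item size compared against pagesize divided by bt_minkey) and ignores any per-page overhead the library may subtract; stored_on_overflow is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if an item of item_size bytes would be placed on overflow
 * pages, 0 if it fits in the page itself, per the rule in the text. */
static int stored_on_overflow(size_t item_size, size_t pagesize,
                              uint32_t bt_minkey)
{
    if (bt_minkey == 0)
        bt_minkey = 2;           /* documented default */
    assert(bt_minkey >= 2);      /* smaller values are invalid */
    return item_size > pagesize / bt_minkey;
}
```

For example, with a 4096 byte page and the default bt_minkey of 2, the threshold is 2048 bytes.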

size_t (*bt_prefix)(const DBT *, const DBT *);
The bt_prefix function is the prefix comparison function. If specified, this function must return the number of bytes of the second key argument that are necessary to determine that it is greater than the first key argument. If the keys are equal, the key length should be returned.

The data and size fields of the DBT are the only fields that may be used for the purposes of this comparison.

This function is used to compress the keys stored on the btree internal pages. The usefulness of this is data dependent, but in some data sets can produce significantly reduced tree sizes and search times.

If bt_prefix is NULL, and no key comparison function is specified, a default lexical comparison function is used for prefix compression. If bt_prefix is NULL and a key comparison function is specified, no prefix compression is done. It is an error to specify a prefix compression function without also specifying a key comparison function.
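A prefix function for lexically ordered keys could be sketched as follows. It assumes the first key sorts at or before the second, and it uses a hypothetical DBT_sketch stand-in with only the data and size fields. It illustrates the contract described above and is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DBT (assumption): data and size only. */
typedef struct {
    void     *data;
    uint32_t  size;
} DBT_sketch;

/* Return how many leading bytes of b are needed to show that b is
 * greater than a under lexical ordering. Assumes a <= b. */
static size_t lexical_prefix(const DBT_sketch *a, const DBT_sketch *b)
{
    const unsigned char *p1, *p2;
    size_t len, cnt;

    assert(a != NULL && b != NULL);
    p1 = a->data;
    p2 = b->data;
    len = a->size < b->size ? a->size : b->size;

    /* The first differing byte (counted from 1) decides the order. */
    for (cnt = 1; len-- > 0; ++p1, ++p2, ++cnt)
        if (*p1 != *p2)
            return cnt;

    /* No difference in the common prefix: if a is shorter, one extra
     * byte of b is enough; if the keys are equal, return the length. */
    return a->size < b->size ? (size_t)a->size + 1 : (size_t)a->size;
}
```

With keys "abc" and "abd" the result is 3, since the third byte is the first that differs; with "ab" and "abc" it is also 3, since the extra byte makes the second key longer.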

u_int32_t flags;
The following additional flags may be specified by logically OR'ing together one or more of the following values:

DB_DUP
Permit duplicate data items in the tree, i.e. insertion when the key of the key/data pair being inserted already exists in the tree will be successful. The ordering of duplicates in the tree is determined by the order of insertion, unless the ordering is otherwise specified by use of a cursor or a duplicate comparison function. It is an error to specify both DB_DUP and DB_RECNUM.

DB_DUPSORT
Sort duplicates within a set of data items. If the application does not specify a comparison function using the dup_compare element of the DB_INFO structure, a default, lexical comparison will be used.

Specifying that duplicates are to be sorted changes the behavior of the DB->put operation as well as the DBcursor->c_put operation when the DB_KEYFIRST, DB_KEYLAST and DB_CURRENT flags are specified.

DB_RECNUM
Support retrieval from btrees using record numbers. For more information, see the DB_SET_RECNO flag to the DB->get function and the cursor DBcursor->c_get function.

Logical record numbers in btrees are mutable in the face of record insertion or deletion. See the DB_RENUMBER flag in the RECNO section below for further discussion.

Maintaining record counts within a btree introduces a serious point of contention, namely the page locations where the record counts are stored. In addition, the entire tree must be locked during both insertions and deletions, effectively single-threading the tree for those operations. Specifying DB_RECNUM can result in serious performance degradation for some applications and data sets. It is an error to specify both DB_DUP and DB_RECNUM.


Hash


The following additional fields and flags may be initialized in the DB_INFO structure before calling db_open, when using the hash access method:


u_int32_t h_ffactor;
The desired density within the hash table. It is an approximation of the number of keys allowed to accumulate in any one bucket, determining when the hash table grows or shrinks. The default value is 0, indicating that the fill factor will be selected dynamically as pages are filled.

u_int32_t (*h_hash)(const void *, u_int32_t);
The h_hash field is a user defined hash function; if h_hash is NULL, a default hash function is used. Since no hash function performs equally well on all possible data, the user may find that the built-in hash function performs poorly with a particular data set. User specified hash functions must take a pointer to a byte string and a length as arguments and return a u_int32_t value.

If a hash function is specified, db_open will attempt to determine if the hash function specified is the same as the one with which the database was created, and will fail if it detects that it is not.

u_int32_t h_nelem;
An estimate of the final size of the hash table. If not set or set too low, hash tables will expand gracefully as keys are entered, although a slight performance degradation may be noticed. The default value is 1.

u_int32_t flags;
The following additional flags may be specified by logically OR'ing together one or more of the following values:

DB_DUP
Permit duplicate data items in the tree, i.e. insertion when the key of the key/data pair being inserted already exists in the tree will be successful. The ordering of duplicates in the tree is determined by the order of insertion, unless the ordering is otherwise specified by use of a cursor or a duplicate comparison function.

DB_DUPSORT
Sort duplicates within a set of data items. If the application does not specify a comparison function using the dup_compare element of the DB_INFO structure, a default, lexical comparison will be used.

Specifying that duplicates are to be sorted changes the behavior of the DB->put operation as well as the DBcursor->c_put operation when the DB_KEYFIRST, DB_KEYLAST and DB_CURRENT flags are specified.



Recno



The following additional fields and flags may be initialized in the DB_INFO structure before calling db_open, when using the recno access method:


int re_delim;
For variable length records, if the re_source file is specified and the DB_DELIMITER flag is set, the delimiting byte used to mark the end of a record in the source file. If the re_source file is specified and the DB_DELIMITER flag is not set, newline characters (i.e. \n, 0x0a) are interpreted as end-of-record markers.

u_int32_t re_len;
The length of a fixed-length record.

int re_pad;
For fixed length records, if the DB_PAD flag is set, the pad character for short records. If the DB_PAD flag is not set, space characters (i.e., 0x20) are used for padding.

char *re_source;
The purpose of the re_source field is to provide fast access and modification to databases that are normally stored as flat text files.

If the re_source field is non-NULL, it specifies an underlying flat text database file that is read to initialize a transient record number index. In the case of variable length records, the records are separated by the byte value re_delim. For example, standard UNIX byte stream files can be interpreted as a sequence of variable length records separated by newline characters.

In addition, when cached data would normally be written back to the underlying database file (e.g., the close(2) or sync functions are called), the in-memory copy of the database will be written back to the re_source file.

By default, the backing source file is read lazily, i.e., records are not read from the file until they are requested by the application. If multiple processes (not threads) are accessing a recno database concurrently and either inserting or deleting records, the backing source file must be read in its entirety before more than a single process accesses the database, and only that process should specify the backing source file as part of the db_open call. See the DB_SNAPSHOT flag below for more information.

Reading and writing the backing source file specified by re_source cannot be transactionally protected because it involves filesystem operations that are not part of the Berkeley DB transaction methodology. For this reason, if a temporary database is used to hold the records, i.e., a NULL was specified as the file argument to db_open, it is possible to lose the contents of the re_source file, e.g., if the system crashes at the right instant. If a file is used to hold the database, i.e., a file name was specified as the file argument to db_open, normal database recovery on that file can be used to prevent information loss, although it is still possible that the contents of re_source will be lost if the system crashes.

The re_source file must already exist (but may be zero-length) when db_open is called.

For all of the above reasons, the re_source field is generally used to specify databases that are read-only for Berkeley DB applications, and that are either generated on the fly by software tools, or modified using a different mechanism, e.g., a text editor.

u_int32_t flags;
The following additional flags may be specified by logically OR'ing together one or more of the following values:

DB_DELIMITER
The re_delim field is set.

DB_FIXEDLEN
The records are fixed-length, not byte delimited. The structure element re_len specifies the length of the record, and the structure element re_pad is used as the pad character.

Any records added to the database that are less than re_len bytes long are automatically padded. Any attempt to insert records into the database that are greater than re_len bytes long will cause the call to fail immediately and return an error.
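The padding and rejection rules can be modelled with a short sketch. This shows the behavior described above from the application's point of view and is not library code; pad_record and its -1 error value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a record into a slot of re_len bytes, filling the remainder
 * with the pad byte re_pad. Returns 0 on success, or -1 if the record
 * is longer than re_len (such inserts fail rather than truncate). */
static int pad_record(char *dst, size_t re_len, int re_pad,
                      const char *src, size_t src_len)
{
    assert(dst != NULL && (src != NULL || src_len == 0));
    if (src_len > re_len)
        return -1;
    if (src_len > 0)
        memcpy(dst, src, src_len);
    memset(dst + src_len, re_pad, re_len - src_len);
    return 0;
}
```

With re_len set to 5 and a pad byte of 0x20, the two-byte record "ab" is stored as "ab" followed by three spaces.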

DB_PAD
The re_pad field is set.

DB_RENUMBER
Specifying the DB_RENUMBER flag causes the logical record numbers to be mutable, and change as records are added to and deleted from the database. For example, the deletion of record number 4 causes records numbered 5 and greater to be renumbered downward by 1. If a cursor was positioned to record number 4 before the deletion, it will reference the new record number 4, if any such record exists, after the deletion. If a cursor was positioned after record number 4 before the deletion, it will be shifted downward 1 logical record, continuing to reference the same record as it did before.

Using the DB->put or DBcursor->c_put interfaces to create new records will cause the creation of multiple records if the record number is more than one greater than the largest record currently in the database. For example, creating record 28, when record 25 was previously the last record in the database, will create records 26 and 27 as well as 28. Attempts to retrieve records that were created in this manner will result in an error return of DB_KEYEMPTY.

If a created record is not at the end of the database, all records following the new record will be automatically renumbered upward by 1. For example, the creation of a new record numbered 8 causes records numbered 8 and greater to be renumbered upward by 1. If a cursor was positioned to record number 8 or greater before the insertion, it will be shifted upward 1 logical record, continuing to reference the same record as it did before.

For these reasons, concurrent access to a recno database with the DB_RENUMBER flag specified may be largely meaningless, although it is supported.
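The cursor adjustments described for DB_RENUMBER can be summarized in a small model. This sketch only computes the record number a cursor would report after a deletion or insertion, following the examples in the text; recno_after_delete and recno_after_insert are hypothetical helpers, not Berkeley DB functions.

```c
#include <assert.h>
#include <stdint.h>

/* Record number a cursor reports after record `deleted` is removed.
 * A cursor on the deleted number stays put and now references the
 * record that moved into that slot, if any; cursors positioned after
 * it shift down by 1 and keep referencing the same record. */
static uint32_t recno_after_delete(uint32_t cursor, uint32_t deleted)
{
    assert(cursor >= 1 && deleted >= 1);   /* record numbers start at 1 */
    return cursor > deleted ? cursor - 1 : cursor;
}

/* Record number a cursor reports after a new record `inserted` is
 * created before the end of the database. Cursors at or after the
 * new number shift up by 1 and keep referencing the same record. */
static uint32_t recno_after_insert(uint32_t cursor, uint32_t inserted)
{
    assert(cursor >= 1 && inserted >= 1);
    return cursor >= inserted ? cursor + 1 : cursor;
}
```

A record number saved by one process can therefore refer to a different record after another process inserts or deletes, which is why concurrent use under this flag is of limited value.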

DB_SNAPSHOT
This flag specifies that any specified re_source file be read in its entirety when db_open is called. If this flag is not specified, the re_source file may be read lazily.