Commit 45b41fb4 authored by Tim Mooney, committed by Howard Chu

ITS#6906 Update cachesize recommendations

to remove references to indexes in Hash format

Fix whitespace error -- hyc
parent 57ce05ec
@@ -234,10 +234,10 @@
 will tell you how many internal pages are present in a database. You should
 check this number for both dn2id and id2entry.
 Also note that {{id2entry}} always uses 16KB per "page", while {{dn2id}} uses whatever
-the underlying filesystem uses, typically 4 or 8KB. To avoid thrashing the,
+the underlying filesystem uses, typically 4 or 8KB. To avoid thrashing,
 your cache must be at least as large as the number of internal pages in both
-the {{dn2id}} and {{id2entry}} databases, plus some extra space to accommodate the actual
-leaf data pages.
+the {{dn2id}} and {{id2entry}} databases, plus some extra space to accommodate
+the actual leaf data pages.
 
 For example, in my OpenLDAP 2.4 test database, I have an input LDIF file that's
 about 360MB. With the back-hdb backend this creates a {{dn2id.bdb}} that's 68MB,
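The sizing rule in this hunk (cache must hold all internal pages of both databases, plus headroom for leaf pages) can be sketched numerically. This is a minimal illustration, not part of the patch; the page counts below are made-up assumptions, and in practice you would read "Number of tree internal pages" from Berkeley DB's `db_stat -d` output for each database file:

```python
# Minimum BDB cache estimate per the rule in the hunk above: the cache must
# cover all internal pages of dn2id and id2entry at their respective page
# sizes, plus some extra space for leaf data pages.
# The 25% leaf headroom here is an illustrative assumption, not a figure
# from the patch.

def min_cache_bytes(dn2id_internal, id2entry_internal,
                    dn2id_page=4096, id2entry_page=16 * 1024,
                    leaf_headroom=1.25):
    """Sum internal pages of both databases times their page sizes,
    then add headroom for the actual leaf data pages."""
    core = dn2id_internal * dn2id_page + id2entry_internal * id2entry_page
    return int(core * leaf_headroom)

# Hypothetical counts: 120 dn2id internal pages (4KB filesystem pages) and
# 66 id2entry internal pages (fixed 16KB pages).
print(min_cache_bytes(120, 66))  # -> 1966080 (about 1.9MB)
```

Note that `id2entry` always contributes 16KB per page regardless of the filesystem, while the `dn2id` page size should match what `db_stat` reports for that file.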
@@ -252,23 +252,17 @@
 This doesn't take into account other library overhead, so this is even lower
 than the barest minimum. The default cache size, when nothing is configured,
 is only 256KB.
-This 2.5MB number also doesn't take indexing into account. Each indexed attribute
-uses another database file of its own, using a Hash structure.
+This 2.5MB number also doesn't take indexing into account. Each indexed
+attribute results in another database file. Earlier versions of OpenLDAP
+kept these index databases in Hash format, but from OpenLDAP 2.2 onward
+the index databases are in B-tree format so the same procedure can
+be used to calculate the necessary amount of cache for each index database.
-Unlike the B-trees, where you only need to touch one data page to find an entry
-of interest, doing an index lookup generally touches multiple keys, and the
-point of a hash structure is that the keys are evenly distributed across the
-data space. That means there's no convenient compact subset of the database that
-you can keep in the cache to insure quick operation, you can pretty much expect
-references to be scattered across the whole thing. My strategy here would be to
-provide enough cache for at least 50% of all of the hash data.
-
-> (Number of hash buckets + number of overflow pages + number of duplicate pages) * page size / 2.
-
-The objectClass index for my example database is 5.9MB and uses 3 hash buckets
-and 656 duplicate pages. So:
-
-> ( 3 + 656 ) * 4KB / 2 =~ 1.3MB.
+For example, if your only index is for the objectClass attribute and db_stat
+reveals that {{objectClass.bdb}} has 339 internal pages and uses 4096 byte
+pages, the additional cache needed for just this attribute index is
+
+> (339+1) * 4KB =~ 1.3MB.
 With only this index enabled, I'd figure at least a 4MB cache for this backend.
 (Of course you're using a single cache shared among all of the database files,
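The new B-tree index formula in the hunk above, (internal pages + 1) * page size, can be checked directly with the commit's own objectClass numbers. A small sketch (the function name is mine, not from the patch):

```python
def index_cache_bytes(internal_pages, page_size=4096):
    """Per-index cache estimate for a B-tree index database,
    per the patched text: (internal pages + 1) * page size."""
    return (internal_pages + 1) * page_size

# The objectClass example from the commit: 339 internal pages, 4KB pages.
needed = index_cache_bytes(339)
print(f"{needed / 1024 / 1024:.2f} MB")  # -> 1.33 MB, i.e. "=~ 1.3MB"
```

Adding this ~1.3MB to the roughly 2.5MB baseline from the preceding discussion is consistent with the text's suggestion of at least a 4MB cache when only this index is enabled.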