Jens Alfke | 1 Jul 2011 17:23

Re: Frugal Erlang vs Resources Hungry CouchDB


On Jun 30, 2011, at 11:37 PM, Zdravko Gligic wrote:

> But neither one even bothered trying to answer my question of whether
> just the last updated header or perhaps the last few are ever used.

Just the last one. But at any point in time, the last one is vital for recovery. It just becomes useless after
another one is successfully appended.
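
A minimal sketch of why only the newest intact header matters for recovery. The layout here is hypothetical, not CouchDB's actual on-disk format: the 64-byte block size, the `HDR1` marker, and the CRC32 check are all assumptions made for illustration. Recovery scans backwards from EOF and stops at the first header that verifies, so a torn final write simply falls back to the header before it.

```python
import struct
import zlib

BLOCK = 64          # hypothetical block size; headers start on block boundaries
MAGIC = b"HDR1"     # hypothetical marker identifying a header block
FMT = ">4sQI"       # magic, root offset, CRC32 of the packed root offset


def make_header(root_offset: int) -> bytes:
    """Build one header block that points at the tree root."""
    body = struct.pack(">Q", root_offset)
    raw = struct.pack(FMT, MAGIC, root_offset, zlib.crc32(body))
    return raw.ljust(BLOCK, b"\0")


def append_header(data: bytearray, root_offset: int) -> None:
    """Pad to the next block boundary, then append a header block."""
    data.extend(b"\0" * ((-len(data)) % BLOCK))
    data.extend(make_header(root_offset))


def find_last_valid_header(data: bytes):
    """Scan backwards block by block; return the root offset stored in
    the newest intact header, or None if there is no valid header."""
    size = struct.calcsize(FMT)
    pos = (len(data) // BLOCK) * BLOCK
    while pos >= 0:
        chunk = data[pos:pos + size]
        if len(chunk) == size:
            magic, root, crc = struct.unpack(FMT, chunk)
            if magic == MAGIC and crc == zlib.crc32(struct.pack(">Q", root)):
                return root
        pos -= BLOCK
    return None


db = bytearray(b"x" * 10)       # some document data
append_header(db, 0)            # first commit
db.extend(b"y" * 20)            # more data
append_header(db, 128)          # second commit
torn = bytes(db) + make_header(256)[:10]   # crash midway through a third header
```

Here `find_last_valid_header(torn)` falls back to the second commit's root, and every header older than that one is never consulted.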

> I was also under an impression that the update headers (pointers to
> the root of btree) are also somehow being used for reading
> consistency.  If so then this might suggest that a database could be
> rolled back to some previous point in time.  How far back and how
> practical is another question.

Sort of. My understanding (I haven’t looked at the source) is that when a request handler begins, it reads
the header at the current EOF and finds the root node. After that, every read starts from that root node.
I think that once request handling has begun, the header isn’t consulted again, so the request keeps seeing
a consistent snapshot even if new headers are appended in the meantime.
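
A minimal sketch of that read path, under stated assumptions: an in-memory list stands in for the append-only file, and a plain binary search tree with path copying stands in for the real B-tree. `AppendOnlyStore` and its methods are hypothetical names, not CouchDB code. A reader captures the root once and keeps using it, so later writes do not disturb it.

```python
from typing import List, Optional, Tuple

# Each log entry is an immutable node: (key, value, left_pos, right_pos).
Node = Tuple[str, str, Optional[int], Optional[int]]


class AppendOnlyStore:
    def __init__(self) -> None:
        self.log: List[Node] = []          # stands in for the database file
        self.root: Optional[int] = None    # stands in for the newest header

    def _insert(self, pos: Optional[int], key: str, value: str) -> int:
        """Append copies of the nodes on the path to `key`; never modify
        an existing entry. Returns the position of the new subtree root."""
        if pos is None:
            new: Node = (key, value, None, None)
        else:
            k, v, left, right = self.log[pos]
            if key == k:
                new = (key, value, left, right)
            elif key < k:
                new = (k, v, self._insert(left, key, value), right)
            else:
                new = (k, v, left, self._insert(right, key, value))
        self.log.append(new)
        return len(self.log) - 1

    def put(self, key: str, value: str) -> None:
        # Publishing the new root plays the role of appending a header.
        self.root = self._insert(self.root, key, value)

    def snapshot(self) -> Optional[int]:
        """What a request handler would read once, when it begins."""
        return self.root

    def get(self, root: Optional[int], key: str) -> Optional[str]:
        """Look up `key` starting from a previously captured root."""
        pos = root
        while pos is not None:
            k, v, left, right = self.log[pos]
            if key == k:
                return v
            pos = left if key < k else right
        return None


store = AppendOnlyStore()
store.put("a", "1")
snap = store.snapshot()     # the request begins here
store.put("a", "2")         # later writes append new nodes and a new root
store.put("b", "3")
```

Reads through `snap` still see `a` as `"1"` and no `b`, while a fresh snapshot sees both updates. The old root also remains readable, which is what makes a rollback possible in principle until the old nodes are discarded.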

I am not sure whether the db looks up older revisions of documents by starting from an earlier header
(“going back in time”); I don’t think so, because this would be inefficient (O(N)) for finding a
specific revision of a document. Instead my hunch is that each document points back to the position in the
file of its previous revision. (Again, disclaimer, I am extrapolating based on my knowledge of similar
data structures.)
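
If that hunch were right, a revision lookup could look roughly like the sketch below. This illustrates the guess only and is not a description of how CouchDB actually stores revisions: the record layout, the `latest` index, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (doc_id, rev, body, position of the previous revision).
Record = Tuple[str, int, str, Optional[int]]

log: List[Record] = []          # stands in for the append-only file
latest: Dict[str, int] = {}     # doc_id -> position of the newest revision


def write(doc_id: str, body: str) -> int:
    """Append a new revision that points back at the previous one."""
    prev = latest.get(doc_id)
    rev = 1 if prev is None else log[prev][1] + 1
    log.append((doc_id, rev, body, prev))
    latest[doc_id] = len(log) - 1
    return rev


def read_rev(doc_id: str, rev: int) -> Optional[str]:
    """Walk back-pointers from the newest revision until `rev` is found."""
    pos = latest.get(doc_id)
    while pos is not None:
        _, r, body, prev = log[pos]
        if r == rev:
            return body
        pos = prev
    return None


write("a", "v1")
write("b", "x")
write("a", "v2")
write("a", "v3")
```

The cost of `read_rev` depends on how many revisions that one document has, not on how many headers have been appended to the file since.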

You might find this blog post on CouchDB’s internal structures interesting. It’s from 2008, though, so
I don’t know how much of it is still accurate:
	http://horicky.blogspot.com/2008/10/couchdb-implementation.html

—Jens
