2 Jan 18:42
Batching improvements
Hi.

Inspired by some work in collective.solr, I made some improvements to the batching logic in Plone over Christmas. This is a short explanation - I suspect the p.a.contentlisting and maybe p.a.search code needs to be adjusted to this as well.

So far we did a catalog query of some sort, often with a sort_on argument (score, folder position, publication date, ...), then later wrapped the result in Plone's Batch class in some template, and only at the end decided which batch of 10 or 20 items to show.

After my changes, we first decide which batch to show (reading variables from the request), then do the catalog query, passing the batch variables into the query, and finally wrap the result in the Batch class. The catalog can then use the batching hints internally to optimize things.

I made these changes in the generic getFolderContents skin script and the queryCatalog method of the topic class. The optimizations are in ZCatalog itself in Zope 2.13 / Plone 4.1 and backported to Plone 4 via experimental.catalogqueryplan. You can make the template / code changes in any Plone version without any negative effect, as extra arguments to the catalog will simply be ignored.

The two query arguments you need to pass on are b_start and b_size. b_start is usually just read from the request. b_size depends on the template. The b_size you pass to the catalog needs to be the batch size plus the orphan value - so usually the template's batch size + 1.

There are currently two optimizations in the catalog:

1. An implicit sort_limit is calculated as b_start + b_size. So if you only want to show the first 10 items, the catalog doesn't need to sort the entire resultset of possibly 10,000 matching items, but can stop after it has 10. Depending on the ratio of the limit to the resultset, different sorting strategies are used, which makes this more performant.

2. If a sort_limit is specified, you only get as many brains back as the limit says (though maybe a couple more).
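To make the arithmetic above concrete, here is a minimal sketch in plain Python. It does not use Plone or ZCatalog: the helper names (batch_query_args, implied_sort_limit, limited_sort) are hypothetical, the request is modelled as a plain dict, and heapq.nsmallest is only a stand-in for a limited sort, not the strategy the catalog actually uses.

```python
import heapq


def batch_query_args(request, batch_size=10, orphan=1):
    """Build the two batching hints to add to a catalog query.

    Hypothetical helper: `request` is a plain dict standing in for
    the real request object.
    """
    b_start = int(request.get("b_start", 0))
    # The b_size passed on is the template's batch size plus the orphan value.
    return {"b_start": b_start, "b_size": batch_size + orphan}


def implied_sort_limit(query):
    """The implicit limit described above: b_start + b_size."""
    return query["b_start"] + query["b_size"]


def limited_sort(values, limit):
    """Stand-in for a limited sort: return only the first `limit`
    values in ascending order instead of sorting everything."""
    return heapq.nsmallest(limit, values)


# Decide on the batch first, then build the query with the hints in it.
request = {"b_start": "20"}
query = {"portal_type": "Document", "sort_on": "effective"}
query.update(batch_query_args(request, batch_size=10, orphan=1))

limit = implied_sort_limit(query)
first_items = limited_sort(range(10000, 0, -1), limit)
```

With a b_start of 20 and a template batch size of 10, the query carries b_size 11 and the implied limit is 31, so only 31 of the 10,000 values need to come back in sorted order.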
In addition, the result value has an "actual_result_count" attribute. This attribute states the total number of matches, so the Batch class can still calculate the correct batch pagination links.

The second optimization protects you from some bad code that we've seen lately. Some people started constructing a list of dicts in a view method, instead of operating on the brains inside the template. If you do this, you have so far instantiated every brain in the resultset and potentially called some expensive methods on them, like toLocalizedTime. If you did this in the template code, you usually did it only for the current batch. With the new optimization, you won't get as many brains, so your code isn't quite as expensive.

There are a number of obvious next steps we can take here to optimize things further. For example:

1. If b_start is greater than zero, only return the brains from that starting point on. We still need to sort, but you should only need to get back b_size+orphan items and not more. This avoids the extra cost of the dict-from-view-method pattern.

2. If you request a batch from the second half of the resultset, invert the sorting order and only sort len(resultset)-b_start items. This would make the later batches more performant and the last one as performant as the first batch.

Cheers,
Hanno