Jason Garrett-Glaser | 4 Jul 11:55 2012

Re: variance-based adaptive quantization

On Tue, Jul 3, 2012 at 9:59 PM, Khaled MAMOU <kmamou <at> gmail.com> wrote:
> Dear all,
> First, congratulations for the great work!
> I am interested in learning more about the "variance-based adaptive
> quantization" algorithm
> (http://git.videolan.org/?p=x264.git;a=commitdiff;h=dc4f40ce74c996a1e15021b82ab71ebcb8652d0b).
> The idea seems to be simple and efficient. Please, is there any
> theoretical foundations behind the formula used to adjust the QP based
> on the variance of each MB?

The (very rough) intuitive justification works something like this.

Imagine every macroblock has just one frequency coefficient.  This
coefficient can be big or small.

If a macroblock's coefficient is 1.4, it gets quantized to 1.  That's
an error of 28.6%.

If a macroblock's coefficient is 9.4, it gets quantized to 9.  That's
an error of 4.3%.
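
To make the arithmetic above concrete, here is a minimal sketch of the
relative error left by a uniform (linear) quantizer.  The function name
and the unit step size are illustrative assumptions, not anything from
x264:

```python
def relative_rounding_error(coeff, step=1.0):
    """Relative error after quantizing coeff with a uniform quantizer.

    `step` is a hypothetical quantizer step size; 1.0 matches the
    "round to the nearest integer" examples in the text.
    """
    quantized = round(coeff / step) * step
    return abs(coeff - quantized) / abs(coeff)


# The two coefficients used in the examples above.
for c in (1.4, 9.4):
    print(f"coefficient {c}: relative error {relative_rounding_error(c):.1%}")
```

The absolute error is the same 0.4 in both cases; only the error
relative to the coefficient's size differs.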

Clearly, larger coefficients are coded more precisely than smaller
ones when using a linear quantizer.  Visually, it is obvious this is
somewhat wrong; just because a detail is 10 times higher-contrast in
block X than block Y doesn't mean block Y should be completely

One solution to this would be to use a nonlinear quantizer like in
AAC.  Obviously this is not possible without changing the H.264 spec,
and would also be much slower.  Turns out that it's actually not that
useful from my own testing -- this is because large coefficients in a
block tend to mask small coefficients.  If you have a few big ones,
there's no point in coding the small ones with tons of precision.
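
For readers unfamiliar with the idea, a rough sketch of a power-law
(nonlinear) quantizer follows.  It borrows only the 3/4 exponent that
AAC applies before rounding; scalefactors and AAC's rounding offset are
left out, and the function names are hypothetical:

```python
def powerlaw_quantize(coeff, exponent=0.75):
    """Compress the magnitude with a power law, then round to an integer level."""
    sign = -1 if coeff < 0 else 1
    return sign * round(abs(coeff) ** exponent)


def powerlaw_dequantize(level, exponent=0.75):
    """Invert the power law to reconstruct an approximate coefficient."""
    sign = -1 if level < 0 else 1
    return sign * abs(level) ** (1.0 / exponent)


# Same two coefficients as in the linear-quantizer examples.
for c in (1.4, 9.4):
    rec = powerlaw_dequantize(powerlaw_quantize(c))
    print(f"{c} -> {rec:.2f}, relative error {abs(c - rec) / c:.1%}")
```

The effective step size grows with the coefficient's magnitude, so
large coefficients are coded more coarsely than they would be with a
linear quantizer.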

Since the quantizer step size in H.264 is exponential in QP, adding a
multiple of log(variance) to QP is equivalent to multiplying the
quantizer step size by a power of the variance.  Therefore:

qp += log(variance)*C
qp = qp + log(variance) * C
e^qp = e^(qp + log(variance) * C)
qscale = qscale * e^(log(variance)*C)
qscale = qscale * variance^C
qscale *= variance^C
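
The chain of equalities above can be checked numerically.  This is a
sketch under the same simplification the derivation uses (qscale =
e^qp); the real H.264 mapping doubles the step size every 6 QP, which
only changes the base and rescales C.  The function names and sample
values are assumptions for illustration:

```python
import math


def qscale_from_qp(qp):
    # Simplified exponential model from the derivation: qscale = e^qp.
    return math.exp(qp)


def adjusted_qp(qp, variance, c):
    # The additive adjustment in the QP domain: qp += log(variance) * C.
    return qp + c * math.log(variance)


# Arbitrary illustrative values, not x264's actual constants.
qp, variance, c = 3.0, 50.0, 0.5

additive_in_qp = qscale_from_qp(adjusted_qp(qp, variance, c))
multiplicative_in_qscale = qscale_from_qp(qp) * variance ** c

print(additive_in_qp, multiplicative_in_qscale)
assert math.isclose(additive_in_qp, multiplicative_in_qscale)
```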

The constant C was decided through wild guessing and experimentation
and is based on nothing in particular.

As it happens, SSIM basically is PSNR weighted by variance in a
similar fashion, albeit less explicitly so.


P.S. This is all handwaving.
x264-devel mailing list
x264-devel <at> videolan.org