Heiko Strathmann | 24 Apr 12:44 2012

GSOC Introduction - Kernel based two-sample and independence tests

Dear all,

my name is Heiko Strathmann. You might know me from IRC (nickname:
karlnapf). During this summer, I will be working on implementing
kernel based statistical tests for SHOGUN.
I am currently preparing for exams in my M.Sc. Machine Learning course
at UCL, London. I am originally from Germany, where I did a Bachelor's
in Computer Science. Last year, I participated in GSoC and created
a framework for model selection for SHOGUN.

My project is mentored by Arthur Gretton, whom I met here at UCL. I
am doing research with him on a linear-time kernel two-sample test in
the context of my Master's project, which is how I came up with this
project idea for GSoC.

The project will make two methods available for SHOGUN:
1) A kernel two sample test, based on the Maximum Mean Discrepancy (MMD) [1].
2) A kernel based test for statistical independence based on the
Hilbert-Schmidt Independence Criterion (HSIC) [2].
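
For readers new to these two statistics, here is a minimal pure-Python
sketch of their biased quadratic-time estimates. It is not SHOGUN code:
the function names, the Gaussian kernel, and the bandwidth sigma are
assumptions chosen purely for illustration, and the HSIC estimate uses
the 1/m^2 normalisation.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel between two points given as sequences of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(xs, ys, kernel=gaussian_kernel):
    """Matrix of kernel values k(x, y) for all x in xs and y in ys."""
    return [[kernel(x, y) for y in ys] for x in xs]


def mean_of(matrix):
    """Mean over all entries of a non-empty rectangular matrix."""
    return sum(sum(row) for row in matrix) / (len(matrix) * len(matrix[0]))


def mmd2_biased(xs, ys, kernel=gaussian_kernel):
    """Biased quadratic-time estimate of the squared MMD between xs and ys."""
    return (mean_of(kernel_matrix(xs, xs, kernel))
            + mean_of(kernel_matrix(ys, ys, kernel))
            - 2.0 * mean_of(kernel_matrix(xs, ys, kernel)))


def center(K):
    """Return H K H, where H = I - (1/m) 1 1^T is the centering matrix."""
    m = len(K)
    row_means = [sum(row) / m for row in K]
    col_means = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_means) / m
    return [[K[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(m)] for i in range(m)]


def hsic_biased(xs, ys, kernel_x=gaussian_kernel, kernel_y=gaussian_kernel):
    """Biased HSIC estimate (1/m^2) trace(K H L H) for paired samples."""
    if len(xs) != len(ys):
        raise ValueError("HSIC needs paired samples of equal length")
    m = len(xs)
    Kc = center(kernel_matrix(xs, xs, kernel_x))
    L = kernel_matrix(ys, ys, kernel_y)
    # trace(K H L H) equals the entry-wise sum of (H K H) * L for symmetric L.
    return sum(Kc[i][j] * L[i][j] for i in range(m) for j in range(m)) / m ** 2
```

Both functions build full kernel matrices, so their cost grows
quadratically with the number of samples.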

Both tests will be implemented in multiple versions (linear and quadratic
time, multiple ways of computing the significance threshold).
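
To illustrate what a linear-time version and a significance threshold
can look like, here is a rough pure-Python sketch. The linear-time MMD
statistic averages a kernel expression over disjoint pairs of
consecutive samples. The permutation-based threshold is only one common
option and is my assumption here, not necessarily one of the variants
planned for SHOGUN. All names and default values are hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel between two points given as sequences of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_linear(xs, ys, kernel=gaussian_kernel):
    """Linear-time squared-MMD estimate over disjoint consecutive pairs."""
    num_pairs = min(len(xs), len(ys)) // 2
    if num_pairs == 0:
        raise ValueError("need at least two samples from each distribution")
    total = 0.0
    for i in range(num_pairs):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        total += (kernel(x1, x2) + kernel(y1, y2)
                  - kernel(x1, y2) - kernel(x2, y1))
    return total / num_pairs


def permutation_threshold(xs, ys, statistic, alpha=0.05,
                          num_permutations=200, seed=0):
    """Approximate (1 - alpha) null quantile by re-splitting pooled samples."""
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    null_values = []
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        null_values.append(statistic(pooled[:len(xs)], pooled[len(xs):]))
    null_values.sort()
    index = int(math.ceil((1.0 - alpha) * len(null_values))) - 1
    index = max(0, min(len(null_values) - 1, index))
    return null_values[index]
```

The null hypothesis that both samples come from the same distribution
would be rejected when the statistic computed on the original split
exceeds the returned threshold.
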
Before I start with the algorithms, I will create a framework for
statistical tests in SHOGUN to ensure extensibility. For both tests,
there are Matlab proof-of-concept implementations available (see

When the basic tests work, I will spend some time on kernel selection
and possibly feature selection through HSIC (BAHSIC).
Finally, I will spend time on producing real-world examples to
demonstrate the usefulness of these methods.

I am very much looking forward to this project and to getting to know
all of you. I am sure it will be a cool summer ;)
Many thanks to Sören, Sergey, all the mentors, and finally Google for
making all this possible!

See you around!


[1] http://people.kyb.tuebingen.mpg.de/arthur/mmd.htm
[2] http://people.kyb.tuebingen.mpg.de/arthur/indep.htm