Irina Preșa | 15 Jul 01:18 2011

[GSoC] Checkpointing vkernels status update


Below is the status for my GSoC project - Checkpointing Vkernels - and
a description of some of the encountered issues.

As the first part of the project, I implemented checkpoint and
restore of a multithreaded program.
Basically, on thaw, I recreate and reschedule all the threads. I also
restore thread-specific data such as TLS, the signal mask and registers.

As for the vkernel's checkpointing subsystem, I saved and restored the
network interfaces' state (currently only simulated with SIGCKPT and
SIGTHAW signal handlers). I am currently working on saving and
restoring the vmspaces, but encountered the following issues:

1. We shouldn't let a cothread do the checkpoint and restore job (it
doesn't know about the mycpu abstraction, so no thread is associated
with it). The checkpoint signals should therefore be masked when the
cothread is created.

2. If the vkernel's idle thread is running when SIGCKPT is caught,
I've decided to create and schedule another kernel thread to handle
the checkpoint job. (We shouldn't let the idle thread do the work,
especially since the checkpoint code might sleep.)

It seems that I can't recreate vmspace0 from a normal vkernel thread
(and we can't let a cothread handle the checkpoint). I guess that its
stack isn't mapped on restore, so it crashes immediately. As a
solution, I was thinking of recreating at least vmspace0 from kernel
space, and letting the vkernel restore its other vmspaces afterwards.

Currently, vmspace0 (the vkernel's memory) seems to be restored
correctly, since I can access the struct proc for the virtual
processes. I also use a syscall module to analyse the vmspaces from
the host kernel's side before checkpoint and after restore, and they
seem to be correctly remapped. However, after recreating all the
vmspaces, it crashes when coming back to the process that was running
before.

I am currently debugging this crash (maybe the vkernel-side vmspaces
aren't restored correctly). I'll also have to carefully analyse
synchronization/locking issues (for the moment I'm running the vkernel
with UP) and the restore of the console mode (at first glance it seems
that a call to vcons_set_mode might be enough, but I haven't dug very
deep yet).

A more detailed description (status) of what I did so far can be found here [1].