Paolo Bonzini | 23 Dec 09:36 2011

[libvirt] virtio-scsi support proposal, v2

Here is a revised version of the virtio-scsi proposal.  There's actually
not too much left intact from v1. :)

The main simplification is in how SCSI hosts can be addressed in a stable
way.

SCSI controller models

Existing controller models are "auto", "buslogic", "lsilogic", "lsias1068",
or "vmpvscsi".  The new controller model "virtio-scsi" is added.  The model
"lsilogic" is mapped to the existing "lsi" device in QEMU.

When PPC64 support is added, another controller model, "spapr-vscsi",
will be added as well.

Stable addressing for SCSI devices

The existing <address type='drive' ...> element will be extended as follows:

   <address type='drive' controller='...'
                        bus='...' target='...' unit='...'/>

where controller selects the qdev parent device, while bus/target/unit
are passed as qdev properties (the QEMU names are respectively channel,
scsi-id, lun).

Libvirt should check for the QEMU "" property.  If it
is unavailable, QEMU will only support channel=lun=0 and 0<=target<=7.
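
As an illustration only, the name mapping and the fallback limits above
could be sketched as follows.  The function name and the
has_extended_addressing flag are hypothetical; the flag stands in for the
result of the QEMU property check, and nothing here is actual libvirt code.

```python
def qdev_scsi_props(bus, target, unit, has_extended_addressing):
    """Map a libvirt <address type='drive'> triple to QEMU qdev properties.

    has_extended_addressing is a hypothetical flag: the caller sets it
    according to whether the QEMU property check succeeded.
    """
    if not has_extended_addressing:
        # Older QEMU: only channel=lun=0 and 0<=target<=7 are supported.
        if bus != 0 or unit != 0 or not 0 <= target <= 7:
            raise ValueError(
                "this QEMU supports only bus=0, unit=0 and 0<=target<=7")
    # libvirt name -> QEMU name: bus -> channel, target -> scsi-id, unit -> lun
    return {"channel": bus, "scsi-id": target, "lun": unit}
```

What exactly gets passed to an older QEMU in the fallback case is left out
of this sketch; it only validates the address and renames the fields.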

LUN passthrough: block devices

A SCSI block device from the host can be attached to a domain in two
ways: as an emulated LUN with SCSI commands implemented within QEMU,
or by passing SCSI commands down to the block device.  The former is
handled by the existing <disk type='file'>, <disk type='block'> and
<disk type='network'> XML syntax.  The latter is not yet supported.

On the QEMU side, LUN passthrough is implemented by one of the
scsi-generic and scsi-block devices.  scsi-generic requires a /dev/sg
device name, and can be applied to any device.  scsi-block is only
available in QEMU 1.0 or newer, requires a block device name, can be
applied only to block devices (sd/sr) and has better performance.

To implement LUN passthrough for block devices, libvirt will add a new
value for the device attribute, <disk device='lun'>.  When device='lun'
is passed, the device attribute is ignored.


  <disk type='block' device='lun'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/sda'/>
    <target dev='sda' bus='scsi'/>
    <address type='drive' controller='...'
                        bus='...' target='...' unit='...'/>
  </disk>

Also, virtio-blk handling will be enhanced to disable SG_IO passthrough
for <disk device='disk'>, and to enable it only for <disk device='lun'>.

(I am not sure whether the 'lun' value should be for the type or device
attribute.  Laine has a patch to implement it for virtio disks which
uses "type").

This syntax makes it clear which device is passed through, and at
the same time it makes it very easy to switch a disk between emulated
and passthrough modes.  Also, stable addressing for the source device
is provided by /dev/disk/by-id and /dev/disk/by-path.

Stable SCSI host addressing

SCSI host numbers in Linux are not stable.  An alternative, stable
addressing scheme is required to pass a whole host or target to a guest.

One place in which this could be supported is the SCSI volume pool:

      <pool type='scsi'>
          <adapter name='host0'/>

libvirt will deprecate the above form for the adapter element and
provide the following forms:

          <adapter name='scsi_host0'/>

          <adapter parent='pci_0000_00_1f_2' unique_id='1'/>

The existing form changes from host0 to scsi_host0, for
consistency with the naming that is used in nodedev.  The new
parent/unique_id addressing uses a parent PCI device and a unique
id that Linux provides in sysfs.  In order to determine the SCSI
host number, libvirt would scan all files matched by the glob pattern
/sys/bus/pci/devices/0000:00:1f.2/*/scsi_host/*/unique_id, looking for
the one that contains "1".

The unique_id can be omitted.  In this case, the pool will refer
to the host with the smallest unique_id under the given device.
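
The lookup described above could be sketched roughly like this.  It is not
libvirt code: the function name and the sysfs_root parameter (there only so
that the sketch can be tried against a fake tree) are made up, and the
conversion from the nodedev-style name pci_0000_00_1f_2 to the sysfs
directory 0000:00:1f.2 is an assumption based on the example.

```python
import glob
import os


def find_scsi_host(parent, unique_id=None, sysfs_root="/sys"):
    """Return the Linux SCSI host number for a parent PCI device.

    parent is a nodedev-style name such as 'pci_0000_00_1f_2'.
    If unique_id is None, the host with the smallest unique_id is used.
    """
    if not parent.startswith("pci_"):
        raise ValueError("only PCI parents are handled in this sketch")
    # pci_0000_00_1f_2 -> 0000:00:1f.2 (assumed naming scheme)
    domain, bus, slot, function = parent[len("pci_"):].split("_")
    pci_addr = f"{domain}:{bus}:{slot}.{function}"

    pattern = os.path.join(sysfs_root, "bus/pci/devices", pci_addr,
                           "*", "scsi_host", "*", "unique_id")
    candidates = []
    for path in glob.glob(pattern):
        with open(path) as f:
            uid = int(f.read().strip())
        # The directory that holds unique_id is named hostN.
        host_name = os.path.basename(os.path.dirname(path))
        candidates.append((uid, int(host_name[len("host"):])))

    if not candidates:
        raise LookupError(f"no SCSI host found under {pci_addr}")
    if unique_id is None:
        # unique_id omitted: pick the host with the smallest unique_id.
        return min(candidates)[1]
    for uid, host in candidates:
        if uid == int(unique_id):
            return host
    raise LookupError(f"no SCSI host with unique_id {unique_id}")
```

For example, find_scsi_host('pci_0000_00_1f_2', 1) returns the N of the
hostN whose unique_id file contains "1".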

Furthermore, a SCSI pool can be restricted to one target using an
additional element:

          <adapter name='scsi_host0'/>
          <address type='scsi' bus='0' target='0'/>

(bus defaults to 0, target is mandatory).

Generic passthrough

Generic device passthrough at the LUN, target or host level builds
on the extensions to SCSI addressing from the previous section.

Passing a single LUN extends the <hostdev> tag as follows:

  <hostdev type='scsi'>
      <source>
          <adapter name='scsi_host0'/>
          <address type='scsi' bus='0' target='0' unit='0'/>
      </source>
      <address type='scsi' controller='...'
                        bus='...' target='...' unit='...'/>
  </hostdev>

This will map to a -drive QEMU option referring to a scsi-generic
device, and a "-device scsi-generic" option referring to the drive.
libvirt can determine the /dev/sg file to use by reading the directory
/sys/bus/scsi/devices/target*/*/scsi_generic.  These devices might also
be shown in the nodedev tree, similar to block devices.
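
A rough sketch of both steps, finding the /dev/sg node in sysfs and building
the two QEMU options, might look like this.  The function names, the
sysfs_root parameter and the drive id "drive-hostdev0" are placeholders
invented for the example, and the guest-side address properties are left
out.

```python
import glob
import os


def find_sg_device(host, bus, target, unit, sysfs_root="/sys"):
    """Return the /dev/sgN path for the SCSI device host:bus:target:unit."""
    pattern = os.path.join(
        sysfs_root, "bus/scsi/devices",
        f"target{host}:{bus}:{target}",
        f"{host}:{bus}:{target}:{unit}",
        "scsi_generic", "sg*")
    matches = glob.glob(pattern)
    if not matches:
        raise LookupError("no scsi_generic node for this device")
    # The scsi_generic directory contains one entry named after the sg node.
    return "/dev/" + os.path.basename(matches[0])


def scsi_generic_args(sg_path, drive_id="drive-hostdev0"):
    """Build the -drive/-device pair for a scsi-generic passthrough LUN.

    drive_id is a hypothetical identifier chosen for this example.
    """
    return [
        "-drive", f"file={sg_path},if=none,id={drive_id}",
        "-device", f"scsi-generic,drive={drive_id}",
    ]
```

With host 0, bus 0, target 0 and unit 0, the first function looks under
/sys/bus/scsi/devices/target0:0:0/0:0:0:0/scsi_generic.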

Whenever a domain should receive all devices belonging to a SCSI host,
a similar <source> item should be included within the <controller
type='scsi'> element:

        <controller type='scsi' model='virtio-scsi'>
            <source>
                <adapter name='scsi_host0'/>
            </source>
        </controller>

In this case, libvirt should use scsi-block rather than scsi-generic
for block devices.

NPIV-based SCSI host passthrough

In NPIV, a virtual HBA is created using "virsh nodedev-create" and passed
to the guest.  Passing through a whole SCSI host is quite common when
using NPIV.  As a result, it is desirable to easily address virtual HBAs
both in SCSI storage pools and in <controller type='scsi'> elements.

Here are two proposals for how to refer to NPIV adapters:

1) Add persistent nodedevs via new commands nodedev-define,
nodedev-undefine and nodedev-start.  The persistent nodedevs have a name,
and this can be used simply with <adapter name='NAME'>.

2) Virtual adapters do have a stable address, namely their WWN.  This
can be used in a third <adapter> syntax:

      <adapter type='fc_host' wwpn='...' wwnn='...'/>
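
For the second proposal, resolving the WWNs to a SCSI host could work along
these lines.  This is just a sketch: it assumes the Linux FC transport class
exposes port_name and node_name files under /sys/class/fc_host/hostN, and
the function names and the sysfs_root parameter are invented for the
example.

```python
import glob
import os


def _normalize_wwn(wwn):
    """Lowercase a WWN and drop an optional 0x prefix and any colons."""
    wwn = wwn.strip().lower()
    if wwn.startswith("0x"):
        wwn = wwn[2:]
    return wwn.replace(":", "")


def find_fc_host(wwpn, wwnn, sysfs_root="/sys"):
    """Return the name (e.g. 'host5') of the FC host with the given WWNs."""
    pattern = os.path.join(sysfs_root, "class/fc_host", "host*")
    for host_dir in sorted(glob.glob(pattern)):
        with open(os.path.join(host_dir, "port_name")) as f:
            port = _normalize_wwn(f.read())
        with open(os.path.join(host_dir, "node_name")) as f:
            node = _normalize_wwn(f.read())
        # Both the port name and the node name have to match.
        if port == _normalize_wwn(wwpn) and node == _normalize_wwn(wwnn):
            return os.path.basename(host_dir)
    raise LookupError("no fc_host with the requested WWPN/WWNN")
```

The returned hostN name can then be used wherever a SCSI host number is
needed, without the domain or pool XML ever mentioning that number.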