Paolo Bonzini | 23 Dec 09:36 2011

[libvirt] virtio-scsi support proposal, v2

Here is a revised version of the virtio-scsi proposal.  There's actually
not too much left intact from v1. :)

The main simplification is in how SCSI hosts can be addressed in a stable
manner.

SCSI controller models
======================

Existing controller models are "auto", "buslogic", "lsilogic", "lsias1068",
or "vmpvscsi".  The new controller model "virtio-scsi" is added.  The model
"lsilogic" is mapped to the existing "lsi" device in QEMU.

When PPC64 support is added, another controller model, "spapr-vscsi",
will be added as well.

Stable addressing for SCSI devices
==================================

The existing <address type='drive' ...> element will be extended as follows:

   <address type='drive' controller='...'
                        bus='...' target='...' unit='...'/>

where controller selects the qdev parent device, while bus, target and
unit are passed as qdev properties (named channel, scsi-id and lun in
QEMU, respectively).

Libvirt should check whether QEMU provides the "scsi-disk.channel"
property.  If it is unavailable, QEMU only supports channel=lun=0 and
0<=target<=7.
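As an illustration only, here is a rough Python sketch of how the
address could be validated and turned into qdev properties.  The
function names and the "scsi0.0" bus id are hypothetical.  The
assumption that an older QEMU accepts only scsi-id (and no channel or
lun property at all) is inferred from the restriction stated above, not
checked against QEMU sources.

```python
def drive_address_to_qdev(bus, target, unit, has_channel_property):
    """Map libvirt bus/target/unit onto QEMU's channel/scsi-id/lun.

    has_channel_property: whether QEMU exposes "scsi-disk.channel".
    """
    if not has_channel_property:
        # Without the channel property only channel=lun=0, target 0..7 work.
        if bus != 0 or unit != 0:
            raise ValueError("this QEMU only supports bus=0 and unit=0")
        if not 0 <= target <= 7:
            raise ValueError("this QEMU only supports 0 <= target <= 7")
        # Assumption: only scsi-id can be set on such a QEMU.
        return {"scsi-id": target}
    return {"channel": bus, "scsi-id": target, "lun": unit}


def device_arg(driver, controller_bus, props):
    """Build a -device argument; controller_bus is a hypothetical qdev bus id."""
    opts = [driver, f"bus={controller_bus}"]
    opts += [f"{name}={value}" for name, value in props.items()]
    return ",".join(opts)
```

For example, an address with bus='0' target='3' unit='1' would become
"scsi-disk,bus=scsi0.0,channel=0,scsi-id=3,lun=1" on a QEMU that has the
channel property.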

LUN passthrough: block devices
==============================

A SCSI block device from the host can be attached to a domain in two
ways: as an emulated LUN with SCSI commands implemented within QEMU,
or by passing SCSI commands down to the block device.  The former is
handled by the existing <disk type='file'>, <disk type='block'> and
<disk type='network'> XML syntax.  The latter is not yet supported.

On the QEMU side, LUN passthrough is implemented by either the
scsi-generic or the scsi-block device.  scsi-generic requires a /dev/sg
device name and can be applied to any device.  scsi-block is only
available in QEMU 1.0 or newer, requires a block device name, and can
therefore be applied only to block devices (sd/sr), but it has better
performance.
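The choice between the two devices boils down to a small decision,
sketched here in Python.  The function name is hypothetical, and
comparing version numbers is a simplification: libvirt would more
likely probe QEMU for the scsi-block device than trust a version check.

```python
def choose_passthrough_device(qemu_version, is_block_device):
    """Pick the QEMU device for LUN passthrough.

    qemu_version: tuple such as (1, 0, 0).
    is_block_device: True if the host device is an sd/sr block device.
    """
    if is_block_device and qemu_version >= (1, 0):
        # Faster path, but needs QEMU 1.0+ and a block device node.
        return "scsi-block"
    # Works for any SCSI device, through its /dev/sg node.
    return "scsi-generic"
```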

To implement LUN passthrough for block devices, libvirt will add a new
value, 'lun', for the device attribute of <disk>.  When device='lun' is
passed, the device attribute is ignored.

Example:

  <disk type='block' device='lun'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/sda'/>
    <target dev='sda' bus='scsi'/>
    <address type='drive' controller='...'
                        bus='...' target='...' unit='...'/>
  </disk>

Also, virtio-blk handling will be changed to disable SG_IO passthrough
for <disk device='disk'> and enable it only for <disk device='lun'>.
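A minimal sketch of that rule in Python, assuming virtio-blk exposes a
boolean qdev property (called "scsi" here) that switches SG_IO
passthrough on or off.  The property name and the helper function are
assumptions for illustration; only device='disk' and device='lun' are
handled.

```python
def virtio_blk_device_arg(drive_id, disk_device):
    """Build a virtio-blk -device argument with SG_IO on only for 'lun'."""
    if disk_device not in ("disk", "lun"):
        raise ValueError(f"unsupported device type: {disk_device}")
    # Assumption: "scsi" is the virtio-blk property controlling SG_IO.
    sg_io = "on" if disk_device == "lun" else "off"
    return f"virtio-blk-pci,drive={drive_id},scsi={sg_io}"
```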

(I am not sure whether 'lun' should be a value of the type attribute or
of the device attribute.  Laine has a patch implementing it for virtio
disks, which uses "type".)

This syntax makes it clear which device is passed through, and at the
same time makes it very easy to switch a disk between emulated and
passthrough modes.  Also, /dev/disk/by-id and /dev/disk/by-path provide
stable addressing for the source device.

Stable SCSI host addressing
===========================

SCSI host numbers in Linux are not stable.  An alternative, stable
addressing scheme is required to pass a whole host or target to a guest.

One place in which this could be supported is the SCSI volume pool
syntax:

      <pool type='scsi'>
        <name>virtimages</name>
        <source>
          <adapter name='host0'/>
        </source>
        <target>
          <path>/dev/disk/by-id</path>
        </target>
      </pool>

libvirt will deprecate the above form for the adapter element and
provide the following forms:

          <adapter name='scsi_host0'/>

          <adapter parent='pci_0000_00_1f_2' unique_id='1'/>

The existing form changes from host0 to scsi_host0, for
consistency with the naming that is used in nodedev.  The new
parent/unique_id addressing uses a parent PCI device and a unique
id that Linux provides in sysfs.  In order to determine the SCSI
host number, libvirt would scan all files matched by the glob pattern
/sys/bus/pci/devices/0000:00:1f.2/*/scsi_host/*/unique_id, looking for
the one that contains "1".

The unique_id can be omitted.  In this case, the pool will refer
to the host with the smallest unique_id under the given device.
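The lookup described above could look roughly like this in Python.  It
is a sketch, not libvirt code: the function names are hypothetical, and
the sysfs root is a parameter only so the scan can be exercised on a
fake tree.  The nodedev-to-PCI conversion follows the correspondence
between pci_0000_00_1f_2 and 0000:00:1f.2 shown above.

```python
import glob
import os
import re


def nodedev_to_pci_addr(nodedev_name):
    """Convert a nodedev name like pci_0000_00_1f_2 to 0000:00:1f.2."""
    m = re.fullmatch(r"pci_([0-9a-f]{4})_([0-9a-f]{2})_([0-9a-f]{2})_([0-7])",
                     nodedev_name)
    if not m:
        raise ValueError(f"not a PCI nodedev name: {nodedev_name}")
    domain, bus, slot, func = m.groups()
    return f"{domain}:{bus}:{slot}.{func}"


def find_scsi_host(parent, unique_id=None, sysfs="/sys"):
    """Return the SCSI host (e.g. 'host0') under a PCI parent, or None.

    If unique_id is None, pick the host with the smallest unique_id.
    """
    pattern = os.path.join(sysfs, "bus/pci/devices", nodedev_to_pci_addr(parent),
                           "*", "scsi_host", "*", "unique_id")
    hosts = []
    for path in glob.glob(pattern):
        with open(path) as f:
            uid = int(f.read().strip())
        # .../scsi_host/hostN/unique_id -> hostN
        hosts.append((uid, os.path.basename(os.path.dirname(path))))
    if not hosts:
        return None
    if unique_id is None:
        return min(hosts)[1]
    for uid, host in hosts:
        if uid == unique_id:
            return host
    return None
```

The result "hostN" maps to the nodedev name "scsi_hostN" used by the
<adapter name='...'/> form.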

Furthermore, a SCSI pool can be restricted to one target using an
additional element:

        <source>
          <adapter name='scsi_host0'/>
          <address type='scsi' bus='0' target='0'/>
        </source>

(bus defaults to 0, target is mandatory).
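To make the defaulting rules concrete, here is a hypothetical Python
parser for the pool <source> element, using the standard
xml.etree.ElementTree module.  It returns the adapter attributes
unchanged (so both adapter forms are accepted) plus the optional
bus/target restriction.

```python
import xml.etree.ElementTree as ET


def parse_pool_source(xml_text):
    """Return (adapter_attrs, bus, target); bus/target are None for a whole host."""
    src = ET.fromstring(xml_text)
    adapter = src.find("adapter")
    if adapter is None:
        raise ValueError("<adapter/> is required")
    addr = src.find("address")
    if addr is None:
        return dict(adapter.attrib), None, None   # whole host
    if addr.get("type") != "scsi":
        raise ValueError("address type must be 'scsi'")
    if addr.get("target") is None:
        raise ValueError("target is mandatory")
    bus = int(addr.get("bus", "0"))               # bus defaults to 0
    return dict(adapter.attrib), bus, int(addr.get("target"))
```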

Generic passthrough
===================

Generic device passthrough at the LUN, target or host level builds
on the extensions to SCSI addressing from the previous section.

To pass a single LUN, the <hostdev> element is extended as follows:

  <hostdev type='scsi'>
    <source>
      <adapter name='scsi_host0'/>
      <address type='scsi' bus='0' target='0' unit='0'/>
    </source>
    <target>
      <address type='scsi' controller='...'
                        bus='...' target='...' unit='...'/>
    </target>
  </hostdev>

This will map to a QEMU -drive option referring to the host's SCSI
generic (/dev/sg) device, and a "-device scsi-generic" option referring
to that drive.  libvirt can determine which /dev/sg file to use by
reading the directory /sys/bus/scsi/devices/target*/*/scsi_generic.
These devices might also be shown in the nodedev tree, like block
devices are.
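A Python sketch of the /dev/sg lookup and of the resulting QEMU options
follows.  The host argument is the number taken from "scsi_hostN", the
drive id and controller bus id are hypothetical caller-chosen names,
and the sysfs root is a parameter only for testing.

```python
import glob
import os


def find_sg_device(host, bus, target, unit, sysfs="/sys"):
    """Return /dev/sgN for SCSI address host:bus:target:unit, or None."""
    hctl = f"{host}:{bus}:{target}:{unit}"
    pattern = os.path.join(sysfs, "bus/scsi/devices", "target*", hctl,
                           "scsi_generic", "*")
    matches = glob.glob(pattern)
    return os.path.join("/dev", os.path.basename(matches[0])) if matches else None


def scsi_generic_args(sg_path, drive_id, controller_bus):
    """Build the -drive/-device pair for a passed-through LUN."""
    return ["-drive", f"file={sg_path},if=none,id={drive_id}",
            "-device", f"scsi-generic,bus={controller_bus},drive={drive_id}"]
```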

Whenever a domain should receive all devices belonging to a SCSI host,
a similar <source> element should be included within the <controller
type='scsi'> element:

        <controller type='scsi' model='virtio-scsi'>
          <source>
            <adapter name='scsi_host0'/>
          </source>
        </controller>

In this case, libvirt should use scsi-block rather than scsi-generic
for block devices.
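Enumerating a whole host and choosing the QEMU device per LUN might
look like this in Python.  It is a hypothetical sketch: it relies on
sd/sr LUNs having a "block" subdirectory in sysfs, and the sysfs root
is a parameter only for testing.

```python
import glob
import os


def host_passthrough_devices(host, sysfs="/sys"):
    """List (hctl, qemu_device) for every LUN on SCSI host number `host`."""
    pattern = os.path.join(sysfs, "bus/scsi/devices", f"target{host}:*", f"{host}:*")
    result = []
    for lun_dir in sorted(glob.glob(pattern)):
        hctl = os.path.basename(lun_dir)
        if os.path.isdir(os.path.join(lun_dir, "block")):
            # Block devices (sd/sr) get the faster scsi-block device.
            result.append((hctl, "scsi-block"))
        elif os.path.isdir(os.path.join(lun_dir, "scsi_generic")):
            # Anything else (tapes, changers, ...) goes through /dev/sg.
            result.append((hctl, "scsi-generic"))
    return result
```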

NPIV-based SCSI host passthrough
================================

In NPIV, a virtual HBA is created using "virsh nodedev-create" and passed
to the guest.  Passing through a whole SCSI host is quite common when
using NPIV, so it is desirable to be able to address virtual HBAs
easily, both in SCSI storage pools and in <controller type='scsi'>
elements.

Here are two proposals for how to refer to NPIV adapters:

1) Add persistent nodedevs via the commands nodedev-define,
nodedev-undefine and nodedev-start.  Persistent nodedevs have a name,
which can be used directly with <adapter name='NAME'>.

2) Virtual adapters do have a stable address, namely their WWN.  This
can be used in a third <adapter> syntax:

    <source>
      <adapter type='fc_host' wwpn='...' wwnn='...'/>
    </source>
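Resolving the WWN form to a host could be done by scanning the
fc_host class in sysfs, which exposes port_name and node_name for each
FC host.  The Python sketch below is hypothetical: the function names
are illustrative, the sysfs root is a parameter for testing, and WWNs
are compared case-insensitively with an optional "0x" prefix.

```python
import glob
import os


def normalize_wwn(wwn):
    """Accept '0x2101001b32a90002' or '2101001b32a90002'; return bare lowercase hex."""
    wwn = wwn.strip().lower()
    return wwn[2:] if wwn.startswith("0x") else wwn


def find_fc_host(wwpn, wwnn, sysfs="/sys"):
    """Return the fc_host name (e.g. 'host5') with the given WWPN/WWNN, or None."""
    for host_dir in sorted(glob.glob(os.path.join(sysfs, "class/fc_host/host*"))):
        try:
            with open(os.path.join(host_dir, "port_name")) as f:
                port = normalize_wwn(f.read())
            with open(os.path.join(host_dir, "node_name")) as f:
                node = normalize_wwn(f.read())
        except OSError:
            continue  # skip hosts whose attributes cannot be read
        if port == normalize_wwn(wwpn) and node == normalize_wwn(wwnn):
            return os.path.basename(host_dir)
    return None
```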

