[PATCH 3/3] drivers: Add visorbus to the drivers/virt directory

Tue Aug 30 16:29:07 UTC 2016

> -----Original Message-----
> From: Greg KH [mailto:gregkh at linuxfoundation.org]
> Sent: Sunday, August 21, 2016 2:05 PM
> To: Kershner, David A <David.Kershner at unisys.com>
> 
> On Fri, Jun 10, 2016 at 11:23:56PM -0400, David Kershner wrote:
> > visorbus is currently located at drivers/staging/visorbus,
> > this patch moves it to drivers/virt.
> >
> > Signed-off-by: David Kershner <david.kershner at unisys.com>
> > Reviewed-by: Tim Sell <Timothy.Sell at unisys.com>
> 
> I picked one random file here, this last one, and found a number of
> "odd" things in it.
> 
> So, given that I can't really comment on the patch itself, I'm going to
> include the file below, quote it, and then provide some comments, ok?

There's obviously a lot of work we have to do here (and that work is
underway), but I'd like to focus this email on explaining a class of comments
you made about our error-handling.

You have correctly pointed out some areas where we are deficient in our
error-handling, but I think there are others where our error-handling is
indeed sufficient, but this is perhaps NOT obvious because we are sending
error codes to our partitioning firmware environment (via a message-passing
interface across a shared-memory interface known as the 'controlvm channel')
instead of specifying them in function return values.  This partitioning
firmware environment is NOT Linux, and is external to the Linux OS environment
where the visorbus driver is running.

It helps to understand the overall flow of visorchipset.c, which basically
fills the role of the server responsible for handling requests initiated by
the partitioning firmware environment:

* controlvm_periodic_work() / handle_command() reads a request from the
  partitioning firmware environment, which will be something like "create a
  virtual bus" or "create a virtual device".

* A case statement in handle_command() calls one of various functions to handle
  the request, which sends the success/failure result code of the operation
  back to the partitioning firmware environment using one of the paths:
  * bus_epilog() / bus_responder() / controlvm_respond()
  * device_epilog() / device_responder() / controlvm_respond()
  This success/failure result code is indicated via one of those
  CONTROLVM_RESP_* error mnemonics (which you clearly indicated your hatred
  for, and which we will change).

* The partitioning firmware environment will react accordingly to the error
  code returned.  E.g., if the request was to create the virtual bus containing
  the boot device, and the error code was CONTROLVM_RESP_ERROR_KMALLOC_FAILED,
  the partitioning firmware environment would fail the boot of the Linux guest
  and inform the user that the Linux guest boot failed due to an out-of-memory
  condition.

(This also should clarify why we use awkward mnemonics like 
CONTROLVM_RESP_ERROR_KMALLOC_FAILED instead of -ENOMEM -- it's because the
error codes are not defined by a Linux environment, because it isn't a Linux
environment that is actually making the controlvm requests.  But that's another
issue.)

Let me use visorchipset.c bus_create() as an example:

static void
bus_create(struct controlvm_message *inmsg)
{
         struct controlvm_message_packet *cmd = &inmsg->cmd;
         u32 bus_no = cmd->create_bus.bus_no;
         int rc = CONTROLVM_RESP_SUCCESS;
         struct visor_device *bus_info;
         struct visorchannel *visorchannel;

         bus_info = visorbus_get_device_by_id(bus_no, BUS_ROOT_DEVICE, NULL);
         if (bus_info && (bus_info->state.created == 1)) {
                 POSTCODE_LINUX_3(BUS_CREATE_FAILURE_PC, bus_no,
                                  POSTCODE_SEVERITY_ERR);
                 rc = -CONTROLVM_RESP_ERROR_ALREADY_DONE;
                 goto out_bus_epilog;
         }
         bus_info = kzalloc(sizeof(*bus_info), GFP_KERNEL);
         if (!bus_info) {
                 POSTCODE_LINUX_3(BUS_CREATE_FAILURE_PC, bus_no,
                                  POSTCODE_SEVERITY_ERR);
                 rc = -CONTROLVM_RESP_ERROR_KMALLOC_FAILED;
                 goto out_bus_epilog;
         }

         INIT_LIST_HEAD(&bus_info->list_all);
         bus_info->chipset_bus_no = bus_no;
         bus_info->chipset_dev_no = BUS_ROOT_DEVICE;

         POSTCODE_LINUX_3(BUS_CREATE_ENTRY_PC, bus_no, POSTCODE_SEVERITY_INFO);

         visorchannel = visorchannel_create(cmd->create_bus.channel_addr,
                                            cmd->create_bus.channel_bytes,
                                            GFP_KERNEL,
                                            cmd->create_bus.bus_data_type_uuid);

         if (!visorchannel) {
                 POSTCODE_LINUX_3(BUS_CREATE_FAILURE_PC, bus_no,
                                  POSTCODE_SEVERITY_ERR);
                 rc = -CONTROLVM_RESP_ERROR_KMALLOC_FAILED;
                 kfree(bus_info);
                 bus_info = NULL;
                 goto out_bus_epilog;
         }
         bus_info->visorchannel = visorchannel;
         if (uuid_le_cmp(cmd->create_bus.bus_inst_uuid, spar_siovm_uuid) == 0) {
                 dump_vhba_bus = bus_no;
                 save_crash_message(inmsg, CRASH_BUS);
         }

         POSTCODE_LINUX_3(BUS_CREATE_EXIT_PC, bus_no, POSTCODE_SEVERITY_INFO);

out_bus_epilog:
        bus_epilog(bus_info, CONTROLVM_BUS_CREATE, &inmsg->hdr,
                    rc, inmsg->hdr.flags.response_expected == 1);
}

This function obviously has NO return value, so one might assume that it
doesn't detect nor recover from errors.

But errors are indeed detected, and returned to the partitioning firmware
environment via the <rc> argument to bus_epilog().  Non-zero values for <rc>
will cause the partitioning firmware environment to most-likely stop the
boot-up of the Linux guest, as bus creation is typically critical.  The
error presented to the end-user will depend on the value of <rc>.

E.g., so even though no obvious error-recovery occurs above in-response to
kzalloc() failures, the fact that -CONTROLVM_RESP_ERROR_KMALLOC_FAILED is
provided to bus_epilog() is in-fact sufficient to report the error.

Is this making sense?
Can you suggest how we might modify our code to make this error-handling /
recovery strategy clearer?

Thanks.

Tim Sell