[PATCH v2 2/2] staging: greybus: loopback_test: Fix race preventing test completion

Tue Jan 3 09:33:17 UTC 2017

On 02/01/17 17:27, Axel Haslam wrote:
> Hi Bryan,
> 
> On Mon, Jan 2, 2017 at 3:32 PM, Johan Hovold <johan at kernel.org> wrote:
>> Adding Axel on CC.
>>
>> On Thu, Dec 22, 2016 at 12:37:29AM +0000, Bryan O'Donoghue wrote:
>>> commit 9250c0ee2626 ("greybus: Loopback_test: use poll instead of
>>> inotify") changes the flow of determining when to break out of a loop
>>> polling for loopback test completion.
>>>
>>> The clause is_complete() which determines if all tests are complete - as
>>> used is subject to a race condition where one of the tests has completed
>>> but at least one other test has not. On real hardware this typically
>>> doesn't present itself however in gbsim - which is a lot slower due in-part
>>> to a panopoly of printouts - we see that running a loopback test to more
>>> than one Interface in gbsim will fail in all instances printing out
>>> "Iteration count did not finish".
>>
> 
> Im not sure why you might be getting this error. I think the while(1) loop
> should have exited when each interface has sent its event, and all the
> tests are finished.

Alex,

Feliz año nuevo (Google translate skillz at work)

What's happening is we break the loop when the number_of_events ==
number of fd indices captured here @ t->poll_count

open_poll_files() {
    /* Set the poll count equal to the number of handles to track */
    t->poll_count = fds_idx;
}

wait_for_complete() {

    while(1) {
        for (i = 0; i < t->poll_count; i++) {
            if(happy)
                number_of_events++;
        }
        if (number_of_events == t->poll_count)
            break;
    }

    if (!is_complete(t)) {
        fprintf(stderr, "life stinks\n");
        return -BROKEN;
    }
}

is_complete() - then wants all iteration_counts to be equal to maximum
or the test fails.

OTOH if the loop doesn't break until all of the tests are complete we
never hit that problem. The responsibility should be on kernel-space to
ensure all tests complete anyway IMO.

---
bod