v5.1-rc1 binder_alloc_do_buffer_copy() BUG_ON triggered by selinux-testsuite

Wed Mar 20 15:54:01 UTC 2019

On Tue, Mar 19, 2019 at 8:04 PM Paul Moore <paul at paul-moore.com> wrote:
>
> On Tue, Mar 19, 2019 at 9:08 PM Todd Kjos <tkjos at google.com> wrote:
> > Paul,
> >
> > Looking at a snippet of the test output:
> >
> >     Service Provider read_consumed: 8
> >     Service Provider command: BR_NOOP
> >     Service Provider command: BR_FAILED_REPLY  <-------------- the txn
> > failed as expected.
> >     Manager read_consumed: 8
> >     Manager command: BR_NOOP
> >     Manager command: BR_TRANSACTION_COMPLETE
> >     not ok 3
> >         <--------- but for some reason didn't exit(103)
> >     #   Failed test at ./test line 56.
> >
> > It looks like the test script expects test_binder to fail with
> > exit(103) after processing the Server Provider commands. It's not
> > clear why it didn't, since the return of a BR_FAILED_REPLY for that
> > transaction should have executed this code (line 392 of
> > test_binder.c):
> >
> >     if (cmd == BR_FAILED_REPLY ||
> >         cmd == BR_DEAD_REPLY ||
> >         cmd == BR_DEAD_BINDER) {
> >         fprintf(stderr,
> >                   "Failed response from Service Provider using Managers FD\n");
> >         exit(103);
> >     }
> >
> > Could this be an issue with the test? At least it doesn't look like a
> > transaction is succeeding when it shouldn't.
>
> Hi Todd,
>
> Thanks for looking into this further.  Look a bit more at the test, it
> appears that the code above (line 392) only comes into play if the
> service provider is handling a BR_REPLY, but looking at the test
> output it doesn't appear that this is the case.
>
> # runcon -t test_binder_provider_no_im_t ./test_binder -v -r 2 provider
> Service Provider PID: 2095 Process context:
>        unconfined_u:unconfined_r:test_binder_provider_no_im_t:s0-s0:c0.c1023
> Service Provider sending transaction to Manager - ADD_TEST_SERVICE
> Service Provider read_consumed: 48
> Service Provider command: BR_NOOP
> Service Provider command: BR_INCREFS
> Service Provider command: BR_ACQUIRE
> Service Provider command: BR_TRANSACTION_COMPLETE
>
> Service Provider read_consumed: 8
> Service Provider command: BR_NOOP
> Service Provider command: BR_FAILED_REPLY

So, then it sounds like the test is not running properly and not
really testing what it intends to test (which I guess is consistent
with the fact that it triggered that BUG_ON -- the transaction is
malformed and failing early). It doesn't look like there is a
successful transaction that should have failed due to selinux policy
-- it looks like there is an invalid transaction that probably fails
earlier and doesn't return 103 (it probably returns 8 -- it would be
useful if the test script displayed the exit value that was detected
as a failure).

I don't think there is much I can do on this now given the apparent
flakiness, but I'm happy to help when there is a stable issue. I'll
look a little more at the test code to see if I can spot what is wrong
with the transaction.

Can I add a "Tested-by: Paul Moore <paul at paul-moore.com>" on my patch
submission to fix the BUG_ON (the exact patch you tested) ?

-Todd

>
> However, things get weird.  In the course of writing this email I
> booted into my 5.0.0-1.1.secnext kernel (which passed the binder test
> earlier) and now that kernel is failing in the same way (the test
> hasn't changed).  This test system is a Fedora Rawhide system which is
> updated before I make a test run and I'm wondering if there is some
> other userspace component which may be affecting this ... ?  I've now
> tried this on two different, current Rawhide VMs, hosted on two
> different systems and I'm seeing the same thing, so I don't think it's
> a *bad* system/VM?
>
> > On Tue, Mar 19, 2019 at 5:15 PM Todd Kjos <tkjos at google.com> wrote:
> > >
> > > [...]
> > >
> > > > > Is there a public dashboard where I can take a look at those binder failures?
> > > >
> > > > Not really.  I send test results to a not-yet-publicized mailing list,
> > > > but there is more detail in the GitHub issue below (my last comment
> > > > has the verbose test output):
> > > >
> > > > * https://github.com/SELinuxProject/selinux-kernel/issues/46
> > > >
> > >
> > > Ok, so it looks like something was introduced that causes binder to be
> > > too permissive (test 3 transaction succeeded when failure is
> > > expected). I don't know of any recent binder changes that could have
> > > caused that.
> > >
> > > It will take me a while to set up this test environment. Is this easy
> > > for you to run? Any chance of bisecting or at least trying a few
> > > versions to narrow it down? Here's a list of the recent patchset -- it
> > > would be useful to know which caused it (or if none of them did):
> > >
> > > 9e98c678c2d6a Linux 5.1-rc1
> > > ...
> > > 26528be6720bb binder: fix handling of misaligned binder object
> > > bde4a19fc04f5 binder: use userspace pointer as base of buffer space
> > > c41358a5f5217 binder: remove user_buffer_offset
> > > db6b0b810bf94 binder: avoid kernel vm_area for buffer fixups
> > > 7a67a39320dfb binder: add function to copy binder object from buffer
> > > 8ced0c6231ead binder: add functions to copy to/from binder buffers
> > > 1a7c3d9bb7a92 binder: create userspace-to-binder-buffer copy function
> > > ...
> > > 1c163f4c7b3f6 (tag: v5.0) Linux 5.0
> > >
> > > Thanks,
> > >
> > > -Todd
>
>
>
> --
> paul moore
> www.paul-moore.com