[PATCH] android: binder: Disable preemption while holding the global binder lock

Tue Sep 13 03:44:09 UTC 2016

On Sat, Sep 10, 2016 at 10:28 AM, Greg Kroah-Hartman
<gregkh at linuxfoundation.org> wrote:
> On Sat, Sep 10, 2016 at 06:37:29PM +0200, Thomas Gleixner wrote:
>> On Sat, 10 Sep 2016, Peter Zijlstra wrote:
>>
>> > On Sat, Sep 10, 2016 at 09:16:59AM -0700, Christoph Hellwig wrote:
>> > > On Thu, Sep 08, 2016 at 09:12:50AM -0700, Todd Kjos wrote:
>> > > > In Android systems, the display pipeline relies on low
>> > > > latency binder transactions and is therefore sensitive to
>> > > > delays caused by contention for the global binder lock.
>> > > > Jank is significantly reduced by disabling preemption
>> > > > while the global binder lock is held.
>> > >
>> > > That's now how preempt_disable is supposed to use.  It is for critical
>> >
>> > not, that's supposed to be _not_. Just to be absolutely clear, this is
>> > NOT how you're supposed to use preempt_disable().
>> >
>> > > sections that use per-cpu or similar resources.
>> > >
>> > > >
>> > > > Originally-from: Riley Andrews <riandrews at google.com>
>> > > > Signed-off-by: Todd Kjos <tkjos at google.com>
>> >
>> > > > @@ -389,7 +390,11 @@ static int task_get_unused_fd_flags(struct
>> > > > binder_proc *proc, int flags)
>> > > >   rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
>> > > >   unlock_task_sighand(proc->tsk, &irqs);
>> > > >
>> > > > - return __alloc_fd(files, 0, rlim_cur, flags);
>> > > > + preempt_enable_no_resched();
>> > > > + ret = __alloc_fd(files, 0, rlim_cur, flags);
>> > > > + preempt_disable();
>> >
>> > And the fact that people want to use preempt_enable_no_resched() shows
>> > that they're absolutely clueless.
>> >
>> > This is so broken it's not funny.
>> >
>> > NAK NAK NAK
>>
>> Indeed. Sprinkling random preempt_enable/disable() pairs all over the place
>> documents clearly that this is tinkering and not proper software
>> engineering.
>
> I have pointed out in the other thread for this patch (the one that had
> a patch that could be applied) that the single lock in the binder code
> is the main problem here, it should be solved instead of this messing
> around with priorities.
>

While removing the single lock in the binder driver would help reduce
the problem that this patch tries to work around, it would not fix it.
The largest problems occur when a very low priority thread gets
preempted while holding the lock. When a high priority thread then
needs the same lock it can't get it. Changing the driver to use more
fine-grained locking would reduce the set of threads that can trigger
this problem, but there are processes that receive work from both high
and low priority threads and could still end up in the same situation.

A previous attempt to fix this problem changed the lock to use
rt_mutex instead of mutex, but this apparently did not work as well as
this patch. I believe the added overhead was noticeable, and it did
not work when the preempted thread was in a different cgroup (I don't
know if this is still the case).
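
To make the scenario concrete, here is a rough userspace model of it. This
is only a sketch and not code from the binder driver: the function name,
the struct, the tick-based single-CPU scheduler and the medium-priority
CPU-bound task are all assumptions added for illustration. It compares how
long the high priority thread waits for the lock with and without the
priority inheritance that rt_mutex provides.

```c
#include <stdbool.h>

/* Hypothetical model: one CPU, strict-priority scheduling, one lock. */
struct sim_task {
	int prio;      /* larger value = higher priority */
	int work_left; /* ticks of CPU work remaining */
};

/*
 * Returns the tick at which the high priority task acquires the lock.
 * "low" starts out holding the lock and needs low_cs ticks of CPU time
 * before it releases it. "mid" is CPU-bound for mid_work ticks and never
 * touches the lock. The high priority task is blocked on the lock the
 * whole time, so it never competes for the CPU in this model.
 */
int ticks_until_high_gets_lock(int low_cs, int mid_work, bool inherit)
{
	struct sim_task low = { .prio = 1, .work_left = low_cs };
	struct sim_task mid = { .prio = 2, .work_left = mid_work };
	const int high_prio = 3;
	int now = 0;

	/* With inheritance the lock holder runs at the waiter's priority. */
	if (inherit)
		low.prio = high_prio;

	while (low.work_left > 0) {
		/* Run whichever runnable task has the higher priority. */
		if (mid.work_left > 0 && mid.prio > low.prio)
			mid.work_left--;
		else
			low.work_left--;
		now++;
	}
	return now;
}
```

With low_cs = 2 and mid_work = 10, the model gives a wait of 12 ticks
without inheritance and 2 ticks with it. It does not model the overhead or
the cgroup behaviour mentioned above.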

It would be useful to have a generic solution to this problem.

> So don't worry, I'm not taking this change :)
>
> thanks,
>
> greg k-h

-- 
Arve Hjønnevåg