[RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

peter enderborg peter.enderborg at sonymobile.com
Mon Apr 3 13:47:59 UTC 2017


On 03/31/2017 07:53 PM, Douglas Anderson wrote:
> Sometimes when we're out of memory the OOM killer decides to kill a
> process that's in binder_thread_read().  If we happen to be waiting
> for work we'll get the kill signal and wake up.  That's good.  ...but
> then we try to grab the binder lock before we return.  That's bad.
>
> The problem is that someone else might be holding the one true global
> binder lock.  If that one other process is blocked then we can't
> finish exiting.  In the worst case, the other process might be blocked
> waiting for memory.  In that case we'll have a really hard time
> exiting.
>
> On older kernels that don't have the OOM reaper (or something
> similar), like kernel 4.4, this is a really big problem and we end up
> with a simple deadlock because:
> * Once we pick a process to OOM kill we won't pick another--we first
>   wait for the process we picked to die.  The reasoning is that we've
>   given the doomed process access to special memory pools so it can
>   quit quickly and we don't have special pool memory to go around.
> * We don't have any type of "special access donation" that would give
>   the mutex holder our special access.
>
> On kernel 4.4 w/ binder patches, we easily see this happen:
>
> We just attempted this OOM kill:
>   Killed process 4132 (Binder_1) total-vm:949904kB, anon-rss:4kB, file-rss:0kB
>
> The doomed task:
>   Stack traceback for pid 4132
>        4132     3380  0    0   D  Binder_1
>   Call trace:
>    __switch_to+0x9c/0xa8
>    __schedule+0x504/0x740
>    schedule+0x88/0xa8
>    schedule_preempt_disabled+0x28/0x44
>    __mutex_lock_slowpath+0xf8/0x1a4
>    mutex_lock+0x4c/0x68
>    binder_thread_read+0x68c/0x11bc
>    binder_ioctl_write_read.constprop.46+0x1e8/0x318
>    binder_ioctl+0x370/0x778
>    compat_SyS_ioctl+0x134/0x10ac
>    el0_svc_naked+0x24/0x28
>
> The binder lock holder:
>        4001     3380  0    4   R    Binder_7
>   Call trace:
>    __switch_to+0x9c/0xa8
>    __schedule+0x504/0x740
>    preempt_schedule_common+0x28/0x48
>    preempt_schedule+0x24/0x2c
>    _raw_spin_unlock+0x44/0x50
>    list_lru_count_one+0x40/0x50
>    super_cache_count+0x68/0xa0
>    shrink_slab.part.54+0xfc/0x4a4
>    shrink_zone+0xa8/0x198
>    try_to_free_pages+0x408/0x590
>    __alloc_pages_nodemask+0x60c/0x95c
>    __read_swap_cache_async+0x98/0x214
>    read_swap_cache_async+0x48/0x7c
>    swapin_readahead+0x188/0x1b8
>    handle_mm_fault+0xbf8/0x1050
>    do_page_fault+0x140/0x2dc
>    do_mem_abort+0x64/0xd8
>   Exception stack: < ... cut ... >
>    el1_da+0x18/0x78
>    binder_ioctl+0x370/0x778
>    compat_SyS_ioctl+0x134/0x10ac
>    el0_svc_naked+0x24/0x28
>
> There's really not a lot of reason to grab the binder lock when we're
> just going to exit anyway.  Add a little bit of complexity to the code
> to allow binder_thread_read() to sometimes return without the lock
> held.
>
> NOTE: to do this safely we need to make sure that we can safely do
> these things _without_ the global binder lock:
> * Muck with proc->ready_threads
> * Clear a bitfield in thread->looper
>
> It appears that those two operations don't need to be done together
> and it's OK to have different locking solutions for the two.  Thus:
>
> 1. We change thread->looper to atomic_t and use atomic ops to muck
>    with it.  This means we're not clobbering someone else's work with
>    our read/modify/write.
>
>    Note that I haven't confirmed that every modify of "thread->looper"
>    can be done without the binder mutex or without some
>    coarser-grained lock.  ...but clearing the
>    BINDER_LOOPER_STATE_WAITING bit should be fine.  The only place
>    looking at it is binder_deferred_flush().  Also: while erroring out
>    we also may clear BINDER_LOOPER_STATE_NEED_RETURN.  This looks to
>    be OK, though the code isn't trivial to follow.
>
> 2. We add a new fine-grained mutex (per "proc" structure) to guard all
>    of the "_threads" counters.  Technically the only value we're
>    modifying is "proc->ready_threads" and we're just decrementing it
>    (so maybe we could use atomic_t here, too?).  ...but it appears
>    that all of the "_threads" work together so protecting them
>    together seems to make the most sense.
>
>    Note that to avoid deadlock:
>    * We never first grab the fine grained mutex and _then_ the binder
>      mutex.
>    * We always grab the fine grained mutex for short periods and never
>      do anything blocking while holding it.
>
> To sum it all up:
>
> 1. This patch does improve the chances that we will be able to kill a
>    task quickly when we want to.  That means this patch has some
>    merit.
> 2. Though this patch has merit, it isn't actually all that
>    critical because:
>    2a) On newer kernels the OOM reaper should keep us from getting
>        deadlocked.
>    2b) Even on old kernels there are other deadlock cases we hit even
>        if we fix binder in this way.
>
> Signed-off-by: Douglas Anderson <dianders at chromium.org>
> ---
> It's unclear to me if we really want this patch since, as per the
> description, it's not all that critical.  I also don't personally have
> enough context with Binder to prove that every change it makes is
> guaranteed not to screw something up.
>
> I post it in case someone is interested or someone who knows the code
> well can say "yes, this is right and good".  :)
>
> This patch was tested against the Chrome OS 4.4 kernel.  It was only
> compile-tested against upstream since I don't have binder up and
> running there.
>
>  drivers/android/binder.c | 116 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 81 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index aae4d8d4be36..220b74b7100d 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -18,6 +18,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <asm/cacheflush.h>
> +#include <linux/atomic.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -347,10 +348,13 @@ struct binder_proc {
>  	wait_queue_head_t wait;
>  	struct binder_stats stats;
>  	struct list_head delivered_death;
> +
>  	int max_threads;
>  	int requested_threads;
>  	int requested_threads_started;
>  	int ready_threads;
> +	struct mutex threads_lock;
> +
>  	long default_priority;
>  	struct dentry *debugfs_entry;
>  	struct binder_context *context;
> @@ -369,7 +373,7 @@ struct binder_thread {
>  	struct binder_proc *proc;
>  	struct rb_node rb_node;
>  	int pid;
> -	int looper;
> +	atomic_t looper;
>  	struct binder_transaction *transaction_stack;
>  	struct list_head todo;
>  	uint32_t return_error; /* Write failed, return error code in read buf */
> @@ -458,6 +462,15 @@ static inline void binder_lock(const char *tag)
>  	trace_binder_locked(tag);
>  }
>  
> +static inline int __must_check binder_lock_interruptible(const char *tag)
> +{
> +	trace_binder_lock(tag);
> +	if (mutex_lock_interruptible(&binder_main_lock))
> +		return -ERESTARTSYS;
> +	trace_binder_locked(tag);
> +	return 0;
> +}
> +
>  static inline void binder_unlock(const char *tag)
>  {
>  	trace_binder_unlock(tag);
> @@ -2450,39 +2463,41 @@ static int binder_thread_write(struct binder_proc *proc,
>  		}
>  
>  		case BC_REGISTER_LOOPER:
> +			mutex_lock(&proc->threads_lock);
>  			binder_debug(BINDER_DEBUG_THREADS,
>  				     "%d:%d BC_REGISTER_LOOPER\n",
>  				     proc->pid, thread->pid);
> -			if (thread->looper & BINDER_LOOPER_STATE_ENTERED) {
> -				thread->looper |= BINDER_LOOPER_STATE_INVALID;
> +			if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_ENTERED) {
> +				atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
>  				binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called after BC_ENTER_LOOPER\n",
>  					proc->pid, thread->pid);
>  			} else if (proc->requested_threads == 0) {
> -				thread->looper |= BINDER_LOOPER_STATE_INVALID;
> +				atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
>  				binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called without request\n",
>  					proc->pid, thread->pid);
>  			} else {
>  				proc->requested_threads--;
>  				proc->requested_threads_started++;
>  			}
> -			thread->looper |= BINDER_LOOPER_STATE_REGISTERED;
> +			atomic_or(BINDER_LOOPER_STATE_REGISTERED, &thread->looper);
> +			mutex_unlock(&proc->threads_lock);
>  			break;
>  		case BC_ENTER_LOOPER:
>  			binder_debug(BINDER_DEBUG_THREADS,
>  				     "%d:%d BC_ENTER_LOOPER\n",
>  				     proc->pid, thread->pid);
> -			if (thread->looper & BINDER_LOOPER_STATE_REGISTERED) {
> -				thread->looper |= BINDER_LOOPER_STATE_INVALID;
> +			if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_REGISTERED) {
> +				atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
>  				binder_user_error("%d:%d ERROR: BC_ENTER_LOOPER called after BC_REGISTER_LOOPER\n",
>  					proc->pid, thread->pid);
>  			}
> -			thread->looper |= BINDER_LOOPER_STATE_ENTERED;
> +			atomic_or(BINDER_LOOPER_STATE_ENTERED, &thread->looper);
>  			break;
>  		case BC_EXIT_LOOPER:
>  			binder_debug(BINDER_DEBUG_THREADS,
>  				     "%d:%d BC_EXIT_LOOPER\n",
>  				     proc->pid, thread->pid);
> -			thread->looper |= BINDER_LOOPER_STATE_EXITED;
> +			atomic_or(BINDER_LOOPER_STATE_EXITED, &thread->looper);
>  			break;
>  
>  		case BC_REQUEST_DEATH_NOTIFICATION:
> @@ -2538,7 +2553,7 @@ static int binder_thread_write(struct binder_proc *proc,
>  				ref->death = death;
>  				if (ref->node->proc == NULL) {
>  					ref->death->work.type = BINDER_WORK_DEAD_BINDER;
> -					if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> +					if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
>  						list_add_tail(&ref->death->work.entry, &thread->todo);
>  					} else {
>  						list_add_tail(&ref->death->work.entry, &proc->todo);
> @@ -2562,7 +2577,7 @@ static int binder_thread_write(struct binder_proc *proc,
>  				ref->death = NULL;
>  				if (list_empty(&death->work.entry)) {
>  					death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
> -					if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> +					if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
>  						list_add_tail(&death->work.entry, &thread->todo);
>  					} else {
>  						list_add_tail(&death->work.entry, &proc->todo);
> @@ -2604,7 +2619,7 @@ static int binder_thread_write(struct binder_proc *proc,
>  			list_del_init(&death->work.entry);
>  			if (death->work.type == BINDER_WORK_DEAD_BINDER_AND_CLEAR) {
>  				death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
> -				if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> +				if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
>  					list_add_tail(&death->work.entry, &thread->todo);
>  				} else {
>  					list_add_tail(&death->work.entry, &proc->todo);
> @@ -2638,19 +2653,20 @@ static int binder_has_proc_work(struct binder_proc *proc,
>  				struct binder_thread *thread)
>  {
>  	return !list_empty(&proc->todo) ||
> -		(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
> +		(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
>  }
>  
>  static int binder_has_thread_work(struct binder_thread *thread)
>  {
>  	return !list_empty(&thread->todo) || thread->return_error != BR_OK ||
> -		(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
> +		(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
>  }
>  
>  static int binder_thread_read(struct binder_proc *proc,
>  			      struct binder_thread *thread,
>  			      binder_uintptr_t binder_buffer, size_t size,
> -			      binder_size_t *consumed, int non_block)
> +			      binder_size_t *consumed, int non_block,
> +			      bool *locked)
>  {
>  	void __user *buffer = (void __user *)(uintptr_t)binder_buffer;
>  	void __user *ptr = buffer + *consumed;
> @@ -2687,21 +2703,24 @@ static int binder_thread_read(struct binder_proc *proc,
>  		goto done;
>  	}
>  
> -
> -	thread->looper |= BINDER_LOOPER_STATE_WAITING;
> +	atomic_or(BINDER_LOOPER_STATE_WAITING, &thread->looper);
> +	mutex_lock(&proc->threads_lock);
>  	if (wait_for_proc_work)
>  		proc->ready_threads++;
> +	mutex_unlock(&proc->threads_lock);
>  
>  	binder_unlock(__func__);
> +	*locked = false;
>  
>  	trace_binder_wait_for_work(wait_for_proc_work,
>  				   !!thread->transaction_stack,
>  				   !list_empty(&thread->todo));
>  	if (wait_for_proc_work) {
> -		if (!(thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
> -					BINDER_LOOPER_STATE_ENTERED))) {
> +		if (!(atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
> +						      BINDER_LOOPER_STATE_ENTERED))) {
>  			binder_user_error("%d:%d ERROR: Thread waiting for process work before calling BC_REGISTER_LOOPER or BC_ENTER_LOOPER (state %x)\n",
> -				proc->pid, thread->pid, thread->looper);
> +				proc->pid, thread->pid,
> +				atomic_read(&thread->looper));
>  			wait_event_interruptible(binder_user_error_wait,
>  						 binder_stop_on_user_error < 2);
>  		}
> @@ -2719,14 +2738,23 @@ static int binder_thread_read(struct binder_proc *proc,
>  			ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread));
>  	}
>  
> -	binder_lock(__func__);
> -
> +	/*
> +	 * Update these _without_ grabbing the binder lock since we might be
> +	 * about to return an error.
> +	 */
> +	mutex_lock(&proc->threads_lock);
>  	if (wait_for_proc_work)
>  		proc->ready_threads--;
> -	thread->looper &= ~BINDER_LOOPER_STATE_WAITING;
> +	mutex_unlock(&proc->threads_lock);
> +	atomic_and(~BINDER_LOOPER_STATE_WAITING, &thread->looper);
> +
> +	if (ret)
> +		return ret;
>  
> +	ret = binder_lock_interruptible(__func__);
>  	if (ret)
>  		return ret;
> +	*locked = true;
>  
>  	while (1) {
>  		uint32_t cmd;
> @@ -2743,7 +2771,7 @@ static int binder_thread_read(struct binder_proc *proc,
>  		} else {
>  			/* no data added */
>  			if (ptr - buffer == 4 &&
> -			    !(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN))
> +			    !(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN))
>  				goto retry;
>  			break;
>  		}
> @@ -2957,19 +2985,23 @@ static int binder_thread_read(struct binder_proc *proc,
>  done:
>  
>  	*consumed = ptr - buffer;
> +	mutex_lock(&proc->threads_lock);
>  	if (proc->requested_threads + proc->ready_threads == 0 &&
>  	    proc->requested_threads_started < proc->max_threads &&
> -	    (thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
> +	    (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
>  	     BINDER_LOOPER_STATE_ENTERED)) /* the user-space code fails to */
>  	     /*spawn a new thread if we leave this out */) {
>  		proc->requested_threads++;
>  		binder_debug(BINDER_DEBUG_THREADS,
>  			     "%d:%d BR_SPAWN_LOOPER\n",
>  			     proc->pid, thread->pid);
> -		if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer))
> +		if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer)) {
> +			mutex_unlock(&proc->threads_lock);
>  			return -EFAULT;
> +		}
>  		binder_stat_br(proc, thread, BR_SPAWN_LOOPER);
>  	}
> +	mutex_unlock(&proc->threads_lock);
>  	return 0;
>  }
>  
> @@ -3051,7 +3083,7 @@ static struct binder_thread *binder_get_thread(struct binder_proc *proc)
>  		INIT_LIST_HEAD(&thread->todo);
>  		rb_link_node(&thread->rb_node, parent, p);
>  		rb_insert_color(&thread->rb_node, &proc->threads);
> -		thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
> +		atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
>  		thread->return_error = BR_OK;
>  		thread->return_error2 = BR_OK;
>  	}
> @@ -3133,7 +3165,8 @@ static unsigned int binder_poll(struct file *filp,
>  
>  static int binder_ioctl_write_read(struct file *filp,
>  				unsigned int cmd, unsigned long arg,
> -				struct binder_thread *thread)
> +				struct binder_thread *thread,
> +				bool *locked)
>  {
>  	int ret = 0;
>  	struct binder_proc *proc = filp->private_data;
> @@ -3172,7 +3205,7 @@ static int binder_ioctl_write_read(struct file *filp,
>  		ret = binder_thread_read(proc, thread, bwr.read_buffer,
>  					 bwr.read_size,
>  					 &bwr.read_consumed,
> -					 filp->f_flags & O_NONBLOCK);
> +					 filp->f_flags & O_NONBLOCK, locked);
>  		trace_binder_read_done(ret);
>  		if (!list_empty(&proc->todo))
>  			wake_up_interruptible(&proc->wait);
> @@ -3243,6 +3276,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  	struct binder_thread *thread;
>  	unsigned int size = _IOC_SIZE(cmd);
>  	void __user *ubuf = (void __user *)arg;
> +	bool locked;
>  
>  	/*pr_info("binder_ioctl: %d:%d %x %lx\n",
>  			proc->pid, current->pid, cmd, arg);*/
> @@ -3258,6 +3292,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		goto err_unlocked;
>  
>  	binder_lock(__func__);
> +	locked = true;
>  	thread = binder_get_thread(proc);
>  	if (thread == NULL) {
>  		ret = -ENOMEM;
> @@ -3266,15 +3301,18 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  
>  	switch (cmd) {
>  	case BINDER_WRITE_READ:
> -		ret = binder_ioctl_write_read(filp, cmd, arg, thread);
> +		ret = binder_ioctl_write_read(filp, cmd, arg, thread, &locked);
>  		if (ret)
>  			goto err;
>  		break;
>  	case BINDER_SET_MAX_THREADS:
> +		mutex_lock(&proc->threads_lock);
>  		if (copy_from_user(&proc->max_threads, ubuf, sizeof(proc->max_threads))) {
>  			ret = -EINVAL;
> +			mutex_unlock(&proc->threads_lock);
>  			goto err;
>  		}
> +		mutex_unlock(&proc->threads_lock);
>  		break;
>  	case BINDER_SET_CONTEXT_MGR:
>  		ret = binder_ioctl_set_ctx_mgr(filp);
> @@ -3308,8 +3346,9 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  	ret = 0;
>  err:
>  	if (thread)
> -		thread->looper &= ~BINDER_LOOPER_STATE_NEED_RETURN;
> -	binder_unlock(__func__);
> +		atomic_and(~BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> +	if (locked)
> +		binder_unlock(__func__);
>  	wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2);
>  	if (ret && ret != -ERESTARTSYS)
>  		pr_info("%d:%d ioctl %x %lx returned %d\n", proc->pid, current->pid, cmd, arg, ret);
> @@ -3473,6 +3512,7 @@ static int binder_open(struct inode *nodp, struct file *filp)
>  	binder_dev = container_of(filp->private_data, struct binder_device,
>  				  miscdev);
>  	proc->context = &binder_dev->context;
> +	mutex_init(&proc->threads_lock);
>  
>  	binder_lock(__func__);
>  
> @@ -3521,8 +3561,9 @@ static void binder_deferred_flush(struct binder_proc *proc)
>  	for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) {
>  		struct binder_thread *thread = rb_entry(n, struct binder_thread, rb_node);
>  
> -		thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
> -		if (thread->looper & BINDER_LOOPER_STATE_WAITING) {
> +		atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> +		if (atomic_read(&thread->looper) &
> +		    BINDER_LOOPER_STATE_WAITING) {
>  			wake_up_interruptible(&thread->wait);
>  			wake_count++;
>  		}
> @@ -3827,7 +3868,8 @@ static void print_binder_thread(struct seq_file *m,
>  	size_t start_pos = m->count;
>  	size_t header_pos;
>  
> -	seq_printf(m, "  thread %d: l %02x\n", thread->pid, thread->looper);
> +	seq_printf(m, "  thread %d: l %02x\n", thread->pid,
> +		   atomic_read(&thread->looper));
>  	header_pos = m->count;
>  	t = thread->transaction_stack;
>  	while (t) {
> @@ -4019,6 +4061,8 @@ static void print_binder_proc_stats(struct seq_file *m,
>  	struct rb_node *n;
>  	int count, strong, weak;
>  
> +	mutex_lock(&proc->threads_lock);
> +
>  	seq_printf(m, "proc %d\n", proc->pid);
>  	seq_printf(m, "context %s\n", proc->context->name);
>  	count = 0;
> @@ -4064,6 +4108,8 @@ static void print_binder_proc_stats(struct seq_file *m,
>  	seq_printf(m, "  pending transactions: %d\n", count);
>  
>  	print_binder_stats(m, "  ", &proc->stats);
> +
> +	mutex_unlock(&proc->threads_lock);
>  }
>  
>  

I think the real problem is that we get stuck in kernel space waiting for a mutex. It should use mutex_lock_interruptible().
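A minimal kernel-style sketch of that pattern, modeled on the binder_lock_interruptible() helper the patch adds (not standalone code; for illustration only):

```c
/* Taking the lock interruptibly lets a task with a pending signal
 * (e.g. an OOM kill) bail out instead of sleeping uninterruptibly
 * on the contended mutex. */
static int example_locked_op(void)
{
	if (mutex_lock_interruptible(&binder_main_lock))
		return -ERESTARTSYS;	/* signal pending; caller unwinds */
	/* ... short critical section ... */
	mutex_unlock(&binder_main_lock);
	return 0;
}
```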
