[PATCH v3 00/11] Introduce Simple atomic counters

Shuah Khan skhan at linuxfoundation.org
Fri Oct 16 21:56:53 UTC 2020


On 10/14/20 5:31 PM, Kees Cook wrote:
> On Wed, Oct 14, 2020 at 11:17:20AM +0200, Peter Zijlstra wrote:
>> On Tue, Oct 13, 2020 at 08:12:20PM -0600, Shuah Khan wrote:
>>
>>> They don't add any new behavior, As Kees mentioned they do give us a
>>> way to clearly differentiate atomic usages that can wrap.
>>
>> No it doesn't! atomic_t can wrap, this thing can wrap, no distinction.
>>
>> All it does is fragment the API and sow confusion. FOR NO BENEFIT.
> 
> I really don't see it this way. It's a distinct subset of the atomic_t
> API. The trouble that has existed here has been with an atomic_t being
> originally used NOT for lifetime management, that mutates into something
> like that because of the available API, but doing so without realizing
> it. atomic_t gets used for all kinds of algorithms, and the "counter"
> type is way too easily accidentally transformed into a "lifetime
> tracker" and we get bugs.
> 
> If we have a distinct type for wrapping-counters that limits the API,
> then it is much harder for folks to shoot themselves in the foot. I don't
> see why this is so bad: we end up with safer usage, more easily auditable
> code behavior ("how was this atomic_t instance _intended_ to be used?"),
> and no change in binary size.
> 
>>> There is no need to keep inc_return in this API as such. I included it
>>> so it can be used for above cases 1 and 2, so the users don't have to
>>> call inc() followed by read(). It can be left out of the API.
> 
> I go back and forth on this, but after looking at these instances,
> it makes sense to have inc_return(), for where counters are actually
> "serial numbers". An argument could be made[1], however, that such uses
> should not end up in the position of _reusing_ earlier identifiers, which
> means it's actually can't wrap. (And some cases just need u64 to make this
> happen[2] -- and in that specific case, don't even need to be atomic_t).
> 
> [1] https://lore.kernel.org/lkml/202010071334.8298F3FA7@keescook/
> [2] https://git.kernel.org/linus/d1e7fd6462ca9fc76650fbe6ca800e35b24267da
> 
>> Wrong! The atomic usage in mutex doesn't fall in any of those
>> categories.
> 
> But the atomic usage in mutex is *IN* mutex -- it's a separate data
> type, etc. We don't build mutexes manually, so why build counters
> manually?
> 
>> The only thing you're all saying that makes sense is that unintentional
>> wrapping can have bad consequences, the rest is pure confusion.
>>
>> Focus on the non-wrapping cases, _everything_ else is not going
>> anywhere.
> 
> I view this as a way to do so: this subset of wrapping cases is being
> identified and removed from the pool of all the atomic_t cases so that
> they will have been classified, and we can continue to narrow down all
> the atomic_t uses to find any potentially mis-used non-wrapping cases.
> 
> The other option is adding some kind of attribute to the declarations
> (which gets us the annotation) but doesn't provide a limit to the API.
> (e.g. no counter should ever call dec_return).
> 

Not sure about that. We have more than dec_return to deal with. More on
this below.

>> So audit the kernel, find the cases that should not wrap, categorize and
>> create APIs for them that trap the wrapping. But don't go around
>> confusing things that don't need confusion.
> 
> That's what's happening here. But as it turns out, it's easier to do
> this by employing both the process of elimination (mark the counters)
> and direct identification (mark the refcount_t). Then the pool of
> "unannotated" atomic_t instances continues to shrink.
> 

Right auditing is what is happening now.

Let me summarize the discussion:

atomic_t api provides a wide range of atomic operations as a base
api to implement atomic counters, bitops, spinlock interfaces.
The usages also evolved into being used for resource lifetimes and
state management. Then came refcount_t api to address resource lifetime
problems related to atomic_t wrapping.

There is a large overlap between the atomic_t api used for resource
lifetimes and just counters. Not all counters used for resource
lifetimes can be converted to refcount_t.

A few quick "git grep" numbers on atomic_t interfaces usage:

Common for all:

atomic_set() - 3418
atomic_read() - 5833
atomic_inc() - 3376
atomic_dec() - 2498
atomic_inc_return() - 612

Counters don't need these:

atomic_dec_return() - 295
atomic_add_return() - 209
atomic_sub_return() - 144
atomic_add() - 744
atomic_sub() - 371
atomic_dec_and_test() - 552

You can see from these numbers, the volume of common usages that make
it difficult to separate out counters vs. non-counter usages.

The problem we are now running into is, it is becoming difficult
weed out candidates for refcount_t conversion in this noise.

Isolating a smaller subset of arithmetic atomic ops to address this
specific counters use-case will help reduce noise. This way we can
go through this work once and convert all counters to use this narrow
scoped api and what is left is non-counter usages.

The current situation is more confusing and adding a narrowly focused
api for counters reduces it and makes it easier.

thanks,
-- Shuah


More information about the devel mailing list