linux-next network throughput performance regression

Mon Nov 9 03:23:43 UTC 2015

From: Dexuan Cui <decui at microsoft.com>
Date: Mon, 9 Nov 2015 03:11:35 +0000

>> -----Original Message-----
>> From: David Miller [mailto:davem at davemloft.net]
>> Sent: Monday, November 9, 2015 10:53
>> To: Dexuan Cui <decui at microsoft.com>
>> Cc: eric.dumazet at gmail.com; dsa at cumulusnetworks.com; Simon Xiao
>> <sixiao at microsoft.com>; netdev at vger.kernel.org; Haiyang Zhang
>> <haiyangz at microsoft.com>; linux-kernel at vger.kernel.org;
>> devel at linuxdriverproject.org
>> Subject: Re: linux-next network throughput performance regression
>> 
>> From: Dexuan Cui <decui at microsoft.com>
>> Date: Mon, 9 Nov 2015 02:39:24 +0000
>> 
>> >> Throughput on a single TCP flow for a 40G NIC can be tricky to tune.
>> > Why is a single TCP flow trickier than multiple TCP flows?
>> > IMO, shouldn't the single-flow case be easier to analyze?
>> 
>> Because a single TCP flow can only use one of the many TX queues
>> that such modern NICs have.
>> 
>> The single TX queue becomes the bottleneck.
>> 
>> Whereas if you have several TCP flows, all of them can use independent
>> TX queues on the NIC in parallel to fill the link with traffic.
>> 
>> That's why.
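To illustrate David's point about TX queues, here is a minimal Python sketch that assumes a simplified model of transmit-queue selection: each packet's flow 4-tuple is hashed, and the 32-bit hash is scaled onto the available TX queues with a multiply-and-shift, roughly in the spirit of the kernel's reciprocal_scale(). The CRC32-based flow_hash() and pick_tx_queue() helpers, the addresses, and the ports are all hypothetical. The real kernel path (skb_get_hash(), XPS, driver-specific queue selection) differs in detail. The sketch only shows that every packet of one flow lands on the same queue, while many flows spread across queues.

```python
import zlib

def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Hypothetical stand-in for a flow hash: CRC32 over the 4-tuple."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) & 0xFFFFFFFF

def pick_tx_queue(flow, num_queues):
    """Map a 32-bit hash onto [0, num_queues) by multiply-and-shift."""
    return (flow_hash(*flow) * num_queues) >> 32

NUM_QUEUES = 8

# Every packet of a single TCP flow carries the same 4-tuple,
# so all of them are steered to the same TX queue.
single = ("10.0.0.1", "10.0.0.2", 40000, 5001)
queues_single = {pick_tx_queue(single, NUM_QUEUES) for _ in range(1000)}

# Many flows (differing here only in source port) spread over several queues.
flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 5001) for i in range(64)]
queues_multi = {pick_tx_queue(f, NUM_QUEUES) for f in flows}

print(len(queues_single))  # 1
print(sorted(queues_multi))
```

Because the queue is a pure function of the flow's hash, adding TX queues cannot help a single flow. Only additional flows can occupy the other queues.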
> 
> Thanks, David!
> I understand that 1 TX queue is the bottleneck (however, in Simon's
> test, TX=1 => 36.7 Gb/s and TX=8 => 37.7 Gb/s, so it looks like the TX=1
> bottleneck is not that significant).
> I'm just wondering how the bottleneck became so much narrower with
> recent linux-next in Simon's results (36.7 Gb/s vs. 18.2 Gb/s). IMO, there
> must be extra latency somewhere.
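The numbers Dexuan quotes can be put side by side with a quick calculation. This is a small sketch using only the figures from Simon's test as quoted above; reading 36.7 Gb/s as the pre-regression single-flow baseline is an assumption based on the wording of the thread.

```python
# Throughput figures (Gb/s) quoted from Simon's test in this thread.
tx1_baseline = 36.7   # single flow, 1 TX queue (assumed pre-regression kernel)
tx8_baseline = 37.7   # 8 TX queues
tx1_next = 18.2       # single flow on recent linux-next

def pct_drop(reference, value):
    """Percentage by which `value` falls below `reference`."""
    return (reference - value) / reference * 100

queue_penalty = pct_drop(tx8_baseline, tx1_baseline)  # cost of one TX queue
regression = pct_drop(tx1_baseline, tx1_next)         # linux-next single-flow drop

print(f"1 vs 8 TX queues: {queue_penalty:.1f}% lower")    # 2.7% lower
print(f"linux-next vs baseline: {regression:.1f}% lower")  # 50.4% lower
```

A roughly 2.7% single-queue penalty against a roughly 50% single-flow drop is the gap Dexuan is pointing at.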

I think the whole issue here is that you misinterpreted what Eric said.

He is not arguing that some regression did, or did not, happen.

He was instead making the basic point that, due to the lack of
parallelism, the single-stream TCP case is harder to optimize for
high-speed NICs.

That is all.