vme_tsi148 question
Michael Kenney
mfkenney at gmail.com
Wed Feb 5 23:21:00 UTC 2014
On Wed, Feb 5, 2014 at 1:38 PM, Michael Kenney <mfkenney at gmail.com> wrote:
> On Wed, Feb 5, 2014 at 1:22 PM, Martyn Welch <martyn at welchs.me.uk> wrote:
>>
>>
>>
>> On 5 February 2014 17:41, Greg KH <gregkh at linuxfoundation.org> wrote:
>>>
>>> On Tue, Feb 04, 2014 at 06:34:13PM +0000, Martyn Welch wrote:
>>> > On 04/02/14 16:34, Michael Kenney wrote:
>>> > >Hi Martyn,
>>> > >
>>> > >On Tue, Feb 4, 2014 at 7:28 AM, Martyn Welch <martyn.welch at ge.com
>>> > ><mailto:martyn.welch at ge.com>> wrote:
>>> > >
>>> > > On 28/12/13 00:34, Michael Kenney wrote:
>>> > >
>>> > > Hi Martyn,
>>> > >
>>> > > On Fri, Dec 27, 2013 at 4:23 PM, Martyn Welch
>>> > > <martyn at welchs.me.uk <mailto:martyn at welchs.me.uk>> wrote:
>>> > >
>>> > > On 27/12/13 20:15, Michael Kenney wrote:
>>> > >
>>> > >
>>> > > We are using the vme_tsi148 bridge driver along with
>>> > > the
>>> > > vme_user
>>> > > driver to access the VME boards. The A/D board requires
>>> > > D32 bus cycles
>>> > > and the VME master window is configured accordingly,
>>> > > however, when
>>> > > monitoring the bus cycles with a logic analyzer, we
>>> > > noticed that the
>>> > > CPU is transferring one byte at a time (i.e. four D8
>>> > > transfers rather
>>> > > than one D32).
>>> > >
>>> > > Is this the expected behavior of the tsi148 driver?
>>> > >
>>> > >
>>> > > Hi Mike,
>>> > >
>>> > > This is certainly not the expected behaviour - if the
>>> > > window
>>> > > is configured
>>> > > for D32 then it should do 32 bit transfers where possible.
>>> > >
>>> > > I've heard of this happening recently, but haven't yet been
>>> > > able to
>>> > > replicate it. Which VME board are you running Linux on and
>>> > > which flavour of
>>> > > Linux?
>>> > >
>>> > >
>>> > > I'm running Debian 7.2 with kernel 3.2 on a Fastwel CPC600
>>> > > (Pentium M
>>> > > based CPU board).
>>> > >
>>> > >
>>> > > I haven't forgotten about this, still not sure exactly what is
>>> > > happening.
>>> > >
>>> > > Is your install/kernel 32 or 64 bit?
>>> > >
>>> > > Are you doing single 32-bit transfers, or are you seeing this on
>>> > > longer transfers (i.e. copying a buffer full of data)?
>>> > >
>>> > >
>>> > >Thanks for getting back to me.
>>> > >
>>> > >I'm running a 32-bit kernel and I see this behavior on all transfers
>>> > >regardless of buffer size.
>>> > >
>>> >
>>> > Gah! Thought I could see what may be causing it in 64-bit kernels, but
>>> > not
>>> > 32-bit (my x86 asm is not particularly hot).
>>> >
>>> > I think we /may/ be hitting issues with how the memcpy function gets
>>> > implemented on specific architectures. The tsi148 is a PCI/X to VME
>>> > bridge,
>>> > if it receives a series of 8-bit reads or writes, it translates these to
>>> > 8-bit reads or writes on the VME bus, which is not necessarily what we
>>> > want.
>>> >
>>> > According to the data sheet, the card you are using has an Intel 6300ESB
>>> > ICH, which seems to be PCI/X (so we can rule out PCIe to PCI/X bridges
>>> > or
>>> > something like that doing nasty things).
>>> >
>>> > I think (if I follow everything correctly) then the memcpy for 32-bit is
>>> > handled by __memcpy in arch/x86/include/asm/string_32.h:
>>> >
>>> > static __always_inline void *__memcpy(void *to, const void *from, size_t
>>> > n)
>>> > {
>>> > int d0, d1, d2;
>>> > asm volatile("rep ; movsl\n\t"
>>> > "movl %4,%%ecx\n\t"
>>> > "andl $3,%%ecx\n\t"
>>> > "jz 1f\n\t"
>>> > "rep ; movsb\n\t"
>>> > "1:"
>>> > : "=&c" (d0), "=&D" (d1), "=&S" (d2)
>>> > : "0" (n / 4), "g" (n), "1" ((long)to), "2"
>>> > ((long)from)
>>> > : "memory");
>>> > return to;
>>> > }
>>> >
>>> > I'd expected this function to use movl (32-bit moves) where possible,
>>> > but
>>> > movsb to get to naturally aligned moves (which is something that we deal
>>> > with in the VME code already to use 16-bit reads where we can).
>>> >
>>> > Greg (as co-maintainer of the VME subsystem :-) ), Am I reading this
>>> > right?
>>>
>>> Yes, but why would it matter if we copy by bytes, bits, or dwords to a
>>> memory-backed bus? By definition, that shouldn't matter. If it does
>>> matter, then we shouldn't be using memcpy, or relying on the compiler to
>>> do the copying, but rather we need to use a write() function instead as
>>> this would be IO memory that does care about this type of thing.
>>>
>>
>> The drivers are using the functions memcpy_toio and memcpy_fromio for the
>> bulk of the transfer, these seem to equate to memcpy on x86.
>>
>> In this instance it does matter - some devices can only accept certain width
>> accesses and the VME bridge will translate a 8-bit PCI transfer to a 8-bit
>> transfer on the VME bus.
>>
>> So (and I haven't even compile tested this yet), this?:
>>
>> diff --git a/drivers/vme/bridges/vme_ca91cx42.c
>> b/drivers/vme/bridges/vme_ca91cx42.c
>> index 1425d22c..0d87ebd 100644
>> --- a/drivers/vme/bridges/vme_ca91cx42.c
>> +++ b/drivers/vme/bridges/vme_ca91cx42.c
>> @@ -894,9 +894,9 @@ static ssize_t ca91cx42_master_read(struct
>> vme_master_resource *image,
>> }
>>
>> count32 = (count - done) & ~0x3;
>> - if (count32 > 0) {
>> - memcpy_fromio(buf + done, addr + done, (unsigned int)count);
>> - done += count32;
>> + while (done < count32) {
>> + *(u32 *)(buf + done) = ioread32(addr + done);
>> + done += 4;
>> }
>>
>> if ((count - done) & 0x2) {
>> @@ -948,9 +948,9 @@ static ssize_t ca91cx42_master_write(struct
>> vme_master_resource *image,
>> }
>>
>> count32 = (count - done) & ~0x3;
>> - if (count32 > 0) {
>> - memcpy_toio(addr + done, buf + done, count32);
>> - done += count32;
>> + while (done < count32) {
>> + iowrite32(*(u32 *)(buf + done), addr + done);
>> + done += 4;
>> }
>>
>> if ((count - done) & 0x2) {
>> diff --git a/drivers/vme/bridges/vme_tsi148.c
>> b/drivers/vme/bridges/vme_tsi148.c
>> index 5fbd08f..bc82991 100644
>> --- a/drivers/vme/bridges/vme_tsi148.c
>> +++ b/drivers/vme/bridges/vme_tsi148.c
>> @@ -1297,9 +1297,9 @@ static ssize_t tsi148_master_read(struct
>> vme_master_resource *image, void *buf,
>> }
>>
>> count32 = (count - done) & ~0x3;
>> - if (count32 > 0) {
>> - memcpy_fromio(buf + done, addr + done, count32);
>> - done += count32;
>> + if (done < count32) {
>> + *(u32 *)(buf + done) = ioread32(addr + done);
>> + done += 4;
>> }
>>
>> if ((count - done) & 0x2) {
>> @@ -1379,9 +1379,9 @@ static ssize_t tsi148_master_write(struct
>> vme_master_resource *image, void *buf,
>> }
>>
>> count32 = (count - done) & ~0x3;
>> - if (count32 > 0) {
>> - memcpy_toio(addr + done, buf + done, count32);
>> - done += count32;
>> + if (done < count32) {
>> + iowrite32(*(u32 *)(buf + done), addr + done);
>> + done += 4;
>> }
>>
>> if ((count - done) & 0x2) {
>>
>>
>>>
>>> Does that help?
>>>
>>
>> Yep :-)
>>
>> Thanks Greg.
>
> That certainly looks promising. I can give that patch a try later this week.
>
> Thanks.
Good news Martyn.
I had a bit of time today to test your patch and am happy to say it
worked! No more bus errors.
Now I can work on getting the actual data acquisition working :-)
Thanks again for all your help.
--Mike
More information about the devel
mailing list