Author: Ingo Molnar
Date: 2008-07-18 02:16:40
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten
* Vegard Nossum <[email protected]> wrote:
> On Fri, Jul 18, 2008 at 1:52 AM, Ingo Molnar <[email protected]> wrote:
> > If only we had some kernel technology that could track and validate
> > memory accesses, and point out the cases where we access uninitialized
> > memory, just like Valgrind?
> >
> > ... something like kmemcheck? ;-)
>
> Cool :)
>
> > So i booted that box with tip/master and kmemcheck enabled. (plus a few
> > fixlets to make networking allocations be properly tracked by
> > kmemcheck.)
> >
> > It was a slow bootup and long wait, but it gave a few hits here:
>
> Hm, if you think it was that slow, I am suspecting you were also using
> SLUB debugging.
nope:
# CONFIG_SLUB_DEBUG is not set
CONFIG_SLUB=y
> This can actually be negative, since now SLUB will access the objects
> (+redzone +padding) and possibly trick kmemcheck into thinking they
> were initialized in the first place.
>
> But what we are really looking for is "read from freed memory"
> messages. So I would actually recommend this: Disable kmemcheck's
> reporting of uninitialized memory, simply to make the "freed"
> messages easier to spot.
>
> Maybe something like this (warning: whitespace-munged):
ok, applied this too.
> If this only happens during boot, it would also be a good idea to
> simply reboot the machine a lot...
yeah, i've got a script for that. Will try it overnight.
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Author: David Miller
Date: 2008-07-17 19:13:37
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten
From: Ingo Molnar <[email protected]>
Date: Fri, 18 Jul 2008 01:52:54 +0200
> kmemcheck: Caught 8-bit read from uninitialized memory (f653ad24)
> iiiiiiiiiiiiiiiiuuuuuuuuuuuuuuuuuuuuuiuuuuuuuuuuuuuuuuuuuuuuuuuu
> ^
>
> Pid: 2484, comm: arping Not tainted (2.6.26-tip #20187)
> EIP: 0060:[<c05e973c>] EFLAGS: 00010282 CPU: 0
> EIP is at __copy_skb_header+0x7c/0x100
> EAX: 00000000 EBX: f653acc0 ECX: f653ac00 EDX: f653ac00
> ESI: f653ac50 EDI: f653ad10 EBP: c09b9e84 ESP: c09ddaa8
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: f71c2700 CR3: 36513000 CR4: 000006d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff4ff0 DR7: 00000400
> [<c05e97e7>] __skb_clone+0x27/0xe0
> [<c05eb101>] skb_clone+0x41/0x60
> [<c065cbf1>] packet_rcv+0xc1/0x290
> [<c05f07ad>] netif_receive_skb+0x20d/0x400
> [<c03b2aa7>] e1000_receive_skb+0x47/0x180
> [<c03b3983>] e1000_clean_rx_irq+0x223/0x2e0
> [<c03b225b>] e1000_clean+0x5b/0x200
> [<c05f29db>] net_rx_action+0xfb/0x160
> [<c0129092>] __do_softirq+0x82/0xf0
> [<c0105b8a>] call_on_stack+0x1a/0x30
>
> false positive? Find below the quick hacks i did to pre-initialize skb
> allocations that have RX DMA into them.
Maybe. Every SKB object allocated is fully initialized
in __alloc_skb():
	/*
	 * Only clear those fields we need to clear, not those that we will
	 * actually initialise below. Hence, don't put any more fields after
	 * the tail pointer in struct sk_buff!
	 */
	memset(skb, 0, offsetof(struct sk_buff, tail));

That leaves the following trailing members of struct sk_buff:

	/* These elements must be at the end, see alloc_skb() for details. */
	sk_buff_data_t		tail;
	sk_buff_data_t		end;
	unsigned char		*head,
				*data;
	unsigned int		truesize;
	atomic_t		users;

which are then explicitly initialized right after the quoted memset():

	skb->truesize = size + sizeof(struct sk_buff);
	atomic_set(&skb->users, 1);
	skb->head = data;
	skb->data = data;
	skb_reset_tail_pointer(skb);
	skb->end = skb->tail + size;

When we clone, there are probably some fields we don't copy over
explicitly. We usually skip them because they don't matter or,
if they do, because the caller will take care of them.
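
The allocation pattern described above (bulk-zero everything before one marker member, then set the trailing members one by one) can be illustrated with a minimal userspace sketch. Everything named toy_* here is hypothetical and is not kernel code: struct toy_skb is a much-reduced stand-in for struct sk_buff, malloc() stands in for the slab allocator, and a plain int replaces atomic_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for struct sk_buff. */
struct toy_skb {
	struct toy_skb *next;
	unsigned int len;
	unsigned char cloned;

	/* These members stay at the end: they are set one by one. */
	unsigned char *tail;
	unsigned char *end;
	unsigned char *head;
	unsigned char *data;
	unsigned int truesize;
	int users;
};

/* Allocate a header plus 'size' data bytes; returns NULL on failure. */
struct toy_skb *toy_alloc_skb(size_t size)
{
	struct toy_skb *skb = malloc(sizeof(*skb));
	unsigned char *data = malloc(size);

	if (!skb || !data) {
		free(skb);
		free(data);
		return NULL;
	}

	/* Bulk-clear only the members that precede 'tail'. */
	memset(skb, 0, offsetof(struct toy_skb, tail));

	/* Initialise every remaining member explicitly. */
	skb->truesize = (unsigned int)(size + sizeof(*skb));
	skb->users = 1;
	skb->head = data;
	skb->data = data;
	skb->tail = data;
	skb->end = skb->tail + size;
	return skb;
}

void toy_free_skb(struct toy_skb *skb)
{
	if (skb) {
		free(skb->head);
		free(skb);
	}
}

/* Usage example: every member has a defined value after allocation. */
void toy_demo(void)
{
	struct toy_skb *skb = toy_alloc_skb(64);

	assert(skb != NULL);
	assert(skb->next == NULL && skb->len == 0 && skb->cloned == 0);
	assert(skb->users == 1 && skb->end - skb->tail == 64);
	toy_free_skb(skb);
}
```

The sketch only covers allocation. Whether a clone path copies every member is a separate question, and that is the one the kmemcheck report above raises.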
Author: David Miller
Date: 2008-07-17 19:03:15
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten
From: "Vegard Nossum" <[email protected]>
Date: Fri, 18 Jul 2008 01:15:47 +0200
> On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar <[email protected]> wrote:
> >
> > A regression to v2.6.26:
> >
> > I started getting this skb-head corruption message today, on a T60
> > laptop with e1000:
> >
> > PM: Removing info for No Bus:vcs11
> > device: 'vcs11': device_create_release
> > =============================================================================
> > BUG skbuff_head_cache: Poison overwritten
> > -----------------------------------------------------------------------------
> >
> > INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b
>
> 1. Notice the range. It's just a single byte.
> 2. Notice the value. It's just a ++.
It's supposed to be 0x6b, this would be a "--"
Also it (more likely IMHO) could be clearing a flag with the value 0x01.
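
Both explanations fit the reported byte, which a few lines of C arithmetic can confirm. This is only a sketch: the helper names are made up for illustration, and 0x6b is taken from the "instead of 0x6b" line in the report.

```c
#include <assert.h>
#include <stdint.h>

#define POISON_FREE 0x6b	/* fill byte the report expected to find */

/* Hypothesis 1: something decremented the byte. */
uint8_t after_decrement(uint8_t b)
{
	return (uint8_t)(b - 1);
}

/* Hypothesis 2: something cleared a flag bit. */
uint8_t after_clear_flag(uint8_t b, uint8_t flag)
{
	return (uint8_t)(b & ~flag);
}

/* For contrast: an increment would have left a different byte. */
uint8_t after_increment(uint8_t b)
{
	return (uint8_t)(b + 1);
}

/* Usage example: both hypotheses give 0x6a, an increment gives 0x6c. */
void poison_demo(void)
{
	assert(after_decrement(POISON_FREE) == 0x6a);
	assert(after_clear_flag(POISON_FREE, 0x01) == 0x6a);
	assert(after_increment(POISON_FREE) == 0x6c);
}
```

Since bit 0 of 0x6b is set, a decrement and a clear of flag 0x01 produce the same byte, so the value alone cannot tell the two apart.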
Author: Vegard Nossum
Date: 2008-07-18 09:03:50
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten
On Fri, Jul 18, 2008 at 4:03 AM, David Miller <[email protected]> wrote:
>> On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar <[email protected]> wrote:
>> >
>> > A regression to v2.6.26:
>> >
>> > I started getting this skb-head corruption message today, on a T60
>> > laptop with e1000:
>> >
>> > PM: Removing info for No Bus:vcs11
>> > device: 'vcs11': device_create_release
>> > =============================================================================
>> > BUG skbuff_head_cache: Poison overwritten
>> > -----------------------------------------------------------------------------
>> >
>> > INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b
>>
>> 1. Notice the range. It's just a single byte.
>> 2. Notice the value. It's just a ++.
>
> It's supposed to be 0x6b, this would be a "--"
You're right! Oops. In my defence, I wrote that at 2 AM last night ;-)
> Also it (more likely IMHO) could be clearing a flag with the value 0x01.
It could be. But like I said in a later e-mail, the thing is likely
sk_buff->truesize. Which is not a flags variable. It _is_ however, a
counter, which is frequently -= and atomic_sub()ed.
That field is also an int, not a byte like I suggested above. This is
fine, though. "--" on an int can of course legitimately update/change
just the lower byte of an int.
But.. it could also be some random corruption coming from elsewhere.
Maybe even bad RAM (it's just a single bit anyway). But that's less
likely.
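
The point about an int is easy to check: decrementing a 32-bit word filled with the 0x6b pattern changes exactly one byte, and in fact one bit. The helpers below are hypothetical, written only for this illustration, and the word 0x6b6b6b6b assumes the field had been fully poisoned beforehand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes whose value differs between two 32-bit words. */
int bytes_changed(uint32_t before, uint32_t after)
{
	unsigned char a[sizeof(before)], b[sizeof(after)];
	int n = 0;

	memcpy(a, &before, sizeof(a));
	memcpy(b, &after, sizeof(b));
	for (size_t i = 0; i < sizeof(a); i++)
		n += (a[i] != b[i]);
	return n;
}

/* Number of bits that differ between two 32-bit words. */
int bits_changed(uint32_t before, uint32_t after)
{
	uint32_t diff = before ^ after;
	int n = 0;

	while (diff) {
		n += (int)(diff & 1u);
		diff >>= 1;
	}
	return n;
}

/* Usage example: "--" on a poisoned int touches one byte and one bit. */
void decrement_demo(void)
{
	uint32_t poisoned = 0x6b6b6b6bu;

	assert(bytes_changed(poisoned, poisoned - 1) == 1);
	assert(bits_changed(poisoned, poisoned - 1) == 1);
}
```

A single changed bit is also what a hardware bit flip would look like, which is why the byte value alone cannot rule out bad RAM.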
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
Author: David Miller
Date: 2008-07-18 00:12:02
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten
From: "Vegard Nossum" <[email protected]>
Date: Fri, 18 Jul 2008 09:03:50 +0200
> > It's supposed to be 0x6b, this would be a "--"
>
> You're right! Oops. In my defence, I wrote that at 2 AM last night ;-)
>
> > Also it (more likely IMHO) could be clearing a flag with the value 0x01.
>
> It could be. But like I said in a later e-mail, the thing is likely
> sk_buff->truesize. Which is not a flags variable. It _is_ however, a
> counter, which is frequently -= and atomic_sub()ed.
skb->truesize is hardly ever incremented or decremented by only one.
Usually it is changed by the entire packet size, or at least one MSS's
worth.
On packet free, it will be decremented by at least sizeof(struct sk_buff).