Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

Linux Kernel Mailing List, post #231,410

Author: Pekka J Enberg
Date: 2008-07-18 12:00:33
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

Hi Ingo,

On Thu, 17 Jul 2008, Ingo Molnar wrote:
> A regression to v2.6.26:
>
> I started getting this skb-head corruption message today, on a T60
> laptop with e1000:

[snip]

On Thu, 17 Jul 2008, Ingo Molnar wrote:
> Perhaps SLUB debugging got smarter?

Nope.

On Thu, 17 Jul 2008, Ingo Molnar wrote:
> PM: Removing info for No Bus:vcs11
> device: 'vcs11': device_create_release
> =============================================================================
> BUG skbuff_head_cache: Poison overwritten
> -----------------------------------------------------------------------------
>
> INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b

0x6b is POISON_FREE, so 0x6a is a single-bit corruption.
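
To make the single-bit claim concrete, here is a minimal userspace sketch
(not kernel code). It XORs the expected poison byte with the observed byte
and counts the bits that differ. The helper name flipped_bits is made up
for illustration; POISON_FREE is the 0x6b value from
include/linux/poison.h.

```c
#include <stdint.h>

#define POISON_FREE 0x6b	/* fill byte for freed objects (include/linux/poison.h) */

/* Hypothetical helper: number of bit positions in which two bytes differ. */
static int flipped_bits(uint8_t expected, uint8_t observed)
{
	uint8_t diff = expected ^ observed;
	int count = 0;

	while (diff) {
		count += diff & 1;
		diff >>= 1;
	}
	return count;
}
```

flipped_bits(POISON_FREE, 0x6a) returns 1, because 0x6b ^ 0x6a == 0x01:
only bit 0 was cleared.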

> INFO: Allocated in __alloc_skb+0x2c/0x110 age=0 cpu=0 pid=5098
> INFO: Freed in __kfree_skb+0x31/0x80 age=0 cpu=1 pid=4440
> INFO: Slab 0xc16cc140 objects=16 used=1 fp=0xf658ae00 flags=0x400000c3
> INFO: Object 0xf658ae00 @offset=3584 fp=0xf658af00
>
> Bytes b4 0xf658adf0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Object 0xf658ae00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf658ae10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf658ae20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf658ae30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf658ae40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf658ae50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf658ae60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf658ae70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk

It's a bit unfortunate that we don't see the full dump of the corruption
here because SLUB limits the output to 128 bytes. Ingo, you might want to
try this patch so that we can see all of it:

diff --git a/mm/slub.c b/mm/slub.c
index 5f6e2c4..f69d181 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -492,7 +492,7 @@ static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
 	if (p > addr + 16)
 		print_section("Bytes b4", p - 16, 16);
 
-	print_section("Object", p, min(s->objsize, 128));
+	print_section("Object", p, s->objsize);
 
 	if (s->flags & SLAB_RED_ZONE)
 		print_section("Redzone", p + s->objsize,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
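
For readers who want to see what the 128-byte cap does to the output, the
following is a rough userspace sketch of a 16-bytes-per-line hex dump in the
style of the "Object" lines above. It is not the kernel's print_section();
the names dump_section and capped_len are assumptions made for this
example, and non-printable bytes are simply shown as '.'.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace sketch: dump len bytes, 16 per line, with an ASCII column.
 * Returns the number of lines written.
 */
static size_t dump_section(FILE *out, const char *label,
			   const unsigned char *p, size_t len)
{
	size_t lines = 0;

	for (size_t off = 0; off < len; off += 16) {
		size_t n = len - off < 16 ? len - off : 16;

		fprintf(out, "%s %p:", label, (const void *)(p + off));
		for (size_t i = 0; i < n; i++)
			fprintf(out, " %02x", p[off + i]);
		fputc(' ', out);
		for (size_t i = 0; i < n; i++)
			fputc(isprint(p[off + i]) ? p[off + i] : '.', out);
		fputc('\n', out);
		lines++;
	}
	return lines;
}

/* The cap that the patch removes: at most 128 bytes of the object. */
static size_t capped_len(size_t objsize)
{
	return objsize < 128 ? objsize : 128;
}
```

With a 176-byte object, dump_section(stdout, "Object", obj, capped_len(176))
writes 8 lines, while passing the full 176 writes 11, so the last 48 bytes
are only visible without the cap.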

Author: Ingo Molnar
Date: 2008-07-18 11:11:46
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

* Pekka J Enberg <[email protected]> wrote:

> > Object 0xf658ae70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>
> It's a bit unfortunate that we don't see the full dump of the corruption
> here because SLUB limits the output to 128 bytes. Ingo, you might want
> to try this patch so that we can see all of it:

ok, applied this as a debug special to tip/out-of-tree - future
incidents should have the full object dump.

would make sense for upstream too i think, or increase the limit to 4K
or so. (which is still fair to be dumped into a syslog)

Ingo

Author: Pekka Enberg
Date: 2008-07-18 12:16:26
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

On Fri, Jul 18, 2008 at 12:11 PM, Ingo Molnar <[email protected]> wrote:
>
> * Pekka J Enberg <[email protected]> wrote:
>
>> > Object 0xf658ae70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>>
>> It's a bit unfortunate that we don't see the full dump of the corruption
>> here because SLUB limits the output to 128 bytes. Ingo, you might want
>> to try this patch so that we can see all of it:
>
> ok, applied this as a debug special to tip/out-of-tree - future
> incidents should have the full object dump.
>
> would make sense for upstream too i think, or increase the limit to 4K
> or so. (which is still fair to be dumped into a syslog)

SLUB already limits object sizes to less than PAGE_SIZE so the patch
should be fine. Christoph, are you okay with this going upstream?

Author: Christoph Lameter
Date: 2008-07-18 08:54:56
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

Pekka Enberg wrote:
> On Fri, Jul 18, 2008 at 12:11 PM, Ingo Molnar <[email protected]> wrote:
>> * Pekka J Enberg <[email protected]> wrote:
>>
>>>> Object 0xf658ae70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>>> It's a bit unfortunate that we don't see the full dump of the corruption
>>> here because SLUB limits the output to 128 bytes. Ingo, you might want
>>> to try this patch so that we can see all of it:
>> ok, applied this as a debug special to tip/out-of-tree - future
>> incidents should have the full object dump.
>>
>> would make sense for upstream too i think, or increase the limit to 4K
>> or so. (which is still fair to be dumped into a syslog)
>
> SLUB already limits object sizes to less than PAGE_SIZE so the patch
> should be fine. Christoph, are you okay with this going upstream?

I am fine with the patch. Just be aware that we are going to be shoveling
a lot of stuff into the log. The object size can be greater than 4k for
non-kmalloc slab caches. True, in general object sizes are <4k.
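
To put a number on the log volume, here is a small sketch that assumes the
dump keeps printing 16 bytes per line, as in the "Object" lines quoted in
this thread. The helper name dump_lines and the DUMP_BYTES_PER_LINE macro
are hypothetical.

```c
#include <stddef.h>

#define DUMP_BYTES_PER_LINE 16	/* assumed from the quoted "Object" lines */

/* Hypothetical helper: log lines needed to dump objsize bytes. */
static size_t dump_lines(size_t objsize)
{
	return (objsize + DUMP_BYTES_PER_LINE - 1) / DUMP_BYTES_PER_LINE;
}
```

Under that assumption the old 128-byte cap costs 8 lines per report, while
a 4096-byte object would cost 256 lines.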



Author: Ingo Molnar
Date: 2008-07-21 11:41:10
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

update about this problem: just triggered another colorful crash, see
below. This was with the 4K object dump patch already, maybe the dump
gives a clue?

The upstream base of this test kernel was v2.6.26-5253-g14b395e - i.e.
post the big networking pull, but this problem predates it. (It first
triggered after v2.6.26)

All the crashes trigger in or close to networking code - not a single
block IO DMA or other DMA crash happened so far, and no filesystem
corruptions or anything like that which would signal hw trouble.

Ingo

------------------>
initcall sctp_init+0x0/0x697 returned 0 after 9 msecs
calling powernowk8_init+0x0/0x6e
initcall powernowk8_init+0x0/0x6e returned -19 after 0 msecs
calling hpet_insert_resource+0x0/0x1e
initcall hpet_insert_resource+0x0/0x1e returned 0 after 0 msecs
calling lapic_insert_resource+0x0/0x44
initcall lapic_insert_resource+0x0/0x44 returned 0 after 0 msecs
calling init_lapic_nmi_sysfs+0x0/0x33
initcall init_lapic_nmi_sysfs+0x0/0x33 returned 0 after 0 msecs
=============================================================================
BUG skbuff_head_cache: Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xf7ccc100-0xf7ccc103. First byte 0x0 instead of 0x6b
INFO: Allocated in __alloc_skb+0x30/0x10e age=1 cpu=1 pid=1
INFO: Freed in __kfree_skb+0x63/0x66 age=1 cpu=0 pid=0
INFO: Slab 0xc1c34ca0 objects=16 used=1 fp=0xf7ccc100 flags=0x400000c3
INFO: Object 0xf7ccc100 @offset=256 fp=0xf7ccc200

Bytes b4 0xf7ccc0f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf7ccc100: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ....kkkkkkkkkkkk
Object 0xf7ccc110: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc120: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc130: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc140: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc150: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc160: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc170: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc180: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf7ccc1a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk¥
Redzone 0xf7ccc1b0: bb bb bb bb »»»»
Padding 0xf7ccc1d8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding 0xf7ccc1e8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding 0xf7ccc1f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Pid: 1, comm: swapper Not tainted 2.6.26-tip #3261
[<c01673ad>] print_trailer+0xd1/0xd9
[<c0167428>] check_bytes_and_report+0x73/0x8f
[<c0167664>] check_object+0xa5/0x15a
[<c016824c>] __slab_alloc+0x2fb/0x3c8
[<c0168364>] kmem_cache_alloc+0x4b/0xa8
[<c0497376>] ? __alloc_skb+0x30/0x10e
[<c0497376>] ? __alloc_skb+0x30/0x10e
[<c0497376>] __alloc_skb+0x30/0x10e
[<c04a6678>] alloc_skb+0xc/0xe
[<c04a6ce5>] find_skb+0x28/0x66
[<c04a6f5f>] netpoll_send_udp+0x2b/0x1cf
[<c058800f>] ? _spin_lock_irqsave+0x4b/0x55
[<c03db399>] write_msg+0x79/0xac
[<c03db320>] ? write_msg+0x0/0xac
[<c0122f96>] __call_console_drivers+0x56/0x63
[<c0122ffa>] _call_console_drivers+0x57/0x5b
[<c0123386>] release_console_sem+0x112/0x1a5
[<c01238f3>] vprintk+0x344/0x35e
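
The "INFO: 0xf7ccc100-0xf7ccc103. First byte 0x0 instead of 0x6b" line
above reports the first and last corrupted byte of the freed object. As a
rough illustration of that kind of check, here is a userspace sketch; it is
not the kernel's check_object() code. It assumes a freed object is filled
with POISON_FREE (0x6b) and ends in a single POISON_END (0xa5) byte, which
matches the dump. The struct and function names are invented for this
example.

```c
#include <stddef.h>
#include <stdint.h>

#define POISON_FREE 0x6b	/* fill byte for freed objects */
#define POISON_END  0xa5	/* last byte of a poisoned object */

struct poison_fault {
	size_t first;	/* offset of the first corrupted byte */
	size_t last;	/* offset of the last corrupted byte */
};

/*
 * Hypothetical check: returns 1 and fills *f if any of the first
 * objsize - 1 bytes differs from POISON_FREE or the final byte differs
 * from POISON_END; returns 0 if the poison pattern is intact.
 */
static int find_poison_fault(const uint8_t *obj, size_t objsize,
			     struct poison_fault *f)
{
	int found = 0;

	for (size_t i = 0; i < objsize; i++) {
		uint8_t want = (i == objsize - 1) ? POISON_END : POISON_FREE;

		if (obj[i] == want)
			continue;
		if (!found) {
			f->first = i;
			found = 1;
		}
		f->last = i;
	}
	return found;
}
```

Fed the 176 object bytes from the dump above, this reports offsets 0 through
3, i.e. the four zeroed bytes at the start of the object. For the first
report in this thread it would give the single offset 0x9c (0xf658ae9c
minus the object start 0xf658ae00).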