Wonder why patchwatcher was down for 12 hours? It seems to have been a kernel problem.
This is on an Ubuntu 8.10 32-bit Core 2 Duo system on a P5K Pro motherboard, with a PCI-E NVIDIA card. I have the proprietary NVIDIA drivers installed, so I can't expect support from anybody, but I did want to at least kvetch in public somewhere.
The build slave was unresponsive. Upon rebooting, I found the following in /var/log/messages:
Nov 15 18:07:05 slave2 kernel: [1146762.712500] Modules linked in: nfs lockd nfs_acl sunrpc af_packet binfmt_misc rfcomm bridge stp bnep sco l2cap bluetooth ppdev ipv6 acpi_cpufreq cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_ondemand freq_table sbs wmi sbshc video output container pci_slot battery iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport snd_hda_intel psmouse pcspkr serio_raw evdev snd_pcm_oss snd_mixer_oss nvidia(P) snd_pcm i2c_core snd_seq_dummy iTCO_wdt iTCO_vendor_support snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd shpchp soundcore button pci_hotplug snd_page_alloc intel_agp agpgart ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg pata_marvell pata_acpi usbhid hid natsemi ata_generic ohci1394 ieee1394 ata_piix libata scsi_mod dock sky2 ehci_hcd uhci_hcd usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
Nov 15 18:07:05 slave2 kernel: [1146762.712501]
Nov 15 18:07:05 slave2 kernel: [1146762.712501] Pid: 21048, comm: wrc Tainted: P (2.6.27-7-generic #1)
Nov 15 18:07:05 slave2 kernel: [1146762.712501] EIP: 0060:[<c0184320>] EFLAGS: 00000286 CPU: 1
Nov 15 18:07:05 slave2 kernel: [1146762.712501] EIP is at find_get_pages+0x70/0x110
Nov 15 18:07:05 slave2 kernel: [1146762.712501] EAX: ee96fa7c EBX: c1d5ac50 ECX: e2333d38 EDX: 00000000
Nov 15 18:07:05 slave2 kernel: [1146762.712501] ESI: e2333d30 EDI: e2333d38 EBP: e2333ce8 ESP: e2333cb8
Nov 15 18:07:05 slave2 kernel: [1146762.712501] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Nov 15 18:07:05 slave2 kernel: [1146762.712501] CR0: 8005003b CR2: 09c28000 CR3: 35009000 CR4: 00000690
Nov 15 18:07:05 slave2 kernel: [1146762.712501] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Nov 15 18:07:05 slave2 kernel: [1146762.712501] DR6: ffff0ff0 DR7: 00000403
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c018d4d7>] pagevec_lookup+0x27/0x40
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c018e2ac>] truncate_inode_pages_range+0x8c/0x360
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<f89e9c7f>] ? do_get_write_access+0x2df/0x4b0 [jbd]
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<f8a23174>] ? ext3_get_group_desc+0x14/0xd0 [ext3]
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c018e59f>] truncate_inode_pages+0x1f/0x30
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c0195d11>] vmtruncate+0x161/0x190
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01c85d2>] inode_setattr+0x62/0x190
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<f8a28ef9>] ext3_setattr+0xd9/0x210 [ext3]
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01c89d6>] fnotify_change+0x2d6/0x3b0
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01b9d81>] ? path_permission+0x31/0x40
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01b0f56>] do_truncate+0x76/0xa0
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c023528f>] ? apparmor_path_permission+0x5f/0x80
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01ba6f3>] may_open+0x193/0x210
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01bda75>] do_filp_open+0x115/0x790
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01c9619>] ? expand_files+0x9/0x60
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01c9750>] ? alloc_fd+0xe0/0x100
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01afff5>] do_sys_open+0x65/0x100
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c01b00fe>] sys_open+0x2e/0x40
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c0103f7b>] sysenter_do_call+0x12/0x2f
Nov 15 18:07:05 slave2 kernel: [1146762.712501] [<c0370000>] ? default_device_exit+0x60/0xb0
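For anyone who wants to pick the call-trace frames out of a log like the one above, here is a rough Python sketch. The names FRAME_RE and parse_frames are made up for illustration and are not part of patchwatcher or any kernel tool; the sample text is two frames copied from the trace. Frames marked with "?" are addresses the kernel found on the stack but could not confirm as real call frames, so the sketch flags them as unreliable.

```python
import re

# Matches one frame such as "[<f8a28ef9>] ext3_setattr+0xd9/0x210 [ext3]".
# The optional "?" marks an address the kernel could not verify as a real frame.
# The leading "[<" keeps the pattern from matching the "[1146762.712501]" timestamps.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per stack frame found anywhere in `text` (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),  # None for code built into the kernel
            "reliable": m.group("unreliable") is None,
        })
    return frames


# Two frames copied from the trace, run together the way syslog output often is.
sample = (
    "Nov 15 18:07:05 slave2 kernel: [1146762.712501] "
    "[<c018e59f>] truncate_inode_pages+0x1f/0x30 "
    "Nov 15 18:07:05 slave2 kernel: [1146762.712501] "
    "[<f8a23174>] ? ext3_get_group_desc+0x14/0xd0 [ext3]"
)

for frame in parse_frames(sample):
    marker = " " if frame["reliable"] else "?"
    module = frame["module"] or "kernel"
    print(f"{marker} {frame['symbol']} ({module})")
```

Since the pattern is applied with finditer over the whole text, it should not matter whether the log entries arrive one per line or run together.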
It crashed in the middle of a wine build. The last thing in the patchwatcher log was:
make[1]: Leaving directory `/home/patchslave/winezeug/patchwatcher/wine-continuous-workdir/active/dlls'
make[2]: Entering
BIOS reported the temperature was 56 degrees after rebooting, fwiw.
This is the first problem on this machine since I set it up several weeks ago.
I guess I'll just reboot it and hope...
On Sun, Nov 16, 2008 at 12:20:37PM -0800, Dan Kegel wrote:
> Wonder why patchwatcher was down for 12 hours? It seems to have been a kernel problem.
> [...]
> I guess I'll just reboot it and hope...
Run an fsck and a detailed memory check too. This backtrace shows very common code paths, which are likely hit thousands of times per second during a build. ;)
Ciao, Marcus