summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-09-19md: make sure metadata is updated when spares are activated or removed.NeilBrown
It isn't always necessary to update the metadata when spares are removed as the presence-or-not of a spare isn't really important to the integrity of an array. Also activating a spare doesn't always require updating the metadata as the update on 'recovery-completed' is usually sufficient. However the introduction of 'replacement' devices have made these transitions sometimes more important. For example the 'Replacement' flag isn't cleared until the original device is removed, so we need to ensure a metadata update after that 'spare' is removed. So set MD_CHANGE_DEVS whenever a spare is activated or removed, to complement the current situation where it is set when a spare is added or a device is failed (or a number of other less common situations). This is suitable for -stable as out-of-data metadata could lead to data corruption. This is only relevant for 3.3 and later 9when 'replacement' as introduced. Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-19md/raid5: fix calculate of 'degraded' when a replacement becomes active.NeilBrown
When a replacement device becomes active, we mark the device that it replaces as 'faulty' so that it can subsequently get removed. However 'calc_degraded' only pays attention to the primary device, not the replacement, so the array appears to become degraded, which is wrong. So teach 'calc_degraded' to consider any replacement if a primary device is faulty. This is suitable for -stable as an incorrect 'degraded' value can confuse md and could lead to data corruption. This is only relevant for 3.3 and later. Cc: stable@vger.kernel.org Reported-by: Robin Hill <robin@robinhill.me.uk> Reported-by: John Drescher <drescherjm@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-19Revert "md/raid5: For odirect-write performance, do not set ↵NeilBrown
STRIPE_PREREAD_ACTIVE." This reverts commit 895e3c5c58a80bb9e4e05d9ac38b4f30e0f97d80. While this patch seemed like a good idea and did help some workloads, it hurts other workloads. Large sequential O_DIRECT writes were faster, Small random O_DIRECT writes were slower. Other changes (batching RAID5 writes) have improved the sequential writes using a different mechanism, so the net result of this patch is definitely negative. So revert it. Reported-by: Shaohua Li <shli@kernel.org> Tested-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-18xfs: stop the sync worker before xfs_unmountfsBen Myers
Cancel work of the xfs_sync_worker before teardown of the log in xfs_unmountfs. This prevents occasional crashes on unmount like so: PID: 21602 TASK: ee9df060 CPU: 0 COMMAND: "kworker/0:3" #0 [c5377d28] crash_kexec at c0292c94 #1 [c5377d80] oops_end at c07090c2 #2 [c5377d98] no_context at c06f614e #3 [c5377dbc] __bad_area_nosemaphore at c06f6281 #4 [c5377df4] bad_area_nosemaphore at c06f629b #5 [c5377e00] do_page_fault at c070b0cb #6 [c5377e7c] error_code (via page_fault) at c070892c EAX: f300c6a8 EBX: f300c6a8 ECX: 000000c0 EDX: 000000c0 EBP: c5377ed0 DS: 007b ESI: 00000000 ES: 007b EDI: 00000001 GS: ffffad20 CS: 0060 EIP: c0481ad0 ERR: ffffffff EFLAGS: 00010246 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs] #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs] #10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs] #11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs] #12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs] #13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs] #14 [c5377f58] process_one_work at c024ee4c #15 [c5377f98] worker_thread at c024f43d #16 [c5377fbc] kthread at c025326b #17 [c5377fe8] kernel_thread_helper at c070e834 PID: 26653 TASK: e79143b0 CPU: 3 COMMAND: "umount" #0 [cde0fda0] __schedule at c0706595 #1 [cde0fe28] schedule at c0706b89 #2 [cde0fe30] schedule_timeout at c0705600 #3 [cde0fe94] __down_common at c0706098 #4 [cde0fec8] __down at c0706122 #5 [cde0fed0] down at c025936f #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs] #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs] #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs] #9 [cde0ff1c] generic_shutdown_super at c0333d7a #10 [cde0ff38] kill_block_super at c0333e0f #11 [cde0ff48] deactivate_locked_super at c0334218 #12 [cde0ff58] deactivate_super at c033495d #13 [cde0ff68] mntput_no_expire at c034bc13 #14 [cde0ff7c] sys_umount at c034cc69 #15 [cde0ffa0] sys_oldumount at c034ccd4 #16 [cde0ffb0] system_call at c0707e66 commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up at a later date. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
2012-09-18cifs: fix return value in cifsConvertToUTF16Jeff Layton
This function returns the wrong value, which causes the callers to get the length of the resulting pathname wrong when it contains non-ASCII characters. This seems to fix https://bugzilla.samba.org/show_bug.cgi?id=6767 Cc: <stable@vger.kernel.org> Reported-by: Baldvin Kovacs <baldvin.kovacs@gmail.com> Reported-and-Tested-by: Nicolas Lefebvre <nico.lefebvre@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-18e1000: Small packets may get corrupted during padding by HWTushar Dave
On PCI/PCI-X HW, if packet size is less than ETH_ZLEN, packets may get corrupted during padding by HW. To WA this issue, pad all small packets manually. Signed-off-by: Tushar Dave <tushar.n.dave@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18xfrm: fix a read lock imbalance in make_blackholeLi RongQing
if xfrm_policy_get_afinfo returns 0, it has already released the read lock, xfrm_policy_put_afinfo should not be called again. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18tcp: fix regression in urgent data handlingEric Dumazet
Stephan Springl found that commit 1402d366019fed "tcp: introduce tcp_try_coalesce" introduced a regression for rlogin It turns out problem comes from TCP urgent data handling and a change in behavior in input path. rlogin sends two one-byte packets with URG ptr set, and when next data frame is coalesced, we lack sk_data_ready() calls to wakeup consumer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Stephan Springl <springl-k@lar.bfw.de> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18net: fix memory leak on oom with zerocopyMichael S. Tsirkin
If orphan flags fails, we don't free the skb on receive, which leaks the skb memory. Return value was also wrong: netif_receive_skb is supposed to return NET_RX_DROP, not ENOMEM. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18netxen: check for root bus in netxen_mask_aer_correctableNikolay Aleksandrov
Add a check if pdev->bus->self == NULL (root bus). When attaching a netxen NIC to a VM it can be on the root bus and the guest would crash in netxen_mask_aer_correctable() because of a NULL pointer dereference if CONFIG_PCIEAER is present. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18hwmon: (applesmc) Bump max waitParag Warudkar
A heavy-load test on a MacBookPro6,1 is still showing a substantial amount of read errors. Increasing the maximum wait time to 128 ms resolves the issue. Signed-off-by: Parag Warudkar <parag.lkml@gmail.com> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2012-09-18hwmon: (ad7314) Add 'name' sysfs attributeGuenter Roeck
The 'name' sysfs attribute is mandatory for hwmon devices, but was missing in this driver. Cc: Jonathan Cameron <jic23@cam.ac.uk> Cc: stable@vger.kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Jean Delvare <khali@linux-fr.org> Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
2012-09-18bnx2x: fix rx checksum validation for IPv6Michal Schmidt
Commit d6cb3e41 "bnx2x: fix checksum validation" caused a performance regression for IPv6. Rx checksum offload does not work. IPv6 packets are passed to the stack with CHECKSUM_NONE. The hardware obviously cannot perform IP checksum validation for IPv6, because there is no checksum in the IPv6 header. This should not prevent us from setting CHECKSUM_UNNECESSARY. Tested on BCM57711. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18xfrm_user: return error pointer instead of NULL #2Mathias Krause
When dump_one_policy() returns an error, e.g. because of a too small buffer to dump the whole xfrm policy, xfrm_policy_netlink() returns NULL instead of an error pointer. But its caller expects an error pointer and therefore continues to operate on a NULL skbuff. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18xfrm_user: return error pointer instead of NULLMathias Krause
When dump_one_state() returns an error, e.g. because of a too small buffer to dump the whole xfrm state, xfrm_state_netlink() returns NULL instead of an error pointer. But its callers expect an error pointer and therefore continue to operate on a NULL skbuff. This could lead to a privilege escalation (execution of user code in kernel context) if the attacker has CAP_NET_ADMIN and is able to map address 0. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18ipv6: use DST_* macro to set obselete fieldNicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18ipv6: use net->rt_genid to check dst validityNicolas Dichtel
IPv6 dst should take care of rt_genid too. When a xfrm policy is inserted or deleted, all dst should be invalidated. To force the validation, dst entries should be created with ->obsolete set to DST_OBSOLETE_FORCE_CHK. This was already the case for all functions calling ip6_dst_alloc(), except for ip6_rt_copy(). As a consequence, we can remove the specific code in inet6_connection_sock. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18xfrm: invalidate dst on policy insertion/deletionNicolas Dichtel
When a policy is inserted or deleted, all dst should be recalculated. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18netns: move net->ipv4.rt_genid to net->rt_genidNicolas Dichtel
This commit prepares the use of rt_genid by both IPv4 and IPv6. Initialization is left in IPv4 part. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18net: rt_cache_flush() cleanupEric Dumazet
We dont use jhash anymore since route cache removal, so we can get rid of get_random_bytes() calls for rt_genid changes. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18ipv4/route: arg delay is useless in rt_cache_flush()Nicolas Dichtel
Since route cache deletion (89aef8921bfbac22f), delay is no more used. Remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18Merge tag 'hwspinlock-3.6-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock Pull hwspinlock fix from Ohad Ben-Cohen: "A single hwspinlock fix by Wei Yongjun, which prevents potential NULL dereferences" * tag 'hwspinlock-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock: hwspinlock/core: move the dereference below the NULL test
2012-09-18vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()Miklos Szeredi
IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-18Merge tag 'imx-fixes' of git://git.pengutronix.de/git/imx/linux-2.6 into fixesOlof Johansson
From Sascha Hauer: ARM i.MX: Two fixes for i.MX - armadillo5x0 board broken since v3.5 (stable material) - i.MX25 Architecture broken since v3.6-rc1 * tag 'imx-fixes' of git://git.pengutronix.de/git/imx/linux-2.6: ARM i.MX25: Make timer irq work again ARM: imx: armadillo5x0: Fix illegal register access
2012-09-18ARM i.MX25: Make timer irq work againSascha Hauer
Since i.MX has SPARSE_IRQ enabled the i.MX25 timer is broken. This is because the internal irqs now start at an offset of NR_IRQS_LEGACY. The patch fixed this up, but missed the i.MX25 timer which used a hardcoded value instead of a define. This patch introduces a define for the timer irq and uses it. This is broken since introduced with 3.6-rc1: | commit 8842a9e2869cae14bbb8184004a42fc3070587fb | Author: Shawn Guo <shawn.guo@linaro.org> | Date: Thu Jun 14 11:16:14 2012 +0800 | | ARM: imx: enable SPARSE_IRQ for imx platform Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Shawn Guo <shawn.guo@linaro.org>
2012-09-18ARM: imx: armadillo5x0: Fix illegal register accessFabio Estevam
Since commit eb92044eb (ARM i.MX3: Make ccm base address a variable ) it is necessary to pass the CCM register base as a variable. Fix the CCM register access in mach-armadillo5x0 by passing mx3_ccm_base and avoid illegal accesses. Also applies to v3.5 Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Cc: stable@vger.kernel.org
2012-09-18Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into fixesOlof Johansson
From Nicolas Ferre: Modify AT91 device tree files for making the GPIO interrupts work. * tag 'at91-fixes' of git://github.com/at91linux/linux-at91: ARM: at91: fix missing #interrupt-cells on gpio-controller
2012-09-18Merge branch 'fixes' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: kzm9g: bugfix: correct mmcif interrupt settings
2012-09-18Merge branch 'v3.6-samsung-fixes-3' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes * 'v3.6-samsung-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: ARM: SAMSUNG: Use spin_lock_{irqsave,irqrestore} in clk_set_rate ARM: SAMSUNG: use spin_lock_irqsave() in clk_set_parent
2012-09-18[SCSI] hpsa: fix handling of protocol errorStephen M. Cameron
If a command status of CMD_PROTOCOL_ERR is received, this information should be conveyed to the SCSI mid layer, not dropped on the floor. CMD_PROTOCOL_ERR may be received from the Smart Array for any commands destined for an external RAID controller such as a P2000, or commands destined for tape drives or CD/DVD-ROM drives, if for instance a cable is disconnected. This mostly affects multipath configurations, as disconnecting a cable on a non-multipath configuration is not going to do anything good regardless of whether CMD_PROTOCOL_ERR is handled correctly or not. Not handling CMD_PROTOCOL_ERR correctly in a multipath configaration involving external RAID controllers may cause data corruption, so this is quite a serious bug. This bug should not normally cause a problem for direct attached disk storage. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-09-18cciss: fix handling of protocol errorStephen M. Cameron
If a command completes with a status of CMD_PROTOCOL_ERR, this information should be conveyed to the SCSI mid layer, not dropped on the floor. Unlike a similar bug in the hpsa driver, this bug only affects tape drives and CD and DVD ROM drives in the cciss driver, and to induce it, you have to disconnect (or damage) a cable, so it is not a very likely scenario (which would explain why the bug has gone undetected for the last 10 years.) Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-18blk: add an upper sanity check on partition addingAlan Cox
65536 should be ludicrous anyway but without it we overflow the memory computation doing the allocation and badness occurs. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-18sh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling.Al Viro
As Al notes, we missed a TIF_NOTIFY_RESUME check which caused any handlers without TIF_SIGPENDING also set to skip the notification: Looks like while it is in the relevant masks *and* checked in do_notify_resume() both on 32bit and 64bit variants since commit ab99c733ae73cce31f2a2434f7099564e5a73d95 ("sh: Make syscall tracer use tracehook notifiers, add TIF_NOTIFY_RESUME.") they are actually *not* reached without simulataneous SIGPENDING, since the actual glue in the callers had not been updated back then and still checks for _TIF_SIGPENDING alone when deciding whether to hit do_notify_resume() or not. Reported-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-09-18sh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error pathLaurent Pinchart
The sh_pfc_gpio_request_enable() function acquires a spinlock but fails to release it before returning if the requested mux type is not supported. Fix this. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-09-18DMA: PL330: Check the pointer returned by kzallocSachin Kamat
kzalloc could return NULL. Hence add a check to avoid NULL pointer dereference. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Jassi Brar <jassisinghbrar@gmail.com> Cc: Stable <stable@vger.kernel.org> Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
2012-09-18DMA: PL330: Fix potential NULL pointer dereference in pl330_submit_req()Sachin Kamat
'r->cfg' is being checked for NULL. However, it is dereferenced in the previous statements. Thus moving those statements within the check. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Jassi Brar <jassisinghbrar@gmail.com> Cc: Stable <stable@vger.kernel.org> Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
2012-09-18ARM: shmobile: kzm9g: bugfix: correct mmcif interrupt settingsTetsuyuki Kobayashi
Correct interrupt settings of sh_mmc:int and sh_mmc:error in board-kzm9g.c. Signed-off-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-18ARM: SAMSUNG: Use spin_lock_{irqsave,irqrestore} in clk_set_rateTushar Behera
The spinlock clocks_lock can be held during ISR, hence it is not safe to hold that lock with disabling interrupts. It fixes following potential deadlock. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 3.6.0-rc4+ #2 Not tainted --------------------------------------------------------- swapper/0/1 just changed the state of lock: (&(&host->lock)->rlock){-.....}, at: [<c027fb0d>] sdhci_irq+0x15/0x564 but this lock took another, HARDIRQ-unsafe lock in the past: (clocks_lock){+.+...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(clocks_lock); local_irq_disable(); lock(&(&host->lock)->rlock); lock(clocks_lock); <Interrupt> lock(&(&host->lock)->rlock); *** DEADLOCK *** Signed-off-by: Tushar Behera <tushar.behera@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2012-09-17Merge branch 'for-3.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull another workqueue fix from Tejun Heo: "Unfortunately, yet another late fix. This too is discovered and fixed by Lai. This bug was introduced during this merge window by commit 25511a477657 ("workqueue: reimplement CPU online rebinding to handle idle workers") which started using WORKER_REBIND flag for idle rebind too. The bug is relatively easy to trigger if the CPU rapidly goes through off, on and then off (and stay off). The fix is on the safer side. This hasn't been on linux-next yet but I'm pushing early so that it can get more exposure before v3.6 release." * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
2012-09-17workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()Lai Jiangshan
busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed (CPU is down again). This used to be okay because the flag wasn't used for anything else. However, after 25511a477 "workqueue: reimplement CPU online rebinding to handle idle workers", WORKER_REBIND is also used to command idle workers to rebind. If not cleared, the worker may confuse the next CPU_UP cycle by having REBIND spuriously set or oops / get stuck by prematurely calling idle_worker_rebind(). WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5 00() Hardware name: Bochs Modules linked in: test_wq(O-) Pid: 33, comm: kworker/1:1 Tainted: G O 3.6.0-rc1-work+ #3 Call Trace: [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500 [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 ---[ end trace e977cf20f4661968 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500 PGD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: test_wq(O-) CPU 0 Pid: 33, comm: kworker/1:1 Tainted: G W O 3.6.0-rc1-work+ #3 Bochs Bochs RIP: 0010:[<ffffffff810b3db0>] [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP: 0018:ffff88001e1c9de0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009 RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580 R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900 FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900) Stack: ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010 ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340 Call Trace: [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b RIP [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP <ffff88001e1c9de0> CR2: 0000000000000000 There was no reason to keep WORKER_REBIND on failure in the first place - WORKER_UNBOUND is guaranteed to be set in such cases preventing incorrectly activating concurrency management. Always clear WORKER_REBIND. tj: Updated comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-17Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge fixes from Andrew Morton: "13 patches. 12 are fixes and one is a little preparatory thing for Andi." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (13 commits) memory hotplug: fix section info double registration bug mm/page_alloc: fix the page address of higher page's buddy calculation drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe compiler.h: add __visible pid-namespace: limit value of ns_last_pid to (0, max_pid) include/net/sock.h: squelch compiler warning in sk_rmem_schedule() slub: consider pfmemalloc_match() in get_partial_node() slab: fix starting index for finding another object slab: do ClearSlabPfmemalloc() for all pages of slab nbd: clear waiting_queue on shutdown MAINTAINERS: fix TXT maintainer list and source repo path mm/ia64: fix a memory block size bug memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails
2012-09-17memory hotplug: fix section info double registration bugqiuxishi
There may be a bug when registering section info. For example, on my Itanium platform, the pfn range of node0 includes the other nodes, so other nodes' section info will be double registered, and memmap's page count will equal to 3. node0: start_pfn=0x100, spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00 node1: start_pfn=0x80000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x080000-0x100000 node2: start_pfn=0x100000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x100000-0x180000 node3: start_pfn=0x180000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x180000-0x200000 free_all_bootmem_node() register_page_bootmem_info_node() register_page_bootmem_info_section() When hot remove memory, we can't free the memmap's page because page_count() is 2 after put_page_bootmem(). sparse_remove_one_section() free_section_usemap() free_map_bootmem() put_page_bootmem() [akpm@linux-foundation.org: add code comment] Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17mm/page_alloc: fix the page address of higher page's buddy calculationLi Haifeng
The heuristic method for buddy has been introduced since commit 43506fad21ca ("mm/page_alloc.c: simplify calculation of combined index of adjacent buddy lists"). But the page address of higher page's buddy was wrongly calculated, which will lead page_is_buddy to fail for ever. IOW, the heuristic method would be disabled with the wrong page address of higher page's buddy. Calculating the page address of higher page's buddy should be based higher_page with the offset between index of higher page and index of higher page's buddy. Signed-off-by: Haifeng Li <omycle@gmail.com> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: <stable@vger.kernel.org> [2.6.38+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probeKevin Hilman
On some platforms, bootloaders are known to do some interesting RTC programming. Without going into the obscurities as to why this may be the case, suffice it to say the the driver should not make any assumptions about the state of the RTC when the driver loads. In particular, the driver probe should be sure that all interrupts are disabled until otherwise programmed. This was discovered when finding bursty I2C traffic every second on Overo platforms. This I2C overhead was keeping the SoC from hitting deep power states. The cause was found to be the RTC firing every second on the I2C-connected TWL PMIC. Special thanks to Felipe Balbi for suggesting to look for a rogue driver as the source of the I2C traffic rather than the I2C driver itself. Special thanks to Steve Sakoman for helping track down the source of the continuous RTC interrups on the Overo boards. Signed-off-by: Kevin Hilman <khilman@ti.com> Cc: Felipe Balbi <balbi@ti.com> Tested-by: Steve Sakoman <steve@sakoman.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Tested-by: Shubhrajyoti Datta <omaplinuxkernel@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17compiler.h: add __visibleAndi Kleen
gcc 4.6+ has support for a externally_visible attribute that prevents the optimizer from optimizing unused symbols away. Add a __visible macro to use it with that compiler version or later. This is used (at least) by the "Link Time Optimization" patchset. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17pid-namespace: limit value of ns_last_pid to (0, max_pid)Andrew Vagin
The kernel doesn't check the pid for negative values, so if you try to write -2 to /proc/sys/kernel/ns_last_pid, you will get a kernel panic. The crash happens because the next pid is -1, and alloc_pidmap() will try to access to a nonexistent pidmap. map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; Signed-off-by: Andrew Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17include/net/sock.h: squelch compiler warning in sk_rmem_schedule()Chuck Lever
This warning: In file included from linux/include/linux/tcp.h:227:0, from linux/include/linux/ipv6.h:221, from linux/include/net/ipv6.h:16, from linux/include/linux/sunrpc/clnt.h:26, from linux/net/sunrpc/stats.c:22: linux/include/net/sock.h: In function `sk_rmem_schedule': linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] is seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the -Wextra option. Commit c76562b6709f ("netvm: prevent a stream-specific deadlock") accidentally replaced the "size" parameter of sk_rmem_schedule() with an unsigned int. This changes the semantics of the comparison in the return statement. In sk_wmem_schedule we have syntactically the same comparison, but "size" is a signed integer. In addition, __sk_mem_schedule() takes a signed integer for its "size" parameter, so there is an implicit type conversion in sk_rmem_schedule() anyway. Revert the "size" parameter back to a signed integer so that the semantics of the expressions in both sk_[rw]mem_schedule() are exactly the same. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Cc: Joonsoo Kim <js1304@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17slub: consider pfmemalloc_match() in get_partial_node()Joonsoo Kim
get_partial() is currently not checking pfmemalloc_match() meaning that it is possible for pfmemalloc pages to leak to non-pfmemalloc users. This is a problem in the following situation. Assume that there is a request from normal allocation and there are no objects in the per-cpu cache and no node-partial slab. In this case, slab_alloc enters the slow path and new_slab_objects() is called which may return a PFMEMALLOC page. As the current user is not allowed to access PFMEMALLOC page, deactivate_slab() is called ([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc checks]) and returns an object from PFMEMALLOC page. Next time, when we get another request from normal allocation, slab_alloc() enters the slow-path and calls new_slab_objects(). In new_slab_objects(), we call get_partial() and get a partial slab which was just deactivated but is a pfmemalloc page. We extract one object from it and re-deactivate. "deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly. As a result, access to PFMEMALLOC page is not properly restricted and it can cause a performance degradation due to frequent deactivation. deactivation frequently. This patch changes get_partial_node() to take pfmemalloc_match() into account and prevents the "deactivate -> re-get in get_partial() scenario. Instead, new_slab() is called. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17slab: fix starting index for finding another objectJoonsoo Kim
In array cache, there is a object at index 0, check it. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17slab: do ClearSlabPfmemalloc() for all pages of slabMel Gorman
Right now, we call ClearSlabPfmemalloc() for first page of slab when we clear SlabPfmemalloc flag. This is fine for most swap-over-network use cases as it is expected that order-0 pages are in use. Unfortunately it is possible that that __ac_put_obj() checks SlabPfmemalloc on a tail page and while this is harmless, it is sloppy. This patch ensures that the head page is always used. This problem was originally identified by Joonsoo Kim. [js1304@gmail.com: Original implementation and problem identification] Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>