Age | Commit message | Author
2013-06-17 | vxlan: only migrate dynamic FDB entries | stephen hemminger
Only migrate dynamic forwarding table entries; don't modify static entries. If a packet is received from an incorrect source IP address, assume it is an imposter and drop it. This patch applies only to -net; a different patch would be needed for earlier kernels, since the NTF_SELF flag was introduced with 3.10. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
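The policy described in this commit can be sketched as a small userspace model. This is not the driver code: `struct fdb_entry`, `enum snoop_action` and `snoop_decide()` are hypothetical names chosen for illustration, and the entry layout is an assumption. Only the rule itself (migrate dynamic entries, drop traffic that contradicts a static entry) comes from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a forwarding-table entry. */
struct fdb_entry {
    uint32_t remote_ip;  /* IPv4 address the MAC was last seen behind */
    bool     is_static;  /* true if configured by the administrator */
};

enum snoop_action {
    SNOOP_ACCEPT,   /* source matches the entry, nothing to do */
    SNOOP_MIGRATE,  /* dynamic entry: follow the station to its new IP */
    SNOOP_DROP      /* static entry, wrong source: treat as an imposter */
};

/*
 * Decide what to do with a packet whose source MAC matched `entry`
 * but which arrived from `src_ip`. Only dynamic entries are updated.
 */
enum snoop_action snoop_decide(struct fdb_entry *entry, uint32_t src_ip)
{
    if (entry->remote_ip == src_ip)
        return SNOOP_ACCEPT;
    if (entry->is_static)
        return SNOOP_DROP;          /* never modify a static entry */
    entry->remote_ip = src_ip;      /* dynamic entry migrates */
    return SNOOP_MIGRATE;
}
```

A caller would drop the packet on `SNOOP_DROP` and continue normal receive processing otherwise.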
2013-06-17 | vxlan: fix race between flush and incoming learning | stephen hemminger
It is possible for a packet to arrive during vxlan_stop(), and have a dynamic entry created. Close this by checking if device is up.

    CPU1                             CPU2
    vxlan_stop
      vxlan_flush
         hash_lock acquired
                                     vxlan_encap_recv
                                        vxlan_snoop
                                           waiting for hash_lock
         hash_lock released
      vxlan_flush done
                                           hash_lock acquired
                                           vxlan_fdb_create

This is a day-one bug in vxlan that goes back to 3.7.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
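The shape of the fix (re-check that the device is up once the lock is held, before creating an entry) can be modelled in userspace as below. This is a sketch under stated assumptions: `struct toy_dev`, `toy_stop()` and `toy_learn()` are hypothetical, a pthread mutex stands in for the driver's hash lock, and clearing the `up` flag inside `toy_stop()` is a simplification of how the kernel marks a device as down.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device and its forwarding table. */
struct toy_dev {
    pthread_mutex_t hash_lock;   /* models the FDB hash lock */
    bool up;                     /* models "device is running" */
    int  dynamic_entries;        /* number of learned entries */
};

/* Stop path: mark the device down and flush learned entries. */
void toy_stop(struct toy_dev *d)
{
    pthread_mutex_lock(&d->hash_lock);
    d->up = false;
    d->dynamic_entries = 0;
    pthread_mutex_unlock(&d->hash_lock);
}

/*
 * Receive path: learn a new entry only if the device is still up.
 * Without the `up` check, a receiver that was waiting for the lock
 * during the flush would create an entry after the flush finished.
 */
bool toy_learn(struct toy_dev *d)
{
    bool created = false;

    pthread_mutex_lock(&d->hash_lock);
    if (d->up) {
        d->dynamic_entries++;
        created = true;
    }
    pthread_mutex_unlock(&d->hash_lock);
    return created;
}
```

Because the check happens under the same lock as the flush, a learner that loses the race sees the device as down and leaves the table empty.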
2013-06-17 | Merge branch 'tipc' | David S. Miller
Paul Gortmaker says:
====================
This is a rework of the content sent earlier[1], with the following changes:
-drop the Kconfig --> modparam conversion patch; this was requested to be replaced[2] with a dynamic port quantity resizing. Ying and Erik were discussing how best to achieve this, and then vacation schedules got in the way, so implementing that will come (hopefully) in the next round.
-rework the sk_rcvbuf patch to allow memory resizing via sysctl as per what Ying and Neil discussed[3]
-add 4 more seemingly straightforward and relatively small changes from Ying (the last 4 in the series).
-add cosmetic UAPI comment update patch from Ying.
That said, the largest change is still the one where we make use of the fact that linux supports kernel threads and do the server like operations within kernel threads. As Jon says:
We remove the last remnants of the TIPC native API, to make it possible to simplify locking policy and solve a problem with lost topology events. First, we introduce a socket-based alternative to the native API. Second, we convert the two remaining users of the native API, the TIPC internal topology server and the configuration server, to use the new API. Third, we remove the remaining code pertaining to the native API.
I have re-tested this collection of commits between 32 and 64 bit x86 machines using the standard tipc test suite, and build tested for ppc.
[1] http://patchwork.ozlabs.org/patch/247687/
[2] http://patchwork.ozlabs.org/patch/247680/
[3] http://patchwork.ozlabs.org/patch/247688/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: remove dev_base_lock use from enable_bearer | Ying Xue
Convert enable_bearer() to RCU locking with dev_get_by_name(). Based on a similar changeset in commit 840a185d ["aoe: remove dev_base_lock use from aoecmd_cfg_pkts()"] -- quoting that: "dev_base_lock is the legacy way to lock the device list, and is planned to disappear. (writers hold RTNL, readers hold RCU lock)" Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: fix wrong return value for link_send_sections_long routine | Ying Xue
When skb buffer cannot be allocated in link_send_sections_long(), -ENOMEM error code instead of -EFAULT should be returned to its caller. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: make tipc_link_send_sections_fast exit earlier | Ying Xue
Once the message build function returns an error code, the process of sending the message cannot continue. So in case of message build failure, tipc_link_send_sections_fast() should return immediately. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: enhance priority of link protocol packet | Ying Xue
pfifo_fast is set as the default traffic class queueing discipline. This queue has three so-called "bands". Within each band, FIFO rules apply. However, as long as there are packets waiting in band 0, band 1 won't be processed. Currently TIPC never sets a priority on any type of packet; that is, their priorities are all 0, so they are mapped to band 1 of the pfifo_fast qdisc. However, especially during link congestion, if link protocol packets can be sent out earlier than other types of packets, so that they arrive at the peer endpoint in time, the peer will reset its link timeout timer in time to keep the link alive. Raising the priority of link protocol packets therefore avoids unnecessary link resets due to transient link congestion. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
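The band mapping this message relies on can be illustrated with a short lookup. The `TC_PRIO_*` values follow the kernel's packet scheduler UAPI header, and the `prio2band` table is believed to match the one used by pfifo_fast, but it is reproduced here for illustration and should be checked against the tree in use. `pfifo_fast_band()` is a hypothetical helper, and the message does not say which priority the patch assigns to link protocol packets; `TC_PRIO_CONTROL` is used below only as an example of a value that lands in band 0.

```c
#include <stdint.h>

/* Priority values as defined in the packet scheduler UAPI header. */
#define TC_PRIO_BESTEFFORT 0
#define TC_PRIO_CONTROL    7
#define TC_PRIO_MAX        15

/* Priority-to-band table (assumed to mirror pfifo_fast's own table). */
static const uint8_t prio2band[TC_PRIO_MAX + 1] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Hypothetical helper: band a packet with this priority is queued on. */
int pfifo_fast_band(uint32_t skb_priority)
{
    return prio2band[skb_priority & TC_PRIO_MAX];
}
```

With this table, a packet left at priority 0 is queued on band 1, while one marked `TC_PRIO_CONTROL` is queued on band 0 and is therefore dequeued ahead of any band 1 traffic.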
2013-06-17 | tipc: cosmetic realignment of function arguments | Paul Gortmaker
No runtime code changes here. Just a realign of the function arguments to start where the 1st one was, and fit as many args as can be put in an 80 char line. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: save sock structure pointer instead of void pointer to tipc_port | Ying Xue
Directly save sock structure pointer instead of void pointer to avoid unnecessary cast conversions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: convert config_lock from spinlock to mutex | Ying Xue
As the configuration server is now running under process context, it's unnecessary for us to have a spinlock serializing the TIPC configuration process. Instead, we replace it with a mutex lock, which gives us more freedom. For instance, we can now call pre-emptable functions within the protected area. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: rename tipc_createport_raw to tipc_createport | Ying Xue
After the removal of the native API, there is now only one way to create a TIPC port instance -- the function tipc_createport_raw(). We make it more readable by renaming it to tipc_createport(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: remove user_port instance from tipc_port structure | Ying Xue
After the native API has been completely removed, the 'user_port' field in struct tipc_port becomes unused, and can be removed. As a consequence, the "usrmem" argument in tipc_msg_build() is no longer needed, and so we remove that one too. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: delete code orphaned by new server infrastructure | Ying Xue
Having completed the conversion of the topology server and configuration server to use the new server infrastructure, the following functions become unused, and can be deleted:
- tipc_createport()
- port_wakeup_sh()
- port_dispatcher()
- port_dispatcher_sigh()
- tipc_send_buf_fast()
- tipc_send_buf2port
Additionally, the following variables become orphaned, and can be deleted:
- tipc_msg_err_event
- tipc_named_msg_err_event
- tipc_conn_shutdown_event
- tipc_msg_event
- tipc_named_msg_event
- tipc_conn_msg_event
- tipc_continue_event
- msg_queue_head
- msg_queue_tail
- queue_lock
Deletion is done here in a separate commit in order to allow the actual conversion changes to be more easily viewed.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: convert configuration server to use new server facility | Ying Xue
As the new socket-based TIPC server infrastructure has been introduced, we can now convert the configuration server to use it. Then we can take future steps to simplify the configuration server locking policy. Some minor reordering of initialization is done, due to the dependency on having tipc_socket_init completed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: convert topology server to use new server facility | Ying Xue
As the new TIPC server infrastructure has been introduced, we can now convert the TIPC topology server to it. We get two benefits from doing this:
1) It simplifies the topology server locking policy. In the original locking policy, we placed one spin lock pointer in the tipc_subscriber structure to reuse the lock of the subscriber's server port, controlling access to members of the tipc_subscriber instance. That is, we only used one lock to ensure both tipc_port and tipc_subscriber members were safely accessed. Now we introduce another spin lock that protects only the members of the tipc_subscriber structure, giving a finer-grained locking policy. Moreover, the change will allow us to make the topology server code more readable and maintainable.
2) It fixes a bug where sent subscription events may be lost when the topology port is congested. Using the new service, the topology server now queues sent events into an outgoing buffer, and then wakes up a sender process which has been blocked in workqueue context. The process will keep picking events from the buffer and send them to their respective subscribers, using the kernel socket interface, until the buffer is empty. Even if the socket is congested during transmission there is no risk that events may be dropped, since the sender process may block when needed.
Some minor reordering of initialization is done, since we now have a scenario where the topology server must be started after socket initialization has taken place, as the former depends on the latter. And overall, we see a simplification of the TIPC subscriber code in making this changeover.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: introduce new TIPC server infrastructure | Ying Xue
TIPC has two internal servers, one providing a subscription service for topology events, and another providing the configuration interface. These servers have previously been running in BH context, accessing the TIPC-port (aka native) API directly. Apart from these servers, even the TIPC socket implementation is partially built on this API. As this API may simultaneously be called via different paths and in different contexts, a complex and costly lock policy is required in order to protect TIPC internal resources.
To eliminate the need for this complex lock policy, we introduce a new, generic service API that uses kernel sockets for message passing instead of the native API. Once the topology and configuration servers are converted to use this new service, all code pertaining to the native API can be removed. This entails a significant reduction in code amount and complexity, and opens up for a complete rework of the locking policy in TIPC.
The new service also solves another problem: As the current topology server works in BH context, it cannot easily be blocked when sending of events fails due to congestion. In such cases events may have to be silently dropped, something that is unacceptable. Therefore, the new service keeps a dedicated outbound queue receiving messages from BH context. Once messages are inserted into this queue, we will immediately schedule a work from a special workqueue. This way, messages/events from the topology server are in reality sent in process context, and the server can block if necessary.
Analogously, there is a new workqueue for receiving messages. Once a notification about an arriving message is received in BH context, we schedule a work from the receive workqueue to do the job of receiving the message in process context. As both sending and receiving of messages are now done in process context, subscribed events cannot be dropped any more.
As of this commit, this new server infrastructure is built but not yet called by the existing TIPC code; since the conversion changes required in order to use it are significant, the addition is kept here as a separate commit.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: allow implicit connect for stream sockets | Erik Hugne
TIPC's implied connect feature, aka piggyback connect, allows applications to save one syscall and all SYN/SYN-ACK signalling overhead when setting up a connection. Until now, this has only been supported for SEQPACKET sockets. Here, we make it possible to use this feature even with stream sockets. At the connecting side, the connection is completed when the first data message arrives from the accepting peer. This means that we must allow the connecting user to call blocking recv() before the socket has reached state SS_CONNECTED. So we must relax the state machine check at recv_stream(), and allow the recv() call even if the socket is in state SS_CONNECTING. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | tipc: change socket buffer overflow control to respect sk_rcvbuf | Ying Xue
As per feedback from the netdev community, we change the buffer overflow protection algorithm in receiving sockets so that it always respects the nominal upper limit set in sk_rcvbuf. Instead of scaling up from a small sk_rcvbuf value, which leads to violation of the configured sk_rcvbuf limit, we now calculate the weighted per-message limit by scaling down from a much bigger value, still in the same field, according to the importance priority of the received message. To allow for administrative tunability of the socket receive buffer size, we create a tipc_rmem sysctl variable to allow the user to configure an even bigger value via the sysctl command. It is an array of three values (min/default/max), to be consistent with things like tcp_rmem. By default, the value initialized in tipc_rmem[1] is equal to the receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message. This value is also set as the default value of sk_rcvbuf. Originally-by: Jon Maloy <jon.maloy@ericsson.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> [Ying: added sysctl variation to Jon's original patch] Signed-off-by: Ying Xue <ying.xue@windriver.com> [PG: don't compile sysctl.c if not config'd; add Documentation] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
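The idea of scaling down from sk_rcvbuf can be sketched as follows. The importance levels are the four TIPC levels from the UAPI header; the helper name `toy_rcvbuf_limit()` and the choice of halving the limit for each step below critical importance are assumptions made for illustration, not necessarily the exact arithmetic of the patch.

```c
/* TIPC message importance levels, as in the UAPI header. */
#define TIPC_LOW_IMPORTANCE      0
#define TIPC_MEDIUM_IMPORTANCE   1
#define TIPC_HIGH_IMPORTANCE     2
#define TIPC_CRITICAL_IMPORTANCE 3

/*
 * Hypothetical per-message limit: the most important messages may
 * use the whole of sk_rcvbuf, and each lower level gets half of the
 * level above it. The result can never exceed sk_rcvbuf.
 */
unsigned int toy_rcvbuf_limit(unsigned int sk_rcvbuf, unsigned int importance)
{
    if (importance > TIPC_CRITICAL_IMPORTANCE)
        importance = TIPC_CRITICAL_IMPORTANCE;
    return sk_rcvbuf >> (TIPC_CRITICAL_IMPORTANCE - importance);
}
```

Whatever the exact weights, the property that matters is visible here: the limit for every importance level is derived by shrinking sk_rcvbuf, so the configured value is an upper bound rather than a starting point.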
2013-06-17 | tipc: update code comments to reflect new uapi header path | Ying Xue
Files tipc.h and tipc_config.h were moved to uapi directory, but the corresponding comments were not updated at the same time. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | net: add socket option for low latency polling | Eliezer Tamir
Add a socket option for low latency polling. This allows overriding the global sysctl value with a per-socket one. Unexport sysctl_net_ll_poll since for now it's not needed in modules. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | net: remove NET_LL_RX_POLL config menue | Eliezer Tamir
Remove NET_LL_RX_POLL from the config menu. Change default to y. Busy polling still needs to be enabled at run time. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | net: convert low latency sockets to sched_clock() | Eliezer Tamir
Use sched_clock() instead of get_cycles(). We can use sched_clock() because we don't care much about accuracy. Remove the dependency on X86_TSC. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | net: change sysctl_net_ll_poll into an unsigned int | Eliezer Tamir
There is no reason for sysctl_net_ll_poll to be an unsigned long. Change it into an unsigned int. Fix the proc handler. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem | John W. Linville
2013-06-16 | lseek(fd, n, SEEK_END) does *not* go to eof - n | Al Viro
When you copy some code, you are supposed to read it. If nothing else, there's a chance to spot and fix an obvious bug instead of sharing it... X-Song: "I Got It From Agnes", by Tom Lehrer Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ Tom Lehrer? You're dating yourself, Al ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
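The point of the subject line can be checked from userspace with a few lines of POSIX C. The helper name `seek_end_demo()` and the temporary file path are illustrative only, and the code being fixed by this commit is not shown in this log. The sketch writes a 10-byte file and records where a positive and a negative offset land relative to SEEK_END.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a 10-byte temporary file and report its size, the offset
 * after lseek(fd, n, SEEK_END), and the offset after
 * lseek(fd, -n, SEEK_END). Returns 0 on success, -1 on failure.
 */
int seek_end_demo(off_t n, off_t *size, off_t *plus, off_t *minus)
{
    char path[] = "/tmp/seek_demo_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file vanishes on close */
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }
    *size  = lseek(fd, 0, SEEK_END);    /* end of file */
    *plus  = lseek(fd, n, SEEK_END);    /* eof + n, past the end */
    *minus = lseek(fd, -n, SEEK_END);   /* eof - n, what was wanted */
    close(fd);
    return 0;
}
```

For a regular file, a positive offset with SEEK_END positions past the end of the file; reaching "eof - n" requires passing -n.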
2013-06-15 | Linux 3.10-rc6 | Linus Torvalds
2013-06-15 | Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc | Linus Torvalds
Pull ARM SoC fixes from Olof Johansson: "These are a little later than I planned on since I got caught up with handling merges for 3.11 most of the week. Another week, another batch of fixes for arm-soc platforms. Again, nothing controversial. A few more than would be ideal, but all are valid fixes. In particular the prima2 panic patch is critical since it fixes a problem where multiplatform kernels panic on all but prima2 hardware."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: SAMSUNG: pm: Adjust for pinctrl- and DT-enabled platforms
  ARM: prima2: fix incorrect panic usage
  arm: mvebu: armada-xp-{gp,openblocks-ax3-4}: specify PCIe range
  ARM: Kirkwood: handle mv88f6282 cpu in __kirkwood_variant().
  ARM: omap3: clock: fix wrong container_of in clock36xx.c
  ARM: dts: OMAP5: Fix missing PWM capability to timer nodes
  ARM: dts: omap4-panda|sdp: Fix mux for twl6030 IRQ pin and msecure line
  ARM: dts: AM33xx: Fix properties on gpmc node
  arm: omap2: fix AM33xx hwmod infos for UART2
  ARM: OMAP3: Fix iva2_pwrdm settings for 3703
2013-06-15 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds
Pull networking fixes from David Miller:
1) Fix RTNL locking in batman-adv, from Matthias Schiffer.
2) Don't allow non-passthrough macvlan devices to set NOPROMISC via netlink, otherwise we can end up with corrupted promisc counter values on the device. From Michael S Tsirkin.
3) Fix stmmac driver build with debugging defines enabled, from Dinh Nguyen.
4) Make sure name string we give in socket address in AF_PACKET is NULL terminated, from Daniel Borkmann.
5) Fix leaking of two uninitialized bytes of memory to userspace in l2tp, from Guillaume Nault.
6) Clear IPCB(skb) before tunneling otherwise we touch dangling IP options state and crash. From Saurabh Mohan.
7) Fix suspend/resume for davinci_mdio by using suspend_late and resume_early. From Mugunthan V N.
8) Don't tag ip_tunnel_init_net and ip_tunnel_delete_net with __net_{init,exit}, they can be called outside of those contexts. From Eric Dumazet.
9) Fix RX length error in sh_eth driver, from Yoshihiro Shimoda.
10) Fix missing sctp_outq initialization in some code paths of SCTP stack, from Neil Horman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
  sctp: fully initialize sctp_outq in sctp_outq_init
  netiucv: Hold rtnl between name allocation and device registration.
  tulip: Properly check dma mapping result
  net: sh_eth: fix incorrect RX length error if R8A7740
  ip_tunnel: remove __net_init/exit from exported functions
  drivers: net: davinci_mdio: restore mdio clk divider in mdio resume
  drivers: net: davinci_mdio: moving mdio resume earlier than cpsw ethernet driver
  net/ipv4: ip_vti clear skb cb before tunneling.
  tg3: Wait for boot code to finish after power on
  l2tp: Fix sendmsg() return value
  l2tp: Fix PPP header erasure and memory leak
  bonding: fix igmp_retrans type and two related races
  bonding: reset master mac on first enslave failure
  packet: packet_getname_spkt: make sure string is always 0-terminated
  net: ethernet: stmicro: stmmac: Fix compile error when STMMAC_XMIT_DEBUG used
  be2net: Fix 32-bit DMA Mask handling
  xen-netback: don't de-reference vif pointer after having called xenvif_put()
  macvlan: don't touch promisc without passthrough
  batman-adv: Don't handle address updates when bla is disabled
  batman-adv: forward late OGMs from best next hop
  ...
2013-06-14 | Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc | Linus Torvalds
Pull powerpc fixes from Benjamin Herrenschmidt: "So here are 3 fixes still for 3.10. Fixes are simple, bugs are nasty (though not recent regressions, nasty enough) and all targeted at stable"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix missing/delayed calls to irq_work
  powerpc: Fix emulation of illegal instructions on PowerNV platform
  powerpc: Fix stack overflow crash in resume_kernel when ftracing
2013-06-14 | smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu(). | David Daney
Thanks to commit f91eb62f71b3 ("init: scream bloody murder if interrupts are enabled too early"), "bloody murder" is now being screamed. With a MIPS OCTEON config, we use on_each_cpu() in our irq_chip.irq_bus_sync_unlock() function. This gets called in early as a result of the time_init() call. Because the !SMP version of on_each_cpu() unconditionally enables irqs, we get: WARNING: at init/main.c:560 start_kernel+0x250/0x410() Interrupts were enabled early CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801 Call Trace: show_stack+0x68/0x80 warn_slowpath_common+0x78/0xb0 warn_slowpath_fmt+0x38/0x48 start_kernel+0x250/0x410 Suggested fix: Do what we already do in the SMP version of on_each_cpu(), and use local_irq_save/local_irq_restore. Because we need a flags variable, make it a static inline to avoid name space issues. [ Change from v1: Convert on_each_cpu to a static inline function, add #include <linux/irqflags.h> to avoid build breakage on some files. on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as on_each_cpu(), but they are not causing !SMP bugs for me, so I will defer changing them to a less urgent patch. ] Signed-off-by: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
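The difference between the two !SMP variants can be modelled in userspace. Everything below is a toy: `toy_irqs_enabled` stands in for the CPU interrupt-enable state, the `toy_*` helpers are hypothetical analogues of local_irq_disable/enable/save/restore, and `count_call()` is just a sample callback. The sketch shows why an unconditional enable is wrong when the caller entered with interrupts off.

```c
#include <stdbool.h>

/* Toy stand-in for the CPU's interrupt-enable state. */
bool toy_irqs_enabled;

typedef void (*toy_cpu_fn)(void *info);

/* Sample callback: counts how many times it was invoked. */
void count_call(void *info)
{
    ++*(int *)info;
}

/* Old behaviour: always leaves interrupts enabled on return. */
void toy_on_each_cpu_old(toy_cpu_fn fn, void *info)
{
    toy_irqs_enabled = false;           /* local_irq_disable() analogue */
    fn(info);
    toy_irqs_enabled = true;            /* local_irq_enable() analogue */
}

/* New behaviour: restores whatever state the caller had. */
void toy_on_each_cpu_new(toy_cpu_fn fn, void *info)
{
    bool flags = toy_irqs_enabled;      /* local_irq_save() analogue */

    toy_irqs_enabled = false;
    fn(info);
    toy_irqs_enabled = flags;           /* local_irq_restore() analogue */
}
```

Called early in boot, when interrupts are still off, the old variant returns with them on, which is what trips the "Interrupts were enabled early" warning quoted above; the new variant leaves them off.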
2013-06-14 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull VFS fixes from Al Viro: "Several fixes + obvious cleanup (you've missed a couple of open-coded can_lookup() back then)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  snd_pcm_link(): fix a leak...
  use can_lookup() instead of direct checks of ->i_op->lookup
  move exit_task_namespaces() outside of exit_notify()
  fput: task_work_add() can fail if the caller has passed exit_task_work()
  ncpfs: fix rmdir returns Device or resource busy
2013-06-14 | Merge tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs | Linus Torvalds
Pull xfs fixes from Ben Myers:
- Remove noisy warnings about experimental support which spams the logs
- Add padding to align directory and attr structures correctly
- Set block number on child buffer on a root btree split
- Disable verifiers during log recovery for non-CRC filesystems
* tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: don't shutdown log recovery on validation errors
  xfs: ensure btree root split sets blkno correctly
  xfs: fix implicit padding in directory and attr CRC formats
  xfs: don't emit v5 superblock warnings on write
2013-06-14 | Merge tag 'char-misc-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc | Linus Torvalds
Pull char / misc fixes from Greg Kroah-Hartman: "Here are some small mei driver fixes for 3.10-rc6 that fix some reported problems"
* tag 'char-misc-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mei: me: clear interrupts on the resume path
  mei: nfc: fix nfc device freeing
  mei: init: Flush scheduled work before resetting the device
2013-06-14 | Merge tag 'usb-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb | Linus Torvalds
Pull USB fixes from Greg Kroah-Hartman: "Here are some small USB driver fixes that resolve some reported problems for 3.10-rc6. Nothing major, just 3 USB serial driver fixes, and two chipidea fixes"
* tag 'usb-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: chipidea: fix id change handling
  usb: chipidea: fix no transceiver case
  USB: pl2303: fix device initialisation at open
  USB: spcp8x5: fix device initialisation at open
  USB: f81232: fix device initialisation at open
2013-06-15 | powerpc: Fix missing/delayed calls to irq_work | Benjamin Herrenschmidt
When replaying interrupts (as a result of the interrupt occurring while soft-disabled), in the case of the decrementer, we are exclusively testing for a pending timer target. However we also use decrementer interrupts to trigger the new "irq_work", which in this case would be missed. This changes the logic to force a replay in both cases of a timer boundary reached and a decrementer interrupt having actually occurred while disabled. The former test is still useful to catch cases where a CPU having been hard-disabled for a long time completely misses the interrupt due to a decrementer rollover. CC: <stable@vger.kernel.org> [v3.4+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-15 | powerpc: Fix emulation of illegal instructions on PowerNV platform | Paul Mackerras
Normally, the kernel emulates a few instructions that are unimplemented on some processors (e.g. the old dcba instruction), or privileged (e.g. mfpvr). The emulation of unimplemented instructions is currently not working on the PowerNV platform. The reason is that on these machines, unimplemented and illegal instructions cause a hypervisor emulation assist interrupt, rather than a program interrupt as on older CPUs. Our vector for the emulation assist interrupt just calls program_check_exception() directly, without setting the bit in SRR1 that indicates an illegal instruction interrupt. This fixes it by making the emulation assist interrupt set that bit before calling program_check_exception(). With this, old programs that use no-longer implemented instructions such as dcba now work again. CC: <stable@vger.kernel.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-15 | powerpc: Fix stack overflow crash in resume_kernel when ftracing | Michael Ellerman
It's possible for us to crash when running with ftrace enabled, eg: Bad kernel stack pointer bffffd12 at c00000000000a454 cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40] pc: c00000000000a454: resume_kernel+0x34/0x60 lr: c00000000000335c: performance_monitor_common+0x15c/0x180 sp: bffffd12 msr: 8000000000001032 dar: bffffd12 dsisr: 42000000 If we look at current's stack (paca->__current->stack) we see it is equal to c0000002ecab0000. Our stack is 16K, and comparing to paca->kstack (c0000002ecab3e30) we can see that we have overflowed our kernel stack. This leads to us writing over our struct thread_info, and in this case we have corrupted thread_info->flags and set _TIF_EMULATE_STACK_STORE. Dumping the stack we see: 3:mon> t c0000002ecab0000 [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70 [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180 --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30 [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable) [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90 [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34 [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300 [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable) [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280 [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40 [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34 --- Exception: 901 
(Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 ... and so on __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry path. At that point the irq state is not consistent, ie. interrupts are hard disabled (by the exception entry), but the paca soft-enabled flag may be out of sync. This leads to the local_irq_restore() in trace_graph_entry() actually enabling interrupts, which we do not want. Because we have not yet reprogrammed the decrementer we immediately take another decrementer exception, and recurse. The fix is twofold. Firstly make sure we call DISABLE_INTS before calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles the irq state in the paca with the hardware, making it safe again to call local_irq_save/restore(). Although that should be sufficient to fix the bug, we also mark the runlatch routines as notrace. They are called very early in the exception entry and we are asking for trouble tracing them. They are also fairly uninteresting and tracing them just adds unnecessary overhead. [ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873 "powerpc: Rework runlatch code" by myself --BenH ] CC: <stable@vger.kernel.org> [v3.4+] Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-15 | snd_pcm_link(): fix a leak... | Al Viro
In the case where snd_pcm_stream_linked(substream) is true, we end up leaking group. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15 | use can_lookup() instead of direct checks of ->i_op->lookup | Al Viro
A couple of places got missed back when Linus introduced that one... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15 | move exit_task_namespaces() outside of exit_notify() | Oleg Nesterov
exit_notify() does exit_task_namespaces() after forget_original_parent(). This was needed to ensure that ->nsproxy can't be cleared prematurely: an exiting child we are going to reparent can do do_notify_parent() and use the parent's (ours) pid_ns. However, after 32084504 "pidns: use task_active_pid_ns in do_notify_parent" ->nsproxy != NULL is no longer needed, we rely on task_active_pid_ns(). Move exit_task_namespaces() from exit_notify() to do_exit(), after exit_fs() and before exit_task_work(). This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy() does fput() which needs task_work_add(). Note: this particular problem can be fixed if we change fput(), and that change makes sense anyway. But there is another reason to move the callsite. The original reason for calling exit_task_namespaces() from the middle of exit_notify() was subtle and it has already gone away, so now this looks confusing. And this allows us to simplify exit_notify(): we can avoid unlock/lock(tasklist) and we can use ->exit_state instead of PF_EXITING in forget_original_parent(). Reported-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15fput: task_work_add() can fail if the caller has passed exit_task_work()Oleg Nesterov
fput() assumes that it can't be called after exit_task_work(), but this is not true: for example, free_ipc_ns()->shm_destroy() can do this. In this case fput() silently leaks the file. Change it to fall back to delayed_fput_work if task_work_add() fails. The patch looks complicated but it is not; it changes the code from

	if (PF_KTHREAD) {
		schedule_work(...);
		return;
	}
	task_work_add(...)

to

	if (!PF_KTHREAD) {
		if (!task_work_add(...))
			return;
		/* fallback */
	}
	schedule_work(...);

As for shm_destroy() in particular, we could make another fix but I think this change makes sense anyway. There could be another similar user, and it is not safe to assume that task_work_add() can't fail. Reported-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
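The control-flow change described in the fput() commit can be modelled in user space. The following is only a sketch under stated assumptions: struct model_task, model_task_work_add(), model_schedule_work() and model_fput() are hypothetical stand-ins invented for illustration and are not the kernel's API. The only point is that a failed task_work_add() now falls through to the delayed-work path instead of leaking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct model_task {
    bool is_kthread;       /* stands in for the PF_KTHREAD check */
    bool past_task_work;   /* caller has already passed exit_task_work() */
    int  task_work_queued; /* work accepted by the per-task queue */
};

static int delayed_queued; /* work handed to the delayed (workqueue) path */

/* Models task_work_add(): 0 on success, nonzero once the task refuses work. */
static int model_task_work_add(struct model_task *t)
{
    if (t->past_task_work)
        return -1;
    t->task_work_queued++;
    return 0;
}

/* Models schedule_work() on the delayed fput work item. */
static void model_schedule_work(void)
{
    delayed_queued++;
}

/* New shape of the logic: try the per-task queue, fall back if it fails. */
static void model_fput(struct model_task *t)
{
    assert(t != NULL);
    if (!t->is_kthread) {
        if (!model_task_work_add(t))
            return;
        /* fallback: the task can no longer run task work */
    }
    model_schedule_work();
}
```

In this model an ordinary task queues per-task work, while both a kernel thread and a task that has passed exit_task_work() end up on the delayed path, so the final release is never dropped.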
2013-06-14net: sctp: sctp_association_init: put refs in reverse orderDaniel Borkmann
In case we need to bail out for whatever reason during assoc init, we call sctp_endpoint_put() and then sock_put(). However, we've taken both refs in the same order rather than the mirrored one, i.e. first sctp_endpoint_hold() and then sock_hold(). Reverse the puts, so that in an error case we have sock_put() and then sctp_endpoint_put(). Actually shouldn't matter too much, since both cleanup paths do the right thing, but that way, it is more consistent with the rest of the code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
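The ordering rule in the sctp_association_init commit (release references in the reverse order of acquisition) can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: struct ref, ref_hold(), ref_put() and model_assoc_init() are hypothetical and only mimic the roles of sctp_endpoint_hold()/sock_hold() and their put counterparts.

```c
#include <assert.h>

/* Hypothetical reference counter; not the kernel's refcount API. */
struct ref {
    int count;
};

static void ref_hold(struct ref *r)
{
    r->count++;
}

static void ref_put(struct ref *r)
{
    assert(r->count > 0);
    r->count--;
}

struct model_assoc {
    struct ref *ep; /* plays the role of the endpoint */
    struct ref *sk; /* plays the role of the socket */
};

/*
 * Takes the endpoint ref first and the socket ref second. On failure the
 * refs are dropped in the mirrored order: socket first, endpoint second.
 * Returns 0 on success, -1 on failure.
 */
static int model_assoc_init(struct model_assoc *a, struct ref *ep,
                            struct ref *sk, int simulate_failure)
{
    ref_hold(ep); /* models sctp_endpoint_hold() */
    a->ep = ep;
    ref_hold(sk); /* models sock_hold() */
    a->sk = sk;

    if (simulate_failure)
        goto fail_init;
    return 0;

fail_init:
    ref_put(sk); /* models sock_put() */
    ref_put(ep); /* models sctp_endpoint_put() */
    return -1;
}
```

As the commit message notes, the order does not change the outcome here; the mirrored unwind simply makes it easier to see that every hold has a matching put.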
2013-06-14net: sctp: minor: remove variable in sctp_init_sockDaniel Borkmann
It's only used this one time, so we can remove it as well. This is valid and also makes it more explicit/obvious that in case of error the sp->ep is NULL here, i.e. for the sctp_destroy_sock() check that was recently added. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14net: sctp: sctp_sf_do_prm_asoc: do SCTP_CMD_INIT_CHOOSE_TRANSPORT firstDaniel Borkmann
While this currently cannot trigger any NULL pointer dereference in sctp_seq_dump_local_addrs(), it is better to change the order of commands to prevent a future bug from happening. Although we first add SCTP_CMD_NEW_ASOC and then set the SCTP_CMD_INIT_CHOOSE_TRANSPORT, it is okay for now, since this primitive is only called by sctp_connect() or sctp_sendmsg() with sctp_assoc_add_peer() set first. However, let's take this precaution and first set the transport and then add it to the association hashlist, to prevent anything from possibly triggering this in the future. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14net: sctp: sideeffect: throw BUG if primary_path is NULLDaniel Borkmann
This clearly indicates a bug somewhere in the SCTP code, as e.g. fixed once in f28156335 ("sctp: Use correct sideffect command in duplicate cookie handling"). If this ever happens, throw a trace in the sideeffect engine where assocs clearly must have a primary_path assigned. When in sctp_seq_dump_local_addrs() also throw a WARN and bail out since we do not need to panic for printing this one asterisk. Also, it will avoid the not so obvious case where the primary != NULL test passes and a NULL pointer dereference caused by primary is triggered at a later point in time. While at it, also fix up the white space. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
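The "warn and bail out" behaviour described for the dump path can be sketched in user space as follows. This is an illustration under assumptions, not the patch itself: struct model_transport, struct model_assoc and model_dump_primary() are hypothetical names, and a message on stderr stands in for the kernel's WARN.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical types; they only mimic an association with a primary path. */
struct model_transport {
    int id;
};

struct model_assoc {
    struct model_transport *primary_path;
};

/*
 * Formats the primary path marker into buf. If the association has no
 * primary path, emit a warning and return -1 instead of dereferencing it.
 * On success, returns the value reported by snprintf().
 */
static int model_dump_primary(const struct model_assoc *a, char *buf,
                              size_t len)
{
    /* Read the pointer once so the check and the use see the same value. */
    const struct model_transport *primary = a->primary_path;

    if (primary == NULL) {
        fprintf(stderr, "warning: association %p has no primary path\n",
                (const void *)a);
        return -1;
    }
    return snprintf(buf, len, "*%d", primary->id);
}
```

A diagnostic-only path such as this one has no reason to bring the system down, which is why the sketch returns an error where the sideeffect engine in the commit uses a hard BUG.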
2013-06-14Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch David S. Miller
Jesse Gross says: ==================== A few miscellaneous improvements and cleanups before the GRE tunnel integration series. Intended for net-next/3.11. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14openvswitch: Simplify interface ovs_flow_metadata_from_nlattrs()Pravin B Shelar
This is not a functional change; it is just code cleanup. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: make skb->csum consistent with rest of networking stack.Pravin B Shelar
This patch keeps skb->csum correct across ovs. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: Fix struct comment.Pravin B Shelar
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: Fix misspellings in comments and docs.Andy Hill
Flagged with: https://github.com/lyda/misspell-check Run with: git ls-files | misspellings -f - Signed-off-by: Andy Hill <hillad@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>