linux-kernel-test

Author	SHA1	Message	Date
Daniel Borkmann	d8cdeda6dd	net: tcp_probe: kprobes: adapt jtcp_rcv_established signature This patches fixes a rather unproblematic function signature mismatch as the const specifier was missing for the th variable; and next to that it adds a build-time assertion so that future function signature mismatches for kprobes will not end badly, similarly as commit `22222997` ("net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack") did it for SCTP. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:19:50 -07:00
Daniel Borkmann	b4c1c1d038	net: tcp_probe: also include rcv_wnd next to snd_wnd It is helpful to sometimes know the TCP window sizes of an established socket e.g. to confirm that window scaling is working or to tweak the window size to improve high-latency connections, etc etc. Currently the TCP snooper only exports the send window size, but not the receive window size. Therefore, also add the receive window size to the end of the output line. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:19:50 -07:00
David S. Miller	baf3b3f227	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Some constifications, from Mathias Krause. 2) Catch bugs if a hold timer is still active when xfrm_policy_destroy() is called, from Fan Du. 3) Remove a redundant address family checking, from Fan Du. 4) Make xfrm_state timer monotonic to be independent of system clock changes, from Fan Du. 5) Remove an outdated comment on returning -EREMOTE in the xfrm_lookup(), from Rami Rosen. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:04:41 -07:00
Yuchung Cheng	0f7cc9a3c2	tcp: increase throughput when reordering is high The stack currently detects reordering and avoid spurious retransmission very well. However the throughput is sub-optimal under high reordering because cwnd is increased only if the data is deliverd in order. I.e., FLAG_DATA_ACKED check in tcp_ack(). The more packet are reordered the worse the throughput is. Therefore when reordering is proven high, cwnd should advance whenever the data is delivered regardless of its ordering. If reordering is low, conservatively advance cwnd only on ordered deliveries in Open state, and retain cwnd in Disordered state (RFC5681). Using netperf on a qdisc setup of 20Mbps BW and random RTT from 45ms to 55ms (for reordering effect). This change increases TCP throughput by 20 - 25% to near bottleneck BW. A special case is the stretched ACK with new SACK and/or ECE mark. For example, a receiver may receive an out of order or ECN packet with unacked data buffered because of LRO or delayed ACK. The principle on such an ACK is to advance cwnd on the cummulative acked part first, then reduce cwnd in tcp_fastretrans_alert(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 14:39:46 -07:00
Johannes Berg	9d47b38056	Revert "genetlink: fix family dump race" This reverts commit `58ad436fcf`. It turns out that the change introduced a potential deadlock by causing a locking dependency with netlink's cb_mutex. I can't seem to find a way to resolve this without doing major changes to the locking, so revert this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 13:24:02 -07:00
John W. Linville	69b307a48a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2013-08-22 14:27:31 -04:00
John W. Linville	89b5f74a26	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-08-22 11:35:22 -04:00
Johannes Berg	e133fae263	mac80211: minstrel_ht: don't use control.flags in TX status path Sujith reports that my commit `af61a16518` ("mac80211: add control port protocol TX control flag") broke ath9k (aggregation). The reason is that I made minstrel_ht use the flag in the TX status path, where it can have been overwritten by the driver. Since we have no more space in info->flags, revert that part of the change for now, until we can reshuffle the flags or so. Reported-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-22 08:37:08 +02:00
Frédéric Dalleau	2dea632f9a	Bluetooth: Add SCO connection fallback When initiating a transparent eSCO connection, make use of T2 settings at first try. T2 is the recommended settings from HFP 1.6 WideBand Speech. Upon connection failure, try T1 settings. When CVSD is requested and eSCO is supported, try to establish eSCO connection using S3 settings. If it fails, fallback in sequence to S2, S1, D1, D0 settings. To know which setting should be used, conn->attempt is used. It indicates the currently ongoing SCO connection attempt and can be used as the index for the fallback settings table. These setting and the fallback order are described in Bluetooth HFP 1.6 specification p. 101. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:13 +02:00
Frédéric Dalleau	1a4c958cf9	Bluetooth: Handle specific error for SCO connection fallback Synchronous Connection Complete event can return error "Connection Rejected due to Limited resources (0x10)". Handling this error is required for SCO connection fallback. This error happens when the server tried to accept the connection but failed to negotiate settings. This error code has been verified experimentally by sending a T2 request to a T1 only SCO listener. Client dump follows : < HCI Command (0x01\|0x0028) plen 17 [hci0] 3.696064 Handle: 12 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Retransmission effort: Optimize for link quality (0x02) Packet type: 0x0380 > HCI Event (0x0f) plen 4 [hci0] 3.697034 Setup Synchronous Connection (0x01\|0x0028) ncmd 1 Status: Success (0x00) > HCI Event (0x2c) plen 17 [hci0] 3.736059 Status: Connection Rejected due to Limited Resources (0x0d) Handle: 0 Address: xx:xx:xx:xx:xx:AB (OUI 70-F3-95) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x06 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) Server dump follows : > HCI Event (0x04) plen 10 [hci0] 4.741513 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Class: 0x620100 Major class: Computer (desktop, notebook, PDA, organizers) Minor class: Uncategorized, code for device not assigned Networking (LAN, Ad hoc) Audio (Speaker, Microphone, Headset) Telephony (Cordless telephony, Modem, Headset) Link type: eSCO (0x02) < HCI Command (0x01\|0x0029) plen 21 [hci0] 4.743269 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Retransmission effort: Optimize for link quality (0x02) Packet type: 0x03c1 > HCI Event (0x0f) plen 4 [hci0] 4.745517 Accept Synchronous Connection (0x01\|0x0029) ncmd 1 Status: Success (0x00) > HCI Event (0x2c) plen 17 [hci0] 4.749508 Status: Connection Rejected due to Limited Resources (0x0d) Handle: 0 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x06 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:13 +02:00
Frédéric Dalleau	79dc0087c3	Bluetooth: Prevent transparent SCO on older devices Older Bluetooth devices may not support Setup Synchronous Connection or SCO transparent data. This is indicated by the corresponding LMP feature bits. It is not possible to know if the adapter support these features before setting BT_VOICE option since the socket is not bound to an adapter. An adapter can also be added after the socket is created. The socket can be bound to an address before adapter is plugged in. Thus, on a such adapters, if user request BT_VOICE_TRANSPARENT, outgoing connections fail on connect() and returns -EOPNOTSUPP. Incoming connections do not fail. However, they should only be allowed depending on what was specified in Write_Voice_Settings command. EOPNOTSUPP is choosen because connect() system call is failing after selecting route but before any connection attempt. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:12 +02:00
Frédéric Dalleau	10c62ddc6f	Bluetooth: Parameters for outgoing SCO connections In order to establish a transparent SCO connection, the correct settings must be specified in the Setup Synchronous Connection request. For that, a setting field is added to ACL connection data to set up the desired parameters. The patch also removes usage of hdev->voice_setting in CVSD connection and makes use of T2 parameters for transparent data. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:11 +02:00
Frédéric Dalleau	2f69a82acf	Bluetooth: Use voice setting in deferred SCO connection request When an incoming eSCO connection is requested, check the selected voice setting and reply appropriately. Voice setting should have been negotiated previously. For example, in case of HFP, the codec is negotiated using AT commands on the RFCOMM channel. This patch only changes replies for socket with deferred setup enabled. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:11 +02:00
Frédéric Dalleau	ad10b1a487	Bluetooth: Add Bluetooth socket voice option This patch extends the current Bluetooth socket options with BT_VOICE. This is intended to choose voice data type at runtime. It only applies to SCO sockets. Incoming connections shall be setup during deferred setup. Outgoing connections shall be setup before connect(). The desired setting is stored in the SCO socket info. This patch declares needed members, modifies getsockopt() and setsockopt(). Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:09 +02:00
Frédéric Dalleau	33f2404823	Bluetooth: Remove unused mask parameter in sco_conn_defer_accept From Bluetooth Core v4.0 specification, 7.1.8 Accept Connection Request Command "When accepting synchronous connection request, the Role parameter is not used and will be ignored by the BR/EDR Controller." Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:09 +02:00
Frédéric Dalleau	e660ed6c70	Bluetooth: Use hci_connect_sco directly hci_connect is a super function for connecting hci protocols. But the voice_setting parameter (introduced in subsequent patches) is only needed by SCO and security requirements are not needed for SCO channels. Thus, it makes sense to have a separate function for SCO. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:08 +02:00
Gianluca Anzolin	ffe6b68cc5	Bluetooth: Purge the dlc->tx_queue to avoid circular dependency In rfcomm_tty_cleanup we purge the dlc->tx_queue which may contain socket buffers referencing the tty_port and thus preventing the tty_port destruction. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:08 +02:00
Gianluca Anzolin	ece3150dea	Bluetooth: Fix the reference counting of tty_port The tty_port can be released in two cases: when we get a HUP in the functions rfcomm_tty_hangup() and rfcomm_dev_state_change(). Or when the user releases the device in rfcomm_release_dev(). In these cases we set the flag RFCOMM_TTY_RELEASED so that no other function can get a reference to the tty_port. The use of !test_and_set_bit(RFCOMM_TTY_RELEASED) ensures that the 'initial' tty_port reference is only dropped once. The rfcomm_dev_del function is removed becase it isn't used anymore. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:07 +02:00
Gianluca Anzolin	cad348a17e	Bluetooth: Implement .activate, .shutdown and .carrier_raised methods Implement .activate, .shutdown and .carrier_raised methods of tty_port to manage the dlc, moving the code from rfcomm_tty_install() and rfcomm_tty_cleanup() functions. At the same time the tty .open()/.close() and .hangup() methods are changed to use the tty_port helpers that properly call the aforementioned tty_port methods. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:07 +02:00
Gianluca Anzolin	54b926a143	Bluetooth: Move the tty initialization and cleanup out of open/close Move the tty_struct initialization from rfcomm_tty_open() to rfcomm_tty_install() and do the same for the cleanup moving the code from rfcomm_tty_close() to rfcomm_tty_cleanup(). Add also extra error handling in rfcomm_tty_install() because, unlike .open()/.close(), .cleanup() is not called if .install() fails. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:06 +02:00
Gianluca Anzolin	ebe937f74b	Bluetooth: Remove the device from the list in the destructor The current code removes the device from the device list in several places. Do it only in the destructor instead and in the error path of rfcomm_add_dev() if the device couldn't be initialized. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:06 +02:00
Gianluca Anzolin	396dc223dd	Bluetooth: Take proper tty_struct references In net/bluetooth/rfcomm/tty.c the struct tty_struct is used without taking references. This may lead to a use-after-free of the rfcomm tty. Fix this by taking references properly, using the tty_port_* helpers when possible. The raw assignments of dev->port.tty in rfcomm_tty_open/close are addressed in the later commit 'rfcomm: Implement .activate, .shutdown and .carrier_raised methods'. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:05 +02:00
Marcel Holtmann	c7882cbd11	Bluetooth: Set different event mask for LE-only controllers In case of a Low Energy only controller it makes no sense to configure the full BR/EDR event mask. It will just enable events that can not be send anyway and there is no guarantee that such a controller will accept this value. Use event mask 0x90 0xe8 0x04 0x02 0x00 0x80 0x00 0x20 for LE-only controllers which enables the following events: Disconnection Complete Encryption Change Read Remote Version Information Complete Command Complete Command Status Hardware Error Number of Completed Packets Data Buffer Overflow Encryption Key Refresh Complete LE Meta This is according to Core Specification, Part E, Section 3. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:05 +02:00
Johan Hedberg	9d225d2208	Bluetooth: Fix getting SCO socket options in deferred state When a socket is in deferred state there does actually exist an underlying connection even though the connection state is not yet BT_CONNECTED. In the deferred state it should therefore be allowed to get socket options that usually depend on a connection, such as SCO_OPTIONS and SCO_CONNINFO. This patch fixes the behavior of some user space code that behaves as follows without it: $ sudo tools/btiotest -i 00:1B:DC:xx:xx:xx -d -s accept=2 reject=-1 discon=-1 defer=1 sec=0 update_sec=0 prio=0 voice=0x0000 Listening for SCO connections bt_io_get(OPT_DEST): getsockopt(SCO_OPTIONS): Transport endpoint is not connected (107) Accepting connection Successfully connected to 60:D8:19:xx:xx:xx. handle=43, class=000000 The conditions that the patch updates the if-statements to is taken from similar code in l2cap_sock.c which correctly handles the deferred state. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:04 +02:00
Simon Wunderlich	75a423f493	mac80211: ibss: fix ignored channel parameter my earlier patch "mac80211: change IBSS channel state to chandef" created a regression by ignoring the channel parameter in __ieee80211_sta_join_ibss, which breaks IBSS channel selection. This patch fixes this situation by using the right channel and adopting the selected bandwidth mode. Cc: stable@vger.kernel.org Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-21 15:33:08 +02:00
Felix Fietkau	2dfca312a9	mac80211: add a flag to indicate CCK support for HT clients brcm80211 cannot handle sending frames with CCK rates as part of an A-MPDU session. Other drivers may have issues too. Set the flag in all drivers that have been tested with CCK rates. This fixes a reported brcmsmac regression introduced in commit ef47a5e4f1aaf1d0e2e6875e34b2c9595897bef6 "mac80211/minstrel_ht: fix cck rate sampling" Cc: stable@vger.kernel.org # 3.10 Reported-by: Tom Gundersen <teg@jklm.no> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-21 15:03:25 +02:00
Johannes Berg	2a3ba63c23	mac80211: add missing channel context release IBSS needs to release the channel context when leaving but I evidently missed that. Fix it. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-21 12:04:48 +02:00
Daniel Borkmann	9fd0784164	net: ipv6: mcast: minor: use defines for rfc3810/8.1 lengths Instead of hard-coding length values, use a define to make it clear where those lengths come from. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:52:02 -07:00
Daniel Borkmann	c2cef4e888	net: ipv6: minor: *_start_timer: rather use unsigned long For the functions mld_gq_start_timer(), mld_ifc_start_timer(), and mld_dad_start_timer(), rather use unsigned long than int as we operate only on unsigned values anyway. This seems more appropriate as there is no good reason to do type conversions to int, that could lead to future errors. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:52:02 -07:00
Daniel Borkmann	846989635b	net: ipv6: igmp6_event_query: use msecs_to_jiffies Use proper API functions to calculate jiffies from milliseconds and not the crude method of dividing HZ by a value. This ensures more accurate values even in the case of strange HZ values. While at it, also simplify code in the mlh2 case by using max(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:52:02 -07:00
Nicolas Dichtel	e837735ec4	ip6_tunnel: ensure to always have a link local address When an Xin6 tunnel is set up, we check other netdevices to inherit the link- local address. If none is available, the interface will not have any link-local address. RFC4862 expects that each interface has a link local address. Now than this kind of tunnels supports x-netns, it's easy to fall in this case (by creating the tunnel in a netns where ethernet interfaces stand and then moving it to a other netns where no ethernet interface is available). RFC4291, Appendix A suggests two methods: the first is the one currently implemented, the second is to generate a unique identifier, so that we can always generate the link-local address. Let's use eth_random_addr() to generate this interface indentifier. I remove completly the previous method, hence for the whole life of the interface, the link-local address remains the same (previously, it depends on which ethernet interfaces were up when the tunnel interface was set up). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:45:42 -07:00
David S. Miller	7eaa48a45c	Revert "ipv6: fix checkpatch errors in net/ipv6/addrconf.c" This reverts commit `df8372ca74`. These changes are buggy and make unintended semantic changes to ip6_tnl_add_linklocal(). Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:44:39 -07:00
Toshiaki Makita	ef40b7ef18	bridge: Use the correct bit length for bitmap functions in the VLAN code The VLAN code needs to know the length of the per-port VLAN bitmap to perform its most basic operations (retrieving VLAN informations, removing VLANs, forwarding database manipulation, etc). Unfortunately, in the current implementation we are using a macro that indicates the bitmap size in longs in places where the size in bits is expected, which in some cases can cause what appear to be random failures. Use the correct macro. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:35:57 -07:00
David S. Miller	5c751c9344	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Regarding the iwlwifi bits, Johannes says: "We revert an rfkill bugfix that unfortunately caused more bugs, shuffle some code to avoid touching the PCIe device before it's enabled and disconnect if firmware fails to do our bidding. I also have Stanislaw's fix to not crash in some channel switch scenarios." As for the mac80211 bits, Johannes says: "This time, I have one fix from Dan Carpenter for users of nl80211hdr_put(), and one fix from myself fixing a regression with the libertas driver." Along with the above... Dan Carpenter fixes some incorrectly placed "address of" operators in hostap that caused copying of junk data. Jussi Kivilinna corrects zd1201 to use an allocated buffer rather than the stack for a URB operation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 17:25:55 -07:00
Willem de Bruijn	8bcdeaff5e	packet: restore packet statistics tp_packets to include drops getsockopt PACKET_STATISTICS returns tp_packets + tp_drops. Commit `ee80fbf301` ("packet: account statistics only in tpacket_stats_u") cleaned up the getsockopt PACKET_STATISTICS code. This also changed semantics. Historically, tp_packets included tp_drops on return. The commit removed the line that adds tp_drops into tp_packets. This patch reinstates the old semantics. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 17:23:58 -07:00
David S. Miller	cc666c53cc	Included change: - Check if the skb has been correctly prepared before going on -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (GNU/Linux) iQIcBAABCAAGBQJSEkvHAAoJEADl0hg6qKeOyTQP/ifIXk5t26Tu8GTCH+lQnF36 1HY4nkEhLBkrEaKv0RXXEwLCDe1Gk8INewSXhtgDe7v696287zvxSDiftxXOwSn8 EkrP3jakxqNgyEstVUMxXuHQMxn8YsOnU+u4L4MZvcsWNmh1V8FzNLxPDWF3Z0bi ycXhFI+BR+waWzFd8rVZ5sJ00ZhgSuM5vJ/uQ28kT8DyDZXz0I0mvve7ZUh5fczc L7vvnju9VRq84RxV6bQwf9hXDk54fCLz22WSMolrqaHCl0XF4OAu6OVcYBLA0bp7 GUU7fS8IUiqAuC02FS5HEYPy1VErCok8hP/fzvjz8Bxuzz0I5SdYPurFTZQzAx73 U0GCLtNOE7zkwIsRbKhMdUcB6DoFZJVUaLo8YS9E1tl/nn1oRILFGTFb2T9/WzQ8 lbCkmYm+WhLdKeZbLkf8PPs9PDrhRQKr/QRHrCHVKh4rqzP1BUm4FqIBXo71Kiiz no4GJoern8vW0CzoR59P5++/iFOCVTIx4ZJWvnYjWbqsYRazKjjCFtHffpz6mz+1 pHlrdYAZo+DOvme/2putfe6ViR+bA3lPxPkM7k3gADMifJcCAl3D7OM53QaDSKAk Gw3yHafxBaFPXFdqxQkkC6ks7T6qoTsPI2lLqUG6srU3XA399bWVcLq7X9JEcR+9 ODwzHPj9fD5CZCe22lIO =gllO -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included change: - Check if the skb has been correctly prepared before going on	2013-08-20 16:54:29 -07:00
Dan Carpenter	ea857f28ab	ipip: dereferencing an ERR_PTR in ip_tunnel_init_net() We need to move the derefernce after the IS_ERR() check. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:12:15 -07:00
Eric Dumazet	734d2725db	ipv4: raise IP_MAX_MTU to theoretical limit As discussed last year [1], there is no compelling reason to limit IPv4 MTU to 0xFFF0, while real limit is 0xFFFF [1] : http://marc.info/?l=linux-netdev&m=135607247609434&w=2 Willem raised this issue again because some of our internal regression tests broke after lo mtu being set to 65536. IP_MTU reports 0xFFF0, and the test attempts to send a RAW datagram of mtu + 1 bytes, expecting the send() to fail, but it does not. Alexey raised interesting points about TCP MSS, that should be addressed in follow-up patches in TCP stack if needed, as someone could also set an odd mtu anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:05:04 -07:00
Christoph Paasch	397b417463	tcp: trivial: Remove nocache argument from tcp_v4_send_synack The nocache-argument was used in tcp_v4_send_synack as an argument to inet_csk_route_req. However, since `ba3f7f04ef` (ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.) this is no more used. This patch removes the unsued argument from tcp_v4_send_synack. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:05:04 -07:00
dingtianhong	df8372ca74	ipv6: fix checkpatch errors in net/ipv6/addrconf.c ERROR: code indent should use tabs where possible: fix 2. ERROR: do not use assignment in if condition: fix 5. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:05:03 -07:00
dingtianhong	ba3542e15c	ipv6: convert the uses of ADBG and remove the superfluous parentheses Just follow the Joe Perches's opinion, it is a better way to fix the style errors. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 15:05:03 -07:00
David S. Miller	89d5e23210	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_conntrack_proto_tcp.c The conflict had to do with overlapping changes dealing with fixing the use of an "s32" to hold the value returned by NAT_OFFSET(). Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for your net-next tree. More specifically, they are: * Trivial typo fix in xt_addrtype, from Phil Oester. * Remove net_ratelimit in the conntrack logging for consistency with other logging subsystem, from Patrick McHardy. * Remove unneeded includes from the recently added xt_connlabel support, from Florian Westphal. * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for this, from Florian Westphal. * Remove tproxy core, now that we have socket early demux, from Florian Westphal. * A couple of patches to refactor conntrack event reporting to save a good bunch of lines, from Florian Westphal. * Fix missing locking in NAT sequence adjustment, it did not manifested in any known bug so far, from Patrick McHardy. * Change sequence number adjustment variable to 32 bits, to delay the possible early overflow in long standing connections, also from Patrick. * Comestic cleanups for IPVS, from Dragos Foianu. * Fix possible null dereference in IPVS in the SH scheduler, from Daniel Borkmann. * Allow to attach conntrack expectations via nfqueue. Before this patch, you had to use ctnetlink instead, thus, we save the conntrack lookup. * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:30:54 -07:00
Alexander Aring	65d892c8ac	6lowpan: handle context based source address Handle context based address when an unspecified address is given. For other context based address we print a warning and drop the packet because we don't support it right now. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:12 -07:00
Alexander Aring	ce2463b283	6lowpan: lowpan_uncompress_addr with address_mode This patch drops the pre and postcount calculation from the lowpan_uncompress_addr function.We use instead a switch/case over address_mode value. The original implementation has several bugs in this function and it was hard to decrypt how it works. To make it maintainable and fix these bugs this patch basically reimplements lowpan_uncompress_addr from scratch. A list of bugs we found in the current implementation: 1) Properly support uncompression of short-address based IPv6 addresses (instead of basically copying garbage) 2) Fix use and uncompression of long-addresses based IPv6 addresses 3) Add missing ff:fe00 in the case of SAM/DAM = 2 and M = 0 Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:12 -07:00
Alexander Aring	84c2e7bcf5	6lowpan: add function to uncompress multicast addr Add function to uncompress multicast address. This function split the uncompress function for a multicast address in a seperate function. To uncompress a multicast address is different than a other non-multicasts addresses according to rfc6282. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:12 -07:00
Alexander Aring	4666669fc3	6lowpan: introduce lowpan_fetch_skb function This patch adds a helper function to parse the ipv6 header to a 6lowpan header in stream. This function checks first if we can pull data with a specific length from a skb. If this seems to be okay, we copy skb data to a destination pointer and run skb_pull. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:11 -07:00
David Hauweele	31afe1f73e	6lowpan: Fix fragmentation with link-local compressed addresses When a new 6lowpan fragment is received, a skbuff is allocated for the reassembled packet. However when a 6lowpan packet compresses link-local addresses based on link-layer addresses, the processing function relies on the skb mac control block to find the related link-layer address. This patch copies the control block from the first fragment into the newly allocated skb to keep a trace of the link-layer addresses in case of a link-local compressed address. Edit: small changes on comment issue Signed-off-by: David Hauweele <david@hauweele.net> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:11 -07:00
Alexander Aring	84ce1ddfef	6lowpan: init ipv6hdr buffer to zero This patch simplify the handling to set fields inside of struct ipv6hdr to zero. Instead of setting some memory regions with memset to zero we initialize the whole ipv6hdr to zero. This is a simplification for parsing the 6lowpan header for the upcomming patches. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:23:11 -07:00
Andrey Vagin	7ed5c5ae96	tcp: set timestamps for restored skb-s When the repair mode is turned off, the write queue seqs are updated so that the whole queue is considered to be 'already sent. The "when" field must be set for such skb. It's used in tcp_rearm_rto for example. If the "when" field isn't set, the retransmit timeout can be calculated incorrectly and a tcp connected can stop for two minutes (TCP_RTO_MAX). Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:07:15 -07:00
Joe Perches	8be04b9374	treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks Don't emit OOM warnings when k.alloc calls fail when there there is a v.alloc immediately afterwards. Converted a kmalloc/vmalloc with memset to kzalloc/vzalloc. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-08-20 13:06:40 +02:00
Pravin B Shelar	58264848a5	openvswitch: Add vxlan tunneling support. Following patch adds vxlan vport type for openvswitch using vxlan api. So now there is vxlan dependency for openvswitch. CC: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 00:15:44 -07:00
Hannes Frederic Sowa	f46078cfcd	ipv6: drop packets with multiple fragmentation headers It is not allowed for an ipv6 packet to contain multiple fragmentation headers. So discard packets which were already reassembled by fragmentation logic and send back a parameter problem icmp. The updates for RFC 6980 will come in later, I have to do a bit more research here. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 00:11:24 -07:00
Hannes Frederic Sowa	4b08a8f1bd	ipv6: remove max_addresses check from ipv6_create_tempaddr Because of the max_addresses check attackers were able to disable privacy extensions on an interface by creating enough autoconfigured addresses: <http://seclists.org/oss-sec/2012/q4/292> But the check is not actually needed: max_addresses protects the kernel to install too many ipv6 addresses on an interface and guards addrconf_prefix_rcv to install further addresses as soon as this limit is reached. We only generate temporary addresses in direct response of a new address showing up. As soon as we filled up the maximum number of addresses of an interface, we stop installing more addresses and thus also stop generating more temp addresses. Even if the attacker tries to generate a lot of temporary addresses by announcing a prefix and removing it again (lifetime == 0) we won't install more temp addresses, because the temporary addresses do count to the maximum number of addresses, thus we would stop installing new autoconfigured addresses when the limit is reached. This patch fixes CVE-2013-0343 (but other layer-2 attacks are still possible). Thanks to Ding Tianhong to bring this topic up again. Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: George Kargiotakis <kargig@void.gr> Cc: P J P <ppandit@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 00:11:24 -07:00
John W. Linville	22f0d2d1e7	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2013-08-19 14:24:45 -04:00
Rami Rosen	e3fec5a1c5	xfrm: remove irrelevant comment in xfrm_input(). This patch removes a comment in xfrm_input() which became irrelevant due to commit `2774c13`, "xfrm: Handle blackhole route creation via afinfo". That commit removed returning -EREMOTE in the xfrm_lookup() method when the packet should be discarded and also removed the correspoinding -EREMOTE handlers. This was replaced by calling the make_blackhole() method. Therefore the comment about -EREMOTE is not relevant anymore. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-19 12:45:16 +02:00
Hannes Frederic Sowa	844d48746e	xfrm: choose protocol family by skb protocol We need to choose the protocol family by skb->protocol. Otherwise we call the wrong xfrm{4,6}_local_error handler in case an ipv6 sockets is used in ipv4 mode, in which case we should call down to xfrm4_local_error (ip6 sockets are a superset of ip4 ones). We are called before before ip_output functions, so skb->protocol is not reset. Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-19 09:39:04 +02:00
Hannes Frederic Sowa	5d0ff542d0	ipv6: xfrm: dereference inner ipv6 header if encapsulated In xfrm6_local_error use inner_header if the packet was encapsulated. Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-19 09:38:25 +02:00
Hannes Frederic Sowa	3d483058c8	ipv6: wire up skb->encapsulation When pushing a new header before current one call skb_reset_inner_headers to record the position of the inner headers in the various ipv6 tunnel protocols. We later need this to correctly identify the addresses needed to send back an error in the xfrm layer. This change is safe, because skb->protocol is always checked before dereferencing data from the inner protocol. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-19 09:37:46 +02:00
Linus Lüssing	50fa3b31f4	batman-adv: check return type of unicast packet preparations batadv_unicast(_4addr)_prepare_skb might reallocate the skb's data. And if it tries to do so then this can potentially fail. We shouldn't continue working on this skb in such a case. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-17 20:02:32 +02:00
David S. Miller	2ff1cf12c9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2013-08-16 15:37:26 -07:00
John W. Linville	d074666366	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2013-08-16 14:24:51 -04:00
Linus Torvalds	ddea368c78	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix SKB leak in 8139cp, from Dave Jones. 2) Fix use of _PAGES interfaces with mlx5 firmware, from Moshe Lazar. 3) RCU conversion of macvtap introduced two races, fixes by Eric Dumazet 4) Synchronize statistic flows in bnx2x driver to prevent corruption, from Dmitry Kravkov 5) Undo optimization in IP tunneling, we were using the inner IP header in some cases to inherit the IP ID, but that isn't correct in some circumstances. From Pravin B Shelar 6) Use correct struct size when parsing netlink attributes in rtnl_bridge_getlink(). From Asbjoern Sloth Toennesen 7) Length verifications in tun_get_user() are bogus, from Weiping Pan and Dan Carpenter 8) Fix bad merge resolution during 3.11 networking development in openvswitch, albeit a harmless one which added some unreachable code. From Jesse Gross 9) Wrong size used in flexible array allocation in openvswitch, from Pravin B Shelar 10) Clear out firmware capability flags the be2net driver isn't ready to handle yet, from Sarveshwar Bandi 11) Revert DMA mapping error checking addition to cxgb3 driver, it's buggy. From Alexey Kardashevskiy 12) Fix regression in packet scheduler rate limiting when working with a link layer of ATM. From Jesper Dangaard Brouer 13) Fix several errors in TCP Cubic congestion control, in particular overflow errors in timestamp calculations. From Eric Dumazet and Van Jacobson 14) In ipv6 routing lookups, we need to backtrack if subtree traversal don't result in a match. From Hannes Frederic Sowa 15) ipgre_header() returns incorrect packet offset. Fix from Timo Teräs 16) Get "low latency" out of the new MIB counter names. From Eliezer Tamir 17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar Samudrala 18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung Cheng 19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean 20) Fix dealock in TIPC, from Wang Weidong and Ding Tianhong 21) call_rcu() call to destroy SCTP transport is done too early and might result in an oops. From Daniel Borkmann 22) Fix races in genetlink family dumps, from Johannes Berg 23) Flags passed into macvlan by the user need to be validated properly, from Michael S Tsirkin 24) Fix skge build on 32-bit, from Stephen Hemminger 25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira Ayuso 26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay Aleksandrov 27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel Borkmann 28) neigh_parms need to be setup before calling the ->ndo_neigh_setup() method. From Veaceslav Falico 29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet 30) Don't dereference MLD query message if the length isn't value in the bridge multicast code, from Linus Lüssing 31) Fix VXLAN IGMP join regression due to an inverted check, from Cong Wang git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits) net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes tun: signedness bug in tun_get_user() qlcnic: Fix diagnostic interrupt test for 83xx adapters qlcnic: Fix beacon state return status handling qlcnic: Fix set driver version command net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset net_sched: restore "linklayer atm" handling drivers/net/ethernet/via/via-velocity.c: update napi implementation Revert "cxgb3: Check and handle the dma mapping errors" be2net: Clear any capability flags that driver is not interested in. openvswitch: Reset tunnel key between input and output. openvswitch: Use correct type while allocating flex array. openvswitch: Fix bad merge resolution. tun: compare with 0 instead of total_len rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header ethernet/arc/arc_emac - fix NAPI "work > weight" warning ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id. bnx2x: prevent crash in shutdown flow with CNIC bnx2x: fix PTE write access error bnx2x: fix memory leak in VF ...	2013-08-16 09:35:29 -07:00
Johannes Berg	27b3eb9c06	mac80211: add APIs to allow keeping connections after WoWLAN In order to be able to (securely) keep connections alive after the system was suspended for WoWLAN, we need some additional APIs. We already have API (ieee80211_gtk_rekey_notify) to tell wpa_supplicant about the new replay counter if GTK rekeying was done by the device while the host was asleep, but that's not sufficient. If GTK rekeying wasn't done, we need to tell the host about sequence counters for the GTK (and PTK regardless of rekeying) that was used while asleep, add ieee80211_set_key_rx_seq() for that. If GTK rekeying was done, then we need to be able to disable the old keys (with ieee80211_remove_key()) and allocate the new GTK key(s) in mac80211 (with ieee80211_gtk_rekey_add()). If protocol offload (e.g. ARP) is implemented, then also the TX sequence counter for the PTK must be updated, using the new ieee80211_set_key_tx_seq() function. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-16 12:58:43 +02:00
Simon Wunderlich	d51b70ff51	mac80211: move ibss presp generation in own function Channel Switch will later require to generate beacons without setting them immediately. Therefore split the presp generation in an own function. Splitting the original very long function might be a good idea anyway. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-16 12:25:34 +02:00
Johan Almbladh	86c228a762	mac80211: perform power save processing before decryption This patch decouples the power save processing from the frame decryption by running the decrypt rx handler after sta_process. In the case where the decryption failed for some reason, the stack used to not process the PM and MOREDATA bits for that frame. The stack now always performs power save processing regardless of the decryption result. That means that encrypted data frames and NULLFUNC frames are now handled in the same way regarding power save processing, making the stack more robust. Signed-off-by: Johan Almbladh <ja@anyfi.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-16 12:19:16 +02:00
Fan Du	99565a6c47	xfrm: Make xfrm_state timer monotonic xfrm_state timer should be independent of system clock change, so switch to CLOCK_BOOTTIME base which is not only monotonic but also counting suspend time. Thus issue reported in commit: `9e0d57fd6d` ("xfrm: SAD entries do not expire correctly after suspend-resume") could ALSO be avoided. v2: Use CLOCK_BOOTTIME to count suspend time, but still monotonic. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-16 06:53:28 +02:00
Pravin B Shelar	16b304f340	netlink: Eliminate kmalloc in netlink dump operation. Following patch stores struct netlink_callback in netlink_sock to avoid allocating and freeing it on every netlink dump msg. Only one dump operation is allowed for a given socket at a time therefore we can safely convert cb pointer to cb struct inside netlink_sock. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 15:51:20 -07:00
Francesco Fusco	d14c5ab6be	net: proc_fs: trivial: print UIDs as unsigned int UIDs are printed in the proc_fs as signed int, whereas they are unsigned int. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 14:37:46 -07:00
John W. Linville	48c3e37135	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-08-15 15:36:55 -04:00
Luis Henriques	dee08ab83d	net: rfkill: Do not ignore errors from regulator_enable() Function regulator_enable() may return an error that has to be checked. This patch changes function rfkill_regulator_set_block() so that it checks for the return code. Also, rfkill_data->reg_enabled is set to 'true' only if there is no error. This fixes the following compilation warning: net/rfkill/rfkill-regulator.c:43:20: warning: ignoring return value of 'regulator_enable', declared with attribute warn_unused_result [-Wunused-result] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-15 18:17:05 +02:00
Jesper Dangaard Brouer	8a8e3d84b1	net_sched: restore "linklayer atm" handling commit `56b765b79` ("htb: improved accuracy at high rates") broke the "linklayer atm" handling. tc class add ... htb rate X ceil Y linklayer atm The linklayer setting is implemented by modifying the rate table which is send to the kernel. No direct parameter were transferred to the kernel indicating the linklayer setting. The commit `56b765b79` ("htb: improved accuracy at high rates") removed the use of the rate table system. To keep compatible with older iproute2 utils, this patch detects the linklayer by parsing the rate table. It also supports future versions of iproute2 to send this linklayer parameter to the kernel directly. This is done by using the __reserved field in struct tc_ratespec, to convey the choosen linklayer option, but only using the lower 4 bits of this field. Linklayer detection is limited to speeds below 100Mbit/s, because at high rates the rtab is gets too inaccurate, so bad that several fields contain the same values, this resembling the ATM detect. Fields even start to contain "0" time to send, e.g. at 1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have been more broken than we first realized. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:43:08 -07:00
David S. Miller	09a8f03197	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== Three bug fixes that are fairly small either way but resolve obviously incorrect code. For net/3.11. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:41:10 -07:00
Dan Carpenter	fce9b9be89	rtnetlink: remove an unneeded test We know that "dev" is a valid pointer at this point, so we can remove the test and clean up a little. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:29:02 -07:00
Nicolas Dichtel	0bd8762824	ip6tnl: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:00:20 -07:00
Nicolas Dichtel	6c742e714d	ipip: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:00:20 -07:00
Nicolas Dichtel	fc8f999daa	ipv4 tunnels: use net_eq() helper to check netns It's better to use available helpers for these tests. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:00:20 -07:00
Nicolas Dichtel	64261f230a	dev: move skb_scrub_packet() after eth_type_trans() skb_scrub_packet() was called before eth_type_trans() to let eth_type_trans() set pkt_type. In fact, we should force pkt_type to PACKET_HOST, so move the call after eth_type_trans(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-15 01:00:20 -07:00
Jesse Gross	36bf5cc66d	openvswitch: Reset tunnel key between input and output. It doesn't make sense to output a tunnel packet using the same parameters that it was received with since that will generally just result in the packet going back to us. As a result, userspace assumes that the tunnel key is cleared when transitioning through the switch. In the majority of cases this doesn't matter since a packet is either going to a tunnel port (in which the key is overwritten with new values) or to a non-tunnel port (in which case the key is ignored). However, it's theoreticaly possible that userspace could rely on the documented behavior, so this corrects it. Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-14 15:50:36 -07:00
Pravin B Shelar	42415c90ce	openvswitch: Use correct type while allocating flex array. Flex array is used to allocate hash buckets which is type struct hlist_head, but we use `struct hlist_head *` to calculate array size. Since hlist_head is of size pointer it works fine. Following patch use correct type. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-14 15:48:17 -07:00
Jesse Gross	30444e981b	openvswitch: Fix bad merge resolution. git silently included an extra hunk in vport_cmd_set() during automatic merging. This code is unreachable so it does not actually introduce a problem but it is clearly incorrect. Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-14 15:48:02 -07:00
Johannes Berg	dee8a9732e	cfg80211: don't request disconnect if not connected Neil Brown reports that with libertas, my recent cfg80211 SME changes in commit `ceca7b7121` ("cfg80211: separate internal SME implementation") broke libertas suspend because it we now asked it to disconnect while already disconnected. The problematic change is in cfg80211_disconnect() as it previously checked the SME state and now calls the driver disconnect operation unconditionally. Fix this by checking if there's a current_bss indicating a connection, and do nothing if not. Reported-and-tested-by: Neil Brown <neilb@suse.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-14 14:00:19 +02:00
Dan Carpenter	cb35fba360	nl80211: nl80211hdr_put() doesn't return an ERR_PTR There are a few places which check nl80211hdr_put() for an ERR_PTR but actually it returns NULL on error and never error values. In nl80211_testmode_dump() the return wasn't checked at all so I have added one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> [some whitespace changes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-14 14:00:12 +02:00
Hannes Frederic Sowa	0ea9d5e3e0	xfrm: introduce helper for safe determination of mtu skb->sk socket can be of AF_INET or AF_INET6 address family. Thus we always have to make sure we a referring to the correct interpretation of skb->sk. We only depend on header defines to query the mtu, so we don't introduce a new dependency to ipv6 by this change. Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-14 13:09:07 +02:00
Hannes Frederic Sowa	628e341f31	xfrm: make local error reporting more robust In xfrm4 and xfrm6 we need to take care about sockets of the other address family. This could happen because a 6in4 or 4in6 tunnel could get protected by ipsec. Because we don't want to have a run-time dependency on ipv6 when only using ipv4 xfrm we have to embed a pointer to the correct local_error function in xfrm_state_afinet and look it up when returning an error depending on the socket address family. Thanks to vi0ss for the great bug report: <https://bugzilla.kernel.org/show_bug.cgi?id=58691> v2: a) fix two more unsafe interpretations of skb->sk as ipv6 socket (xfrm6_local_dontfrag and __xfrm6_output) v3: a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when building ipv6 as a module (thanks to Steffen Klassert) Reported-by: <vi0oss@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-14 13:07:12 +02:00
Asbjoern Sloth Toennesen	3e805ad288	rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header Fix the iproute2 command `bridge vlan show`, after switching from rtgenmsg to ifinfomsg. Let's start with a little history: Feb 20: Vlad Yasevich got his VLAN-aware bridge patchset included in the 3.9 merge window. In the kernel commit `6cbdceeb`, he added attribute support to bridge GETLINK requests sent with rtgenmsg. Mar 6th: Vlad got this iproute2 reference implementation of the bridge vlan netlink interface accepted (iproute2 9eff0e5c) Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca) http://patchwork.ozlabs.org/patch/239602/ http://marc.info/?t=136680900700007 Apr 28th: Linus released 3.9 Apr 30th: Stephen released iproute2 3.9.0 The `bridge vlan show` command haven't been working since the switch to ifinfomsg, or in a released version of iproute2. Since the kernel side only supports rtgenmsg, which iproute2 switched away from just prior to the iproute2 3.9.0 release. I haven't been able to find any documentation, about neither rtgenmsg nor ifinfomsg, and in which situation to use which, but kernel commit `88c5b5ce` seams to suggest that ifinfomsg should be used. Fixing this in kernel will break compatibility, but I doubt that anybody have been using it due to this bug in the user space reference implementation, at least not without noticing this bug. That said the functionality is still fully functional in 3.9, when reversing iproute2 commit 63338dca. This could also be fixed in iproute2, but thats an ugly patch that would reintroduce rtgenmsg in iproute2, and from searching in netdev it seams like rtgenmsg usage is discouraged. I'm assuming that the only reason that Vlad implemented the kernel side to use rtgenmsg, was because iproute2 was using it at the time. Signed-off-by: Asbjoern Sloth Toennesen <ast@fiberby.net> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 19:09:29 -07:00
Hannes Frederic Sowa	fc4eba58b4	ipv6: make unsolicited report intervals configurable for mld Commit `cab70040df` ("net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3") and `2690048c01` ("net: igmp: Allow user-space configuration of igmp unsolicited report interval") by William Manley made igmp unsolicited report intervals configurable per interface and corrected the interval of unsolicited igmpv3 report messages resendings to 1s. Same needs to be done for IPv6: MLDv1 (RFC2710 7.10.): 10 seconds MLDv2 (RFC3810 9.11.): 1 second Both intervals are configurable via new procfs knobs mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval. (also added .force_mld_version to ipv6_devconf_dflt to bring structs in line without semantic changes) v2: a) Joined documentation update for IPv4 and IPv6 MLD/IGMP unsolicited_report_interval procfs knobs. b) incorporate stylistic feedback from William Manley v3: a) add new DEVCONF_* values to the end of the enum (thanks to David Miller) Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: William Manley <william.manley@youview.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 17:05:04 -07:00
Pravin B Shelar	4221f40513	ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id. Using inner-id for tunnel id is not safe in some rare cases. E.g. packets coming from multiple sources entering same tunnel can have same id. Therefore on tunnel packet receive we could have packets from two different stream but with same source and dst IP with same ip-id which could confuse ip packet reassembly. Following patch reverts optimization from commit `490ab08127` (IP_GRE: Fix IP-Identification.) CC: Jarno Rajahalme <jrajahalme@nicira.com> CC: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 16:52:50 -07:00
Arron Wang	39525ee1dc	NFC: Update secure element state The secure element state was not updated from the enable/disable ops, leaving the SE state to disabled for ever. Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-08-14 01:13:40 +02:00
Arron Wang	2c3832834b	NFC: Fix secure element state check Another typo from the initial commit where we check for the secure element type field instead of its state when enabling or disabling it. Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-08-14 01:13:38 +02:00
Dan Carpenter	4eba11e82a	NFC: hci: Fix enable/disable confusion There is a cut and paste bug so we enable a second time instead of disabling. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-08-14 01:13:36 +02:00
Eric Lapuyade	352a5f5fb3	NFC: netlink: Add result of firmware operation to completion event Result is added as an NFC_ATTR_FIRMWARE_DOWNLOAD_STATUS attribute containing the standard errno positive value of the completion result. This event will be sent when the firmare download operation is done and will contain the operation result. Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-08-14 01:12:58 +02:00
Yuchung Cheng	74c181d528	tcp: reset reordering est. selectively on timeout On timeout the TCP sender unconditionally resets the estimated degree of network reordering (tp->reordering). The idea behind this is that the estimate is too large to trigger fast recovery (e.g., due to a IP path change). But for example if the sender only had 2 packets outstanding, then a timeout doesn't tell much about reordering. A sender that learns about reordering on big writes and loses packets on small writes will end up falsely retransmitting again and again, especially when reordering is more likely on big writes. Therefore the sender should only suspect that tp->reordering is too high if it could have gone into fast recovery with the (lower) default estimate. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 16:08:33 -07:00
Eric Lapuyade	ef04158e13	NFC: Move nfc_fw_download_done() definition from private to public This API must be called by NFC drivers, and its prototype was incorrectly placed. Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-08-14 01:08:01 +02:00
David S. Miller	98f1b7f382	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== This is a batch of updates intended for 3.12. It is mostly driver stuff, although Johannes Berg and Simon Wunderlich make a good showing with mac80211 bits (particularly some work on 5/10 MHz channel support). The usual suspects are mostly represented. There are lots of updates to iwlwifi, ath9k, ath10k, mwifiex, rt2x00, wil6210, as usual. The bcma bus gets some love this time, as do cw1200, iwl4965, and a few other bits here and there. I don't think there is much unusual here, FWIW. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 15:59:09 -07:00
Samuel Ortiz	ac22ac466a	NFC: Add a GET_SE netlink API In order to fetch the discovered secure elements from an NFC controller, we need to send a netlink command that will dump the list of available SEs from NFC. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-08-14 00:35:19 +02:00
Samuel Ortiz	369f4d503a	NFC: Fix SE discovery failure warning condition This is a typo coming from the initial implementation. se_discover fails when it returns something different than zero and we should only display a warning in that case. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-08-14 00:35:19 +02:00
Vlad Yasevich	072017b41e	net: sctp: Add rudimentary infrastructure to account for control chunks This patch adds a base infrastructure that allows SCTP to do memory accounting for control chunks. Real accounting code will follow. This patch alos fixes the following triggered bug ... [ 553.109742] kernel BUG at include/linux/skbuff.h:1813! [ 553.109766] invalid opcode: 0000 [#1] SMP [ 553.109789] Modules linked in: sctp libcrc32c rfcomm [...] [ 553.110259] uinput i915 i2c_algo_bit drm_kms_helper e1000e drm ptp pps_core i2c_core wmi video sunrpc [ 553.110320] CPU: 0 PID: 1636 Comm: lt-test_1_to_1_ Not tainted 3.11.0-rc3+ #2 [ 553.110350] Hardware name: LENOVO 74597D6/74597D6, BIOS 6DET60WW (3.10 ) 09/17/2009 [ 553.110381] task: ffff88020a01dd40 ti: ffff880204ed0000 task.ti: ffff880204ed0000 [ 553.110411] RIP: 0010:[<ffffffffa0698017>] [<ffffffffa0698017>] skb_orphan.part.9+0x4/0x6 [sctp] [ 553.110459] RSP: 0018:ffff880204ed1bb8 EFLAGS: 00010286 [ 553.110483] RAX: ffff8802086f5a40 RBX: ffff880204303300 RCX: 0000000000000000 [ 553.110487] RDX: ffff880204303c28 RSI: ffff8802086f5a40 RDI: ffff880202158000 [ 553.110487] RBP: ffff880204ed1bb8 R08: 0000000000000000 R09: 0000000000000000 [ 553.110487] R10: ffff88022f2d9a04 R11: ffff880233001600 R12: 0000000000000000 [ 553.110487] R13: ffff880204303c00 R14: ffff8802293d0000 R15: ffff880202158000 [ 553.110487] FS: 00007f31b31fe740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000 [ 553.110487] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 553.110487] CR2: 000000379980e3e0 CR3: 000000020d225000 CR4: 00000000000407f0 [ 553.110487] Stack: [ 553.110487] ffff880204ed1ca8 ffffffffa068d7fc 0000000000000000 0000000000000000 [ 553.110487] 0000000000000000 ffff8802293d0000 ffff880202158000 ffffffff81cb7900 [ 553.110487] 0000000000000000 0000400000001c68 ffff8802086f5a40 000000000000000f [ 553.110487] Call Trace: [ 553.110487] [<ffffffffa068d7fc>] sctp_sendmsg+0x6bc/0xc80 [sctp] [ 553.110487] [<ffffffff8128f185>] ? sock_has_perm+0x75/0x90 [ 553.110487] [<ffffffff815a3593>] inet_sendmsg+0x63/0xb0 [ 553.110487] [<ffffffff8128f2b3>] ? selinux_socket_sendmsg+0x23/0x30 [ 553.110487] [<ffffffff8151c5d6>] sock_sendmsg+0xa6/0xd0 [ 553.110487] [<ffffffff81637b05>] ? _raw_spin_unlock_bh+0x15/0x20 [ 553.110487] [<ffffffff8151cd38>] SYSC_sendto+0x128/0x180 [ 553.110487] [<ffffffff8151ce6b>] ? SYSC_connect+0xdb/0x100 [ 553.110487] [<ffffffffa0690031>] ? sctp_inet_listen+0x71/0x1f0 [sctp] [ 553.110487] [<ffffffff8151d35e>] SyS_sendto+0xe/0x10 [ 553.110487] [<ffffffff81640202>] system_call_fastpath+0x16/0x1b [ 553.110487] Code: e0 48 c7 c7 00 22 6a a0 e8 67 a3 f0 e0 48 c7 [...] [ 553.110487] RIP [<ffffffffa0698017>] skb_orphan.part.9+0x4/0x6 [sctp] [ 553.110487] RSP <ffff880204ed1bb8> [ 553.121578] ---[ end trace 46c20c5903ef5be2 ]--- The approach taken here is to split data and control chunks creation a bit. Data chunks already have memory accounting so noting needs to happen. For control chunks, add stubs handlers. Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 15:04:02 -07:00
Pablo Neira Ayuso	bd07793705	netfilter: nfnetlink_queue: allow to attach expectations to conntracks This patch adds the capability to attach expectations via nfnetlink_queue. This is required by conntrack helpers that trigger expectations based on the first packet seen like the TFTP and the DHCPv6 user-space helpers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-13 16:32:10 +02:00
Pablo Neira Ayuso	0ef71ee1a5	netfilter: ctnetlink: refactor ctnetlink_create_expect This patch refactors ctnetlink_create_expect by spliting it in two chunks. As a result, we have a new function ctnetlink_alloc_expect to allocate and to setup the expectation from ctnetlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-13 15:48:20 +02:00
Johannes Berg	58ad436fcf	genetlink: fix family dump race When dumping generic netlink families, only the first dump call is locked with genl_lock(), which protects the list of families, and thus subsequent calls can access the data without locking, racing against family addition/removal. This can cause a crash. Fix it - the locking needs to be conditional because the first time around it's already locked. A similar bug was reported to me on an old kernel (3.4.47) but the exact scenario that happened there is no longer possible, on those kernels the first round wasn't locked either. Looking at the current code I found the race described above, which had also existed on the old kernel. Cc: stable@vger.kernel.org Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 00:57:06 -07:00
Daniel Borkmann	771085d6bf	net: sctp: sctp_transport_destroy{, _rcu}: fix potential pointer corruption Probably this one is quite unlikely to be triggered, but it's more safe to do the call_rcu() at the end after we have dropped the reference on the asoc and freed sctp packet chunks. The reason why is because in sctp_transport_destroy_rcu() the transport is being kfree()'d, and if we're unlucky enough we could run into corrupted pointers. Probably that's more of theoretical nature, but it's safer to have this simple fix. Introduced by commit `8c98653f` ("sctp: sctp_close: fix release of bindings for deferred call_rcu's"). I also did the `8c98653f` regression test and it's fine that way. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-12 22:13:47 -07:00
Daniel Borkmann	ac4f959936	net: sctp: sctp_assoc_control_transport: fix MTU size in SCTP_PF state The SCTP Quick failover draft [1] section 5.1, point 5 says that the cwnd should be 1 MTU. So, instead of 1, set it to 1 MTU. [1] https://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 Reported-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-12 22:12:20 -07:00
John W. Linville	89c2af3c14	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/ethernet/broadcom/Kconfig	2013-08-12 14:45:06 -04:00
Antonio Quartulli	d8eb741eb3	mac80211: ibss - do not scan if not needed when creating an IBSS In some cases mac80211 will scan before creating an IBSS even if bssid and frequency have been forced by the user. This is not needed and leads only to a delay in the IBSS establishment phase. Immediately create the cell if both bssid and frequency (and fixed_freq is set) have been specified. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-12 18:08:34 +02:00
David Spinadel	52981cd794	mac80211: add vif to testmode cmd Pass the wdev from cfg80211 on to the driver as the vif if given and it's valid for the driver. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-12 14:11:42 +02:00
David Spinadel	fc73f11f5f	cfg80211: add wdev to testmode cmd To allow drivers to implement per-interface testmode operations more easily, pass a wdev pointer if any identification for one was given from userspace. Clean up the code a bit while at it. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-12 14:11:37 +02:00
Johannes Berg	af61a16518	mac80211: add control port protocol TX control flag A lot of drivers check the frame protocol for ETH_P_PAE, for various reasons (like making those more reliable). Add a new flags bitmap to the TX control info and a new flag indicating the control port protocol is in use to let all drivers also apply such logic to other control port protocols, should they be configured. Also use the new flag in the iwlwifi drivers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-12 14:09:29 +02:00
Johannes Berg	1da5fcc86d	nl80211: clean up CQM settings code Clean up the CQM settings code a bit and while at it enforce that when setting the threshold to 0 (disable) the hysteresis is also set to 0 to avoid confusion. As we haven't enforce it, simply override userspace. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-12 14:08:00 +02:00
Eric Dumazet	f3dfd20860	af_unix: fix bug on large send() commit `e370a72363` ("af_unix: improve STREAM behavior with fragmented memory") added a bug on large send() because the skb_copy_datagram_from_iovec() call always start from the beginning of iovec. We must instead use the @sent variable to properly skip the already processed part. Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-11 22:02:36 -07:00
dingtianhong	d4cca39d90	tipc: avoid possible deadlock while enable and disable bearer We met lockdep warning when enable and disable the bearer for commands such as: tipc-config -netid=1234 -addr=1.1.3 -be=eth:eth0 tipc-config -netid=1234 -addr=1.1.3 -bd=eth:eth0 --------------------------------------------------- [ 327.693595] ====================================================== [ 327.693994] [ INFO: possible circular locking dependency detected ] [ 327.694519] 3.11.0-rc3-wwd-default #4 Tainted: G O [ 327.694882] ------------------------------------------------------- [ 327.695385] tipc-config/5825 is trying to acquire lock: [ 327.695754] (((timer))#2){+.-...}, at: [<ffffffff8105be80>] del_timer_sync+0x0/0xd0 [ 327.696018] [ 327.696018] but task is already holding lock: [ 327.696018] (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+ 0xdd/0x120 [tipc] [ 327.696018] [ 327.696018] which lock already depends on the new lock. [ 327.696018] [ 327.696018] [ 327.696018] the existing dependency chain (in reverse order) is: [ 327.696018] [ 327.696018] -> #1 (&(&b_ptr->lock)->rlock){+.-...}: [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff814d65b1>] _raw_spin_lock_bh+0x41/0x80 [ 327.696018] [<ffffffffa02c5d48>] disc_timeout+0x18/0xd0 [tipc] [ 327.696018] [<ffffffff8105b92a>] call_timer_fn+0xda/0x1e0 [ 327.696018] [<ffffffff8105bcd7>] run_timer_softirq+0x2a7/0x2d0 [ 327.696018] [<ffffffff8105379a>] __do_softirq+0x16a/0x2e0 [ 327.696018] [<ffffffff81053a35>] irq_exit+0xd5/0xe0 [ 327.696018] [<ffffffff81033005>] smp_apic_timer_interrupt+0x45/0x60 [ 327.696018] [<ffffffff814df4af>] apic_timer_interrupt+0x6f/0x80 [ 327.696018] [<ffffffff8100b70e>] arch_cpu_idle+0x1e/0x30 [ 327.696018] [<ffffffff810a039d>] cpu_idle_loop+0x1fd/0x280 [ 327.696018] [<ffffffff810a043e>] cpu_startup_entry+0x1e/0x20 [ 327.696018] [<ffffffff81031589>] start_secondary+0x89/0x90 [ 327.696018] [ 327.696018] -> #0 (((timer))#2){+.-...}: [ 327.696018] [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0 [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0 [ 327.696018] [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc] [ 327.696018] [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc] [ 327.696018] [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc] [ 327.696018] [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc] [ 327.696018] [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc] [ 327.696018] [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340 [ 327.696018] [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0 [ 327.696018] [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0 [ 327.696018] [<ffffffff8143e617>] genl_rcv+0x27/0x40 [ 327.696018] [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0 [ 327.696018] [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400 [ 327.696018] [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80 [ 327.696018] [<ffffffff813f7957>] sock_aio_write+0x107/0x120 [ 327.696018] [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0 [ 327.696018] [<ffffffff8117fc56>] vfs_write+0x186/0x190 [ 327.696018] [<ffffffff811803e0>] SyS_write+0x60/0xb0 [ 327.696018] [<ffffffff814de852>] system_call_fastpath+0x16/0x1b [ 327.696018] [ 327.696018] other info that might help us debug this: [ 327.696018] [ 327.696018] Possible unsafe locking scenario: [ 327.696018] [ 327.696018] CPU0 CPU1 [ 327.696018] ---- ---- [ 327.696018] lock(&(&b_ptr->lock)->rlock); [ 327.696018] lock(((timer))#2); [ 327.696018] lock(&(&b_ptr->lock)->rlock); [ 327.696018] lock(((timer))#2); [ 327.696018] [ 327.696018] * DEADLOCK * [ 327.696018] [ 327.696018] 5 locks held by tipc-config/5825: [ 327.696018] #0: (cb_lock){++++++}, at: [<ffffffff8143e608>] genl_rcv+0x18/0x40 [ 327.696018] #1: (genl_mutex){+.+.+.}, at: [<ffffffff8143ed66>] genl_rcv_msg+0xa6/0xd0 [ 327.696018] #2: (config_mutex){+.+.+.}, at: [<ffffffffa02bf889>] tipc_cfg_do_cmd+0x39/ 0x550 [tipc] [ 327.696018] #3: (tipc_net_lock){++.-..}, at: [<ffffffffa02be738>] tipc_disable_bearer+ 0x18/0x60 [tipc] [ 327.696018] #4: (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+0xdd/0x120 [tipc] [ 327.696018] [ 327.696018] stack backtrace: [ 327.696018] CPU: 2 PID: 5825 Comm: tipc-config Tainted: G O 3.11.0-rc3-wwd- default #4 [ 327.696018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 327.696018] 00000000ffffffff ffff880037fa77a8 ffffffff814d03dd 0000000000000000 [ 327.696018] ffff880037fa7808 ffff880037fa77e8 ffffffff810b1c4f 0000000037fa77e8 [ 327.696018] ffff880037fa7808 ffff880037e4db40 0000000000000000 ffff880037e4e318 [ 327.696018] Call Trace: [ 327.696018] [<ffffffff814d03dd>] dump_stack+0x4d/0xa0 [ 327.696018] [<ffffffff810b1c4f>] print_circular_bug+0x10f/0x120 [ 327.696018] [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0 [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff81087a28>] ? sched_clock_cpu+0xd8/0x110 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70 [ 327.696018] [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0 [ 327.696018] [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70 [ 327.696018] [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc] [ 327.696018] [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc] [ 327.696018] [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc] [ 327.696018] [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc] [ 327.696018] [<ffffffff81218783>] ? security_capable+0x13/0x20 [ 327.696018] [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc] [ 327.696018] [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340 [ 327.696018] [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0 [ 327.696018] [<ffffffff8143ecc0>] ? genl_lock+0x20/0x20 [ 327.696018] [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0 [ 327.696018] [<ffffffff8143e608>] ? genl_rcv+0x18/0x40 [ 327.696018] [<ffffffff8143e617>] genl_rcv+0x27/0x40 [ 327.696018] [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0 [ 327.696018] [<ffffffff81289d7c>] ? memcpy_fromiovec+0x6c/0x90 [ 327.696018] [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400 [ 327.696018] [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80 [ 327.696018] [<ffffffff813f7957>] sock_aio_write+0x107/0x120 [ 327.696018] [<ffffffff813fe29c>] ? release_sock+0x8c/0xa0 [ 327.696018] [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0 [ 327.696018] [<ffffffff8117fa24>] ? rw_verify_area+0x54/0x100 [ 327.696018] [<ffffffff8117fc56>] vfs_write+0x186/0x190 [ 327.696018] [<ffffffff811803e0>] SyS_write+0x60/0xb0 [ 327.696018] [<ffffffff814de852>] system_call_fastpath+0x16/0x1b ----------------------------------------------------------------------- The problem is that the tipc_link_delete() will cancel the timer disc_timeout() when the b_ptr->lock is hold, but the disc_timeout() still call b_ptr->lock to finish the work, so the dead lock occurs. We should unlock the b_ptr->lock when del the disc_timeout(). Remove link_timeout() still met the same problem, the patch: http://article.gmane.org/gmane.network.tipc.general/4380 fix the problem, so no need to send patch for fix link_timeout() deadlock warming. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-11 21:58:41 -07:00
David S. Miller	e5ac5da807	Included change: - reassign pointers to data after skb reallocation to avoid kernel paging errors -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (GNU/Linux) iQIcBAABCAAGBQJSBqldAAoJEADl0hg6qKeO9eQQAKUod+Or+ldfBU5b6OvkDFAr FjSMxVgGS/YFSrD81WLnIGTW/ujz8VrVM7zrUrKJZfsVtBJw5/6hTTV5M/vl6FqQ k+q6L6R1EqR9qRxqFhxD4u6+Zscc3+2KvS2rAqL8PQmJq0STn4h+Lk3J38DrKF9l dBn28FUcX4ON/D5uI0AZU/iM/OxTgWLxhYhFbSjxPnAhJEyIii+D6yrBLNN1SZDx McAKvJ78Tf9rEWt11KBDs6uNtEOSFy0v6hJVkZEiX941rkGOiShzPcU8FCICEaie EGQisQvVYho+ABHofHcMUkMvXwrtLT0WgiIbxRKS9DJdMBOuu99v/zX2T0thZLUK l2GvQZQ6/hS/iMNSQvXhxemyZoAxvjxkSBMwrK/BVv5XLUM+sOo+Og4/ADZviPaY suFwbCDtdr4jzeqBdVxn6mZGosVLtaNM74xpc+cxm7BwmeB2XL22S3sR2WRwsurU AV+bU2NUpRWGg+rsmQCyUrfD/mGZuiyR12Z7BTYA4KoDPkeh/fpGJZS77aRTgSm5 kIXl1aCtRdw31mebdjrNU4Cp6oOnQ3A2jzb8nKoiBUiKPkEZ16h8QB5ktLjsYrKL VFQChSWHUwlgzS3xBu3BVQOZ8WwlEV6z6D3TX9RAlhQqjGbk9OujbQJjmBVwLQdH wWPzO9LSQ9cLYFqAnBof =QkbK -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included change: - reassign pointers to data after skb reallocation to avoid kernel paging errors Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-10 15:38:59 -07:00
Linus Lüssing	9d2c9488ce	batman-adv: fix potential kernel paging errors for unicast transmissions There are several functions which might reallocate skb data. Currently some places keep reusing their old ethhdr pointer regardless of whether they became invalid after such a reallocation or not. This potentially leads to kernel paging errors. This patch fixes these by refetching the ethdr pointer after the potential reallocations. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-10 22:55:42 +02:00
David S. Miller	4209423c29	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== The following patchset contains four netfilter fixes, they are: * Fix possible invalid access and mangling of the TCPMSS option in xt_TCPMSS. This was spotted by Julian Anastasov. * Fix possible off by one access and mangling of the TCP packet in xt_TCPOPTSTRIP, also spotted by Julian Anastasov. * Fix possible information leak due to missing initialization of one padding field of several structures that are included in nfqueue and nflog netlink messages, from Dan Carpenter. * Fix TCP window tracking with Fast Open, from Yuchung Cheng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-10 13:44:22 -07:00
Yuchung Cheng	356d7d88e0	netfilter: nf_conntrack: fix tcp_in_window for Fast Open Currently the conntrack checks if the ending sequence of a packet falls within the observed receive window. However it does so even if it has not observe any packet from the remote yet and uses an uninitialized receive window (td_maxwin). If a connection uses Fast Open to send a SYN-data packet which is dropped afterward in the network. The subsequent SYNs retransmits will all fail this check and be discarded, leading to a connection timeout. This is because the SYN retransmit does not contain data payload so end == initial sequence number (isn) + 1 sender->td_end == isn + syn_data_len receiver->td_maxwin == 0 The fix is to only apply this check after td_maxwin is initialized. Reported-by: Michael Chan <mcfchan@stanford.edu> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-10 18:36:22 +02:00
Sridhar Samudrala	6453599302	rtnetlink: Fix inverted check in ndo_dflt_fdb_del() Fix inverted check when deleting an fdb entry. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-10 01:24:12 -07:00
Eric Dumazet	28d6427109	net: attempt high order allocations in sock_alloc_send_pskb() Adding paged frags skbs to af_unix sockets introduced a performance regression on large sends because of additional page allocations, even if each skb could carry at least 100% more payload than before. We can instruct sock_alloc_send_pskb() to attempt high order allocations. Most of the time, it does a single page allocation instead of 8. I added an additional parameter to sock_alloc_send_pskb() to let other users to opt-in for this new feature on followup patches. Tested: Before patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 46861.15 After patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 57981.11 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-10 01:16:44 -07:00
Eric Dumazet	e370a72363	af_unix: improve STREAM behavior with fragmented memory unix_stream_sendmsg() currently uses order-2 allocations, and we had numerous reports this can fail. The __GFP_REPEAT flag present in sock_alloc_send_pskb() is not helping. This patch extends the work done in commit `eb6a24816b` ("af_unix: reduce high order page allocations) for datagram sockets. This opens the possibility of zero copy IO (splice() and friends) The trick is to not use skb_pull() anymore in recvmsg() path, and instead add a @consumed field in UNIXCB() to track amount of already read payload in the skb. There is a performance regression for large sends because of extra page allocations that will be addressed in a follow-up patch, allowing sock_alloc_send_pskb() to attempt high order page allocations. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-10 01:16:44 -07:00
Yuchung Cheng	149479d019	tcp: add server ip to encrypt cookie in fast open Encrypt the cookie with both server and client IPv4 addresses, such that multi-homed server will grant different cookies based on both the source and destination IPs. No client change is needed since cookie is opaque to the client. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-10 00:35:33 -07:00
David S. Miller	71acc0ddd4	Revert "net: sctp: convert sctp_checksum_disable module param into sctp sysctl" This reverts commit `cda5f98e36`. As per Vlad's request. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 13:09:41 -07:00
John W. Linville	fa5978447c	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2013-08-09 15:08:10 -04:00
John W. Linville	4f05444892	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2013-08-09 15:06:28 -04:00
Eliezer Tamir	288a937637	net: rename busy poll MIB counter Rename mib counter from "low latency" to "busy poll" v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer) Eric Dumazet suggested that the current location is better. So v2 just renames the counter to fit the new naming convention. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:39:08 -07:00
Daniel Borkmann	477143e3fe	net: sctp: trivial: update bug report in header comment With the restructuring of the lksctp.org site, we only allow bug reports through the SCTP mailing list linux-sctp@vger.kernel.org, not via SF, as SF is only used for web hosting and nothing more. While at it, also remove the obvious statement that bugs will be fixed and incooperated into the kernel. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:33:02 -07:00
Daniel Borkmann	cda5f98e36	net: sctp: convert sctp_checksum_disable module param into sctp sysctl Get rid of the last module parameter for SCTP and make this configurable via sysctl for SCTP like all the rest of SCTP's configuration knobs. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:33:02 -07:00
William Manley	2690048c01	net: igmp: Allow user-space configuration of igmp unsolicited report interval Adds the new procfs knobs: /proc/sys/net/ipv4/conf//igmpv2_unsolicited_report_interval /proc/sys/net/ipv4/conf//igmpv3_unsolicited_report_interval Which will allow userspace configuration of the IGMP unsolicited report interval (see below) in milliseconds. The defaults are 10000ms for IGMPv2 and 1000ms for IGMPv3 in accordance with RFC2236 and RFC3376. Background: If an IGMP join packet is lost you will not receive data sent to the multicast group so if no data arrives from that multicast group in a period of time after the IGMP join a second IGMP join will be sent. The delay between joins is the "IGMP Unsolicited Report Interval". Prior to this patch this value was hard coded in the kernel to 10s for IGMPv2 and 1s for IGMPv3. 10s is unsuitable for some use-cases, such as IPTV as it can cause channel change to be slow in the presence of packet loss. This patch allows the value to be overridden from userspace for both IGMPv2 and IGMPv3 such that it can be tuned accoding to the network. Tested with Wireshark and a simple program to join a (non-existent) multicast group. The distribution of timings for the second join differ based upon setting the procfs knobs. igmpvX_unsolicited_report_interval is intended to follow the pattern established by force_igmp_version, and while a procfs entry has been added a corresponding sysctl knob has not as it is my understanding that sysctl is deprecated[1]. [1]: http://lwn.net/Articles/247243/ Signed-off-by: William Manley <william.manley@youview.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:27:46 -07:00
William Manley	5c6fe01c1f	net: igmp: Don't flush routing cache when force_igmp_version is modified The procfs knob /proc/sys/net/ipv4/conf/*/force_igmp_version allows the IGMP protocol version to use to be explicitly set. As a side effect this caused the routing cache to be flushed as it was declared as a DEVINET_SYSCTL_FLUSHING_ENTRY. Flushing is unnecessary and this patch makes it so flushing does not occur. Requested by Hannes Frederic Sowa as he was reviewing other patches adding procfs entries. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: William Manley <william.manley@youview.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:27:46 -07:00
William Manley	cab70040df	net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3 If an IGMP join packet is lost you will not receive data sent to the multicast group so if no data arrives from that multicast group in a period of time after the IGMP join a second IGMP join will be sent. The delay between joins is the "IGMP Unsolicited Report Interval". Previously this value was hard coded to be chosen randomly between 0-10s. This can be too long for some use-cases, such as IPTV as it can cause channel change to be slow in the presence of packet loss. The value 10s has come from IGMPv2 RFC2236, which was reduced to 1s in IGMPv3 RFC3376. This patch makes the kernel use the 1s value from the later RFC if we are operating in IGMPv3 mode. IGMPv2 behaviour is unaffected. Tested with Wireshark and a simple program to join a (non-existent) multicast group. The distribution of timings for the second join differ based upon setting /proc/sys/net/ipv4/conf/eth0/force_igmp_version. Signed-off-by: William Manley <william.manley@youview.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:27:46 -07:00
Eric Dumazet	e11aada32b	net: flow_dissector: add 802.1ad support Same behavior than 802.1q : finds the encapsulated protocol and skip 32bit header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:06:23 -07:00
Timo Teräs	77a482bdb2	ip_gre: fix ipgre_header to return correct offset Fix ipgre_header() (header_ops->create) to return the correct amount of bytes pushed. Most callers of dev_hard_header() seem to care only if it was success, but af_packet.c uses it as offset to the skb to copy from userspace only once. In practice this fixes packet socket sendto()/sendmsg() to gre tunnels. Regression introduced in `c544193214` ("GRE: Refactor GRE tunneling code.") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:05:24 -07:00
Simon Wunderlich	ab1e8ad3b4	mac80211: fix ieee80211_sta_process_chanswitch for 5/10 MHz channels Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-09 15:16:39 +02:00
Chun-Yeow Yeoh	0670307992	mac80211: allow lowest basic rate for unicast management for mesh Allow lowest basic rate to be used for unicast management frame in mesh. Otherwise, the lowest supported rate is used for unicast management frame, such as 1Mbps for 2.4GHz and 6Mbps for 5GHz. Rename the rc_send_low_broadcast to re_send_low_basicrate since now it is also applied to unicast management frame in mesh. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-09 15:11:27 +02:00
Florian Westphal	c655bc6896	netfilter: nf_conntrack: don't send destroy events from iterator Let nf_ct_delete handle delivery of the DESTROY event. Based on earlier patch from Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-09 12:03:33 +02:00
Eric Dumazet	1f07d03e20	net: add SNMP counters tracking incoming ECN bits With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP counters do not count the number of frames, but number of aggregated segments. Its probably too late to change this now. This patch adds four new counters, tracking number of frames, regardless of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6. Ip[6]NoECTPkts : Number of packets received with NOECT Ip[6]ECT1Pkts : Number of packets received with ECT(1) Ip[6]ECT0Pkts : Number of packets received with ECT(0) Ip[6]CEPkts : Number of packets received with Congestion Experienced lph37:~# nstat \| egrep "Pkts\|InReceive" IpInReceives 1634137 0.0 Ip6InReceives 3714107 0.0 Ip6InNoECTPkts 19205 0.0 Ip6InECT0Pkts 52651828 0.0 IpExtInNoECTPkts 33630 0.0 IpExtInECT0Pkts 15581379 0.0 IpExtInCEPkts 6 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-08 22:24:59 -07:00
Tejun Heo	d99c8727e7	cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroup cgroup is in the process of converting to css (cgroup_subsys_state) from cgroup as the principal subsystem interface handle. This is mostly to prepare for the unified hierarchy support where css's will be created and destroyed dynamically but also helps cleaning up subsystem implementations as css is usually what they are interested in anyway. cgroup_taskset which is used by the subsystem attach methods is the last cgroup subsystem API which isn't using css as the handle. Update cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp. The conversions are pretty mechanical. One exception is cpuset::cgroup_cs(), which lost its last user and got removed. This patch shouldn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2013-08-08 20:11:27 -04:00
Tejun Heo	182446d087	cgroup: pass around cgroup_subsys_state instead of cgroup in file methods cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup. Please see the previous commit which converts the subsystem methods for rationale. This patch converts all cftype file operations to take @css instead of @cgroup. cftypes for the cgroup core files don't have their subsytem pointer set. These will automatically use the dummy_css added by the previous patch and can be converted the same way. Most subsystem conversions are straight forwards but there are some interesting ones. * freezer: update_if_frozen() is also converted to take @css instead of @cgroup for consistency. This will make the code look simpler too once iterators are converted to use css. * memory/vmpressure: mem_cgroup_from_css() needs to be exported to vmpressure while mem_cgroup_from_cont() can be made static. Updated accordingly. * cpu: cgroup_tg() doesn't have any user left. Removed. * cpuacct: cgroup_ca() doesn't have any user left. Removed. * hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left. Removed. * net_cls: cgrp_cls_state() doesn't have any user left. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>	2013-08-08 20:11:24 -04:00
Tejun Heo	eb95419b02	cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup * in subsystem implementations for the following reasons. * With unified hierarchy, subsystems will be dynamically bound and unbound from cgroups and thus css's (cgroup_subsys_state) may be created and destroyed dynamically over the lifetime of a cgroup, which is different from the current state where all css's are allocated and destroyed together with the associated cgroup. This in turn means that cgroup_css() should be synchronized and may return NULL, making it more cumbersome to use. * Differing levels of per-subsystem granularity in the unified hierarchy means that the task and descendant iterators should behave differently depending on the specific subsystem the iteration is being performed for. * In majority of the cases, subsystems only care about its part in the cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods often obtain the matching css pointer from the cgroup and don't bother with the cgroup pointer itself. Passing around css fits much better. This patch converts all cgroup_subsys methods to take @css instead of @cgroup. The conversions are mostly straight-forward. A few noteworthy changes are * ->css_alloc() now takes css of the parent cgroup rather than the pointer to the new cgroup as the css for the new cgroup doesn't exist yet. Knowing the parent css is enough for all the existing subsystems. * In kernel/cgroup.c::offline_css(), unnecessary open coded css dereference is replaced with local variable access. This patch shouldn't cause any behavior differences. v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced with local variable @css as suggested by Li Zefan. Rebased on top of new for-3.12 which includes for-3.11-fixes so that ->css_free() invocation added by `da0a12caff` ("cgroup: fix a leak when percpu_ref_init() fails") is converted too. Suggested by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>	2013-08-08 20:11:23 -04:00
Tejun Heo	6387698699	cgroup: add css_parent() Currently, controllers have to explicitly follow the cgroup hierarchy to find the parent of a given css. cgroup is moving towards using cgroup_subsys_state as the main controller interface construct, so let's provide a way to climb the hierarchy using just csses. This patch implements css_parent() which, given a css, returns its parent. The function is guarnateed to valid non-NULL parent css as long as the target css is not at the top of the hierarchy. freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices are converted to use css_parent() instead of accessing cgroup->parent directly. * __parent_ca() is dropped from cpuacct and its usage is replaced with parent_ca(). The only difference between the two was NULL test on cgroup->parent which is now embedded in css_parent() making the distinction moot. Note that eventually a css->parent field will be added to css and the NULL check in css_parent() will go away. This patch shouldn't cause any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-08-08 20:11:23 -04:00
Tejun Heo	a7c6d554aa	cgroup: add/update accessors which obtain subsys specific data from css css (cgroup_subsys_state) is usually embedded in a subsys specific data structure. Subsystems either use container_of() directly to cast from css to such data structure or has an accessor function wrapping such cast. As cgroup as whole is moving towards using css as the main interface handle, add and update such accessors to ease dealing with css's. All accessors explicitly handle NULL input and return NULL in those cases. While this looks like an extra branch in the code, as all controllers specific data structures have css as the first field, the casting doesn't involve any offsetting and the compiler can trivially optimize out the branch. * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such accessor. Added. * memory, hugetlb and devices already had one but didn't explicitly handle NULL input. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-08-08 20:11:23 -04:00
Tejun Heo	6d37b97428	netprio_cgroup: pass around @css instead of @cgroup and kill struct cgroup_netprio_state cgroup controller API will be converted to primarily use struct cgroup_subsys_state instead of struct cgroup. In preparation, make the internal functions of netprio_cgroup pass around @css instead of @cgrp. While at it, kill struct cgroup_netprio_state which only contained struct cgroup_subsys_state without serving any purpose. All functions are converted to deal with @css directly. This patch shouldn't cause any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: David S. Miller <davem@davemloft.net>	2013-08-08 20:11:22 -04:00
Tejun Heo	8af01f56a0	cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-08-08 20:11:22 -04:00
John W. Linville	1826ff2357	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2013-08-08 13:12:42 -04:00
Hannes Frederic Sowa	3e3be27585	ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match In case a subtree did not match we currently stop backtracking and return NULL (root table from fib_lookup). This could yield in invalid routing table lookups when using subtrees. Instead continue to backtrack until a valid subtree or node is found and return this match. Also remove unneeded NULL check. Reported-by: Teco Boot <teco@inf-net.nl> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Lamparter <equinox@diac24.net> Cc: <boutier@pps.univ-paris-diderot.fr> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 17:17:19 -07:00
David S. Miller	09effa67a1	packet: Revert recent header parsing changes. This reverts commits: `0f75b09c79` `cbd89acb9e` `c483e02614` Amongst other things, it's modifies the SKB header to pull the ethernet headers off via eth_type_trans() on the output path which is bogus. It's causing serious regressions for people. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 17:11:00 -07:00
Jason Wang	3d9953a2ef	net: use skb_copy_datagram_from_iovec() in zerocopy_sg_from_iovec() Use skb_copy_datagram_from_iovec() to avoid code duplication and make it easy to be read. Also we can do the skipping inside the zero-copy loop. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 16:52:38 -07:00
Jason Wang	0433547aa7	net: use release_pages() in zerocopy_sg_from_iovec() To reduce the duplicated codes. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 16:52:38 -07:00
Jason Wang	234a426708	net: remove the useless comment in zerocopy_sg_from_iovec() Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 16:52:37 -07:00
Jason Wang	0a57ec62df	net: use skb_fill_page_desc() in zerocopy_sg_from_iovec() Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 16:52:35 -07:00
Jason Wang	c3bdeb5c7c	net: move zerocopy_sg_from_iovec() to net/core/datagram.c To let it be reused and reduce code duplication. Also document this function. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 16:52:34 -07:00
Jason Wang	b4bf07771f	net: move iov_pages() to net/core/iovec.c To let it be reused and reduce code duplication. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 16:52:33 -07:00
stephen hemminger	6261d983f2	ip_tunnel: embed hash list head The IP tunnel hash heads can be embedded in the per-net structure since it is a fixed size. Reduce the size so that the total structure fits in a page size. The original size was overly large, even NETDEV_HASHBITS is only 8 bits! Also, add some white space for readability. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Pravin B Shelar <pshelar@nicira.com>. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 16:47:52 -07:00
Trond Myklebust	786615bc1c	SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister If rpcbind causes our connection to the AF_LOCAL socket to close after we've registered a service, then we want to be careful about reconnecting since the mount namespace may have changed. By simply refusing to reconnect the AF_LOCAL socket in the case of unregister, we avoid the need to somehow save the mount namespace. While this may lead to some services not unregistering properly, it should be safe. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Nix <nix@esperi.org.uk> Cc: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.9.x	2013-08-07 17:07:18 -04:00
Eric Dumazet	cd6b423afd	tcp: cubic: fix bug in bictcp_acked() While investigating about strange increase of retransmit rates on hosts ~24 days after boot, Van found hystart was disabled if ca->epoch_start was 0, as following condition is true when tcp_time_stamp high order bit is set. (s32)(tcp_time_stamp - ca->epoch_start) < HZ Quoting Van : At initialization & after every loss ca->epoch_start is set to zero so I believe that the above line will turn off hystart as soon as the 2^31 bit is set in tcp_time_stamp & hystart will stay off for 24 days. I think we've observed that cubic's restart is too aggressive without hystart so this might account for the higher drop rate we observe. Diagnosed-by: Van Jacobson <vanj@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 10:35:08 -07:00
Wang Sheng-Hui	15401946f9	bridge: correct the comment for file br_sysfs_br.c br_sysfs_if.c is for sysfs attributes of bridge ports, while br_sysfs_br.c is for sysfs attributes of bridge itself. Correct the comment here. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 10:35:06 -07:00
Eric Dumazet	2ed0edf909	tcp: cubic: fix overflow error in bictcp_update() commit `17a6e9f1aa` ("tcp_cubic: fix clock dependency") added an overflow error in bictcp_update() in following code : /* change the unit from HZ to bictcp_HZ */ t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) - ca->epoch_start) << BICTCP_HZ) / HZ; Because msecs_to_jiffies() being unsigned long, compiler does implicit type promotion. We really want to constrain (tcp_time_stamp - ca->epoch_start) to a signed 32bit value, or else 't' has unexpected high values. This bugs triggers an increase of retransmit rates ~24 days after boot [1], as the high order bit of tcp_time_stamp flips. [1] for hosts with HZ=1000 Big thanks to Van Jacobson for spotting this problem. Diagnosed-by: Van Jacobson <vanj@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 10:35:05 -07:00
Fan Du	af83fde751	xfrm: Remove rebundant address family checking present_and_same_family has checked addresses family validness for both SADB_EXT_ADDRESS_SRC and SADB_EXT_ADDRESS_DST in the beginning. Thereafter pfkey_sadb_addr2xfrm_addr doesn't need to do the checking again. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-07 10:12:58 +02:00
Daniel Borkmann	54e35cc523	ipvs: ip_vs_sh: ip_vs_sh_get_port: check skb_header_pointer for NULL skb_header_pointer could return NULL, so check for it as we do it everywhere else in ipvs code. This fixes a coverity warning. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-08-07 08:57:57 +09:00
Johannes Berg	e7f1935c11	wireless: make TU conversion macros available A few places in the code (mac80211 and iwlmvm) use the same TU_TO_JIFFIES() macro and could use TU_TO_EXP_TIME() that mac80211 has. Make these available to everyone and use them. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-06 11:00:59 +02:00
Dragos Foianu	70e3ca79cd	ipvs: fixed spacing at for statements found using checkpatch.pl Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-08-06 15:09:58 +09:00
Fan Du	0659eea912	xfrm: Delete hold_timer when destroy policy Both policy timer and hold_timer need to be deleted when destroy policy Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-06 06:59:18 +02:00
Trond Myklebust	00326ed644	SUNRPC: Don't auto-disconnect from the local rpcbind socket There is no need for the kernel to time out the AF_LOCAL connection to the rpcbind socket, and doing so is problematic because when it is time to reconnect, our process may no longer be using the same mount namespace. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.9.x	2013-08-05 22:17:00 -04:00
Linus Lüssing	248ba8ec05	bridge: don't try to update timers in case of broken MLD queries Currently we are reading an uninitialized value for the max_delay variable when snooping an MLD query message of invalid length and would update our timers with that. Fixing this by simply ignoring such broken MLD queries (just like we do for IGMP already). This is a regression introduced by: "bridge: disable snooping if there is no querier" (`b00589af3b`) Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 15:43:39 -07:00
Eric Dumazet	aab515d7c3	fib_trie: remove potential out of bound access AddressSanitizer [1] dynamic checker pointed a potential out of bound access in leaf_walk_rcu() We could allocate one more slot in tnode_new() to leave the prefetch() in-place but it looks not worth the pain. Bug added in commit `82cfbb0085` ("[IPV4] fib_trie: iterator recode") [1] : https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 15:26:11 -07:00
Veaceslav Falico	fc7f8f5c53	neighbour: populate neigh_parms on alloc before calling ndo_neigh_setup dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 15:19:04 -07:00
Daniel Borkmann	7921895a5e	net: esp{4,6}: fix potential MTU calculation overflows Commit `91657eafb` ("xfrm: take net hdr len into account for esp payload size calculation") introduced a possible interger overflow in esp{4,6}_get_mtu() handlers in case of x->props.mode equals XFRM_MODE_TUNNEL. Thus, the following expression will overflow unsigned int net_adj; ... <case ipv{4,6} XFRM_MODE_TUNNEL> net_adj = 0; ... return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) - net_adj) & ~(align - 1)) + (net_adj - 2); where (net_adj - 2) would be evaluated as <foo> + (0 - 2) in an unsigned context. Fix it by simply removing brackets as those operations here do not need to have special precedence. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Benjamin Poirier <bpoirier@suse.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Benjamin Poirier <bpoirier@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 12:26:50 -07:00
nikolay@redhat.com	07ce76aa9b	net_sched: make dev_trans_start return vlan's real dev trans_start Vlan devices are LLTX and don't update their own trans_start, so if dev_trans_start has to be called with a vlan device then 0 or a stale value will be returned. Currently the bonding is the only such user, and it's needed for proper arp monitoring when the slaves are vlans. Fix this by extracting the vlan's real device trans_start. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 12:17:42 -07:00
nikolay@redhat.com	0369722f02	vlan: make vlan_dev_real_dev work over stacked vlans Sometimes we might have stacked vlans on top of each other, and we're interested in the first non-vlan real device on the path, so transform vlan_dev_real_dev to go over the stacked vlans and extract the first non-vlan device. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 12:17:42 -07:00
Julia Lawall	d9af2d67e4	net/vmw_vsock/af_vsock.c: drop unneeded semicolon Drop the semicolon at the end of the list_for_each_entry loop header. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 11:07:44 -07:00
Dan Carpenter	e4d091d7bf	netfilter: nfnetlink_{log,queue}: fix information leaks in netlink message These structs have a "_pad" member. Also the "phw" structs have an 8 byte "hw_addr[]" array but sometimes only the first 6 bytes are initialized. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-05 17:36:04 +02:00
Florian Westphal	d8b3bfc253	netfilter: tproxy: fix build with IP6_NF_IPTABLES=n after commit `93742cf` (netfilter: tproxy: remove nf_tproxy_core.h) CONFIG_IPV6=y CONFIG_IP6_NF_IPTABLES=n gives us: net/netfilter/xt_TPROXY.c: In function 'nf_tproxy_get_sock_v6': net/netfilter/xt_TPROXY.c:178:4: error: implicit declaration of function 'inet6_lookup_listener' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-05 12:57:38 +02:00
Mathias Krause	8603b9556e	af_key: constify lookup tables The lookup tables for minimum sizes of extensions and for the pfkey handler functions are read only, therefore can be const. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-05 11:14:00 +02:00
Mathias Krause	e473fcb472	xfrm: constify mark argument of xfrm_find_acq() The mark argument is read only, so constify it. Also make dummy_mark in af_key const -- only used as dummy argument for this very function. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-05 11:13:53 +02:00
stephen hemminger	762a3d89eb	bridge: fix rcu check warning in multicast port group Use of RCU here with out marked pointer and function doesn't match prototype with sparse. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-04 18:41:52 -07:00
David S. Miller	0e76a3a587	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-03 21:36:46 -07:00
Linus Torvalds	72a67a94bc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Don't ignore user initiated wireless regulatory settings on cards with custom regulatory domains, from Arik Nemtsov. 2) Fix length check of bluetooth information responses, from Jaganath Kanakkassery. 3) Fix misuse of PTR_ERR in btusb, from Adam Lee. 4) Handle rfkill properly while iwlwifi devices are offline, from Emmanuel Grumbach. 5) Fix r815x devices DMA'ing to stack buffers, from Hayes Wang. 6) Kernel info leak in ATM packet scheduler, from Dan Carpenter. 7) 8139cp doesn't check for DMA mapping errors, from Neil Horman. 8) Fix bridge multicast code to not snoop when no querier exists, otherwise mutlicast traffic is lost. From Linus Lüssing. 9) Avoid soft lockups in fib6_run_gc(), from Michal Kubecek. 10) Fix races in automatic address asignment on ipv6, which can result in incorrect lifetime assignments. From Jiri Benc. 11) Cure build bustage when CONFIG_NET_LL_RX_POLL is not set and rename it CONFIG_NET_RX_BUSY_POLL to eliminate the last reference to the original naming of this feature. From Cong Wang. 12) Fix crash in TIPC when server socket creation fails, from Ying Xue. 13) macvlan_changelink() silently succeeds when it shouldn't, from Michael S Tsirkin. 14) HTB packet scheduler can crash due to sign extension, fix from Stephen Hemminger. 15) With the cable unplugged, r8169 prints out a message every 10 seconds, make it netif_dbg() instead of netif_warn(). From Peter Wu. 16) Fix memory leak in rtm_to_ifaddr(), from Daniel Borkmann. 17) sis900 gets spurious TX queue timeouts due to mismanagement of link carrier state, from Denis Kirjanov. 18) Validate somaxconn sysctl to make sure it fits inside of a u16. From Roman Gushchin. 19) Fix MAC address filtering on qlcnic, from Shahed Shaikh. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits) qlcnic: Fix for flash update failure on 83xx adapter qlcnic: Fix link speed and duplex display for 83xx adapter qlcnic: Fix link speed display for 82xx adapter qlcnic: Fix external loopback test. qlcnic: Removed adapter series name from warning messages. qlcnic: Free up memory in error path. qlcnic: Fix ingress MAC learning qlcnic: Fix MAC address filter issue on 82xx adapter net: ethernet: davinci_emac: drop IRQF_DISABLED netlabel: use domain based selectors when address based selectors are not available net: check net.core.somaxconn sysctl values sis900: Fix the tx queue timeout issue net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing fails r8169: remove "PHY reset until link up" log spam net: ethernet: cpsw: drop IRQF_DISABLED htb: fix sign extension bug macvlan: handle set_promiscuity failures macvlan: better mode validation tipc: fix oops when creating server socket fails net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL ...	2013-08-03 15:00:23 -07:00
Linus Torvalds	83aaf3b39c	Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux Pull nfsd bugfixes from Bruce Fields: "Most of this is due to a screwup on my part -- some gss-proxy crashes got fixed before the merge window but somehow never made it out of a temporary git repo on my laptop...." * 'for-3.11' of git://linux-nfs.org/~bfields/linux: svcrpc: set cr_gss_mech from gss-proxy as well as legacy upcall svcrpc: fix kfree oops in gss-proxy code svcrpc: fix gss-proxy xdr decoding oops svcrpc: fix gss_rpc_upcall create error NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.	2013-08-03 11:15:03 -07:00
Stefan Tomanek	73f5698e77	fib_rules: fix suppressor names and default values This change brings the suppressor attribute names into line; it also changes the data types to provide a more consistent interface. While -1 indicates that the suppressor is not enabled, values >= 0 for suppress_prefixlen or suppress_ifgroup reject routing decisions violating the constraint. This changes the previously presented behaviour of suppress_prefixlen, where a prefix length _less_ than the attribute value was rejected. After this change, a prefix length less than or equal to the value is considered a violation of the rule constraint. It also changes the default values for default and newly added rules (disabling any suppression for those). Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-03 10:40:23 -07:00
Wang Sheng-Hui	0c0667a854	vlan: cleanup the usage of vlan_dev_priv(dev) This patch cleanup 2 points for the usage of vlan_dev_priv(dev): * In vlan_dev.c/vlan_dev_hard_header, we should use the var vlan directly after grabing the pointer at the beginning with vlan = vlan_dev_priv(dev); when we need to access the fields of vlan. In vlan.c/register_vlan_device, add the var vlan pointer struct vlan_dev_priv vlan; to cleanup the code to access the fields of vlan_dev_priv(new_dev). Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-03 10:32:25 -07:00
Paul Moore	6a8b7f0c85	netlabel: use domain based selectors when address based selectors are not available NetLabel has the ability to selectively assign network security labels to outbound traffic based on either the LSM's "domain" (different for each LSM), the network destination, or a combination of both. Depending on the type of traffic, local or forwarded, and the type of traffic selector, domain or address based, different hooks are used to label the traffic; the goal being minimal overhead. Unfortunately, there is a bug such that a system using NetLabel domain based traffic selectors does not correctly label outbound local traffic that is not assigned to a socket. The issue is that in these cases the associated NetLabel hook only looks at the address based selectors and not the domain based selectors. This patch corrects this by checking both the domain and address based selectors so that the correct labeling is applied, regardless of the configuration type. In order to acomplish this fix, this patch also simplifies some of the NetLabel domainhash structures to use a more common outbound traffic mapping type: struct netlbl_dommap_def. This simplifies some of the code in this patch and paves the way for further simplifications in the future. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 16:57:01 -07:00
Veaceslav Falico	63134803a6	neighbour: populate neigh_parms on alloc before calling ndo_neigh_setup dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 15:44:23 -07:00
Daniel Borkmann	8a849bb7f0	net: netlink: minor: remove unused pointer in alloc_pg_vec Variable ptr is being assigned, but never used, so just remove it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 15:26:12 -07:00
Stefan Tomanek	6ef94cfafb	fib_rules: add route suppression based on ifgroup This change adds the ability to suppress a routing decision based upon the interface group the selected interface belongs to. This allows it to exclude specific devices from a routing decision. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 15:24:22 -07:00
Roman Gushchin	5f671d6b4e	net: check net.core.somaxconn sysctl values It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there is no checks at all. The sk_max_ack_backlog field of the sock structure is defined as unsigned short. Therefore, the backlog argument in inet_listen() shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall is truncated to the somaxconn value. So, the somaxconn value shouldn't exceed 65535 (USHRT_MAX). Also, negative values of somaxconn are meaningless. before: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 net.core.somaxconn = 65536 $ sysctl -w net.core.somaxconn=-100 net.core.somaxconn = -100 after: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 error: "Invalid argument" setting key "net.core.somaxconn" $ sysctl -w net.core.somaxconn=-100 error: "Invalid argument" setting key "net.core.somaxconn" Based on a prior patch from Changli Gao. Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Reported-by: Changli Gao <xiaosuo@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 15:18:53 -07:00
Werner Almesberger	d1c53c8e87	icmpv6_filter: allow ICMPv6 messages with bodies < 4 bytes By using sizeof(_hdr), net/ipv6/raw.c:icmpv6_filter implicitly assumes that any valid ICMPv6 message is at least eight bytes long, i.e., that the message body is at least four bytes. The DIS message of RPL (RFC 6550 section 6.2, from the 6LoWPAN world), has a minimum length of only six bytes, and is thus blocked by icmpv6_filter. RFC 4443 seems to allow even a zero-sized body, making the minimum allowable message size four bytes. Signed-off-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 15:15:50 -07:00
Werner Almesberger	9cc08af3a1	icmpv6_filter: fix "_hdr" incorrectly being a pointer "_hdr" should hold the ICMPv6 header while "hdr" is the pointer to it. This worked by accident. Signed-off-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 15:15:50 -07:00
Phil Sutter	c483e02614	af_packet: simplify VLAN frame check in packet_snd For ethernet frames, eth_type_trans() already parses the header, so one can skip this when checking the frame size. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 14:58:32 -07:00
Phil Sutter	cbd89acb9e	af_packet: fix for sending VLAN frames via packet_mmap Since tpacket_fill_skb() parses the protocol field in ethernet frames' headers, it's easy to see if any passed frame is a VLAN one and account for the extended size. But as the real protocol does not turn up before tpacket_fill_skb() runs which in turn also checks the frame length, move the max frame length calculation into the function. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 14:58:32 -07:00
Phil Sutter	0f75b09c79	af_packet: when sending ethernet frames, parse header for skb->protocol This may be necessary when the SKB is passed to other layers on the go, which check the protocol field on their own. An example is a VLAN packet sent out using AF_PACKET on a bridge interface. The bridging code checks the SKB size, accounting for any VLAN header only if the protocol field is set accordingly. Note that eth_type_trans() sets skb->dev to the passed argument, so this can be skipped in packet_snd() for ethernet frames, as well. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 14:58:32 -07:00
Daniel Borkmann	446266b0c7	net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing fails Commit `5c766d642` ("ipv4: introduce address lifetime") leaves the ifa resource that was allocated via inet_alloc_ifa() unfreed when returning the function with -EINVAL. Thus, free it first via inet_free_ifa(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 14:56:06 -07:00
stephen hemminger	cbd375567f	htb: fix sign extension bug When userspace passes a large priority value the assignment of the unsigned value hopt->prio to signed int cl->prio causes cl->prio to become negative and the comparison is with TC_HTB_NUMPRIO is always false. The result is that HTB crashes by referencing outside the array when processing packets. With this patch the large value wraps around like other values outside the normal range. See: https://bugzilla.kernel.org/show_bug.cgi?id=60669 Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 14:52:20 -07:00
fan.du	d27fc78208	sctp: Don't lookup dst if transport dst is still valid When sctp sits on IPv6, sctp_transport_dst_check pass cookie as ZERO, as a result ip6_dst_check always fail out. This behaviour makes transport->dst useless, because every sctp_packet_transmit must look for valid dst. Add a dst_cookie into sctp_transport, and set the cookie whenever we get new dst for sctp_transport. So dst validness could be checked against it. Since I have split genid for IPv4 and IPv6, also delete/add IPv6 address will also bump IPv6 genid. So issues we discussed in: http://marc.info/?l=linux-netdev&m=137404469219410&w=4 have all been sloved for this patch. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-02 12:36:00 -07:00
John W. Linville	89b59bcd3a	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-08-02 14:54:19 -04:00
fan.du	439677d766	ipv6: bump genid when delete/add address Server Client 2001:1::803/64 <-> 2001:1::805/64 2001:2::804/64 <-> 2001:2::806/64 Server side fib binary tree looks like this: (2001:/64) / / ffff88002103c380 / \ (2) / \ (2001::803/128) ffff880037ac07c0 / \ / \ (3) ffff880037ac0640 (2001::806/128) / \ (1) / \ (2001::804/128) (2001::805/128) Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3) destinate to 2001::806 with source address as 2001::804/64. That's because 2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is where the substantial difference between same prefix configuration and different prefix configuration :) So packet are still transmitted out to 2001::806 with source address as 2001::804/64. So bump genid will clear rt in (3), and up layer protocol will eventually find the right one for themselves. This problem arised from the discussion in here: http://marc.info/?l=linux-netdev&m=137404469219410&w=4 Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 16:33:32 -07:00
Ying Xue	c756891a4e	tipc: fix oops when creating server socket fails When creation of TIPC internal server socket fails, we get an oops with the following dump: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc] PGD 13719067 PUD 12008067 PMD 0 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: tipc(+) CPU: 4 PID: 4340 Comm: insmod Not tainted 3.10.0+ #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880014360000 ti: ffff88001374c000 task.ti: ffff88001374c000 RIP: 0010:[<ffffffffa0011f49>] [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc] RSP: 0018:ffff88001374dc98 EFLAGS: 00010292 RAX: 0000000000000000 RBX: ffff880012ac09d8 RCX: 0000000000000000 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffff880014360000 RBP: ffff88001374dcb8 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0016fa0 R13: ffffffffa0017010 R14: ffffffffa0017010 R15: ffff880012ac09d8 FS: 0000000000000000(0000) GS:ffff880016600000(0063) knlGS:00000000f76668d0 CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b CR2: 0000000000000020 CR3: 0000000012227000 CR4: 00000000000006e0 Stack: ffff88001374dcb8 ffffffffa0016fa0 0000000000000000 0000000000000001 ffff88001374dcf8 ffffffffa0012922 ffff88001374dce8 00000000ffffffea ffffffffa0017100 0000000000000000 ffff8800134241a8 ffffffffa0017150 Call Trace: [<ffffffffa0012922>] tipc_server_stop+0xa2/0x1b0 [tipc] [<ffffffffa0009995>] tipc_subscr_stop+0x15/0x20 [tipc] [<ffffffffa00130f5>] tipc_core_stop+0x1d/0x33 [tipc] [<ffffffffa001f0d4>] tipc_init+0xd4/0xf8 [tipc] [<ffffffffa001f000>] ? 0xffffffffa001efff [<ffffffff8100023f>] do_one_initcall+0x3f/0x150 [<ffffffff81082f4d>] ? __blocking_notifier_call_chain+0x7d/0xd0 [<ffffffff810cc58a>] load_module+0x11aa/0x19c0 [<ffffffff810c8d60>] ? show_initstate+0x50/0x50 [<ffffffff8190311c>] ? retint_restore_args+0xe/0xe [<ffffffff810cce79>] SyS_init_module+0xd9/0x110 [<ffffffff8190dc65>] sysenter_dispatch+0x7/0x1f Code: 6c 24 70 4c 89 ef e8 b7 04 8f e1 8b 73 04 4c 89 e7 e8 7c 9e 32 e1 41 83 ac 24 b8 00 00 00 01 4c 89 ef e8 eb 0a 8f e1 48 8b 43 08 <4c> 8b 68 20 4d 8d a5 48 03 00 00 4c 89 e7 e8 04 05 8f e1 4c 89 RIP [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc] RSP <ffff88001374dc98> CR2: 0000000000000020 ---[ end trace b02321f40e4269a3 ]--- We have the following call chain: tipc_core_start() ret = tipc_subscr_start() ret = tipc_server_start(){ server->enabled = 1; ret = tipc_open_listening_sock() } I.e., the server->enabled flag is unconditionally set to 1, whatever the return value of tipc_open_listening_sock(). This causes a crash when tipc_core_start() tries to clean up resources after a failed initialization: if (ret == failed) tipc_subscr_stop() tipc_server_stop(){ if (server->enabled) tipc_close_conn(){ NULL reference of con->sock-sk OOPS! } } To avoid this, tipc_server_start() should only set server->enabled to 1 in case of a succesful socket creation. In case of failure, it should release all allocated resources before returning. Problem introduced in commit `c5fa7b3cf3` ("tipc: introduce new TIPC server infrastructure") in v3.11-rc1. Note that it won't be seen often; it takes a module load under memory constrained conditions in order to trigger the failure condition. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 15:54:33 -07:00
Cong Wang	e0d1095ae3	net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL Eliezer renames several ll_poll to busy_poll, but forgets CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too. Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 15:11:17 -07:00
Jiri Benc	8a226b2cfa	ipv6: prevent race between address creation and removal There's a race in IPv6 automatic addess assignment. The address is created with zero lifetime when it's added to various address lists. Before it gets assigned the correct lifetime, there's a window where a new address may be configured. This causes the semi-initiated address to be deleted in addrconf_verify. This was discovered as a reference leak caused by concurrent run of __ipv6_ifa_notify for both RTM_NEWADDR and RTM_DELADDR with the same address. Fix this by setting the lifetime before the address is added to inet6_addr_lst. A few notes: 1. In addrconf_prefix_rcv, by setting update_lft to zero, the if (update_lft) { ... } condition is no longer executed for newly created addresses. This is okay, as the ifp fields are set in ipv6_add_addr now and ipv6_ifa_notify is called (and has been called) through addrconf_dad_start. 2. The removal of the whole block under ifp->lock in inet6_addr_add is okay, too, as tstamp is initialized to jiffies in ipv6_add_addr. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 14:16:20 -07:00
Jiri Pirko	3f8f52982a	ipv6: move peer_addr init into ipv6_add_addr() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 14:16:20 -07:00
Michal Kubeček	49a18d86f6	ipv6: update ip6_rt_last_gc every time GC is run As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should hold the last time garbage collector was run so that we should update it whenever fib6_run_gc() calls fib6_clean_all(), not only if we got there from ip6_dst_gc(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 14:16:20 -07:00
Michal Kubeček	2ac3ac8f86	ipv6: prevent fib6_run_gc() contention On a high-traffic router with many processors and many IPv6 dst entries, soft lockup in fib6_run_gc() can occur when number of entries reaches gc_thresh. This happens because fib6_run_gc() uses fib6_gc_lock to allow only one thread to run the garbage collector but ip6_dst_gc() doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc() returns. On a system with many entries, this can take some time so that in the meantime, other threads pass the tests in ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for the lock. They then have to run the garbage collector one after another which blocks them for quite long. Resolve this by replacing special value ~0UL of expire parameter to fib6_run_gc() by explicit "force" parameter to choose between spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with force=false if gc_thresh is reached but not max_size. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-01 14:16:20 -07:00
John W. Linville	7546ff9549	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2013-08-01 15:26:52 -04:00
John W. Linville	22e02a0272	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2013-08-01 14:30:59 -04:00

... 2 3 4 5 6 ...

29322 Commits