Commit Graph

495480 Commits

Author SHA1 Message Date
David S. Miller
3f3558bb51 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netfront.c

Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 00:53:17 -05:00
Linus Torvalds
a6391a924c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't use uninitialized data in IPVS, from Dan Carpenter.

 2) conntrack race fixes from Pablo Neira Ayuso.

 3) Fix TX hangs with i40e, from Jesse Brandeburg.

 4) Fix budget return from poll calls in dnet and alx, from Eric
    Dumazet.

 5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
    Jaeger.

 6) Fix bug introduced by conversion to list_head in TIPC retransmit
    code, from Jon Paul Maloy.

 7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
    Khoroshilov.

 8) Fix bridge build with INET disabled, from Arnd Bergmann.

 9) Fix netlink array overrun for PROBE attributes in openvswitch, from
    Thomas Graf.

10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
    Prashant Sreedharan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  tg3: Release tp->lock before invoking synchronize_irq()
  tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
  tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
  team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
  openvswitch: packet messages need their own probe attribtue
  i40e: adds FCoE configure option
  cxgb4vf: Fix queue allocation for 40G adapter
  netdevice: Add missing parentheses in macro
  bridge: only provide proxy ARP when CONFIG_INET is enabled
  neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
  net: fec: fix MDIO bus assignement for dual fec SoC's
  xen-netfront: use different locks for Rx and Tx stats
  drivers: net: cpsw: fix multicast flush in dual emac mode
  cxgb4vf: Initialize mdio_addr before using it
  net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
  usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
  MAINTAINERS: add me as ibmveth maintainer
  tipc: fix bug in broadcast retransmit code
  update ip-sysctl.txt documentation (v2)
  net/at91_ether: prepare and unprepare clock
  ...
2015-01-15 11:17:37 +13:00
David S. Miller
c637dbcedf Merge branch 'tg3-net'
Prashant Sreedharan says:

====================
tg3: synchronize_irq() should be called without taking locks

v2: Added Reported-by, Tested-by fields and reference to the thread that
    reported the problem

This series addresses the problem reported by Peter Hurley in mail thread
https://lkml.org/lkml/2015/1/12/1082
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 17:05:55 -05:00
Prashant Sreedharan
932f19de6a tg3: Release tp->lock before invoking synchronize_irq()
synchronize_irq() can sleep waiting, for pending IRQ handlers so driver
should release the tp->lock spin lock before invoking synchronize_irq()

Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 17:05:51 -05:00
Prashant Sreedharan
db84bf43ef tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
Currently tg3_reset_task() uses only tp->lock for synchronizing with code
paths like tg3_open() etc. But since tp->lock is released before doing
synchronize_irq(), rtnl_lock should be taken in tg3_reset_task() to
synchronize it with other code paths.

Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 17:05:51 -05:00
Prashant Sreedharan
4fd190a938 tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
This is to avoid the race between tg3_timer() and the execution paths
which does not invoke tg3_timer_stop() and releases tp->lock before
calling synchronize_irq()

Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 17:05:50 -05:00
Linus Torvalds
48c53db220 Two bugfixes for arm64. I will have another pull request next week,
but otherwise things are calm.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUtonAAAoJEL/70l94x66DeaUH/0ZKXvCUkrfhXMwjgWFkEA/X
 zF3wm6Et7tn+5UdRTPaO9ko9HmmIZJBlZqVu9+RFFbPBFthzZDdGGXUUjVoKvWgS
 2dMejNvf3a+tw9ovXCQwr7Uy1TTqysIQP+0fcOTRJlH4peh1RTEr1JF5IEI3pM0q
 gWAsQIqijiDUg8rLYQOBBqL/Mz2j09K4YYORS548JESXdQBcBJf3nkAeaeh7RNhw
 QDt2dH9rCgLFAWdmg0wmKq12CCcHr01aZav11u30OLUEr9OcGpl9ohMqiFYJr1o9
 9aV/xk7BDBgxEXZpkpth3ziovAQ8Z6MmczykSZFLNMElhYi0V3DMNYm13SCTLBM=
 =uUFZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Two bugfixes for arm64.  I will have another pull request next week,
  but otherwise things are calm"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  arm64: KVM: Fix HCR setting for 32bit guests
  arm64: KVM: Fix TLB invalidation by IPA/VMID
2015-01-15 10:54:30 +13:00
Jiri Pirko
b0d11b4278 team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
This patch is fixing a race condition that may cause setting
count_pending to -1, which results in unwanted big bulk of arp messages
(in case of "notify peers").

Consider following scenario:

count_pending == 2
   CPU0                                           CPU1
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 1)
					  schedule_delayed_work
 team_notify_peers
   atomic_add (adding 1 to count_pending)
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 1)
					  schedule_delayed_work
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 0)
   schedule_delayed_work
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to -1)

Fix this race by using atomic_dec_if_positive - that will prevent
count_pending running under 0.

Fixes: fc423ff00d ("team: add peer notification")
Fixes: 492b200efd  ("team: add support for sending multicast rejoins")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:53:57 -05:00
Linus Torvalds
6fb400d36d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Two small performance tweaks, the plumbing for the execveat system
  call and a couple of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/uprobes: fix user space PER events
  s390/bpf: Fix JMP_JGE_X (A > X) and JMP_JGT_X (A >= X)
  s390/bpf: Fix ALU_NEG (A = -A)
  s390/mm: avoid using pmd_to_page for !USE_SPLIT_PMD_PTLOCKS
  s390/timex: fix get_tod_clock_ext() inline assembly
  s390: wire up execveat syscall
  s390/kernel: use stnsm 255 instead of stosm 0
  s390/vtime: Get rid of redundant WARN_ON
  s390/zcrypt: kernel oops at insmod of the z90crypt device driver
2015-01-15 10:50:29 +13:00
Thomas Graf
1ba398041f openvswitch: packet messages need their own probe attribtue
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.

Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.

Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tracked-down-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:49:44 -05:00
Vasu Dev
776d4e9f5c i40e: adds FCoE configure option
Adds FCoE config option I40E_FCOE, so that FCoE can be enabled
as needed but otherwise have it disabled by default.

This also eliminate multiple FCoE config checks, instead now just
one config check for CONFIG_I40E_FCOE.

The I40E FCoE was added with 3.17 kernel and therefore this patch
shall be applied to stable 3.17 kernel also.

CC: <stable@vger.kernel.org>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:48:39 -05:00
Hariprasad Shenai
14b3812f7a cxgb4vf: Fix queue allocation for 40G adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:48:08 -05:00
Linus Torvalds
fb005c47f7 File-locking related bugfix for v3.19 (and v3.18-stable)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUtmOYAAoJEAAOaEEZVoIV+PwP/1YEQw67tfWoz9M67xhKZ4w7
 KO4+dXnCQFLM0zU29f7o4Yw9tWlh2SSsR07oH8YdnQBrZPxC5CI5tMg9ghDXT/9a
 jBIjsO7gM1Kgi822R2webQ6hcEwutVfOZT37Pk/ZA9jPGUuWaAexJRF8G5VzOW3O
 a4J8pYDT34coo3PsFhUfird5sgLLISCmTQ1izhDd1hE9bpgieL4rimOji84Gprqd
 InXiQvtTvAdDJMWWogEfDNbAO5LzAXy2+5/Jxe3aBKnkS3C0nHIcvBPVeM9s6Dnx
 k0vm47FTA91NTsDF3ybDSTc7QRSw8I7W2IawDoEXoMtW4TgSbE9Yd0/VeP0gFA0v
 QRm80I1pwCBEMA3E6LiDHdsZb8cPTQK4NrR91wlDKq1ln5+4QpIe33YLPk1iv39l
 XagWuxcrub3oxBlxghNw9VQI7rWbpXWtFL7GgaQI+rH8xVRojxa8CvVNxYYvMow+
 5YfrKwgWEm7wksvrzs5dHK/qzl3tlWf2oOe0DMmcy88u74q47AWOTL+N/SQXGIq2
 XJBjiK0ZcSlcF2rT5Ti9WsiJtUvB88duKq9vXIQDJ/8qV5EFKuLIDsRvCgaHNxqz
 NFNb80Dsr+BjZhnWihBpe/l/cs2ECagYE23MlBRmI+gfnL7iGZv/99zG3vx9um1Y
 AJntD7x0JF7cKGc78Epg
 =k0Aw
 -----END PGP SIGNATURE-----

Merge tag 'locks-v3.19-1' of git://git.samba.org/jlayton/linux

Pull file locking fix from Jeff Layton:
 "Just a simple bugfix for a regression that I introduced into v3.18
  with the internal lease API overhaul -- mea culpa.  Kudos to Linda and
  Neil for tracking this down and fixing it"

* tag 'locks-v3.19-1' of git://git.samba.org/jlayton/linux:
  locks: fix NULL-deref in generic_delete_lease
2015-01-15 10:38:07 +13:00
zhuyj
9a6b4b392d ipv6:icmp:remove unnecessary brackets
There are too many brackets. Maybe only one bracket is enough.

Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:35:49 -05:00
Benjamin Poirier
4ccce02eb3 netdevice: Add missing parentheses in macro
For example, one could conceivably call
	for_each_netdev_in_bond_rcu(condition ? bond1 : bond2, slave)
and get an unexpected result.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:34:41 -05:00
Fan Du
3f4c1d87af openvswitch: Introduce ovs_tunnel_route_lookup
Introduce ovs_tunnel_route_lookup to consolidate route lookup
shared by vxlan, gre, and geneve ports.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:32:06 -05:00
Linus Torvalds
31238e61b5 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "The major part is an update to the NVMe driver, fixing various issues
  around surprise removal and hung controllers.  Most of that is from
  Keith, and parts are simple blk-mq fixes or exports/additions of minor
  functions to aid this effort, and parts are changes directly to the
  NVMe driver.

  Apart from the above, this contains:

   - Small blk-mq change from me, killing an unused member of the
     hardware queue structure.

   - Small fix from Ming Lei, fixing up a few drivers that didn't
     properly check for ERR_PTR() returns from blk_mq_init_queue()"

* 'for-linus' of git://git.kernel.dk/linux-block:
  NVMe: Fix locking on abort handling
  NVMe: Start and stop h/w queues on reset
  NVMe: Command abort handling fixes
  NVMe: Admin queue removal handling
  NVMe: Reference count admin queue usage
  NVMe: Start all requests
  blk-mq: End unstarted requests on a dying queue
  blk-mq: Allow requests to never expire
  blk-mq: Add helper to abort requeued requests
  blk-mq: Let drivers cancel requeue_work
  blk-mq: Export if requests were started
  blk-mq: Wake tasks entering queue on dying
  blk-mq: get rid of ->cmd_size in the hardware queue
  block: fix checking return value of blk_mq_init_queue
  block: wake up waiters when a queue is marked dying
  NVMe: Fix double free irq
  blk-mq: Export freeze/unfreeze functions
  blk-mq: Exit queue on alloc failure
2015-01-15 10:27:56 +13:00
David S. Miller
2733135329 Merge branch 'vxlan_rco'
Tom Herbert says:

====================
net: Remote checksum offload for VXLAN

This patch set adds support for remote checksum offload in VXLAN.

The remote checksum offload is generalized by creating a common
function (remcsum_adjust) that does the work of modifying the
checksum in remote checksum offload. This function can be called
from normal or GRO path. GUE was modified to use this function.

To support RCO is VXLAN we use the 9th bit in the reserved
flags to indicated remote checksum offload. The start and offset
values are encoded n a compressed form in the low order (reserved)
byte of the vni field.

Remote checksum offload is described in
https://tools.ietf.org/html/draft-herbert-remotecsumoffload-01

Changes in v2:
  - Add udp_offload_callbacks which has GRO functions that take a
    udp_offload pointer argument. This argument can be used to retrieve
    a per port structure of the encapsulation for use in gro processing
    (mostly by doing container_of on the structure).
  - Use the 10th bit in VXLAN flags for RCO which does not seem to
    conflict with other proposals at this time (ie. VXLAN-GPE and
    VXLAN-GPB)
  - Require that RCO must be explicitly enabled on the receiver
    as well as the sender.

Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).

With UDP checksums and Remote Checksum Offload
  IPv4
      Client
        11.84% CPU utilization
      Server
        12.96% CPU utilization
      9197 Mbps
  IPv6
      Client
        12.46% CPU utilization
      Server
        14.48% CPU utilization
      8963 Mbps

With UDP checksums, no remote checksum offload
  IPv4
      Client
        15.67% CPU utilization
      Server
        14.83% CPU utilization
      9094 Mbps
  IPv6
      Client
        16.21% CPU utilization
      Server
        14.32% CPU utilization
      9058 Mbps

No UDP checksums
  IPv4
      Client
        15.03% CPU utilization
      Server
        23.09% CPU utilization
      9089 Mbps
  IPv6
      Client
        16.18% CPU utilization
      Server
        26.57% CPU utilization
       8954 Mbps
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:20:11 -05:00
Tom Herbert
dfd8645ea1 vxlan: Remote checksum offload
Add support for remote checksum offload in VXLAN. This uses a
reserved bit to indicate that RCO is being done, and uses the low order
reserved eight bits of the VNI to hold the start and offset values in a
compressed manner.

Start is encoded in the low order seven bits of VNI. This is start >> 1
so that the checksum start offset is 0-254 using even values only.
Checksum offset (transport checksum field) is indicated in the high
order bit in the low order byte of the VNI. If the bit is set, the
checksum field is for UDP (so offset = start + 6), else checksum
field is for TCP (so offset = start + 16). Only TCP and UDP are
supported in this implementation.

Remote checksum offload for VXLAN is described in:

https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).

With UDP checksums and Remote Checksum Offload
  IPv4
      Client
        11.84% CPU utilization
      Server
        12.96% CPU utilization
      9197 Mbps
  IPv6
      Client
        12.46% CPU utilization
      Server
        14.48% CPU utilization
      8963 Mbps

With UDP checksums, no remote checksum offload
  IPv4
      Client
        15.67% CPU utilization
      Server
        14.83% CPU utilization
      9094 Mbps
  IPv6
      Client
        16.21% CPU utilization
      Server
        14.32% CPU utilization
      9058 Mbps

No UDP checksums
  IPv4
      Client
        15.03% CPU utilization
      Server
        23.09% CPU utilization
      9089 Mbps
  IPv6
      Client
        16.18% CPU utilization
      Server
        26.57% CPU utilization
       8954 Mbps

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:20:04 -05:00
Tom Herbert
a2b12f3c7a udp: pass udp_offload struct to UDP gro callbacks
This patch introduces udp_offload_callbacks which has the same
GRO functions (but not a GSO function) as offload_callbacks,
except there is an argument to a udp_offload struct passed to
gro_receive and gro_complete functions. This additional argument
can be used to retrieve the per port structure of the encapsulation
for use in gro processing (mostly by doing container_of on the
structure).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:20:04 -05:00
Arnd Bergmann
d92cfdbbea bridge: only provide proxy ARP when CONFIG_INET is enabled
When IPV4 support is disabled, we cannot call arp_send from
the bridge code, which would result in a kernel link error:

net/built-in.o: In function `br_handle_frame_finish':
:(.text+0x59914): undefined reference to `arp_send'
:(.text+0x59a50): undefined reference to `arp_tbl'

This makes the newly added proxy ARP support in the bridge
code depend on the CONFIG_INET symbol and lets the compiler
optimize the code out to avoid the link error.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 958501163d ("bridge: Add support for IEEE 802.11 Proxy ARP")
Cc: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:08:02 -05:00
hayeswang
d823ab68fb r8152: replace tasklet with NAPI
Replace tasklet with NAPI.

Add rx_queue to queue the remaining rx packets if the number of the
rx packets is more than the request from poll().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:06:25 -05:00
David S. Miller
237de6efc1 Merge branch 'hip04'
Ding Tianhong says:

====================
add hisilicon hip04 ethernet driver

v13:
- Fix the problem of alignment parameters for function and checkpatch warming.

v12:
- According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
  for hip04 ethernet.

v11:
- Add ethtool support for tx coalecse getting and setting, the xmit_more
  is not supported for this patch, but I think it could work for hip04,
  will support it later after some tests for performance better.

  Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
  it looks that the performance and latency is more better by tx_coalesce_frames/usecs.

  - Before:
    $ ping 192.168.1.1 ...
    === 192.168.1.1 ping statistics ===
    24 packets transmitted, 24 received, 0% packet loss, time 22999ms
    rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms

    $ iperf -c 192.168.1.1 ...
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec

  - After:
    $ ping 192.168.1.1 ...
    === 192.168.1.1 ping statistics ===
    24 packets transmitted, 24 received, 0% packet loss, time 22999ms
    rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms

    $ iperf -c 192.168.1.1 ...
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

v10:
- According Arnd's suggestion, remove the skb_orphan and use the hrtimer
  for the cleanup of the TX queue and add some modification for the hip04
  drivers.
  1) drop the broken skb_orphan call
  2) drop the workqueue
  3) batch cleanup based on tx_coalesce_frames/usecs for better throughput
  4) use a reasonable default tx timeout (200us, could be shorted
     based on measurements) with a range timer
  5) fix napi poll function return value
  6) use a lockless queue for cleanup

v9:
- There is no tx completion interrupts to free DMAd Tx packets, it means taht
  we rely on new tx packets arriving to run the destructors of completed packets,
  which open up space in their sockets's send queues. Sometimes we don't get such
  new packets causing Tx to stall, a single UDP transmitter is a good example of
  this situation, so we need a clean up workqueue to reclaims completed packets,
  the workqueue will only free the last packets which is already stay for several jiffies.
  Also fix some format cleanups.

v8:
- Use poll to reclaim xmitted buffer as workaround since no tx done interrupt

v7:
- Remove select NET_CORE in 0002

v6:
- Suggest by Russell: Use netdev_sent_queue & netdev_completed_queue to solve latency issue
  Also shorten the period of timer, which is used to wakeup the queue since no
  tx completed interrupt.

v5:
- no big change, fix typo

v4:
- Modify accoringly to the suggetion from Arnd, Florian, Eric, David
  Use of_parse_phandle_with_fixed_args & syscon_node_to_regmap get ppe info
  Add skb_orphan() and tx_timer for reclaim since no tx_finished interrupt
  Update timeout, and move of_phy_connect to probe to reuse open/stop

v3:
- Suggest from Arnd, use syscon & regmap_write/read to replace static void __iomem *ppebase.
  Modify hisilicon-hip04-net.txt accrordingly to suggestion from Florian and Sergei.

v2:
- Got many suggestions from Russell, Arnd, Florian, Mark and Sergei
  Remove memcpy, use dma_map/unmap_single, use dma_alloc_coherent rather than dma_pool, etc.
  Refer property in ethernet.txt, change ppe description, etc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 01:52:51 -05:00
dingtianhong
a41ea46a9a net: hisilicon: new hip04 ethernet driver
Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
The controller has no tx done interrupt, reclaim xmitted buffer in the poll.

v13: Fix the problem of alignment parameters for function and checkpatch warming.

v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
     for hip04 ethernet.

v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
     is not supported for this patch, but I think it could work for hip04,
     will support it later after some tests for performance better.

     Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
     it looks that the performance and latency is more better by tx_coalesce_frames/usecs.

     - Before:
     $ ping 192.168.1.1 ...
     === 192.168.1.1 ping statistics ===
     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
     rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms

     $ iperf -c 192.168.1.1 ...
     [ ID] Interval       Transfer     Bandwidth
     [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec

     - After:
     $ ping 192.168.1.1 ...
     === 192.168.1.1 ping statistics ===
     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
     rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms

     $ iperf -c 192.168.1.1 ...
     [ ID] Interval       Transfer     Bandwidth
     [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

v10: According David Miller and Arnd Bergmann's suggestion, add some modification
     for v9 version
     - drop the workqueue
     - batch cleanup based on tx_coalesce_frames/usecs for better throughput
     - use a reasonable default tx timeout (200us, could be shorted
       based on measurements) with a range timer
     - fix napi poll function return value
     - use a lockless queue for cleanup

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 01:52:45 -05:00
Zhangfei Gao
4a841ee928 net: hisilicon: new hip04 MDIO driver
Hisilicon hip04 platform mdio driver
Reuse Marvell phy drivers/net/phy/marvell.c

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 01:52:45 -05:00
Zhangfei Gao
ef80c32dd0 Documentation: add Device tree bindings for Hisilicon hip04 ethernet
This patch adds the Device Tree bindings for the Hisilicon hip04
Ethernet controller, including 100M / 1000M controller.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 01:52:45 -05:00
Jean-Francois Remy
4bf6980dd0 neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
When setting base_reachable_time or base_reachable_time_ms on a
specific interface through sysctl or netlink, the reachable_time
value is not updated.

This means that neighbour entries will continue to be updated using the
old value until it is recomputed in neigh_period_work (which
    recomputes the value every 300*HZ).
On systems with HZ equal to 1000 for instance, it means 5mins before
the change is effective.

This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.

Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour which provide its own handlers must
refresh reachable_time.

Signed-off-by: Jean-Francois Remy <jeff@melix.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:28:00 -05:00
Stefan Agner
3d125f9c91 net: fec: fix MDIO bus assignement for dual fec SoC's
On i.MX28, the MDIO bus is shared between the two FEC instances.
The driver makes sure that the second FEC uses the MDIO bus of the
first FEC. This is done conditionally if FEC_QUIRK_ENET_MAC is set.
However, in newer designs, such as Vybrid or i.MX6SX, each FEC MAC
has its own MDIO bus. Simply removing the quirk FEC_QUIRK_ENET_MAC
is not an option since other logic, triggered by this quirk, is
still needed.

Furthermore, there are board designs which use the same MDIO bus
for both PHY's even though the second bus would be available on the
SoC side. Such layout are popular since it saves pins on SoC side.
Due to the above quirk, those boards currently do work fine. The
boards in the mainline tree with such a layout are:
- Freescale Vybrid Tower with TWR-SER2 (vf610-twr.dts)
- Freescale i.MX6 SoloX SDB Board (imx6sx-sdb.dts)

This patch adds a new quirk FEC_QUIRK_SINGLE_MDIO for i.MX28, which
makes sure that the MDIO bus of the first FEC is used in any case.

However, the boards above do have a SoC with a MDIO bus for each FEC
instance. But the PHY's are not connected in a 1:1 configuration. A
proper device tree description is needed to allow the driver to
figure out where to find its PHY. This patch fixes that shortcoming
by adding a MDIO bus child node to the first FEC instance, along
with the two PHY's on that bus, and making use of the phy-handle
property to add a reference to the PHY's.

Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:27:10 -05:00
Xander Huff
3ff13f1c62 net/macb: improved ethtool statistics support
Currently `ethtool -S` simply returns "no stats available". It
would be more useful to see what the various ethtool statistics
registers' values are. This change implements get_ethtool_stats,
get_strings, and get_sset_count functions to accomplish this.

Read all GEM statistics registers and sum them into
macb.ethtool_stats. Add the necessary infrastructure to make this
accessible via `ethtool -S`.

Update gem_update_stats to utilize ethtool_stats.

Signed-off-by: Xander Huff <xander.huff@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:25:50 -05:00
Xander Huff
5c2fa0f6d0 net/macb: Adding comments to various #defs to make interpretation easier
This change is to help improve at-a-glace knowledge of the purpose of the
various Cadence MACB/GEM registers. Comments are more helpful for human
readability than short acronyms.

Describe various #define varibles Cadence MACB/GEM registers as documented
in Xilinix's "Zynq-7000 All Programmable SoC TechnicalReference Manual, v1.9.1
(UG-585)"

Signed-off-by: Xander Huff <xander.huff@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:25:50 -05:00
David S. Miller
6a38cc2be6 Merge branch 'xen-netfront-next'
David Vrabel says:

====================
xen-netfront: refactor making Tx requests

As netfront as evolved to handle different sorts of skbs the code to
fill a Tx requests has been copy and pasted several times.  The series
refactors this and a few other areas.

The first patch is to a Xen header but this can be merged via
net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:22:12 -05:00
David Vrabel
a55e8bb8fb xen-netfront: refactor making Tx requests
Eliminate all the duplicate code for making Tx requests by
consolidating them into a single xennet_make_one_txreq() function.

xennet_make_one_txreq() and xennet_make_txreqs() work with pages and
offsets so it will be easier to make netfront handle highmem frags in
the future.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:22:01 -05:00
David Vrabel
e84448d521 xen-netfront: refactor skb slot counting
A function to count the number of slots an skb needs is more useful
than one that counts the slots needed for only the frags.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:22:00 -05:00
David Vrabel
28e98c2c20 xen: add page_to_mfn()
pfn_to_mfn(page_to_pfn(p)) is a common use case so add a generic
helper for it.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:22:00 -05:00
Thomas Graf
933685cacb rhashtable: Add MAINTAINERS entry
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:21:44 -05:00
Thomas Graf
80ca8c3a84 rhashtable: Lower/upper bucket may map to same lock while shrinking
Each per bucket lock covers a configurable number of buckets. While
shrinking, two buckets in the old table contain entries for a single
bucket in the new table. We need to lock down both while linking.
Check if they are protected by different locks to avoid a recursive
lock.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:21:44 -05:00
Linus Torvalds
188c901941 Merge branch 'leds-fixes-for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
Pull LED fix from Bryan Wu.

* 'leds-fixes-for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
  leds: netxbig: fix oops at probe time
2015-01-14 12:22:56 +13:00
Linus Torvalds
0c133dd00e As discussed on LKML http://marc.info/?l=linux-kernel&m=142081181707596
it was agreement that WRITE_ONCE(x, val) is better than ASSIGN_ONCE(val, x)
 
 Lets change that for 3.19 as 3.19 has no user yet, but the first users
 will hit linux-next soon.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJUtXl3AAoJEBF7vIC1phx8QLsQAIthwLT9FC94YWYrKYkC1CT3
 CY8gvjVXIki+CVs/a1qdcvuKXj4qLHBDO2ignhPvbqesmHyqIa010gsIT7sm6/d4
 GyDY5AIzoeBFW8jYwVZGr+OMxjgZCTTopqB/vO//CI4aVHB2pE9Jw4cNA6Rl8uWo
 SRXJXBWIw6nKElpTvqjV2PQXiOFqssfW5asjedzHfwpeJucjk3q59K54/RG19EUs
 uqFVoElQMMiJ8QPlb91gYWH4AAdoc/G+YQkxlwt95638U6TKX/5FHc02glvgMNtI
 EYt4/Aavn/NQLGUMeDF2IxptUCbr4F7GcSkYgYFeaYWPssgOAkMdtxyDT9SwwEt5
 Oyq27CJ9TGPzdjRACteR+NUm6I5wOsy/mLX54BoumUGrApYtZzIvJDDEmIuWDzl8
 OdrSR1+D9oi31gcL4qMJEIAe4s+ALJ/witFcdAoEXPugZsV4mkjBB7sEew1qrha9
 WovyVsqlTnKtpNaoqPWH6jdBdsW49YsEVLVxsS5BT+48ZAYQEDXcAHJ1t65mQfsk
 0c9hIwiOgt5Y1nNe0CMaQ6zLpzBRhf+wUAsg/fDZgB/P0O+Yg9XnGlNqL6WPV9xP
 HK3PVgabzYcAr+ahEwvN3Ng5SE1FJ5xUv6kXhnVVjkOrfCMbgrckNHHVr4mvzK0n
 E4VlqRGrSzfSqE7RuS2s
 =GROK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux

Pull WRITE_ONCE argument order change from Christian Borntraeger:
 "As discussed on LKML[1] it was agreed that WRITE_ONCE(x, val) is
  better than ASSIGN_ONCE(val, x)

  Lets change that for 3.19 as 3.19 has no user yet, but the first users
  will hit linux-next soon"

[1] http://marc.info/?l=linux-kernel&m=142081181707596

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux:
  kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)
2015-01-14 11:54:12 +13:00
Jiri Pirko
df8a39defa net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Jiri Pirko
d8b9605d26 net: sched: fix skb->protocol use in case of accelerated vlan path
tc code implicitly considers skb->protocol even in case of accelerated
vlan paths and expects vlan protocol type here. However, on rx path,
if the vlan header was already stripped, skb->protocol contains value
of next header. Similar situation is on tx path.

So for skbs that use skb->vlan_tci for tagging, use skb->vlan_proto instead.

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Rickard Strandqvist
8bdda5ddd1 atm: horizon: Remove some unused functions
Removes some functions that are not used anywhere:
channel_to_vpivci() query_tx_channel_config() rx_disabled_handler()

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:28:19 -05:00
Rickard Strandqvist
927a97ccd2 atm: lanai: Remove unused function
Remove the function aal5_spacefor() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:27:18 -05:00
Sasha Levin
357c4774b5 tipc: correctly handle releasing a not fully initialized sock
Commit f2f9800d49 "tipc: make tipc node table aware of net
namespace" has added a dereference of sock->sk before making sure it's
not NULL, which makes releasing a tipc socket NULL pointer dereference
for sockets that are not fully initialized.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:26:27 -05:00
David L Stevens
86cfeab6b5 sunvnet: fix rx packet length check to allow for TSO
This patch fixes the rx packet length check in the sunvnet driver to allow
for a TSO max packet length greater than the LDC channel negotiated MTU.
These are negotiated separately and there is no requirement that
port->tsolen be less than port->rmtu, but if it isn't, it'll drop packets
with rx length errors.

Signed-off-by: David L Stevens <david.stevens@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:24:21 -05:00
David Vrabel
900e183301 xen-netfront: use different locks for Rx and Tx stats
In netfront the Rx and Tx path are independent and use different
locks.  The Tx lock is held with hard irqs disabled, but Rx lock is
held with only BH disabled.  Since both sides use the same stats lock,
a deadlock may occur.

  [ INFO: possible irq lock inversion dependency detected ]
  3.16.2 #16 Not tainted
  ---------------------------------------------------------
  swapper/0/0 just changed the state of lock:
   (&(&queue->tx_lock)->rlock){-.....}, at: [<c03adec8>]
  xennet_tx_interrupt+0x14/0x34
  but this lock took another, HARDIRQ-unsafe lock in the past:
   (&stat->syncp.seq#2){+.-...}
  and interrupts could create inverse lock ordering between them.
  other info that might help us debug this:
   Possible interrupt unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&stat->syncp.seq#2);
                                 local_irq_disable();
                                 lock(&(&queue->tx_lock)->rlock);
                                 lock(&stat->syncp.seq#2);
    <Interrupt>
      lock(&(&queue->tx_lock)->rlock);

Using separate locks for the Rx and Tx stats fixes this deadlock.

Reported-by: Dmitry Piotrovsky <piotrovskydmitry@gmail.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:22:11 -05:00
Arnd Bergmann
3e7a8716e2 mISDN: avoid arch specific __builtin_return_address call
Not all architectures are able to call __builtin_return_address().
On ARM, the mISDN code produces this warning:

hardware/mISDN/w6692.c: In function 'w6692_dctrl':
hardware/mISDN/w6692.c:1181:75: warning: unsupported argument to '__builtin_return_address'
  pr_debug("%s: %s dev(%d) open from %p\n", card->name, __func__,
                                                                           ^
hardware/mISDN/mISDNipac.c: In function 'open_dchannel':
hardware/mISDN/mISDNipac.c:759:75: warning: unsupported argument to '__builtin_return_address'
  pr_debug("%s: %s dev(%d) open from %p\n", isac->name, __func__,
                                                                           ^

In a lot of cases, this is relatively easy to work around by
passing the value of __builtin_return_address(0) from the
callers into the functions that want it. One exception is
the indirect 'open' function call in struct isac_hw. While it
would be possible to fix this as well, this patch only addresses
the other callers properly and lets this one return the direct
parent function, which should be good enough.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:08:21 -05:00
Arnd Bergmann
7835bfb526 infiniband: mlx5: avoid a compile-time warning
The return type of find_first_bit() is architecture specific,
on ARM it is 'unsigned int', while the asm-generic code used
on x86 and a lot of other architectures returns 'unsigned long'.

When building the mlx5 driver on ARM, we get a warning about
this:

infiniband/hw/mlx5/mem.c: In function 'mlx5_ib_cont_pages':
infiniband/hw/mlx5/mem.c:84:143: warning: comparison of distinct pointer types lacks a cast
     m = min(m, find_first_bit(&tmp, sizeof(tmp)));

This patch changes the driver to use min_t to make it behave
the same way on all architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:08:21 -05:00
Arnd Bergmann
065bd8c28b mlx5: avoid build warnings on 32-bit
The mlx5 driver passes a string pointer in through a 'u64' variable,
which on 32-bit machines causes a build warning:

drivers/net/ethernet/mellanox/mlx5/core/debugfs.c: In function 'qp_read_field':
drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

The code is in fact safe, so we can shut up the warning by adding
extra type casts.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:08:20 -05:00
Arnd Bergmann
adedf37b59 rocker: fix harmless warning on 32-bit machines
The rocker driver tries to assign a pointer to a 64-bit integer
and then back to a pointer. This is safe on all architectures,
but causes a compiler warning when pointers are shorter than
64-bit:

rocker/rocker.c: In function 'rocker_desc_cookie_ptr_get':
rocker/rocker.c:809:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
  return (void *) desc_info->desc->cookie;
         ^

This adds another cast to uintptr_t to tell the compiler
that it's safe.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 16:56:40 -05:00
Mugunthan V N
25906052d9 drivers: net: cpsw: fix multicast flush in dual emac mode
Since ALE table is a common resource for both the interfaces in Dual EMAC
mode and while bringing up the second interface in cpsw_ndo_set_rx_mode()
all the multicast entries added by the first interface is flushed out and
only second interface multicast addresses are added. Fixing this by
flushing multicast addresses based on dual EMAC port vlans which will not
affect the other emac port multicast addresses.

Fixes: d9ba8f9 (driver: net: ethernet: cpsw: dual emac interface implementation)
Cc: <stable@vger.kernel.org> # v3.9+
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 16:54:23 -05:00