Commit Graph

2609 Commits

Author SHA1 Message Date
Linus Torvalds
f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Linus Torvalds
b20dcab9d4 LLVMLinux patches for v3.16
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlOTY+wACgkQuseO5dulBZXrIgCdFZyXRojufLLKikWEvHjZ3/k5
 KsQAnimtcge+62/IX7YwDjWS+xg9Wt3m
 =yPrI
 -----END PGP SIGNATURE-----

Merge tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel

Pull LLVM patches from Behan Webster:
 "Next set of patches to support compiling the kernel with clang.
  They've been soaking in linux-next since the last merge window.

  More still in the works for the next merge window..."

* tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel:
  arm, unwind, LLVMLinux: Enable clang to be used for unwinding the stack
  ARM: LLVMLinux: Change "extern inline" to "static inline" in glue-cache.h
  all: LLVMLinux: Change DWARF flag to support gcc and clang
  net: netfilter: LLVMLinux: vlais-netfilter
  crypto: LLVMLinux: aligned-attribute.patch
2014-06-08 12:27:44 -07:00
Linus Torvalds
3f17ea6dea Merge branch 'next' (accumulated 3.16 merge window patches) into master
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.

* accumulated work in next: (6809 commits)
  ufs: sb mutex merge + mutex_destroy
  powerpc: update comments for generic idle conversion
  cris: update comments for generic idle conversion
  idle: remove cpu_idle() forward declarations
  nbd: zero from and len fields in NBD_CMD_DISCONNECT.
  mm: convert some level-less printks to pr_*
  MAINTAINERS: adi-buildroot-devel is moderated
  MAINTAINERS: add linux-api for review of API/ABI changes
  mm/kmemleak-test.c: use pr_fmt for logging
  fs/dlm/debug_fs.c: replace seq_printf by seq_puts
  fs/dlm/lockspace.c: convert simple_str to kstr
  fs/dlm/config.c: convert simple_str to kstr
  mm: mark remap_file_pages() syscall as deprecated
  mm: memcontrol: remove unnecessary memcg argument from soft limit functions
  mm: memcontrol: clean up memcg zoneinfo lookup
  mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
  mm/mempool.c: update the kmemleak stack trace for mempool allocations
  lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
  mm: introduce kmemleak_update_trace()
  mm/kmemleak.c: use %u to print ->checksum
  ...
2014-06-08 11:31:16 -07:00
Mark Charlebois
066c6807f7 net: netfilter: LLVMLinux: vlais-netfilter
Replaced non-standard C use of Variable Length Arrays In Structs (VLAIS) in
xt_repldata.h with a C99 compliant flexible array member and then calculated
offsets to the other struct members. These other members aren't referenced by
name in this code, however this patch maintains the same memory layout and
padding as was previously accomplished using VLAIS.

Had the original structure been ordered differently, with the entries VLA at
the end, then it could have been a flexible member, and this patch would have
been a lot simpler. However since the data stored in this structure is
ultimately exported to userspace, the order of this structure can't be changed.

This patch makes no attempt to change the existing behavior, merely the way in
which the current layout is accomplished using standard C99 constructs. As such
the code can now be compiled with either gcc or clang.

This version of the patch removes the trailing alignment that the VLAIS
structure would allocate in order to simplify the patch.

Author: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com>
2014-06-07 11:44:39 -07:00
David S. Miller
6934e79ed1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/nf_tables fixes for net-next

This patchset contains fixes for recent updates available in your
net-next, they are:

1) Fix double memory allocation for accounting objects that results
   in a leak, this slipped through with the new quota extension,
   patch from Mathieu Poirier.

2) Fix broken ordering when adding set element transactions.

3) Make sure that objects are released in reverse order in the abort
   path, to avoid possible use-after-free when accessing dependencies.

4) Allow to delete several objects (as long as dependencies are
   fulfilled) by using one batch. This includes changes in the use
   counter semantics of the nf_tables objects.

5) Fix illegal sleeping allocation from rcu callback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 15:35:04 -07:00
WANG Cong
4cb28970a2 net: use the new API kvfree()
It is available since v3.15-rc5.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 00:49:51 -07:00
David S. Miller
c99f7abf0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/inetpeer.h
	net/ipv6/output_core.c

Changes in net were fixing bugs in code removed in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 23:32:12 -07:00
Linus Torvalds
776edb5931 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
2014-06-03 12:57:53 -07:00
Eric Dumazet
73f156a6e8 inetpeer: get rid of ip_id_count
Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 11:00:41 -07:00
Pablo Neira Ayuso
31f8441c32 netfilter: nf_tables: atomic allocation in set notifications from rcu callback
Use GFP_ATOMIC allocations when sending removal notifications of
anonymous sets from rcu callback context. Sleeping in that context
is illegal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:38 +02:00
Pablo Neira Ayuso
4fefee570d netfilter: nf_tables: allow to delete several objects from a batch
Three changes to allow the deletion of several objects with dependencies
in one transaction, they are:

1) Introduce speculative counter increment/decrement that is undone in
   the abort path if required, thus we avoid hitting -EBUSY when deleting
   the chain. The counter updates are reverted in the abort path.

2) Increment/decrement table/chain use counter for each set/rule. We need
   this to fully rely on the use counters instead of the list content,
   eg. !list_empty(&chain->rules) which evaluate true in the middle of the
   transaction.

3) Decrement table use counter when an anonymous set is bound to the
   rule in the commit path. This avoids hitting -EBUSY when deleting
   the table that contains anonymous sets. The anonymous sets are released
   in the nf_tables_rule_destroy path. This should not be a problem since
   the rule already bumped the use counter of the chain, so the bound
   anonymous set reflects dependencies through the rule object, which
   already increases the chain use counter.

So the general assumption after this patch is that the use counters are
bumped by direct object dependencies.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:35 +02:00
Pablo Neira Ayuso
7632667d26 netfilter: nft_rbtree: introduce locking
There's no rbtree rcu version yet, so let's fall back on the spinlock
to protect the concurrent access of this structure both from user
(to update the set content) and kernel-space (in the packet path).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:31 +02:00
Pablo Neira Ayuso
a1cee076f4 netfilter: nf_tables: release objects in reverse order in the abort path
The patch c7c32e7 ("netfilter: nf_tables: defer all object release via
rcu") indicates that we always release deleted objects in the reverse
order, but that is only needed in the abort path. These are the two
possible scenarios when releasing objects:

1) Deletion scenario in the commit path: no need to release objects in
the reverse order since userspace already ensures that dependencies are
fulfilled), ie. userspace tells us to delete rule -> ... -> rule ->
chain -> table. In this case, we have to release the objects in the
*same order* as userspace provided.

2) Deletion scenario in the abort path: we have to iterate in the reverse
order to undo what it cannot be added, ie. userspace sent us a batch
that includes: table -> chain -> rule -> ... -> rule, and that needs to
be partially undone. In this case, we have to release objects in the
reverse order to ensure that the set and chain objects point to valid
rule and table objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:28 +02:00
Pablo Neira Ayuso
46bbafceb2 netfilter: nf_tables: fix wrong transaction ordering in set elements
The transaction needs to be placed at the end of the commit list,
otherwise event notifications are reordered and we may crash when
releasing object via call_rcu.

This problem was introduced in 60319eb ("netfilter: nf_tables: use new
transaction infrastructure to handle elements").

Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:25 +02:00
Mathieu Poirier
4c552a64df netfilter: nfnetlink_acct: Fix memory leak
Allocation of memory need only to happen once, that is
after the proper checks on the NFACCT_FLAGS have been
done.  Otherwise the code can return without freeing
already allocated memory.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:46:52 +02:00
David S. Miller
90d0e08e57 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

This small patchset contains three accumulated Netfilter/IPVS updates,
they are:

1) Refactorize common NAT code by encapsulating it into a helper
   function, similarly to what we do in other conntrack extensions,
   from Florian Westphal.

2) A minor format string mismatch fix for IPVS, from Masanari Iida.

3) Add quota support to the netfilter accounting infrastructure, now
   you can add quotas to accounting objects via the nfnetlink interface
   and use them from iptables. You can also listen to quota
   notifications from userspace. This enhancement from Mathieu Poirier.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:54:47 -07:00
Peter Christensen
f44a5f45f5 ipvs: Fix panic due to non-linear skb
Receiving a ICMP response to an IPIP packet in a non-linear skb could
cause a kernel panic in __skb_pull.

The problem was introduced in
commit f2edb9f770 ("ipvs: implement
passive PMTUD for IPIP packets").

Signed-off-by: Peter Christensen <pch@ordbogen.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-05-26 10:22:46 +09:00
David S. Miller
54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
Daniel Borkmann
b1fcd35cf5 net: filter: let unattached filters use sock_fprog_kern
The sk_unattached_filter_create() API is used by BPF filters that
are not directly attached or related to sockets, and are used in
team, ptp, xt_bpf, cls_bpf, etc. As such all users do their own
internal managment of obtaining filter blocks and thus already
have them in kernel memory and set up before calling into
sk_unattached_filter_create(). As a result, due to __user annotation
in sock_fprog, sparse triggers false positives (incorrect type in
assignment [different address space]) when filters are set up before
passing them to sk_unattached_filter_create(). Therefore, let
sk_unattached_filter_create() API use sock_fprog_kern to overcome
this issue.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:48:05 -04:00
David S. Miller
8af750d739 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
Netfilter/nftables updates for net-next

The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:

1) Add set element update notification via netlink, from Arturo Borrero.

2) Put all object updates in one single message batch that is sent to
   kernel-space. Before this patch only rules where included in the batch.
   This series also introduces the generic transaction infrastructure so
   updates to all objects (tables, chains, rules and sets) are applied in
   an all-or-nothing fashion, these series from me.

3) Defer release of objects via call_rcu to reduce the time required to
   commit changes. The assumption is that all objects are destroyed in
   reverse order to ensure that dependencies betweem them are fulfilled
   (ie. rules and sets are destroyed first, then chains, and finally
   tables).

4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
   include two patches to prepare this new feature.

5) Implement the proper set selection based on the characteristics of the
   data. The new infrastructure also allows you to specify your preferences
   in terms of memory and computational complexity so the underlying set
   type is also selected according to your needs, from Patrick McHardy.

6) Several cleanup patches for nft expressions, including one minor possible
   compilation breakage due to missing mark support, also from Patrick.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 12:06:23 -04:00
Pablo Neira Ayuso
c7c32e72cb netfilter: nf_tables: defer all object release via rcu
Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso
128ad3322b netfilter: nf_tables: remove skb and nlh from context structure
Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso
35151d840c netfilter: nf_tables: simplify nf_tables_*_notify
Now that all these function are called from the commit path, we can
pass the context structure to reduce the amount of parameters in all
of the nf_tables_*_notify functions. This patch also removes unneeded
branches to check for skb, nlh and net that should be always set in
the context structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso
60319eb1ca netfilter: nf_tables: use new transaction infrastructure to handle elements
Leave the set content in consistent state if we fail to load the
batch. Use the new generic transaction infrastructure to achieve
this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso
55dd6f9307 netfilter: nf_tables: use new transaction infrastructure to handle table
This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso
e1aaca93ee netfilter: nf_tables: pass context to nf_tables_updtable()
So nf_tables_uptable() only takes one single parameter.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
f75edf5e9c netfilter: nf_tables: disabling table hooks always succeeds
nf_tables_table_disable() always succeeds, make this function void.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
91c7b38dc9 netfilter: nf_tables: use new transaction infrastructure to handle chain
This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
ff3cd7b3c9 netfilter: nf_tables: refactor chain statistic routines
Add new routines to encapsulate chain statistics allocation and
replacement.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
958bee14d0 netfilter: nf_tables: use new transaction infrastructure to handle sets
This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two on each direction), from:

 1) create the set and send netlink message to the kernel
 2) process the response from the kernel that contains the allocated name.
 3) add the set elements and send netlink message to the kernel.
 4) process the response from the kernel (to check for errors).

To:

 1) add the set to the batch.
 2) add the set elements to the batch.
 3) add the rule that points to the set.
 4) send batch to the kernel.

This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.

Backward compatibility has been only retained in userspace, this
means that new nft versions can talk to the kernel both in the new
and the old fashion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
b380e5c733 netfilter: nf_tables: add message type to transactions
The patch adds message type to the transaction to simplify the
commit the and abort routines. Yet another step forward in the
generalisation of the transaction infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
37082f930b netfilter: nf_tables: relocate commit and abort routines in the source file
Move the commit and abort routines to the bottom of the source code
file. This change is required by the follow up patches that add the
set, chain and table transaction support.

This patch is just a cleanup to access several functions without
having to declare their prototypes.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
1081d11b08 netfilter: nf_tables: generalise transaction infrastructure
This patch generalises the existing rule transaction infrastructure
so it can be used to handle set, table and chain object transactions
as well. The transaction provides a data area that stores private
information depending on the transaction type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
7c95f6d866 netfilter: nf_tables: deconstify table and chain in context structure
The new transaction infrastructure updates the family, table and chain
objects in the context structure, so let's deconstify them. While at it,
move the context structure initialization routine to the top of the
source file as it will be also used from the table and chain routines.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:09 +02:00
Pablo Neira Ayuso
3b084e99a3 netfilter: nf_tables: fix trace of matching non-terminal rule
Add the corresponding trace if we have a full match in a non-terminal
rule. Note that the traces will look slightly different than in
x_tables since the log message after all expressions have been
evaluated (contrary to x_tables, that emits it before the target
action). This manifests in two differences in nf_tables wrt. x_tables:

1) The rule that enables the tracing is included in the trace.

2) If the rule emits some log message, that is shown before the
   trace log message.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-15 19:44:20 +02:00
WANG Cong
60ff746739 net: rename local_df to ignore_df
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-12 14:03:41 -04:00
David S. Miller
5f013c9bc7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/altera/altera_sgdma.c
	net/netlink/af_netlink.c
	net/sched/cls_api.c
	net/sched/sch_api.c

The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces.  These were simple transformations from
netlink_capable to netlink_ns_capable.

The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-12 13:19:14 -04:00
Pablo Neira Ayuso
7e9bc10db2 netfilter: nf_tables: fix missing return trace at the end of non-base chain
Display "return" for implicit rule at the end of a non-base chain,
instead of when popping chain from the stack.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-12 16:33:11 +02:00
Pablo Neira Ayuso
f7e7e39b21 netfilter: nf_tables: fix bogus rulenum after goto action
After returning from the chain that we just went to with no matchings,
we get a bogus rule number in the trace. To fix this, we would need
to iterate over the list of remaining rules in the chain to update the
rule number counter.

Patrick suggested to set this to the maximum value since the default
base chain policy is the very last action when the processing the base
chain is over.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-12 16:33:10 +02:00
Pablo Neira Ayuso
7b9d5ef932 netfilter: nf_tables: fix tracing of the goto action
Add missing code to trace goto actions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-12 16:33:08 +02:00
Pablo Neira Ayuso
5467a51221 netfilter: nf_tables: fix goto action
This patch fixes a crash when trying to access the counters and the
default chain policy from the non-base chain that we have reached
via the goto chain. Fix this by falling back on the original base
chain after returning from the custom chain.

While fixing this, kill the inline function to account chain statistics
to improve source code readability.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-12 16:32:41 +02:00
Pablo Neira Ayuso
d088be8042 netfilter: nf_tables: reset rule number counter after jump and goto
Otherwise we start incrementing the rule number counter from the
previous chain iteration.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-10 19:12:04 +02:00
Denys Fedoryshchenko
ecd15dd7e4 netfilter: nfnetlink: Fix use after free when it fails to process batch
This bug manifests when calling the nft command line tool without
nf_tables kernel support.

kernel message:
[   44.071555] Netfilter messages via NETLINK v0.30.
[   44.072253] BUG: unable to handle kernel NULL pointer dereference at 0000000000000119
[   44.072264] IP: [<ffffffff8171db1f>] netlink_getsockbyportid+0xf/0x70
[   44.072272] PGD 7f2b74067 PUD 7f2b73067 PMD 0
[   44.072277] Oops: 0000 [#1] SMP
[...]
[   44.072369] Call Trace:
[   44.072373]  [<ffffffff8171fd81>] netlink_unicast+0x91/0x200
[   44.072377]  [<ffffffff817206c9>] netlink_ack+0x99/0x110
[   44.072381]  [<ffffffffa004b951>] nfnetlink_rcv+0x3c1/0x408 [nfnetlink]
[   44.072385]  [<ffffffff8171fde3>] netlink_unicast+0xf3/0x200
[   44.072389]  [<ffffffff817201ef>] netlink_sendmsg+0x2ff/0x740
[   44.072394]  [<ffffffff81044752>] ? __mmdrop+0x62/0x90
[   44.072398]  [<ffffffff816dafdb>] sock_sendmsg+0x8b/0xc0
[   44.072403]  [<ffffffff812f1af5>] ? copy_user_enhanced_fast_string+0x5/0x10
[   44.072406]  [<ffffffff816dbb6c>] ? move_addr_to_kernel+0x2c/0x50
[   44.072410]  [<ffffffff816db423>] ___sys_sendmsg+0x3c3/0x3d0
[   44.072415]  [<ffffffff811301ba>] ? handle_mm_fault+0xa9a/0xc60
[   44.072420]  [<ffffffff811362d6>] ? mmap_region+0x166/0x5a0
[   44.072424]  [<ffffffff817da84c>] ? __do_page_fault+0x1dc/0x510
[   44.072428]  [<ffffffff812b8b2c>] ? apparmor_capable+0x1c/0x60
[   44.072435]  [<ffffffff817d6e9a>] ? _raw_spin_unlock_bh+0x1a/0x20
[   44.072439]  [<ffffffff816dfc86>] ? release_sock+0x106/0x150
[   44.072443]  [<ffffffff816dc212>] __sys_sendmsg+0x42/0x80
[   44.072446]  [<ffffffff816dc262>] SyS_sendmsg+0x12/0x20
[   44.072450]  [<ffffffff817df616>] system_call_fastpath+0x1a/0x1f

Signed-off-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-04 15:14:08 +02:00
Florian Westphal
f768e5bdef netfilter: add helper for adding nat extension
Reduce copy-past a bit by adding a common helper.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-29 20:56:22 +02:00
Florian Westphal
fe337ac283 netfilter: ctnetlink: don't add null bindings if no nat requested
commit 0eba801b64 tried to fix a race
where nat initialisation can happen after ctnetlink-created conntrack
has been created.

However, it causes the nat module(s) to be loaded needlessly on
systems that are not using NAT.

Fortunately, we do not have to create null bindings in that case.

conntracks injected via ctnetlink always have the CONFIRMED bit set,
which prevents addition of the nat extension in nf_nat_ipv4/6_fn().

We only need to make sure that either no nat extension is added
or that we've created both src and dst manips.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-29 20:49:08 +02:00
Mathieu Poirier
683399eddb netfilter: nfnetlink_acct: Adding quota support to accounting framework
nfacct objects already support accounting at the byte and packet
level.  As such it is a natural extension to add the possiblity to
define a ceiling limit for both metrics.

All the support for quotas itself is added to nfnetlink acctounting
framework to stay coherent with current accounting object management.
Quota limit checks are implemented in xt_nfacct filter where
statistic collection is already done.

Pablo Neira Ayuso has also contributed to this feature.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-29 18:25:14 +02:00
Pablo Neira
4c1f7818e4 netfilter: nf_tables: relax string validation of NFTA_CHAIN_TYPE
Use NLA_STRING for consistency with other string attributes in
nf_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-28 16:54:15 +02:00
David S. Miller
a64d90fd96 netfilter: Fix warning in nfnetlink_receive().
net/netfilter/nfnetlink.c: In function ‘nfnetlink_rcv’:
net/netfilter/nfnetlink.c:371:14: warning: unused variable ‘net’ [-Wunused-variable]

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 13:51:29 -04:00
Eric W. Biederman
90f62cf30a net: Use netlink_ns_capable to verify the permisions of netlink messages
It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 13:44:54 -04:00
Masanari Iida
1404c3ab98 netfilter: Fix format string mismatch in ip_vs_proto_name()
Fix string mismatch in ip_vs_proto_name()

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-23 14:14:14 +02:00