Commit Graph

353 Commits

Author SHA1 Message Date
David S. Miller
f894cbf847 net: Add optional SKB arg to dst_ops->neigh_lookup().
Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:04:01 -07:00
David McCullough
4dc27d1cf3 net/ipv6/route.c: packets originating on device match lo
Fix to allow IPv6 packets originating locally to match rules with the "iif"
set to "lo".  This allows IPv6 rule matching to work the same as it does for
IPv4.  From the iproute2 man page:

   iif NAME
		  select  the incoming device to match.  If the interface is loop‐
		  back, the rule only matches packets originating from this  host.
		  This  means that you may create separate routing tables for for‐
		  warded and local packets and, hence, completely segregate them.

Signed-off-by: David McCullough <david_mccullough@mcafee.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 23:54:32 -07:00
David S. Miller
e486463e82 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/translation-table.c
	net/ipv6/route.c

qmi_wwan.c resolution provided by Bjørn Mork.

batman-adv conflict is dealing merely with the changes
of global function names to have a proper subsystem
prefix.

ipv6's route.c conflict is merely two side-by-side additions
of network namespace methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 15:50:32 -07:00
Thomas Graf
d189634eca ipv6: Move ipv6 proc file registration to end of init order
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
handler is installed in ip6_route_net_init() whereas fib_table_hash is
allocated in fib6_net_init() _after_ the proc handler has been installed.

This opens up a short time frame to access fib_table_hash with its pants
down.

Move the registration of the proc files to a later point in the init
order to avoid the race.

Tested :-)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-18 18:38:50 -07:00
David S. Miller
aee289baaa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/route.c

Pull in 'net' again to get the revert of Thomas's change
which introduced regressions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16 01:23:04 -07:00
David S. Miller
e8803b6c38 Revert "ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route"
This reverts commit 2a0c451ade.

It causes crashes, because now ip6_null_entry is used before
it is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16 01:12:19 -07:00
David S. Miller
42ae66c80d ipv6: Fix types of ip6_update_pmtu().
The mtu should be a __be32, not the mark.

Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 20:01:57 -07:00
David S. Miller
7e52b33bd5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/route.c

This deals with a merge conflict between the net-next addition of the
inetpeer network namespace ops, and Thomas Graf's bug fix in
2a0c451ade which makes sure we don't
register /proc/net/ipv6_route before it is actually safe to do so.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 15:51:55 -07:00
Thomas Graf
2a0c451ade ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
handler is installed in ip6_route_net_init() whereas fib_table_hash is
allocated in fib6_net_init() _after_ the proc handler has been installed.

This opens up a short time frame to access fib_table_hash with its pants
down.

fib6_init() as a whole can't be moved to an earlier position as it also
registers the rtnetlink message handlers which should be registered at
the end. Therefore split it into fib6_init() which is run early and
fib6_init_late() to register the rtnetlink message handlers.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 15:30:15 -07:00
David S. Miller
81aded2467 ipv6: Handle PMTU in ICMP error handlers.
One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
to handle the error pass the 32-bit info cookie in network byte order
whereas ipv4 passes it around in host byte order.

Like the ipv4 side, we have two helper functions.  One for when we
have a socket context and one for when we do not.

ip6ip6 tunnels are not handled here, because they handle PMTU events
by essentially relaying another ICMP packet-too-big message back to
the original sender.

This patch allows us to get rid of rt6_do_pmtu_disc().  That function
handles all kinds of situations that simply cannot happen when we do
the PMTU update directly using a fully resolved route.

In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
likely be removed or changed into a BUG_ON() check.  We should never
have a prefixed ipv6 route when we get there.

Another piece of strange history here is that TCP and DCCP, unlike in
ipv4, never invoke the update_pmtu() method from their ICMP error
handlers.  This is astonishing, since this is where we have the most
accurate context in which to make a PMTU update, namely a fully
connected socket and its associated cached socket route.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 14:54:11 -07:00
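The byte-order difference noted in 81aded2467 can be illustrated in user space. This is only a sketch: the function names below are hypothetical and are not kernel helpers. It assumes the ipv6-style cookie arrives in network byte order and the ipv4-style cookie in host byte order, as the message states.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical helper: ipv6-style cookie, passed in network byte order. */
static uint32_t mtu_from_be_info(uint32_t info_be)
{
    return ntohl(info_be);   /* convert before using it as an MTU */
}

/* Hypothetical helper: ipv4-style cookie, already in host byte order. */
static uint32_t mtu_from_host_info(uint32_t info)
{
    return info;             /* no conversion needed */
}

/* Self-check: both conventions describe the same MTU value. */
static void info_cookie_selfcheck(void)
{
    uint32_t mtu = 1400;

    assert(mtu_from_be_info(htonl(mtu)) == mtu);
    assert(mtu_from_host_info(mtu) == mtu);
}
```

Mixing the two conventions goes unnoticed on big-endian hosts, where ntohl() is a no-op, which is one reason an explicit conversion at the boundary is worth keeping visible.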
David S. Miller
7b34ca2ac7 inet: Avoid potential NULL peer dereference.
We handle NULL in rt{,6}_set_peer but then our caller will try to pass
that NULL pointer into inet_putpeer() which isn't ready for it.

Fix this by moving the NULL check one level up, and then remove the
now unnecessary NULL check from inetpeer_ptr_set_peer().

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 04:13:57 -07:00
David S. Miller
8b96d22d7a inet: Use FIB table peer roots in routes.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 02:10:54 -07:00
David S. Miller
97bab73f98 inet: Hide route peer accesses behind helpers.
We encode the pointer(s) into an unsigned long with one state bit.

The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.

Later the peer roots will be per-FIB table, and this change works to
facilitate that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 02:08:47 -07:00
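The encoding described in 97bab73f98, one word holding either of two pointers plus a state bit, can be sketched as a tagged pointer. All names here are hypothetical, uintptr_t stands in for the kernel's unsigned long, and which of the two values carries the tag is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct peer_sketch { int id; };          /* stands in for a resolved peer */
struct root_sketch { int generation; };  /* stands in for a peer tree root */

#define TAG_IS_ROOT ((uintptr_t)1)       /* hypothetical state bit */

static uintptr_t pack_peer(struct peer_sketch *p)
{
    /* The low bit must be free, which holds for aligned objects. */
    assert(((uintptr_t)p & TAG_IS_ROOT) == 0);
    return (uintptr_t)p;
}

static uintptr_t pack_root(struct root_sketch *r)
{
    assert(((uintptr_t)r & TAG_IS_ROOT) == 0);
    return (uintptr_t)r | TAG_IS_ROOT;
}

static int holds_root(uintptr_t v)
{
    return (v & TAG_IS_ROOT) != 0;
}

static struct peer_sketch *unpack_peer(uintptr_t v)
{
    assert(!holds_root(v));
    return (struct peer_sketch *)v;
}

static struct root_sketch *unpack_root(uintptr_t v)
{
    assert(holds_root(v));
    return (struct root_sketch *)(v & ~TAG_IS_ROOT);
}
```

Hiding the word behind accessors like these is what lets the representation change later without touching every caller.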
David S. Miller
c0efc887dc inet: Pass inetpeer root into inet_getpeer*() interfaces.
Otherwise we reference potentially non-existing members when
ipv6 is disabled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 19:12:36 -07:00
David S. Miller
2b823f7258 ipv6: Do not mark ipv6_inetpeer_ops as __net_initdata.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 19:00:16 -07:00
David S. Miller
56a6b248eb inet: Consolidate inetpeer_invalidate_tree() interfaces.
We only need one interface for this operation, since we always know
which inetpeer root we want to flush.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 16:32:41 -07:00
David S. Miller
c3426b4719 inet: Initialize per-netns inetpeer roots in net/ipv{4,6}/route.c
Instead of net/ipv4/inetpeer.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09 16:27:05 -07:00
David S. Miller
fbfe95a42e inet: Create and use rt{,6}_get_peer_create().
There are a lot of places that open-code rt{,6}_get_peer() only because
they want to set 'create' to one.  So add an rt{,6}_get_peer_create()
for their sake.

There were also a few spots open-coding plain rt{,6}_get_peer() and
those are transformed here as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 23:24:18 -07:00
Gao feng
54db0cc2ba inetpeer: add parameter net for inet_getpeer_v4,v6
add struct net as a parameter of inet_getpeer_v[4,6],
use net to replace &init_net.

and modify some places to provide net for inet_getpeer_v[4,6]

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 14:27:23 -07:00
Eric Dumazet
a50feda546 ipv6: bool/const conversions phase2
Mostly bool conversions, some inline removals and const additions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19 01:08:16 -04:00
Joe Perches
f32138319c net: ipv6: Standardize prefixes for message logging
Add #define pr_fmt(fmt) as appropriate.

Add "IPv6: " to appropriate files.

Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG).
Standardize on "%s: " not "%s(): " when emitting __func__.
Use "%s: ", __func__ instead of embedding function name.
Coalesce formats, align arguments.

ADDRCONF output is now prefixed with "IPv6: "

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 01:01:03 -04:00
Joe Perches
e87cc4728f net: Convert net_ratelimit uses to net_<level>_ratelimited
Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15 13:45:03 -04:00
David S. Miller
56845d78ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/atheros/atlx/atl1.c
	drivers/net/ethernet/atheros/atlx/atl1.h

Resolved a conflict between a DMA error bug fix and NAPI
support changes in the atl1 driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:19:04 -04:00
Eric Dumazet
95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
Gao feng
1716a96101 ipv6: fix problem with expired dst cache
If an ipv6 dst cache entry is copied from a dst generated by an ICMPv6 RA
packet, the copy is never checked for expiry because it has no RTF_EXPIRES
flag. So this dst cache entry will be used until the dst gc runs.

Change struct dst_entry to add a union containing a new pointer, from, and
expires. When rt6_info.rt6i_flags has no RTF_EXPIRES flag, dst.expires is
unused, so we can use this field to point to the dst the cache entry was
copied from. dst.from is only used in IPv6.

rt6_check_expired checks whether rt6_info.dst.from has expired.

ip6_rt_copy only sets dst.from when the ort has the flags RTF_ADDRCONF
and RTF_DEFAULT, and then holds the ort.

ip6_dst_destroy releases the ort.

Add some functions to operate on the RTF_EXPIRES flag and expires (from)
together, and change the code to use these new functions.

Changes from v5:
Modify ip6_route_add and ndisc_router_discovery to use the new functions.

Only set dst.from when the ort has the flags RTF_ADDRCONF
and RTF_DEFAULT, then hold the ort.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 12:58:29 -04:00
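The union introduced by 1716a96101 can be pictured with a small user-space sketch. The struct, the flag value, and the function name are hypothetical stand-ins, not the kernel definitions, and the plain comparison ignores the counter wraparound that a real jiffies comparison has to handle.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_F_EXPIRES 0x1u   /* hypothetical flag, not the kernel's value */

struct route_sketch {
    unsigned int flags;
    union {
        unsigned long expires;       /* valid when SKETCH_F_EXPIRES is set */
        struct route_sketch *from;   /* otherwise: route this was copied from */
    } u;
};

/* Returns nonzero if the route, or the route it was copied from, expired. */
static int route_expired(const struct route_sketch *rt, unsigned long now)
{
    assert(rt != NULL);

    if (rt->flags & SKETCH_F_EXPIRES)
        return now > rt->u.expires;
    if (rt->u.from != NULL)
        return route_expired(rt->u.from, now);
    return 0;   /* no expiry of its own and no origin to consult */
}
```

The flag decides which union member is live, so every reader and writer has to go through helpers that update the flag and the field together.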
Shmulik Ladkani
2173bff5dc ipv6: Fix 'inet6_rtm_getroute' to release 'rt->dst' in case of 'alloc_skb' failure
In 72331bc [ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be
consistent with ipv4] the code of 'inet6_rtm_getroute()' was re-ordered
such that the reference to 'rt->dst' is incremented prior to skb
allocation.

Hence, if 'alloc_skb()' fails, we must drop a reference from 'rt->dst'.
Add the missing 'dst_release()' call.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 05:25:51 -04:00
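The leak fixed by 2173bff5dc follows a common pattern: a reference is taken, a later allocation fails, and the error path forgets to drop the reference. A minimal user-space sketch, with hypothetical names and malloc() standing in for alloc_skb():

```c
#include <assert.h>
#include <stdlib.h>

struct dst_sketch { int refcnt; };   /* stands in for a refcounted dst */

static void dst_sketch_hold(struct dst_sketch *d)
{
    d->refcnt++;
}

static void dst_sketch_release(struct dst_sketch *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/*
 * Takes a reference, then allocates a buffer. len == 0 simulates an
 * allocation failure. On failure the reference taken above is dropped
 * again, which is the step that was missing.
 */
static int getroute_sketch(struct dst_sketch *dst, size_t len, void **out)
{
    void *buf;

    dst_sketch_hold(dst);

    buf = len ? malloc(len) : NULL;
    if (!buf) {
        dst_sketch_release(dst);   /* the missing release */
        return -1;
    }

    *out = buf;   /* on success the caller owns buf and the reference */
    return 0;
}
```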
David S. Miller
c78679e8f3 ipv6: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:43 -04:00
Shmulik Ladkani
72331bc0cd ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4
In IPv4, if an RTA_IIF attribute is specified within an RTM_GETROUTE
message, then a route is searched as if a packet was received on the
specified 'iif' interface.

However in IPv6, RTA_IIF is not interpreted in the same way:
'inet6_rtm_getroute()' always calls 'ip6_route_output()', regardless of the
RTA_IIF attribute.

As a result, in IPv6 there's no way to use RTM_GETROUTE in order to look
for a route as if a packet was received on a specific interface.

Fix 'inet6_rtm_getroute()' so that RTA_IIF is interpreted as "lookup a
route as if a packet was received on the specified interface", similar
to IPv4's 'inet_rtm_getroute()' interpretation.

Reported-by: Ami Koren <amikoren@yahoo.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 17:29:40 -04:00
Eric Dumazet
94f826b807 net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()
Commit f2c31e32b3 (net: fix NULL dereferences in check_peer_redir() )
added a regression in rt6_fill_node(), leading to rcu_read_lock()
imbalance.

That's because NLA_PUT() can make a jump to the nla_put_failure label.

Fix this by using nla_put().

Many thanks to Ben Greear for his help

Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-27 18:48:35 -04:00
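The imbalance described in 94f826b807 comes from a macro that hides a goto. The following user-space sketch reproduces the shape of the bug with stand-ins: a counter replaces the RCU read-side lock, put_attr() replaces nla_put(), and PUT_OR_GOTO() is a hypothetical macro written in the style of NLA_PUT(). None of these are the kernel's definitions.

```c
#include <assert.h>

static int lock_depth;   /* stands in for read-side lock nesting */

static void sketch_read_lock(void)
{
    lock_depth++;
}

static void sketch_read_unlock(void)
{
    assert(lock_depth > 0);
    lock_depth--;
}

/* Stand-in for nla_put(): 0 on success, -1 when no room is left. */
static int put_attr(int room)
{
    return room > 0 ? 0 : -1;
}

/* Macro with a hidden jump on failure. */
#define PUT_OR_GOTO(room) \
    do { if (put_attr(room) < 0) goto put_failure; } while (0)

static int fill_node_buggy(int room)
{
    sketch_read_lock();
    PUT_OR_GOTO(room);        /* may jump past the unlock below */
    sketch_read_unlock();
    return 0;

put_failure:
    return -1;                /* the lock is still held here */
}

static int fill_node_fixed(int room)
{
    int err;

    sketch_read_lock();
    err = put_attr(room);     /* plain call: control flow stays visible */
    sketch_read_unlock();
    if (err < 0)
        goto put_failure;
    return 0;

put_failure:
    return -1;
}
```

Calling fill_node_buggy(0) leaves lock_depth at 1, while fill_node_fixed(0) returns the same error with the counter back at 0.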
David S. Miller
4da0bd7365 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
2012-03-18 23:29:41 -04:00
Eric Dumazet
122bdf67f1 ipv6: fix icmp6_dst_alloc()
commit 87a115783 ( ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack()

Many thanks to Dave Jones for providing a very complete bug report.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16 01:53:42 -07:00
David S. Miller
a7563f342d ipv6: Use ipv6_addr_any()
Suggested by YOSHIFUJI Hideaki.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-26 16:29:16 -05:00
David S. Miller
39232973b7 ipv4/ipv6: Prepare for new route gateway semantics.
In the future the ipv4/ipv6 route gateway will take on two types
of values:

1) INADDR_ANY/IN6ADDR_ANY, for local network routes, and in this case
   the neighbour must be obtained using the destination address in
   ipv4/ipv6 header as the lookup key.

2) Everything else, the actual nexthop route address.

So if the gateway is not inaddr-any we use it, otherwise we must use
the packet's destination address.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-26 15:22:32 -05:00
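The rule stated in 39232973b7 reduces to a single selection. A minimal user-space sketch for the ipv6 case, assuming a hypothetical helper name; it uses the standard IN6_IS_ADDR_UNSPECIFIED() macro from <netinet/in.h> rather than any kernel function.

```c
#include <assert.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: pick the address used as the neighbour lookup key.
 * An unspecified (all-zero) gateway marks a local network route, so the
 * packet's destination address is used instead.
 */
static const struct in6_addr *
choose_neigh_key(const struct in6_addr *gateway, const struct in6_addr *daddr)
{
    assert(gateway && daddr);

    return IN6_IS_ADDR_UNSPECIFIED(gateway) ? daddr : gateway;
}
```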
RongQing.Li
252c3d84ed ipv6: release idev when ip6_neigh_lookup failed in icmp6_dst_alloc
release idev when ip6_neigh_lookup failed in icmp6_dst_alloc

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-13 10:10:46 -08:00
Josh Hunt
32b293a53d IPv6: Avoid taking write lock for /proc/net/ipv6_route
During some debugging I needed to look into how /proc/net/ipv6_route
operated and in my digging I found it's calling fib6_clean_all() which uses
"write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
found this on 2.6.32, but reading the code I believe the same basic idea
exists currently. Looking at the rtnetlink code they are only calling
"read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
reading from proc isn't the recommended way of fetching the ipv6 route
table, taking a write lock seems unnecessary and would probably cause
network performance issues.

To verify this I loaded up the ipv6 route table and then ran iperf in 3
cases:
  * doing nothing
  * reading ipv6 route table via proc
    (while :; do cat /proc/net/ipv6_route > /dev/null; done)
  * reading ipv6 route table via rtnetlink
    (while :; do ip -6 route show table all > /dev/null; done)

* Load the ipv6 route table up with:
  * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done

* iperf commands:
  * client: iperf -i 1 -V -c <ipv6 addr>
  * server: iperf -V -s

* iperf results - 3 runs each (in Mbits/sec)
  * nothing: client: 927,927,927 server: 927,927,927
  * proc: client: 179,97,96,113 server: 142,112,133
  * iproute: client: 928,927,928 server: 927,927,927

lock_stat shows taking the write lock is causing the slowdown. Using this
info I decided to write a version of fib6_clean_all() which replaces
write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
this new function I see the same results as with my rtnetlink iperf test.

Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 17:07:33 -05:00
David S. Miller
8ade06c616 ipv6: Fix neigh lookup using NULL device.
In some of the rt6_bind_neighbour() call sites, it hasn't hooked
up the rt->dst.dev pointer yet, so we'd deref a NULL pointer when
obtaining dev->ifindex for the neighbour hash function computation.

Just pass the netdevice explicitly in to fix this problem.

Reported-by: Bjarke Istrup Pedersen <gurligebis@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 18:51:57 -05:00
David S. Miller
346f870b8a ipv6: Report TCP timestamp info in cacheinfo just like ipv4 does.
I missed this while adding ipv6 support to inet_peer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 15:22:33 -05:00
David S. Miller
d191854282 ipv6: Kill rt6i_dev and rt6i_expires defines.
It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.

And it makes grepping for dst_entry member uses much harder too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-28 20:19:20 -05:00
David S. Miller
f83c7790dc ipv6: Create fast inline ipv6 neigh lookup just like ipv4.
Also, create and use an rt6_bind_neighbour() in net/ipv6/route.c to
consolidate some common logic.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-28 15:41:23 -05:00
David S. Miller
c159d30c59 ipv6: Kill useless route tracing bits in net/ipv6/route.c
RDBG() wasn't even used, and the messages printed by RT6_DEBUG() were
far from useful.  Just get rid of all this stuff, we can replace it
with something more suitable if we want.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-26 15:24:36 -05:00
David S. Miller
c5e1fd8cca Merge branch 'nf-next' of git://1984.lsi.us.es/net-next
2011-12-25 02:21:45 -05:00
David S. Miller
b26e478f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/freescale/fsl_pq_mdio.c
	net/batman-adv/translation-table.c
	net/ipv6/route.c
2011-12-16 02:11:14 -05:00
David S. Miller
bb3c36863e ipv6: Check dest prefix length on original route not copied one in rt6_alloc_cow().
After commit 8e2ec63917 ("ipv6: don't
use inetpeer to store metrics for routes.") the test in rt6_alloc_cow()
for setting the ANYCAST flag is now wrong.

'rt' will always now have a plen of 128, because it is set explicitly
to 128 by ip6_rt_copy.

So to restore the semantics of the test, check the destination prefix
length of 'ort'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 17:35:06 -05:00
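The reasoning in bb3c36863e is that a test on the copied route can never see anything but a /128. A small user-space sketch of that argument, with hypothetical names; the daddr_matches_prefix parameter is a placeholder for whatever other condition accompanies the prefix-length test and is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rt_sketch {
    int plen;       /* destination prefix length */
    bool anycast;   /* stands in for the ANYCAST flag */
};

/* The copy is always forced to a host route, as the message describes. */
static struct rt_sketch copy_as_host_route(const struct rt_sketch *ort)
{
    struct rt_sketch rt = *ort;

    rt.plen = 128;
    return rt;
}

static struct rt_sketch alloc_cow_sketch(const struct rt_sketch *ort,
                                         bool daddr_matches_prefix)
{
    struct rt_sketch rt;

    assert(ort != NULL);
    rt = copy_as_host_route(ort);

    /* Testing rt.plen here would never fire: the copy is always /128. */
    if (ort->plen != 128 && daddr_matches_prefix)
        rt.anycast = true;

    return rt;
}
```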
David S. Miller
b43faac690 ipv6: If neigh lookup fails during icmp6 dst allocation, propagate error.
Don't just succeed with a route that has a NULL neighbour attached.
This follows the behavior of addrconf_dst_alloc().

Allowing this kind of route to end up with a NULL neigh attached will
result in packet drops on output until the route is somehow
invalidated, since nothing will meanwhile try to lookup the neigh
again.

A statistic is bumped for the case where we see a neigh-less route on
output, but the resulting packet drop is otherwise silent in nature,
and frankly it's a hard error for this to happen and ipv6 should do
what ipv4 does which is say something in the kernel logs.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 16:51:51 -05:00
David S. Miller
87a115783e ipv6: Move xfrm_lookup() call down into icmp6_dst_alloc().
And return error pointers.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 17:04:13 -05:00
David S. Miller
8f0315190d ipv6: Make third arg to anycast_dst_alloc() bool.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 16:48:14 -05:00
David Miller
2721745501 net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.
To reflect the fact that a reference is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05 15:20:19 -05:00
Florian Westphal
ea6e574e34 ipv6: add ip6_route_lookup
like rt6_lookup, but allows caller to pass in flowi6 structure.
Will be used by the upcoming ipv6 netfilter reverse path filter
match.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-04 22:44:07 +01:00
David S. Miller
04a6f4417b ipv6: Kill ndisc_get_neigh() inline helper.
It's only used in net/ipv6/route.c and the NULL device check is
superfluous for all of the existing call sites.

Just expand the __ndisc_lookup_errno() call at each location.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-03 18:29:30 -05:00
David S. Miller
3830847396 ipv6: Various cleanups in route.c
1) x == NULL --> !x
2) x != NULL --> x
3) (x&BIT) --> (x & BIT)
4) (BIT1|BIT2) --> (BIT1 | BIT2)
5) proper argument and struct member alignment

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-03 18:02:47 -05:00