Commit Graph

92 Commits

Author SHA1 Message Date
Eric Dumazet
31e6d363ab net: correct off-by-one write allocations reports
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:29:12 -07:00
Eric Dumazet
c564039fd8 net: sk_wmem_alloc has initial value of one, not zero
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

Some protocols check sk_wmem_alloc value to determine if a timer
must delay socket deallocation. We must take care of the sk_wmem_alloc
value being one instead of zero when no write allocations are pending.

Reported by Ingo Molnar, and full diagnostic from David Miller.

This patch introduces three helpers to get read/write allocations
and a followup patch will use these helpers to report correct
write allocations to user.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 04:31:25 -07:00
Patrick McHardy
5b54814022 net: use symbolic values for ndo_start_xmit() return codes
Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively.

0 (NETDEV_TX_OK) is not changed to keep the noise down, except in very few cases
where its in direct proximity to one of the other values.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-13 01:18:50 -07:00
David S. Miller
6fd4777a1f Revert "rose: zero length frame filtering in af_rose.c"
This reverts commit 244f46ae6e.

Alan Cox did the research, and just like the other radio protocols
zero-length frames have meaning because at the top level ROSE is
X.25 PLP.

So this zero-length filtering is invalid.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-14 20:28:00 -07:00
Alan Cox
83e0bbcbe2 af_rose/x25: Sanity check the maximum user frame size
Otherwise we can wrap the sizes and end up sending garbage.

Closes #10423

Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-27 00:28:21 -07:00
Stephen Hemminger
3170c65687 rose: convert to network_device_ops
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:02:04 -08:00
Stephen Hemminger
d289d120b4 rose: convert to internal net_device_stats
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:02:02 -08:00
Linus Torvalds
0191b625ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-28 12:49:40 -08:00
James Morris
ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
David S. Miller
5b9ab2ec04 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/hp-plus.c
	drivers/net/wireless/ath5k/base.c
	drivers/net/wireless/ath9k/recv.c
	net/wireless/reg.c
2008-11-26 23:48:40 -08:00
Bernard Pidoux
244f46ae6e rose: zero length frame filtering in af_rose.c
Since changeset e79ad711a0 from  mainline,
>From David S. Miller,
empty packet can be transmitted on connected socket for datagram protocols.

However, this patch broke a high level application using ROSE network protocol with connected datagram.

Bulletin Board Stations perform bulletins forwarding between BBS stations via ROSE network using a forward protocol.
Now, if for some reason, a buffer in the application software happens to be empty at a specific moment,
ROSE sends an empty packet via unfiltered packet socket.
When received, this ROSE packet introduces perturbations of data exchange of BBS forwarding,
for the application message forwarding protocol is waiting for something else.
We agree that a more careful programming of the application protocol would avoid this situation and we are
willing to debug it.
But, as an empty frame is no use and does not have any meaning for ROSE protocol,
we may consider filtering zero length data both when sending and receiving socket data.

The proposed patch repaired BBS data exchange through ROSE network that were broken since 2.6.22.11 kernel.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 00:56:20 -08:00
David Howells
c2a2b8d3b2 CRED: Wrap task credential accesses in the ROSE protocol
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:08 +11:00
Alexey Dobriyan
6d9f239a1e net: '&' redux
I want to compile out proc_* and sysctl_* handlers totally and
stub them to NULL depending on config options, however usage of &
will prevent this, since taking adress of NULL pointer will break
compilation.

So, drop & in front of every ->proc_handler and every ->strategy
handler, it was never needed in fact.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 18:21:05 -08:00
David S. Miller
cf508b1211 netdev: Handle ->addr_list_lock just like ->_xmit_lock for lockdep.
The new address list lock needs to handle the same device layering
issues that the _xmit_lock one does.

This integrates work done by Patrick McHardy.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:16:42 -07:00
YOSHIFUJI Hideaki
721499e893 netns: Use net_eq() to compare net-namespaces for optimization.
Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 22:34:43 -07:00
David S. Miller
e8a0464cc9 netdev: Allocate multiple queues for TX.
alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces.  This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:00 -07:00
David S. Miller
c773e847ea netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue.
Accesses are mostly structured such that when there are multiple TX
queues the code transformations will be a little bit simpler.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:13:53 -07:00
Bernard Pidoux
fe2c802ab6 rose: improving AX25 routing frames via ROSE network
ROSE network is organized through nodes connected via hamradio or Internet.
AX25 packet radio frames sent to a remote ROSE address destination are routed
through these nodes.

Without the present patch, automatic routing mechanism did not work optimally
due to an improper parameter checking.

rose_get_neigh() function is called either by rose_connect() or by
rose_route_frame().

In the case of a call from rose_connect(), f0 timer is checked to find if a connection
is already pending. In that case it returns the address of the neighbour, or returns a NULL otherwise.

When called by rose_route_frame() the purpose was to route a packet AX25 frame
through an adjacent node given a destination rose address.
However, in that case, t0 timer checked does not indicate if the adjacent node
is actually connected even if the timer is not null. Thus, for each frame sent, the
function often tried to start a new connexion even if the adjacent node was already connected.

The patch adds a "new" parameter that is true when the function is called by
rose route_frame().
This instructs rose_get_neigh() to check node parameter "restarted". 
If restarted is true it means that the route to the destination address is opened via a neighbour
node already connected.
If "restarted" is false the function returns a NULL.
In that case the calling function will initiate a new connection as before.

This results in a fast routing of frames, from nodes to nodes, until
destination is reached, as originaly specified by ROSE protocole.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 17:08:32 -07:00
David S. Miller
44ccff1f53 rose: Use sock_graft() and remove bogus sk_socket and sk_sleep init.
This is the rose variant of changeset
9375cb8a12
("ax25: Use sock_graft() and remove bogus sk_socket and sk_sleep init.")

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 02:39:21 -07:00
Bernard Pidoux
f37f2c62a2 rose: Wrong list_lock argument in rose_node seqops
In rose_node_start() as well as in rose_node_stop() __acquires() and
spin_lock_bh() were wrongly passing rose_neigh_list_lock instead of
rose_node_list_lock arguments.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 17:03:22 -07:00
Bernard Pidoux
047f7617eb [ROSE]: Fix soft lockup wrt. rose_node_list_lock
[ INFO: possible recursive locking detected ]
2.6.25 #3
---------------------------------------------
ax25ipd/3811 is trying to acquire lock:
  (rose_node_list_lock){-+..}, at: [<f8d31f1a>] rose_get_neigh+0x1a/0xa0 
[rose]

but task is already holding lock:
  (rose_node_list_lock){-+..}, at: [<f8d31fed>] 
rose_route_frame+0x4d/0x620 [rose]

other info that might help us debug this:
6 locks held by ax25ipd/3811:
  #0:  (&tty->atomic_write_lock){--..}, at: [<c0259a1c>] 
tty_write_lock+0x1c/0x50
  #1:  (rcu_read_lock){..--}, at: [<c02aea36>] net_rx_action+0x96/0x230
  #2:  (rcu_read_lock){..--}, at: [<c02ac5c0>] netif_receive_skb+0x100/0x2f0
  #3:  (rose_node_list_lock){-+..}, at: [<f8d31fed>] 
rose_route_frame+0x4d/0x620 [rose]
  #4:  (rose_neigh_list_lock){-+..}, at: [<f8d31ff7>] 
rose_route_frame+0x57/0x620 [rose]
  #5:  (rose_route_list_lock){-+..}, at: [<f8d32001>] 
rose_route_frame+0x61/0x620 [rose]

stack backtrace:
Pid: 3811, comm: ax25ipd Not tainted 2.6.25 #3
  [<c0147e27>] print_deadlock_bug+0xc7/0xd0
  [<c0147eca>] check_deadlock+0x9a/0xb0
  [<c0149cd2>] validate_chain+0x1e2/0x310
  [<c0149b95>] ? validate_chain+0xa5/0x310
  [<c010a7d8>] ? native_sched_clock+0x88/0xc0
  [<c0149fa1>] __lock_acquire+0x1a1/0x750
  [<c014a5d1>] lock_acquire+0x81/0xa0
  [<f8d31f1a>] ? rose_get_neigh+0x1a/0xa0 [rose]
  [<c03201a3>] _spin_lock_bh+0x33/0x60
  [<f8d31f1a>] ? rose_get_neigh+0x1a/0xa0 [rose]
  [<f8d31f1a>] rose_get_neigh+0x1a/0xa0 [rose]
  [<f8d32404>] rose_route_frame+0x464/0x620 [rose]
  [<c031ffdd>] ? _read_unlock+0x1d/0x20
  [<f8d31fa0>] ? rose_route_frame+0x0/0x620 [rose]
  [<f8d1c396>] ax25_rx_iframe+0x66/0x3b0 [ax25]
  [<f8d1f42f>] ? ax25_start_t3timer+0x1f/0x40 [ax25]
  [<f8d1e65b>] ax25_std_frame_in+0x7fb/0x890 [ax25]
  [<c0320005>] ? _spin_unlock_bh+0x25/0x30
  [<f8d1bdf6>] ax25_kiss_rcv+0x2c6/0x800 [ax25]
  [<c02a4769>] ? sock_def_readable+0x59/0x80
  [<c014a8a7>] ? __lock_release+0x47/0x70
  [<c02a4769>] ? sock_def_readable+0x59/0x80
  [<c031ffdd>] ? _read_unlock+0x1d/0x20
  [<c02a4769>] ? sock_def_readable+0x59/0x80
  [<c02a4d3a>] ? sock_queue_rcv_skb+0x13a/0x1d0
  [<c02a4c45>] ? sock_queue_rcv_skb+0x45/0x1d0
  [<f8d1bb30>] ? ax25_kiss_rcv+0x0/0x800 [ax25]
  [<c02ac715>] netif_receive_skb+0x255/0x2f0
  [<c02ac5c0>] ? netif_receive_skb+0x100/0x2f0
  [<c02af05c>] process_backlog+0x7c/0xf0
  [<c02aeb0c>] net_rx_action+0x16c/0x230
  [<c02aea36>] ? net_rx_action+0x96/0x230
  [<c012bd53>] __do_softirq+0x93/0x120
  [<f8d2a68a>] ? mkiss_receive_buf+0x33a/0x3f0 [mkiss]
  [<c012be37>] do_softirq+0x57/0x60
  [<c012c265>] local_bh_enable_ip+0xa5/0xe0
  [<c0320005>] _spin_unlock_bh+0x25/0x30
  [<f8d2a68a>] mkiss_receive_buf+0x33a/0x3f0 [mkiss]
  [<c025ea37>] pty_write+0x47/0x60
  [<c025c620>] write_chan+0x1b0/0x220
  [<c0259a1c>] ? tty_write_lock+0x1c/0x50
  [<c011fec0>] ? default_wake_function+0x0/0x10
  [<c0259bea>] tty_write+0x12a/0x1c0
  [<c025c470>] ? write_chan+0x0/0x220
  [<c018bbc6>] vfs_write+0x96/0x130
  [<c0259ac0>] ? tty_write+0x0/0x1c0
  [<c018c24d>] sys_write+0x3d/0x70
  [<c0104d1e>] sysenter_past_esp+0x5f/0xa5
  =======================
BUG: soft lockup - CPU#0 stuck for 61s! [ax25ipd:3811]

Pid: 3811, comm: ax25ipd Not tainted (2.6.25 #3)
EIP: 0060:[<c010a9db>] EFLAGS: 00000246 CPU: 0
EIP is at native_read_tsc+0xb/0x20
EAX: b404aa2c EBX: b404a9c9 ECX: 017f1000 EDX: 0000076b
ESI: 00000001 EDI: 00000000 EBP: ecc83afc ESP: ecc83afc
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: b7f5f000 CR3: 2cd8e000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
  [<c0204937>] delay_tsc+0x17/0x30
  [<c02048e9>] __delay+0x9/0x10
  [<c02127f6>] __spin_lock_debug+0x76/0xf0
  [<c0212618>] ? spin_bug+0x18/0x100
  [<c0147923>] ? __lock_contended+0xa3/0x110
  [<c0212998>] _raw_spin_lock+0x68/0x90
  [<c03201bf>] _spin_lock_bh+0x4f/0x60
  [<f8d31f1a>] ? rose_get_neigh+0x1a/0xa0 [rose]
  [<f8d31f1a>] rose_get_neigh+0x1a/0xa0 [rose]
  [<f8d32404>] rose_route_frame+0x464/0x620 [rose]
  [<c031ffdd>] ? _read_unlock+0x1d/0x20
  [<f8d31fa0>] ? rose_route_frame+0x0/0x620 [rose]
  [<f8d1c396>] ax25_rx_iframe+0x66/0x3b0 [ax25]
  [<f8d1f42f>] ? ax25_start_t3timer+0x1f/0x40 [ax25]
  [<f8d1e65b>] ax25_std_frame_in+0x7fb/0x890 [ax25]
  [<c0320005>] ? _spin_unlock_bh+0x25/0x30
  [<f8d1bdf6>] ax25_kiss_rcv+0x2c6/0x800 [ax25]
  [<c02a4769>] ? sock_def_readable+0x59/0x80
  [<c014a8a7>] ? __lock_release+0x47/0x70
  [<c02a4769>] ? sock_def_readable+0x59/0x80
  [<c031ffdd>] ? _read_unlock+0x1d/0x20
  [<c02a4769>] ? sock_def_readable+0x59/0x80
  [<c02a4d3a>] ? sock_queue_rcv_skb+0x13a/0x1d0
  [<c02a4c45>] ? sock_queue_rcv_skb+0x45/0x1d0
  [<f8d1bb30>] ? ax25_kiss_rcv+0x0/0x800 [ax25]
  [<c02ac715>] netif_receive_skb+0x255/0x2f0
  [<c02ac5c0>] ? netif_receive_skb+0x100/0x2f0
  [<c02af05c>] process_backlog+0x7c/0xf0
  [<c02aeb0c>] net_rx_action+0x16c/0x230
  [<c02aea36>] ? net_rx_action+0x96/0x230
  [<c012bd53>] __do_softirq+0x93/0x120
  [<f8d2a68a>] ? mkiss_receive_buf+0x33a/0x3f0 [mkiss]
  [<c012be37>] do_softirq+0x57/0x60
  [<c012c265>] local_bh_enable_ip+0xa5/0xe0
  [<c0320005>] _spin_unlock_bh+0x25/0x30
  [<f8d2a68a>] mkiss_receive_buf+0x33a/0x3f0 [mkiss]
  [<c025ea37>] pty_write+0x47/0x60
  [<c025c620>] write_chan+0x1b0/0x220
  [<c0259a1c>] ? tty_write_lock+0x1c/0x50
  [<c011fec0>] ? default_wake_function+0x0/0x10
  [<c0259bea>] tty_write+0x12a/0x1c0
  [<c025c470>] ? write_chan+0x0/0x220
  [<c018bbc6>] vfs_write+0x96/0x130
  [<c0259ac0>] ? tty_write+0x0/0x1c0
  [<c018c24d>] sys_write+0x3d/0x70
  [<c0104d1e>] sysenter_past_esp+0x5f/0xa5
  =======================

Since rose_route_frame() does not use rose_node_list we can safely
remove rose_node_list_lock spin lock here and let it be free for
rose_get_neigh().

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-20 15:58:07 -07:00
Bernard Pidoux
43837b1e6c rose: Socket lock was not released before returning to user space
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
xfbbd/3683 is leaving the kernel with locks still held!
1 lock held by xfbbd/3683:
  #0:  (sk_lock-AF_ROSE){--..}, at: [<c8cd1eb3>] rose_connect+0x73/0x420 [rose]

INFO: task xfbbd:3683 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfbbd         D 00000246     0  3683   3669
        c6965ee0 00000092 c02c5c40 00000246 c0f6b5f0 c0f6b5c0 c0f6b5f0 c0f6b5c0
        c0f6b614 c6965f18 c024b74b ffffffff c06ba070 00000000 00000000 00000001
        c6ab07c0 c012d450 c0f6b634 c0f6b634 c7b5bf10 c0d6004c c7b5bf10 c6965f40
Call Trace:
  [<c024b74b>] lock_sock_nested+0x6b/0xd0
  [<c012d450>] ? autoremove_wake_function+0x0/0x40
  [<c02488f1>] sock_fasync+0x41/0x150
  [<c0249e69>] sock_close+0x19/0x40
  [<c0175d54>] __fput+0xb4/0x170
  [<c0176018>] fput+0x18/0x20
  [<c017300e>] filp_close+0x3e/0x70
  [<c01744e9>] sys_close+0x69/0xb0
  [<c0103bda>] sysenter_past_esp+0x5f/0xa5
  =======================
INFO: lockdep is turned off.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-19 18:41:51 -07:00
David S. Miller
e1ec1b8ccd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/s2io.c
2008-04-02 22:35:23 -07:00
Jarek Poplawski
4965291acf [ROSE/AX25] af_rose: rose_release() fix
rose_release() doesn't release sockets properly, e.g. it skips
sock_orphan(), so OOPSes are triggered in sock_def_write_space(),
which was observed especially while ROSE skbs were kfreed from
ax25_frames_acked(). There is also sock_hold() and lock_sock() added -
similarly to ax25_release(). Thanks to Bernard Pidoux for substantial
help in debugging this problem.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Reported-and-tested-by: Bernard Pidoux <bpidoux@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-01 23:56:17 -07:00
YOSHIFUJI Hideaki
3b1e0a655f [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:55 +09:00
YOSHIFUJI Hideaki
c346dca108 [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:53 +09:00
Eric Dumazet
95b7d924a5 [ROSE]: Supress sparse warnings
CHECK   net/rose/af_rose.c
net/rose/af_rose.c:125:11: warning: expensive signed divide
net/rose/af_rose.c:976:46: warning: expensive signed divide
net/rose/af_rose.c:1379:13: warning: context imbalance in 'rose_info_start' - wrong count at exit
net/rose/af_rose.c:1406:13: warning: context imbalance in 'rose_info_stop' - unexpected unlock
  CHECK   net/rose/rose_in.c
net/rose/rose_in.c:185:25: warning: expensive signed divide
  CHECK   net/rose/rose_route.c
net/rose/rose_route.c:997:46: warning: expensive signed divide
net/rose/rose_route.c:1070:13: warning: context imbalance in 'rose_node_start' - wrong count at exit
net/rose/rose_route.c:1093:13: warning: context imbalance in 'rose_node_stop' - unexpected unlock
net/rose/rose_route.c:1146:13: warning: context imbalance in 'rose_neigh_start' - wrong count at exit
net/rose/rose_route.c:1169:13: warning: context imbalance in 'rose_neigh_stop' - unexpected unlock
net/rose/rose_route.c:1229:13: warning: context imbalance in 'rose_route_start' - wrong count at exit
net/rose/rose_route.c:1252:13: warning: context imbalance in 'rose_route_stop' - unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:44 -08:00
Pavel Emelyanov
b5ccd792fa [NET]: Simple ctl_table to ctl_path conversions.
This patch includes many places, that only required
replacing the ctl_table-s with appropriate ctl_paths
and call register_sysctl_paths().

Nothing special was done with them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:07 -08:00
Pavel Emelyanov
b24b8a247f [NET]: Convert init_timer into setup_timer
Many-many code in the kernel initialized the timer->function
and  timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:35 -08:00
Pavel Emelyanov
78f150bf94 [ROSE]: Trivial compilation CONFIG_INET=n case
The rose_rebuild_header() consists only of some variables in
case INET=n, and gcc will warn us about it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-05 05:37:28 -08:00
Pavel Emelyanov
6257ff2177 [NET]: Forget the zero_it argument of sk_alloc()
Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.

Besides, fix the checkpatch.pl warnings about using the
assignments inside if-s.

This patch is rather big, and it is a part of the previous one.
I splitted it wishing to make the patches more readable. Hope 
this particular split helped.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:39:31 -07:00
Stephen Hemminger
3b04ddde02 [NET]: Move hardware header operations out of netdevice.
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Eric W. Biederman
881d966b48 [NET]: Make the device list and device lookups per namespace.
This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables.  The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces.  The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace.  This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:10 -07:00
Eric W. Biederman
e9dc865340 [NET]: Make device event notification network namespace safe
Every user of the network device notifiers is either a protocol
stack or a pseudo device.  If a protocol stack that does not have
support for multiple network namespaces receives an event for a
device that is not in the initial network namespace it quite possibly
can get confused and do the wrong thing.

To avoid problems until all of the protocol stacks are converted
this patch modifies all netdev event handlers to ignore events on
devices that are not in the initial network namespace.

As the rest of the code is made network namespace aware these
checks can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:09 -07:00
Eric W. Biederman
1b8d7ae42d [NET]: Make socket creation namespace safe.
This patch passes in the namespace a new socket should be created in
and has the socket code do the appropriate reference counting.  By
virtue of this all socket create methods are touched.  In addition
the socket create methods are modified so that they will fail if
you attempt to create a socket in a non-default network namespace.

Failing if we attempt to create a socket outside of the default
network namespace ensures that as we incrementally make the network stack
network namespace aware we will not export functionality that someone
has not audited and made certain is network namespace safe.
Allowing us to partially enable network namespaces before all of the
exotic protocols are supported.

Any protocol layers I have missed will fail to compile because I now
pass an extra parameter into the socket creation code.

[ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:07 -07:00
Eric W. Biederman
457c4cbc5a [NET]: Make /proc/net per network namespace
This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:06 -07:00
Alexey Dobriyan
891e6a9312 [ROSE]: Fix rose.ko oops on unload
Commit a3d384029a aka
"[AX.25]: Fix unchecked rose_add_loopback_neigh uses"
transformed rose_loopback_neigh var into statically allocated one.
However, on unload it will be kfree's which can't work.

Steps to reproduce:

	modprobe rose
	rmmod rose

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c014c664
*pde = 00000000
Oops: 0000 [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in: rose ax25 fan ufs loop usbhid rtc snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus uhci_hcd thermal usbcore button processor evdev sr_mod cdrom
CPU:    0
EIP:    0060:[<c014c664>]    Not tainted VLI
EFLAGS: 00210086   (2.6.23-rc9 #3)
EIP is at kfree+0x48/0xa1
eax: 00000556   ebx: c1734aa0   ecx: f6a5e000   edx: f7082000
esi: 00000000   edi: f9a55d20   ebp: 00200287   esp: f6a5ef28
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process rmmod (pid: 1823, ti=f6a5e000 task=f7082000 task.ti=f6a5e000)
Stack: f9a55d20 f9a5200c 00000000 00000000 00000000 f6a5e000 f9a5200c f9a55a00 
       00000000 bf818cf0 f9a51f3f f9a55a00 00000000 c0132c60 65736f72 00000000 
       f69f9630 f69f9528 c014244a f6a4e900 00200246 f7082000 c01025e6 00000000 
Call Trace:
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a51f3f>] rose_exit+0x4c/0xd5 [rose]
 [<c0132c60>] sys_delete_module+0x15e/0x186
 [<c014244a>] remove_vma+0x40/0x45
 [<c01025e6>] sysenter_past_esp+0x8f/0x99
 [<c012bacf>] trace_hardirqs_on+0x118/0x13b
 [<c01025b6>] sysenter_past_esp+0x5f/0x99
 =======================
Code: 05 03 1d 80 db 5b c0 8b 03 25 00 40 02 00 3d 00 40 02 00 75 03 8b 5b 0c 8b 73 10 8b 44 24 18 89 44 24 04 9c 5d fa e8 77 df fd ff <8b> 56 08 89 f8 e8 84 f4 fd ff e8 bd 32 06 00 3b 5c 86 60 75 0f 
EIP: [<c014c664>] kfree+0x48/0xa1 SS:ESP 0068:f6a5ef28

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-07 23:44:17 -07:00
YOSHIFUJI Hideaki
6140efb536 [NET] ROSE: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-07-19 10:44:40 +09:00
Philippe De Muyter
56b3d975bb [NET]: Make all initialized struct seq_operations const.
Make all initialized struct seq_operations in net/ const

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 23:07:31 -07:00
Pavel Emelianov
7562f876cd [NET]: Rework dev_base via list_head (v3)
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 15:13:45 -07:00
Ralf Baechle
75606dc69a [AX25/NETROM/ROSE]: Convert to use modern wait queue API
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:15 -07:00
Arnaldo Carvalho de Melo
27d7ff46a3 [SK_BUFF]: Introduce skb_copy_to_linear_data{_offset}
To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-04-25 22:28:29 -07:00
Arnaldo Carvalho de Melo
d626f62b11 [SK_BUFF]: Introduce skb_copy_from_linear_data{_offset}
To clearly state the intent of copying from linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-04-25 22:28:23 -07:00
Arnaldo Carvalho de Melo
eeeb03745b [SK_BUFF]: More skb_put related conversions to skb_reset_transport_header
This is similar to the skb_reset_network_header(), i.e. at the point we reset
the transport header pointer/offset skb->tail is equal to skb->data.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:59 -07:00
Arnaldo Carvalho de Melo
badff6d01a [SK_BUFF]: Introduce skb_reset_transport_header(skb)
For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple cases:

skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()

The next ones will handle the slightly more "complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:15 -07:00
Eric Dumazet
ae40eb1ef3 [NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution
Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:04 -07:00
Ralf Baechle
2536b94a2d [ROSE]: Socket locking is a great invention.
Especially if you actually try to do it ;-)

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-12 15:53:33 -07:00
Ralf Baechle
6cee77dbf2 [ROSE]: Remove ourselves from waitqueue when receiving a signal
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-12 15:52:52 -07:00
Eric W. Biederman
0b4d414714 [PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name.  Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
Eric W. Biederman
2d4381dec3 [PATCH] sysctl: rose: remove unnecessary insert_at_head flag
The sysctl numbers used are unique so setting the insert_at_head flag serves
no semantic purpose.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:55 -08:00