[INET]: Remove per bucket rwlock in tcp/dccp ehash table.

As done two years ago on the IP route cache table (commit
22c047ccbc), we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for
little performance difference. (We hit a different cache line for the
rwlock, but the bucket cache line then has a better sharing factor
among CPUs, since we dirty it less often.) For netstat or ss commands
that want a full scan of the hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.
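
As a rough sketch of how such a table can be sized and allocated (the
thresholds and allocation path here are illustrative rather than the
exact code; the ehash_locks/ehash_locks_mask fields follow the scheme
this patch introduces):

	static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
	{
		/* Sketch: scale a power-of-2 rwlock array with the CPU
		 * count, so a simple mask maps a bucket hash to its lock.
		 */
		unsigned int i, size = 256;
		unsigned int nr_pcpus = num_possible_cpus();

		if (nr_pcpus >= 4)
			size = 512;
		if (nr_pcpus >= 8)
			size = 1024;
		if (nr_pcpus >= 16)
			size = 2048;

		hashinfo->ehash_locks = kmalloc(size * sizeof(rwlock_t),
						GFP_KERNEL);
		if (!hashinfo->ehash_locks)
			return -ENOMEM;
		for (i = 0; i < size; i++)
			rwlock_init(&hashinfo->ehash_locks[i]);
		hashinfo->ehash_locks_mask = size - 1;
		return 0;
	}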

This patch provides some locking abstraction that may ease future
work using a different locking model for the TCP/DCCP table.
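
Concretely, the abstraction boils down to one accessor mapping a
bucket's hash to its rwlock, matching the inet_ehash_lockp() calls in
the hunks below (a sketch):

	static inline rwlock_t *inet_ehash_lockp(struct inet_hashinfo *hashinfo,
						 unsigned int hash)
	{
		return &hashinfo->ehash_locks[hash & hashinfo->ehash_locks_mask];
	}

Readers then take read_lock_bh(inet_ehash_lockp(hashinfo, hash))
instead of the per-bucket rwlock, so buckets that hash to different
locks still proceed in parallel.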

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Author:    Eric Dumazet
Date:      2007-11-07 02:40:20 -08:00
Committer: David S. Miller
Commit:    230140cffa
Parent:    efac52762b

8 changed files with 106 additions and 37 deletions

net/ipv4/tcp_ipv4.c

@@ -2049,8 +2049,9 @@ static void *established_get_first(struct seq_file *seq)
 		struct sock *sk;
 		struct hlist_node *node;
 		struct inet_timewait_sock *tw;
+		rwlock_t *lock = inet_ehash_lockp(&tcp_hashinfo, st->bucket);
 
-		read_lock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
+		read_lock_bh(lock);
 		sk_for_each(sk, node, &tcp_hashinfo.ehash[st->bucket].chain) {
 			if (sk->sk_family != st->family) {
 				continue;
@@ -2067,7 +2068,7 @@ static void *established_get_first(struct seq_file *seq)
 			rc = tw;
 			goto out;
 		}
-		read_unlock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
+		read_unlock_bh(lock);
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 	}
 out:
@@ -2094,11 +2095,11 @@ get_tw:
 			cur = tw;
 			goto out;
 		}
-		read_unlock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
+		read_unlock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket));
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 
 		if (++st->bucket < tcp_hashinfo.ehash_size) {
-			read_lock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
+			read_lock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket));
 			sk = sk_head(&tcp_hashinfo.ehash[st->bucket].chain);
 		} else {
 			cur = NULL;
@@ -2206,7 +2207,7 @@ static void tcp_seq_stop(struct seq_file *seq, void *v)
 	case TCP_SEQ_STATE_TIME_WAIT:
 	case TCP_SEQ_STATE_ESTABLISHED:
 		if (v)
-			read_unlock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
+			read_unlock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket));
 		break;
 	}
 }