linux-kernel-test

Author	SHA1	Message	Date
Stefan Richter	1821bc19d5	firewire: core: fix crash in iso resource management This fixes a regression due to post 2.6.30 commit "firewire: core: do not DMA-map stack addresses" `6fdc037094`. As David Moore noted, a previously correct sizeof() expression became wrong since the commit changed its argument from an array to a pointer. This resulted in an oops in ohci_cancel_packet in the shared workqueue thread's context when an isochronous resource was to be freed. Reported-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2009-09-05 15:59:34 +02:00
Jan Glauber	81bd5f6c96	crypto: sha-s390 - Fix warnings in import function That patch should fix the warnings. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2009-09-05 16:27:35 +10:00
Patrick McHardy	c9f1d0389b	net_sched: fix class grafting errno codes If the parent qdisc doesn't support classes, use EOPNOTSUPP. If the parent class doesn't exist, use ENOENT. Currently EINVAL is returned in both cases. Additionally check whether grafting is supported and remove a now unnecessary graft function from sch_ingress. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-04 23:10:15 -07:00
Brian Haley	b1f5719558	netlink: silence compiler warning CC net/netlink/genetlink.o net/netlink/genetlink.c: In function ‘genl_register_mc_group’: net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function From following the code 'err' is initialized, but set it to zero to silence the warning. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-04 20:36:52 -07:00
Steven Rostedt	85bac32c4a	ring-buffer: only enable ring_buffer_swap_cpu when needed Since the ability to swap the cpu buffers adds a small overhead to the recording of a trace, we only want to add it when needed. Only the irqsoff and preemptoff tracers use this feature, and both are not recommended for production kernels. This patch disables its use when neither irqsoff nor preemptoff is configured. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-04 19:42:22 -04:00
Steven Rostedt	62f0b3eb5c	ring-buffer: check for swapped buffers in start of committing Because the irqsoff tracer can swap an internal CPU buffer, it is possible that a swap happens between the start of the write and before the committing bit is set (the committing bit will disable swapping). This patch adds a check for this and will fail the write if it detects it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-04 19:38:42 -04:00
Steven Rostedt	e8165dbb03	tracing: report error in trace if we fail to swap latency buffer The irqsoff tracer will fail to swap the cpu buffer with the max buffer if it preempts a commit. Instead of ignoring this, this patch makes the tracer report it if the last max latency failed due to preempting a current commit. The output of the latency tracer will look like this: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 2.6.31-rc5 # -------------------------------------------------------------------- # latency: 112 us, #1/1, CPU#1 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # \| task: -4281 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: save_args # => ended at: __do_softirq # # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / # \|\|\|\|\| delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / bash-4281 1d.s6 265us : update_max_tr_single: Failed to swap buffers due to commit in progress Note the latency time and the functions that disabled the irqs or preemption will still be listed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-04 19:22:41 -04:00
Steven Rostedt	659372d3e4	tracing: add trace_array_printk for internal tracers to use This patch adds a trace_array_printk to allow a tracer to use the trace_printk on its own trace array. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-04 19:13:53 -04:00
Steven Rostedt	e77405ad80	tracing: pass around ring buffer instead of tracer The latency tracers (irqsoff and wakeup) can swap trace buffers on the fly. If an event is happening and has reserved data on one of the buffers, and the latency tracer swaps the global buffer with the max buffer, the result is that the event may commit the data to the wrong buffer. This patch changes the API to the trace recording to be recieve the buffer that was used to reserve a commit. Then this buffer can be passed in to the commit. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-04 18:59:39 -04:00
Steven Rostedt	f633903af2	tracing: make tracing_reset safe for external use Reseting the trace buffer without first disabling the buffer and waiting for any writers to complete, can corrupt the ring buffer. This patch makes the external version of tracing_reset safe from corruption by disabling the ring buffer and calling synchronize_sched. This version can no longer be called from interrupt context. But all those callers have been removed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-04 18:46:51 -04:00
Steven Rostedt	2f26ebd549	tracing: use timestamp to determine start of latency traces Currently the latency tracers reset the ring buffer. Unfortunately if a commit is in process (due to a trace event), this can corrupt the ring buffer. When this happens, the ring buffer will detect the corruption and then permanently disable the ring buffer. The bug does not crash the system, but it does prevent further tracing after the bug is hit. Instead of reseting the trace buffers, the timestamp of the start of the trace is used instead. The buffers will still contain the previous data, but the output will not count any data that is before the timestamp of the trace. Note, this only affects the static trace output (trace) and not the runtime trace output (trace_pipe). The runtime trace output does not make sense for the latency tracers anyway. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-04 18:44:22 -04:00
Vlad Yasevich	f1751c57f7	sctp: Catch bogus stream sequence numbers Since our TSN map is capable of holding at most a 4K chunk gap, there is no way that during this gap, a stream sequence number (unsigned short) can wrap such that the new number is smaller then the next expected one. If such a case is encountered, this is a protocol violation. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:03 -04:00
Wei Yongjun	be2971438d	sctp: remove dup code in net/sctp/output.c Use sctp_packet_reset() instead of dup code. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:02 -04:00
Wei Yongjun	9237ccbc0b	sctp: turn flags in 'struct sctp_association' into bit fields This shrinks the size of struct sctp_association a little. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:02 -04:00
Bhaskar Dutta	723884339f	sctp: Sysctl configuration for IPv4 Address Scoping This patch introduces a new sysctl option to make IPv4 Address Scoping configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>. In networking environments where DNAT rules in iptables prerouting chains convert destination IP's to link-local/private IP addresses, SCTP connections fail to establish as the INIT chunk is dropped by the kernel due to address scope match failure. For example to support overlapping IP addresses (same IP address with different vlan id) a Layer-5 application listens on link local IP's, and there is a DNAT rule that maps the destination IP to a link local IP. Such applications never get the SCTP INIT if the address-scoping draft is strictly followed. This sysctl configuration allows SCTP to function in such unconventional networking environments. Sysctl options: 0 - Disable IPv4 address scoping draft altogether 1 - Enable IPv4 address scoping (default, current behavior) 2 - Enable address scoping but allow IPv4 private addresses in init/init-ack 3 - Enable address scoping but allow IPv4 link local address in init/init-ack Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:01 -04:00
Vlad Yasevich	8da645e101	sctp: Get rid of an extra routing lookup when adding a transport. We used to perform 2 routing lookups for a new transport: one just for path mtu detection, and one to actually route to destination and path mtu update when sending a packet. There is no point in doing both of them, especially since the first one just for path mtu doesn't take into account source address and sometimes gives the wrong route, causing path mtu updates anyway. We now do just the one call to do both route to destination and get path mtu updates. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:01 -04:00
Vlad Yasevich	a803c94230	sctp: Turn flags in 'sctp_packet' into bit fields This shrinks the size of sctp_packet a little. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:01 -04:00
Vlad Yasevich	4007cc88ce	sctp: Correctly track if AUTH has been bundled. We currently track if AUTH has been bundled using the 'auth' pointer to the chunk. However, AUTH is disallowed after DATA is already in the packet, so we need to instead use the 'has_auth' field. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:00 -04:00
Wei Yongjun	d521c08f4c	sctp: fix to reset packet information after packet transmit The packet information does not reset after packet transmit, this may cause some problems such as following DATA chunk be sent without AUTH chunk, even if the authentication of DATA chunk has been requested by the peer. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:00 -04:00
Vlad Yasevich	31b02e1549	sctp: Failover transmitted list on transport delete Add-IP feature allows users to delete an active transport. If that transport has chunks in flight, those chunks need to be moved to another transport or association may get into unrecoverable state. Reported-by: Rafael Laufer <rlaufer@cisco.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:00 -04:00
Vlad Yasevich	f68b2e05f3	sctp: Fix SCTP_MAXSEG socket option to comply to spec. We had a bug that we never stored the user-defined value for MAXSEG when setting the value on an association. Thus future PMTU events ended up re-writing the frag point and increasing it past user limit. Additionally, when setting the option on the socket/endpoint, we effect all current associations, which is against spec. Now, we store the user 'maxseg' value along with the computed 'frag_point'. We inherit 'maxseg' from the socket at association creation and use it as an upper limit for 'frag_point' when its set. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:00 -04:00
Vlad Yasevich	cb95ea32a4	sctp: Don't do NAGLE delay on large writes that were fragmented small SCTP will delay the last part of a large write due to NAGLE, if that part is smaller then MTU. Since we are doing large writes, we might as well send the last portion now instead of waiting untill the next large write happens. The small portion will be sent as is regardless, so it's better to not delay it. This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com> and Doug Graham <dgraham@nortel.com>. Many thanks go out to them. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:59 -04:00
Vlad Yasevich	b29e790728	sctp: Nagle delay should be based on path mtu The decision to delay due to Nagle should be based on the path mtu and future packet size. We currently incorrectly base it on 'frag_point' which is the SCTP DATA segment size, and also we do not count DATA chunk header overhead in the computation. This actuall allows situations where a user can set low 'frag_point', and then send small messages without delay. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:59 -04:00
Vlad Yasevich	d4d6fb5787	sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN. We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN. This results in an hung association if the remote only uses SHUTDOWNs (which it's allowed to do) to acknowlege DATA when closing. The reason for that is that we simply honor the a_rwnd from the sack, but since we faked it to be 0, we enter 0-window probing. The fix is to use the peers old rwnd and add our flight size to it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:59 -04:00
Vlad Yasevich	4d3c46e683	sctp: drop a_rwnd to 0 when receive buffer overflows. SCTP has a problem that when small chunks are used, it is possible to exhaust the receiver buffer without fully closing receive window. This happens due to all overhead that we have account for with small messages. To fix this, when receive buffer is exceeded, we'll drop the window to 0 and save the 'drop' portion. When application starts reading data and freeing up recevie buffer space, we'll wait until we've reached the 'drop' window and then add back this 'drop' one mtu at a time. This worked well in testing and under stress produced rather even recovery. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:59 -04:00
Vlad Yasevich	33ce828131	sctp: Clear fast_recovery on the transport when T3 timer expires. If T3 timer expires, we are retransmitting data due to timeout any any fast recovery is null and void. We can clear the fast recovery flag. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:58 -04:00
Vlad Yasevich	b9f8478682	sctp: Fix error count increments that were results of HEARTBEATS SCTP RFC 4960 states that unacknowledged HEARTBEATS count as errors agains a given transport or endpoint. As such, we should increment the error counts for only for unacknowledged HB, otherwise we detect failure too soon. This goes for both the overall error count and the path error count. Now, there is a difference in how the detection is done between the two. The path error detection is done after the increment, so to detect it properly, we actually need to exceed the path threshold. The overall error detection is done _BEFORE_ the increment. Thus to detect the failure, it's enough for the error count to match the threshold. This is why all the state functions use '>=' to detect failure, while path detection uses '>'. Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first proposed patches to fix this issue and made me re-read the spec and the code to figure out how this cruft really works. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:58 -04:00
Alexey Dobriyan	d71a09ed55	sctp: use proc_create() create_proc_entry() is deprecated (not formally, though). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:58 -04:00
Wei Yongjun	dadb50cc1a	sctp: fix check the chunk length of received HEARTBEAT-ACK chunk The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK that contains the Heartbeat Information field copied from the received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk must have a length of: sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t) A badly formatted HB-ACK chunk, it is possible that we may access invalid memory. We should really make sure that the chunk format is what we expect, before attempting to touch the data. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:58 -04:00
Wei Yongjun	a2f36eec56	sctp: drop SHUTDOWN chunk if the TSN is less than the CTSN If Cumulative TSN Ack field of SHUTDOWN chunk is less than the Cumulative TSN Ack Point then drop the SHUTDOWN chunk. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:57 -04:00
Vlad Yasevich	9c5c62be2f	sctp: Send user messages to the lower layer as one Currenlty, sctp breaks up user messages into fragments and sends each fragment to the lower layer by itself. This means that for each fragment we go all the way down the stack and back up. This also discourages bundling of multiple fragments when they can fit into a sigle packet (ex: due to user setting a low fragmentation threashold). We introduce a new command SCTP_CMD_SND_MSG and hand the whole message down state machine. The state machine and the side-effect parser will cork the queue, add all chunks from the message to the queue, and then un-cork the queue thus causing the chunks to get transmitted. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:57 -04:00
Vlad Yasevich	5d7ff261ef	sctp: Try to encourage SACK bundling with DATA. If the association has a SACK timer pending and now DATA queued to be send, we'll try to bundle the SACK with the next application send. As such, try encourage bundling by accounting for SACK in the size of the first chunk fragment. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:56 -04:00
Vlad Yasevich	e83963b769	sctp: Generate SACKs when actually sending outbound DATA We are now trying to bundle SACKs when we have outbound DATA to send. However, there are situations where this outbound DATA will not be sent (due to congestion or available window). In such cases it's ok to wait for the timer to expire. This patch refactors the sending code so that betfore attempting to bundle the SACK we check to see if the DATA will actually be transmitted. Based on eirlier works for Doug Graham <dgraham@nortel.com> and Wei Youngjun <yjwei@cn.fujitsu.com>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:56 -04:00
Vlad Yasevich	3e62abf92f	sctp: Fix data segmentation with small frag_size Since an application may specify the maximum SCTP fragment size that all data should be fragmented to, we need to fix how we do segmentation. Right now, if a user specifies a small fragment size, the segment size can go negative in the presence of AUTH or COOKIE_ECHO bundling. What we need to do is track the largest possbile DATA chunk that can fit into the mtu. Then if the fragment size specified is bigger then this maximum length, we'll shrink it down. Otherwise, we just use the smaller segment size without changing it further. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:56 -04:00
Vlad Yasevich	bec9640bb0	sctp: Disallow new connection on a closing socket If a socket has a lot of association that are in the process of of being closed/aborted, it is possible for a remote to establish new associations during the time period that the old ones are shutting down. If this was a result of a close() call, there will be no socket and will cause a memory leak. We'll prevent this by setting the socket state to CLOSING and disallow new associations when in this state. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:56 -04:00
Doug Graham	af87b823ca	sctp: Fix piggybacked ACKs This patch corrects the conditions under which a SACK will be piggybacked on a DATA packet. The previous condition was incorrect due to a misinterpretation of RFC 4960 and/or RFC 2960. Specifically, the following paragraph from section 6.2 had not been implemented correctly: Before an endpoint transmits a DATA chunk, if any received DATA chunks have not been acknowledged (e.g., due to delayed ack), the sender should create a SACK and bundle it with the outbound DATA chunk, as long as the size of the final SCTP packet does not exceed the current MTU. See Section 6.2. When about to send a DATA chunk, the code now checks to see if the SACK timer is running. If it is, we know we have a SACK to send to the peer, so we append the SACK (assuming available space in the packet) and turn off the timer. For a simple request-response scenario, this will result in the SACK being bundled with the response, meaning the the SACK is received quickly by the client, and also meaning that no separate SACK packet needs to be sent by the server to acknowledge the request. Prior to this patch, a separate SACK packet would have been sent by the server SCTP only after its delayed-ACK timer had expired (usually 200ms). This is wasteful of bandwidth, and can also have a major negative impact on performance due the interaction of delayed ACKs with the Nagle algorithm. Signed-off-by: Doug Graham <dgraham@nortel.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:55 -04:00
Rami Rosen	b4e8c6a7e6	sctp: remove unused union (sctp_cmsg_data_t) definition This patch removes an unused union definition (sctp_cmsg_data_t) from include/net/sctp/user.h. Signed-off-by: Rami Rosen <rosenrami@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:55 -04:00
Vlad Yasevich	40187886bc	sctp: release cached route when the transport goes down. When the sctp transport is marked down, we can release the cached route and force a new lookup when attempting to use this transport for anything. This way, if a better route or source address is available, we'll try to use it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:55 -04:00
Wei Yongjun	3cd9749c0b	sctp: update the route for non-active transports after addresses are added Update the route and saddr entries for the non-active transports as some of the added addresses can be used as better source addresses, or may be there is a better route. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:55 -04:00
Wei Yongjun	44e65c1ef1	sctp: check the unrecognized ASCONF parameter before access it This patch fix to check the unrecognized ASCONF parameter before access it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:54 -04:00
Wei Yongjun	425e0f6852	sctp: avoid overwrite the return value of sctp_process_asconf_ack() The return value of sctp_process_asconf_ack() may be overwritten while process parameters with no error. This patch fixed the problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:20:54 -04:00
Albin Tonnerre	4a88d44ab1	tracing: Remove mentioning of legacy latency_trace file from documentation The latency_trace file got removed a while back by commit `886b5b73d7` and has been replaced by the latency-format option. This patch fixes the documentation by reflecting this change. Changes since v1: - mention that the trace format is configurable through the latency-format option - Fix a couple mistakes related to the timestamps Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20090831204007.GE4237@pc-ras4041.res.insa> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-04 23:30:38 +02:00
Sunil Mushran	8379e7c46c	ocfs2: ocfs2_write_begin_nolock() should handle len=0 Bug introduced by mainline commit `e7432675f8` The bug causes ocfs2_write_begin_nolock() to oops when len=0. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 14:28:31 -07:00
Li Zefan	c58b43218c	tracing/filters: Defer pred allocation, fix memory leak The predicates of an event and their filter structure are allocated when we create an event filter for the first time. These objects must be created once but each time we come with a new filter, we overwrite such pre-existing allocation, if any. Thus, this patch checks if the filter has already been allocated before going ahead. Spotted-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <4A9CB1BA.3060402@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-04 23:22:33 +02:00
Ulrich Drepper	6b58e7f146	perf tools: Avoid unnecessary work in directory lookups This patch improves some (common) inefficiencies in the handling of directory lookups: - not using the d_type information returned by the kernel - constructing (absolute) paths for file operation even though directory-relative operations using the *at functions is possible There are more places to fix but this is a start. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20090904193951.GB6186@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 21:50:17 +02:00
Mikulas Patocka	ae0b7448e9	dm snapshot: fix on disk chunk size validation Fix some problems seen in the chunk size processing when activating a pre-existing snapshot. For a new snapshot, the chunk size can either be supplied by the creator or a default value can be used. For an existing snapshot, the chunk size in the snapshot header on disk should always be used. If someone attempts to load an existing snapshot and has the 'default chunk size' option set, the kernel uses its default value even when it is incorrect for the snapshot being loaded. This patch ensures the correct on-disk value is always used. Secondly, when the code does use the chunk size stored on the disk it is prudent to revalidate it, so the code can exit cleanly if it got corrupted as happened in https://bugzilla.redhat.com/show_bug.cgi?id=461506 . Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-09-04 20:40:43 +01:00
Mikulas Patocka	2defcc3fb4	dm exception store: split set_chunk_size Break the function set_chunk_size to two functions in preparation for the fix in the following patch. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-09-04 20:40:41 +01:00
Mikulas Patocka	61578dcd3f	dm snapshot: fix header corruption race on invalidation If a persistent snapshot fills up, a race can corrupt the on-disk header which causes a crash on any future attempt to activate the snapshot (typically while booting). This patch fixes the race. When the snapshot overflows, __invalidate_snapshot is called, which calls snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that calls write_header. write_header constructs the new header in the "area" location. Concurrently, an existing kcopyd job may finish, call copy_callback and commit_exception method, that goes to persistent_commit_exception. persistent_commit_exception doesn't do locking, relying on the fact that callbacks are single-threaded, but it can race with snapshot invalidation and overwrite the header that is just being written while the snapshot is being invalidated. The result of this race is a corrupted header being written that can lead to a crash on further reactivation (if chunk_size is zero in the corrupted header). The fix is to use separate memory areas for each. See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506 Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-09-04 20:40:39 +01:00
Mikulas Patocka	02d2fd31de	dm snapshot: refactor zero_disk_area to use chunk_io Refactor chunk_io to prepare for the fix in the following patch. Pass an area pointer to chunk_io and simplify zero_disk_area to use chunk_io. No functional change. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-09-04 20:40:37 +01:00
Jonathan Brassow	7ec23d5094	dm log: userspace add luid to distinguish between concurrent log instances Device-mapper userspace logs (like the clustered log) are identified by a universally unique identifier (UUID). This identifier is used to associate requests from the kernel to a specific log in userspace. The UUID must be unique everywhere, since multiple machines may use this identifier when communicating about a particular log, as is the case for cluster logs. Sometimes, device-mapper/LVM may re-use a UUID. This is the case during pvmoves, when moving from one segment of an LV to another, or when resizing a mirror, etc. In these cases, a new log is created with the same UUID and loaded in the "inactive" slot. When a device-mapper "resume" is issued, the "live" table is deactivated and the new "inactive" table becomes "live". (The "inactive" table can also be removed via a device-mapper 'clear' command.) The above two issues were colliding. More than one log was being created with the same UUID, and there was no way to distinguish between them. So, sometimes the wrong log would be swapped out during the exchange. The solution is to create a locally unique identifier, 'luid', to go along with the UUID. This new identifier is used to determine exactly which log is being referenced by the kernel when the log exchange is made. The identifier is not universally safe, but it does not need to be, since create/destroy/suspend/resume operations are bound to a specific machine; and these are the operations that make up the exchange. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-09-04 20:40:34 +01:00

... 8 9 10 11 12 ...

160490 Commits