Commit Graph

264531 Commits

Author SHA1 Message Date
Keith Busch
c7d36ab8fa NVMe: Fix uninitialized iod compiler warning
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-07-27 14:25:22 -04:00
Keith Busch
a0cadb85b8 NVMe: Do not set IO queue depth beyond device max
Set the depth for IO queues to the device's maximum supported queue
entries if the requested depth exceeds the device's capabilities.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-07-27 13:57:23 -04:00
Keith Busch
8fc23e032d NVMe: Set block queue max sectors
Set the max hw sectors in a namespace's request queue if the nvme device
has a max data transfer size.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-07-26 13:43:20 -04:00
Keith Busch
a42ceccef0 NVMe: use namespace id for nvme_get_features
The specification does not provide a use for command dword11 in the NVMe
Get Features command, but does use the NSID for some features.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-07-26 12:46:36 -04:00
Keith Busch
50af8baec4 NVMe: replace nvme_ns with nvme_dev for user admin
The function nvme_user_admin_command does not require a namespace to
proceed.  Replace with the nvme_dev structure so that it can be called
from contexts that do not have a namespace.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-07-26 12:24:36 -04:00
Keith Busch
5c42ea1643 NVMe: Fix nvme module init when nvme_major is set
register_blkdev returns 0 when given a valid major number.

Reported-by:Ross Zwisler <ross.zwisler@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-07-26 12:23:38 -04:00
Keith Busch
e9ef46369f NVMe: Set request queue logical block size
Sets the request queue logical block size with the block size of the
namespace.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-07-25 14:52:49 -04:00
Matthew Wilcox
df34813990 NVMe: Set number of queues correctly
The number of submission & completion queues should be set by calling
Set Features, not Get Features.

Reported-by: Kwok Kong <Kwok.Kong@idt.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-11 09:22:24 -05:00
Matthew Wilcox
366e8217e5 NVMe: Version 0.8
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 16:30:15 -05:00
Matthew Wilcox
4eeb9215a0 NVMe: Set queue flags correctly
QUEUE_FLAG_* are flags (other than QUEUE_FLAG_DEFAULT), so they cannot
be ORed together.  Set the queue flags using queue_flag_set_unlocked().

Reported-by: Donald Wood <donald.e.wood@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 16:29:23 -05:00
Matthew Wilcox
1c2ad9faaf NVMe: Simplify nvme_unmap_user_pages
By using the iod->nents field (the same way other I/O paths do), we can
avoid recalculating the number of sg entries at unmap time, and make
nvme_unmap_user_pages() easier to call.

Also, use the 'write' parameter instead of assuming DMA_FROM_DEVICE.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 14:54:22 -05:00
Matthew Wilcox
fe304c43c6 NVMe: Mark the end of the sg list
For user I/O and admin commands, we were forgetting to mark the end of
the SG list.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 14:54:14 -05:00
Matthew Wilcox
497421880a NVMe: Fix DMA mapping for admin commands
We were always mapping as DMA_FROM_DEVICE then unmapping with
DMA_TO_DEVICE which was clearly not correct.  Follow the same pattern as
nvme_submit_io() and key off the bottom bit of the opcode to determine
whether this is a read or a write.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 14:54:05 -05:00
Matthew Wilcox
ff976d724a NVMe: Rename IO_TIMEOUT to NVME_IO_TIMEOUT
IO_TIMEOUT is a little too generic and might be used by other parts of
the kernel in the future.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 14:53:54 -05:00
Matthew Wilcox
eca18b2394 NVMe: Merge the nvme_bio and nvme_prp data structures
The new merged data structure is called nvme_iod.  This improves performance
for mid-sized I/Os (in the 16k range) since we save a memory allocation.
It is also a slightly simpler interface to use.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 14:51:20 -05:00
Matthew Wilcox
5c1281a3bf NVMe: Change nvme_completion_fn to take a dev
The queue is only needed for some rare occasions, and it's more consistent
to pass the device around.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 14:51:00 -05:00
Matthew Wilcox
040a93b52a NVMe: Change get_nvmeq to take a dev instead of a namespace
Upcoming patches require calling get_nvmeq when we don't have a namespace.
Some callers already have the device in a local variable anyway.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 14:49:18 -05:00
Matthew Wilcox
c2f5b65020 NVMe: Simplify completion handling
Instead of encoding the handler type in the bottom two bits of the
per-completion context pointer, store the handler function as well
as the context pointer.  This gives us more flexibility and the code
is clearer.  It comes at the cost of an extra 8k of memory per queue,
but this feels like a reasonable price to pay.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2012-01-10 14:47:46 -05:00
Matthew Wilcox
010e646ba2 NVMe: Update Identify Controller data structure
The driver was still using an old definition of Identify Controller
which only came to light once we started using the 'number of namespaces'
field properly.

Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Reported-by: Khosrow Panah <Khosrow.Panah@idt.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 16:24:23 -04:00
Matthew Wilcox
f1938f6e1e NVMe: Implement doorbell stride capability
The doorbell stride allows devices to spread out their doorbells instead
of packing them tightly.  This feature was added as part of ECN 003.

This patch also enables support for more than 512 queues :-)

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:05 -04:00
Matthew Wilcox
ce38c14957 NVMe: Version 0.7
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:05 -04:00
Matthew Wilcox
2b2c189687 NVMe: Don't probe namespace 0
ECN 001 documented that namespace 0 is not valid.  Sending an Identify
with CNS of 0 and Namespace of 0 is an undefined command.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Nisheeth Bhat
0d1bc91258 Fix calculation of number of pages in a PRP List
The existing calculation underestimated the number of pages required
as it did not take into account the pointer at the end of each page.
The replacement calculation may overestimate the number of pages required
if the last page in the PRP List is entirely full.  By using ->npages
as a counter as we fill in the pages, we ensure that we don't try to
free a page that was never allocated.

Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Matthew Wilcox
bc5fc7e4b2 NVMe: Create nvme_identify and nvme_get_features functions
Instead of open-coding calls to nvme_submit_admin_cmd, these
small wrappers are simpler to use (the patch removes 14 lines from
nvme_dev_add() for example).

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Matthew Wilcox
684f5c2025 NVMe: Fix memory leak in nvme_dev_add()
The driver was allocating 8k of memory, then freeing 4k of it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Nisheeth Bhat
d1a490e026 NVMe: Fix calls to dma_unmap_sg
dma_unmap_sg() must be called with the same 'nents' passed to
dma_map_sg(), not the number returned from dma_map_sg().

Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Matthew Wilcox
d0ba1e497b NVMe: Correct sg list setup in nvme_map_user_pages
Our SG list was constructed to always fill the entire first page, even
if that was more than the length of the I/O.  This is probably harmless,
but some IOMMUs might do something bad.

Correcting the first call to sg_set_page() made it look a lot closer to
the sg_set_page() in the loop, so fold the first call to sg_set_page()
into the loop.

Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2011-11-04 15:53:04 -04:00
Matthew Wilcox
6413214c5d Fix bug in NVME_IOCTL_SUBMIT_IO
Missing 'break' in the switch statement meant that we'd fall through
to the 'return -EINVAL' case.
2011-11-04 15:53:04 -04:00
Matthew Wilcox
6bbf1acdde NVMe: Rework ioctls
Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE
and ACTIVATE_FIRMWARE commands.  Replace them with a generic ADMIN_CMD
ioctl that can submit any admin command.

Add a new ID ioctl that returns the namespace ID of the queried device.
It corresponds to the SCSI Idlun ioctl.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox
eac623ba7a NVMe: Add the nvme thread to the wait queue before waking it up
If the I/O was not completed by a single NVMe command, we add the
bio to the congestion list and wake up the kthread to resubmit it.
But the kthread calls remove_wait_queue() unconditionally, which
will oops if it's not on the wait queue.  So add the kthread to
the wait queue before waking it up.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox
6f0f54499f NVMe: Return real error from nvme_create_queue
nvme_setup_io_queues() was assuming that a NULL return from
nvme_create_queue() was an out-of-memory error.  That's not necessarily
true; the adapter might return -EIO, for example.  Change the calling
convention to return an ERR_PTR on failure instead of NULL.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox
be5e094840 NVMe: Version 0.6
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox
184d2944cb NVMe: Add a few calling convention notes
For the benefit of reviewers, add comments to a few functions describing
their calling context

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox
b77954cbdd NVMe: Handle failures from memory allocations in nvme_setup_prps
If any of the memory allocations in nvme_setup_prps fail, handle it by
modifying the passed-in data length to reflect the number of bytes we are
actually able to send.  Also allow the caller to specify the GFP flags
they need; for user-initiated commands, we can use GFP_KERNEL allocations.

The various callers are updated to handle this possibility; the main
I/O path is already prepared for this possibility (as it may happen
due to nvme_map_bio being unable to map all the segments of the I/O).
The other callers return -ENOMEM instead of doing partial I/Os.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox
5aff9382dd NVMe: Use an IDA to allocate minor numbers
The current approach of using the namespace ID as the minor number
doesn't work when there are multiple adapters in the machine.  Rather
than statically partitioning the number of namespaces between adapters,
dynamically allocate minor numbers to namespaces as they are detected.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox
fd63e9ceee NVMe: Add include of delay.h for msleep
Previously it was being implicitly included through some other header file

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox
8de055350f NVMe: Add support for timing out I/Os
In the kthread, walk the list of outstanding I/Os and check they've not
hit the timeout.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox
21075bdee0 NVMe: Rename cancel_cmdid_data to cancel_cmdid
The trailing '_data' on the end was annoying and inconsistent.  Also, make
it actually return the data since this is needed for timing out commands.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox
09a58f5364 NVMe: Fix bug in error handling
When an I/O completed with an error, we would call bio_endio twice
(once with -EIO and once with 0).  Found by inspection.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox
22605f9681 NVMe: Time out initialisation after a few seconds
THe device reports (in its capability register) how long it will take
to initialise.  If that time elapses before the ready bit becomes set,
conclude the device is broken and refuse to initialise it.  Log a nice
error message so the user knows why we did nothing.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox
aba2080f3f NVMe: Fix warning in free_irq
We need to clear the affinity mask before calling free_irq()

Reported-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox
7f53f9d242 NVMe: Correct the Controller Configuration settings
The arbitration field was extended by one bit, shifting the shutdown
notification bits by one.  Also, the SQ/CQ entry size was made
configurable for future extensions.

Reported-by: Paul Luse <paul.e.luse@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox
8ef700678f NVMe: Version 0.5
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox
6c7d49455c NVMe: Change the definition of nvme_user_io
The read and write commands don't define a 'result', so there's no need
to copy it back to userspace.

Remove the ability of the ioctl to submit commands to a different
namespace; it's just asking for trouble, and the use case I have in mind
will be addressed througha  different ioctl in the future.  That removes
the need for both the block_shift and nsid arguments.

Check that the opcode is one of 'read' or 'write'.  Future opcodes may
be added in the future, but we will need a different structure definition
for them.

The nblocks field is redefined to be 0-based.  This allows the user to
request the full 65536 blocks.

Don't byteswap the reftag, apptag and appmask.  Martin Petersen tells
me these are calculated in big-endian and are transmitted to the device
in big-endian.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox
9d4af1b779 NVMe: Correct the definitions of two ioctls
NVME_IOCTL_SUBMIT_IO has a struct nvme_user_io, not a struct nvme_rw_command
as a parameter, and NVME_IOCTL_DOWNLOAD_FW is a Write, not a Read.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox
4948168280 NVMe: Add compat_ioctl
Make ioctls work for 32-bit applications on 64-bit kernels.  The structures
are defined to be the same for both 32- and 64-bit applications, so
we can use the same handler for both.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox
9ecdc94621 NVMe: Simplify queue lookup
Fill in all the num_possible_cpus() entries with duplicate pointers.
This reduces the complexity of the frequently-called get_nvmeq(), as
well as avoiding a bug in it when there are fewer queues than CPUs.

Reported-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox
3cb967c039 NVMe: Remove the kthread from the wait queue
Once there are no more bios on the congestion list, we can stop waking
up the nvme kthread every time a completion happens.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00
Matthew Wilcox
7523d834dd NVMe: Fix off-by-one when filling in PRP lists
If the last element in the PRP list fits on the end of the page, there's
no need to allocate an extra page to put that single element in.  It can
fit on the end of the page.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00
Matthew Wilcox
ac88c36a38 NVMe: Fix interpretation of 'Number of Namespaces' field
The spec says this is a 0s based value.  We don't need to handle the
maximal value because it's reserved to mean "every namespace".

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00