2016-09-24  nvme/scsi: Remove power management support  (Andy Lutomirski)

As far as I can tell, there is basically nothing correct about this code. It misinterprets npss (off-by-one). It hardcodes a bunch of power states, which is nonsense, because they're all just indices into a table that software needs to parse. It completely ignores the distinction between operational and non-operational states. And, until 4.8, if all of the above magically succeeded, it would dereference a NULL pointer and OOPS.

Since this code appears to be useless, just delete it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Tested-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-23  nvmet: Make dsm number of ranges zero based  (Alexander Solganik)

The DSM "number of ranges" field is zero's based; not accounting for that caused the nvmet request data length to be incorrect.

Signed-off-by: Alexander Solganik <sashas@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-09-23  nvmet: Use direct IO for writes  (Sagi Grimberg)

The target is designed to work with high-end devices, where direct IO makes perfect sense. We noticed that without REQ_SYNC on writes we context switch by scheduling kblockd instead of going directly to the device.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
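As a rough illustration (not the verbatim nvmet change), marking a write bio REQ_SYNC is what makes the block layer dispatch it directly instead of punting to the kblockd workqueue; bdev and nr_pages below are placeholders:

    struct bio *bio = bio_alloc(GFP_KERNEL, nr_pages);

    bio->bi_bdev = bdev;    /* placeholder: the namespace's backing device */
    /* REQ_SYNC on the write avoids the kblockd context switch */
    bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
    submit_bio(bio);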
2016-09-23  admin-cmd: Added smart-log command support.  (Chaitanya Kulkarni)

This patch implements support for the smart-log command (NVM Express 1.2.1, section 5.10.1.2, SMART / Health Information (Log Identifier 02h)) on the NVMe over Fabrics target. In the current implementation the host can retrieve the following statistics:

1. Data Units Read.
2. Data Units Written.
3. Host Read Commands.
4. Host Write Commands.

Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23  nvme-fabrics: Add host_traddr options field to host infrastructure  (James Smart)

Add the host_traddr field to allow specification of the host-port connection info for the transport. It will be used by the FC transport.

Signed-off-by: James Smart <james.smart@broadcom.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23  nvme-fabrics: revise host transport option descriptions  (James Smart)

Revise some of the comments so they are not so Ethernet-network centric.

Signed-off-by: James Smart <james.smart@broadcom.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23  nvme-fabrics: rework nvmf_get_address() for variable options  (James Smart)

Revise the nvmf_get_address() string to account for not all options being present.

Signed-off-by: James Smart <james.smart@broadcom.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23  nbd: use BLK_MQ_F_BLOCKING  (Josef Bacik)

We take a mutex when sending commands and send data over the network, so we need to have queue_rq called asynchronously.

Fixes: fd8383fd88a2 ("nbd: convert to blkmq")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-23  blkcg: Annotate blkg_hint correctly  (Bart Van Assche)

Avoid that sparse complains about blkg_hint manipulations.

Fixes: a637120e4902 ("blkcg: use radix tree to index blkgs from blkcg")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-23  cfq: fix starvation of asynchronous writes  (Glauber Costa)

While debugging timeouts happening in my application workload (ScyllaDB), I have observed calls to open() taking a long time, ranging everywhere from 2 seconds - the first ones that are enough to time out my application - to more than 30 seconds.

The problem seems to happen because XFS may block on pending metadata updates under certain circumstances, and that's confirmed with the following backtrace taken by the offcputime tool (iovisor/bcc):

    ffffffffb90c57b1 finish_task_switch
    ffffffffb97dffb5 schedule
    ffffffffb97e310c schedule_timeout
    ffffffffb97e1f12 __down
    ffffffffb90ea821 down
    ffffffffc046a9dc xfs_buf_lock
    ffffffffc046abfb _xfs_buf_find
    ffffffffc046ae4a xfs_buf_get_map
    ffffffffc046babd xfs_buf_read_map
    ffffffffc0499931 xfs_trans_read_buf_map
    ffffffffc044a561 xfs_da_read_buf
    ffffffffc0451390 xfs_dir3_leaf_read.constprop.16
    ffffffffc0452b90 xfs_dir2_leaf_lookup_int
    ffffffffc0452e0f xfs_dir2_leaf_lookup
    ffffffffc044d9d3 xfs_dir_lookup
    ffffffffc047d1d9 xfs_lookup
    ffffffffc0479e53 xfs_vn_lookup
    ffffffffb925347a path_openat
    ffffffffb9254a71 do_filp_open
    ffffffffb9242a94 do_sys_open
    ffffffffb9242b9e sys_open
    ffffffffb97e42b2 entry_SYSCALL_64_fastpath
    00007fb0698162ed [unknown]

Inspecting my run with blktrace, I can see that the xfsaild kthread exhibits very high "Dispatch wait" times, in the dozens-of-seconds range and consistent with the open() times I saw in that run. Still from the blktrace output, we can, after searching a bit, identify the request that wasn't dispatched:

    8,0   11      152    81.092472813   804  A  WM 141698288 + 8 <- (8,1) 141696240
    8,0   11      153    81.092472889   804  Q  WM 141698288 + 8 [xfsaild/sda1]
    8,0   11      154    81.092473207   804  G  WM 141698288 + 8 [xfsaild/sda1]
    8,0   11      206    81.092496118   804  I  WM 141698288 + 8 (22911) [xfsaild/sda1]
        <==== 'I' means Inserted (into the IO scheduler) ====>
    8,0    0   289372    96.718761435     0  D  WM 141698288 + 8 (15626265317) [swapper/0]
        <==== Only 15s later the CFQ scheduler dispatches the request ====>

As we can see above, in this particular example CFQ took 15 seconds to dispatch this request. Going back to the full trace, we can see that the xfsaild queue had plenty of opportunity to run, and it was selected as the active queue many times. It would just always be preempted by something else (example):

    8,0    1        0    81.117912979     0  m   N cfq1618SN / insert_request
    8,0    1        0    81.117913419     0  m   N cfq1618SN / add_to_rr
    8,0    1        0    81.117914044     0  m   N cfq1618SN / preempt
    8,0    1        0    81.117914398     0  m   N cfq767A / slice expired t=1
    8,0    1        0    81.117914755     0  m   N cfq767A / resid=40
    8,0    1        0    81.117915340     0  m   N / served: vt=1948520448 min_vt=1948520448
    8,0    1        0    81.117915858     0  m   N cfq767A / sl_used=1 disp=0 charge=0 iops=1 sect=0

where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB IO dispatchers.

The requests preempting the xfsaild queue are synchronous requests. That's a characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests. While it can be argued that preempting ASYNC requests in favor of SYNC is part of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's goal. Moreover, unless I am misunderstanding something, that breaks the expectation set by the "fifo_expire_async" tunable, which in my system is set to the default.

Looking at the code, it seems to me that the issue is that after we make an async queue active, there is no guarantee that it will execute any request. When the queue itself tests cfq_may_dispatch() it can bail if it sees SYNC requests in flight. An incoming request from another queue can also preempt it in such a situation before we have the chance to execute anything (as seen in the trace above).

This patch sets the must_dispatch flag if we notice that we have requests that are already fifo_expired. This flag is always cleared after cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin the queue for subsequent requests (unless they are themselves expired). Care is taken during preempt to still allow rt requests to preempt us regardless.

Testing my workload with this patch applied produces much better results. From the application side I see no timeouts, and the open() latency histogram generated by systemtap looks much better, with the worst outlier at 131ms:

Latency histogram of xfs_buf_lock acquisition (microseconds):

     value |-------------------------------------------------- count
         0 |                                                       11
         1 |@@@@                                                  161
         2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1966
         4 |@                                                      54
         8 |                                                       36
        16 |                                                        7
        32 |                                                        0
        64 |                                                        0
         ~
      1024 |                                                        0
      2048 |                                                        0
      4096 |                                                        1
      8192 |                                                        1
     16384 |                                                        2
     32768 |                                                        0
     65536 |                                                        0
    131072 |                                                        1
    262144 |                                                        0
    524288 |                                                        0

Signed-off-by: Glauber Costa <glauber@scylladb.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: linux-block@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
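A hedged sketch of the must_dispatch idea, using names from cfq-iosched.c of that era (not the verbatim patch): if the oldest request on the queue has already blown its fifo deadline, pin the queue so a preemption cannot expire it before it dispatches anything:

    if (!list_empty(&cfqq->fifo)) {
        struct request *rq = rq_entry_fifo(cfqq->fifo.next);

        /* fifo deadline already passed for the oldest request? */
        if (ktime_get_ns() >= rq->fifo_time)
            cfq_mark_cfqq_must_dispatch(cfqq);
    }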
2016-09-22  blk-mq: add flag for drivers wanting blocking ->queue_rq()  (Jens Axboe)

If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its ->queue_rq() handler. For that case, blk-mq ensures that the handler is always called from a safe context.

Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Josef Bacik <jbacik@fb.com>
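A minimal sketch of how a driver might opt in (the mydrv_* names are hypothetical):

    struct blk_mq_tag_set set = {
        .ops          = &mydrv_mq_ops,
        .nr_hw_queues = 1,
        .queue_depth  = 128,
        .numa_node    = NUMA_NO_NODE,
        /* ->queue_rq() may sleep, so ask blk-mq for a safe calling context */
        .flags        = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING,
    };
    int ret = blk_mq_alloc_tag_set(&set);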
2016-09-22  blk-mq: remove non-blocking pass in blk_mq_map_request  (Christoph Hellwig)

bt_get already does a non-blocking pass as well as running the queue when scheduling internally; there is no need to duplicate it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-22  blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()  (Jens Axboe)

Two cases:

1) blk_mq_alloc_request() needlessly re-runs the queue, after calling into the tag allocation without NOWAIT set. We don't need to do that.

2) blk_mq_map_request() should just use blk_mq_run_hw_queue() with the async flag set to false.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-09-22  block: export bio_free_pages to other modules  (Guoqing Jiang)

bio_free_pages was introduced in commit 1dfa0f68c040 ("block: add a helper to free bio bounce buffer pages"). Export it so that the function can be reused in other modules.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21  lightnvm: propagate device_add() error code  (Arnd Bergmann)

device_add() may fail, and all callers are supposed to check the return value, but one new user in lightnvm doesn't:

    drivers/lightnvm/sysfs.c: In function 'nvm_sysfs_register_dev':
    drivers/lightnvm/sysfs.c:184:2: error: ignoring return value of 'device_add',
        declared with attribute warn_unused_result [-Werror=unused-result]

This changes the caller to propagate any error codes, which avoids the warning.

Fixes: 38c9e260b9f9 ("lightnvm: expose device geometry through sysfs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
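The shape of the fix, sketched (the function name comes from the warning above; surrounding initialization is elided):

    ret = device_add(&dev->dev);
    if (ret)
        return ret;   /* propagate instead of silently ignoring */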
2016-09-21  lightnvm: expose device geometry through sysfs  (Simon A. F. Lund)

For a host to access an Open-Channel SSD, it has to know its geometry, so that it writes and reads at the appropriate device bounds. Currently, the geometry information is kept within the kernel, and not exported to user-space for consumption. This patch exposes the configuration through sysfs and enables user-space libraries, such as liblightnvm, to use the sysfs implementation to get the geometry of an Open-Channel SSD.

The sysfs entries are stored within the device hierarchy, and can be found using the "lightnvm" device type. An example configuration looks like this:

    /sys/class/nvme/
    └── nvme0n1
       ├── capabilities: 3
       ├── device_mode: 1
       ├── erase_max: 1000000
       ├── erase_typ: 1000000
       ├── flash_media_type: 0
       ├── media_capabilities: 0x00000001
       ├── media_type: 0
       ├── multiplane: 0x00010101
       ├── num_blocks: 1022
       ├── num_channels: 1
       ├── num_luns: 4
       ├── num_pages: 64
       ├── num_planes: 1
       ├── page_size: 4096
       ├── prog_max: 100000
       ├── prog_typ: 100000
       ├── read_max: 10000
       ├── read_typ: 10000
       ├── sector_oob_size: 0
       ├── sector_size: 4096
       ├── media_manager: gennvm
       ├── ppa_format: 0x380830082808001010102008
       ├── vendor_opcode: 0
       ├── max_phys_secs: 64
       └── version: 1

Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21  lightnvm: control life of nvm_dev in driver  (Matias Bjørling)

LightNVM compatible device drivers do not have a method to expose LightNVM specific sysfs entries. To enable LightNVM sysfs entries to be exposed, lightnvm device drivers require a struct device to attach them to. To allow both the actual device driver and lightnvm sysfs entries to coexist, the device driver tracks the lifetime of the nvm_dev structure.

This patch refactors NVMe and null_blk to handle the lifetime of struct nvm_dev, which eliminates the need for struct gendisk when a lightnvm compatible device is provided.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21  blk-mq: register device instead of disk  (Matias Bjørling)

Enable devices without a gendisk instance to register themselves with blk-mq and expose the associated multi-queue sysfs entries.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21  null_blk: refactor to support non-gendisk devices  (Matias Bjørling)

With LightNVM enabled devices, the gendisk structure is not exposed to the user. This hides the device driver specific sysfs entries, and prevents binding of LightNVM geometry information to the device.

Refactor the device registration process, so that gendisk and non-gendisk devices are easily managed.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21  nvme: refactor namespaces to support non-gendisk devices  (Matias Bjørling)

With LightNVM enabled namespaces, the gendisk structure is not exposed to the user. This prevents LightNVM users from accessing the NVMe device driver specific sysfs entries, and LightNVM namespace geometry.

Refactor the revalidation process, so that a namespace, instead of a gendisk, is revalidated. This later allows patches to wire the sysfs entries up to a non-gendisk namespace.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21  lightnvm: NVM should depend on HAS_DMA  (Geert Uytterhoeven)

If NO_DMA=y:

    drivers/built-in.o: In function `nvme_nvm_dev_dma_free':
    lightnvm.c:(.text+0x23df1a): undefined reference to `dma_pool_free'
    drivers/built-in.o: In function `nvme_nvm_dev_dma_alloc':
    lightnvm.c:(.text+0x23df38): undefined reference to `dma_pool_alloc'
    drivers/built-in.o: In function `nvme_nvm_destroy_dma_pool':
    lightnvm.c:(.text+0x23df4c): undefined reference to `dma_pool_destroy'
    drivers/built-in.o: In function `nvme_nvm_create_dma_pool':
    lightnvm.c:(.text+0x23df7e): undefined reference to `dma_pool_create'

and

    ERROR: "dma_pool_destroy" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_free" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_alloc" [drivers/nvme/host/nvme-core.ko] undefined!
    ERROR: "dma_pool_create" [drivers/nvme/host/nvme-core.ko] undefined!

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-19  sbitmap: initialize weight to zero  (Colin Ian King)

Variable weight is not being initialized to zero before it is used to compute the weight sum. Ensure it is initialized to zero.

Found with static analysis with cppcheck:

    [lib/sbitmap.c:177]: (error) Uninitialized variable: weight

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17  sbitmap: don't update the allocation hint on clear after resize  (Omar Sandoval)

If we have a bunch of high-numbered bits allocated and then we resize the struct sbitmap_queue, when those bits get cleared, we'll update the hint and then have to re-randomize it repeatedly. Avoid that by checking that the cleared bit is still a valid hint. No measurable performance difference in the common case.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17  sbitmap: re-initialize allocation hints after resize  (Omar Sandoval)

After a struct sbitmap_queue is resized smaller, the allocation hints may still be set to bits beyond the new depth of the bitmap. This means that, for example, if the number of blk-mq tags is reduced through sysfs, more requests than the nominal queue depth may be in flight.

It's tempting to fix this at resize time by doing a one-time reinitialization of the hints, but this can race with __sbitmap_queue_get() updating the hint. Instead, check the hint before we use it. This caused no measurable performance difference in my synthetic benchmarks.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
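A hedged sketch of the "check the hint before we use it" approach, assuming the sbitmap_queue fields introduced by this series:

    unsigned int hint = this_cpu_read(*sbq->alloc_hint);

    if (unlikely(hint >= sbq->sb.depth)) {
        /* stale hint left over from before the resize: reset it */
        hint = 0;
        this_cpu_write(*sbq->alloc_hint, 0);
    }
    int nr = sbitmap_get(&sbq->sb, hint, sbq->round_robin);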
2016-09-17  sbitmap: randomize initial alloc_hint values  (Omar Sandoval)

In order to get good cache behavior from a sbitmap, we want each CPU to stick to its own cacheline(s) as much as possible. This might happen naturally as the bitmap gets filled up and the alloc_hint values spread out, but we really want this behavior from the start. blk-mq apparently intended to do this, but the code to do this was never wired up. Get rid of the dead code and make it part of the sbitmap library.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
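A sketch of the per-CPU seeding this describes, assuming the alloc_hint percpu variable from the sbitmap_queue patches that follow:

    int i;

    for_each_possible_cpu(i)
        *per_cpu_ptr(sbq->alloc_hint, i) = prandom_u32() % depth;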
2016-09-17  sbitmap: push alloc policy into sbitmap_queue  (Omar Sandoval)

Again, there's no point in passing this in every time. Make it part of struct sbitmap_queue and clean up the API.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17  sbitmap: push per-cpu last_tag into sbitmap_queue  (Omar Sandoval)

Allocating your own per-cpu allocation hint separately makes for an awkward API. Instead, allocate the per-cpu hint as part of the struct sbitmap_queue. There's no point in a struct sbitmap_queue without the cache, but you can still use a bare struct sbitmap.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17  sbitmap: allocate wait queues on a specific node  (Omar Sandoval)

The original bt_alloc() we converted from was using kzalloc(), not kzalloc_node(), to allocate the wait queues. This was probably an oversight, so fix it for sbitmap_queue_init_node().

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17  blk-mq: abstract tag allocation out into sbitmap library  (Omar Sandoval)

This is a generally useful data structure, so make it available to anyone else who might want to use it. It's also a nice cleanup separating the allocation logic from the rest of the tag handling logic.

The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only selected by CONFIG_BLOCK for now. This should be a complete noop functionality-wise.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-16  blk-mq: account higher order dispatch  (Jens Axboe)

We currently account a '0' dispatch, and anything above that still falls below the range set by BLK_MQ_MAX_DISPATCH_ORDER. If we dispatch more, we don't account it.

Change the last bucket to be inclusive of anything above the range we track, and have the sysfs file reflect that by including a '+' in the output:

    $ cat /sys/block/nvme0n1/mq/0/dispatched
        0	1006
        1	20229
        2	1
        4	0
        8	0
       16	0
      32+	0

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
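A sketch of how the inclusive top bucket might be computed (helper name illustrative):

    static inline unsigned int queued_to_index(unsigned int queued)
    {
        if (!queued)
            return 0;

        /* anything above the tracked range lands in the last, "32+" bucket */
        return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
    }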
2016-09-14  blk-mq: introduce blk_mq_delay_kick_requeue_list()  (Mike Snitzer)

blk_mq_delay_kick_requeue_list() provides the ability to kick the q->requeue_list after a specified time. To do this the request_queue's 'requeue_work' member was changed to a delayed_work.

blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued requests while it doesn't make sense to immediately requeue them (e.g. when all paths in a DM multipath have failed).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
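A hedged usage sketch for a driver (the delay is in milliseconds; the 100 is illustrative):

    /* requests were requeued; process the requeue list in 100ms,
     * e.g. once a failed multipath path has had a chance to recover */
    blk_mq_delay_kick_requeue_list(q, 100);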
2016-09-14  block: remove IOPRIO_BITS  (Christoph Hellwig)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  bio.h: remove a very outdated comment  (Christoph Hellwig)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  block: remove bio_destructor_t  (Christoph Hellwig)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  block: Improve bio_set_op_attrs() robustness  (Bart Van Assche)

Since REQ_OP_BITS == 3 and __REQ_NR_BITS == 30 it is not that hard to pass an op_flags argument to bio_set_op_attrs() that is larger than the number of bits reserved for the op_flags argument. Complain if this happens. Additionally, ensure that negative arguments trigger a complaint (1 << ... is signed while 1U << ... is unsigned; adding 0U to an integer expression causes it to be promoted to an unsigned type).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
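A miniature, userspace illustration of the promotion trick described above (names and the bit count are illustrative, not the kernel macros):

    #include <stdio.h>

    #define FLAG_BITS 27U   /* stand-in for the op_flags width */

    int main(void)
    {
        int flags = -1;     /* bogus negative op_flags value */

        /* signed comparison: -1 < (1 << 27) is true, so the bogus
         * value would slip through a signed range check */
        printf("signed:   %d\n", flags < (1 << FLAG_BITS));
        /* "+ 0U" promotes to unsigned: -1 becomes UINT_MAX,
         * the check fails, and the bad value is caught */
        printf("unsigned: %d\n", flags + 0U < (1U << FLAG_BITS));
        return 0;
    }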
2016-09-14  block, dm-crypt, btrfs: Introduce bio_flags()  (Bart Van Assche)

Introduce the bio_flags() macro. Ensure that the second argument of bio_set_op_attrs() only contains flags and no operation. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Chris Mason <clm@fb.com> (maintainer:BTRFS FILE SYSTEM)
Cc: Josef Bacik <jbacik@fb.com> (maintainer:BTRFS FILE SYSTEM)
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  block: Document that bio_op() uses the data type of bio.bi_opf  (Bart Van Assche)

Make it clear that the sizeof(unsigned int) expression in BIO_OP_SHIFT refers to the bi_opf member of struct bio.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  block: remove remnant refs to hardsect  (Linus Walleij)

commit e1defc4ff0cf ("block: Do away with the notion of hardsect_size") removed the notion of "hardware sector size" from the kernel in favor of logical block size, but references remain in comments and documentation. Update the remaining sites mentioning hardsect.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  block: remove blk_mq_alloc_single_hw_queue() prototype  (Linus Walleij)

The blk_mq_alloc_single_hw_queue() prototype is an artifact that should have been removed with commit cdef54dd85ad ("blk-mq: remove alloc_hctx and free_hctx methods"), where the last users of it were deleted.

Fixes: cdef54dd85ad ("blk-mq: remove alloc_hctx and free_hctx methods")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  block_dev: remove DAX leftovers  (Christoph Hellwig)

DAX support for block devices was removed in commits 03cdad ("block: disable block device DAX by default") and 99a01cd ("block: remove BLK_DEV_DAX config option"), but we still kept a call to dax_do_io and some unneeded i_flags manipulations introduced in commit bbab37 ("block: Add support for DAX reads/writes to block devices"). Remove those leftovers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  block: enable zeroing of io_poll statistics  (Stephen Bates)

Allow the io_poll statistics to be zeroed to make for easier logging of polling events.

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14  block: add poll_considered statistic  (Stephen Bates)

In order to help determine the effectiveness of polling in a running system, it is useful to determine the ratio of how often the poll function is called vs how often the completion is checked. For this reason we add a poll_considered variable and add it to the sysfs entry for io_poll.

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-08  nbd: allow block mq to deal with timeouts  (Josef Bacik)

Instead of rolling our own timer, just utilize the blk-mq request timeout and do the disconnect if any of our commands time out.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
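A hedged sketch of wiring a timeout handler through blk_mq_ops (the mydrv_* names are hypothetical; the handler shape follows the 4.8-era API):

    static enum blk_eh_timer_return mydrv_timeout(struct request *req, bool reserved)
    {
        /* connection is wedged: tear it down and let blk-mq fail the request */
        mydrv_disconnect(req->q->queuedata);
        return BLK_EH_HANDLED;
    }

    static struct blk_mq_ops mydrv_mq_ops = {
        .queue_rq = mydrv_queue_rq,
        .timeout  = mydrv_timeout,
    };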
2016-09-08  nbd: use flags instead of bool  (Josef Bacik)

In preparation for some future changes, change a few of the state bools over to normal bits to set/clear properly.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-08  nbd: don't shutdown sock with irq's disabled  (Josef Bacik)

We hit a warning when shutting down the nbd connection because we have IRQs disabled. We don't really need to do the shutdown under the lock, just clear nbd->sock there. So do the shutdown outside of the IRQ-disabled region. This gets rid of the warning.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-08  nbd: convert to blkmq  (Josef Bacik)

This moves NBD over to using blk-mq, which allows us to get rid of the NBD-wide queue lock and the async submit kthread. We will start with 1 hw queue for now, but I plan to add multiple TCP connection support in the future, and we'll fix how we set up the hw queues then.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-29  mtip32xx: mark symbols static where possible  (Baoyou Xie)

We get 1 warning when building the kernel with W=1:

    drivers/block/mtip32xx/mtip32xx.c:3689:6: warning: no previous prototype for 'mtip_block_release' [-Wmissing-prototypes]

In fact, this function is only used in the file in which it is declared, so it doesn't need a declaration and can be made static. This patch marks it 'static'.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-29  blk-mq: prefetch request in blk_mq_tag_to_rq()  (Jens Axboe)

When drivers or the core calls this function, they usually dereference the request shortly thereafter. Prefetch the first cache line. Profiling IO workloads shows that this is the most common cache miss on the block side of things.

Signed-off-by: Jens Axboe <axboe@fb.com>
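A sketch of what the prefetch looks like in blk_mq_tag_to_rq(), modeled on the 4.8-era function (treat as illustrative):

    #include <linux/prefetch.h>

    struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
    {
        if (tag < tags->nr_tags) {
            prefetch(tags->rqs[tag]);   /* warm the first cache line early */
            return tags->rqs[tag];
        }

        return NULL;
    }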
2016-08-29  blk-mq: improve layout of blk_mq_hw_ctx  (Jens Axboe)

Various cache line optimizations:

- Move delay_work towards the end. It's huge, and we don't use it a lot (only SCSI).

- Move the atomic state into the same cacheline as the dispatch list and lock.

- Rearrange a few members to pack it better.

- Shrink the max-order for dispatch accounting from 10 to 7. This means that ->dispatched[] and ->run now take up their own cacheline.

This shrinks struct blk_mq_hw_ctx down to 8 cachelines.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-29  blk-mq: turn hctx->run_work into a regular work struct  (Jens Axboe)

We don't need the larger delayed work struct, since we always run it immediately.

Signed-off-by: Jens Axboe <axboe@fb.com>