summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-05-26IB/hfi1: Move driver out of stagingDennis Dalessandro
The TODO list for the hfi1 driver was completed during 4.6. In addition other objections raised (which are far beyond what was in the TODO list) have been addressed as well. It is now time to remove the driver from staging and into the drivers/infiniband sub-tree. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Do not free hfi1 cdev parent structure earlyDennis Dalessandro
The deletion of a cdev is not a fence for holding off references to the structure. The driver attempts to delete the cdev and then proceeds to free the parent structure, the hfi1_devdata, or dd. This can potentially lead to a kernel panic in situations where a user has an FD for the cdev open, and the pci device gets removed. If the user then closes the FD there will be a NULL dereference when trying to do put on the cdev's kobject. Fix this by pointing the cdev's kobject.parent at a new kobject embedded in its parent structure. Also take a reference when the device is opened and put it back when it is closed. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Add trace message in user IOCTL handlingDennis Dalessandro
Add a trace message to HFI1s user IOCTL handling. This allows debugging of which IOCTLs are being handled by the driver. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Remove write(), use ioctl() for user cmdsDennis Dalessandro
Remove the write() handler for user space commands now that ioctl handling is available. User apps will need to change to use ioctl from this point forward. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Add ioctl() interface for user commandsDennis Dalessandro
IOCTL is more suited to what user space commands need to do than the write() interface. Add IOCTL definitions for all existing write commands and the handling for those. The write() interface will be removed in a follow on patch. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Remove unused user commandDennis Dalessandro
The HFI1_CMD_SDMA_STATUS_UPD command was never implemented it has no reason to live in the driver. Remove it. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Remove snoop/diag interfaceDennis Dalessandro
The snoop/diag interface is better served by an implementation which is more general and usable by other drivers perhaps. Go ahead and remove the code now and get rid of the char dev. We can put the feature back when we have a more agreeable solution. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Remove EPROM functionality from data deviceDennis Dalessandro
Remove EPROM handling from the cdev which is used for user application data traffic. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Remove UI char deviceDennis Dalessandro
Remove UI char device which exposes direct access to registers for user space. This was put in to aid in debugging the hardware. We are looking into alternatives means of providing the same functionality. This removes another char device from HFI1's footprint. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Remove multiple device cdevDennis Dalessandro
hfi1 current exports a cdev that can be used to target all of the hfi's in the system. However there is a problem with this approach in that the devices could be on different subnets. This is a problem that user space can figure out and explicitly tell the driver on which device to create a context. Remove the multi-purpose cdev leaving a dedicated cdev for each port. Also remove the striping capability that is dependent upon the user choosing the multi-purpose cdev. It is now up to user space to determine how to stripe contexts. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Remove anti-pattern in cdev initDennis Dalessandro
Remove the usage of an anti-pattern goto in hfi1_cdev_init to improve code readability. Suggested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Fix bug that blocks process on exit after port bounceJianxin Xiong
During the processing of a user SDMA request, if there was an error before the request counter was increased, the state of the packet queue could be updated incorrectly, causing the counter to underflow. As the result, the process could get stuck later since the counter could never get back to 0. This patch adds a condition to guard the packet queue update so that the counter is only decreased if it has been increased before the error happens. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/qib: Remove unused qib_7322_intr_msgs[]Jubin John
Building the qib driver with gcc version 6.1.0 raises the following build warning: drivers/infiniband/hw/qib/qib_iba7322.c:1311:39: warning: 'qib_7322_intr_msgs' defined but not used [-Wunused-const-variable=] static const struct qib_hwerror_msgs qib_7322_intr_msgs[] = { ^~~~~~~~~~~~~~~~~~ Remove the unused qib_7322_intr_msgs[] Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Remove unnecessary commentIra Weiny
This comment was old, the MTU enums have been defined. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Fix sdma_event_names[] build warningJubin John
sdma_event_names[] is only used within CONFIG_SDMA_VERBOSITY ifdefs, so when CONFIG_SDMA_VERBOSITY is disabled, it results in the following 0-day build warning: >> drivers/infiniband/hw/hfi1/sdma.c:137:27: warning: 'sdma_event_names' >> defined but not used [-Wunused-const-variable=] static const char * const sdma_event_names[] = { ^~~~~~~~~~~~~~~~ This occurs on the following compiler: compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430 For more information check: https://lists.01.org/pipermail/kbuild-all/2016-May/020060.html Fix this warning by defining sdma_event_name[] only within the CONFIG_SDMA_VERBOSITY ifdefs. Reported-by: kbuild test robot <fengguang.wu@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/rdmavt: Use kzalloc_nodeJubin John
Use kzalloc_node instead of kzalloc for rdmavt memory region segment allocation to optimize for performance on NUMA platforms. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/rdmavt: Insure QP vmalloc variants zero memoryMike Marciniszyn
The usage of the various vmalloc APIs do not consistently zero memory when allocating the swqe. Insure zeroing variants are used. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Fix an interval RB node reference count leakMitko Haralanov
Commit e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which may cause corruption") introduced a bug which may cause a reference count of a interval RB node to be leaked in the case where an SDMA transfer from that node completes at the same time as the node is being extended. If a node is being extended, it is first removed from the RB tree in order to be processed without the risk of an invalidation event removing the node at the same time. If a SDMA completion happens during that time, the completion handler will fail to find the node in the RB tree and, therefore, fail to correctly decrement its refcount. This leaves the node in the tree and its pages pinned for the duration of the user process. To prevent this from happening the io vector adds a reference to the RB node, which is used during the SDMA completion instead of looking up the node in the RB tree. This change adds a performance improvement as a side effect by avoiding the RB tree lookup. Fixes: e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which may cause corruption") Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26blk-mq: clear q->mq_ops if init failMing Lin
blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops was not cleared when blk_mq_init_allocated_queue() fails. Then blk_cleanup_queue() calls blk_mq_free_queue() which will crash because: - q->all_q_node is not added to all_q_list yet - q->tag_set is NULL - hctx was not setup yet or already freed Fixed it by clearing q->mq_ops on error path. Signed-off-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-26pnfs: pnfs_update_layout needs to consider if strict iomode checking is onTom Haynes
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically support enforcing that a IOMODE_RW segment will not allow READ I/O. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-26nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and ↵Tom Haynes
reading is disabled Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-26Documentation/hwmon: Update links in max34440Glenn Dayton
It appears the website for maxim-ic.com changed to maximintegrated.com. Signed-off-by: Glenn Dayton <glenn.dayton24@gmail.com> Signed-off-by: Jean Delvare <jdelvare@suse.de>
2016-05-26hwmon: (emc2103) Fix typo in MODULE_PARM_DESCDan Carpenter
"apd" was intended here instead of "init". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jean Delvare <jdelvare@suse.de>
2016-05-26restore killability of old mutex_lock_killable(&inode->i_mutex) usersAl Viro
The ones that are taking it exclusive, that is... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-26add down_write_killable_nested()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-26update D/f/directory-lockingAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-25Revert "mtd: atmel_nand: Support variable RB_EDGE interrupts"Wenyou Yang
This reverts commit 5ddc7bd43ccc ("mtd: atmel_nand: Support variable RB_EDGE interrupts") Because for current SoCs, the RB_EDGE3(i.e. bit 27) of HSMC_SR register does not exist, the RB_EDGE0 (i.e. bit 24) is the ready/busy line edge status bit. It is a datasheet bug. Cc: <stable@vger.kernel.org> Fixes: commit 5ddc7bd43ccc ("mtd: atmel_nand: Support variable RB_EDGE interrupts") Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2016-05-25Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: EFI, entry code, pkeys and MPX fixes, TASK_SIZE cleanups and a tsc frequency table fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Switch from TASK_SIZE to TASK_SIZE_MAX in the page fault code x86/fsgsbase/64: Use TASK_SIZE_MAX for FSBASE/GSBASE upper limits x86/mm/mpx: Work around MPX erratum SKD046 x86/entry/64: Fix stack return address retrieval in thunk x86/efi: Fix 7-parameter efi_call()s x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys x86/tsc: Add missing Cherrytrail frequency to the table
2016-05-25Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: one for a lost wakeup, the other to fix the compiler optimizing out preempt operations on ARM64 (and possibly other non-x86 architectures)" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix remote wakeups sched/preempt: Fix preempt_count manipulations
2016-05-25Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Mostly tooling and PMU driver fixes, but also a number of late updates such as the reworking of the call-chain size limiting logic to make call-graph recording more robust, plus tooling side changes for the new 'backwards ring-buffer' extension to the perf ring-buffer" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) perf record: Read from backward ring buffer perf record: Rename variable to make code clear perf record: Prevent reading invalid data in record__mmap_read perf evlist: Add API to pause/resume perf trace: Use the ptr->name beautifier as default for "filename" args perf trace: Use the fd->name beautifier as default for "fd" args perf report: Add srcline_from/to branch sort keys perf evsel: Record fd into perf_mmap perf evsel: Add overwrite attribute and check write_backward perf tools: Set buildid dir under symfs when --symfs is provided perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced perf annotate: Sort list of recognised instructions perf annotate: Fix identification of ARM blt and bls instructions perf tools: Fix usage of max_stack sysctl perf callchain: Stop validating callchains by the max_stack sysctl perf trace: Fix exit_group() formatting perf top: Use machine->kptr_restrict_warned perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1 perf machine: Do not bail out if not managing to read ref reloc symbol perf/x86/intel/p4: Trival indentation fix, remove space ...
2016-05-26Yama: fix double-spinlock and user access in atomic contextJann Horn
Commit 8a56038c2aef ("Yama: consolidate error reporting") causes lockups when someone hits a Yama denial. Call chain: process_vm_readv -> process_vm_rw -> process_vm_rw_core -> mm_access -> ptrace_may_access task_lock(...) is taken __ptrace_may_access -> security_ptrace_access_check -> yama_ptrace_access_check -> report_access -> kstrdup_quotable_cmdline -> get_cmdline -> access_process_vm -> get_task_mm task_lock(...) is taken again task_lock(p) just calls spin_lock(&p->alloc_lock), so at this point, spin_lock() is called on a lock that is already held by the current process. Also: Since the alloc_lock is a spinlock, sleeping inside security_ptrace_access_check hooks is probably not allowed at all? So it's not even possible to print the cmdline from in there because that might involve paging in userspace memory. It would be tempting to rewrite ptrace_may_access() to drop the alloc_lock before calling the LSM, but even then, ptrace_may_access() itself might be called from various contexts in which you're not allowed to sleep; for example, as far as I understand, to be able to hold a reference to another task, usually an RCU read lock will be taken (see e.g. kcmp() and get_robust_list()), so that also prohibits sleeping. (And using e.g. FUSE, a user can cause pagefault handling to take arbitrary amounts of time - see https://bugs.chromium.org/p/project-zero/issues/detail?id=808.) Therefore, AFAIK, in order to print the name of a process below security_ptrace_access_check(), you'd have to either grab a reference to the mm_struct and defer the access violation reporting or just use the "comm" value that's stored in kernelspace and accessible without big complications. (Or you could try to use some kind of atomic remote VM access that fails if the memory isn't paged in, similar to copy_from_user_inatomic(), and if necessary fall back to comm, but that'd be kind of ugly because the comm/cmdline choice would look pretty random to the user.) Fix it by deferring reporting of the access violation until current exits kernelspace the next time. v2: Don't oops on PTRACE_TRACEME, call report_access under task_lock(current). Also fix nonsensical comment. And don't use GPF_ATOMIC for memory allocation with no locks held. This patch is tested both for ptrace attach and ptrace traceme. Fixes: 8a56038c2aef ("Yama: consolidate error reporting") Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-05-25Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool build fix from Ingo Molnar: "An libtool fix for older libelf versions" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Allow building with older libelf
2016-05-26ceph: fix wake_up_session_cb()Yan, Zheng
We should reset i_requested_max_size before waking the waiters. (zero i_requested_max_size make waiter re-request the max size) Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: don't use truncate_pagecache() to invalidate read cacheYan, Zheng
truncate_pagecache() drops dirty pages, it's dangerous to use it to invalidate read cache. Besides, we shouldn't start invalidating read cache while there are buffer writers. Because buffer writers may add dirty pages later. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: SetPageError() for writeback pages if writepages failsYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: handle interrupted ceph_writepage()Yan, Zheng
writepage() can be interrupted when it's called by direct memory reclaimer (the direct memory relaimer is killed). To avoid lossing data, we redirty the page. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: make ceph_update_writeable_page() uninterruptibleYan, Zheng
ceph_update_writeable_page() is used by ceph_write_begin(). It beaks atomicity of write operation if it's interruptible. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26libceph: make ceph_osdc_wait_request() uninterruptibleYan, Zheng
Ceph_osdc_wait_request() is used when cephfs issues sync IO. In most cases, the sync IO should be uninterruptible. The fix is use killale wait function in ceph_osdc_wait_request(). Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: handle -EAGAIN returned by ceph_update_writeable_page()Yan, Zheng
when ceph_update_writeable_page() return -EAGAIN, caller should lock the page and call ceph_update_writeable_page() again. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEMYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: block non-fatal signals for fault/page_mkwriteYan, Zheng
Fault and page_mkwrite are supposed to be uninterruptable. But they call ceph functions that are interruptible. So they should block signals before calling functions that are interruptible Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: make logical calculation functions return boolZhang Zhuoyu
This patch makes serverl logical caculation functions return bool to improve readability due to these particular functions only using 0/1 as their return value. No functional change. Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
2016-05-26ceph: tolerate bad i_size for symlink inodeYan, Zheng
A mds bug can cause symlink's size to be truncated to zero. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: improve fragtree change detectionYan, Zheng
check if number of splits in i_fragtree is equal to number of splits in mds reply Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: keep leaf frag when updating fragtreeYan, Zheng
Nodes in i_fragtree are sorted according to ceph_compare_frag(). It means frag node in i_fragtree always follow its direct parent node. To check if a leaf node is valid, we just need to check if it's child of previous split node. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: fix dir_auth check in ceph_fill_dirfrag()Yan, Zheng
-1 is CDIR_AUTH_PARENT, it means dir's auth mds is the same as inode's auth mds Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: don't assume frag tree splits in mds reply are sortedYan, Zheng
The algorithm that updates i_fragtree relies on that the frag tree splits in mds reply are of the same order of i_fragtree. This is not true because current MDS encodes frag tree splits in ascending order of (unsigned)frag_t. But nodes in i_fragtree are sorted according to ceph_frag_compare(). The fix is sort the frag tree splits first, then updates i_fragtree. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: fix inode reference leakYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: using hash value to compose dentry offsetYan, Zheng
If MDS sorts dentries in dirfrag in hash order, we use hash value to compose dentry offset. dentry offset is: (0xff << 52) | ((24 bits hash) << 28) | (the nth entry hash hash collision) This offset is stable across directory fragmentation. This alos means there is no need to reset readdir offset if directory get fragmented in the middle of readdir. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: don't forbid marking directory complete after forward seekYan, Zheng
Forward seek within same frag does not update fi->last_name, it will not affect contents of later readdir reply. So there is no need to forbid marking directory complete Signed-off-by: Yan, Zheng <zyan@redhat.com>