diff options
author | Jeff Layton <jlayton@kernel.org> | 2024-10-02 17:27:18 -0400 |
---|---|---|
committer | Christian Brauner <brauner@kernel.org> | 2024-10-07 12:48:56 +0200 |
commit | 4e40eff0b5737c0de39e1ae5812509efbc0b986e (patch) | |
tree | 954d9a8b22da5082909cf0fc64fd142ddb32ef5c /tools/perf/scripts/python/export-to-postgresql.py | |
parent | 98f7e32f20d28ec452afb208f9cffc08448a2652 (diff) |
fs: add infrastructure for multigrain timestamps
The VFS has always used coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide when to invalidate the cache. Even with NFSv4, a lot of
exported filesystems don't properly support a change attribute and are
subject to the same problems with timestamp granularity. Other
applications have similar issues with timestamps (e.g backup
applications).
If fine-grained timestamps were always used, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
What is needed is a way to only use fine-grained timestamps when they
are being actively queried. Use the (unused) top bit in
inode->i_ctime_nsec as a flag that indicates whether the current
timestamps have been queried via stat() or the like. When it's set,
allow the update to use a fine-grained timestamp iff it's necessary to
make the ctime show a different value.
If it has been queried, then first see whether the current coarse time
is later than the existing ctime. If it is, accept that value. If it
isn't, then get a fine-grained timestamp and attempt to stamp the inode
ctime with that value. If that races with another concurrent stamp, then
abandon the update and take the new value without retrying.
Filesystems can opt into this by setting the FS_MGTIME fstype flag.
Others should be unaffected (other than being subject to the same floor
value as multigrain filesystems).
Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241002-mgtime-v10-3-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Diffstat (limited to 'tools/perf/scripts/python/export-to-postgresql.py')
0 files changed, 0 insertions, 0 deletions