Age | Commit message (Collapse) | Author |
|
This tracer was temporarily removed in 6416669 (workqueue:
temporarily remove workqueue tracing, 2010-06-29) but never
reinstated after concurrency managed workqueues were completed.
For almost two years it hasn't been compilable so it seems nobody
is using it. Delete it.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
With memcg converted, cgroup_subsys->populate() doesn't have any user
left. Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
|
keyctl_session_to_parent(task) sets ->replacement_session_keyring,
it should be processed and cleared by key_replace_session_keyring().
However, this task can fork before it notices TIF_NOTIFY_RESUME and
the new child gets the bogus ->replacement_session_keyring copied by
dup_task_struct(). This is obviously wrong and, if nothing else, this
leads to put_cred(already_freed_cred).
change copy_creds() to clear this member. If copy_process() fails
before this point the wrong ->replacement_session_keyring doesn't
matter, exit_creds() won't be called.
Cc: <stable@vger.kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch fixes the irq_domain_mapping debugfs output to pad pointer
values with leading zeros so that pointer values are displayed
correctly. Otherwise you get output similar to "0x 5e0000000000000".
Also, when the irq_domain is set to 'null'
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: David Daney <david.daney@cavium.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
The actual name of the irq_domain mapping debugfs file is
"irq_domain_mapping" not "virq_mapping".
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
In commit 4bbdd45a (irq_domain/powerpc: eliminate irq_map; use
irq_alloc_desc() instead) code was added that ignores error returns
from irq_alloc_desc_from() by (silently) casting the return value to
unsigned. The negitive value error return now suddenly looks like a
valid irq number.
Commits cc79ca69 (irq_domain: Move irq_domain code from powerpc to
kernel/irq) and 1bc04f2c (irq_domain: Add support for base irq and
hwirq in legacy mappings) move this code to its current location in
irqdomain.c
The result of all of this is a null pointer dereference OOPS if one of
the error cases is hit.
The fix: Don't cast away the negativeness of the return value and then
check for errors.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
[grant.likely: dropped addition of new 'irq' variable]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
tick_broadcast_switch_to_oneshot()
In the commit 77b0d60c5adf39c74039e2142a1d3cd1e4d53799,
"clockevents: Leave the broadcast device in shutdown mode when not needed",
we were bailing out too quickly in tick_broadcast_switch_to_oneshot(),
with out tracking the broadcast device mode change to 'TICKDEV_MODE_ONESHOT'.
This breaks the platforms which need broadcast device oneshot services during
deep idle states. tick_broadcast_oneshot_control() thinks that it is
in periodic mode and fails to take proper decisions based on the
CLOCK_EVT_NOTIFY_BROADCAST_[ENTER, EXIT] notifications during deep
idle entry/exit.
Fix this by tracking the broadcast device mode as 'TICKDEV_MODE_ONESHOT',
before leaving the broadcast HW device in shutdown mode if there are no active
requests for the moment.
Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1334011304.12400.81.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
David pointed out, that WARN_ONCE() to report usage of an deprecated
misfeature make folks unhappy. Use printk_once() instead.
Andrew told me to stop grumbling and to remove the silly typecast
while touching the file.
Reported-by: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Merge with latest Linus' tree, as I have incoming patches
that fix code that is newer than current HEAD of for-next.
Conflicts:
drivers/net/ethernet/realtek/r8169.c
|
|
A suspended VM can cause spurious soft lockup warnings. To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so. When the watchdog is reset successfully, clear the guest
paused flag.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Modify alloc_uid to take a kuid and make the user hash table global.
Stop holding a reference to the user namespace in struct user_struct.
This simplifies the code and makes the per user accounting not
care about which user namespace a uid happens to appear in.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Start distinguishing between internal kernel uids and gids and
values that userspace can use. This is done by introducing two
new types: kuid_t and kgid_t. These types and their associated
functions are infrastructure are declared in the new header
uidgid.h.
Ultimately there will be a different implementation of the mapping
functions for use with user namespaces. But to keep it simple
we introduce the mapping functions first to separate the meat
from the mechanical code conversions.
Export overflowuid and overflowgid so we can use from_kuid_munged
and from_kgid_munged in modular code.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
This represents a change in strategy of how to handle user namespaces.
Instead of tagging everything explicitly with a user namespace and bulking
up all of the comparisons of uids and gids in the kernel, all uids and gids
in use will have a mapping to a flat kuid and kgid spaces respectively. This
allows much more of the existing logic to be preserved and in general
allows for faster code.
In this new and improved world we allow someone to utiliize capabilities
over an inode if the inodes owner mapps into the capabilities holders user
namespace and the user has capabilities in their user namespace. Which
is simple and efficient.
Moving the fs uid comparisons to be comparisons in a flat kuid space
follows in later patches, something that is only significant if you
are using user namespaces.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
With a user_ns reference in struct cred the only user of the user namespace
reference in struct user_struct is to keep the uid hash table alive.
The user_namespace reference in struct user_struct will be going away soon, and
I have removed all of the references. Rename the field from user_ns to _user_ns
so that the compiler can verify nothing follows the user struct to the user
namespace anymore.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
I am about to remove the struct user_namespace reference from struct user_struct.
So keep an explicit track of the parent user namespace.
Take advantage of this new reference and replace instances of user_ns->creator->user_ns
with user_ns->parent.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
struct user_struct will shortly loose it's user_ns reference
so make the cred user_ns reference a proper reference complete
with reference counting.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Optimize performance and prepare for the removal of the user_ns reference
from user_struct. Remove the slow long walk through cred->user->user_ns and
instead go straight to cred->user_ns.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
In struct cred the user member is and has always been declared struct user_struct *user.
At most a constant struct cred will have a constant pointer to non-constant user_struct
so remove this unnecessary cast.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer fixlet from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
sysctl: fix write access to dmesg_restrict/kptr_restrict
|
|
Fix tick_nohz_restart() to not use a stale ktime_t "now" value when
calling tick_do_update_jiffies64(now).
If we reach this point in the loop it means that we crossed a tick
boundary since we grabbed the "now" timestamp, so at this point "now"
refers to a time in the old jiffy, so using the old value for "now" is
incorrect, and is likely to give us a stale jiffies value.
In particular, the first time through the loop the
tick_do_update_jiffies64(now) call is always a no-op, since the
caller, tick_nohz_restart_sched_tick(), will have already called
tick_do_update_jiffies64(now) with that "now" value.
Note that tick_nohz_stop_sched_tick() already uses the correct
approach: when we notice we cross a jiffy boundary, grab a new
timestamp with ktime_get(), and *then* update jiffies.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1332875377-23014-1-git-send-email-ncardwell@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Replace "mutex" with "semaphore" in down_trylock comment
Signed-off-by: Lucia Rosculete <luciarosculete@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Merge batch of fixes from Andrew Morton:
"The simple_open() cleanup was held back while I wanted for laggards to
merge things.
I still need to send a few checkpoint/restore patches. I've been
wobbly about merging them because I'm wobbly about the overall
prospects for success of the project. But after speaking with Pavel
at the LSF conference, it sounds like they're further toward
completion than I feared - apparently davem is at the "has stopped
complaining" stage regarding the net changes. So I need to go back
and re-review those patchs and their (lengthy) discussion."
* emailed from Andrew Morton <akpm@linux-foundation.org>: (16 patches)
memcg swap: use mem_cgroup_uncharge_swap fix
backlight: add driver for DA9052/53 PMIC v1
C6X: use set_current_blocked() and block_sigmask()
MAINTAINERS: add entry for sparse checker
MAINTAINERS: fix REMOTEPROC F: typo
alpha: use set_current_blocked() and block_sigmask()
simple_open: automatically convert to simple_open()
scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()
libfs: add simple_open()
hugetlbfs: remove unregister_filesystem() when initializing module
drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback
fs/xattr.c:setxattr(): improve handling of allocation failures
fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
sysrq: use SEND_SIG_FORCED instead of force_sig()
proc: fix mount -t proc -o AAA
|
|
Many users of debugfs copy the implementation of default_open() when
they want to support a custom read/write function op. This leads to a
proliferation of the default_open() implementation across the entire
tree.
Now that the common implementation has been consolidated into libfs we
can replace all the users of this function with simple_open().
This replacement was done with the following semantic patch:
<smpl>
@ open @
identifier open_f != simple_open;
identifier i, f;
@@
-int open_f(struct inode *i, struct file *f)
-{
(
-if (i->i_private)
-f->private_data = i->i_private;
|
-f->private_data = i->i_private;
)
-return 0;
-}
@ has_open depends on open @
identifier fops;
identifier open.open_f;
@@
struct file_operations fops = {
...
-.open = open_f,
+.open = simple_open,
...
};
</smpl>
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit bfdc0b4 adds code to restrict access to dmesg_restrict,
however, it incorrectly alters kptr_restrict rather than
dmesg_restrict.
The original patch from Richard Weinberger
(https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as
expected, and so the patch seems to have been misapplied.
This adds the CAP_SYS_ADMIN check to both dmesg_restrict and
kptr_restrict, since both are sensitive.
Reported-by: Phillip Lougher <plougher@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb
Pull KGDB/KDB regression fixes from Jason Wessel:
- Fix a Smatch warning that appeared in the 3.4 merge window
- Fix kgdb test suite with SMP for all archs without HW single stepping
- Fix kgdb sw breakpoints with CONFIG_DEBUG_RODATA=y limitations on x86
- Fix oops on kgdb test suite with CONFIG_DEBUG_RODATA
- Fix kgdb test suite with SMP for all archs with HW single stepping
* tag 'for_linus-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
x86,kgdb: Fix DEBUG_RODATA limitation using text_poke()
kgdb,debug_core: pass the breakpoint struct instead of address and memory
kgdbts: (2 of 2) fix single step awareness to work correctly with SMP
kgdbts: (1 of 2) fix single step awareness to work correctly with SMP
kgdbts: Fix kernel oops with CONFIG_DEBUG_RODATA
kdb: Fix smatch warning on dbg_io_ops->is_console
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
- Patch series that hopefully fixes races between the freezer and
request_firmware() and request_firmware_nowait() for good, with two
cleanups from Stephen Boyd on top.
- Runtime PM fix from Alan Stern preventing tasks from getting stuck
indefinitely in the runtime PM wait queue.
- Device PM QoS update from MyungJoo Ham introducing a new variant of
pm_qos_update_request() allowing the callers to specify a timeout.
* tag 'pm-for-3.4-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / QoS: add pm_qos_update_request_timeout() API
firmware_class: Move request_firmware_nowait() to workqueues
firmware_class: Reorganize fw_create_instance()
PM / Sleep: Mitigate race between the freezer and request_firmware()
PM / Sleep: Move disabling of usermode helpers to the freezer
PM / Hibernate: Disable usermode helpers right before freezing tasks
firmware_class: Do not warn that system is not ready from async loads
firmware_class: Split _request_firmware() into three functions, v2
firmware_class: Rework usermodehelper check
PM / Runtime: don't forget to wake up waitqueue on failure
|
|
This merges some of the fixes from Paul Gortmaker for the header file
cleanup fallout.
Some of the patches are going through arch maintainer trees, and David
Howells suggested another be done differently, but this at least fixes a
few cases.
* emailed from Paul Gortmaker <paul.gortmaker@windriver.com>:
asm-generic: add linux/types.h to cmpxchg.h
firewire: restore the device.h include in linux/firewire.h
frv: fix warnings in mb93090-mb00/pci-dma.c about implicit EXPORT_SYMBOL
parisc: fix missing cmpxchg file error from system.h split
blackfin: fix cmpxchg build fails from system.h fallout
avr32: fix build failures from mis-naming of atmel_nand.h
ARM: mach-msm: fix compile fail from system.h fallout
irq_work: fix compile failure on MIPS from system.h split
|
|
Pull crypto fixes from Herbert Xu:
- Fix for CPU hotplug hang in padata.
- Avoid using cpu_active inappropriately in pcrypt and padata.
- Fix for user-space algorithm lookup hang with IV generators.
- Fix for netlink dump of algorithms where stuff went missing due to
incorrect calculation of message size.
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: user - Fix size of netlink dump message
crypto: user - Fix lookup of algorithms with IV generator
crypto: pcrypt - Use the online cpumask as the default
padata: Fix cpu hotplug
padata: Use the online cpumask as the default
padata: Add a reference to the api documentation
|
|
Pull cpumask cleanups from Rusty Russell:
"(Somehow forgot to send this out; it's been sitting in linux-next, and
if you don't want it, it can sit there another cycle)"
I'm a sucker for things that actually delete lines of code.
Fix up trivial conflict in arch/arm/kernel/kprobes.c, where Rusty fixed
a user of &cpu_online_map to be cpu_online_mask, but that code got
deleted by commit b21d55e98ac2 ("ARM: 7332/1: extract out code patch
function from kprobes").
* tag 'for-linus' of git://github.com/rustyrussell/linux:
cpumask: remove old cpu_*_map.
documentation: remove references to cpu_*_map.
drivers/cpufreq/db8500-cpufreq: remove references to cpu_*_map.
remove references to cpu_*_map in arch/
|
|
Builds of the MIPS platform ip32_defconfig fails as of commit
0195c00244dc ("Merge tag 'split-asm_system_h ...") because MIPS xchg()
macro uses BUILD_BUG_ON and it was moved in commit b81947c646bf
("Disintegrate asm/system.h for MIPS").
The root cause is that the system.h split wasn't tested on a baseline
with commit 6c03438edeb5 ("kernel.h: doesn't explicitly use bug.h, so
don't include it.")
Since this file uses BUG code in several other places besides the xchg
call, simply make the inclusion explicit.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
cgroup/for-3.5 contains the following changes which blk-cgroup needs
to proceed with the on-going cleanup.
* Dynamic addition and removal of cftypes to make config/stat file
handling modular for policies.
* cgroup removal update to not wait for css references to drain to fix
blkcg removal hang caused by cfq caching cfqgs.
Pull in cgroup/for-3.5 into block/for-3.5/core. This causes the
following conflicts in block/blk-cgroup.c.
* 761b3ef50e "cgroup: remove cgroup_subsys argument from callbacks"
conflicts with blkiocg_pre_destroy() addition and blkiocg_attach()
removal. Resolved by removing @subsys from all subsys methods.
* 676f7c8f84 "cgroup: relocate cftype and cgroup_subsys definitions in
controllers" conflicts with ->pre_destroy() and ->attach() updates
and removal of modular config. Resolved by dropping forward
declarations of the methods and applying updates to the relocated
blkio_subsys.
* 4baf6e3325 "cgroup: convert all non-memcg controllers to the new
cftype interface" builds upon the previous item. Resolved by adding
->base_cftypes to the relocated blkio_subsys.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Currently, cgroup removal tries to drain all css references. If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir). To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior. This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch. Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
|
|
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref. If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css. If not, the
base ref is restored. While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative. After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages. While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
|
|
Implement cgroup_rm_cftypes() which removes an array of cftypes from a
subsystem. It can be called whether the target subsys is attached or
not. cgroup core will remove the specified file from all existing
cgroups.
This will be used to improve sub-subsys modularity and will be helpful
for unified hierarchy.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
|
|
This patch adds cfent (cgroup file entry) which is the association
between a cgroup and a file. This is in-cgroup representation of
files under a cgroup directory. This simplifies walking walking
cgroup files and thus cgroup_clear_directory(), which is now
implemented in two parts - cgroup_rm_file() and a loop around it.
cgroup_rm_file() will be used to implement cftype removal and cfent is
scheduled to serve cgroup specific per-file data (e.g. for sysfs-like
"sever" semantics).
v2: - cfe was freed from cgroup_rm_file() which led to use-after-free
if the file had openers at the time of removal. Moved to
cgroup_diput().
- cgroup_clear_directory() triggered WARN_ON_ONCE() if d_subdirs
wasn't empty after removing all files. This triggered
spuriously if some files were open during directory clearing.
Removed.
v3: - In cgroup_diput(), WARN_ONCE(!list_empty(&cfe->node)) could be
spuriously triggered for root cgroups because they don't go
through cgroup_clear_directory() on unmount. Don't trigger WARN
for root cgroups.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
|
|
Move the two macros upwards as they'll be used earlier in the file.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
|
|
No controller is using cgroup_add_files[s](). Unexport them, and
convert cgroup_add_files() to handle NULL entry terminated array
instead of taking count explicitly and continue creation on failure
for internal use.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
|
|
Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
net_cls and device controllers to use the new cftype based interface.
Termination entry is added to cftype arrays and populate callbacks are
replaced with cgroup_subsys->base_cftypes initializations.
This is functionally identical transformation. There shouldn't be any
visible behavior change.
memcg is rather special and will be converted separately.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Vivek Goyal <vgoyal@redhat.com>
|
|
Now that cftype can express whether a file should only be on root,
cft_release_agent can be merged into the base files cftypes array.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
|
|
Currently, cgroup directories are populated by subsys->populate()
callback explicitly creating files on each cgroup creation. This
level of flexibility isn't needed or desirable. It provides largely
unused flexibility which call for abuses while severely limiting what
the core layer can do through the lack of structure and conventions.
Per each cgroup file type, the only distinction that cgroup users is
making is whether a cgroup is root or not, which can easily be
expressed with flags.
This patch introduces cgroup_add_cftypes(). These deal with cftypes
instead of individual files - controllers indicate that certain types
of files exist for certain subsystem. Newly added CFTYPE_*_ON_ROOT
flags indicate whether a cftype should be excluded or created only on
the root cgroup.
cgroup_add_cftypes() can be called any time whether the target
subsystem is currently attached or not. cgroup core will create files
on the existing cgroups as necessary.
Also, cgroup_subsys->base_cftypes is added to ease registration of the
base files for the subsystem. If non-NULL on subsys init, the cftypes
pointed to by ->base_cftypes are automatically registered on subsys
init / load.
Further patches will convert the existing users and remove the file
based interface. Note that this interface allows dynamic addition of
files to an active controller. This will be used for sub-controller
modularity and unified hierarchy in the longer term.
This patch implements the new mechanism but doesn't apply it to any
user.
v2: replaced DECLARE_CGROUP_CFTYPES[_COND]() with
cgroup_subsys->base_cftypes, which works better for cgroup_subsys
which is loaded as module.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
|
|
Build a list of all cgroups anchored at cgroupfs_root->allcg_list and
going through cgroup->allcg_node. The list is protected by
cgroup_mutex and will be used to improve cgroup file handling.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
|
|
cgroup_populate_dir() currently clears all files and then repopulate
the directory; however, the clearing part is only useful when it's
called from cgroup_remount(). Relocate the invocation to
cgroup_remount().
This is to prepare for further cgroup file handling updates.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
|
|
This patch marks the following features for deprecation.
* Rebinding subsys by remount: Never reached useful state - only works
on empty hierarchies.
* release_agent update by remount: release_agent itself will be
replaced with conventional fsnotify notification.
v2: Lennart pointed out that "name=" is necessary for mounts w/o any
controller attached. Drop "name=" deprecation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar.
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix incorrect usage of for_each_cpu_mask() in select_fallback_rq()
sched: Fix __schedule_bug() output when called from an interrupt
sched/arch: Introduce the finish_arch_post_lock_switch() scheduler callback
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates and fixes from Ingo Molnar:
"It's mostly fixes, but there's also two late items:
- preliminary GTK GUI support for perf report
- PMU raw event format descriptors in sysfs, to be parsed by tooling
The raw event format in sysfs is a new ABI. For example for the 'CPU'
PMU we have:
aldebaran:~> ll /sys/bus/event_source/devices/cpu/format/*
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/any
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/cmask
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/edge
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/event
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/inv
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/offcore_rsp
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/pc
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/umask
those lists of fields contain a specific format:
aldebaran:~> cat /sys/bus/event_source/devices/cpu/format/offcore_rsp
config1:0-63
So, those who wish to specify raw events can now use the following
event format:
-e cpu/cmask=1,event=2,umask=3
Most people will not want to specify any events (let alone raw
events), they'll just use whatever default event the tools use.
But for more obscure PMU events that have no cross-architecture
generic events the above syntax is more usable and a bit more
structured than specifying hex numbers."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
perf tools: Remove auto-generated bison/flex files
perf annotate: Fix off by one symbol hist size allocation and hit accounting
perf tools: Add missing ref-cycles event back to event parser
perf annotate: addr2line wants addresses in same format as objdump
perf probe: Finder fails to resolve function name to address
tracing: Fix ent_size in trace output
perf symbols: Handle NULL dso in dso__name_len
perf symbols: Do not include libgen.h
perf tools: Fix bug in raw sample parsing
perf tools: Fix display of first level of callchains
perf tools: Switch module.h into export.h
perf: Move mmap page data_head offset assertion out of header
perf: Fix mmap_page capabilities and docs
perf diff: Fix to work with new hists design
perf tools: Fix modifier to be applied on correct events
perf tools: Fix various casting issues for 32 bits
perf tools: Simplify event_read_id exit path
tracing: Fix ftrace stack trace entries
tracing: Move the tracing_on/off() declarations into CONFIG_TRACING
perf report: Add a simple GTK2-based 'perf report' browser
...
|
|
This option has been selected from arch code as it was assumed that
it's necessary to support oneshot mode clockevent devices. But it's
just a core internal helper to compile tick-oneshot.c if NOHZ or
HIG_RES_TIMERS are selected.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Maintain a per-mm counter: number of uprobes that are inserted
on this process address space.
This counter can be used at probe hit time to determine if we
need a lookup in the uprobes rbtree. Everytime a probe gets
inserted successfully, the probe count is incremented and
everytime a probe gets removed, the probe count is decremented.
The new uprobe_munmap hook ensures the count is correct on a
unmap or remap of a region. We expect that once a
uprobe_munmap() is called, the vma goes away. So
uprobe_unregister() finding a probe to unregister would either
mean unmap event hasnt occurred yet or a mmap event on the same
executable file occured after a unmap event.
Additionally, uprobe_mmap hook now also gets called:
a. on every executable vma that is COWed at fork.
b. a vma of interest is newly mapped; breakpoint insertion also
happens at the required address.
On process creation, make sure the probes count in the child is
set correctly.
Special cases that are taken care include:
a. mremap
b. VM_DONTCOPY vmas on fork()
c. insertion/removal races in the parent during fork().
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120330182646.10018.85805.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Uprobes executes the original instruction at a probed location
out of line. For this, we allocate a page (per mm) upon the
first uprobe hit, in the process user address space, divide it
into slots that are used to store the actual instructions to be
singlestepped. These slots are known as xol (execution out of
line) slots.
Care is taken to ensure that the allocation is in an unmapped
area as close to the top of the user address space as possible,
with appropriate permission settings to keep selinux like
frameworks happy.
Upon a uprobe hit, a free slot is acquired, and is released
after the singlestep completes.
Lots of improvements courtesy suggestions/inputs from Peter and
Oleg.
[ Folded a fix for build issue on powerpc fixed and reported by
Stephen Rothwell. ]
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120330182631.10018.48175.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The function for_each_cpu_mask() expects a *pointer* to struct
cpumask as its second argument, whereas select_fallback_rq()
passes the value itself.
And moreover, for_each_cpu_mask() has been marked as obselete
in include/linux/cpumask.h. So move to the more appropriate
for_each_cpu() variant.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Dave Jones <davej@redhat.com>
Cc: Liu Chuansheng <chuansheng.liu@intel.com>
Cc: vapier@gentoo.org
Cc: rusty@rustcorp.com.au
Link: http://lkml.kernel.org/r/4F75BED4.9050005@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|