Commit Graph

48907 Commits

Author SHA1 Message Date
Linus Torvalds
437538267b fbdev updates for 3.4
It includes:
 - drivers for Samsung Exynos MIPI DSI and display port
 - i740fb to support those old Intel chips
 - large updates to OMAP, viafb and sh_mobile_lcdcfb
 - some updates to s3c-fb and udlfb, few patches to others
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPa2qaAAoJECSVL5KnPj1PoT8P/0pcaOEGNhBlATgTppNm4RQs
 CPFwWQZSlvd6/3HDkTy3FDWzNMNlY+SriBdei42kSKh66nNHwL2Os8QYWF1qXhUA
 0tvINk7A7SpcqG8p8kpBaA7/jZxlfLEhHLQC48lu4hP2YC93AXv8Pyi/C8Dnvhka
 Jwq/puNT9G3Rmis8MWmI2gI7FQKl9CpQZOCodVzWJCmczQVI/QPHTg6+VJ0gkO3s
 YfUuGvoPSHpU7kUzgcoZX5kjYD0ZsGRKVcn4yKKosyPzmZLC6/q92pSjOvGr82+s
 wuVIMshgMdcvrxcoNdakKPvBFqMhfeMYiSYqvUr3yR2yKls/d5RDleIjND8YivzX
 wX6RAS2XUSBS76MGNHZrFo6z4CsHCvkN5cr/4hAz/SGCDXiQGD1JimuWHUaX4Pyj
 xKY20m08RoqQoMgt8+4V6xxc/Gi0TTkoMoo12OqtRssXzb/5BRgVrEh1PKgjE7Az
 tr1QlXnDhsMy9N0L+jNQxd9kwQrpxmtX8kXZjahgGtAo/nxCXUMVVfQDPxugzCJu
 GxSbAf8BRv+GBHTQSsn6LvlqhS60gZG5Yqphol88zDMPo2T8XLn8Kk7xUGUrnQYk
 ENxr9ui1rVJmfb/3bR6kqs6pWsuD7IATtno6XW3t4ASL+2HI/xEIs/zx3VxKdRDU
 eP+p5Bi46FdwQfnPKPsN
 =uO8m
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-updates-for-3.4' of git://github.com/schandinat/linux-2.6

Pull fbdev updates for 3.4 from Florian Tobias Schandinat:
 - drivers for Samsung Exynos MIPI DSI and display port
 - i740fb to support those old Intel chips
 - large updates to OMAP, viafb and sh_mobile_lcdcfb
 - some updates to s3c-fb and udlfb, few patches to others

Fix up conflicts in drivers/video/udlfb.c due to Key Sievers' fix making
it in twice.

* tag 'fbdev-updates-for-3.4' of git://github.com/schandinat/linux-2.6: (156 commits)
  Revert "video:uvesafb: Fix oops that uvesafb try to execute NX-protected page"
  OMAPDSS: register dss drivers in module init
  video: pxafb: add clk_prepare/clk_unprepare calls
  fbdev: bfin_adv7393fb: Drop needless include
  fbdev: sh_mipi_dsi: add extra phyctrl for sh_mipi_dsi_info
  fbdev: remove dependency of FB_SH_MOBILE_MERAM from FB_SH_MOBILE_LCDC
  Revert "MAINTAINERS: add entry for exynos mipi display drivers"
  fbdev: da8xx: add support for SP10Q010 display
  fbdev: da8xx:: fix reporting of the display timing info
  drivers/video/pvr2fb.c: ensure arguments to request_irq and free_irq are compatible
  OMAPDSS: APPLY: fix clearing shadow dirty flag with manual update
  fbdev: sh_mobile_meram: Implement system suspend/resume
  fbdev: sh_mobile_meram: Remove unneeded sanity checks
  fbdev: sh_mobile_meram: Don't perform update in register operation
  arm: mach-shmobile: Constify sh_mobile_meram_cfg structures
  fbdev: sh_mobile_lcdc: Don't store copy of platform data
  fbdev: sh_mobile_meram: Remove unused sh_mobile_meram_icb_cfg fields
  arm: mach-shmobile: Don't set MERAM ICB numbers in platform data
  fbdev: sh_mobile_meram: Allocate ICBs automatically
  fbdev: sh_mobile_meram: Use genalloc to manage MERAM allocation
  ...
2012-03-22 20:43:40 -07:00
Linus Torvalds
9586c959bf Things are really quieting down with the regmap API, while we're still
seeing a trickle of new features coming in they're getting much smaller
 than they were.  It's also nice to have some features which support
 other subsystems building infrastructure on top of regmap.  Highlights
 include:
 
 - Support for padding between the register and the value when
   interacting with the device, sometimes needed for fast interfaces.
 - Support for applying register updates to the device when restoring the
   register state.  This is intended to be used to apply updates supplied by
   manufacturers for tuning the performance of the device (many of which
   are to undocumented registers which aren't otherwise covered).
 - Support for multi-register operations on cached registers.
 - Support for syncing only part of the register cache.
 - Stubs and parameter query functions intended to make it easier for other
   subsystems to build infrastructure on top of the regmap API.
 
 plus a few driver updates making use of the new features which it was
 easier to merge via this tree.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPYJuOAAoJEBus8iNuMP3dauMP/1mYgILz0lpRHjGmUF86vQre
 AcualwUE4UY/WacyUkke72kxa9jcznwzbFjKKNSvL3rLnNy+QPY8Z9v6zBDL90or
 D9Ok8nRVRldIIDlDE708b10AP9sDSB25ra9IVVPzOEX/0NKoE+Y7ZkXcn0s3zGgI
 Y+bLwd1uufFopMpV3m5gXipi1/+PEK+jO7q6vgdUp3C1TcMzOqSyCg+uuHWffHGp
 iO/1XzdxNGx9BTDO/XDEqxUMRnjsQg/VS9JN3CMz8gXwxXD3zrWB/9+SMIfDb5Iy
 /iXqc58uJ6PTY87t5q9TEGyRKo0Xj7NEPnW4isXg/3r0UUb8kls3frXKigtLEUb7
 wnwQD/GCRvXOTbC6TUkFDiZ3OX1qLmnk8YMQ6xhQlbNGM7jJfzj/fFiwBdre58BC
 iKPdF9gfL/gyH5yefySau/YeYqJUbVLzdOAfYVDkjApmQJv67CrPPd96xAsEsTFU
 YojkF9NcapBnk6Vs4adzjxD1YCTThaXnFtUSu/bBNZu1xNFD12TORl5fs0OedUe8
 zvPMZEEKrE5CxHhQNB6j2Z0zajNOgsh183mNSr2VJK1vI4o4pY7MBENYYPzFiPB4
 BfX8KFftxu8O50OVZnweZ80LKVZ9fAo57oWlgR8lfaEbetjY0WdRYOyDT8w5jrtW
 nU+mtlQLc5SmugTs+CiD
 =4Eo9
 -----END PGP SIGNATURE-----

Merge tag 'regmap-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap updates from Mark Brown:
 "Things are really quieting down with the regmap API, while we're still
  seeing a trickle of new features coming in they're getting much
  smaller than they were.  It's also nice to have some features which
  support other subsystems building infrastructure on top of regmap.
  Highlights include:

  - Support for padding between the register and the value when
    interacting with the device, sometimes needed for fast interfaces.
  - Support for applying register updates to the device when restoring
    the register state.  This is intended to be used to apply updates
    supplied by manufacturers for tuning the performance of the device
    (many of which are to undocumented registers which aren't otherwise
    covered).
  - Support for multi-register operations on cached registers.
  - Support for syncing only part of the register cache.
  - Stubs and parameter query functions intended to make it easier for
    other subsystems to build infrastructure on top of the regmap API.

  plus a few driver updates making use of the new features which it was
  easier to merge via this tree."

* tag 'regmap-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (41 commits)
  regmap: Fix future missing prototype of devres_alloc() and friends
  regmap: Rejig struct declarations for stubbed API
  regmap: Fix rbtree block base in sync
  regcache: Make sure we sync register 0 in an rbtree cache
  regmap: delete unused module.h from drivers/base/regmap files
  regmap: Add stub for regcache_sync_region()
  mfd: Improve performance of later WM1811 revisions
  regmap: Fix x86_64 breakage
  regmap: Allow drivers to sync only part of the register cache
  regmap: Supply ranges to the sync operations
  regmap: Add tracepoints for cache only and cache bypass
  regmap: Mark the cache as clean after a successful sync
  regmap: Remove default cache sync implementation
  regmap: Skip hardware defaults for LZO caches
  regmap: Expose the driver name in debugfs
  mfd: wm8400: Convert to devm_regmap_init_i2c()
  mfd: wm831x: Convert to devm_regmap_init()
  mfd: wm8994: Convert to devm_regmap_init()
  mfd/ASoC: Convert WM8994 driver to use regmap patches
  mfd: Add __devinit and __devexit annotations in wm8994
  ...
2012-03-22 20:33:14 -07:00
Linus Torvalds
34699403e9 IEEE 1394 (FireWire) subsystem updates post v3.3:
- Some SBP-2 initiator fixes, side product from ongoing work on a target.
   - Reintroduction of an isochronous I/O feature of the older ieee1394 driver
     stack (flush buffer completions); it was evidently rarely used but not
     actually unused.  Matching libraw1394 code is already available.
   - Be sure to prefix all kernel log messages with device name or card name,
     and other logging related cleanups.
   - Misc other small cleanups, among them a small API change that affects
     sound/firewire/ too.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.17 (GNU/Linux)
 
 iQIcBAABAgAGBQJPa5I1AAoJEHnzb7JUXXnQvUoQAMl9PhUk5ZFhWp0AOnQ4uLhI
 lEfRnUp94kGBdazBhxM9wtAwZRAeXUev/JyxwymMKSG40dMTbuxqRcs71v6a+ifd
 VqNctL0yUncrOw/92l+TG2t/hWttB4u+dTKYX2U5yza42+uUHWMZb7MzmV+qVYc8
 H+NR71WLQM4wkWdX8LBxmdeAOm0X635cjKsC/5FX9dws7q1ebSoxs4q4iIaGR7W8
 ETWx5lh/UVyR7c9T+VIr0jfQWdsm2IcmHr/+nldlesePZ1gRjIEi69ErEnGxTkGe
 NLPwt9lWuFXgWWHBON7C/rLmBA+NSys9lbvRAsPXrb3GpOKlde81c7U7Kr/kmEkh
 hB9oM2Qh0A/7sglvIZiDUP565lqOAbXSJzziG3+0XgOP2zsxukm5gSecF8qM8tHY
 IDwN05R9+nc26NA5TOfaRWx08n9SqTxq4V326oz9WMuK4bosCEfg4dvMwyMK/V3i
 AyipAl2YYIG/2JURMFcGSKbw33dBw3mRsS8XG3MXwzagUMw/8tSyZKQIwF9qO4si
 69QV7+CJoEfbJiLJMZJnKrRjfU+ZVRNA/xFuHUmhpmvYIbN8iJVGpGZABfXBUcH0
 c1+qX9zE4NEAUEylbgn5raYSY6otF51O8QJzQOn2HRddBQSDpEwhkOGVfZ7zcSLH
 sjAOn9qLIMHnrxUXxBDP
 =oWbr
 -----END PGP SIGNATURE-----

Merge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull IEEE 1394 (FireWire) subsystem updates post v3.3 from Stefan Richter:

 - Some SBP-2 initiator fixes, side product from ongoing work on a target.

 - Reintroduction of an isochronous I/O feature of the older ieee1394 driver
   stack (flush buffer completions); it was evidently rarely used but not
   actually unused.  Matching libraw1394 code is already available.

 - Be sure to prefix all kernel log messages with device name or card name,
   and other logging related cleanups.

 - Misc other small cleanups, among them a small API change that affects
   sound/firewire/ too. Clemens Ladisch is aware of it.

* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: (26 commits)
  firewire: allow explicit flushing of iso packet completions
  firewire: prevent dropping of completed iso packet header data
  firewire: ohci: factor out iso completion flushing code
  firewire: ohci: simplify iso header pointer arithmetic
  firewire: ohci: optimize control bit checks
  firewire: ohci: remove unused excess_bytes field
  firewire: ohci: copy_iso_headers(): make comment match the code
  firewire: cdev: fix IR multichannel event documentation
  firewire: ohci: fix too-early completion of IR multichannel buffers
  firewire: ohci: move runtime debug facility out of #ifdef
  firewire: tone down some diagnostic log messages
  firewire: sbp2: replace a GFP_ATOMIC allocation
  firewire: sbp2: Fix SCSI sense data mangling
  firewire: sbp2: Ignore SBP-2 targets on the local node
  firewire: sbp2: Take into account Unit_Unique_ID
  firewire: nosy: Use the macro DMA_BIT_MASK().
  firewire: core: convert AR-req handler lock from _irqsave to _bh
  firewire: core: fix race at address_handler unregistration
  firewire: core: remove obsolete comment
  firewire: core: prefix log messages with card name
  ...
2012-03-22 20:31:15 -07:00
Linus Torvalds
7fc86a7908 Pinctrl updates for v3.4:
- Switches the PXA 168, 910 and MMP over to use pinctrl
 - Locking revamped
 - Massive refactorings...
 - Reform the driver API to use multiple states
 - Support pin config in the mapping tables
 - Pinctrl drivers for the nVidia Tegra series
 - Generic pin config support lib for simple pin controllers
 - Implement pin config for the U300
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPazYTAAoJEEEQszewGV1z09IQAIcsnFO5s7XfvfXhwNCliiKp
 myVMJdYcx8GSOg5Oif9uektcBi0gB2ppegxVYDxDq8FZqZQ4d3pYDZel3tDBGKJI
 9ZXemh1iIeXwLZBXSENxn0NZokAZVDiyBqi50zsINV+riFfyzL9Ai7o3YkEx9xBG
 nJ6S9wthzeF2qF8/NBvMYgCefZXIKJ8miwVrb9MusWUBfeAY7F8jVNlK9Asp9ruv
 XEC+WcrWRttB/8+5z3FkPPtEnJ+0Y2p6e6+hn8Cel0UWb4+WDijumO+hTyIiNYSc
 ryzFMqX18L8taaDhxQZ1kRDL5S7Qu7N3HYu+4wBvPI3D81MroS5frJbUwPcD8WsK
 Pw/NZtAwJhp4B9/LVOOoAyBoRTfeIwrbp+ou2zF5m70hpBD5D1mN/2xVBBQgjm9T
 oY9jNoJ2PhZP74fO4jdf4iMW1j5ZmzwxLXlpT3vGJeONIUg2hE0FJjzoeOjRsUtO
 iC0m7Q1AbJpIcjJlcK+gV/ZclaOWy4iDXDXFs4A3XA+ryf/ylDL5G+c/1Qu6CMyF
 4AhKs78Fg7agtVYX/QYuie9BbasRjipxiPc0t3/46Abs7WRzvpmBWWktFE4yy980
 93jLyog0VROHIOl974Fn0GrNJeNUTavc5DEd32zlc7AiAibOzyuRztwJ5GZbuZAN
 SE0VyQBIIdkaF51XV+IO
 =Gg7v
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl updates for v3.4 from Linus Walleij (*):
 - Switches the PXA 168, 910 and MMP over to use pinctrl
 - Locking revamped
 - Massive refactorings...
 - Reform the driver API to use multiple states
 - Support pin config in the mapping tables
 - Pinctrl drivers for the nVidia Tegra series
 - Generic pin config support lib for simple pin controllers
 - Implement pin config for the U300

* tag 'pinctrl-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (48 commits)
  ARM: u300: configure some pins as an example
  pinctrl: support pinconfig on the U300
  pinctrl/coh901: use generic pinconf enums and parameters
  pinctrl: introduce generic pin config
  pinctrl: fix error path in pinconf_map_to_setting()
  pinctrl: allow concurrent gpio and mux function ownership of pins
  pinctrl: forward-declare struct device
  pinctrl: split pincontrol states into its own header
  pinctrl: include machine header to core.h
  ARM: tegra: Select PINCTRL Kconfig variables
  pinctrl: add a driver for NVIDIA Tegra
  pinctrl: Show selected function and group in pinmux-pins debugfs
  pinctrl: enhance mapping table to support pin config operations
  pinctrl: API changes to support multiple states per device
  pinctrl: add usecount to pins for muxing
  pinctrl: refactor struct pinctrl handling in core.c vs pinmux.c
  pinctrl: fix and simplify locking
  pinctrl: fix the pin descriptor kerneldoc
  pinctrl: assume map table entries can't have a NULL name field
  pinctrl: introduce PINCTRL_STATE_DEFAULT, define hogs as that state
  ...

(*) What is it with all these Linuses these days? There's a Linus at
    google too.  Some day I will get myself my own broadsword, and run
    around screaming "There can be only one".

    I used to be _special_ dammit. Snif.
2012-03-22 20:25:50 -07:00
Linus Torvalds
7bfe0e66d5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
 "- we finally merged driver for USB version of Synaptics touchpads
    (I guess most commonly found in IBM/Lenovo keyboard/touchpad combo);

   - a bunch of new drivers for embedded platforms (Cypress
     touchscreens, DA9052 OnKey, MAX8997-haptic, Ilitek ILI210x
     touchscreens, TI touchscreen);

   - input core allows clients to specify desired clock source for
     timestamps on input events (EVIOCSCLOCKID ioctl);

   - input core allows querying state of all MT slots for given event
     code via EVIOCGMTSLOTS ioctl;

   - various driver fixes and improvements."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits)
  Input: ili210x - add support for Ilitek ILI210x based touchscreens
  Input: altera_ps2 - use of_match_ptr()
  Input: synaptics_usb - switch to module_usb_driver()
  Input: convert I2C drivers to use module_i2c_driver()
  Input: convert SPI drivers to use module_spi_driver()
  Input: omap4-keypad - move platform_data to <linux/platform_data>
  Input: kxtj9 - who_am_i check value and initial data rate fixes
  Input: add driver support for MAX8997-haptic
  Input: tegra-kbc - revise device tree support
  Input: of_keymap - add device tree bindings for simple key matrices
  Input: wacom - fix physical size calculation for 3rd-gen Bamboo
  Input: twl4030-vibra - really switch from #if to #ifdef
  Input: hp680_ts_input - ensure arguments to request_irq and free_irq are compatible
  Input: max8925_onkey - avoid accessing input device too early
  Input: max8925_onkey - allow to be used as a wakeup source
  Input: atmel-wm97xx - convert to dev_pm_ops
  Input: atmel-wm97xx - set driver owner
  Input: add cyttsp touchscreen maintainer entry
  Input: cyttsp - remove useless checks in cyttsp_probe()
  Input: usbtouchscreen - add support for Data Modul EasyTouch TP 72037
  ...
2012-03-22 20:20:18 -07:00
Linus Torvalds
d4c6fa73fe Features:
- PV multiconsole support, so that there can be hvc1, hvc2, etc;
  - P-state and C-state power management driver that uploads said
    power management data to the hypervisor. It also inhibits cpufreq
    scaling drivers to load so that only the hypervisor can make power
    management decisions - fixing a weird perf bug.
  - Function Level Reset (FLR) support in the Xen PCI backend.
 Fixes:
  - Kconfig dependencies for Xen PV keyboard and video
  - Compile warnings and constify fixes
  - Change over to use percpu_xxx instead of this_cpu_xxx
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPZ0qkAAoJEFjIrFwIi8fJjCgH/jeJ39E8ML8DP9tCS2HQnMqM
 uTEjLcqvoJ7sEhHvtBLPeG2p0jyBvOWjLbSc7P8nESBAMPvSYol8L6WqfWrdSU4r
 lHrma2sg9UYzRog5NyxAgkp7bBsBBFOnhVL3Cxb5Ig78cPWzeSWGpqGZ8M/d51Wf
 1iE0tHuU4DpN+fg1SZqPqEm8ecEJ/eSrVTnyTx/Qo2Ak+Zw98SqzX7SV5lo8mudd
 WFL1F2K9FyTNk79ndGhqFt36x6nEbFgMLbmCDWumLuWN6bMd1Uq0wNkCqW4F1h28
 3yqnY+rfQh4y3eXK1B9nttCUTs+/66U5ZWrT6B1IJumGTAIqcWfgeUX/Vn/HVC4=
 =tfMc
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen updates from Konrad Rzeszutek Wilk:
 "which has three neat features:

   - PV multiconsole support, so that there can be hvc1, hvc2, etc; This
     can be used in HVM and in PV mode.

   - P-state and C-state power management driver that uploads said power
     management data to the hypervisor.  It also inhibits cpufreq
     scaling drivers to load so that only the hypervisor can make power
     management decisions - fixing a weird perf bug.

     There is one thing in the Kconfig that you won't like: "default y
     if (X86_ACPI_CPUFREQ = y || X86_POWERNOW_K8 = y)" (note, that it
     all depends on CONFIG_XEN which depends on CONFIG_PARAVIRT which by
     default is off).  I've a fix to convert that boolean expression
     into "default m" which I am going to post after the cpufreq git
     pull - as the two patches to make this work depend on a fix in Dave
     Jones's tree.

   - Function Level Reset (FLR) support in the Xen PCI backend.

  Fixes:

   - Kconfig dependencies for Xen PV keyboard and video
   - Compile warnings and constify fixes
   - Change over to use percpu_xxx instead of this_cpu_xxx"

Fix up trivial conflicts in drivers/tty/hvc/hvc_xen.c due to changes to
a removed commit.

* tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
  xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.
  xen: constify all instances of "struct attribute_group"
  xen/xenbus: ignore console/0
  hvc_xen: introduce HVC_XEN_FRONTEND
  hvc_xen: implement multiconsole support
  hvc_xen: support PV on HVM consoles
  xenbus: don't free other end details too early
  xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
  xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
  xenbus: address compiler warnings
  xen: use this_cpu_xxx replace percpu_xxx funcs
  xen/pciback: Support pci_reset_function, aka FLR or D3 support.
  pci: Introduce __pci_reset_function_locked to be used when holding device_lock.
  xen: Utilize the restore_msi_irqs hook.
2012-03-22 20:16:14 -07:00
Linus Torvalds
aab008db80 Cleanups: rename of flush to invalidate, moving reporting of statistics
into debugfs, and use __read_mostly as neccessary.
 Also add a MAINTAINER file for cleancache API files.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPZ1sjAAoJEFjIrFwIi8fJCSkIALqBBCkuPSusvdebT59Oq145
 EW2eEwJzVft0vlvS/IPl+W37TH5VWhBVCW8frAYMpLaY5X/W1ZwzFpx76T4acAiM
 wkXwsyYXs6N13OOFoH5gmf3cwproAioRxbEeALucxeMR4rK5Yw+oZXT+hSu2KaLh
 1LtHyx+u+NLb0QOseAuDcmyjY9r6aLBA1HMcVD2z+4UW1n/9NWexQP3ShYP9uMDs
 GnyDzZ8vWXTcoc4Auj0rpaNsT5d47ltGegKASZmmvS3QyMHbZ4sk2HECnQY4wSee
 alPJwDyb6mOJmQ3e4s940onjfZPgd8/cW/DVEUrveH+dz6Eqjxqz41dVRVXiLz8=
 =tUNC
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm

Pull cleancache changes from Konrad Rzeszutek Wilk:
 "This has some patches for the cleancache API that should have been
  submitted a _long_ time ago.  They are basically cleanups:

   - rename of flush to invalidate

   - moving reporting of statistics into debugfs

   - use __read_mostly as necessary.

  Oh, and also the MAINTAINERS file change.  The files (except the
  MAINTAINERS file) have been in #linux-next for months now.  The late
  addition of MAINTAINERS file is a brain-fart on my side - didn't
  realize I needed that just until I was typing this up - and I based
  that patch on v3.3 - so the tree is on top of v3.3."

* tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
  MAINTAINERS: Adding cleancache API to the list.
  mm: cleancache: Use __read_mostly as appropiate.
  mm: cleancache: report statistics via debugfs instead of sysfs.
  mm: zcache/tmem/cleancache: s/flush/invalidate/
  mm: cleancache: s/flush/invalidate/
2012-03-22 19:52:47 -07:00
Linus Torvalds
09fa302261 Merge branch 'drm-radeon-sitn-support' of git://people.freedesktop.org/~airlied/linux
Pull radeon southern islands / trinity support from Dave Airlie:
 "This is support from AMD for their newest GPU and APUs.  The products
  called RadeonHD 7xxx, and the Trinity APU series.

  This did come in a bit late, due to some over-complicated AMD internal
  review process, which from the outside seems unnecessary once the
  company has decided it wants to support open source.  However as I
  said previously I'd rather not put the people who've got this hw for 3
  months now being forced to use fglrx on it if there is open code.

  Its pretty well self contained and just plugs into the driver in
  various places."

* 'drm-radeon-sitn-support' of git://people.freedesktop.org/~airlied/linux: (48 commits)
  drm/radeon/kms: update duallink checks for DCE6
  drm/radeon/kms: add trinity pci ids
  drm/radeon/kms: add radeon_asic struct for trinity
  drm/radeon/kms: add support for ucode loading on trinity (v2)
  drm/radeon/kms/vm: set vram base offset properly for TN
  drm/radeon/kms: Update evergreen functions for trinity
  drm/radeon/kms: cayman gpu init updates for trinity
  drm/radeon/kms: Add checks for TN in the DP bridge code
  drm/radeon/kms/DCE6.1: ss is not supported on the internal pplls
  drm/radeon/kms: disable PPLL0 on DCE6.1 when not in use
  drm/radeon/kms: Adjust pll picker for DCE6.1
  drm/radeon/kms: DCE6.1 disp eng pll updates
  drm/radeon/kms: DCE6.1 watermark updates for TN
  drm/radeon/kms: no support for internal thermal sensor on TN yet
  drm/radeon/kms: add trinity (TN) chip family
  drm/radeon/kms: Add SI pci ids
  drm/radeon: Update radeon_info_ioctl for SI. (v2)
  drm/radeon/kms: add radeon_asic struct for SI
  drm/radeon/kms: add support for compute rings in CS ioctl on SI
  drm/radeon/kms: fill in startup/shutdown callbacks for SI
  ...
2012-03-22 13:23:46 -07:00
Linus Torvalds
be53bfdb80 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm main changes from Dave Airlie:
 "This is the main drm pull request, I'm probably going to send two more
  smaller ones, will explain below.

  This contains a patch that is also in the fbdev tree, but it should be
  the same patch, it added an API for hot unplugging framebuffer
  devices, and I need that API for a new driver.

  It also contains some changes to the i2c tree which Jean has acked,
  and one change to moorestown platform stuff in x86.

  Highlights:
   - new drivers: UDL driver for USB displaylink devices, kms only,
     should support correct hotplug operations.
   - core: i2c speedups + better hotplug support, EDID overriding via
     firmware interface - allows user to load a firmware for a broken
     monitor/kvm from userspace, it even has documentation for it.
   - exynos: new HDMI audio + hdmi 1.4 + virtual output driver
   - gma500: code cleanup
   - radeon: cleanups, CS optimisations, streamout support and pageflip
     fix
   - nouveau: NVD9 displayport support + more reclocking work
   - i915: re-enabling GMBUS, finish gpu patch (might help hibernation
     who knows), missed irq fixes, stencil tiling fixes, interlaced
     support, aliasesd PPGTT support for SNB/IVB, swizzling for SNB/IVB,
     semaphore fixes

  As well as the usual bunch of cleanups and fixes all over the place.

  I've got two things I'd like to merge a bit later:

   a) AMD support for all their new radeonhd 7000 series GPU and APUs.
      AMD dropped this a bit late due to insane internal review
      processes, (please AMD just follow Intel and let open source guys
      ship stuff early) however I don't want to penalise people who own
      this hardware (since its been on sale for 3-4 months and GPU hw
      doesn't exactly have a lifetime in years) and consign them to
      using closed drivers for longer than necessary.  The changes are
      well contained and just plug into the driver new gpu functionality
      so they should be fairly regression proof.  I just want to give
      them a bit of a run on the hw AMD kindly sent me.

   b) drm prime/dma-buf interface code.  This is just infrastructure
      code to expose the dma-buf stuff to drm drivers and to userspace.
      I'm not planning on pushing any driver support in this cycle
      (except maybe exynos), but I'd like to get the infrastructure code
      in so for the next cycle I can start getting the driver support
      into the individual drivers.  We have started driver support for
      i915, nouveau and udl along with I think exynos and omap in
      staging.  However this code relies on the dma-buf tree being
      pulled into your tree first since it needs the latest interfaces
      from that tree.  I'll push to get that tree sent asap.

  (oh and any warnings you see in i915 are gcc's fault from what anyone
  can see)."

Fix up trivial conflicts in arch/x86/platform/mrst/mrst.c due to the new
msic_thermal_platform_data() thermal function being added next to the
tc35876x_platform_data() i2c device function..

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (326 commits)
  drm/i915: use DDC_ADDR instead of hard-coding it
  drm/radeon: use DDC_ADDR instead of hard-coding it
  drm: remove unneeded redefinition of DDC_ADDR
  drm/exynos: added virtual display driver.
  drm: allow loading an EDID as firmware to override broken monitor
  drm/exynos: enable hdmi audio feature
  drm/exynos: add default pixel format for plane
  drm/exynos: cleanup exynos_hdmi.h
  drm/exynos: add is_local member in exynos_drm_subdrv struct
  drm/exynos: add subdrv open/close functions
  drm/exynos: remove module of exynos drm subdrv
  drm/exynos: release pending pageflip events when closed
  drm/exynos: added new funtion to get/put dma address.
  drm/exynos: update gem and buffer framework.
  drm/exynos: added mode_fixup feature and code clean.
  drm/exynos: add HDMI version 1.4 support
  drm/exynos: remove exynos_mixer.h
  gma500: Fix mmap frambuffer
  drm/radeon: Drop radeon_gem_object_(un)pin.
  drm/radeon: Restrict offset for legacy display engine.
  ...
2012-03-22 13:08:22 -07:00
Linus Torvalds
b2094ef840 Updates of sound stuff for 3.4-rc1
Here is the first big update chunk of sound stuff for 3.4-rc1.
 
 In the common sound infrastructure, there are a few changes for
 dynamic PCM support (used in ASoC) and a few clean-ups.  Majority of
 changes are found, as usual, in HD-audio and ASoC.
 
 Some highlights of HD-audio changes:
 - All the long-standing static quirk codes for Realtek codec were
 finally removed by fixing and extending the Realtek auto-parser.
 
 - The mute-LED control is standardized over all HD-audio codec
   drivers using the extended vmaster hook.
 
 - The vmaster slave mixer elements are initialized to 0dB as default
   so that the user won't be annoyed by the silent output after
   updates, e.g. due to the additions of new elements.
 
 - Other many fix-ups for the misc HD-audio devices.
 
 In the ASoC side, this is a very active release, including a quite a
 few framework enhancements.  Some highlights:
 
 - Support for widgets not associated with a CODEC, an important part
   of the dynamic PCM framework.
 
 - A library factoring out the common code shared by dmaengine based
   DMA drivers contributed by Lars-Peter Clausen.  This will save a lot
   of code and make it much easier to deploy enhancements to
   dmaengine.
 
 - Support for binary controls, used for providing runtime
   configuration of algorithm coefficients.
 
 - A new DAPM widget type for regulator supplies allowing drivers for
   devices that can power down unused supplies while active to do
   without any per-driver code.
 
 - DAPM widgets for DAIs, initially giving a speed boost for playback
   startup and shutdown and also the basis for CODEC<->CODEC DAI link
   support.
 
 - Support for specifying the number of significant bits on audio
   interfaces, useful for allowing applications to know how much effort
   to put into generating data for a larger sample format.
 
 - Conversion of the FSI driver used on some SH processors to
   DMAEngine.
 
 - Conversion of EP93xx drivers to DMAEngine.
 
 - New CODEC drivers for Maxim MAX9768 and Wolfson Microelectronics
   WM2200.
 
 - Move audmux driver from arc/arm to sound/soc
 
 - McBSP move from arch/ to sound/ and updates
 
 Also, a few small updates and fixes for other drivers like au88x0,
 ymfpci, USB 6fire, USB usx2yaudio are included.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIcBAABAgAGBQJPawmuAAoJEGwxgFQ9KSmkdR0QALDDI/eUo4il40C33Gt/OSDz
 Wkz+FOwp2ddbIqHc0n41sV7+FV1MuyzQ37uSw8zhK724Vd4tCqX6O5K1uvS1iSbh
 jxHsy5XtfvCs2cjb61H6N+rfWcqC69gGhpc1mLoelj/PYl7S2iV5xOgTr4trVk2Y
 UN7Y13b4hzZvubRUozoTldaIgdhrj8D8KO7qQNYehyG19b4bJ00Rk4K5JdjsFwbE
 dTGl/ZUv50Fnx6PAWwqzh1a3cPabHA1TZDiKQM2nuE91e/Ecs4c7t1CRvW8m8mlr
 u4D4N8PJcCN4SPDd2YuVBgan4SQ0kxKTaup11bHSvAWai2zPX5xMB1yoJNxgjSMt
 5NHrGdR4+lQEVlBVXe2sWb4/3vE2kr2dtcGGR/FBFJTuLWDFFtRcnxeQJI8qRNUw
 UdwDuGXdActoc1cZz2dsKvXMOs0TKT6OCdQH+dHBglW/W8wMkVocZclUgbQM66/X
 gwvk0jfZ9p3UcKnYt3RkxiXQvAJsr8v0HhYcKvQCFhJArZufdeRHB7LCVRTm692Y
 /BKZgK4QHxtGw3Yc7emYidKeRSP1ml5QlvC4zMIoGqiahqa8LI8Qcb5knvIEmU8q
 kY5k0fVP+paf0dceAVXyFZsRB9AyX2eUdufDPifXtydQZgj4o9A7Sy/teWl77EgF
 Mafq4QUzo1U4i8JpAM4d
 =5FJq
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull updates of sound stuff from Takashi Iwai:
 "Here is the first big update chunk of sound stuff for 3.4-rc1.

  In the common sound infrastructure, there are a few changes for
  dynamic PCM support (used in ASoC) and a few clean-ups.  Majority of
  changes are found, as usual, in HD-audio and ASoC.

  Some highlights of HD-audio changes:

   - All the long-standing static quirk codes for Realtek codec were
     finally removed by fixing and extending the Realtek auto-parser.

   - The mute-LED control is standardized over all HD-audio codec
     drivers using the extended vmaster hook.

   - The vmaster slave mixer elements are initialized to 0dB as default
     so that the user won't be annoyed by the silent output after
     updates, e.g.  due to the additions of new elements.

   - Other many fix-ups for the misc HD-audio devices.

  In the ASoC side, this is a very active release, including a quite a
  few framework enhancements.  Some highlights:

   - Support for widgets not associated with a CODEC, an important part
     of the dynamic PCM framework.

   - A library factoring out the common code shared by dmaengine based
     DMA drivers contributed by Lars-Peter Clausen.  This will save a
     lot of code and make it much easier to deploy enhancements to
     dmaengine.

   - Support for binary controls, used for providing runtime
     configuration of algorithm coefficients.

   - A new DAPM widget type for regulator supplies allowing drivers for
     devices that can power down unused supplies while active to do
     without any per-driver code.

   - DAPM widgets for DAIs, initially giving a speed boost for playback
     startup and shutdown and also the basis for CODEC<->CODEC DAI link
     support.

   - Support for specifying the number of significant bits on audio
     interfaces, useful for allowing applications to know how much
     effort to put into generating data for a larger sample format.

   - Conversion of the FSI driver used on some SH processors to
     DMAEngine.

   - Conversion of EP93xx drivers to DMAEngine.

   - New CODEC drivers for Maxim MAX9768 and Wolfson Microelectronics
     WM2200.

   - Move audmux driver from arc/arm to sound/soc

   - McBSP move from arch/ to sound/ and updates

  Also, a few small updates and fixes for other drivers like au88x0,
  ymfpci, USB 6fire, USB usx2yaudio are included."

* tag 'sound-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (446 commits)
  ASoC: wm8994: Provide VMID mode control and fix default sequence
  ASoC: wm8994: Add missing break in resume
  ASoC: wm_hubs: Don't actively manage LINEOUT_VMID_BUF
  ASoC: pxa-ssp: atomically set stream active masks
  ASoC: fsl: p1022ds: tell the WM8776 codec driver that it's the master
  ASoC: Samsung: Added to support mono recording
  ALSA: hda - Fix build with CONFIG_PM=n
  ALSA: au88x0 - Avoid possible Oops at unbinding
  ALSA: usb-audio - Fix build error by consitification of rate list
  ASoC: core: Fix obscure leak of runtime array
  ALSA: pcm - Avoid GFP_ATOMIC in snd_pcm_link()
  ALSA: pcm: Constify the list in snd_pcm_hw_constraint_list
  ASoC: wm8996: Add 44.1kHz support
  ALSA: hda - Fix build of patch_sigmatel.c without CONFIG_SND_HDA_POWER_SAVE
  ASoC: mx27vis-aic32x4: Convert it to platform driver
  ALSA: hda - fix printing of high HDMI sample rates
  ALSA: ymfpci - Fix legacy registers on S3/S4 resume
  ALSA: control - Fixe a trailing white space error
  ALSA: hda - Add expose_enum_ctl flag to snd_hda_add_vmaster_hook()
  ALSA: hda - Add "Mute-LED Mode" enum control
  ...
2012-03-22 13:00:13 -07:00
Linus Torvalds
424a6f6ef9 SCSI updates on 20120319
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPZxSnAAoJEDeqqVYsXL0M0Y4IAMX0vrTVZbg6psA5/gMcWGRP
 CkFXEQ8n0PL2SCaj6BoDqamJFe5Nc7dnqxM0fGawB4S9vr3rHhiOlwO+NbV9zFYC
 2skBTpeL3sjgtN/jTBdfeeAa7xTYpu/XGyei0NS1A5c2AyMVXV0uYV2s4VNZxe44
 tVIn1OEzM2giZ9EB1OZslDMrg5XXm3MBIUECP0LbWUhBm/35caSFKzMXRwhh7WiK
 +AVmc2AZYtdEwuknDyiH7KlsaoB3vGL9pPrAUJzIgEhy2pOo2A7W72HfA4Fj+y6a
 uF9HBS5zciMp1+sGWry62AjNbWgin9BRlozBEO/lJhIfMGDV1nXEIJsOkOgkdoE=
 =1TxB
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

SCSI updates from James Bottomley:
 "The update includes the usual assortment of driver updates (lpfc,
  qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge
  amount of infrastructure work in the SAS library and transport class
  as well as an iSCSI update.  There's also a new SCSI based virtio
  driver."

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits)
  [SCSI] qla4xxx: Update driver version to 5.02.00-k15
  [SCSI] qla4xxx: trivial cleanup
  [SCSI] qla4xxx: Fix sparse warning
  [SCSI] qla4xxx: Add support for multiple session per host.
  [SCSI] qla4xxx: Export CHAP index as sysfs attribute
  [SCSI] scsi_transport: Export CHAP index as sysfs attribute
  [SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry
  [SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry
  [SCSI] pm8001: fix endian issue with code optimization.
  [SCSI] pm8001: Fix possible racing condition.
  [SCSI] pm8001: Fix bogus interrupt state flag issue.
  [SCSI] ipr: update PCI ID definitions for new adapters
  [SCSI] qla2xxx: handle default case in qla2x00_request_firmware()
  [SCSI] isci: improvements in driver unloading routine
  [SCSI] isci: improve phy event warnings
  [SCSI] isci: debug, provide state-enum-to-string conversions
  [SCSI] scsi_transport_sas: 'enable' phys on reset
  [SCSI] libsas: don't recover end devices attached to disabled phys
  [SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
  [SCSI] libsas: set attached device type and target protocols for local phys
  ...
2012-03-22 12:55:29 -07:00
Linus Torvalds
1ab142d499 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "This contains the usual set of updates and bugfixes to target-core +
  existing fabric module code, along with a handful of the patches
  destined for v3.3 stable.

  It also contains the necessary target-core infrastructure pieces
  required to run using tcm_qla2xxx.ko WWPNs with the new Qlogic Fibre
  Channel fabric module currently queued in target-pending/for-next-merge,
  and coming for round 2.

  The highlights for this series include:

   - Add target_submit_tmr() helper function for fabric task management
     (andy)
   - Convert tcm_fc to use target_submit_tmr() (andy)
   - Replace target core various cmd flags with a transport state (hch)
   - Convert loopback to use workqueue submission (hch)
   - Convert target core to use array_zalloc for tpg_lun_list (joern)
   - Convert target core to use array_zalloc for device_list (joern)
   - Add target core support for TMR_ABORT_TASK (nab)
   - Add target core se_sess->sess_kref + get/put helpers (nab)
   - Add target core se_node_acl->acl_kref for ->acl_free_comp usage
     (nab)
   - Convert iscsi-target to use target_put_session + sess_kref (nab)
   - Fix tcm_fc fc_exch memory leak in ft_send_resp_status (nab)
   - Fix ib_srpt srpt_handle_cmd send_ioctx->ioctx_kref leak on
     exception (nab)
   - Fix target core up handling of short INQUIRY buffers (roland)
   - Untangle target-core front-end and back-end meanings of max_sectors
     attribute (roland)
   - Set loopback residual field for SCSI commands (roland)
   - Fix target-core 16-bit target ports for SET TARGET PORT GROUPS
     emulation (roland)

  Thanks again to Andy, Christoph, Joern, Roland, and everyone who has
  contributed this round!"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (64 commits)
  ib_srpt: Fix srpt_handle_cmd send_ioctx->ioctx_kref leak on exception
  loopback: Fix transport_generic_allocate_tasks error handling
  iscsi-target: remove improper externs
  iscsi-target: Remove unused variables in iscsi_target_parameters.c
  target: remove obvious warnings
  target: Use array_zalloc for device_list
  target: Use array_zalloc for tpg_lun_list
  target: Fix sense code for unsupported SERVICE ACTION IN
  target: Remove hack to make READ CAPACITY(10) lie if thin provisioning is enabled
  target: Bump core version to v4.1.0-rc2-ml + fabric versions
  tcm_fc: Fix fc_exch memory leak in ft_send_resp_status
  target: Drop unused legacy target_core_fabric_ops API callers
  iscsi-target: Convert to use target_put_session + sess_kref
  target: Convert se_node_acl->acl_group removal to use ->acl_kref
  target: Add se_node_acl->acl_kref for ->acl_free_comp usage
  target: Add se_node_acl->acl_free_comp for NodeACL release path
  target: Add se_sess->sess_kref + get/put helpers
  target: Convert session_lock to irqsave
  target: Fix typo in drivers/target
  iscsi-target: Fix dynamic -> explict NodeACL pointer reference
  ...
2012-03-22 12:38:04 -07:00
Linus Torvalds
267d7b23dd md updates for 3.4
Mostly tidying up code in preparation for some bigger changes
 next time.
 A few bug fixes tagged for -stable.
 
 Main functionality change is that some RAID10 arrays can now
 grow to use extra space that may have been made available on the
 individual devices.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIVAwUAT2bLBjnsnt1WYoG5AQKN3xAAv1UlR5Kem5WN7Ex4lmR9xj3lr9dbURYT
 TtvrUuCy3pYYWdTuijb+IBqkbODF0kPDHIhUiBx9fXUfMavkp/b9heXS/vJ3pcH4
 1j99NUbOGL/AylD1TPRV9TQxGTKhEjK3n26bY0t/amLc92bWJaytMO1B9cz38LN+
 qx6ufpIepz4DPXXtPYpnkBR4cZ6L4/ZXQvjf5BqG6WfKwc+0Nyncg8ipYEqhBWy7
 R7ztF5yPo0yl96Wopa2KG91OroWflmyZo1DNYcbUbKtbNGGtYC92GFadOH+wNupM
 FnmXv10ivfVGU5w4SpshAwOg+4OSUqmWNsBxUhpYbf8ChbN+lOl0VZdH6UBxo19D
 3SqZWT/yz4I4HYd5rtr35MXFdOeBNM++CHQs4F68BLA0B6OcHfWsA9bvly2tnBVx
 iEBFPd277qWztUr8m6yz7AFf/0dgyXuIhuB3d7IkVrG5yG3FX6hPi2T0FSA33qMx
 Lwi5w6O4DREg5tG09xEYEnXgXe+PnB8HsKb1U/m76XMQ0UScvX6dLA6934Vg+DCv
 xf+AYqob0Tc/Op5I7h2PbVXq7DciNXwlX1WvM0m+TEaV+3fl1FB0VsCcANAV6JVn
 uRLmvtePQRt0hxAog2p7OsumVnxMhbuEo5h8rJMKWM7IbhueKNoz+gBwpcFLzBmY
 ygWc4peLQpE=
 =MGuM
 -----END PGP SIGNATURE-----

Merge tag 'md-3.4' of git://neil.brown.name/md

Pull md updates for 3.4 from Neil Brown:
 "Mostly tidying up code in preparation for some bigger changes next
  time.

  A few bug fixes tagged for -stable.

  Main functionality change is that some RAID10 arrays can now grow to
  use extra space that may have been made available on the individual
  devices."

Fixed up trivial conflicts with the k[un]map_atomic() cleanups in
drivers/md/bitmap.c.

* tag 'md-3.4' of git://neil.brown.name/md: (22 commits)
  md: Add judgement bb->unacked_exist in function md_ack_all_badblocks().
  md: fix clearing of the 'changed' flags for the bad blocks list.
  md/bitmap: discard CHUNK_BLOCK_SHIFT macro
  md/bitmap: remove unnecessary indirection when allocating.
  md/bitmap: remove some pointless locking.
  md/bitmap: change a 'goto' to a normal 'if' construct.
  md/bitmap: move printing of bitmap status to bitmap.c
  md/bitmap: remove some unused noise from bitmap.h
  md/raid10 - support resizing some RAID10 arrays.
  md/raid1: handle merge_bvec_fn in member devices.
  md/raid10: handle merge_bvec_fn in member devices.
  md: add proper merge_bvec handling to RAID0 and Linear.
  md: tidy up rdev_for_each usage.
  md/raid1,raid10: avoid deadlock during resync/recovery.
  md/bitmap: ensure to load bitmap when creating via sysfs.
  md: don't set md arrays to readonly on shutdown.
  md: allow re-add to failed arrays.
  md/raid5: use atomic_dec_return() instead of atomic_dec() and atomic_read().
  md: Use existed macros instead of numbers
  md/raid5: removed unused 'added_devices' variable.
  ...
2012-03-22 12:29:50 -07:00
Linus Torvalds
2390481546 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar.

Removes the Moorestown platform that nobody ever used.

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform: Move APIC ID validity check into platform APIC code
  x86/olpc/xo15/sci: Enable lid close wakeup control
  x86/geode/net5501: Add platform driver for Soekris Engineering net5501
  x86/geode/alix2: Supplement driver to include GPIO button support
  x86/mid/powerbtn: Use MSIC read/write instead of ipc_scu
  x86/mid/thermal: Turn off thermistor
  x86/mid/thermal: Add msic_thermal alias
  x86/mid/thermal: Convert to use Intel MSIC API
  x86/mid/scu_ipc: Remove Moorestown support
  x86/mid: Kill off Moorestown
  x86/mrst: Add msic_thermal platform support
  x86/config: Select MSIC MFD driver on Intel Medfield platform
  x86/mid: Remove Intel Moorestown
  x86/mrst: Set ISA bus type for fake MP IRQs
  x86/ioapic: Use legacy_pic to set correct gsi-irq mapping
2012-03-22 09:43:22 -07:00
Linus Torvalds
754b980077 Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MCE changes from Ingo Molnar.

* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix return value of mce_chrdev_read() when erst is disabled
  x86/mce: Convert static array of pointers to per-cpu variables
  x86/mce: Replace hard coded hex constants with symbolic defines
  x86/mce: Recognise machine check bank signature for data path error
  x86/mce: Handle "action required" errors
  x86/mce: Add mechanism to safely save information in MCE handler
  x86/mce: Create helper function to save addr/misc when needed
  HWPOISON: Add code to handle "action required" errors.
  HWPOISON: Clean up memory_failure() vs. __memory_failure()
2012-03-22 09:42:04 -07:00
Linus Torvalds
f06fc0c0de Merge branch 'x86-eficross-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/eficross (booting 32/64-bit kernel from 64/32-bit EFI) from Ingo Molnar

* 'x86-eficross-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Allow basic init with mixed 32/64-bit efi/kernel
  x86, efi: Add basic error handling
  x86, efi: Cleanup config table walking
  x86, efi: Convert printk to pr_*()
  x86, efi: Refactor efi_init() a bit
2012-03-22 09:31:31 -07:00
Linus Torvalds
e17fdf5c67 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asm changes from Ingo Molnar

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Include probe_roms.h in probe_roms.c
  x86/32: Print control and debug registers for kerenel context
  x86: Tighten dependencies of CPU_SUP_*_32
  x86/numa: Improve internode cache alignment
  x86: Fix the NMI nesting comments
  x86-64: Improve insn scheduling in SAVE_ARGS_IRQ
  x86-64: Fix CFI annotations for NMI nesting code
  bitops: Add missing parentheses to new get_order macro
  bitops: Optimise get_order()
  bitops: Adjust the comment on get_order() to describe the size==0 case
  x86/spinlocks: Eliminate TICKET_MASK
  x86-64: Handle byte-wise tail copying in memcpy() without a loop
  x86-64: Fix memcpy() to support sizes of 4Gb and above
  x86-64: Fix memset() to support sizes of 4Gb and above
  x86-64: Slightly shorten copy_page()
2012-03-22 09:13:24 -07:00
Linus Torvalds
95211279c5 Merge branch 'akpm' (Andrew's patch-bomb)
Merge first batch of patches from Andrew Morton:
 "A few misc things and all the MM queue"

* emailed from Andrew Morton <akpm@linux-foundation.org>: (92 commits)
  memcg: avoid THP split in task migration
  thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
  memcg: clean up existing move charge code
  mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
  mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
  mm/memcontrol.c: s/stealed/stolen/
  memcg: fix performance of mem_cgroup_begin_update_page_stat()
  memcg: remove PCG_FILE_MAPPED
  memcg: use new logic for page stat accounting
  memcg: remove PCG_MOVE_LOCK flag from page_cgroup
  memcg: simplify move_account() check
  memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
  memcg: kill dead prev_priority stubs
  memcg: remove PCG_CACHE page_cgroup flag
  memcg: let css_get_next() rely upon rcu_read_lock()
  cgroup: revert ss_id_lock to spinlock
  idr: make idr_get_next() good for rcu_read_lock()
  memcg: remove unnecessary thp check in page stat accounting
  memcg: remove redundant returns
  memcg: enum lru_list lru
  ...
2012-03-22 09:04:48 -07:00
Linus Torvalds
5375871d43 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc merge from Benjamin Herrenschmidt:
 "Here's the powerpc batch for this merge window.  It is going to be a
  bit more nasty than usual as in touching things outside of
  arch/powerpc mostly due to the big iSeriesectomy :-) We finally got
  rid of the bugger (legacy iSeries support) which was a PITA to
  maintain and that nobody really used anymore.

  Here are some of the highlights:

   - Legacy iSeries is gone.  Thanks Stephen ! There's still some bits
     and pieces remaining if you do a grep -ir series arch/powerpc but
     they are harmless and will be removed in the next few weeks
     hopefully.

   - The 'fadump' functionality (Firmware Assisted Dump) replaces the
     previous (equivalent) "pHyp assisted dump"...  it's a rewrite of a
     mechanism to get the hypervisor to do crash dumps on pSeries, the
     new implementation hopefully being much more reliable.  Thanks
     Mahesh Salgaonkar.

   - The "EEH" code (pSeries PCI error handling & recovery) got a big
     spring cleaning, motivated by the need to be able to implement a
     new backend for it on top of some new different type of firwmare.

     The work isn't complete yet, but a good chunk of the cleanups is
     there.  Note that this adds a field to struct device_node which is
     not very nice and which Grant objects to.  I will have a patch soon
     that moves that to a powerpc private data structure (hopefully
     before rc1) and we'll improve things further later on (hopefully
     getting rid of the need for that pointer completely).  Thanks Gavin
     Shan.

   - I dug into our exception & interrupt handling code to improve the
     way we do lazy interrupt handling (and make it work properly with
     "edge" triggered interrupt sources), and while at it found & fixed
     a wagon of issues in those areas, including adding support for page
     fault retry & fatal signals on page faults.

   - Your usual random batch of small fixes & updates, including a bunch
     of new embedded boards, both Freescale and APM based ones, etc..."

I fixed up some conflicts with the generalized irq-domain changes from
Grant Likely, hopefully correctly.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits)
  powerpc/ps3: Do not adjust the wrapper load address
  powerpc: Remove the rest of the legacy iSeries include files
  powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces
  init: Remove CONFIG_PPC_ISERIES
  powerpc: Remove FW_FEATURE ISERIES from arch code
  tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable
  powerpc/spufs: Fix double unlocks
  powerpc/5200: convert mpc5200 to use of_platform_populate()
  powerpc/mpc5200: add options to mpc5200_defconfig
  powerpc/mpc52xx: add a4m072 board support
  powerpc/mpc5200: update mpc5200_defconfig to fit for charon board
  Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup
  powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board
  powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
  MAINTAINERS: Update PowerPC 4xx tree
  powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
  powerpc: document the FSL MPIC message register binding
  powerpc: add support for MPIC message register API
  powerpc/fsl: Added aliased MSIIR register address to MSI node in dts
  powerpc/85xx: mpc8548cds - add 36-bit dts
  ...
2012-03-21 18:55:10 -07:00
Linus Torvalds
ad12ab259d Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull gfs2 changes from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Change truncate page allocation to be GFP_NOFS
  GFS2: call gfs2_write_alloc_required for each chunk
  GFS2: Clean up log flush header writing
  GFS2: Remove a __GFP_NOFAIL allocation
  GFS2: Flush pending glock work when evicting an inode
  GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd
  GFS2: Eliminate sd_rindex_mutex
  GFS2: Unlock rindex mutex on glock error
  GFS2: Make bd_cmp() static
  GFS2: Sort the ordered write list
  GFS2: FITRIM ioctl support
  GFS2: Move two functions from log.c to lops.c
  GFS2: glock statistics gathering
2012-03-21 18:00:03 -07:00
Naoya Horiguchi
d8c37c4806 thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
These macros will be used in a later patch, where all usages are expected
to be optimized away without #ifdef CONFIG_TRANSPARENT_HUGEPAGE.  But to
detect unexpected usages, we convert the existing BUG() to BUILD_BUG().

[akpm@linux-foundation.org: fix build in mm/pgtable-generic.c]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:02 -07:00
KAMEZAWA Hiroyuki
4331f7d339 memcg: fix performance of mem_cgroup_begin_update_page_stat()
mem_cgroup_begin_update_page_stat() should be very fast because it's
called very frequently.  Now, it needs to look up page_cgroup and its
memcg....this is slow.

This patch adds a global variable to check "any memcg is moving or not".
With this, the caller doesn't need to visit page_cgroup and memcg.

Here is a test result.  A test program makes page faults onto a file,
MAP_SHARED and makes each page's page_mapcount(page) > 1, and free the
range by madvise() and page fault again.  This program causes 26214400
times of page fault onto a file(size was 1G.) and shows shows the cost of
mem_cgroup_begin_update_page_stat().

Before this patch for mem_cgroup_begin_update_page_stat()

    [kamezawa@bluextal test]$ time ./mmap 1G

    real    0m21.765s
    user    0m5.999s
    sys     0m15.434s

    27.46%     mmap  mmap               [.] reader
    21.15%     mmap  [kernel.kallsyms]  [k] page_fault
     9.17%     mmap  [kernel.kallsyms]  [k] filemap_fault
     2.96%     mmap  [kernel.kallsyms]  [k] __do_fault
     2.83%     mmap  [kernel.kallsyms]  [k] __mem_cgroup_begin_update_page_stat

After this patch

    [root@bluextal test]# time ./mmap 1G

    real    0m21.373s
    user    0m6.113s
    sys     0m15.016s

In usual path, calls to __mem_cgroup_begin_update_page_stat() goes away.

Note: we may be able to remove this optimization in future if
      we can get pointer to memcg directly from struct page.

[akpm@linux-foundation.org: don't return a void]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Greg Thelen <gthelen@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:02 -07:00
KAMEZAWA Hiroyuki
2ff76f1193 memcg: remove PCG_FILE_MAPPED
With the new lock scheme for updating memcg's page stat, we don't need a
flag PCG_FILE_MAPPED which was duplicated information of page_mapped().

[hughd@google.com: cosmetic fix]
[hughd@google.com: add comment to MEM_CGROUP_CHARGE_TYPE_MAPPED case in __mem_cgroup_uncharge_common()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Greg Thelen <gthelen@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
KAMEZAWA Hiroyuki
89c06bd52f memcg: use new logic for page stat accounting
Now, page-stat-per-memcg is recorded into per page_cgroup flag by
duplicating page's status into the flag.  The reason is that memcg has a
feature to move a page from a group to another group and we have race
between "move" and "page stat accounting",

Under current logic, assume CPU-A and CPU-B.  CPU-A does "move" and CPU-B
does "page stat accounting".

When CPU-A goes 1st,

            CPU-A                           CPU-B
                                    update "struct page" info.
    move_lock_mem_cgroup(memcg)
    see pc->flags
    copy page stat to new group
    overwrite pc->mem_cgroup.
    move_unlock_mem_cgroup(memcg)
                                    move_lock_mem_cgroup(mem)
                                    set pc->flags
                                    update page stat accounting
                                    move_unlock_mem_cgroup(mem)

stat accounting is guarded by move_lock_mem_cgroup() and "move" logic
(CPU-A) doesn't see changes in "struct page" information.

But it's costly to have the same information both in 'struct page' and
'struct page_cgroup'.  And, there is a potential problem.

For example, assume we have PG_dirty accounting in memcg.
PG_..is a flag for struct page.
PCG_ is a flag for struct page_cgroup.
(This is just an example. The same problem can be found in any
 kind of page stat accounting.)

	  CPU-A                               CPU-B
      TestSet PG_dirty
      (delay)                        TestClear PG_dirty
                                     if (TestClear(PCG_dirty))
                                          memcg->nr_dirty--
      if (TestSet(PCG_dirty))
          memcg->nr_dirty++

Here, memcg->nr_dirty = +1, this is wrong.  This race was reported by Greg
Thelen <gthelen@google.com>.  Now, only FILE_MAPPED is supported but
fortunately, it's serialized by page table lock and this is not real bug,
_now_,

If this potential problem is caused by having duplicated information in
struct page and struct page_cgroup, we may be able to fix this by using
original 'struct page' information.  But we'll have a problem in "move
account"

Assume we use only PG_dirty.

         CPU-A                   CPU-B
    TestSet PG_dirty
    (delay)                    move_lock_mem_cgroup()
                               if (PageDirty(page))
                                      new_memcg->nr_dirty++
                               pc->mem_cgroup = new_memcg;
                               move_unlock_mem_cgroup()
    move_lock_mem_cgroup()
    memcg = pc->mem_cgroup
    new_memcg->nr_dirty++

accounting information may be double-counted.  This was original reason to
have PCG_xxx flags but it seems PCG_xxx has another problem.

I think we need a bigger lock as

     move_lock_mem_cgroup(page)
     TestSetPageDirty(page)
     update page stats (without any checks)
     move_unlock_mem_cgroup(page)

This fixes both of problems and we don't have to duplicate page flag into
page_cgroup.  Please note: move_lock_mem_cgroup() is held only when there
are possibility of "account move" under the system.  So, in most path,
status update will go without atomic locks.

This patch introduces mem_cgroup_begin_update_page_stat() and
mem_cgroup_end_update_page_stat() both should be called at modifying
'struct page' information if memcg takes care of it.  as

     mem_cgroup_begin_update_page_stat()
     modify page information
     mem_cgroup_update_page_stat()
     => never check any 'struct page' info, just update counters.
     mem_cgroup_end_update_page_stat().

This patch is slow because we need to call begin_update_page_stat()/
end_update_page_stat() regardless of accounted will be changed or not.  A
following patch adds an easy optimization and reduces the cost.

[akpm@linux-foundation.org: s/lock/locked/]
[hughd@google.com: fix deadlock by avoiding stat lock when anon]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Greg Thelen <gthelen@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
KAMEZAWA Hiroyuki
312734c04e memcg: remove PCG_MOVE_LOCK flag from page_cgroup
PCG_MOVE_LOCK is used for bit spinlock to avoid race between overwriting
pc->mem_cgroup and page statistics accounting per memcg.  This lock helps
to avoid the race but the race is very rare because moving tasks between
cgroup is not a usual job.  So, it seems using 1bit per page is too
costly.

This patch changes this lock as per-memcg spinlock and removes
PCG_MOVE_LOCK.

If smaller lock is required, we'll be able to add some hashes but I'd like
to start from this.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
Konstantin Khlebnikov
a710920cae memcg: kill dead prev_priority stubs
This code was removed in 25edde0332 ("vmscan: kill prev_priority
completely")

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
KAMEZAWA Hiroyuki
b24028572f memcg: remove PCG_CACHE page_cgroup flag
We record 'the page is cache' with the PCG_CACHE bit in page_cgroup.
Here, "CACHE" means anonymous user pages (and SwapCache).  This doesn't
include shmem.

Considering callers, at charge/uncharge, the caller should know what the
page is and we don't need to record it by using one bit per page.

This patch removes PCG_CACHE bit and make callers of
mem_cgroup_charge_statistics() to specify what the page is.

About page migration: Mapping of the used page is not touched during migra
tion (see page_remove_rmap) so we can rely on it and push the correct
charge type down to __mem_cgroup_uncharge_common from end_migration for
unused page.  The force flag was misleading was abused for skipping the
needless page_mapped() / PageCgroupMigration() check, as we know the
unused page is no longer mapped and cleared the migration flag just a few
lines up.  But doing the checks is no biggie and it's not worth adding
another flag just to skip them.

[akpm@linux-foundation.org: checkpatch fixes]
[hughd@google.com: fix PageAnon uncharging]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
Hugh Dickins
42aee6c495 cgroup: revert ss_id_lock to spinlock
Commit c1e2ee2dc4 ("memcg: replace ss->id_lock with a rwlock") has now
been seen to cause the unfair behavior we should have expected from
converting a spinlock to an rwlock: softlockup in cgroup_mkdir(), whose
get_new_cssid() is waiting for the wlock, while there are 19 tasks using
the rlock in css_get_next() to get on with their memcg workload (in an
artificial test, admittedly).  Yet lib/idr.c was made suitable for RCU
way back: revert that commit, restoring ss->id_lock to a spinlock.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
Hugh Dickins
31a79235fc memcg: replace MEM_CONT by MEM_RES_CTLR
Correct an #endif comment in memcontrol.h from MEM_CONT to MEM_RES_CTLR.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:00 -07:00
Kautuk Consul
8d13bddd11 page_alloc.c: remove add_from_early_node_map()
add_from_early_node_map() is unused.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:00 -07:00
Steven Truelove
40716e2924 hugetlbfs: fix alignment of huge page requests
When calling shmget() with SHM_HUGETLB, shmget aligns the request size to
PAGE_SIZE, but this is not sufficient.

Modify hugetlb_file_setup() to align requests to the huge page size, and
to accept an address argument so that all alignment checks can be
performed in hugetlb_file_setup(), rather than in its callers.  Change
newseg() and mmap_pgoff() to match the new prototype and eliminate a now
redundant alignment check.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Steven Truelove <steven.truelove@utoronto.ca>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
David Rientjes
05af2e104a mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat()
sync_mm_rss() can only be used for current to avoid race conditions in
iterating and clearing its per-task counters.  Remove the task argument
for it and its helper function, __sync_task_rss_stat(), to avoid thinking
it can be used safely for anything other than current.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
David Gibson
90481622d7 hugepages: fix use after free bug in "quota" handling
hugetlbfs_{get,put}_quota() are badly named.  They don't interact with the
general quota handling code, and they don't much resemble its behaviour.
Rather than being about maintaining limits on on-disk block usage by
particular users, they are instead about maintaining limits on in-memory
page usage (including anonymous MAP_PRIVATE copied-on-write pages)
associated with a particular hugetlbfs filesystem instance.

Worse, they work by having callbacks to the hugetlbfs filesystem code from
the low-level page handling code, in particular from free_huge_page().
This is a layering violation of itself, but more importantly, if the
kernel does a get_user_pages() on hugepages (which can happen from KVM
amongst others), then the free_huge_page() can be delayed until after the
associated inode has already been freed.  If an unmount occurs at the
wrong time, even the hugetlbfs superblock where the "quota" limits are
stored may have been freed.

Andrew Barry proposed a patch to fix this by having hugepages, instead of
storing a pointer to their address_space and reaching the superblock from
there, had the hugepages store pointers directly to the superblock,
bumping the reference count as appropriate to avoid it being freed.
Andrew Morton rejected that version, however, on the grounds that it made
the existing layering violation worse.

This is a reworked version of Andrew's patch, which removes the extra, and
some of the existing, layering violation.  It works by introducing the
concept of a hugepage "subpool" at the lower hugepage mm layer - that is a
finite logical pool of hugepages to allocate from.  hugetlbfs now creates
a subpool for each filesystem instance with a page limit set, and a
pointer to the subpool gets added to each allocated hugepage, instead of
the address_space pointer used now.  The subpool has its own lifetime and
is only freed once all pages in it _and_ all other references to it (i.e.
superblocks) are gone.

subpools are optional - a NULL subpool pointer is taken by the code to
mean that no subpool limits are in effect.

Previous discussion of this bug found in:  "Fix refcounting in hugetlbfs
quota handling.". See:  https://lkml.org/lkml/2011/8/11/28 or
http://marc.info/?l=linux-mm&m=126928970510627&w=1

v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
alloc_huge_page() - since it already takes the vma, it is not necessary.

Signed-off-by: Andrew Barry <abarry@cray.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
David Gibson
a1d776ee31 hugetlb: cleanup hugetlb.h
Make a couple of small cleanups to linux/include/hugetlb.h.  The
set_file_hugepages() function, which was not used anywhere is removed,
and the hugetlbfs_config and hugetlbfs_inode_info structures with its
HUGETLBFS_I helper function are moved into inode.c, the only place they
were used.

These structures are really linked to the hugetlbfs filesystem
specifically not to hugepage mm handling in general, so they belong in
the filesystem code not in a generally available header.

It would be nice to move the hugetlbfs_sb_info (superblock) structure in
there as well, but it's currently needed in a number of places via the
hstate_vma() and hstate_inode().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andrew Barry <abarry@cray.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
Mel Gorman
cc9a6c8776 cpuset: mm: reduce large amounts of memory barrier related damage v3
Commit c0ff7453bb ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.

[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths.  This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32.  The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.

For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.

This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side.  This is much cheaper on some architectures, including x86.  The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.

While updating the nodemask, a check is made to see if a false failure
is a risk.  If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.

In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
actual results were

                             3.3.0-rc3          3.3.0-rc3
                             rc3-vanilla        nobarrier-v2r1
    Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
    Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
    Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
    Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
    Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
    Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
    Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
    Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
    Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
    Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
    MMTests Statistics: duration
    Sys Time Running Test (seconds)             135.68    132.17
    User+Sys Time Running Test (seconds)         164.2    160.13
    Total Elapsed Time (seconds)                123.46    120.87

The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected).  The
actual number of page faults is noticeably improved.

For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.

To test the actual bug the commit fixed I opened two terminals.  The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data.  In a second window, the nodemask of the
cpuset was continually randomised in a loop.

Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
David Rientjes
e845e19936 mm, memcg: pass charge order to oom killer
The oom killer typically displays the allocation order at the time of oom
as a part of its diangostic messages (for global, cpuset, and mempolicy
ooms).

The memory controller may also pass the charge order to the oom killer so
it can emit the same information.  This is useful in determining how large
the memory allocation is that triggered the oom killer.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
Konstantin Khlebnikov
f0cb3c76ae mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug
This cpu hotplug hook was accidentally removed in commit 00a62ce91e
("mm: fix Committed_AS underflow on large NR_CPUS environment")

The visible effect of this accident: some pages are borrowed in per-cpu
page-vectors.  Truncate can deal with it, but these pages cannot be
reused while this cpu is offline.  So this is like a temporary memory
leak.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:58 -07:00
Dean Nelson
385de35722 thp: allow a hwpoisoned head page to be put back to LRU
Andrea Arcangeli pointed out to me that a check in __memory_failure()
which was intended to prevent THP tail pages from being checked for the
absence of the PG_lru flag (something that is always the case), was also
preventing THP head pages from being checked.

A THP head page could actually benefit from the call to shake_page() by
ending up being put back to a LRU, provided it had been waiting in a
pagevec array.

Andrea suggested that the "!PageTransCompound(p)" in the if-statement
should be replaced by a "!PageTransTail(p)", thus allowing THP head pages
to be checked and possibly shaken.

Signed-off-by: Dean Nelson <dnelson@redhat.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:58 -07:00
David Rientjes
08ab9b10d4 mm, oom: force oom kill on sysrq+f
The oom killer chooses not to kill a thread if:

 - an eligible thread has already been oom killed and has yet to exit,
   and

 - an eligible thread is exiting but has yet to free all its memory and
   is not the thread attempting to currently allocate memory.

SysRq+F manually invokes the global oom killer to kill a memory-hogging
task.  This is normally done as a last resort to free memory when no
progress is being made or to test the oom killer itself.

For both uses, we always want to kill a thread and never defer.  This
patch causes SysRq+F to always kill an eligible thread and can be used to
force a kill even if another oom killed thread has failed to exit.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:58 -07:00
Siddhesh Poyarekar
b76437579d procfs: mark thread stack correctly in proc/<pid>/maps
Stack for a new thread is mapped by userspace code and passed via
sys_clone.  This memory is currently seen as anonymous in
/proc/<pid>/maps, which makes it difficult to ascertain which mappings
are being used for thread stacks.  This patch uses the individual task
stack pointers to determine which vmas are actually thread stacks.

For a multithreaded program like the following:

	#include <pthread.h>

	void *thread_main(void *foo)
	{
		while(1);
	}

	int main()
	{
		pthread_t t;
		pthread_create(&t, NULL, thread_main, NULL);
		pthread_join(t, NULL);
	}

proc/PID/maps looks like the following:

    00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since
the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000) but
that is not always a reliable way to find out which vma is a thread
stack.  Also, /proc/PID/maps and /proc/PID/task/TID/maps has the same
content.

With this patch in place, /proc/PID/task/TID/maps are treated as 'maps
as the task would see it' and hence, only the vma that that task uses as
stack is marked as [stack].  All other 'stack' vmas are marked as
anonymous memory.  /proc/PID/maps acts as a thread group level view,
where all thread stack vmas are marked as [stack:TID] where TID is the
process ID of the task that uses that vma as stack, while the process
stack is marked as [stack].

So /proc/PID/maps will look like this:

    00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack:1442]
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Thus marking all vmas that are used as stacks by the threads in the
thread group along with the process stack.  The task level maps will
however like this:

    00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack]
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

where only the vma that is being used as a stack by *that* task is
marked as [stack].

Analogous changes have been made to /proc/PID/smaps,
/proc/PID/numa_maps, /proc/PID/task/TID/smaps and
/proc/PID/task/TID/numa_maps. Relevant snippets from smaps and
numa_maps:

    [siddhesh@localhost ~ ]$ pgrep a.out
    1441
    [siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack"
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack:1442]
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack"
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack"
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack"
    7f8a44492000 default stack:1442 anon=2 dirty=2 N0=2
    7fff6273a000 default stack anon=3 dirty=3 N0=3
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack"
    7f8a44492000 default stack anon=2 dirty=2 N0=2
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack"
    7fff6273a000 default stack anon=3 dirty=3 N0=3

[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:58 -07:00
Xiao Guangrong
978ea78b65 rmap: remove __anon_vma_link() declaration
This declaration is not used anymore, remove it.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:57 -07:00
Konstantin Khlebnikov
ce1744f4ed mm: replace PAGE_MIGRATION with IS_ENABLED(CONFIG_MIGRATION)
Since commit 2a11c8ea20 ("kconfig: Introduce IS_ENABLED(),
IS_BUILTIN() and IS_MODULE()") there is a generic grep-friendly method
for checking config options in C expressions.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:57 -07:00
Naoya Horiguchi
e873c49fbf pagemap: export KPF_THP
This flag shows that a given page is a subpage of a transparent hugepage.
It helps us debug and test the kernel by showing physical address of thp.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:57 -07:00
Naoya Horiguchi
025c5b2451 thp: optimize away unnecessary page table locking
Currently when we check if we can handle thp as it is or we need to split
it into regular sized pages, we hold page table lock prior to check
whether a given pmd is mapping thp or not.  Because of this, when it's not
"huge pmd" we suffer from unnecessary lock/unlock overhead.  To remove it,
this patch introduces a optimized check function and replace several
similar logics with it.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:57 -07:00
Rik van Riel
aff622495c vmscan: only defer compaction for failed order and higher
Currently a failed order-9 (transparent hugepage) compaction can lead to
memory compaction being temporarily disabled for a memory zone.  Even if
we only need compaction for an order 2 allocation, eg.  for jumbo frames
networking.

The fix is relatively straightforward: keep track of the highest order at
which compaction is succeeding, and only defer compaction for orders at
which compaction is failing.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:56 -07:00
Rik van Riel
7be62de99a vmscan: kswapd carefully call compaction
With CONFIG_COMPACTION enabled, kswapd does not try to free contiguous
free pages, even when it is woken for a higher order request.

This could be bad for eg.  jumbo frame network allocations, which are done
from interrupt context and cannot compact memory themselves.  Higher than
before allocation failure rates in the network receive path have been
observed in kernels with compaction enabled.

Teach kswapd to defragment the memory zones in a node, but only if
required and compaction is not deferred in a zone.

[akpm@linux-foundation.org: reduce scope of zones_need_compaction]
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:56 -07:00
Rik van Riel
67f96aa252 mm: make swapin readahead skip over holes
Ever since abandoning the virtual scan of processes, for scalability
reasons, swap space has been a little more fragmented than before.  This
can lead to the situation where a large memory user is killed, swap space
ends up full of "holes" and swapin readahead is totally ineffective.

On my home system, after killing a leaky firefox it took over an hour to
page just under 2GB of memory back in, slowing the virtual machines down
to a crawl.

This patch makes swapin readahead simply skip over holes, instead of
stopping at them.  This allows the system to swap things back in at rates
of several MB/second, instead of a few hundred kB/second.

The checks done in valid_swaphandles are already done in
read_swap_cache_async as well, allowing us to remove a fair amount of
code.

[akpm@linux-foundation.org: fix it for page_cluster >= 32]
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Adrian Drzewiecki <z@drze.net>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:56 -07:00
Konstantin Khlebnikov
69c978232a mm: make get_mm_counter static-inline
Make get_mm_counter() always static inline, it is simple enough for that.
And remove unused set_mm_counter()

bloat-o-meter:

add/remove: 0/1 grow/shrink: 4/12 up/down: 99/-341 (-242)
function                                     old     new   delta
try_to_unmap_one                             886     952     +66
sys_remap_file_pages                        1214    1230     +16
dup_mm                                      1684    1700     +16
do_exit                                     2277    2278      +1
zap_page_range                               208     205      -3
unmap_region                                 304     296      -8
static.oom_kill_process                      554     546      -8
try_to_unmap_file                           1716    1700     -16
getrusage                                    925     909     -16
flush_old_exec                              1704    1688     -16
static.dump_header                           416     390     -26
acct_update_integrals                        218     187     -31
do_task_stat                                2986    2954     -32
get_mm_counter                                34       -     -34
xacct_add_tsk                                371     334     -37
task_statm                                   172     118     -54
task_mem                                     383     323     -60

try_to_unmap_one() grows because update_hiwater_rss() now completely inline.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:55 -07:00
Andrea Arcangeli
1a5a9906d4 mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode
In some cases it may happen that pmd_none_or_clear_bad() is called with
the mmap_sem hold in read mode.  In those cases the huge page faults can
allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a
false positive from pmd_bad() that will not like to see a pmd
materializing as trans huge.

It's not khugepaged causing the problem, khugepaged holds the mmap_sem
in write mode (and all those sites must hold the mmap_sem in read mode
to prevent pagetables to go away from under them, during code review it
seems vm86 mode on 32bit kernels requires that too unless it's
restricted to 1 thread per process or UP builds).  The race is only with
the huge pagefaults that can convert a pmd_none() into a
pmd_trans_huge().

Effectively all these pmd_none_or_clear_bad() sites running with
mmap_sem in read mode are somewhat speculative with the page faults, and
the result is always undefined when they run simultaneously.  This is
probably why it wasn't common to run into this.  For example if the
madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page
fault, the hugepage will not be zapped, if the page fault runs first it
will be zapped.

Altering pmd_bad() not to error out if it finds hugepmds won't be enough
to fix this, because zap_pmd_range would then proceed to call
zap_pte_range (which would be incorrect if the pmd become a
pmd_trans_huge()).

The simplest way to fix this is to read the pmd in the local stack
(regardless of what we read, no need of actual CPU barriers, only
compiler barrier needed), and be sure it is not changing under the code
that computes its value.  Even if the real pmd is changing under the
value we hold on the stack, we don't care.  If we actually end up in
zap_pte_range it means the pmd was not none already and it was not huge,
and it can't become huge from under us (khugepaged locking explained
above).

All we need is to enforce that there is no way anymore that in a code
path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad
can run into a hugepmd.  The overhead of a barrier() is just a compiler
tweak and should not be measurable (I only added it for THP builds).  I
don't exclude different compiler versions may have prevented the race
too by caching the value of *pmd on the stack (that hasn't been
verified, but it wouldn't be impossible considering
pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines
and there's no external function called in between pmd_trans_huge and
pmd_none_or_clear_bad).

		if (pmd_trans_huge(*pmd)) {
			if (next-addr != HPAGE_PMD_SIZE) {
				VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
				split_huge_page_pmd(vma->vm_mm, pmd);
			} else if (zap_huge_pmd(tlb, vma, pmd, addr))
				continue;
			/* fall through */
		}
		if (pmd_none_or_clear_bad(pmd))

Because this race condition could be exercised without special
privileges this was reported in CVE-2012-1179.

The race was identified and fully explained by Ulrich who debugged it.
I'm quoting his accurate explanation below, for reference.

====== start quote =======
      mapcount 0 page_mapcount 1
      kernel BUG at mm/huge_memory.c:1384!

    At some point prior to the panic, a "bad pmd ..." message similar to the
    following is logged on the console:

      mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).

    The "bad pmd ..." message is logged by pmd_clear_bad() before it clears
    the page's PMD table entry.

        143 void pmd_clear_bad(pmd_t *pmd)
        144 {
    ->  145         pmd_ERROR(*pmd);
        146         pmd_clear(pmd);
        147 }

    After the PMD table entry has been cleared, there is an inconsistency
    between the actual number of PMD table entries that are mapping the page
    and the page's map count (_mapcount field in struct page). When the page
    is subsequently reclaimed, __split_huge_page() detects this inconsistency.

       1381         if (mapcount != page_mapcount(page))
       1382                 printk(KERN_ERR "mapcount %d page_mapcount %d\n",
       1383                        mapcount, page_mapcount(page));
    -> 1384         BUG_ON(mapcount != page_mapcount(page));

    The root cause of the problem is a race of two threads in a multithreaded
    process. Thread B incurs a page fault on a virtual address that has never
    been accessed (PMD entry is zero) while Thread A is executing an madvise()
    system call on a virtual address within the same 2 MB (huge page) range.

               virtual address space
              .---------------------.
              |                     |
              |                     |
            .-|---------------------|
            | |                     |
            | |                     |<-- B(fault)
            | |                     |
      2 MB  | |/////////////////////|-.
      huge <  |/////////////////////|  > A(range)
      page  | |/////////////////////|-'
            | |                     |
            | |                     |
            '-|---------------------|
              |                     |
              |                     |
              '---------------------'

    - Thread A is executing an madvise(..., MADV_DONTNEED) system call
      on the virtual address range "A(range)" shown in the picture.

    sys_madvise
      // Acquire the semaphore in shared mode.
      down_read(&current->mm->mmap_sem)
      ...
      madvise_vma
        switch (behavior)
        case MADV_DONTNEED:
             madvise_dontneed
               zap_page_range
                 unmap_vmas
                   unmap_page_range
                     zap_pud_range
                       zap_pmd_range
                         //
                         // Assume that this huge page has never been accessed.
                         // I.e. content of the PMD entry is zero (not mapped).
                         //
                         if (pmd_trans_huge(*pmd)) {
                             // We don't get here due to the above assumption.
                         }
                         //
                         // Assume that Thread B incurred a page fault and
             .---------> // sneaks in here as shown below.
             |           //
             |           if (pmd_none_or_clear_bad(pmd))
             |               {
             |                 if (unlikely(pmd_bad(*pmd)))
             |                     pmd_clear_bad
             |                     {
             |                       pmd_ERROR
             |                         // Log "bad pmd ..." message here.
             |                       pmd_clear
             |                         // Clear the page's PMD entry.
             |                         // Thread B incremented the map count
             |                         // in page_add_new_anon_rmap(), but
             |                         // now the page is no longer mapped
             |                         // by a PMD entry (-> inconsistency).
             |                     }
             |               }
             |
             v
    - Thread B is handling a page fault on virtual address "B(fault)" shown
      in the picture.

    ...
    do_page_fault
      __do_page_fault
        // Acquire the semaphore in shared mode.
        down_read_trylock(&mm->mmap_sem)
        ...
        handle_mm_fault
          if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
              // We get here due to the above assumption (PMD entry is zero).
              do_huge_pmd_anonymous_page
                alloc_hugepage_vma
                  // Allocate a new transparent huge page here.
                ...
                __do_huge_pmd_anonymous_page
                  ...
                  spin_lock(&mm->page_table_lock)
                  ...
                  page_add_new_anon_rmap
                    // Here we increment the page's map count (starts at -1).
                    atomic_set(&page->_mapcount, 0)
                  set_pmd_at
                    // Here we set the page's PMD entry which will be cleared
                    // when Thread A calls pmd_clear_bad().
                  ...
                  spin_unlock(&mm->page_table_lock)

    The mmap_sem does not prevent the race because both threads are acquiring
    it in shared mode (down_read).  Thread B holds the page_table_lock while
    the page's map count and PMD table entry are updated.  However, Thread A
    does not synchronize on that lock.

====== end quote =======

[akpm@linux-foundation.org: checkpatch fixes]
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Jones <davej@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>		[2.6.38+]
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:54 -07:00
Linus Torvalds
e2a0883e40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
  yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  ext4: initialization of ext4_li_mtx needs to be done earlier
  debugfs-related mode_t whack-a-mole
  hfsplus: add an ioctl to bless files
  hfsplus: change finder_info to u32
  hfsplus: initialise userflags
  qnx4: new helper - try_extent()
  qnx4: get rid of qnx4_bread/qnx4_getblk
  take removal of PF_FORKNOEXEC to flush_old_exec()
  trim includes in inode.c
  um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
  um: embed ->stub_pages[] into mmu_context
  gadgetfs: list_for_each_safe() misuse
  ocfs2: fix leaks on failure exits in module_init
  ecryptfs: make register_filesystem() the last potential failure exit
  ntfs: forgets to unregister sysctls on register_filesystem() failure
  logfs: missing cleanup on register_filesystem() failure
  jfs: mising cleanup on register_filesystem() failure
  make configfs_pin_fs() return root dentry on success
  configfs: configfs_create_dir() has parent dentry in dentry->d_parent
  configfs: sanitize configfs_create()
  ...
2012-03-21 13:36:41 -07:00