remove lots of trailing spaces

whitespace only changes
author: Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-23 18:03:14 +0200
committer: Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-23 18:03:14 +0200
commit: c51729b9c122e6169103be1a0f0a133ba2bcbef6
tree: f30c3c7002e35b8a6121e5ba00020a001a215162
parent: d9dcbafc88dd396d1e7e3b84c9ed37b4afdbc1aa
42 files changed, 435 insertions, 435 deletions
diff --git a/README.md b/README.md
index 196209d..a129955 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@ Introduction
 
 Project Etnaviv is an open source user-space driver for the Vivante GCxxx series of embedded GPUs.
 
-A Mesa fork with the etnaviv driver can be found in the [laanwj/mesa](https://github.com/laanwj/mesa) repository. 
+A Mesa fork with the etnaviv driver can be found in the [laanwj/mesa](https://github.com/laanwj/mesa) repository.
 At the moment, this driver provides OpenGL ES 1.0 and 2.0 accelerated rendering direct to framebuffer (fbdev).
 This driver has been used to run glquake and d2x and other GLES games so it should be fairly stable.
 There may still be quite a few rendering bugs, specific bug reports are very welcome.
@@ -26,7 +26,7 @@ MIPS-based:
 
 See also [wikipedia](https://en.wikipedia.org/wiki/Vivante_Corporation).
 
-For the Vivante GPUs on many platforms feature bits have been determined, these can be found in 
+For the Vivante GPUs on many platforms feature bits have been determined, these can be found in
 [doc/gpus_comparison.html](http://dev.visucore.com/etna_viv/gpus_comparison.html).
 
 Compatibility
@@ -45,8 +45,8 @@ The 2D demos do work for GC2000 and should work on all other known Vivante chips
 Building
 =========
 
-The build process is made more complicated by the existence of many different kernel drivers, with their subtly 
-different interface (different headers, different offsets for fields, different management of context, and so on). 
+The build process is made more complicated by the existence of many different kernel drivers, with their subtly
+different interface (different headers, different offsets for fields, different management of context, and so on).
 These values for environment variable `GCABI` are supported out of the box:
 
 - `dove`: Marvell Dove, newer drivers (0.8.0.3184)
@@ -87,8 +87,8 @@ very new drivers there is an ioctl call `gcvHAL_VERSION` that returns the major,
 Linux cross compile
 --------------------
 
-For non-Android Linux ARM cross compile, create a script like this (example for CuBox) to set up the build environment. 
-When command stream dumping the egl demos, don't forget to also copy the EGL/GLES2/KDR headers from 
+For non-Android Linux ARM cross compile, create a script like this (example for CuBox) to set up the build environment.
+When command stream dumping the egl demos, don't forget to also copy the EGL/GLES2/KDR headers from
 some place and put them in a directory `include` under the location
 where the script is installed, and get the `libEGL.so` and `libGLESv2.so` from the device into `lib`:
 
@@ -129,7 +129,7 @@ environment variables, for example like this:
     #export GCABI="v4"
     export GCABI="arnova"
 
-To build the egl samples (for command stream interception), you need to copy `libEGL_VIVANTE.so` `libGLESv2_VIVANTE.so` from 
+To build the egl samples (for command stream interception), you need to copy `libEGL_VIVANTE.so` `libGLESv2_VIVANTE.so` from
 the device `/system/lib/egl` to `native/lib/egl`. This is not needed if you just want to build the `replay`, `etna` or `fb`
 tests, which do not rely in any way on the userspace blob.
 
@@ -137,7 +137,7 @@ tests, which do not rely in any way on the userspace blob.
 Contents
 ==========
 
-The repository contains various tools and documentation related to figuring out how to 
+The repository contains various tools and documentation related to figuring out how to
 program Vivante GCxxx GPU cores.
 
 Framebuffer tests
@@ -153,8 +153,8 @@ To exercise the initial-stage driver there are a few framebuffer tests in:
 
     native/fb/
 
-These demos do double-buffered animated rendering of 1000 frames to the framebuffer using 
-the proof-of-concept `etna` rendering and command stream building API. The goal of this API is to provide a Gallium-like 
+These demos do double-buffered animated rendering of 1000 frames to the framebuffer using
+the proof-of-concept `etna` rendering and command stream building API. The goal of this API is to provide a Gallium-like
 low-level interface to the Vivante hardware while abstracting away kernel interface details.
 
 - `companion_cube`: Rotating "weighted companion cube", using array or indexed rendering. Exercised in this demo:
@@ -167,9 +167,9 @@ low-level interface to the Vivante hardware while abstracting away kernel interf
   - Double-buffered rendering to framebuffer
   - MSAA (off / 2X / 4X)
 
-- `mip_cube_state`: Rotating cube with a mipmapped texture loaded from a `dds` file provided on the command line. One 
-  of the example textures have a different color and number on each mipmap level, to explicitly show interpolation 
-  between mipmap levels as the surface 
+- `mip_cube_state`: Rotating cube with a mipmapped texture loaded from a `dds` file provided on the command line. One
+  of the example textures have a different color and number on each mipmap level, to explicitly show interpolation
+  between mipmap levels as the surface
   goes nearer or farther from the camera.
 
   - Mipmapping
@@ -199,7 +199,7 @@ libetnaviv
 ------------
 
 Low-level command buffer handling library and register definition headers.
-This library completely wraps the kernel interface, to isolate clients (such as Mesa) 
+This library completely wraps the kernel interface, to isolate clients (such as Mesa)
 from the madness of kernel-specific headers and defines.
 
 - ioctl (kernel interface) wrapping
@@ -232,18 +232,18 @@ Commands:
 - gpu-dis
 
   Disassemble the currently bound fragment and vertex shaders.
-    
+
 - gpu-trace
 
-  Trace and dump all submitted command buffers. This is similar to dumping to FDR 
-  (using one of the hook mechanisms) and then running `dump_cmdstream`, however this 
+  Trace and dump all submitted command buffers. This is similar to dumping to FDR
+  (using one of the hook mechanisms) and then running `dump_cmdstream`, however this
   works on the fly.
 
   Along with each command the physical address is printed, this should come in handy for
   searching back the physical address that the GPU is stuck on
   according to the kernel.
 
-   Usage: 
+   Usage:
       gpu-trace <on|off>      Enable/disable cmdbuffer trace
       gpu-trace stop <on|off> Enable/disable stopping on every commit
       gpu-trace output stdout Set tracing output to stdout (default)
@@ -252,7 +252,7 @@ Commands:
 These commands automatically find the gallium pipe and screen from the current Mesa
 context.
 
-State map 
+State map
 ----------
 
 Map of documentation for known render state and registers. Mapped in rules-ng-ng (envytools) format:
@@ -268,10 +268,10 @@ Other scattered bits of documentation about the hardware and ISA can be found in
 ISA documentation
 ------------------
 
-Vivante has a unified, fixed-size, predictable instruction format with explicit inputs 
-and outputs. This does simplify code generation, compared to a weird flow 
+Vivante has a unified, fixed-size, predictable instruction format with explicit inputs
+and outputs. This does simplify code generation, compared to a weird flow
 pipe system like the Mali 200/400.
-Shader (both vertex and fragment) instruction set description in rules-ng-ng format can be found here: 
+Shader (both vertex and fragment) instruction set description in rules-ng-ng format can be found here:
 
     rnndb/isa.xml
 
@@ -295,9 +295,9 @@ There is also an assembler, which accepts the same syntax that is produced by th
 Command stream format
 -----------------------
 
-Like other modern GPUs, the primary means of programming the chip is through a command stream 
+Like other modern GPUs, the primary means of programming the chip is through a command stream
 interpreted by a DMA engine. This "Front End" takes care of distributing state changes through
-the individual modules of the GPU, kicking off primitive rendering, synchronization, 
+the individual modules of the GPU, kicking off primitive rendering, synchronization,
 and also supports basic flow control (branch, call, return).
 
 Most of the relevant bits of this command stream have been deciphered.
@@ -309,17 +309,17 @@ The command stream format represented in rules-ng-ng XML format can be found her
 Command stream interception
 ----------------------------
 
-A significant part of reverse engineering was done by intercepting command streams while running GL simple demos. 
+A significant part of reverse engineering was done by intercepting command streams while running GL simple demos.
 `viv_hook` is a library to intercept and log the traffic between `libGAL` (the Vivante user space blob) and the kernel
 driver / hardware.
-    
+
 This library uses ELF hooks to intercept only system calls such as `ioctl` and `mmap` coming from the driver, not from
 other parts of the application, unlike more crude hacks using `LD_PRELOAD`.
 
-At the beginning of the program call `the_hook`, at the end of the program call `end_hook` to finalize 
+At the beginning of the program call `the_hook`, at the end of the program call `end_hook` to finalize
 and flush buffers. This should even work for native android applications that fork from the zygote.
 
-The raw binary structures interchanged with the kernel are written to disk in a `.fdr` file, along 
+The raw binary structures interchanged with the kernel are written to disk in a `.fdr` file, along
 with updates to video memory, to be parsed by the accompanying command stream dumper and other tools.
 
 Command stream dumper
@@ -342,7 +342,7 @@ Extract areas of video memory, images, and command buffers at certain points of
 Replay tests
 --------------
 
-The replay tests replay the command stream and ioctl commands of the EGL demos, to get the same output. 
+The replay tests replay the command stream and ioctl commands of the EGL demos, to get the same output.
 
 They can be found in:
 
@@ -369,7 +369,7 @@ The headers and implementation files for the Vivante GPL kernel drivers are also
 
     kernel_drivers/
 
-Four GPL kernel driver versions, `gc600_driver_dove`, `v2` and `v4` and `imx6`, are provided. They are useful in understanding the kernel 
+Four GPL kernel driver versions, `gc600_driver_dove`, `v2` and `v4` and `imx6`, are provided. They are useful in understanding the kernel
 interface, and the hardware at a basic level.
 
 As open source drivers for the kernel are available, there are currently no plans to write a DRM/DRI kernel driver for Vivante.
@@ -380,8 +380,8 @@ Envytools fork
 ---------------
 
 [Envytools](https://github.com/pathscale/envytools) is a set of tools aimed at developers of the open source
-NVIDIA driver Nouveau, however some parts such as rnndb can be applied more generally. The repository 
-contains a slightly modified subset of envytools for header generation from 
+NVIDIA driver Nouveau, however some parts such as rnndb can be applied more generally. The repository
+contains a slightly modified subset of envytools for header generation from
 the state / command stream / ISA rnndb files, so they can be used from the C code (etna), build with
 
     cd envytools
@@ -417,4 +417,4 @@ Thanks
 
 - Luc Verhaegen (libv) of Lima project (basic framework, general idea)
 - Nouveau developers (rnndb, envytools)
- 
+
diff --git a/doc/2d.md b/doc/2d.md
index e09fe76..8a35c67 100644
--- a/doc/2d.md
+++ b/doc/2d.md
@@ -39,7 +39,7 @@ Filter blits are also available as 2D commands, but I was unable to get this to
 Video rasterizer
 -----------------
 
-The video rasterizer, part of the 2D engine does hardware scaling using an arbitrary 
+The video rasterizer, part of the 2D engine does hardware scaling using an arbitrary
 9-tap separable filter with 5-bit subpixel precision.
 
 It supports the following top-level commands:
@@ -160,7 +160,7 @@ These are the input bit for the ROPs, per ROP type:
     bit 1 source
     bit 2 pattern
     bit "3" foreground/background (`ROP_FG` / `ROP_BG`)
-    
+
 ROP3/4 examples:
 
     10101010  0xaa   destination
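
The ROP input bits listed in this hunk can be read as an index into an 8-bit truth table, which is why `0xaa` selects the destination. A minimal sketch in C, assuming that bit 0 of the index is the destination (inferred from the `0xaa` example, the hunk itself only shows bits 1 to 3) and ignoring the ROP4 foreground/background selection. The names `rop3_apply` and `rop3_selfcheck` are hypothetical and are not part of etnaviv:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: apply an 8-bit ROP3 code to one 32-bit pixel.
 * Assumption: truth table index bit 0 = destination, bit 1 = source,
 * bit 2 = pattern. */
static uint32_t rop3_apply(uint8_t rop, uint32_t dst, uint32_t src, uint32_t pat)
{
    uint32_t out = 0;
    for (int bit = 0; bit < 32; ++bit) {
        /* Build the 3-bit truth table index for this bit position. */
        unsigned idx = ((dst >> bit) & 1u)
                     | (((src >> bit) & 1u) << 1)
                     | (((pat >> bit) & 1u) << 2);
        /* Look up the result bit in the ROP code. */
        out |= (uint32_t)((rop >> idx) & 1u) << bit;
    }
    return out;
}

/* Sanity check against the 0xaa (destination) example from the text. */
static void rop3_selfcheck(void)
{
    assert(rop3_apply(0xaa, 0x12345678u, 0x0u, 0xffffffffu) == 0x12345678u);
}
```

Under the same assumption, `0xcc` would select the source and `0xf0` the pattern, matching the usual ROP3 convention.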
diff --git a/doc/blob_extensions.md b/doc/blob_extensions.md
index 6683798..0b32e7f 100644
--- a/doc/blob_extensions.md
+++ b/doc/blob_extensions.md
@@ -8,56 +8,56 @@ VERSION 4.6.9:1478, PLATFORM Android
 
 EGL Extensions:
 
-    EGL_KHR_reusable_sync 
-    EGL_KHR_fence_sync 
-    EGL_KHR_image_base 
-    EGL_KHR_image_pixmap 
-    EGL_KHR_image 
-    EGL_KHR_gl_texture_2D_image 
-    EGL_KHR_gl_texture_cubmap_image 
-    EGL_KHR_gl_renderbuffer_image 
-    EGL_KHR_lock_surface 
-    EGL_ANDROID_image_native_buffer 
-    EGL_ANDROID_swap_rectangle 
+    EGL_KHR_reusable_sync
+    EGL_KHR_fence_sync
+    EGL_KHR_image_base
+    EGL_KHR_image_pixmap
+    EGL_KHR_image
+    EGL_KHR_gl_texture_2D_image
+    EGL_KHR_gl_texture_cubmap_image
+    EGL_KHR_gl_renderbuffer_image
+    EGL_KHR_lock_surface
+    EGL_ANDROID_image_native_buffer
+    EGL_ANDROID_swap_rectangle
     EGL_ANDROID_blob_cache
     EGL_ANDROID_recordable
 
 GLES2 Extensions:
 
-    GL_OES_compressed_ETC1_RGB8_texture 
-    GL_OES_compressed_paletted_texture 
-    GL_OES_EGL_image 
-    GL_OES_depth24 
-    GL_OES_element_index_uint 
-    GL_OES_fbo_render_mipmap 
-    GL_OES_fragment_precision_high 
-    GL_OES_rgb8_rgba8 
-    GL_OES_stencil1 
-    GL_OES_stencil4 
-    GL_OES_texture_npot 
-    GL_OES_vertex_half_float 
-    GL_OES_depth_texture 
-    GL_OES_packed_depth_stencil 
-    GL_OES_standard_derivatives 
-    GL_OES_get_program_binary 
-    GL_EXT_texture_format_BGRA8888 
-    GL_IMG_read_format 
-    GL_EXT_blend_minmax 
-    GL_EXT_read_format_bgra 
-    GL_EXT_multi_draw_arrays 
-    GL_APPLE_texture_format_BGRA8888 
-    GL_APPLE_texture_max_level 
-    GL_ARM_rgba8 
-    GL_EXT_frag_depth 
-    GL_VIV_shader_binary 
-    GL_VIV_timestamp 
-    GL_OES_mapbuffer 
-    GL_OES_EGL_image_external 
-    GL_EXT_texture_compression_dxt1 
-    GL_EXT_texture_compression_s3tc 
-    GL_IMG_texture_compression_pvrtc 
-    GL_EXT_discard_framebuffer 
-    GL_OES_vertex_type_10_10_10_2 
-    GL_EXT_texture_type_2_10_10_10_REV 
+    GL_OES_compressed_ETC1_RGB8_texture
+    GL_OES_compressed_paletted_texture
+    GL_OES_EGL_image
+    GL_OES_depth24
+    GL_OES_element_index_uint
+    GL_OES_fbo_render_mipmap
+    GL_OES_fragment_precision_high
+    GL_OES_rgb8_rgba8
+    GL_OES_stencil1
+    GL_OES_stencil4
+    GL_OES_texture_npot
+    GL_OES_vertex_half_float
+    GL_OES_depth_texture
+    GL_OES_packed_depth_stencil
+    GL_OES_standard_derivatives
+    GL_OES_get_program_binary
+    GL_EXT_texture_format_BGRA8888
+    GL_IMG_read_format
+    GL_EXT_blend_minmax
+    GL_EXT_read_format_bgra
+    GL_EXT_multi_draw_arrays
+    GL_APPLE_texture_format_BGRA8888
+    GL_APPLE_texture_max_level
+    GL_ARM_rgba8
+    GL_EXT_frag_depth
+    GL_VIV_shader_binary
+    GL_VIV_timestamp
+    GL_OES_mapbuffer
+    GL_OES_EGL_image_external
+    GL_EXT_texture_compression_dxt1
+    GL_EXT_texture_compression_s3tc
+    GL_IMG_texture_compression_pvrtc
+    GL_EXT_discard_framebuffer
+    GL_OES_vertex_type_10_10_10_2
+    GL_EXT_texture_type_2_10_10_10_REV
     GL_EXT_texture_filter_anisotropic
 
diff --git a/doc/hardware.md b/doc/hardware.md
index b36ce9a..8c55c39 100644
--- a/doc/hardware.md
+++ b/doc/hardware.md
@@ -8,8 +8,8 @@ Major optional blocks: each of these can be present or not depending on the spec
 - 3D engine
 - VG engine
 
-Some SoCs have multiple GPU cores, and have distributed the blocks mentioned above over the cores (I suppose 
-for extra parallelism and/or granularity in power switching). For example the Marvell Armada 620 has a GC2000 
+Some SoCs have multiple GPU cores, and have distributed the blocks mentioned above over the cores (I suppose
+for extra parallelism and/or granularity in power switching). For example the Marvell Armada 620 has a GC2000
 with only the 3D engine as well as a GC300 with only the 2D engine. Similarly, the Freescale i.mx6 SoC has a
 GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
 
@@ -28,7 +28,7 @@ GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
 Feature bits
 =================
 
-Which features are supported on a certain Vivante core is not only determined by the model number 
+Which features are supported on a certain Vivante core is not only determined by the model number
 (which AFAIK mainly determines the performance), but specified by a combination of factors:
 
  1) Chip features and minor feature flags
@@ -37,7 +37,7 @@ Which features are supported on a certain Vivante core is not only determined by
  4) Chip revision of the form 0x1234
 
 All of these are available in read-only registers on the hardware. In most cases it suffices to check the feature flags.
-Unlike NV, which parametrizes everything on the model and revision, for GC this is left for bugfixes (even these sometimes 
+Unlike NV, which parametrizes everything on the model and revision, for GC this is left for bugfixes (even these sometimes
 have their own feature bit).
 
 For an overview of the feature bits see the enumerations in `state.xml`.
@@ -104,7 +104,7 @@ Thread walker = Rectangle walker? (seems to have to do with OpenCL)
 [1] http://www.vivantecorp.com/Vivante_GC320_Technical_Reference_Manual_V1.0_A.pdf
 [2] http://2012ftf.ccidnet.com/pdf/0049.pdf
 
-Connections 
+Connections
 -------------
 Connections between the different modules follow the OpenGL pipeline design [3].
 
@@ -123,7 +123,7 @@ See also [1]
 - SE determines rasterization starting point for each primitive, and also culls based on trivial rejection
 - RA performs per-tile, per-subtile, per-quad and per-pixel clipping
 
-  [1] METHOD FOR DISTRIBUTED CLIPPING OUTSIDE OF VIEW VOLUME 
+  [1] METHOD FOR DISTRIBUTED CLIPPING OUTSIDE OF VIEW VOLUME
     http://www.freepatentsonline.com/y2010/0271370.html
   [2] Efficient tile-based rasterization
     http://www.google.com/patents/US8009169
@@ -133,7 +133,7 @@ See also [1]
 Command stream
 -------------------
 
-Commands and data are sent to the GPU through the FE (Front End interface). The 
+Commands and data are sent to the GPU through the FE (Front End interface). The
 command stream of the front-end interface has a specific format described in this section.
 
 Overall format
@@ -154,11 +154,11 @@ Opcodes
     00111 Wait ([15-0] count)
     01000 Link ([15-0] number of bytes, arg address)
     01001 Stall (argument seems same format as state 0380C)
-    01010 Call 
+    01010 Call
     01011 Return
     01101 Chip select
 
-Arguments are always padded to 2 32-bit words. Number of argument words depends on the opcode, and 
+Arguments are always padded to 2 32-bit words. Number of argument words depends on the opcode, and
 sometimes on the first word of the command.
 
 See `cmdstream.xml` for detailed overview of commands and arguments. The most commonly used command is
@@ -184,20 +184,20 @@ The following sequence of states is common:
     GL.SEMAPHORE_TOKEN := FROM=RA,TO=PE
     GL.STALL_TOKEN := FROM=RA,TO=PE
 
-The first state load arms the semaphore, the second one stalls the FROM module until the TO module has raised its semaphore. In 
-this example it stalls the rasterizer until the pixel engine has completed the commands up until now. 
+The first state load arms the semaphore, the second one stalls the FROM module until the TO module has raised its semaphore. In
+this example it stalls the rasterizer until the pixel engine has completed the commands up until now.
 
 The `STALL` command is used to stall the command queue until the semaphore has been received. The stall command has
-one argument that has the same format as the `_TOKEN` states above, except that the FROM module is always the FE. 
+one argument that has the same format as the `_TOKEN` states above, except that the FROM module is always the FE.
 
 Within the 3D engine, not many explicit synchronization points appear to be needed. Some exceptions:
 
-- The blob issues a semaphore and stall from RA to PE when 
+- The blob issues a semaphore and stall from RA to PE when
 
   - Changing depth configuration in PE
   - Sometimes when changing stencil config in PE
 
-- The blob issues a just a semaphore from RA to PE, and a stall before drawing a primitive when 
+- The blob issues a just a semaphore from RA to PE, and a stall before drawing a primitive when
 
   - Tile status address/configuration changes
   - Clearing depth
@@ -210,9 +210,9 @@ XXX (cwabbott) usually, isa's have some sort of texture barrier or sync operatio
 
 Resolve
 -----------
-The resolve module is a copy and fill engine. It can copy blocks of pixels from one GPU address to another, 
-optionally tiling/detiling, converting between pixel formats, or scaling down by a factor of 2. The source and 
-destination address can be the same to fill in tiles that were not touched during the rendering process 
+The resolve module is a copy and fill engine. It can copy blocks of pixels from one GPU address to another,
+optionally tiling/detiling, converting between pixel formats, or scaling down by a factor of 2. The source and
+destination address can be the same to fill in tiles that were not touched during the rendering process
 (according to the Tile Status, see below) with the background color.
 
 The RS and PE (drawing) share one set of pixel pipes. They will never be active concurrently (AFAIK).
@@ -244,18 +244,18 @@ the GPU to hang mysteriously on rendering.
 Shader ISA
 ================
 
-Vivante GPUs have a unified shader ISA, this means that vertex and pixel shaders share the same 
+Vivante GPUs have a unified shader ISA, this means that vertex and pixel shaders share the same
 instruction set. See `isa.xml` and `isa.md` for details of the instructions, this section only provides a high-level overview.
 
-- Each instruction consists of 4 32-bit words. These have a fixed format, with bitfields 
+- Each instruction consists of 4 32-bit words. These have a fixed format, with bitfields
 that have a meaning which differs only very little per opcode. Which of these fields is used (which operands) does differ per opcode.
 
 - Four-component SIMD processor (for most of the instructions)
 
-- Older GPUs have floating point operations only, the newer ones have support for integer operations in the context of OpenCL. 
+- Older GPUs have floating point operations only, the newer ones have support for integer operations in the context of OpenCL.
   The split is around GC1000, though this being Vivante there is likely some feature bit for it.
 
-- Instructions can have up to three source operands (`SRC0_*`, `SRC1_*`, `SRC2_*`), and one destination operand (`DST_`). 
+- Instructions can have up to three source operands (`SRC0_*`, `SRC1_*`, `SRC2_*`), and one destination operand (`DST_`).
    In addition to that, there is a specific operand for texture sampling (`TEX_*`).
 
 - Operands can have these properties:
@@ -269,7 +269,7 @@ that have a meaning which differs only very little per opcode. Which of these fi
 
 - Registers:
   - `N` four-component float temporary registers `tX` (actual number depends on the hardware, maximum seems to be 64 for all
-      vivante GPUs I've encountered up until now), but like with other GPUs using more registers will likely restrict 
+      vivante GPUs I've encountered up until now), but like with other GPUs using more registers will likely restrict
       the available parallelism)
   - `1` four-component address register `a0`
 
@@ -285,16 +285,16 @@ of the framebuffer using the `FBIOGET_VSCREENINFO` and `FBIOGET_FSCREENINFO` ioc
 This physical address can then directly be used as target address for a resolve operation, just like when copying
 to a normal bitmap.
 
-Even though it would save a resolve operation it is not useful to use the physical address of the frame buffer 
+Even though it would save a resolve operation it is not useful to use the physical address of the frame buffer
 directly for rendering, as it is only possible to render to tiled and supertiled surfaces, and (afaik) no display controller
 supports scan out from tiled formats.
 
 In many cases there is more framebuffer memory than that which is used for the current screen, which causes larger virtual resolution
-to be returned than the physical resolution. Double-buffering is achieved by changing the y-offset within that virtual frame buffer. 
+to be returned than the physical resolution. Double-buffering is achieved by changing the y-offset within that virtual frame buffer.
 
 Operations
 ========================
-An attempt to figure out which operations can be triggered in the hardware, and what state is used to specify 
+An attempt to figure out which operations can be triggered in the hardware, and what state is used to specify
 their operation.
 
 - RS: Kick off resolve by writing a value with bit 0 set to `RS_KICKER`. State used:
@@ -328,8 +328,8 @@ Programming pecularities
   support. The blob driver uses it for some states (viewport scaling, offset, scissor, ...)
   but not others (uniforms etc).
 
-- It is quite easy to hang the GPU when making a minor programming mistake. 
-  When the GPU is stuck it is possible to submit command buffers, however nothing gets drawn and nothing 
+- It is quite easy to hang the GPU when making a minor programming mistake.
+  When the GPU is stuck it is possible to submit command buffers, however nothing gets drawn and nothing
   ever finishes.
 
   Ways I've already made it crash:
@@ -339,11 +339,11 @@ Programming pecularities
   - Sending 3D commands in the 2D pipe instead of 3D pipe (then using a signal waiting for them to complete)
   - Wrong length of shader
   - Texture sampling without properly setup texture units
-  - `SE_SCISSOR`: setting SCISSOR bottom/right to `(x<<16)|5` instead of `(x<<16)-1` causes crashes for higher resolutions 
+  - `SE_SCISSOR`: setting SCISSOR bottom/right to `(x<<16)|5` instead of `(x<<16)-1` causes crashes for higher resolutions
     such as 1920x1080 on GC600. I don't know why, maybe some buffer or cache overflow. The rockchip vivante driver always uses |5 AFAIK,
     this offset appears to be different per specific chip/revision.
 
-  This may be a (kernel) driver problem. It is possible to reset the GPU from user space with an ioctl, but 
+  This may be a (kernel) driver problem. It is possible to reset the GPU from user space with an ioctl, but
   this usually is not enough to make it un-stuck. It would probably be a better solution to introduce a kernel-based timeout
   instead of relying on userspace to be 100% correct (may exist on v4?).
 
@@ -355,10 +355,10 @@ When the mask bit belonging to a group of state bits is *set* on a state write,
 state bits will be unaffected. If the mask bit is *unset*, the state bits will be written.
 
 This allows setting state per group of bits. For example, it allows setting only
-the destination alpha function (`ALPHA_CONFIG.DST_FUNC_ALPHA`) without affecting the 
+the destination alpha function (`ALPHA_CONFIG.DST_FUNC_ALPHA`) without affecting the
 other bits in that state word.
 
-If masking functionality is not desired, simply keep all the `_MASK` bits at zero and write all 
+If masking functionality is not desired, simply keep all the `_MASK` bits at zero and write all
 bits at once. This is what I used in `etna_pipe`, as I keep track of all state myself.
 
 Texture tiling
@@ -380,7 +380,7 @@ Supertiling
 ![supertile ordering](images/supertile.png)
 
 It appears that the blob always pads render buffers pixel sizes to a multiple of 64, ie, a width of 400 becomes 448 and 800 becomes 832.
-This is because the render buffer is also tiled, albeit differently than the 4x4 tiling format of the textures. 
+This is because the render buffer is also tiled, albeit differently than the 4x4 tiling format of the textures.
 On a fine level, every tile is the same as for normal tiled surfaces:
 
      0  1  2  3
@@ -416,7 +416,7 @@ but is only nested one level, in total this results in 64x64 sized tiles.
 The GPU can render to normal tiled surfaces (such as used by textures) as well as supertiled surfaces. However,
 rendering to supertiled surfaces is likely faster due to better cache locality.
 
-Stride, as used for resolve operations, is for a row of tiles not a row of pixels; 0x1c00 for width 448 (originally 400), 
+Stride, as used for resolve operations, is for a row of tiles not a row of pixels; 0x1c00 for width 448 (originally 400),
 0x3400 for width 832 (originally 800).
 
 Multisampling
@@ -445,7 +445,7 @@ GC600 supports 1, 2, or 4 MSAA samples. Vivante's patent [1] on anti-aliasing ma
 - 256x256 target with 4 samples creates a 512x512 render target and depth buffer
 
         GL.MULTI_SAMPLE_CONFIG := MSAA_SAMPLES=4X,MSAA_ENABLES=0xf,UNK12=0x0,UNK16=0x0
-        RA.MULTISAMPLE_UNK00E04 := 0x0 
+        RA.MULTISAMPLE_UNK00E04 := 0x0
         RA.MULTISAMPLE_UNK00E10[2] := 0xaaa22a22
         RA.CENTROID_TABLE[8] := 0x262a2288
         RA.CENTROID_TABLE[9] := 0x886688a2
@@ -466,7 +466,7 @@ GC600 supports 1, 2, or 4 MSAA samples. Vivante's patent [1] on anti-aliasing ma
 
 Other differences when MSAA is enabled:
 
-- `TS.MEM_CONFIG` is different when MSAA is used (see descriptions for fields `MSAA` and `MSAA_FORMAT`). 
+- `TS.MEM_CONFIG` is different when MSAA is used (see descriptions for fields `MSAA` and `MSAA_FORMAT`).
 - The TS surface belonging to the render target is enlarged in the same way; it is treated as if there simply is a bigger render target.
 - It also looks like the PS gets an extra input/temporary when MSAA is enabled:
 
@@ -494,25 +494,25 @@ When rendering points (PRIMITIVE_TYPE_POINTS) there are some differences:
 - There is an extra varying for `gl_PointCoord` with two components. This varying has
   its components in `GL_VARYING_COMPONENT_USE` set to `POINTCOORD_X` and `POINTCOORD_Y`.
   Its `PA_SHADER_ATTRIBUTES` is set to `0x000002f1`.
-  The VS output associated to this varying in `VS_OUTPUT` is discarded, so can be set 
+  The VS output associated to this varying in `VS_OUTPUT` is discarded, so can be set
   to any output register.
 
-- `rasterizer.point_size_per_vertex` affects number of vs outputs (just like MSAA!). If point 
+- `rasterizer.point_size_per_vertex` affects number of vs outputs (just like MSAA!). If point
   size per vertex is not set, the value in `PA.POINT_SIZE` is used.
 
-- Distinction between sprite coordinate origin `UPPER_LEFT` / `LOWER_LEFT` is implemented by adding 
+- Distinction between sprite coordinate origin `UPPER_LEFT` / `LOWER_LEFT` is implemented by adding
   a 1.0-y instruction when `gl_PointCoord` is used. XXX figure out what is the default.
 
 Vertex texture fetch
 --------------------
 
 Vertex samplers live in the same space as fragment samplers. The blob uses a fixed mapping:
-sampler 0..7 are used as fragment samplers and 8..11 are used as vertex samplers. 
+sampler 0..7 are used as fragment samplers and 8..11 are used as vertex samplers.
 
 The shaders themselves refer to the absolute sampler number; so tex8 is the first texture unit used in a
 vertex shader. The normal TEX instruction can be used to sample textures from a vertex shader.
 
-Vivante hw has two texture caches that need to be flushed separately, one for fragment shaders 
+Vivante hw has two texture caches that need to be flushed separately, one for fragment shaders
 one for vertex shaders (bits `GL.FLUSH_CACHE.TEXTURE` and `GL.FLUSH_CACHE.TEXTUREVS` respectively).
 
 This solves a problem with running `cubemap_sphere` after `displacement` demo;
@@ -525,16 +525,16 @@ Even adding a PE to FE semaphore afterwards or dummy state loads does not fix th
 
 All texture filtering options are allowed for vertex texture fetch.
 
-XXX maybe figure out if the sampler units are shared between fragment and vertex shaders and thus interchangeable. This is 
+XXX maybe figure out if the sampler units are shared between fragment and vertex shaders and thus interchangeable. This is
   not important for GL/Gallium because it already lives with the assumption that vertex and fragment shaders
   are distinct.
 
 Shader size on GC2000
 ----------------------
 
-The "query chip identity" ioctl on GC2000 reports an instructionCount of 512. Looking at the low-level command 
-stream dumps the device appears to have 0x0E000 - 0x0C000 = 8192 bytes of instruction memory, with 128 bit 
-instructions this indeed maps to 512 instructions. 
+The "query chip identity" ioctl on GC2000 reports an instructionCount of 512. Looking at the low-level command
+stream dumps the device appears to have 0x0E000 - 0x0C000 = 8192 bytes of instruction memory, with 128 bit
+instructions this indeed maps to 512 instructions.
 
 XXX does the VS/PS split at instruction 256 during rendering affect OpenCL? Hopefully not...
 
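Reviewer note (not part of the patch): the instruction-memory arithmetic quoted in the "Shader size on GC2000" context above can be checked with a minimal C sketch. The address range and the 128-bit instruction width come from the text; the macro and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Address window taken from the text above; the names are illustrative only. */
#define SHADER_MEM_START 0x0C000u
#define SHADER_MEM_END   0x0E000u
#define INST_BITS        128u

/* Size of the instruction memory window in bytes. */
static uint32_t shader_mem_bytes(void)
{
    return SHADER_MEM_END - SHADER_MEM_START;
}

/* Number of 128-bit instructions that fit in that window. */
static uint32_t shader_inst_capacity(void)
{
    return shader_mem_bytes() / (INST_BITS / 8u);
}
```

Under these assumptions the result agrees with the instructionCount that the text attributes to the "query chip identity" ioctl.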
diff --git a/doc/isa.md b/doc/isa.md
index faa6006..91929e3 100644
--- a/doc/isa.md
+++ b/doc/isa.md
@@ -12,15 +12,15 @@ Basic vertex shader
     uniform mat4 modelviewMatrix;
     uniform mat4 modelviewprojectionMatrix;
     uniform mat3 normalMatrix;
-    
+
     attribute vec4 in_position;
     attribute vec3 in_normal;
     attribute vec4 in_color;
-    
+
     vec4 lightSource = vec4(2.0, 2.0, 20.0, 0.0);
-    
+
     varying vec4 vVaryingColor;
-    
+
     void main()
     {
         gl_Position = modelviewprojectionMatrix * in_position;
@@ -73,16 +73,16 @@ Vertex shader with texture coordinates
     uniform mat4 modelviewMatrix;
     uniform mat4 modelviewprojectionMatrix;
     uniform mat3 normalMatrix;
-   
+
     attribute vec4 in_position;
     attribute vec3 in_normal;
     attribute vec2 in_coord;
-    
+
     vec4 lightSource = vec4(2.0, 2.0, 20.0, 0.0);
-    
+
     varying vec4 vVaryingColor;
     varying vec2 coord;
-    
+
     void main()
     {
         gl_Position = modelviewprojectionMatrix * in_position;
@@ -129,9 +129,9 @@ Empty (passthrough)
 --------------------
 
     precision mediump float;
-    
+
     varying vec4 vVaryingColor;
-    
+
     void main()
     {
         gl_FragColor = vVaryingColor;
@@ -145,12 +145,12 @@ Texture sampling
 ------------------
 
     precision mediump float;
-    
+
     varying vec4 vVaryingColor;
     varying vec2 coord;
-    
+
     uniform sampler2D in_texture;
-    
+
     void main()
     {
         gl_FragColor = 3.0 * vVaryingColor * texture2D(in_texture, coord);
@@ -174,8 +174,8 @@ This adjusts the output position z, based on w. Likely this works around a diffe
 the hardware and the OpenGL standard.
 
 For the gc2000 in the i.mx6 these two instructions are no longer appended (the only difference in the vertex shader for
-smoothed cube between gc600 and gc2000, as generated by the blob driver, is this). 
-  
+smoothed cube between gc600 and gc2000, as generated by the blob driver, is this).
+
 The cutoff point for this is at GC1000. All Vivante GPUs before GC1000 require these two instructions, except
 the GC880.
 
@@ -191,7 +191,7 @@ Misc notes
 
   gl_FragCoord: contains the window-relative coordinates of the current fragment
 
-- In PS, RGROUP 1 register i0.x contains the value of gl_FrontFacing. 
+- In PS, RGROUP 1 register i0.x contains the value of gl_FrontFacing.
   i0.y also contains a non-zero value. i0.zw are zero.
 
     - i1..i127 are simply aliases of i0, at least on my GC600.
diff --git a/doc/kernel_interface.md b/doc/kernel_interface.md
index 0906099..9d56689 100644
--- a/doc/kernel_interface.md
+++ b/doc/kernel_interface.md
@@ -20,14 +20,14 @@ along with the values on a RK2918 device:
 
 Most important to get right are registerMemSize, registerMemBase and irqLine as these allow the driver to find and
 communicate with the GPU hardware. They depend on the board, not on the GPU. For example, on a CuBox these settings are:
-    
+
     irqLine         42
     registerMemBase 0xf1840000
     contiguousBase  0x08000000
 
 The `dove` (cubox) driver also has a `gpu_frequency` parameter that sets the AXICLK/GCCLK clock at startup,
 if compiled with `ENABLE_GPU_CLOCK_BY_DRIVER`. Some devices may need this, although not the CuBox itself (it is disabled in the makefile).
-In that case your GPU will have an entry `GC` in `/proc/clocks`. 
+In that case your GPU will have an entry `GC` in `/proc/clocks`.
 
 On a Freescale i.MX6 (GK802) device the parameters are:
 
@@ -44,7 +44,7 @@ On a Freescale i.MX6 (GK802) device the parameters are:
     contiguousSize    0x0c000000  (192 MB)
     coreClock         156000000
     signal            48
-    baseAddress       0 
+    baseAddress       0
 
 Diagnostics
 ==============
@@ -173,9 +173,9 @@ At startup, the application connects to galcore device using `open` with the dev
 - `/dev/galcore`, or
 - `/dev/graphics/galcore`
 
-After connecting to the device the entire chunk of contiguous memory, after requesting its address and size, 
+After connecting to the device the entire chunk of contiguous memory, after requesting its address and size,
 is mapped into user space using `mmap`. The kernel will return addresses in this range when the user space driver allocates
-contiguous (unified) memory used for communication with the GPU. 
+contiguous (unified) memory used for communication with the GPU.
 
 Ioctl
 -------
@@ -189,7 +189,7 @@ Communication with the kernel driver happens through ioctl calls on the resultin
 `IOCTL_GCHAL_INTERFACE` is the only one of these that is actually used by the userspace blob. This ioctl is passed one argument
 which is a pointer to the following structure:
 
-    typedef struct 
+    typedef struct
     {
         void *in_buf;
         uint32_t in_buf_size;
@@ -197,28 +197,28 @@ which is a pointer to the following structure:
         uint32_t out_buf_size;
     } vivante_ioctl_data_t;
 
-When used by the blob, `in_buf` and `out_buf` point to the same memory address: a `gcsHAL_INTERFACE` structure that is 
+When used by the blob, `in_buf` and `out_buf` point to the same memory address: a `gcsHAL_INTERFACE` structure that is
 used both for input and output arguments.
 
 Command structure
 ------------------
-The `gcsHAL_INTERFACE` (defined in `gc_hal_driver`) is the structure used by the driver to communicate with the 
-kernel. It can be seen as a communication packet with a command opcode and an union with parameters. 
+The `gcsHAL_INTERFACE` (defined in `gc_hal_driver`) is the structure used by the driver to communicate with the
+kernel. It can be seen as a communication packet with a command opcode and an union with parameters.
 Depending on the `command` a different field of this union is used. The same structure is used both for input and output
-arguments. 
+arguments.
 
-For example, the command `gcvHAL_ALLOCATE_LINEAR_VIDEO_MEMORY` (I will leave off the `gcvHAL_` from now on) 
-uses the fields in `interface->u.AllocateLinearVideoMemory` to pass in the number of bytes to allocate, but 
-also to pass out the number of bytes actually allocated. 
+For example, the command `gcvHAL_ALLOCATE_LINEAR_VIDEO_MEMORY` (I will leave off the `gcvHAL_` from now on)
+uses the fields in `interface->u.AllocateLinearVideoMemory` to pass in the number of bytes to allocate, but
+also to pass out the number of bytes actually allocated.
 
-What is curious about the ioctl protocol is that the communication structures contains fields that are not 
-used by the kernel at all. There is no good reason why these values would need 
+What is curious about the ioctl protocol is that the communication structures contains fields that are not
+used by the kernel at all. There is no good reason why these values would need
 to be present in kernel-facing structures. The line is blurry sometimes.
 It also appears that the structure has been designed with platform-independence in mind, and so some of the fields are not used in the Linux
 drivers such as `status`, `handle`, `pid`.
 
 A possibly worthwhile long-term goal would be to clean up the kernel driver interface. This would break compatibility with
-the Vivante binary blobs, though, so maybe the effort would be better spent building a fully-fledged DRM/DRI 
+the Vivante binary blobs, though, so maybe the effort would be better spent building a fully-fledged DRM/DRI
 infrastructure driver instead.
 
 Allocations
@@ -240,29 +240,29 @@ Memory management happens in the kernel. Two types of memory are allocated:
   Allocated with command `ALLOCATE_LINEAR_VIDEO_MEMORY`
 
   Device memory, from one of the pools (default, local, unified or contiguous system memory)
-  The available pools depend on the hardware; many of the devices have no local memory, and simply 
+  The available pools depend on the hardware; many of the devices have no local memory, and simply
   use a part of system memory as video memory.
 
-`LOCK_VIDEO_MEMORY` locks the video memory both 
+`LOCK_VIDEO_MEMORY` locks the video memory both
 - into the GPU memory space so that it can be used by the GPU
-- into CPU memory so that the application can read/write. 
+- into CPU memory so that the application can read/write.
 It is interesting that these are done by
 the same call.
 
 Command buffers
 -------------------
 
-Like many other GPUs, the primary means of programming the chip is through a command stream 
+Like many other GPUs, the primary means of programming the chip is through a command stream
 interpreted by a DMA engine. This "Front End" takes care of distributing state changes through
-the individual modules of the GPU, kicking off primitive rendering, synchronization, 
+the individual modules of the GPU, kicking off primitive rendering, synchronization,
 and also supports some primitive flow control (branch, call, return).
 
-The command stream is submitted to the kernel by means of command buffers. As most important part these 
-structures contain a pointer to contiguous memory (allocated with command `ALLOCATE_CONTIGUOUS_MEMORY`) 
+The command stream is submitted to the kernel by means of command buffers. As most important part these
+structures contain a pointer to contiguous memory (allocated with command `ALLOCATE_CONTIGUOUS_MEMORY`)
 where the commands start.
 
-Command buffers are built in user space by the driver in a `gcoCMDBUF` structure, then submitted to the kernel with the 
-`COMMIT` command. 
+Command buffers are built in user space by the driver in a `gcoCMDBUF` structure, then submitted to the kernel with the
+`COMMIT` command.
 
 The following structure fields of `gcoCMDBUF` are used by the kernel:
 
@@ -276,16 +276,16 @@ The following structure fields of `gcoCMDBUF` are used by the kernel:
 
 User signal API
 ----------------
-Command `USER_SIGNAL` is used for synchronization signals between the kernel and userspace driver. 
+Command `USER_SIGNAL` is used for synchronization signals between the kernel and userspace driver.
 
 Note: the contents in this section only apply as-is if the kernel was *not* compiled with `USE_NEW_LINUX_SIGNAL`. If this
-flag was set, then a posix real-time signal will be used to notify the process of incoming signals, and the 
+flag was set, then a posix real-time signal will be used to notify the process of incoming signals, and the
 `USER_SIGNAL_WAIT` is a no-op.
 
 The subcommands are:
 
 - `USER_SIGNAL_CREATE` Create a new signal
-  Inputs: 
+  Inputs:
      - manualReset
      If set to gcvTRUE, the `SIGNAL` command must be used with state false to
      reset the signal. If set to gcvFALSE, the signal automatically resets
@@ -306,7 +306,7 @@ The subcommands are:
   Outputs: N/A
 
 - `USER_SIGNAL_WAIT` Wait on the signal (block current thread)
-  Inputs: 
+  Inputs:
     - id     Signal id to wait for
     - wait   Maximum duration to wait (in milliseconds)
   Outputs: N/A
@@ -319,7 +319,7 @@ The subcommands are:
   Inputs: id
   Outputs: N/A
 
-This is used to synchronize GPU and CPU. 
+This is used to synchronize GPU and CPU.
 Signals can be scheduled to be signalled/unsignalled when the GPU finished a certain operation (using an Event).
 They are also used for inter-thread synchronization by the EGL driver.
 
@@ -384,7 +384,7 @@ Context switching
 ==================
 Clients manage their own context, which is passed to COMMIT preemptively in case a context switch is needed.
 
-It appears that context switching is manual. Every process has to keep its own context structure for 
+It appears that context switching is manual. Every process has to keep its own context structure for
 context switching, and pass this to COMMIT. In case this is needed the kernel will then load the state
 from the context buffer.
 
@@ -399,7 +399,7 @@ The state `FE.VERTEX_ELEMENT_CONFIG` is handled specially: write only the elemen
 
 Used fields in `struct _gcoCONTEXT` from the kernel:
 
-- `id` 
+- `id`
     [in] This id is used to determine whether to switch context
     [out] A unique id for the context is generated the first time a COMMIT is done, with context->id==0
 - `hint*` only used when `SECURE_USER` is set
@@ -421,7 +421,7 @@ Profiling
 
 To enable profiling, the kernel must have been built with `VIVANTE_PROFILER` enabled in `gc_hal_options.h` or the appropriate
 `config` file.
-   
+
     USE_PROFILER                        = 1
 
 Vivante also recommends disabling power management features while profiling,
@@ -430,7 +430,7 @@ Vivante also recommends disabling power management features while profiling,
 
 HW profiling registers can be read using the command `READ_ALL_PROFILE_REGISTERS`.
 
-There are also the commands `GET_PROFILE_SETTING` and `SET_PROFILE_SETTING`, which set a flag for 
+There are also the commands `GET_PROFILE_SETTING` and `SET_PROFILE_SETTING`, which set a flag for
 logging to a file (`vprofiler.xml` by default), but this flag doesn't do anything in the kernel driver,
 likely it's meant to be read out by the user space driver.
 
@@ -539,12 +539,12 @@ TODO: input/output arguments.
 * `QUERY_CHIP_IDENTITY`
 
         Query chip identity.
-        
-        Calls: gckHARDWARE_QueryChipIdentity 
+
+        Calls: gckHARDWARE_QueryChipIdentity
 
 * `ALLOCATE_NON_PAGED_MEMORY`
 
-        Allocate non-paged memory. 
+        Allocate non-paged memory.
 
         Calls: gckOS_AllocateNonPagedMemory
 
@@ -558,7 +558,7 @@ TODO: input/output arguments.
 
         Allocate contiguous non-paged memory (used for command buffers).
 
-        Calls: gckOS_AllocateContiguous 
+        Calls: gckOS_AllocateContiguous
 
 * `FREE_CONTIGUOUS_MEMORY`
 
@@ -579,7 +579,7 @@ TODO: input/output arguments.
         Walks all required memory pools to allocate the requested amount of video memory.
 
         gcvPOOL_VIRTUAL: Virtual memory, allocated using gckVIDMEM_ConstructVirtual
-        gcvPOOL_CONTIGUOUS: Contiguous memory, allocated using gckVIDMEM_ConstructVirtual 
+        gcvPOOL_CONTIGUOUS: Contiguous memory, allocated using gckVIDMEM_ConstructVirtual
         gcvPOOL_SYSTEM: Contiguous system memory
         gcvPOOL_LOCAL_INTERNAL: Internal memory
         gcvPOOL_LOCAL_EXTERNAL: External memory
@@ -599,7 +599,7 @@ TODO: input/output arguments.
 
 * `FREE_VIDEO_MEMORY`
 
-        Calls: gckVIDMEM_Free 
+        Calls: gckVIDMEM_Free
 
 * `MAP_MEMORY`
 
@@ -610,7 +610,7 @@ TODO: input/output arguments.
 * `UNMAP_MEMORY`
 
         Unmap memory mapped with `MAP_MEMORY`.
-        
+
         Calls: gckKERNEL_UnmapMemory (gckOS_UnmapMemory)
 
 * `MAP_USER_MEMORY`
@@ -630,20 +630,20 @@ TODO: input/output arguments.
 
         Surface lock.
 
-        Calls: gckVIDMEM_Lock 
+        Calls: gckVIDMEM_Lock
 
 * `UNLOCK_VIDEO_MEMORY`
-        
+
         Surface unlock.
-        
-        Calls: gckVIDMEM_Unlock 
+
+        Calls: gckVIDMEM_Unlock
 
 * `EVENT_COMMIT`
-    
+
         Commit an event queue.
 
         Calls: gckEVENT_Commit
-    
+
 * `USER_SIGNAL`
 
         Dispatch depends on the user signal subcommands (refer to section `User signal API`).
@@ -662,7 +662,7 @@ TODO: input/output arguments.
 * `COMMIT`
 
         Commit a command and context buffer.
-        
+
         Calls: gckCOMMAND_Commit
 
 * `STALL`
@@ -680,7 +680,7 @@ TODO: input/output arguments.
         Calls: gckOS_ReadRegister
 
 * `WRITE_REGISTER`
-        
+
         Write a GPU register. Only enabled if kernel compiled with `gcdREGISTER_ACCESS_FROM_USER` (which
         is obviously a security risk, as it allows user-space to read and write arbitrary registers).
 
@@ -704,7 +704,7 @@ TODO: input/output arguments.
         Calls: gckHARDWARE_QueryProfileRegisters
 
 * `PROFILE_REGISTERS_2D`
-        
+
         Read all 2D profile registers. Only available if kernel compiled with `VIVANTE_PROFILER` enabled.
 
         Calls: gckHARDWARE_ProfileEngine2D
@@ -763,7 +763,7 @@ TODO: input/output arguments.
         Flush or invalidate the cache.
         NOTE: unimplemented on Linux, and also apparently not called by the blob on Linux.
 
-        In: 
+        In:
           invalidate: If FALSE, flush the cache (the GPU is going to need the data)
                       if TRUE, flush and invalidate the cache (if the GPU is going to modify the data)
           process: Process handle Logical belongs to or gcvNULL if Logical belongs to the kernel.
@@ -776,7 +776,7 @@ TODO: input/output arguments.
 
         Broadcast GPU stuck.
 
-        Calls: gckOS_Broadcast 
+        Calls: gckOS_Broadcast
 
 Crash recovery
 ================
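Reviewer note on doc/kernel_interface.md (not part of the patch): the ioctl argument described in the "Ioctl" hunks above can be sketched as follows. The struct layout follows the quoted `vivante_ioctl_data_t`, except that the `out_buf` line falls on a hunk boundary, so its `void *` type is an assumption here. The helper `viv_wrap_interface` is hypothetical. It only shows how the blob reportedly points `in_buf` and `out_buf` at the same `gcsHAL_INTERFACE` memory, and it makes no ioctl call.

```c
#include <stdint.h>

typedef struct
{
    void *in_buf;
    uint32_t in_buf_size;
    void *out_buf;          /* assumed type: this field is elided in the diff */
    uint32_t out_buf_size;
} vivante_ioctl_data_t;

/* Hypothetical helper: describe one buffer as both the input and the output
 * of the call, as the text says the blob does. */
static vivante_ioctl_data_t viv_wrap_interface(void *iface, uint32_t iface_size)
{
    vivante_ioctl_data_t data;
    data.in_buf = iface;
    data.in_buf_size = iface_size;
    data.out_buf = iface;   /* same memory is reused for the results */
    data.out_buf_size = iface_size;
    return data;
}
```

A caller would pass a pointer to the returned descriptor as the single argument of `IOCTL_GCHAL_INTERFACE`. The request number itself does not appear in this diff, so it is left out of the sketch.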
diff --git a/doc/patents.md b/doc/patents.md
index 47c0280..b75a82f 100644
--- a/doc/patents.md
+++ b/doc/patents.md
@@ -21,7 +21,7 @@ optionally blended with another data value and written to a memory device. Regis
 of filtering with the first coefficients. The block of data may be read from a location including a
 source coordinate. The final result of filtering may be written to a destination coordinate obtained
 by rotating and/or mirroring the source coordinate. The orientation of arrays filtered using the
-first coefficients varies according to a rotation mode. 
+first coefficients varies according to a rotation mode.
 
 - [US20130091189](https://www.google.com/patents/US20130091189) Single datapath floating point implementation of RCP, SQRT, EXP and LOG functions
 and a low latency RCP based on the same techniques
@@ -33,7 +33,7 @@ for performing a polynomial approximation (e.g. a quadratic polynomial approxima
 and one or more data tables corresponding to at least one of the RCP, SQRT, EXP or LOG functions
 operable to be coupled to the single pipeline according to one or more opcodes; wherein the single
 pipeline is operable for computing at least one of RCP, SQRT, EXP or LOG functions according to the
-one or more opcodes. 
+one or more opcodes.
 
 - [US20130002651](https://www.google.com/patents/US20130002651) Apparatus and Method For Texture Level Of Detail Computation
 
@@ -71,7 +71,7 @@ detect an object edge within the image. An edge style detector is configured to
 edge end and a second edge end. The edge style detector also identifies an edge style associated
 with the detected edge based on the first edge end and the second edge end. The system also includes
 a restoration module configured to identify pixel data associated with the detected edge and a
-blending module configured to blend the pixel data associated with the detected edge. 
+blending module configured to blend the pixel data associated with the detected edge.
 
 - [US20110234609](https://www.google.com/patents/US20110234609) Hierarchical tile-based rasterization algorithm
 
@@ -98,7 +98,7 @@ as X, Y and far Z clipping.  In one embodiment, the SE module performs clipping
 a initial point of rasterization. In one embodiment, the RA module performs clipping by way of
 conducting the rendering step of the rasterization process. This approach distributes the complexity
 in the graphics processing pipeline and makes the design simpler and faster, therefore design
-complexity, cost and performance may all be improved in hardware implementation. 
+complexity, cost and performance may all be improved in hardware implementation.
 
 - [US20100131786](https://www.google.com/patents/US20100131786) Single Chip 3D and 2D Graphics Processor with Embedded Memory and Multiple Levels of
 Power Controls
@@ -124,7 +124,7 @@ and to apply a de-ringing filter to a pixel within a pixel subset of the pixel l
 determination that the pixel is not an edge pixel. The determination that the pixel is not an edge
 pixel is based on the identified maximum pixel jump.
 
-- [US20090122076](https://www.google.com/patents/US20090122076) Thin-line detection apparatus and method 
+- [US20090122076](https://www.google.com/patents/US20090122076) Thin-line detection apparatus and method
 
 Published: 2009-05-14
 
@@ -138,7 +138,7 @@ of the pre-determined thin lines, the pixel block may be deemed to describe a th
 apparatus and method may preclude application of an anti-aliasing filter to the substantially
 central pixel of the pixel block in the event it describes a thin line.
 
-- [US20090122068](https://www.google.com/patents/US20090122068) Intelligent configurable graphics bandwidth modulator 
+- [US20090122068](https://www.google.com/patents/US20090122068) Intelligent configurable graphics bandwidth modulator
 
 Published: 2009-05-14
 
@@ -149,7 +149,7 @@ greater than a selected threshold, the graphics system is configured to operate
 wherein vertex data is rendered immediately upon reception. In the event the rate is less than the
 selected threshold, the graphics system is configured to operate in retained mode, wherein vertex
 data is stored prior to being rendered. The apparatus and method switches between each of the modes
-on-the-fly in a manner that is transparent to the application. 
+on-the-fly in a manner that is transparent to the application.
 
 - [US20090122064](https://www.google.com/patents/US20090122064) Efficient tile-based rasterization
 
diff --git a/native/driver/etna_asm.c b/native/driver/etna_asm.c
index 8ba81fd..65816fc 100644
--- a/native/driver/etna_asm.c
+++ b/native/driver/etna_asm.c
@@ -32,7 +32,7 @@ static inline int rgroup_is_uniform(unsigned rgroup)
            rgroup == INST_RGROUP_UNIFORM_1;
 }
 
-/** An instruction can only read from one distinct uniform. 
+/** An instruction can only read from one distinct uniform.
  * This function verifies this property and returns true if the instruction
  * is deemed correct and false otherwise.
  */
@@ -71,16 +71,16 @@ int etna_assemble(uint32_t *out, const struct etna_inst *inst)
         DBG("warning: generating instruction that accesses two different uniforms");
     }
 
-    out[0] = VIV_ISA_WORD_0_OPCODE(inst->opcode) | 
-             VIV_ISA_WORD_0_COND(inst->cond) | 
-             (inst->sat ? VIV_ISA_WORD_0_SAT : 0) | 
-             (inst->dst.use ? VIV_ISA_WORD_0_DST_USE : 0) | 
+    out[0] = VIV_ISA_WORD_0_OPCODE(inst->opcode) |
+             VIV_ISA_WORD_0_COND(inst->cond) |
+             (inst->sat ? VIV_ISA_WORD_0_SAT : 0) |
+             (inst->dst.use ? VIV_ISA_WORD_0_DST_USE : 0) |
              VIV_ISA_WORD_0_DST_AMODE(inst->dst.amode) |
              VIV_ISA_WORD_0_DST_REG(inst->dst.reg) |
              VIV_ISA_WORD_0_DST_COMPS(inst->dst.comps) |
              VIV_ISA_WORD_0_TEX_ID(inst->tex.id);
     out[1] = VIV_ISA_WORD_1_TEX_AMODE(inst->tex.amode) |
-             VIV_ISA_WORD_1_TEX_SWIZ(inst->tex.swiz) | 
+             VIV_ISA_WORD_1_TEX_SWIZ(inst->tex.swiz) |
              (inst->src[0].use ? VIV_ISA_WORD_1_SRC0_USE : 0) |
              VIV_ISA_WORD_1_SRC0_REG(inst->src[0].reg) |
              VIV_ISA_WORD_1_SRC0_SWIZ(inst->src[0].swiz) |
diff --git a/native/driver/etna_blend.c b/native/driver/etna_blend.c
index f48e6f5..c68f80c 100644
--- a/native/driver/etna_blend.c
+++ b/native/driver/etna_blend.c
@@ -48,7 +48,7 @@ static void *etna_pipe_create_blend_state(struct pipe_context *pipe,
     if(enable)
     {
         cs->PE_ALPHA_CONFIG =
-                VIVS_PE_ALPHA_CONFIG_BLEND_ENABLE_COLOR | 
+                VIVS_PE_ALPHA_CONFIG_BLEND_ENABLE_COLOR |
                 (separate_alpha ? VIVS_PE_ALPHA_CONFIG_BLEND_SEPARATE_ALPHA : 0) |
                 VIVS_PE_ALPHA_CONFIG_SRC_FUNC_COLOR(translate_blend_factor(rt0->rgb_src_factor)) |
                 VIVS_PE_ALPHA_CONFIG_SRC_FUNC_ALPHA(translate_blend_factor(rt0->alpha_src_factor)) |
diff --git a/native/driver/etna_clear_blit.c b/native/driver/etna_clear_blit.c
index cce3f07..3de3f12 100644
--- a/native/driver/etna_clear_blit.c
+++ b/native/driver/etna_clear_blit.c
@@ -66,7 +66,7 @@ void etna_rs_gen_clear_surface(struct etna_surface *surf, uint32_t clear_value)
 {
     uint bs = util_format_get_blocksize(surf->base.format);
     uint format = 0;
-    switch(bs) 
+    switch(bs)
     {
     case 2: format = RS_FORMAT_A1R5G5B5; break;
     case 4: format = RS_FORMAT_A8R8G8B8; break;
@@ -106,10 +106,10 @@ static void etna_pipe_clear(struct pipe_context *pipe,
      */
     etna_set_state(priv->ctx, VIVS_GL_FLUSH_CACHE, VIVS_GL_FLUSH_CACHE_COLOR | VIVS_GL_FLUSH_CACHE_DEPTH);
     /* No need to set up the TS here with sync_context.
-     * RS clear operations (in contrast to resolve and copy) do not require the TS state. 
+     * RS clear operations (in contrast to resolve and copy) do not require the TS state.
      */
     /* Need to update clear command in non-TS (fast clear) case *if*
-     * clear value is different from previous time. 
+     * clear value is different from previous time.
      */
     if(buffers & PIPE_CLEAR_COLOR)
     {
@@ -131,7 +131,7 @@ static void etna_pipe_clear(struct pipe_context *pipe,
                 etna_rs_gen_clear_surface(surf, new_clear_value);
             }
             etna_submit_rs_state(priv->ctx, &surf->clear_command);
-            surf->clear_value = new_clear_value; 
+            surf->clear_value = new_clear_value;
         }
     }
     if((buffers & PIPE_CLEAR_DEPTHSTENCIL) && priv->framebuffer_s.zsbuf != NULL)
@@ -202,8 +202,8 @@ static void etna_pipe_resource_copy_region(struct pipe_context *pipe,
     assert(src->format == dst->format);
     /* Resources with nr_samples > 1 are not allowed. */
     assert(src->nr_samples == 1 && dst->nr_samples == 1);
-    /* XXX we can use the RS as a literal copy engine here 
-     * the only complexity is tiling; the size of the boxes needs to be aligned to the tile size 
+    /* XXX we can use the RS as a literal copy engine here
+     * the only complexity is tiling; the size of the boxes needs to be aligned to the tile size
      * how to handle the case where a resource is copied from/to a non-aligned position?
      * from non-aligned: can fall back to rendering-based copy?
      * to non-aligned: can fall back to rendering-based copy?
diff --git a/native/driver/etna_compiler.c b/native/driver/etna_compiler.c
index 11b15c0..bbc71b1 100644
--- a/native/driver/etna_compiler.c
+++ b/native/driver/etna_compiler.c
@@ -228,7 +228,7 @@ static void assign_temporaries_to_native(struct etna_compile_data *cd, struct et
     }
 }
 
-/* assign inputs and outputs to temporaries 
+/* assign inputs and outputs to temporaries
  * Gallium assumes that the hardware has separate registers for taking input and output,
  * however Vivante GPUs use temporaries both for passing in inputs and passing back outputs.
  * Try to re-use temporary registers where possible.
@@ -240,10 +240,10 @@ static void assign_inouts_to_temporaries(struct etna_compile_data *cd, uint file
     int temp_ptr = 0, num_temps;
     struct sort_rec inout_order[ETNA_MAX_TEMPS];
     struct sort_rec temps_order[ETNA_MAX_TEMPS];
-    num_inouts = sort_registers(inout_order, 
+    num_inouts = sort_registers(inout_order,
             cd->file[file], cd->file_size[file],
             mode_inputs ? LAST_USE_ASC : FIRST_USE_ASC);
-    num_temps = sort_registers(temps_order, 
+    num_temps = sort_registers(temps_order,
             cd->file[TGSI_FILE_TEMPORARY], cd->file_size[TGSI_FILE_TEMPORARY],
             mode_inputs ? FIRST_USE_ASC : LAST_USE_ASC);
 
@@ -278,7 +278,7 @@ static void assign_inouts_to_temporaries(struct etna_compile_data *cd, uint file
     }
 }
 
-/* Allocate an immediate with a certain value and return the index. If 
+/* Allocate an immediate with a certain value and return the index. If
  * there is already an immediate with that value, return that.
  */
 static struct etna_inst_src alloc_imm_u32(struct etna_compile_data *cd, uint32_t value)
@@ -334,7 +334,7 @@ static void etna_compile_parse_declarations(struct etna_compile_data *cd, const
             decl = &ctx.FullToken.FullDeclaration;
             cd->file_size[decl->Declaration.File] = MAX2(cd->file_size[decl->Declaration.File], decl->Range.Last+1);
             break;
-        case TGSI_TOKEN_TYPE_IMMEDIATE: /* immediates are handled differently from other files; they are not declared 
+        case TGSI_TOKEN_TYPE_IMMEDIATE: /* immediates are handled differently from other files; they are not declared
                                            explicitly, and always add four components */
             imm = &ctx.FullToken.FullImmediate;
             assert(cd->imm_size <= (ETNA_MAX_IMM-4));
@@ -383,7 +383,7 @@ static void etna_compile_pass_check_usage(struct etna_compile_data *cd, const st
     while(!tgsi_parse_end_of_tokens(&ctx))
     {
         tgsi_parse_token(&ctx);
-        /* find out max register #s used 
+        /* find out max register #s used
          * for every register mark first and last instruction id where it's used
          * this allows finding slots that can be used as input and output registers
          *
@@ -449,7 +449,7 @@ static void assign_special_inputs(struct etna_compile_data *cd)
     if(cd->processor == TGSI_PROCESSOR_FRAGMENT)
     {
         /* never assign t0; writing to it causes fragment to be discarded? */
-        cd->next_free_native = 1; 
+        cd->next_free_native = 1;
         /* hardwire TGSI_SEMANTIC_POSITION (input and output) to t0 */
         for(int idx=0; idx<cd->total_decls; ++idx)
         {
@@ -464,12 +464,12 @@ static void assign_special_inputs(struct etna_compile_data *cd)
     }
 }
 
-/* Pass -- optimize outputs 
+/* Pass -- optimize outputs
  * Mesa tends to generate code like this at the end of its shaders
  *   MOV OUT[1], TEMP[2]
  *   MOV OUT[0], TEMP[0]
  *   MOV OUT[2], TEMP[1]
- * Recognize if 
+ * Recognize if
  * a) there is only a single assignment to an output register and
  * b) the temporary is not used after that
  * Also recognize direct assignment of IN to OUT (passthrough)
@@ -493,7 +493,7 @@ static void etna_compile_pass_optimize_outputs(struct etna_compile_data *cd, con
             inst = &ctx.FullToken.FullInstruction;
             switch(inst->Instruction.Opcode)
             {
-            case TGSI_OPCODE_MOV: 
+            case TGSI_OPCODE_MOV:
                 if(inst->Dst[0].Register.File == TGSI_FILE_OUTPUT &&
                    inst->Src[0].Register.File == TGSI_FILE_TEMPORARY)
                 {
@@ -509,7 +509,7 @@ static void etna_compile_pass_optimize_outputs(struct etna_compile_data *cd, con
                         cd->dead_inst[inst_idx] = true;
                     }
                 }
-                /* direct assignment of input to output -- 
+                /* direct assignment of input to output --
                  * allocate a new register, and associate both input and output to it */
                 if(inst->Dst[0].Register.File == TGSI_FILE_OUTPUT &&
                    inst->Src[0].Register.File == TGSI_FILE_INPUT)
@@ -517,7 +517,7 @@ static void etna_compile_pass_optimize_outputs(struct etna_compile_data *cd, con
                     uint out_idx = inst->Dst[0].Register.Index;
                     uint in_idx = inst->Src[0].Register.Index;
 
-                    cd->file[TGSI_FILE_OUTPUT][out_idx].native = cd->file[TGSI_FILE_INPUT][in_idx].native = 
+                    cd->file[TGSI_FILE_OUTPUT][out_idx].native = cd->file[TGSI_FILE_INPUT][in_idx].native =
                         alloc_new_native_reg(cd);
                     /* mark this MOV instruction as a no-op */
                     cd->dead_inst[inst_idx] = true;
@@ -587,7 +587,7 @@ static struct etna_inst_src convert_src(struct etna_compile_data *cd, const stru
     return rv;
 }
 
-/* convert destination to source operand (for operation in place) 
+/* convert destination to source operand (for operation in place)
  * i.e,
  *    MUL dst0.x__w, src0.xyzw, 2/PI
  *    SIN dst0.x__w, dst0.xyzw
@@ -829,7 +829,7 @@ static void etna_compile_pass_generate_code(struct etna_compile_data *cd, const
                         .src[1] = convert_src(cd, &inst->Src[1], INST_SWIZ_IDENTITY),
                         });
                 } break;
-            case TGSI_OPCODE_MAD: 
+            case TGSI_OPCODE_MAD:
                 emit_inst(cd, &(struct etna_inst) {
                         .opcode = INST_OPCODE_MAD,
                         .sat = sat,
@@ -860,9 +860,9 @@ static void etna_compile_pass_generate_code(struct etna_compile_data *cd, const
                         .dst = convert_dst(cd, &inst->Dst[0]),
                         .src[2] = convert_src(cd, &inst->Src[0], INST_SWIZ_IDENTITY),
                         });
-                break; 
+                break;
             case TGSI_OPCODE_DP2A: assert(0); break;
-            case TGSI_OPCODE_FRC: 
+            case TGSI_OPCODE_FRC:
                 emit_inst(cd, &(struct etna_inst) {
                         .opcode = INST_OPCODE_FRC,
                         .sat = sat,
@@ -924,7 +924,7 @@ static void etna_compile_pass_generate_code(struct etna_compile_data *cd, const
                         });
                 break;
             case TGSI_OPCODE_RCC: assert(0); break;
-            case TGSI_OPCODE_DPH: assert(0); break; /* src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w */ 
+            case TGSI_OPCODE_DPH: assert(0); break; /* src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w */
             case TGSI_OPCODE_COS: /* fall through */
             case TGSI_OPCODE_SIN:
                 if(cd->specs->has_sin_cos_sqrt)
@@ -1202,7 +1202,7 @@ static struct etna_reg_desc *find_decl_by_semantic(struct etna_compile_data *cd,
     return NULL; /* not found */
 }
 
-/** Add ADD and MUL instruction to bring Z/W to 0..1 if -1..1 if needed: 
+/** Add ADD and MUL instruction to bring Z/W to 0..1 if -1..1 if needed:
  * - this is a vertex shader
  * - and this is an older GPU
  */
@@ -1269,8 +1269,8 @@ static void etna_compile_add_nop_if_needed(struct etna_compile_data *cd)
 
 /* Allocate CONST and IMM to native ETNA_RGROUP_UNIFORM(x).
  * CONST must be consecutive as const buffers are supposed to be consecutive, and before IMM, as this is
- * more convenient because is possible for the compilation process itself to generate extra 
- * immediates for constants such as pi, one, zero. 
+ * more convenient because is possible for the compilation process itself to generate extra
+ * immediates for constants such as pi, one, zero.
  */
 static void assign_constants_and_immediates(struct etna_compile_data *cd)
 {
@@ -1296,7 +1296,7 @@ static void assign_constants_and_immediates(struct etna_compile_data *cd)
 static void assign_texture_units(struct etna_compile_data *cd)
 {
     uint tex_base = 0;
-    if(cd->processor == TGSI_PROCESSOR_VERTEX) 
+    if(cd->processor == TGSI_PROCESSOR_VERTEX)
     {
         tex_base = cd->specs->vertex_sampler_offset;
     }
@@ -1366,7 +1366,7 @@ static void permute_ps_inputs(struct etna_compile_data *cd)
                 .valid = 1,
                 .rgroup = INST_RGROUP_TEMP,
                 .id = input_id
-            }, cd->file[TGSI_FILE_INPUT][idx].native); 
+            }, cd->file[TGSI_FILE_INPUT][idx].native);
     }
     cd->num_varyings = native_idx-1;
     if(native_idx > cd->next_free_native)
@@ -1489,7 +1489,7 @@ static void fill_in_vs_outputs(struct etna_shader_object *sobj, struct etna_comp
      * in the unified shader architecture. More precisely, it is determined from the number of VS outputs, as well as chip-specific
      * vertex output buffer size, vertex cache size, and the number of shader cores.
      *
-     * XXX this is a conservative estimate, the "optimal" value is only known for sure at link time because some 
+     * XXX this is a conservative estimate, the "optimal" value is only known for sure at link time because some
      * outputs may be unused and thus unmapped. Then again, in the general use case with GLSL the vertex and fragment
      * shaders are linked already before submitting to Gallium, thus all outputs are used.
      */
@@ -1505,7 +1505,7 @@ static void fill_in_vs_outputs(struct etna_shader_object *sobj, struct etna_comp
 
 static bool etna_compile_check_limits(struct etna_compile_data *cd)
 {
-    int max_uniforms = (cd->processor == TGSI_PROCESSOR_VERTEX) ? 
+    int max_uniforms = (cd->processor == TGSI_PROCESSOR_VERTEX) ?
                         cd->specs->max_vs_uniforms :
                         cd->specs->max_ps_uniforms;
     if(cd->inst_ptr > cd->specs->max_instructions)
@@ -1541,7 +1541,7 @@ int etna_compile_shader_object(const struct etna_pipe_specs *specs, const struct
     cd->specs = specs;
 
     /* Build a map from gallium register to native registers for files
-     * CONST, SAMP, IMM, OUT, IN, TEMP. 
+     * CONST, SAMP, IMM, OUT, IN, TEMP.
      * SAMP will map as-is for fragment shaders, there will be a +8 offset for vertex shaders.
      */
     /* Pass one -- check register file declarations and immediates */
@@ -1561,7 +1561,7 @@ int etna_compile_shader_object(const struct etna_pipe_specs *specs, const struct
     etna_compile_pass_optimize_outputs(cd, tokens);
 
     /* XXX assign special inputs: gl_FrontFacing (VARYING_SLOT_FACE)
-     *     this is part of RGROUP_INTERNAL 
+     *     this is part of RGROUP_INTERNAL
      */
 
     /* assign inputs: last usage of input should be <= first usage of temp */
@@ -1585,9 +1585,9 @@ int etna_compile_shader_object(const struct etna_pipe_specs *specs, const struct
     assign_inouts_to_temporaries(cd, TGSI_FILE_INPUT);
 
     /* assign outputs: first usage of output should be >= last usage of temp */
-    /*   potential optimization case: 
+    /*   potential optimization case:
      *      if single MOV OUT[x], TEMP[y] (with full write mask, or at least writing all components that are used in
-     *        the shader) after which temp y is no longer used temp[y] can be used as output register as-is 
+     *        the shader) after which temp y is no longer used temp[y] can be used as output register as-is
      *
      *   potential problem: instruction with multiple outputs of which one is the temp and the other is the output;
      *      however, as the temp is not used after this, how would this make sense? could just discard the output value
@@ -1602,27 +1602,27 @@ int etna_compile_shader_object(const struct etna_pipe_specs *specs, const struct
      *       advance temporary pointer
      */
     assign_inouts_to_temporaries(cd, TGSI_FILE_OUTPUT);
-    
+
     assign_constants_and_immediates(cd);
     assign_texture_units(cd);
-    
+
     /* list declarations */
     for(int x=0; x<cd->total_decls; ++x)
     {
         DBG_F(ETNA_COMPILER_MSGS, "%i: %s,%d active=%i first_use=%i last_use=%i native=%i usage_mask=%x has_semantic=%i", x, tgsi_file_name(cd->decl[x].file), cd->decl[x].idx,
                 cd->decl[x].active,
                 cd->decl[x].first_use, cd->decl[x].last_use, cd->decl[x].native.valid?cd->decl[x].native.id:-1,
-                cd->decl[x].usage_mask, 
+                cd->decl[x].usage_mask,
                 cd->decl[x].has_semantic);
         if(cd->decl[x].has_semantic)
             DBG_F(ETNA_COMPILER_MSGS, " semantic_name=%s semantic_idx=%i",
                     tgsi_semantic_names[cd->decl[x].semantic.Name], cd->decl[x].semantic.Index);
     }
     /* XXX for PS we need to permute so that inputs are always in temporary 0..N-1.
-     * There is no "switchboard" for varyings (AFAIK!). The output color, however, can be routed 
+     * There is no "switchboard" for varyings (AFAIK!). The output color, however, can be routed
      * from an arbitrary temporary.
      */
-    if(cd->processor == TGSI_PROCESSOR_FRAGMENT) 
+    if(cd->processor == TGSI_PROCESSOR_FRAGMENT)
     {
         permute_ps_inputs(cd);
     }
@@ -1633,7 +1633,7 @@ int etna_compile_shader_object(const struct etna_pipe_specs *specs, const struct
         DBG_F(ETNA_COMPILER_MSGS, "%i: %s,%d active=%i first_use=%i last_use=%i native=%i usage_mask=%x has_semantic=%i", x, tgsi_file_name(cd->decl[x].file), cd->decl[x].idx,
                 cd->decl[x].active,
                 cd->decl[x].first_use, cd->decl[x].last_use, cd->decl[x].native.valid?cd->decl[x].native.id:-1,
-                cd->decl[x].usage_mask, 
+                cd->decl[x].usage_mask,
                 cd->decl[x].has_semantic);
         if(cd->decl[x].has_semantic)
             DBG_F(ETNA_COMPILER_MSGS, " semantic_name=%s semantic_idx=%i",
@@ -1700,22 +1700,22 @@ void etna_dump_shader_object(const struct etna_shader_object *sobj)
     printf("immediates:\n");
     for(int idx=0; idx<sobj->imm_size; ++idx)
     {
-        printf(" [%i].%s = %f (0x%08x)\n", (idx+sobj->imm_base)/4, tgsi_swizzle_names[idx%4], 
+        printf(" [%i].%s = %f (0x%08x)\n", (idx+sobj->imm_base)/4, tgsi_swizzle_names[idx%4],
                 *((float*)&sobj->imm_data[idx]), sobj->imm_data[idx]);
     }
     printf("inputs:\n");
     for(int idx=0; idx<sobj->num_inputs; ++idx)
     {
-        printf(" [%i] name=%s index=%i pa=%08x comps=%i\n", 
-                sobj->inputs[idx].reg, 
+        printf(" [%i] name=%s index=%i pa=%08x comps=%i\n",
+                sobj->inputs[idx].reg,
                 tgsi_semantic_names[sobj->inputs[idx].semantic.Name], sobj->inputs[idx].semantic.Index,
                 sobj->inputs[idx].pa_attributes, sobj->inputs[idx].num_components);
     }
     printf("outputs:\n");
     for(int idx=0; idx<sobj->num_outputs; ++idx)
     {
-        printf(" [%i] name=%s index=%i pa=%08x comps=%i\n", 
-                sobj->outputs[idx].reg, 
+        printf(" [%i] name=%s index=%i pa=%08x comps=%i\n",
+                sobj->outputs[idx].reg,
                 tgsi_semantic_names[sobj->outputs[idx].semantic.Name], sobj->outputs[idx].semantic.Index,
                 sobj->outputs[idx].pa_attributes, sobj->outputs[idx].num_components);
     }
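Note on the immediates dump above: each immediate is stored as a raw 32-bit word, printed once as a float and once as hex, and addressed as register (idx+imm_base)/4 with component idx%4. A minimal sketch of that indexing follows. The helper names are hypothetical and not part of the driver, memcpy is swapped in for the pointer cast used in the dump code, and the component letters x, y, z, w are an assumption about the contents of tgsi_swizzle_names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a 32-bit immediate as a float.
 * Uses memcpy rather than *((float*)&word) as in the dump code. */
static float sketch_imm_as_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits); /* assumes a 32-bit float */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Register number for flat component index idx; imm_base and idx are
 * both counted in 32-bit units, four components per register. */
static unsigned sketch_imm_reg(uint32_t imm_base, unsigned idx)
{
    return (idx + imm_base) / 4;
}

/* Component letter, mirroring the idx%4 lookup in the dump code
 * (letters assumed to be x, y, z, w). */
static char sketch_imm_comp(unsigned idx)
{
    static const char names[4] = { 'x', 'y', 'z', 'w' };
    return names[idx % 4];
}
```

With imm_base = 8, flat index 5 lands in register 3. The component lookup uses idx alone, so the letters only line up with the register when imm_base is a multiple of 4.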
diff --git a/native/driver/etna_compiler.h b/native/driver/etna_compiler.h
index 4134de1..9ba9d98 100644
--- a/native/driver/etna_compiler.h
+++ b/native/driver/etna_compiler.h
@@ -30,7 +30,7 @@
 #include "pipe/p_shader_tokens.h"
 
 /* XXX some of these such as ETNA_MAX_LABELS are pretty arbitrary limits, may be better to switch
- * to dynamic allocation at some point. 
+ * to dynamic allocation at some point.
  */
 #define ETNA_MAX_TEMPS (64) /* max temp register count of all Vivante hw */
 #define ETNA_MAX_TOKENS (2048)
@@ -65,12 +65,12 @@ struct etna_shader_object
     uint32_t imm_base; /* base of immediates (in 32 bit units) */
     uint32_t imm_size; /* size of immediates (in 32 bit units) */
     uint32_t *imm_data;
-  
-    /* inputs (for linking) 
+
+    /* inputs (for linking)
      *   for fs, the inputs must be in register 1..N */
     unsigned num_inputs;
     struct etna_shader_inout inputs[ETNA_NUM_INPUTS];
-    
+
     /* outputs (for linking) */
     unsigned num_outputs;
     struct etna_shader_inout outputs[ETNA_NUM_INPUTS];
@@ -108,7 +108,7 @@ int etna_compile_shader_object(const struct etna_pipe_specs *specs, const struct
 void etna_dump_shader_object(const struct etna_shader_object *sobj);
 
 /* Link two shader objects together, annotates each PS input with the VS
- * output register. Returns non-zero if the linking fails. 
+ * output register. Returns non-zero if the linking fails.
  */
 int etna_link_shader_objects(struct etna_shader_link_info *info, const struct etna_shader_object *vs, const struct etna_shader_object *fs);
 
diff --git a/native/driver/etna_fence.c b/native/driver/etna_fence.c
index f337d9f..916f152 100644
--- a/native/driver/etna_fence.c
+++ b/native/driver/etna_fence.c
@@ -51,7 +51,7 @@ int etna_fence_new(struct pipe_screen *screen_h, struct etna_ctx *ctx, struct pi
         fence->signalled = false;
     } else {
         fence = CALLOC_STRUCT(etna_fence);
-        /* Create signal with manual reset; we want to be able to probe it 
+        /* Create signal with manual reset; we want to be able to probe it
          * or wait for it without resetting it.
          */
         if((rv = viv_user_signal_create(ctx->conn, /* manualReset */ true, &fence->signal)) != VIV_STATUS_OK)
@@ -90,7 +90,7 @@ static void etna_screen_fence_reference(struct pipe_screen *screen_h,
     struct etna_fence *fence = etna_fence(fence_h);
     struct etna_fence **ptr = (struct etna_fence **) ptr_h;
     struct etna_fence *old_fence = *ptr;
-    if (pipe_reference_described(&(*ptr)->reference, &fence->reference, 
+    if (pipe_reference_described(&(*ptr)->reference, &fence->reference,
                                  (debug_reference_descriptor)debug_describe_fence))
     {
         if(etna_screen_fence_signalled(screen_h, (struct pipe_fence_handle*)old_fence))
@@ -132,7 +132,7 @@ static boolean etna_screen_fence_finish(struct pipe_screen *screen_h,
     if(fence->signalled) /* avoid a kernel roundtrip */
         return true;
     /* nanoseconds to milliseconds */
-    rv = viv_user_signal_wait(screen->dev, fence->signal, 
+    rv = viv_user_signal_wait(screen->dev, fence->signal,
             timeout == PIPE_TIMEOUT_INFINITE ? VIV_WAIT_INDEFINITE : (timeout / 1000000ULL));
     if(rv != VIV_STATUS_OK && rv != VIV_STATUS_TIMEOUT)
     {
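Note on the timeout conversion in etna_screen_fence_finish above: the Gallium timeout is in nanoseconds, the wait call takes milliseconds, and the infinite value is mapped to a separate sentinel. A minimal sketch follows. SKETCH_TIMEOUT_INFINITE and SKETCH_WAIT_INDEFINITE are stand-ins whose values are assumptions (the real constants are not shown in this diff), and the clamp is an addition that the driver code does not have.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PIPE_TIMEOUT_INFINITE and VIV_WAIT_INDEFINITE; values assumed. */
#define SKETCH_TIMEOUT_INFINITE UINT64_MAX
#define SKETCH_WAIT_INDEFINITE  UINT32_MAX

/* Convert a nanosecond timeout to milliseconds, keeping "infinite" distinct. */
static uint32_t sketch_timeout_ns_to_ms(uint64_t timeout_ns)
{
    uint64_t ms;

    if (timeout_ns == SKETCH_TIMEOUT_INFINITE)
        return SKETCH_WAIT_INDEFINITE;

    ms = timeout_ns / 1000000ULL; /* truncates: anything below 1 ms becomes 0 */

    /* Not in the original: keep a large finite timeout from aliasing
     * the "indefinite" sentinel after narrowing to 32 bits. */
    if (ms >= SKETCH_WAIT_INDEFINITE)
        ms = SKETCH_WAIT_INDEFINITE - 1;

    assert(ms < SKETCH_WAIT_INDEFINITE);
    return (uint32_t)ms;
}
```

Because the division truncates, a caller asking for a sub-millisecond wait effectively gets a poll. Rounding up would be the alternative if a minimum wait is wanted.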
diff --git a/native/driver/etna_fence.h b/native/driver/etna_fence.h
index 5621c03..41afbc6 100644
--- a/native/driver/etna_fence.h
+++ b/native/driver/etna_fence.h
@@ -49,8 +49,8 @@ etna_fence(struct pipe_fence_handle *pfence)
  * Create a new fence that will be signalled after GPU completes rendering
  * after the next flush.
  */
-int etna_fence_new(struct pipe_screen *screen, 
-                   struct etna_ctx *ctx, 
+int etna_fence_new(struct pipe_screen *screen,
+                   struct etna_ctx *ctx,
                    struct pipe_fence_handle **fence);
 
 /**
diff --git a/native/driver/etna_internal.h b/native/driver/etna_internal.h
index 232050c..75313a3 100644
--- a/native/driver/etna_internal.h
+++ b/native/driver/etna_internal.h
@@ -206,7 +206,7 @@ struct compiled_framebuffer_state
     uint32_t SE_SCISSOR_TOP; // fixp
     uint32_t SE_SCISSOR_RIGHT; // fixp
     uint32_t SE_SCISSOR_BOTTOM; // fixp
-    uint32_t TS_MEM_CONFIG; 
+    uint32_t TS_MEM_CONFIG;
     uint32_t TS_DEPTH_CLEAR_VALUE;
     uint32_t TS_DEPTH_STATUS_BASE;
     uint32_t TS_DEPTH_SURFACE_BASE;
@@ -239,7 +239,7 @@ struct compiled_set_index_buffer
 };
 
 /* Compiled linked VS+PS shader state */
-struct compiled_shader_state 
+struct compiled_shader_state
 {
     uint32_t RA_CONTROL;
     uint32_t PA_ATTRIBUTE_ELEMENT_COUNT;
@@ -250,7 +250,7 @@ struct compiled_shader_state
     uint32_t VS_TEMP_REGISTER_CONTROL;
     uint32_t VS_OUTPUT[4];
     uint32_t VS_INPUT[4];
-    uint32_t VS_LOAD_BALANCING; 
+    uint32_t VS_LOAD_BALANCING;
     uint32_t VS_START_PC;
     uint32_t PS_END_PC;
     uint32_t PS_OUTPUT_REG;
@@ -361,7 +361,7 @@ struct etna_3d_state
     uint32_t /*0163C*/ RS_CLEAR_CONTROL;
     uint32_t /*01640*/ RS_FILL_VALUE[4];
 
-    uint32_t /*01654*/ TS_MEM_CONFIG; 
+    uint32_t /*01654*/ TS_MEM_CONFIG;
     uint32_t /*01658*/ TS_COLOR_STATUS_BASE;
     uint32_t /*0165C*/ TS_COLOR_SURFACE_BASE;
     uint32_t /*01660*/ TS_COLOR_CLEAR_VALUE;
diff --git a/native/driver/etna_pipe.h b/native/driver/etna_pipe.h
index 7e0e129..5d092d9 100644
--- a/native/driver/etna_pipe.h
+++ b/native/driver/etna_pipe.h
@@ -54,7 +54,7 @@ enum etna_varying_special {
 struct etna_shader_varying
 {
     int num_components;
-    enum etna_varying_special special; 
+    enum etna_varying_special special;
     int pa_attributes;
     int vs_reg; /* VS output register */
 };
@@ -93,7 +93,7 @@ struct etna_resource
 struct etna_surface
 {
     struct pipe_surface base;
-   
+
     enum etna_surface_layout layout;
     struct etna_resource_level surf;
     uint32_t clear_value; // XXX remember depth/stencil clear value from ->clear
diff --git a/native/driver/etna_rasterizer.c b/native/driver/etna_rasterizer.c
index b683a2e..cb27461 100644
--- a/native/driver/etna_rasterizer.c
+++ b/native/driver/etna_rasterizer.c
@@ -44,7 +44,7 @@ static void *etna_pipe_create_rasterizer_state(struct pipe_context *pipe,
         printf("Different front and back fill mode not supported\n");
     }
     cs->PA_CONFIG =
-            (rs->flatshade ? VIVS_PA_CONFIG_SHADE_MODEL_FLAT : VIVS_PA_CONFIG_SHADE_MODEL_SMOOTH) | 
+            (rs->flatshade ? VIVS_PA_CONFIG_SHADE_MODEL_FLAT : VIVS_PA_CONFIG_SHADE_MODEL_SMOOTH) |
             translate_cull_face(rs->cull_face, rs->front_ccw) |
             translate_polygon_mode(rs->fill_front) |
             (rs->point_quad_rasterization ? VIVS_PA_CONFIG_POINT_SPRITE_ENABLE : 0) |
diff --git a/native/driver/etna_resource.c b/native/driver/etna_resource.c
index fac5f64..5e4185d 100644
--- a/native/driver/etna_resource.c
+++ b/native/driver/etna_resource.c
@@ -36,7 +36,7 @@
 #include "util/u_transfer.h" /* u_default_resource_get_handle */
 
 /* Associate an resource with this context when it is bound in any way
- * (vertex buffer, index buffer, texture, surface, blit). 
+ * (vertex buffer, index buffer, texture, surface, blit).
  */
 void etna_resource_touch(struct pipe_context *pipe, struct pipe_resource *resource_)
 {
@@ -93,7 +93,7 @@ static boolean etna_screen_can_create_resource(struct pipe_screen *pscreen,
     }
     return true;
 }
-                           
+
 static struct pipe_resource * etna_screen_resource_from_handle(struct pipe_screen *screen,
                                               const struct pipe_resource *templat,
                                               struct winsys_handle *handle)
@@ -102,7 +102,7 @@ static struct pipe_resource * etna_screen_resource_from_handle(struct pipe_scree
     return NULL;
 }
 
-/* Allocate 2D texture or render target resource 
+/* Allocate 2D texture or render target resource
  */
 static struct pipe_resource * etna_screen_resource_create(struct pipe_screen *screen,
                                          const struct pipe_resource *templat)
@@ -112,7 +112,7 @@ static struct pipe_resource * etna_screen_resource_create(struct pipe_screen *sc
     unsigned element_size = util_format_get_blocksize(templat->format);
     if(!element_size)
         return NULL;
-    
+
     /* Check input */
     if(templat->target == PIPE_TEXTURE_CUBE)
     {
@@ -136,14 +136,14 @@ static struct pipe_resource * etna_screen_resource_create(struct pipe_screen *sc
     assert(templat->array_size != 0);
 
     /* Figure out what tiling to use -- for now, assume that textures cannot be supertiled, and cannot be linear.
-     * There is a feature flag SUPERTILED_TEXTURE (not supported on any known hw) that may allow this, as well 
-     * as LINEAR_TEXTURE_SUPPORT (supported on gc880 and gc2000 at least), but not sure how it works. 
+     * There is a feature flag SUPERTILED_TEXTURE (not supported on any known hw) that may allow this, as well
+     * as LINEAR_TEXTURE_SUPPORT (supported on gc880 and gc2000 at least), but not sure how it works.
      * Buffers always have LINEAR layout.
      */
     unsigned layout = ETNA_LAYOUT_LINEAR;
     if(templat->target != PIPE_BUFFER)
     {
-        if(!(templat->bind & PIPE_BIND_SAMPLER_VIEW) && priv->specs.can_supertile) 
+        if(!(templat->bind & PIPE_BIND_SAMPLER_VIEW) && priv->specs.can_supertile)
             layout = ETNA_LAYOUT_SUPER_TILED;
         else
             layout = ETNA_LAYOUT_TILED;
@@ -183,7 +183,7 @@ static struct pipe_resource * etna_screen_resource_create(struct pipe_screen *sc
         mip->padded_height = align(y, paddingY);
         mip->stride = align(resource->levels[ix].padded_width, divSizeX)/divSizeX * element_size;
         mip->offset = offset;
-        mip->layer_stride = align(mip->padded_width, divSizeX)/divSizeX * 
+        mip->layer_stride = align(mip->padded_width, divSizeX)/divSizeX *
                       align(mip->padded_height, divSizeY)/divSizeY * element_size;
         mip->size = templat->array_size * mip->layer_stride;
         offset += mip->size;
@@ -196,7 +196,7 @@ static struct pipe_resource * etna_screen_resource_create(struct pipe_screen *sc
 
     /* Determine memory size, and whether to create a tile status */
     size_t rt_size = offset;
-    
+
     /* determine memory type */
     enum viv_surf_type memtype = VIV_SURF_UNKNOWN;
     if(templat->bind & PIPE_BIND_SAMPLER_VIEW)
@@ -205,14 +205,14 @@ static struct pipe_resource * etna_screen_resource_create(struct pipe_screen *sc
         memtype = VIV_SURF_RENDER_TARGET;
     else if(templat->bind & PIPE_BIND_DEPTH_STENCIL)
         memtype = VIV_SURF_DEPTH;
-    else if(templat->bind & PIPE_BIND_INDEX_BUFFER) 
+    else if(templat->bind & PIPE_BIND_INDEX_BUFFER)
         memtype = VIV_SURF_INDEX;
     else if(templat->bind & PIPE_BIND_VERTEX_BUFFER)
         memtype = VIV_SURF_VERTEX;
 
     DBG_F(ETNA_RESOURCE_MSGS, "%p: Allocate surface of %ix%i (padded to %ix%i) of format %i (%i bpe %ix%i), size %08x flags %08x, memtype %i",
             resource,
-            templat->width0, templat->height0, resource->levels[0].padded_width, resource->levels[0].padded_height, templat->format, 
+            templat->width0, templat->height0, resource->levels[0].padded_width, resource->levels[0].padded_height, templat->format,
             element_size, divSizeX, divSizeY, rt_size, templat->bind, memtype);
 
     struct etna_vidmem *rt = 0;
@@ -221,7 +221,7 @@ static struct pipe_resource * etna_screen_resource_create(struct pipe_screen *sc
         printf("Problem allocating video memory for resource\n");
         return NULL;
     }
-   
+
     resource->base = *templat;
     resource->base.last_level = ix; /* real last mipmap level */
     resource->base.screen = screen;
@@ -236,7 +236,7 @@ static struct pipe_resource * etna_screen_resource_create(struct pipe_screen *sc
         struct etna_resource_level *mip = &resource->levels[ix];
         mip->address = resource->surface->address + mip->offset;
         mip->logical = resource->surface->logical + mip->offset;
-        DBG_F(ETNA_RESOURCE_MSGS, "  %08x level %i: %ix%i (%i) stride=%i layer_stride=%i", 
+        DBG_F(ETNA_RESOURCE_MSGS, "  %08x level %i: %ix%i (%i) stride=%i layer_stride=%i",
                 (int)mip->address, ix, (int)mip->width, (int)mip->height, (int)mip->size,
                 (int)mip->stride, (int)mip->layer_stride);
     }
@@ -254,7 +254,7 @@ static void etna_screen_resource_destroy(struct pipe_screen *screen,
     if(resource->last_ctx != NULL)
     {
         /* XXX This could fail when multiple contexts share this resource,
-         * (the last one to bind it will "own" it) or fail miserably if 
+         * (the last one to bind it will "own" it) or fail miserably if
          * the context was since destroyed.
          */
         struct etna_pipe_context *ectx = resource->last_ctx;
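Note on the mip level layout computed in etna_screen_resource_create above: the stride and layer_stride are measured in blocks of divSizeX by divSizeY texels, each element_size bytes, and the level size is array_size layers of that stride. A minimal sketch of the arithmetic follows. The function names are hypothetical, and sketch_align_up is a generic round-up that stands in for the align() helper the driver uses.

```c
#include <assert.h>

/* Round v up to the next multiple of a (stand-in for the driver's align()). */
static unsigned sketch_align_up(unsigned v, unsigned a)
{
    assert(a != 0);
    return (v + a - 1) / a * a;
}

/* Bytes per row of blocks. */
static unsigned sketch_stride(unsigned padded_width, unsigned div_x,
                              unsigned element_size)
{
    return sketch_align_up(padded_width, div_x) / div_x * element_size;
}

/* Bytes per array layer (or cube face) of one mip level. */
static unsigned sketch_layer_stride(unsigned padded_width, unsigned padded_height,
                                    unsigned div_x, unsigned div_y,
                                    unsigned element_size)
{
    unsigned blocks_x = sketch_align_up(padded_width, div_x) / div_x;
    unsigned blocks_y = sketch_align_up(padded_height, div_y) / div_y;
    return blocks_x * blocks_y * element_size;
}

/* Total bytes for one mip level; the next level starts at offset + size. */
static unsigned sketch_level_size(unsigned array_size, unsigned layer_stride)
{
    return array_size * layer_stride;
}
```

For example, a 64x32 level with 4 bytes per texel and 1x1 blocks takes 8192 bytes per layer. The same level in a format with 4x4 blocks of 8 bytes each takes 1024 bytes.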
diff --git a/native/driver/etna_screen.c b/native/driver/etna_screen.c
index dba71dd..1cf80ea 100644
--- a/native/driver/etna_screen.c
+++ b/native/driver/etna_screen.c
@@ -238,12 +238,12 @@ static int etna_screen_get_shader_param( struct pipe_screen *screen, unsigned sh
     case PIPE_SHADER_CAP_MAX_ADDRS:
             return 1; /* Max native address registers */
     case PIPE_SHADER_CAP_MAX_CONSTS:
-            /* Absolute maximum on ideal hardware is 256 (as that's how much register space is reserved); 
-             * immediates are included in here, so actual space available for constants will always be less. 
-             * Also the amount of registers really available depends on the hw. 
+            /* Absolute maximum on ideal hardware is 256 (as that's how much register space is reserved);
+             * immediates are included in here, so actual space available for constants will always be less.
+             * Also the amount of registers really available depends on the hw.
              * XXX see also: viv_specs.num_constants, if this is 0 we need to come up with some default value.
              */
-            return 256; 
+            return 256;
     case PIPE_SHADER_CAP_MAX_CONST_BUFFERS:
             return 1;
     case PIPE_SHADER_CAP_MAX_PREDS:
@@ -266,7 +266,7 @@ static int etna_screen_get_shader_param( struct pipe_screen *screen, unsigned sh
     case PIPE_SHADER_CAP_INTEGERS: /* XXX supported on gc2000 but not yet implemented */
             return 0;
     case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS:
-            return shader==PIPE_SHADER_FRAGMENT ? priv->specs.fragment_sampler_count : 
+            return shader==PIPE_SHADER_FRAGMENT ? priv->specs.fragment_sampler_count :
                                                   priv->specs.vertex_sampler_count;
     case PIPE_SHADER_CAP_PREFERRED_IR:
             return PIPE_SHADER_IR_TGSI;
@@ -316,7 +316,7 @@ static boolean etna_screen_is_format_supported( struct pipe_screen *screen,
     struct etna_screen *priv = etna_screen(screen);
     unsigned allowed = 0;
     if ((target >= PIPE_MAX_TEXTURE_TYPES) ||
-                (sample_count > 1) /* TODO add MSAA */) 
+                (sample_count > 1) /* TODO add MSAA */)
     {
         return FALSE;
     }
@@ -364,7 +364,7 @@ static boolean etna_screen_is_format_supported( struct pipe_screen *screen,
         }
     }
     /* Always allowed */
-    allowed |= usage & (PIPE_BIND_DISPLAY_TARGET | PIPE_BIND_SCANOUT | 
+    allowed |= usage & (PIPE_BIND_DISPLAY_TARGET | PIPE_BIND_SCANOUT |
             PIPE_BIND_SHARED | PIPE_BIND_TRANSFER_READ | PIPE_BIND_TRANSFER_WRITE);
 
     return usage == allowed;
@@ -447,7 +447,7 @@ etna_screen_create(struct viv_conn *dev)
     screen->specs.shader_core_count = dev->chip.shader_core_count;
     screen->specs.stream_count = dev->chip.stream_count;
     screen->specs.has_sin_cos_sqrt = VIV_FEATURE(dev, chipMinorFeatures0, HAS_SQRT_TRIG);
-    screen->specs.has_shader_range_registers = dev->chip.chip_model >= 0x1000 || dev->chip.chip_model == 0x880;    
+    screen->specs.has_shader_range_registers = dev->chip.chip_model >= 0x1000 || dev->chip.chip_model == 0x880;
     if (dev->chip.instruction_count > 256) /* unified instruction memory? */
     {
         screen->specs.vs_offset = 0xC000;
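Note on the "usage == allowed" check in etna_screen_is_format_supported above: the function collects every requested bind bit it can satisfy into allowed, ORs in the bits that are always acceptable, and reports success only if no requested bit is left over. A minimal sketch of that pattern follows. The SK_ flags and their bit values are hypothetical stand-ins for the PIPE_BIND_* flags, and format_allows stands in for the per-format checks not shown in this hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bind flags; the bit positions are arbitrary. */
#define SK_BIND_SAMPLER_VIEW   (1u << 0)
#define SK_BIND_RENDER_TARGET  (1u << 1)
#define SK_BIND_DISPLAY_TARGET (1u << 2)
#define SK_BIND_SCANOUT        (1u << 3)

/* Bits accepted regardless of format. */
#define SK_ALWAYS_ALLOWED (SK_BIND_DISPLAY_TARGET | SK_BIND_SCANOUT)

/* True only if every requested usage bit is covered. */
static bool sketch_usage_supported(unsigned usage, unsigned format_allows)
{
    unsigned allowed = usage & format_allows; /* format-dependent bits */
    allowed |= usage & SK_ALWAYS_ALLOWED;     /* format-independent bits */
    return usage == allowed;
}
```

The effect is that a single unsupported bit rejects the whole request, including bits the code has never heard of.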
diff --git a/native/driver/etna_shader.c b/native/driver/etna_shader.c
index 4a27b90..b22792e 100644
--- a/native/driver/etna_shader.c
+++ b/native/driver/etna_shader.c
@@ -34,14 +34,14 @@
 #include <etnaviv/state_3d.xml.h>
 
 /* Link vs and fs together: fill in shader_state from vs and fs
- * as this function is called every time a new fs or vs is bound, the goal is to do 
- * little processing as possible here, and to precompute as much as possible in the 
+ * as this function is called every time a new fs or vs is bound, the goal is to do
+ * little processing as possible here, and to precompute as much as possible in the
  * vs/fs shader_object.
  * XXX we could cache the link result for a certain set of VS/PS; usually a pair
  * of VS and PS will be used together anyway.
  */
 void etna_link_shaders(struct pipe_context *pipe,
-                              struct compiled_shader_state *cs, 
+                              struct compiled_shader_state *cs,
                               const struct etna_shader_object *vs, const struct etna_shader_object *fs)
 {
     assert(vs->processor == TGSI_PROCESSOR_VERTEX);
@@ -84,7 +84,7 @@ void etna_link_shaders(struct pipe_context *pipe,
         DBG_F(ETNA_LINKER_MSGS,"  %i -> %i", link.varyings_vs_reg[idx], idx+1);
     }
 
-    /* vs outputs (varyings) */ 
+    /* vs outputs (varyings) */
     uint32_t vs_output[16] = {0};
     int varid = 0;
     vs_output[varid++] = vs->vs_pos_out_reg;
@@ -95,10 +95,10 @@ void etna_link_shaders(struct pipe_context *pipe,
 
     for(int idx=0; idx<4; ++idx)
     {
-        cs->VS_OUTPUT[idx] =(vs_output[idx*4+0] << 0)  | (vs_output[idx*4+1] << 8) | 
+        cs->VS_OUTPUT[idx] =(vs_output[idx*4+0] << 0)  | (vs_output[idx*4+1] << 8) |
                                  (vs_output[idx*4+2] << 16) | (vs_output[idx*4+3] << 24);
     }
-    
+
     /* vs inputs (attributes) */
     uint32_t vs_input[4] = {0};
     for(int idx=0; idx<vs->num_inputs; ++idx)
@@ -143,7 +143,7 @@ void etna_link_shaders(struct pipe_context *pipe,
     cs->GL_VARYING_NUM_COMPONENTS = num_components;
     cs->GL_VARYING_COMPONENT_USE[0] = component_use[0];
     cs->GL_VARYING_COMPONENT_USE[1] = component_use[1];
-    
+
     /* reference instruction memory */
     cs->vs_inst_mem_size = vs->code_size;
     cs->VS_INST_MEM = vs->code;
@@ -165,7 +165,7 @@ static void etna_set_constant_buffer(struct pipe_context *pipe,
     struct etna_pipe_context *priv = etna_pipe_context(pipe);
     if(buf == NULL) /* Unbinding constant buffer is a no-op as we don't keep a pointer */
         return;
-    assert(buf->buffer == NULL && buf->user_buffer != NULL); 
+    assert(buf->buffer == NULL && buf->user_buffer != NULL);
     /* support only user buffer for now */
     assert(priv->vs && priv->fs);
     if(likely(index == 0))
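Note on the VS_OUTPUT packing in etna_link_shaders above: sixteen output register numbers are packed four to a word, one per byte, lowest index in the lowest byte. A minimal sketch follows. The function names are hypothetical, and the 0xff masks are an addition (the driver code shifts the values unmasked, which is equivalent as long as every register number fits in 8 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Pack four register numbers into one word, r0 in bits 0..7, r3 in bits 24..31. */
static uint32_t sketch_pack4(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3)
{
    return ((r0 & 0xffu) << 0)  | ((r1 & 0xffu) << 8) |
           ((r2 & 0xffu) << 16) | ((r3 & 0xffu) << 24);
}

/* Pack all sixteen entries into the four VS_OUTPUT-style words. */
static void sketch_pack_vs_outputs(const uint32_t vs_output[16], uint32_t packed[4])
{
    int idx;

    assert(vs_output != 0 && packed != 0);
    for (idx = 0; idx < 4; ++idx)
    {
        packed[idx] = sketch_pack4(vs_output[idx*4+0], vs_output[idx*4+1],
                                   vs_output[idx*4+2], vs_output[idx*4+3]);
    }
}
```

For instance, registers 4, 1, 2, 0 pack to 0x00020104.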
diff --git a/native/driver/etna_shader.h b/native/driver/etna_shader.h
index 6012c3a..a9c2610 100644
--- a/native/driver/etna_shader.h
+++ b/native/driver/etna_shader.h
@@ -30,7 +30,7 @@ struct etna_shader_object;
 struct compiled_shader_state;
 
 void etna_link_shaders(struct pipe_context *pipe,
-                              struct compiled_shader_state *cs, 
+                              struct compiled_shader_state *cs,
                               const struct etna_shader_object *vs, const struct etna_shader_object *fs);
 
 void etna_pipe_shader_init(struct pipe_context *pipe);
diff --git a/native/driver/etna_surface.c b/native/driver/etna_surface.c
index b2bebef..df1f3f4 100644
--- a/native/driver/etna_surface.c
+++ b/native/driver/etna_surface.c
@@ -48,7 +48,7 @@ static struct pipe_surface *etna_pipe_create_surface(struct pipe_context *pipe,
     unsigned layer = templat->u.tex.first_layer;
     unsigned level = templat->u.tex.level;
     assert(layer < resource->base.array_size);
-   
+
     surf->base.context = pipe;
 
     pipe_reference_init(&surf->base.reference, 1);
@@ -74,8 +74,8 @@ static struct pipe_surface *etna_pipe_create_surface(struct pipe_context *pipe,
 
     surf->layout = resource->layout;
     surf->surf = resource->levels[level];
-    surf->surf.address += layer * surf->surf.layer_stride; 
-    surf->surf.logical += layer * surf->surf.layer_stride; 
+    surf->surf.address += layer * surf->surf.layer_stride;
+    surf->surf.logical += layer * surf->surf.layer_stride;
     surf->clear_value = 0; /* last clear value */
 
     if(surf->surf.ts_address)
diff --git a/native/driver/etna_texture.c b/native/driver/etna_texture.c
index 9040d0f..9eb594a 100644
--- a/native/driver/etna_texture.c
+++ b/native/driver/etna_texture.c
@@ -50,7 +50,7 @@ static void *etna_pipe_create_sampler_state(struct pipe_context *pipe,
                 /* XXX get from sampler view: VIVS_TE_SAMPLER_CONFIG0_FORMAT(tex_format) */
     cs->TE_SAMPLER_CONFIG1 = 0; /* VIVS_TE_SAMPLER_CONFIG1 (swizzle, extended format) fully determined by sampler view */
     cs->TE_SAMPLER_LOD_CONFIG =
-            (ss->lod_bias != 0.0 ? VIVS_TE_SAMPLER_LOD_CONFIG_BIAS_ENABLE : 0) | 
+            (ss->lod_bias != 0.0 ? VIVS_TE_SAMPLER_LOD_CONFIG_BIAS_ENABLE : 0) |
             VIVS_TE_SAMPLER_LOD_CONFIG_BIAS(float_to_fixp55(ss->lod_bias));
     if(ss->min_mip_filter != PIPE_TEX_MIPFILTER_NONE)
     {
diff --git a/native/driver/etna_transfer.c b/native/driver/etna_transfer.c
index 551e422..c8aa20f 100644
--- a/native/driver/etna_transfer.c
+++ b/native/driver/etna_transfer.c
@@ -67,10 +67,10 @@ static void *etna_pipe_transfer_map(struct pipe_context *pipe,
     /* XXX we don't handle PIPE_TRANSFER_FLUSH_EXPLICIT; this flag can be ignored when mapping in-place,
      * but when not in place we need to fire off the copy operation in transfer_flush_region (currently
      * a no-op) instead of unmap. Need to handle this to support ARB_map_buffer_range extension at least.
-     */ 
+     */
     /* XXX we don't take care of current operations on the resource; which can be, at some point in the pipeline
        which is not yet executed:
-      
+
        - bound as surface
        - bound through vertex buffer
        - bound through index buffer
@@ -85,13 +85,13 @@ static void *etna_pipe_transfer_map(struct pipe_context *pipe,
        We also need to know whether the resource is in use to determine if a sync is needed (or just do it
        always, but that comes at the expense of performance).
 
-       A conservative approximation without too much overhead would be to mark all resources that have 
-       been bound at some point as busy. A drawback would be that accessing resources that have 
+       A conservative approximation without too much overhead would be to mark all resources that have
+       been bound at some point as busy. A drawback would be that accessing resources that have
        been bound but are no longer in use for a while still carry a performance penalty. On the other hand,
-       the program could be using PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE or PIPE_TRANSFER_UNSYNCHRONIZED to 
+       the program could be using PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE or PIPE_TRANSFER_UNSYNCHRONIZED to
        avoid this in the first place...
-       
-       A) We use an in-pipe copy engine, and queue the copy operation after unmap so that the copy 
+
+       A) We use an in-pipe copy engine, and queue the copy operation after unmap so that the copy
           will be performed when all current commands have been executed.
           Using the RS is possible, not sure if always efficient. This can also do any kind of tiling for us.
           Only possible when PIPE_TRANSFER_DISCARD_RANGE is set.
@@ -140,7 +140,7 @@ static void *etna_pipe_transfer_map(struct pipe_context *pipe,
                     uint bpe = util_format_get_blocksize(resource_priv->base.format);
                     /* XXX currently only handles multiples of the tile size */
                     void *ptr = res_level->logical + etna_compute_offset(resource_priv->base.format, &ptrans->base.box, res_level->stride, res_level->layer_stride);
-                    etna_texture_untile(ptrans->buffer, ptr, ptrans->base.box.width, ptrans->base.box.height, 
+                    etna_texture_untile(ptrans->buffer, ptr, ptrans->base.box.width, ptrans->base.box.height,
                             ptrans->base.stride, bpe);
                 } else { /* non-tiled or compressed format */
                     util_copy_box(ptrans->buffer,
@@ -162,7 +162,7 @@ static void *etna_pipe_transfer_map(struct pipe_context *pipe,
     *out_transfer = &ptrans->base;
     return ptrans->buffer;
 }
-   
+
 static void etna_pipe_transfer_flush_region(struct pipe_context *pipe,
 				  struct pipe_transfer *transfer_,
 				  const struct pipe_box *box)
@@ -176,7 +176,7 @@ static void etna_pipe_transfer_unmap(struct pipe_context *pipe,
     struct etna_pipe_context *priv = etna_pipe_context(pipe);
     struct etna_transfer *ptrans = etna_transfer(transfer_);
 
-    /* XXX 
+    /* XXX
      * When writing to a resource that is already in use, replace the resource with a completely new buffer
      * and free the old one using a fenced free.
      * The most tricky case to implement will be: tiled or supertiled surface, partial write, target not aligned to 4/64
@@ -197,7 +197,7 @@ static void etna_pipe_transfer_unmap(struct pipe_context *pipe,
                     uint bpe = util_format_get_blocksize(resource->base.format);
                     /* XXX currently only handles multiples of the tile size */
                     void *ptr = level->logical + etna_compute_offset(resource->base.format, &ptrans->base.box, level->stride, level->layer_stride);
-                    etna_texture_tile(ptr, ptrans->buffer, ptrans->base.box.width, ptrans->base.box.height, 
+                    etna_texture_tile(ptr, ptrans->buffer, ptrans->base.box.width, ptrans->base.box.height,
                             ptrans->base.stride, bpe);
                 } else { /* non-tiled or compressed format */
                     util_copy_box(level->logical,
diff --git a/native/driver/etna_translate.h b/native/driver/etna_translate.h
index c98cb4d..adf5415 100644
--- a/native/driver/etna_translate.h
+++ b/native/driver/etna_translate.h
@@ -70,7 +70,7 @@ static inline uint32_t translate_stencil_mode(bool enable_0, bool enable_1)
 {
     if(enable_0)
     {
-        return enable_1 ? VIVS_PE_STENCIL_CONFIG_MODE_TWO_SIDED : 
+        return enable_1 ? VIVS_PE_STENCIL_CONFIG_MODE_TWO_SIDED :
                           VIVS_PE_STENCIL_CONFIG_MODE_ONE_SIDED;
     } else {
         return VIVS_PE_STENCIL_CONFIG_MODE_DISABLED;
@@ -125,7 +125,7 @@ static inline uint32_t translate_blend_factor(unsigned blend_factor)
     case PIPE_BLENDFACTOR_INV_DST_COLOR: return BLEND_FUNC_ONE_MINUS_DST_COLOR;
     case PIPE_BLENDFACTOR_INV_CONST_COLOR: return BLEND_FUNC_ONE_MINUS_CONSTANT_COLOR;
     case PIPE_BLENDFACTOR_INV_CONST_ALPHA: return BLEND_FUNC_ONE_MINUS_CONSTANT_ALPHA;
-    case PIPE_BLENDFACTOR_SRC1_COLOR: 
+    case PIPE_BLENDFACTOR_SRC1_COLOR:
     case PIPE_BLENDFACTOR_SRC1_ALPHA:
     case PIPE_BLENDFACTOR_INV_SRC1_COLOR:
     case PIPE_BLENDFACTOR_INV_SRC1_ALPHA:
@@ -206,7 +206,7 @@ static inline uint32_t translate_texture_format(enum pipe_format fmt, bool silen
 /* render target format (non-rb swapped RS-supported formats) */
 static inline uint32_t translate_rt_format(enum pipe_format fmt, bool silent)
 {
-    switch(fmt) 
+    switch(fmt)
     {
     /* Note: Pipe format convention is LSB to MSB, VIVS is MSB to LSB */
     case PIPE_FORMAT_B4G4R4X4_UNORM: return RS_FORMAT_X4R4G4B4;
@@ -223,7 +223,7 @@ static inline uint32_t translate_rt_format(enum pipe_format fmt, bool silent)
 
 static inline uint32_t translate_depth_format(enum pipe_format fmt, bool silent)
 {
-    switch(fmt) 
+    switch(fmt)
     {
     /* Note: Pipe format convention is LSB to MSB, VIVS is MSB to LSB */
     case PIPE_FORMAT_Z16_UNORM: return VIVS_PE_DEPTH_CONFIG_DEPTH_FORMAT_D16;
@@ -236,7 +236,7 @@ static inline uint32_t translate_depth_format(enum pipe_format fmt, bool silent)
 /* render target format for MSAA */
 static inline uint32_t translate_msaa_format(enum pipe_format fmt, bool silent)
 {
-    switch(fmt) 
+    switch(fmt)
     {
     /* Note: Pipe format convention is LSB to MSB, VIVS is MSB to LSB */
     case PIPE_FORMAT_B4G4R4X4_UNORM: return VIVS_TS_MEM_CONFIG_MSAA_FORMAT_A4R4G4B4;
@@ -373,9 +373,9 @@ static inline uint32_t translate_vertex_format_type(enum pipe_format fmt, bool s
 static inline uint32_t translate_vertex_format_normalize(enum pipe_format fmt)
 {
     const struct util_format_description *desc = util_format_description(fmt);
-    if(!desc) 
+    if(!desc)
         return VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_OFF;
-    /* assumes that normalization of channel 0 holds for all channels; 
+    /* assumes that normalization of channel 0 holds for all channels;
      * this holds for all vertex formats that we support */
     return desc->channel[0].normalized ? VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_ON :
                                          VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_OFF;
@@ -408,7 +408,7 @@ static inline uint32_t translate_draw_mode(unsigned mode)
     }
 }
 
-/* Get size multiple for size of texture/rendertarget with a certain layout 
+/* Get size multiple for size of texture/rendertarget with a certain layout
  * This is affected by many different parameters:
  *   -  A horizontal multiple of 16 is used when possible as in this case tile status and resolve can be used
  *       at the cost of only a little bit extra memory usage.
@@ -449,7 +449,7 @@ static inline void etna_layout_multiple(unsigned layout, unsigned pixel_pipes,
         *paddingY = 64 * pixel_pipes;
         *halign = TEXTURE_HALIGN_SPLIT_SUPER_TILED;
         break;
-    default: DBG("Unhandled layout %i\n", layout); 
+    default: DBG("Unhandled layout %i\n", layout);
     }
 }
 
@@ -466,23 +466,23 @@ static inline uint32_t translate_clear_color(enum pipe_format format, const unio
                 (etna_cfloat_to_uintN(color->f[0], 8) << 16) |
                 (etna_cfloat_to_uintN(color->f[3], 8) << 24);
         break;
-    case PIPE_FORMAT_B4G4R4X4_UNORM: 
-    case PIPE_FORMAT_B4G4R4A4_UNORM: 
+    case PIPE_FORMAT_B4G4R4X4_UNORM:
+    case PIPE_FORMAT_B4G4R4A4_UNORM:
         clear_value = etna_cfloat_to_uintN(color->f[2], 4) |
                 (etna_cfloat_to_uintN(color->f[1], 4) << 4) |
                 (etna_cfloat_to_uintN(color->f[0], 4) << 8) |
                 (etna_cfloat_to_uintN(color->f[3], 4) << 12);
         clear_value |= clear_value << 16;
         break;
-    case PIPE_FORMAT_B5G5R5X1_UNORM: 
-    case PIPE_FORMAT_B5G5R5A1_UNORM: 
+    case PIPE_FORMAT_B5G5R5X1_UNORM:
+    case PIPE_FORMAT_B5G5R5A1_UNORM:
         clear_value = etna_cfloat_to_uintN(color->f[2], 5) |
                 (etna_cfloat_to_uintN(color->f[1], 5) << 5) |
                 (etna_cfloat_to_uintN(color->f[0], 5) << 10) |
                 (etna_cfloat_to_uintN(color->f[3], 1) << 15);
         clear_value |= clear_value << 16;
         break;
-    case PIPE_FORMAT_B5G6R5_UNORM: 
+    case PIPE_FORMAT_B5G6R5_UNORM:
         clear_value = etna_cfloat_to_uintN(color->f[2], 5) |
                 (etna_cfloat_to_uintN(color->f[1], 6) << 5) |
                 (etna_cfloat_to_uintN(color->f[0], 5) << 11);
@@ -499,12 +499,12 @@ static inline uint32_t translate_clear_depth_stencil(enum pipe_format format, fl
     uint32_t clear_value = 0;
     switch(format) // XXX util_pack_color
     {
-    case PIPE_FORMAT_Z16_UNORM: 
+    case PIPE_FORMAT_Z16_UNORM:
         clear_value = etna_cfloat_to_uintN(depth, 16);
         clear_value |= clear_value << 16;
         break;
-    case PIPE_FORMAT_X8Z24_UNORM: 
-    case PIPE_FORMAT_S8_UINT_Z24_UNORM: 
+    case PIPE_FORMAT_X8Z24_UNORM:
+    case PIPE_FORMAT_S8_UINT_Z24_UNORM:
         clear_value = (etna_cfloat_to_uintN(depth, 24) << 8) | (stencil & 0xFF);
         break;
     default:
diff --git a/native/driver/etna_zsa.c b/native/driver/etna_zsa.c
index 5464d64..892ecfb 100644
--- a/native/driver/etna_zsa.c
+++ b/native/driver/etna_zsa.c
@@ -46,7 +46,7 @@ static void *etna_pipe_create_depth_stencil_alpha_state(struct pipe_context *pip
 
     /* Set operations to KEEP if write mask is 0.
      * When we don't do this, the depth buffer is written for the entire primitive instead of
-     * just where the stencil condition holds (GC600 rev 0x0019, without feature CORRECT_STENCIL). 
+     * just where the stencil condition holds (GC600 rev 0x0019, without feature CORRECT_STENCIL).
      * Not sure if this is a hardware bug or just a strange edge case.
      */
     for(i=0; i<2; ++i)
@@ -60,7 +60,7 @@ static void *etna_pipe_create_depth_stencil_alpha_state(struct pipe_context *pip
     /* Determine whether to enable early z reject. Don't enable it when any of the stencil functions is used. */
     if(dsa.stencil[0].enabled)
     {
-        if(dsa.stencil[0].fail_op != PIPE_STENCIL_OP_KEEP || 
+        if(dsa.stencil[0].fail_op != PIPE_STENCIL_OP_KEEP ||
            dsa.stencil[0].zfail_op != PIPE_STENCIL_OP_KEEP ||
            dsa.stencil[0].zpass_op != PIPE_STENCIL_OP_KEEP)
         {
@@ -68,7 +68,7 @@ static void *etna_pipe_create_depth_stencil_alpha_state(struct pipe_context *pip
         }
         else if(dsa.stencil[1].enabled)
         {
-            if(dsa.stencil[1].fail_op != PIPE_STENCIL_OP_KEEP || 
+            if(dsa.stencil[1].fail_op != PIPE_STENCIL_OP_KEEP ||
                dsa.stencil[1].zfail_op != PIPE_STENCIL_OP_KEEP ||
                dsa.stencil[1].zpass_op != PIPE_STENCIL_OP_KEEP)
             {
@@ -89,7 +89,7 @@ static void *etna_pipe_create_depth_stencil_alpha_state(struct pipe_context *pip
     cs->PE_STENCIL_OP =
             VIVS_PE_STENCIL_OP_FUNC_FRONT(dsa.stencil[0].func) |
             VIVS_PE_STENCIL_OP_FUNC_BACK(dsa.stencil[1].func) |
-            VIVS_PE_STENCIL_OP_FAIL_FRONT(translate_stencil_op(dsa.stencil[0].fail_op)) | 
+            VIVS_PE_STENCIL_OP_FAIL_FRONT(translate_stencil_op(dsa.stencil[0].fail_op)) |
             VIVS_PE_STENCIL_OP_FAIL_BACK(translate_stencil_op(dsa.stencil[1].fail_op)) |
             VIVS_PE_STENCIL_OP_DEPTH_FAIL_FRONT(translate_stencil_op(dsa.stencil[0].zfail_op)) |
             VIVS_PE_STENCIL_OP_DEPTH_FAIL_BACK(translate_stencil_op(dsa.stencil[1].zfail_op)) |
@@ -97,7 +97,7 @@ static void *etna_pipe_create_depth_stencil_alpha_state(struct pipe_context *pip
             VIVS_PE_STENCIL_OP_PASS_BACK(translate_stencil_op(dsa.stencil[1].zpass_op));
     cs->PE_STENCIL_CONFIG =
             translate_stencil_mode(dsa.stencil[0].enabled, dsa.stencil[1].enabled) |
-            VIVS_PE_STENCIL_CONFIG_MASK_FRONT(dsa.stencil[0].valuemask) | 
+            VIVS_PE_STENCIL_CONFIG_MASK_FRONT(dsa.stencil[0].valuemask) |
             VIVS_PE_STENCIL_CONFIG_WRITE_MASK(dsa.stencil[0].writemask);
             /* XXX back masks in VIVS_PE_DEPTH_CONFIG_EXT? */
             /* XXX VIVS_PE_STENCIL_CONFIG_REF_FRONT comes from pipe_stencil_ref */
diff --git a/native/etnaviv/etna.c b/native/etnaviv/etna.c
index 8e72142..2cf44ba 100644
--- a/native/etnaviv/etna.c
+++ b/native/etnaviv/etna.c
@@ -194,7 +194,7 @@ static int gpu_context_clear(struct etna_ctx *ctx)
 }
 
 /** Start building context buffer.
- * Subsequent etna_reserve and other state setting commands will go to 
+ * Subsequent etna_reserve and other state setting commands will go to
  * the context buffer instead of the command buffer.
  * initial_pipe is the pipe as it has to be at the beginning of the context
  * buffer.
@@ -351,7 +351,7 @@ int etna_create(struct viv_conn *conn, struct etna_ctx **ctx_out)
             return ETNA_INTERNAL_ERROR;
         }
 #ifdef DEBUG
-        printf("Allocated buffer %i: phys=%08x log=%08x bytes=%08x [signal %i]\n", x, 
+        printf("Allocated buffer %i: phys=%08x log=%08x bytes=%08x [signal %i]\n", x,
                 (uint32_t)buf0_physical, (uint32_t)buf0_logical, buf0_bytes, ctx->cmdbuf_sig[x]);
 #endif
     }
@@ -427,10 +427,10 @@ int etna_free(struct etna_ctx *ctx)
     return ETNA_OK;
 }
 
-/* internal (non-inline) part of etna_reserve 
+/* internal (non-inline) part of etna_reserve
  * - commit current command buffer (if there is a current command buffer)
  * - signify when current command buffer becomes available using a signal
- * - switch to next command buffer 
+ * - switch to next command buffer
  */
 int _etna_reserve_internal(struct etna_ctx *ctx, size_t n)
 {
@@ -559,7 +559,7 @@ int etna_flush(struct etna_ctx *ctx)
         cur_buf->startOffset = cur_buf->offset = COMMAND_BUFFER_SIZE - END_COMMIT_CLEARANCE;
     }
 
-    /* Set writing offset for next etna_reserve. For convenience this is 
+    /* Set writing offset for next etna_reserve. For convenience this is
        stored as an index instead of a byte offset.  */
     ctx->offset = cur_buf->offset / 4;
 #ifdef DEBUG
@@ -687,7 +687,7 @@ int etna_set_context_cb(struct etna_ctx *ctx, etna_context_snapshot_cb_t snapsho
 void etna_dump_cmd_buffer(struct etna_ctx *ctx)
 {
     uint32_t start_offset = ctx->cmdbuf[ctx->cur_buf]->startOffset/4 + 8;
-    uint32_t *buf = &ctx->buf[start_offset]; 
+    uint32_t *buf = &ctx->buf[start_offset];
     size_t size = ctx->offset - start_offset;
     printf("cmdbuf:\n");
     for(unsigned idx=0; idx<size; ++idx)
diff --git a/native/etnaviv/etna.h b/native/etnaviv/etna.h
index 0d74b00..516861a 100644
--- a/native/etnaviv/etna.h
+++ b/native/etnaviv/etna.h
@@ -21,7 +21,7 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
-/* Buffer building and submission, abstracts away specific kernel interface 
+/* Buffer building and submission, abstracts away specific kernel interface
  * as much as practically possible.
  */
 #ifndef H_ETNA
@@ -79,7 +79,7 @@ enum etna_status {
 };
 
 /* HW pipes.
- * Used by GPU to tell front-end what back-end modules to synchronize operations with. 
+ * Used by GPU to tell front-end what back-end modules to synchronize operations with.
  */
 enum etna_pipe {
     ETNA_PIPE_3D = 0,
@@ -96,7 +96,7 @@ struct etna_context_info {
     void *logical;
 };
 
-typedef int (*etna_context_snapshot_cb_t)(void *data, struct etna_ctx *ctx, 
+typedef int (*etna_context_snapshot_cb_t)(void *data, struct etna_ctx *ctx,
         enum etna_pipe *initial_pipe, enum etna_pipe *final_pipe);
 
 struct etna_ctx {
@@ -250,9 +250,9 @@ int etna_set_context_cb(struct etna_ctx *ctx, etna_context_snapshot_cb_t snapsho
 /* print command buffer for debugging */
 void etna_dump_cmd_buffer(struct etna_ctx *ctx);
 
-/** 
+/**
  * Direct state setting functions; these can be used for convenience. When absolute performance
- * is required while updating big blocks of state at once, it is recommended to use the 
+ * is required while updating big blocks of state at once, it is recommended to use the
  * ETNA_EMIT_* macros and etna_reserve directly.
  */
 static inline void etna_set_state(struct etna_ctx *cmdbuf, uint32_t address, uint32_t value)
diff --git a/native/etnaviv/etna_fb.c b/native/etnaviv/etna_fb.c
index 9457692..5bd7470 100644
--- a/native/etnaviv/etna_fb.c
+++ b/native/etnaviv/etna_fb.c
@@ -108,8 +108,8 @@ bool etna_fb_get_format(const struct fb_var_screeninfo *fb_var, unsigned *rs_for
                 (int)fb_var->transp.offset, (int)fb_var->transp.length);
         return false;
     } else {
-        printf("Framebuffer format: %i, flip_rb=%i\n", 
-                etna_fb_formats[fmt_idx].rs_format, 
+        printf("Framebuffer format: %i, flip_rb=%i\n",
+                etna_fb_formats[fmt_idx].rs_format,
                 etna_fb_formats[fmt_idx].swap_rb);
         *rs_format = etna_fb_formats[fmt_idx].rs_format;
         *swap_rb = etna_fb_formats[fmt_idx].swap_rb;
diff --git a/native/etnaviv/etna_mem.c b/native/etnaviv/etna_mem.c
index 5f4918b..19fec5e 100644
--- a/native/etnaviv/etna_mem.c
+++ b/native/etnaviv/etna_mem.c
@@ -40,7 +40,7 @@
 #include "gc_hal_types.h"
 
 //#define DEBUG
-#define ETNA_VIDMEM_ALIGNMENT (0x40) 
+#define ETNA_VIDMEM_ALIGNMENT (0x40)
 
 int etna_vidmem_alloc_linear(struct viv_conn *conn, struct etna_vidmem **mem_out, size_t bytes, enum viv_surf_type type, enum viv_pool pool, bool lock)
 {
diff --git a/native/etnaviv/etna_rs.c b/native/etnaviv/etna_rs.c
index c208cc8..04103ef 100644
--- a/native/etnaviv/etna_rs.c
+++ b/native/etnaviv/etna_rs.c
@@ -85,7 +85,7 @@ void etna_compile_rs_state(struct compiled_rs_state *cs, const struct rs_state *
     SET_STATE(RS_EXTRA_CONFIG, VIVS_RS_EXTRA_CONFIG_AA(rs->aa) | VIVS_RS_EXTRA_CONFIG_ENDIAN(rs->endian_mode));
 }
 
-/* submit RS state, without any processing and no dependence on context 
+/* submit RS state, without any processing and no dependence on context
  * except TS if this is a source-to-destination blit. */
 void etna_submit_rs_state(struct etna_ctx *restrict ctx, const struct compiled_rs_state *cs)
 {
diff --git a/native/etnaviv/etna_tex.c b/native/etnaviv/etna_tex.c
index 229ca78..569e0e8 100644
--- a/native/etnaviv/etna_tex.c
+++ b/native/etnaviv/etna_tex.c
@@ -20,7 +20,7 @@ void etna_texture_tile(void *dest, void *src, unsigned width, unsigned height, u
             unsigned ty = (srcy/TEX_TILE_HEIGHT) * dst_stride + (srcy%TEX_TILE_HEIGHT) * TEX_TILE_WIDTH;
             for(unsigned srcx=0; srcx<width; ++srcx)
             {
-                ((uint32_t*)dest)[ty + (srcx/TEX_TILE_WIDTH)*TEX_TILE_WORDS + (srcx%TEX_TILE_WIDTH)] = 
+                ((uint32_t*)dest)[ty + (srcx/TEX_TILE_WIDTH)*TEX_TILE_WORDS + (srcx%TEX_TILE_WIDTH)] =
                     ((uint32_t*)src)[srcy * src_stride + srcx];
             }
         }
@@ -32,7 +32,7 @@ void etna_texture_tile(void *dest, void *src, unsigned width, unsigned height, u
             unsigned ty = (srcy/TEX_TILE_HEIGHT) * dst_stride + (srcy%TEX_TILE_HEIGHT) * TEX_TILE_WIDTH;
             for(unsigned srcx=0; srcx<width; ++srcx)
             {
-                ((uint16_t*)dest)[ty + (srcx/TEX_TILE_WIDTH)*TEX_TILE_WORDS + (srcx%TEX_TILE_WIDTH)] = 
+                ((uint16_t*)dest)[ty + (srcx/TEX_TILE_WIDTH)*TEX_TILE_WORDS + (srcx%TEX_TILE_WIDTH)] =
                     ((uint16_t*)src)[srcy * src_stride + srcx];
             }
         }
@@ -43,7 +43,7 @@ void etna_texture_tile(void *dest, void *src, unsigned width, unsigned height, u
             unsigned ty = (srcy/TEX_TILE_HEIGHT) * dst_stride + (srcy%TEX_TILE_HEIGHT) * TEX_TILE_WIDTH;
             for(unsigned srcx=0; srcx<width; ++srcx)
             {
-                ((uint8_t*)dest)[ty + (srcx/TEX_TILE_WIDTH)*TEX_TILE_WORDS + (srcx%TEX_TILE_WIDTH)] = 
+                ((uint8_t*)dest)[ty + (srcx/TEX_TILE_WIDTH)*TEX_TILE_WORDS + (srcx%TEX_TILE_WIDTH)] =
                     ((uint8_t*)src)[srcy * src_stride + srcx];
             }
         }
@@ -67,7 +67,7 @@ void etna_texture_untile(void *dest, void *src, unsigned width, unsigned height,
             unsigned ty = (dsty/TEX_TILE_HEIGHT) * src_stride + (dsty%TEX_TILE_HEIGHT) * TEX_TILE_WIDTH;
             for(unsigned dstx=0; dstx<width; ++dstx)
             {
-                ((uint32_t*)dest)[dsty * dst_stride + dstx] = 
+                ((uint32_t*)dest)[dsty * dst_stride + dstx] =
                     ((uint32_t*)src)[ty + (dstx/TEX_TILE_WIDTH)*TEX_TILE_WORDS + (dstx%TEX_TILE_WIDTH)];
             }
         }
diff --git a/native/etnaviv/viv.c b/native/etnaviv/viv.c
index 5f25c1b..7a9439e 100644
--- a/native/etnaviv/viv.c
+++ b/native/etnaviv/viv.c
@@ -178,7 +178,7 @@ int viv_open(enum viv_hw_type hw_type, struct viv_conn **out)
     conn->kernel_driver.patch = 0;
     conn->kernel_driver.build = 0;
 #endif
-    snprintf(conn->kernel_driver.name, sizeof(conn->kernel_driver.name), 
+    snprintf(conn->kernel_driver.name, sizeof(conn->kernel_driver.name),
             "Vivante GPL kernel driver %i.%i.%i.%i",
             conn->kernel_driver.major, conn->kernel_driver.minor,
             conn->kernel_driver.patch, conn->kernel_driver.build);
@@ -276,7 +276,7 @@ int viv_alloc_linear_vidmem(struct viv_conn *conn, size_t bytes, size_t alignmen
     *node = VIV_TO_HANDLE(id.u.AllocateLinearVideoMemory.node);
     if(bytes_out != NULL)
         *bytes_out = id.u.AllocateLinearVideoMemory.bytes;
-    return gcvSTATUS_OK; 
+    return gcvSTATUS_OK;
 }
 
 int viv_lock_vidmem(struct viv_conn *conn, viv_node_t node, viv_addr_t *physical, void **logical)
@@ -298,7 +298,7 @@ int viv_lock_vidmem(struct viv_conn *conn, viv_node_t node, viv_addr_t *physical
     }
     *physical = id.u.LockVideoMemory.address;
     *logical = VIV_TO_PTR(id.u.LockVideoMemory.memory);
-    return gcvSTATUS_OK; 
+    return gcvSTATUS_OK;
 }
 
 /** Unlock (unmap) video memory node from GPU and CPU memory.
diff --git a/native/etnaviv/viv.h b/native/etnaviv/viv.h
index 017ff67..11fe66e 100644
--- a/native/etnaviv/viv.h
+++ b/native/etnaviv/viv.h
@@ -149,7 +149,7 @@ typedef uint64_t viv_context_t;
 typedef uint64_t viv_usermem_t;
 
 /* kernel-interface independent chip specs structure, this is much easier to use
- * than checking GCABI defines all the time. 
+ * than checking GCABI defines all the time.
  */
 struct viv_specs {
     uint32_t chip_model;
@@ -177,7 +177,7 @@ struct viv_kernel_driver_version {
 struct viv_conn {
     int fd;
     enum viv_hw_type hw_type;
-    
+
     viv_addr_t base_address;
     void *mem;
     viv_addr_t mem_base;
@@ -256,7 +256,7 @@ int viv_user_signal_create(struct viv_conn *conn, int manualReset, int *id_out);
  */
 int viv_user_signal_signal(struct viv_conn *conn, int sig_id, int state);
 
-/** Wait for signal. 
+/** Wait for signal.
  * @param[in] wait Provide time to wait in milliseconds, or VIV_WAIT_INDEFINITE.
  */
 int viv_user_signal_wait(struct viv_conn *conn, int sig_id, int wait);
@@ -272,9 +272,9 @@ void viv_show_chip_info(struct viv_conn *conn);
  */
 int viv_reset(struct viv_conn *conn);
 
-/** Convenience macro to probe features from state.xml.h: 
- * VIV_FEATURE(chipFeatures, FAST_CLEAR) 
- * VIV_FEATURE(chipMinorFeatures1, AUTO_DISABLE) 
+/** Convenience macro to probe features from state.xml.h:
+ * VIV_FEATURE(chipFeatures, FAST_CLEAR)
+ * VIV_FEATURE(chipMinorFeatures1, AUTO_DISABLE)
  */
 #define VIV_FEATURE(conn, word, feature) ((conn->chip.chip_features[viv_ ## word] & (word ## _ ## feature))!=0)
 
diff --git a/native/test2d/filterblt2d.c b/native/test2d/filterblt2d.c
index d26b56c..fdf191c 100644
--- a/native/test2d/filterblt2d.c
+++ b/native/test2d/filterblt2d.c
@@ -163,7 +163,7 @@ int main(int argc, char **argv)
     {
         for(int val_idx=0; val_idx<FB_NR_VALUES; ++val_idx)
         {
-            /* -2.0 .. 2.0 to -0x8000..0x7fff 
+            /* -2.0 .. 2.0 to -0x8000..0x7fff
              * (1.14 fixed point with sign bit)
              */
             float val = kernel_in[row_idx][val_idx] * (1<<14);
@@ -192,24 +192,24 @@ int main(int argc, char **argv)
         etna_set_state(ctx, VIVS_DE_VPLANE_ADDRESS, 0);
         etna_set_state(ctx, VIVS_DE_VPLANE_STRIDE, 0);
 
-        /* Are these used in VR blit? 
+        /* Are these used in VR blit?
          * Likely, only the source format is.
          */
         etna_set_state(ctx, VIVS_DE_SRC_ROTATION_CONFIG, 0);
-        etna_set_state(ctx, VIVS_DE_SRC_CONFIG, 
+        etna_set_state(ctx, VIVS_DE_SRC_CONFIG,
                 VIVS_DE_SRC_CONFIG_SOURCE_FORMAT(DE_FORMAT_A8R8G8B8) |
                 VIVS_DE_SRC_CONFIG_LOCATION_MEMORY |
                 VIVS_DE_SRC_CONFIG_PE10_SOURCE_FORMAT(DE_FORMAT_A8R8G8B8));
-        etna_set_state(ctx, VIVS_DE_SRC_ORIGIN, 
+        etna_set_state(ctx, VIVS_DE_SRC_ORIGIN,
                 VIVS_DE_SRC_ORIGIN_X(0) |
                 VIVS_DE_SRC_ORIGIN_Y(0));
-        etna_set_state(ctx, VIVS_DE_SRC_SIZE, 
+        etna_set_state(ctx, VIVS_DE_SRC_SIZE,
                 VIVS_DE_SRC_SIZE_X(src_width) |
                 VIVS_DE_SRC_SIZE_Y(src_height)
                 ); // source size is ignored
 
         /* Compute stretch factors */
-        etna_set_state(ctx, VIVS_DE_STRETCH_FACTOR_LOW, 
+        etna_set_state(ctx, VIVS_DE_STRETCH_FACTOR_LOW,
                 VIVS_DE_STRETCH_FACTOR_LOW_X(((src_width - 1) << 16) / (width - 1)));
         etna_set_state(ctx, VIVS_DE_STRETCH_FACTOR_HIGH,
                 VIVS_DE_STRETCH_FACTOR_HIGH_Y(((src_height - 1) << 16) / (height - 1)));
@@ -218,7 +218,7 @@ int main(int argc, char **argv)
         etna_set_state(ctx, VIVS_DE_DEST_ADDRESS, bmp->address);
         etna_set_state(ctx, VIVS_DE_DEST_STRIDE, width*4);
         etna_set_state(ctx, VIVS_DE_DEST_ROTATION_CONFIG, 0);
-        etna_set_state(ctx, VIVS_DE_DEST_CONFIG, 
+        etna_set_state(ctx, VIVS_DE_DEST_CONFIG,
                 VIVS_DE_DEST_CONFIG_FORMAT(DE_FORMAT_A8R8G8B8) |
                 VIVS_DE_DEST_CONFIG_COMMAND_HOR_FILTER_BLT |
                 VIVS_DE_DEST_CONFIG_SWIZZLE(DE_SWIZZLE_ARGB) |
@@ -226,15 +226,15 @@ int main(int argc, char **argv)
                 VIVS_DE_DEST_CONFIG_MINOR_TILED_DISABLE
                 // | VIVS_DE_DEST_CONFIG_GDI_STRE_ENABLE
                 );
-        etna_set_state(ctx, VIVS_DE_ROP, 
+        etna_set_state(ctx, VIVS_DE_ROP,
                 VIVS_DE_ROP_ROP_FG(0xcc) | VIVS_DE_ROP_ROP_BG(0xcc) | VIVS_DE_ROP_TYPE_ROP4);
         /* Clipping rectangle (probably not used in VR blit) */
-        etna_set_state(ctx, VIVS_DE_CLIP_TOP_LEFT, 
+        etna_set_state(ctx, VIVS_DE_CLIP_TOP_LEFT,
                 VIVS_DE_CLIP_TOP_LEFT_X(0) |
                 VIVS_DE_CLIP_TOP_LEFT_Y(0)
                 );
-        etna_set_state(ctx, VIVS_DE_CLIP_BOTTOM_RIGHT, 
-                VIVS_DE_CLIP_BOTTOM_RIGHT_X(width) | 
+        etna_set_state(ctx, VIVS_DE_CLIP_BOTTOM_RIGHT,
+                VIVS_DE_CLIP_BOTTOM_RIGHT_X(width) |
                 VIVS_DE_CLIP_BOTTOM_RIGHT_Y(height)
                 );
 
@@ -259,10 +259,10 @@ int main(int argc, char **argv)
         /* Program video rasterizer */
         etna_set_state(ctx, VIVS_DE_VR_CONFIG_EX, 0);
         etna_set_state(ctx, VIVS_DE_VR_SOURCE_IMAGE_LOW,
-                VIVS_DE_VR_SOURCE_IMAGE_LOW_LEFT(0) | 
+                VIVS_DE_VR_SOURCE_IMAGE_LOW_LEFT(0) |
                 VIVS_DE_VR_SOURCE_IMAGE_LOW_TOP(0));
         etna_set_state(ctx, VIVS_DE_VR_SOURCE_IMAGE_HIGH,
-                VIVS_DE_VR_SOURCE_IMAGE_HIGH_RIGHT(src_width) | 
+                VIVS_DE_VR_SOURCE_IMAGE_HIGH_RIGHT(src_width) |
                 VIVS_DE_VR_SOURCE_IMAGE_HIGH_BOTTOM(src_height));
 
         etna_set_state(ctx, VIVS_DE_VR_SOURCE_ORIGIN_LOW,
@@ -271,16 +271,16 @@ int main(int argc, char **argv)
                 VIVS_DE_VR_SOURCE_ORIGIN_HIGH_Y(0));
 
         etna_set_state(ctx, VIVS_DE_VR_TARGET_WINDOW_LOW,
-                VIVS_DE_VR_TARGET_WINDOW_LOW_LEFT(0) | 
+                VIVS_DE_VR_TARGET_WINDOW_LOW_LEFT(0) |
                 VIVS_DE_VR_TARGET_WINDOW_LOW_TOP(0));
         etna_set_state(ctx, VIVS_DE_VR_TARGET_WINDOW_HIGH,
-                VIVS_DE_VR_TARGET_WINDOW_HIGH_RIGHT(width) | 
+                VIVS_DE_VR_TARGET_WINDOW_HIGH_RIGHT(width) |
                 VIVS_DE_VR_TARGET_WINDOW_HIGH_BOTTOM(height));
 
         etna_set_state_multi(ctx, VIVS_DE_FILTER_KERNEL(0), FB_DWORD_COUNT, filter_kernel);
 
         /* Kick off VR */
-        etna_set_state(ctx, VIVS_DE_VR_CONFIG, 
+        etna_set_state(ctx, VIVS_DE_VR_CONFIG,
                 VIVS_DE_VR_CONFIG_START_HORIZONTAL_BLIT);
 
         etna_set_state(ctx, VIVS_GL_FLUSH_CACHE, VIVS_GL_FLUSH_CACHE_PE2D);
diff --git a/rnndb/cmdstream.xml b/rnndb/cmdstream.xml
index b93894a..f34118e 100644
--- a/rnndb/cmdstream.xml
+++ b/rnndb/cmdstream.xml
@@ -56,8 +56,8 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
 <domain name="VIVS"/>
 
 <domain name="VIV_FE">
-    <!-- A stripe for each command follows. 
-         The different commands are distinguished by the OP field. 
+    <!-- A stripe for each command follows.
+         The different commands are distinguished by the OP field.
      -->
     <stripe name="LOAD_STATE" varset="FE_OPCODE" variants="LOAD_STATE">
         <reg32 offset="0x00000" name="HEADER">
diff --git a/rnndb/common.xml b/rnndb/common.xml
index 5a3070c..c06a861 100644
--- a/rnndb/common.xml
+++ b/rnndb/common.xml
@@ -25,7 +25,7 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
 <!-- Shared enums and type definitions.
   -->
-    <enum name="ENABLE_DISABLE" inline="yes"> 
+    <enum name="ENABLE_DISABLE" inline="yes">
         <!-- Generic type for functionality that can be enabled/disabled, it can be somewhat clearer than a
              boolean (RNN default type for 1-bit values) in some cases.
           -->
diff --git a/rnndb/isa.xml b/rnndb/isa.xml
index 112b401..9f8a5f2 100644
--- a/rnndb/isa.xml
+++ b/rnndb/isa.xml
@@ -31,14 +31,14 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
  an instruction using the bitfields and instructions defined in this file.
  -->
 <enum name="INST_OPCODE" brief="Main opcode table">
-    <!-- TODO: match against TGSI instructions 
-         http://people.freedesktop.org/~csimpson/gallium-docs/tgsi.html 
+    <!-- TODO: match against TGSI instructions
+         http://people.freedesktop.org/~csimpson/gallium-docs/tgsi.html
 
          Overall the ISA seems to be based on DirectX shader assembly. This is pretty obvious in retrospect
          as Vivante started by marketing DirectX-compatible GPUs for playing desktop PC games.
 
          Restrictions:
-         - only one uniform can be read per instruction, however this single uniform can be used in 
+         - only one uniform can be read per instruction, however this single uniform can be used in
            multiple arguments.
             - when violating this restriction it will be as if the uniform in the last source register
               is broadcasted to all arguments that use an uniform.
@@ -88,7 +88,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>
             dst := src0.x * src1.x + src0.y * src1.y + src0.z * src1.z
 
-            Computes the component-wise dot product of the first three components between src0 and src1 and 
+            Computes the component-wise dot product of the first three components between src0 and src1 and
             broadcasts the results to all destination components in temporary register dst.
         </doc>
     </value>
@@ -96,7 +96,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>
             dst := src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src0.w * src1.w
 
-            Computes the component-wise dot product between src0 and src1 and broadcasts the results to all destination 
+            Computes the component-wise dot product between src0 and src1 and broadcasts the results to all destination
             components in temporary register dst.
         </doc>
     </value>
@@ -129,7 +129,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>
             dst := 1.0 / sqrt(src2)
 
-            Computes the reciprocal of the square root of src2, and puts the result into 
+            Computes the reciprocal of the square root of src2, and puts the result into
             temporary register dst.
         </doc>
     </value>
@@ -153,7 +153,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             This operation is performed per-component. It is used to implement MIN(a,b) and MAX(a,b)
             in the following way:
             - MIN(a,b): SELECT.GT dst, a, b, a   (b > a ? a : b)
-            - MAX(a,b): SELECT.LT dst, a, b, a   (b &lt; a ? a : b) 
+            - MAX(a,b): SELECT.LT dst, a, b, a   (b &lt; a ? a : b)
         </doc>
     </value>
     <value value="0x10" name="SET">
@@ -167,7 +167,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>
             dst := exp2(src2.x)
 
-            Sets temporary register dst to the 2-exponent of the x component of src2). 
+            Sets temporary register dst to the 2-exponent of the x component of src2).
             This is a scalar operation, the result is broadcasted over all active destination components.
         </doc>
     </value>
@@ -175,7 +175,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>
             dst := log2(src2.x)
 
-            Sets temporary register dst to the 2-log of the x component of src2). 
+            Sets temporary register dst to the 2-log of the x component of src2).
             This is a scalar operation, the result is broadcasted over all active destination components.
         </doc>
     </value>
@@ -183,8 +183,8 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>
             dst := frc(src2)
 
-            Sets temporary register dst to the fractional portion of src2 for positive values. For negative values, 
-            the returned value will be 1.0 - the fractional portion. For example, 1.5 will become 0.5, and -0.1 
+            Sets temporary register dst to the fractional portion of src2 for positive values. For negative values,
+            the returned value will be 1.0 - the fractional portion. For example, 1.5 will become 0.5, and -0.1
             will be turned into 0.9.
         </doc>
     </value>
@@ -202,7 +202,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>
             Discards (kills) the current fragment. Can only be used in PS.
 
-            No input or output arguments. 
+            No input or output arguments.
 
             XXX figure out if the COND bits are used. It appears not, as the blob generates BRANCH.XX based flow
             control around it.
@@ -234,7 +234,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>
             dst := sin(src2 * (PI/2))
 
-            Computes the sine of src2 and puts the result in temporary register dst. 
+            Computes the sine of src2 and puts the result in temporary register dst.
 
             The period of the sine is 4 and not 2 PI, thus to get normal behavior the instruction
             should be prefixed by a division by PI/2.
@@ -245,7 +245,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             dst := cos(src2 * (PI/2))
 
             Computes the cosine of src2 and puts the result in temporary register dst.
-            
+
             The period of the cosine is 4 and not 2 PI, thus to get normal behavior the instruction
             should be prefixed by a division by PI/2.
         </doc>
@@ -253,7 +253,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
     <value value="0x25" name="FLOOR" brief="Largest integral value not greater than the argument"> <!-- HAS_SIGN_FLOOR_CEIL -->
         <doc>
             dst := floor(src2)
-            
+
             Computes the largest integral value not greater than the argument, and puts the result in temporary
             register dst.
         </doc>
@@ -261,7 +261,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
     <value value="0x26" name="CEIL" brief="Smallest integral value not less than the argument"> <!-- HAS_SIGN_FLOOR_CEIL -->
         <doc>
             dst := ceil(src2)
-            
+
             Computes the smallest integral value not less than the argument, and puts the result in temporary
             register dst.
         </doc>
@@ -398,7 +398,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
     <bitfield high="2" low="0" name="SRC1_RGROUP" type="INST_RGROUP" brief="Source operand 1 register group"/>
     <!-- bits 7..21: instruction address, effectively takes the place of src2 operand -->
     <bitfield high="21" low="7" name="SRC2_IMM" brief="Immediate (address) operand"/>
-    <!-- operand 2 --> 
+    <!-- operand 2 -->
     <bitfield high="3" low="3" name="SRC2_USE" brief="Source operand 2 used"/>
     <bitfield high="12" low="4" name="SRC2_REG" brief="Source operand 2 register"/>
     <bitfield high="13" low="13" name="UNK3_13"/>
diff --git a/rnndb/state.xml b/rnndb/state.xml
index 3ba09ba..be19ec1 100644
--- a/rnndb/state.xml
+++ b/rnndb/state.xml
@@ -42,7 +42,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
          - type="fixedp": (N/2).(N/2) fixed-point value, where N is the number of bits of the bit field,
              so in case of a 32 bit field it will be a 16.16 FP value.
          - domains can be used as types (to signify a memory address in that domain)
-         - for registers: masked="yes", defaults to "no" to mark registers that support partial 
+         - for registers: masked="yes", defaults to "no" to mark registers that support partial
            state writes by using masks per bit group (mask for state group A must be called A_MASK)
 
      TODO: variants are somewhat different from NV; what features are supported is determined by (most common first)
@@ -51,7 +51,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
          3) Chip model (GC800, GC2000, ...)
          4) Chip revision of the form 0x1234
 
-         Generally the chip feature flags are used to distinguish functionality, as well as the specs, and not the model 
+         Generally the chip feature flags are used to distinguish functionality, as well as the specs, and not the model
          and revision. Unlike NV which parametrizes everything on the model, for GC this is done only for bugfixes.
 -->
 <domain name="VIVM" brief="GPU memory domain">
@@ -83,7 +83,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             takes care of parsing the command stream and loading states, as well as loading
             vertex streams.</doc>
         <reg32 offset="0x00600" name="VERTEX_ELEMENT_CONFIG" value="0x00000000" length="16" stride="4">
-            <!-- length="attrCount" where attrCount, the number of attributes, 
+            <!-- length="attrCount" where attrCount, the number of attributes,
                  is 16 if chipMinorFeatures1 bit 23 set, otherwise 12.
              -->
             <bitfield high="3" low="0" name="TYPE" brief="Data type">
@@ -278,7 +278,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         </reg32>
         <reg32 offset="0x03828" name="VARYING_COMPONENT_USE" brief="Varying component usage" value="0x00000000" length="2" stride="4">
             <doc>2 bits per varying component, 16 components per 32-bit word.</doc>
-            <bitfield high="1" low="0" name="COMP0" type="VARYING_COMPONENT_USE"/> 
+            <bitfield high="1" low="0" name="COMP0" type="VARYING_COMPONENT_USE"/>
             <bitfield high="3" low="2" name="COMP1" type="VARYING_COMPONENT_USE"/>
             <bitfield high="5" low="4" name="COMP2" type="VARYING_COMPONENT_USE"/>
             <bitfield high="7" low="6" name="COMP3" type="VARYING_COMPONENT_USE"/>
diff --git a/rnndb/state_2d.xml b/rnndb/state_2d.xml
index 25909ea..a22c478 100644
--- a/rnndb/state_2d.xml
+++ b/rnndb/state_2d.xml
@@ -185,7 +185,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
     <bitset name="2D_SRC_ORIGIN" inline="yes" brief="Source origin">
         <doc>
         Absolute or relative (determined by the SRC_RELATIVE field of the SRC_CONFIG register)
-        X and Y coordinates in pixels of the top left corner of the source rectangle within 
+        X and Y coordinates in pixels of the top left corner of the source rectangle within
         the source surface.
         </doc>
         <bitfield high="15" low="0" name="X"/>
@@ -370,7 +370,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             <doc>
             In mono expansion this register defines the source color if the mono pixel is 0.
             The color must be set in A8R8G8B8 format.
-            In color blits it defines the source transparency color and must be of the same 
+            In color blits it defines the source transparency color and must be of the same
             format as the source format.
             </doc>
         </reg32>
@@ -384,7 +384,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             <bitfield high="30" low="0" name="X" brief="Horizontal stretch factor in 15.16 fixed point format">
                 <doc>
                 The value is calculated using the following formula:
-                    
+
                     factor = ((srcWidth - 1) &lt;&lt; 16) / (dstWidth - 1)
 
                 Stretch blit uses only the integer part of the value while Filter blit uses all
@@ -396,7 +396,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             <bitfield high="30" low="0" name="Y" brief="Vertical stretch factor in 15.16 fixed point format">
                 <doc>
                 The value is calculated using the following formula:
-                    
+
                     factor = ((srcWidth - 1) &lt;&lt; 16) / (dstWidth - 1)
 
                 Stretch blit uses only the integer part of the value while Filter blit uses all
@@ -488,7 +488,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             <doc>
             Byte pattern mask used for clearing. PE1.0 uses a 8-byte pattern for clearing
             as specified in the CLEAR_PIXEL_VALUE_LOW and CLEAR_PIXEL_VALUE_HIGH registers.
-            Each of 8 bits refers to a byte. A value of 1 clears the respective byte, 
+            Each of 8 bits refers to a byte. A value of 1 clears the respective byte,
             and a value of 0 ignores it.
             </doc>
         </reg32>
diff --git a/rnndb/state_3d.xml b/rnndb/state_3d.xml
index ed6ed02..a3f59d1 100644
--- a/rnndb/state_3d.xml
+++ b/rnndb/state_3d.xml
@@ -82,8 +82,8 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <value value="5" name="X8R8G8B8"/>
         <value value="6" name="A8R8G8B8"/>
         <value value="7" name="YUY2"/><!-- only supported with YUY2_AVERAGING -->
-        <!-- 8..15 do show color and don't result in GPU crashes directly, also they don't 
-             seem to be direct aliases of 0..7, but it needs to be figured out if 
+        <!-- 8..15 do show color and don't result in GPU crashes directly, also they don't
+             seem to be direct aliases of 0..7, but it needs to be figured out if
              they're useful or just alternative names for formats in range 0..7
          -->
     </enum>
@@ -212,7 +212,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         </reg32>
         <reg32 offset="0x00810" name="OUTPUT" value="0x00000000" brief="Output routing" length="4" stride="4">
             <doc>
-                Each bitfield (up to 16 in total) contains a temporary register number that 
+                Each bitfield (up to 16 in total) contains a temporary register number that
                 is used as output at the end of the shader for that varying.
             </doc>
             <bitfield high="7" low="0" name="O0" type="uint"/>
@@ -222,7 +222,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         </reg32>
         <reg32 offset="0x00820" name="INPUT" value="0x00000000" brief="Input routing" length="4" stride="4">
             <doc>
-                Each bitfield (up to 16 in total) contains the number of a temporary register that 
+                Each bitfield (up to 16 in total) contains the number of a temporary register that
                 is assigned the input for that attribute at the beginning of shader execution.
             </doc>
             <bitfield high="7" low="0" name="I0" type="uint"/>
@@ -292,7 +292,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         </reg32>
         <reg32 offset="0x00924" name="UNK00924" value="0x00000000"/>
     </stripe>
-    
+
     <stripe name="PA" brief="Primitive assembly states">
         <doc>Primitive assembly assembles primitives (tris, quads, lines, points etc) from vertices for 3D rendering.
             Viewport scaling, line width and point size is configured here.</doc>
@@ -358,7 +358,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <reg32 offset="0x00A84" name="VIEWPORT_UNK00A84" value="0x00000000" type="float"/>
         <reg32 offset="0x00A8C" name="VIEWPORT_UNK00A8C" value="0x00000000" type="float"/>
     </stripe>
-   
+
     <stripe name="SE" brief="Setup Engine states">
         <doc>The setup engine takes care of scissor, clipping, and depth scale.</doc>
         <reg32 offset="0x00C00" name="SCISSOR_LEFT" value="0x00000000" type="float"/>
@@ -376,7 +376,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <reg32 offset="0x00C20" name="CLIP_RIGHT" value="0x00000000" type="float"/>
         <reg32 offset="0x00C24" name="CLIP_BOTTOM" value="0x00000000" type="float"/>
     </stripe>
-    
+
     <stripe name="RA" brief="Raster states">
         <doc>Configuration for the rasterizer. This mainly controls multisampling.</doc>
         <reg32 offset="0x00E00" name="CONTROL" value="0x00000001">
@@ -388,7 +388,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <reg32 offset="0x00E10" name="MULTISAMPLE_UNK00E10" value="0x00000000" length="4" stride="4"/>
         <reg32 offset="0x00E40" name="CENTROID_TABLE" value="0x00000000" length="16" stride="4"/>
     </stripe>
-  
+
     <stripe name="PS" brief="Pixel Shader states">
         <doc>The Pixel (Fragment) shader to use is configured here</doc>
         <reg32 offset="0x01000" name="END_PC" value="0x00000000"/>
@@ -417,9 +417,9 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             <!-- length="fragmentUniforms * 4" -->
         </reg32>
     </stripe>
-    
+
     <stripe name="PE" brief="Pixel Engine states">
-        <doc>The Pixel Engine takes care of writing pixels to the framebuffer, doing 
+        <doc>The Pixel Engine takes care of writing pixels to the framebuffer, doing
             blending, depth testing and alpha testing if needed.
 
             Some flags can be set either per group of bits, or all at once, by using masking flags.
@@ -560,7 +560,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             <bitfield high="31" low="16" name="UNK16"/>
         </reg32>
         <reg32 offset="0x014A4" name="LOGIC_OP" masked="yes" value="0x000E400C">
-            <!-- only supported if feature bit LOGIC_OP present. 
+            <!-- only supported if feature bit LOGIC_OP present.
                  Either a coincidence or due to shared roots in DirectX: same enum as Gallium3D PIPE_LOGICOP -->
             <bitfield high="3" low="0" name="OP" type="LOGIC_OP"/>
             <bitfield pos="4" name="OP_MASK"/>
@@ -569,8 +569,8 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <reg32 offset="0x014B0" name="UNK014B0" value="0x00000000"/>
         <reg32 offset="0x014B4" name="UNK014B4" value="0x00000000"/>
         <reg32 offset="0x01580" name="UNK01580" value="0x00000000" length="3" stride="4"/>
-    </stripe>     
-    
+    </stripe>
+
     <stripe name="CO" brief="Compose states">
         <doc>Hardware composer. This functionality is present on some GCxxxx chips and allows
             for blending surfaces together with Porter-Diff composition methods, to accelerate the
@@ -604,11 +604,11 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <array offset="0x03200" name="ADDR_UNK03200" length="8" stride="0x20">
             <reg32 offset="0" name="PPIPE" length="8"/> <!-- sampler*0x20 + pixelpipe*0x4 -->
         </array>
-    </stripe>     
-    
+    </stripe>
+
     <stripe name="RS" brief="Resolve states">
         <doc>To my current understanding, RESOLVE is a multifunctional copy/fill engine that can copy blocks of pixels from
-            one place in memory to another, actually clearing tiles that are marked as cleared in the process. 
+            one place in memory to another, actually clearing tiles that are marked as cleared in the process.
             Other capabilities are:
 
             - Conversion between pixel formats
@@ -665,7 +665,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         </reg32>
         <reg32 offset="0x01620" name="WINDOW_SIZE" value="0x00000000" brief="Size of area to resolve">
             <doc>
-                When downsampling the source and destination size will be different. 
+                When downsampling the source and destination size will be different.
                 In this case, the WINDOW_SIZE will be the (unscaled) source size.
             </doc>
             <bitfield high="31" low="16" name="HEIGHT" type="uint"/>
@@ -712,7 +712,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             </reg32>
         </stripe>
     </stripe>
-    
+
     <stripe name="TS" brief="Tile Status">
         <doc>
         Tile status block contains information about the tiles to be resolved.
@@ -838,7 +838,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
     </bitset>
     <stripe name="TE" brief="TExture sampler states">
         <doc>Texture sampling, filtering, LOD, etc</doc>
-        <stripe name="SAMPLER" length="12" stride="4"> 
+        <stripe name="SAMPLER" length="12" stride="4">
             <doc>8 fragment texture samplers, 4 vertex texture samplers</doc>
             <reg32 offset="0x2000" name="CONFIG0" value="0x00000000" type="TE_SAMPLER_CONFIG0"/>
             <reg32 offset="0x2040" name="SIZE" value="0x00000000" type="TE_SAMPLER_SIZE"/>
@@ -853,7 +853,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
             <reg32 offset="0x2400" name="LOD_ADDR" length="14" stride="0x40" type="VIVM"/> <!-- Base address, per LOD level -->
         </stripe>
     </stripe>
-    
+
     <stripe name="NTE" brief="New texture states">
         <doc>Extra texture states for newer hardware. These exist if chipMinorFeatures2 bit 11 set.</doc>
         <array offset="0x10000" name="SAMPLER" length="32" stride="4">
@@ -878,7 +878,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <reg32 offset="0x12000" name="UNK12000" value="0x00000000" length="256" stride="4"/> <!-- chipMinorFeatures2 bit 15 set -->
         <reg32 offset="0x12400" name="UNK12400" value="0x00000000" length="256" stride="4"/> <!-- chipMinorFeatures2 bit 15 set -->
     </stripe>
-    
+
     <stripe name="SH" brief="Unified shader instruction memory">
         <doc>Shader instruction memory on new hardware that supports more than 256, or more than
             1024 shader instructions (different areas are used based on these cases).</doc>
diff --git a/rnndb/state_hi.xml b/rnndb/state_hi.xml
index 4e9ef3e..338995f 100644
--- a/rnndb/state_hi.xml
+++ b/rnndb/state_hi.xml
@@ -29,7 +29,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
 <domain name="VIVS" brief="GPU state">
     <!-- Hardware description fields.
          For access in userspace these are also exposed over the ioctl interface through QUERY_CHIP_IDENTITY.
-     --> 
+     -->
 
     <stripe name="HI" brief="Host Interface">
         <doc>Host interface contains bits to identify the chip, control clock speed,
@@ -162,9 +162,9 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         <doc>New MMU unit. Only exists on more recent hardware.</doc>
         <reg32 offset="0x00180" name="SAFE_ADDRESS" brief="MMU safe address">
             <doc>
-            A 64-byte range starting from this address that will act as a 'safe' zone. 
-            Any address that would cause an exception is routed to this safe zone. 
-            Reads will happen and writes will go to this address, but with a write-enable 
+            A 64-byte range starting from this address that will act as a 'safe' zone.
+            Any address that would cause an exception is routed to this safe zone.
+            Reads will happen and writes will go to this address, but with a write-enable
             of 0. This register can only be programmed once after a reset. Any attempt to write
             to this register after the initial write-after-reset will be ignored.
             </doc>
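
Note on the stretch factor documented in the rnndb/state_2d.xml hunks above: the formula `factor = ((srcWidth - 1) << 16) / (dstWidth - 1)` yields a 15.16 fixed-point value that fits the 31-bit field (bits 30..0). A minimal sketch of that arithmetic follows. The helper name `stretch_factor` is hypothetical and not part of etna_viv; the 64-bit intermediate and the precondition asserts are assumptions added here to avoid overflow and division by zero, not behaviour confirmed by the source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of etna_viv.
 * Computes ((src_size - 1) << 16) / (dst_size - 1) as described in the
 * register documentation, returning a 15.16 fixed-point value. */
static uint32_t stretch_factor(uint32_t src_size, uint32_t dst_size)
{
    /* Assumption: callers pass a non-empty source and a destination of at
     * least 2 pixels, so neither subtraction underflows and the divisor
     * is non-zero. */
    assert(src_size > 0);
    assert(dst_size > 1);

    /* Widen before shifting so large source sizes do not overflow 32 bits. */
    uint64_t numerator = (uint64_t)(src_size - 1) << 16;
    uint64_t factor = numerator / (dst_size - 1);

    /* The bitfield spans bits 30..0, so keep only the low 31 bits. */
    return (uint32_t)(factor & 0x7fffffffu);
}
```

Shifting the result right by 16 gives the integer part, which is the only part the documentation says a stretch blit uses. The low 16 bits hold the fraction.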