rename THREAD_WALKER to CL

also small documentation updates
author: Wladimir J. van der Laan <laanwj@gmail.com> 2013-03-04 19:52:55 +0100
committer: Wladimir J. van der Laan <laanwj@gmail.com> 2013-03-04 19:52:55 +0100
commit: 1a1cdc94d7b09b674e878a0b6a3479c0a5a54c7c (patch)
tree: e9023dc02a916247d678e76dcdf94ee35608526d
parent: 830d108382ed40d17749034ccdeeff17e7c931b3 (diff)
9 files changed, 165 insertions, 136 deletions
diff --git a/README.md b/README.md
index 56be4ff..6d38e0c 100644
--- a/README.md
+++ b/README.md
@@ -20,19 +20,19 @@ ARM-based:
 - OLPC (also Marvell Armada something with GC1000)
 - CuBox, including pro variant (Marvell Armada 510, GC600)
 - Many older tablets and such based on Rockchip 2918 SoC (GC800)
-- Devices based on Freescale i.MX6 Series (GC2000, GC320, GC355)
+- Devices based on Freescale i.MX6 Series (GC2000 + GC320 + GC355)
 
 MIPS-based:
 - Devices based on Ingenic JZ4770 MIPS SoC (GC860), such as the GCW zero, and JZ4760 (GC200, 2D only).
 
 See also [wikipedia](https://en.wikipedia.org/wiki/Vivante_Corporation).
 
-For the Vivante GPUs on some platforms the detailed features and specs are known, these can be found in `doc/gpus_comparison.html`.
+For the Vivante GPUs on many platforms the detailed features and specs are known; these can be found in `doc/gpus_comparison.html`.
 
 Contents
 ==========
 
-The repository contains different tools and documentation related to figuring out how to 
+The repository contains various tools and documentation related to figuring out how to 
 program Vivante GCxxx GPU chips.
 
 Framebuffer tests
@@ -44,15 +44,15 @@ Framebuffer tests
 ![mip_cube output](https://raw.github.com/laanwj/etna_viv/master/doc/images/mipmap.png)
 ![displacement output](https://raw.github.com/laanwj/etna_viv/master/doc/images/displacement.png)
 
-To execise the initial-stage driver there are a few framebuffer tests in:
+To exercise the initial-stage driver there are a few framebuffer tests in:
 
     native/fb/
 
-These do double-buffered animated rendering of 1000 frames to the framebuffer using 
-the proof-of-concept `etna` command stream building API. The goal of this API is to provide an low-level interface
-to the Vivante hardware while abstracting away kernel interface details.
+These demos do double-buffered animated rendering of 1000 frames to the framebuffer using 
+the proof-of-concept `etna` rendering and command stream building API. The goal of this API is to provide a Gallium-like 
+low-level interface to the Vivante hardware while abstracting away kernel interface details.
 
-- `companion_cube`: Animated rotating "weighted companion cube", using array or indexed rendering. Exercised in this demo:
+- `companion_cube`: Rotating "weighted companion cube", using array or indexed rendering. Exercised in this demo:
   - Array and indexed rendering of arbitrary mesh
   - Video memory allocation
   - Setting up render state
@@ -62,10 +62,6 @@ to the Vivante hardware while abstracting away kernel interface details.
   - Double-buffered rendering to framebuffer
   - MSAA (off / 2X / 4X)
 
-- `etna_test`: Full screen pixel shader with frame number passed in as uniform. Can be used as a visual shader sandbox.
-
-- `rotate_cube`: Rotating smoothed color cube
-
 - `mip_cube_state`: Rotating cube with a mipmapped texture loaded from a `dds` file provided on the command line. One 
   of the example textures have a different color and number on each mipmap level, to explicitly show interpolation 
   between mipmap levels as the surface 
@@ -76,7 +72,7 @@ to the Vivante hardware while abstracting away kernel interface details.
 
 - `alpha_blend`: Alpha blending quads
 
-- `cubemap_sphere`: Cube mapping textures
+- `cubemap_sphere`: Cubemap textures
 
 - `stencil_test`: Test stencil buffer handling
 
@@ -99,7 +95,10 @@ State map
 
 Map of documentation for known render state and registers. Mapped in rules-ng-ng (envytools) format:
 
-    rnndb/state.xml
+    rnndb/state.xml     Top-level database, global state
+    rnndb/state_hi.xml  Host interface registers
+    rnndb/state_2d.xml  2D engine state
+    rnndb/state_3d.xml  3D engine state
 
 Other scattered bits of documentation about the hardware and ISA can be found in `doc/hardware.md`.
 
@@ -123,18 +122,18 @@ Assembler and disassembler
 
 A basic disassembler for the shader instructions (to a custom format) can be found in the tools directory:
 
-    tools/disasm.py rnn/isa.xml <shader.bin>
+    tools/disasm.py <shader.bin>
 
 This can be used to disassemble shaders extracted using `dump_cmdstream.py --dump-shaders`.
 
 There is also an assembler, which accepts the same syntax that is produced by the disassembler:
 
-    tools/asm.py rnn/isa.xml <shader.asm> -o <shader.bin>
+    tools/asm.py <shader.asm> [-o <shader.bin>]
 
 Command stream format
 -----------------------
 
-Like many other GPUs, the primary means of programming the chip is through a command stream 
+Like other modern GPUs, the primary means of programming the chip is through a command stream 
 interpreted by a DMA engine. This "Front End" takes care of distributing state changes through
 the individual modules of the GPU, kicking off primitive rendering, synchronization, 
 and also supports basic flow control (branch, call, return).
@@ -148,9 +147,10 @@ The command stream format represented in rules-ng-ng XML format can be found her
 Command stream interception
 ----------------------------
 
-`viv_hook`: A library to intercept and log the traffic between libGAL (the Vivante user space blob) and the kernel
+A significant part of reverse engineering was done by intercepting command streams while running simple GL demos.
+`viv_hook` is a library to intercept and log the traffic between libGAL (the Vivante user space blob) and the kernel
 driver / hardware.
-
+    
 This library uses ELF hooks to intercept only system calls such as ioctl and mmap coming from the driver, not from
 other parts of the application, unlike more crude hacks using `LD_PRELOAD`.
 
@@ -160,8 +160,6 @@ and flush buffers. This should even work for native android applications that fo
 The raw binary structures interchanged with the kernel are written to disk in a `.fdr` file, along 
 with updates to video memory, to be parsed by the accompanying command stream dumper and other tools.
 
-    native/egl/*.c
-
 Command stream dumper
 ----------------------
 
@@ -169,7 +167,7 @@ Other tools live in:
 
     tools/
 
-The most useful ones are:
+The most useful ones, aside from the assembler and disassembler mentioned before, are:
 
 - `show_egl2_log.sh` (uses `dump_cmdstream.py`, you may have to adapt this script to use another structure definition json depending on your kernel interface)
 
@@ -209,7 +207,7 @@ The headers and implementation files for the Vivante GPL kernel drivers are also
 
     kernel_drivers/
 
-Three GPL kernel driver versions, `gc600_driver_dove`, `v2` and `v4`, are provided. They are useful in understanding the kernel 
+Four GPL kernel driver versions, `gc600_driver_dove`, `v2`, `v4` and `imx6`, are provided. They are useful in understanding the kernel
 interface, and the hardware at a basic level.
 
 As open source drivers for the kernel are available, there are currently no plans to write a DRM/DRI kernel driver for Vivante.
@@ -245,10 +243,11 @@ different offsets for fields, different management of context, and so on). These
 - `dove_old`: Marvell Dove, older drivers (0.8.0.1998, 0.8.0.1123)
 - `arnova`: Android, Arnova 10B G3 tablet (RK2918)
 - `v2`: Various Android, for older chips (RK2918 etc)
+- `imx6`: Various Android, for newer chips (i.MX6 specific)
 - `v4`: Various Android, for newer chips (i.MX6 etc)
 
-If possible get the `gc_*.h` headers for your specific kernel version. If that's not possible, try to find which of the above is most similar,
-and adapt that.
+If possible get the `gc_*.h` headers for your specific kernel version. If that's not possible, try to find which of the above sets
+of headers is most similar, and adapt that.
 
 gc_abi.h
 ----------
@@ -262,9 +261,10 @@ setting of the environment variable `GCABI`:
 - `GCABI_HAS_CONTEXT`: `struct _gcsHAL_COMMIT` has `contextBuffer` field
 - `GCABI_HAS_STATE_DELTAS`: `struct _gcsHAL_COMMIT` has `delta` field
 
-It would be really nice to have an auto-detection of the Vivante kernel version, to prevent crashes and such from wrong
+It would be very useful to have an auto-detection of the Vivante kernel version, to prevent crashes and such from wrong
 interfaces. However, I don't currently know any way to do this. The kernel does check the size of the passed ioctl structure, however
-this guarantees nothing about the field offsets. There is `/proc/driver/gc` that in some cases contains a version number.
+this guarantees nothing about the field offsets. There is `/proc/driver/gc` that in some cases contains a version number. In
+very new drivers there is an ioctl call `gcvHAL_VERSION` that returns the major, minor and build version.
 
 Android
 ---------
@@ -287,14 +287,14 @@ environment variables, for example like this:
     #export GCABI="v4"
     export GCABI="arnova"
 
-To build the egl samples, you need to copy `libEGL_VIVANTE.so` `libGLESv2_VIVANTE.so` from the device `/system/lib/egl` to
-`native/lib/egl`. This is not needed if you just want to build the replay or etna tests, which do not rely in any way on the
-userspace blob.
+To build the egl samples (for command stream interception), you need to copy `libEGL_VIVANTE.so` and `libGLESv2_VIVANTE.so` from
+the device `/system/lib/egl` to `native/lib/egl`. This is not needed if you just want to build the `replay`, `etna` or `fb`
+tests, which do not rely in any way on the userspace blob.
 
 Linux
 -------
 
-For Linux ARM cross compile, create a script like this (example for CuBox) to set up the build environment. 
+For non-Android Linux ARM cross compile, create a script like this (example for CuBox) to set up the build environment. 
 Don't forget to also copy the EGL/GLES2/KDR headers from some place and put them in a directory `include` under the location
 where the script is installed, and get the `libEGL.so` and `libGLESv2.so` from the device into `lib`:
 
@@ -326,7 +326,7 @@ Etna_pipe is currently compatible with the following GC chips at least:
 - GC600
 - GC800
 
-GC2000 support is under way.
+GC2000 support is underway.
 
 The command stream on different device GCxxx variants will also likely be slightly different; the features bit system
 allows for a ton of slightly different chips. When porting it, look for:
diff --git a/doc/hardware.md b/doc/hardware.md
index d207774..72dea20 100644
--- a/doc/hardware.md
+++ b/doc/hardware.md
@@ -8,25 +8,25 @@ Major optional blocks: each of these can be present or not depending on the spec
 - 3D engine
 - VG engine
 
-Some SoCs have multiple GPU cores, and have distributed these blocks over the cores (I suppose for extra parallelism and/or
-granularity in power switching). For example the Marvell Armada 620 has a GC2000 with only the 3D engine as well 
-as a GC300 with only the 2D engine.
+Some SoCs have multiple GPU cores, and have distributed the blocks mentioned above over the cores (I suppose 
+for extra parallelism and/or granularity in power switching). For example the Marvell Armada 620 has a GC2000 
+with only the 3D engine as well as a GC300 with only the 2D engine. Similarly, the Freescale i.MX6 SoC has a
+GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
 
 Feature bits
 =================
 
 Which features are supported on a certain Vivante core is not only determined by the model number 
-(which AFAIK mainly determines the performance), but determined by a combination of feature bits:
+(which AFAIK mainly determines the performance), but specified by a combination of factors:
 
  1) Chip features and minor feature flags
  2) Chip specs (number of instructions, pipelines, ...)
  3) Chip model (GC800, GC2000, ...)
  4) Chip revision of the form 0x1234
 
-All of these are available in read-only registers on the hardware. On most cases it suffices to check the feature flags.
-
-Unlike NV, which parametrizes everything on the model and revision, for GC this is left for bugfixes (but 
-even these sometimes have their own feature bit).
+All of these are available in read-only registers on the hardware. In most cases it suffices to check the feature flags.
+Unlike NV, which parametrizes everything on the model and revision, for GC this is left for bugfixes (even these sometimes
+have their own feature bit).
 
 For an overview of the feature bits see the enumerations in `state.xml`.
 
@@ -79,7 +79,7 @@ These abbreviations are used in `state.xml` for the stripes where appropriate.
 Operations
 -----------
 
-Modules are programmed and kicked off using state updates, queued through the FE. Except for 2D/3D primitive rendering,
+Modules are programmed and kicked off using state updates, queued through the FE. An exception is 2D and 3D primitive rendering,
 which is kicked off directly through a FE command.
 
 The GC320 technical manual [1] describes quite a few operations, but only for the 2D part (DE).
@@ -94,7 +94,7 @@ Thread walker = Rectangle walker? (seems to have to do with OpenCL)
 
 Connections 
 -------------
-Connections between the different module follow the OpenGL pipeline design [3].
+Connections between the different modules follow the OpenGL pipeline design [3].
 
 - FE2VS (FE-VS) fetch engine to vertex shader: attributes
 - RA2SH (RA-PS) rasterizer to shader engine: varyings
@@ -163,14 +163,14 @@ Synchronization
 There are various states related to synchronization, either between different modules in the GPU
 and the GPU and the CPU (through the FE).
 
-- `SEMAPHORE_TOKEN`
-- `STALL_TOKEN`
-- `STALL` command in command stream
+- State `GL.SEMAPHORE_TOKEN`
+- State `GL.STALL_TOKEN`
+- The `STALL` command in command stream
 
 The following sequence of states is common:
 
-    GLOBAL.SEMAPHORE_TOKEN := FROM=RA,TO=PE
-    GLOBAL.STALL_TOKEN := FROM=RA,TO=PE
+    GL.SEMAPHORE_TOKEN := FROM=RA,TO=PE
+    GL.STALL_TOKEN := FROM=RA,TO=PE
 
 The first state load arms the semaphore, the second one stalls the FROM module until the TO module has raised its semaphore. In 
 this example it stalls the rasterizer until the pixel engine has completed the commands up until now. 
@@ -185,13 +185,14 @@ Resolve
 -----------
 The resolve module is a copy and fill engine. It can copy blocks of pixels from one GPU address to another, 
 optionally tiling/detiling, converting between pixel formats, or scaling down by a factor of 2. The source and 
-destination address can be the same for fill in tiles that were not touched during the rendering process with the background color.
+destination address can be the same to fill in tiles that were not touched during the rendering process 
+(according to the Tile Status, see below) with the background color.
 
 Tile status (Fast clear)
 -------------------------
 A render target is divided in tiles, and every tile has a couple of status flags.
 
-An auxilary buffer for each render surface keeps track of tile status flags, allocated with `gcvSURF_TILE_STATUS`.
+An auxiliary buffer associated with each render surface keeps track of these tile status flags, allocated with `gcvSURF_TILE_STATUS`.
 
 One of these flags is the `clear` flag, that signifies that the tile has been cleared.
 `fast clear` happens by setting the clear bit for each tile instead of clearing the actual surface
@@ -207,12 +208,12 @@ Shader ISA
 ================
 
 Vivante GPUs have a unified shader ISA, this means that vertex and pixel shaders share the same 
-instruction set. See `isa.xml` for details about the instructions, this section only provides a high-level overview.
+instruction set. See `isa.xml` and `isa.md` for details of the instructions; this section only provides a high-level overview.
 
-- One operation consists of 4 32-bit words. This have a fixed format, with bitfields with a fixed purpose
-which only differs very little per opcode. The actual instruction fields that are used (which operands) does differ per opcode.
+- Each instruction consists of 4 32-bit words. These have a fixed format, with bitfields 
+that have a meaning which differs only very little per opcode. Which of these fields is used (which operands) does differ per opcode.
 
-- Four-component SIMD processor
+- Four-component SIMD processor (for most of the instructions)
 
 - Older GPUs have floating point operations only, the newer ones have support for integer operations in the context of OpenCL. 
   The split is around GC1000, though this being Vivante there is likely some feature bit for it.
@@ -230,9 +231,10 @@ which only differs very little per opcode. The actual instruction fields that ar
      possibly others.
 
 - Registers:
-  - N four-component float temporary registers `tX` (actual number depends on the hardware, maximum seems to be at least 64, but like with other GPUs
-    using more registers will likely restrict the available paralellism)
-  - 1 four-component address register `a0`
+  - `N` four-component float temporary registers `tX` (actual number depends on the hardware; the maximum seems to be 64 for all
+      Vivante GPUs I've encountered up until now, but like with other GPUs using more registers will likely restrict
+      the available parallelism)
+  - `1` four-component address register `a0`
 
 Temporary registers are also used for shader inputs (attributes, varyings) and outputs (colors, positions). They are set to
 the input values before the shader executes, and should have the output values when the shader ends. If the output
@@ -246,11 +248,11 @@ of the framebuffer using the `FBIOGET_VSCREENINFO` and `FBIOGET_FSCREENINFO` ioc
 This physical address can then directly be used as target address for a resolve operation, just like when copying
 to a normal bitmap.
 
-It *may* also be possible to use the physical address of the frame buffer directly for rendering, which would save a 
-copy operation, if the device supports rendering to a linear (non-tiled) buffer. However, as this prevents the use of (super) tiling,
-so in the end it may be slower. XXX this needs to be tested.
+Even though it would save a resolve operation, it is not useful to use the physical address of the frame buffer
+directly for rendering, as it is only possible to render to tiled and supertiled surfaces, and (afaik) no display controller
+supports scan out from tiled formats.
 
-Usually, there is more framebuffer memory than that which is used for the current screen, which causes larger virtual resolution
+In many cases there is more framebuffer memory than that which is used for the current screen, which causes larger virtual resolution
 to be returned than the physical resolution. Double-buffering is achieved by changing the y-offset within that virtual frame buffer. 
 
 Operations
@@ -260,7 +262,7 @@ their operation.
 
 - RS: Kick off resolve by writing a value with bit 0 set to `RS_KICKER`. State used:
   - `RS_*`
-  - `TS_*` (if fast clear enabled through `TS_CONFIG`)
+  - `TS_*` (only when reading, if fast clear enabled through `TS_CONFIG`)
 
 - FE: Kick off 3D rendering by sending command `DRAW_PRIMITIVES` / `DRAW_INDEXED_PRIMITIVES`
   - `FE_*` (vertex element layout, vertex streams, index stream, ...)
@@ -287,11 +289,7 @@ Programming pecularities
 - The FE can convert from 16.16 fixed point format to 32 bit float. This is enabled by the `fixp` bit
   in the `LOAD_STATE` command. This is mostly useful for older ARM CPUs without native floating point
   support. The blob driver uses it for some states (viewport scaling, offset, scissor, ...)
-  but not others (uniforms etc). 
-
-  - Some of the states in states.xml are labeled as format "fixp" even though the FE does conversion and
-    their actual format is float, and they could be written as float as well when this is faster
-    from the driver perspective. This needs to be checked.
+  but not others (uniforms etc).
 
 - It is quite easy to hang the GPU when making a minor programming mistake. 
   When the GPU is stuck it is possible to submit command buffers, however nothing gets drawn and nothing 
@@ -304,25 +302,26 @@ Programming pecularities
   - Wrong length of shader
   - Texture sampling without properly setup texture units
   - `SE_SCISSOR`: setting SCISSOR bottom/right to `(x<<16)|5` instead of `(x<<16)-1` causes crashes for higher resolutions 
-    such as 1920x1080 on gc600. I don't know why, maybe some buffer or cache overflow. The rockchip vivante driver always uses |5 AFAIK.
+    such as 1920x1080 on GC600. I don't know why, maybe some buffer or cache overflow. The rockchip vivante driver always uses |5 AFAIK;
+    this offset appears to be different per specific chip/revision.
 
   This may be a (kernel) driver problem. It is possible to reset the GPU from user space with an ioctl, but 
   this usually is not enough to make it un-stuck. It would probably be a better solution to introduce a kernel-based timeout
-  instead of relying on userspace to be 100% correct.
+  instead of relying on userspace to be 100% correct (may exist on v4?).
 
 Masked state
 -------------
 
-Many groups of state bits, especially in the PE, have mask bits. These have been named `*_MASK`.
-When the mask bit belonging to a group of state bits is set on a state write, the accompanying
-state bits will be unaffected. If the mask bit is unset, the state bits will be written.
+Many groups of state bits, especially in the PE, have a mask bit. These have been named `*_MASK`.
+When the mask bit belonging to a group of state bits is *set* on a state write, the accompanying
+state bits will be unaffected. If the mask bit is *unset*, the state bits will be written.
 
-This allows setting state either per group of bits, or all at once. For example, it allows setting only
+This allows setting state per group of bits. For example, it allows setting only
 the destination alpha function (`ALPHA_CONFIG.DST_FUNC_ALPHA`) without affecting the 
 other bits in that state word.
 
-If masking functionality is not desired, as it is often practical to simply write all bits at once, simply keep all the `_MASK`
-bits at zero.
+If masking functionality is not desired, simply keep all the `_MASK` bits at zero and write all 
+bits at once. This is what I used in `etna_pipe`, as I keep track of all state myself.
 
 Texture tiling
 ----------------
@@ -373,12 +372,13 @@ However, as the name 'supertiled' implies, the tiles themselves are also tiled,
     196 197  204 205  212 213  220 221  228 229  236 237  244 245  252 253
     198 199  206 207  214 215  222 223  230 231  238 239  246 247  254 255
 
-In total this results in size 64x64 tiles.
+This has some similarity to a Z-order curve (http://en.wikipedia.org/wiki/Z-order_curve) or other space-filling curve,
+but is only nested one level. In total this results in 64x64 sized tiles.
 
 The GPU can render to normal tiled surfaces (such as used by textures) as well as supertiled surfaces. However,
-rendering to supertiled surfaces is likely faster.
+rendering to supertiled surfaces is likely faster due to better cache locality.
 
-Stride, as used for resolve operations, is for a row of tiles; 0x1c00 for width 448 (originally 400), 
+Stride, as used for resolve operations, is for a row of tiles, not a row of pixels; 0x1c00 for width 448 (originally 400),
 0x3400 for width 832 (originally 800).
 
 Multisampling
@@ -386,7 +386,7 @@ Multisampling
 
 GC600 supports 1, 2, or 4 MSAA samples. Vivante's patent [1] on anti-aliasing may reveal some of the inner workings.
 
-- 256x256 target with 0 samples creates a 256x256 render target (duh)
+- 256x256 target with 1 sample creates a 256x256 render target (duh)
 
         GL.MULTI_SAMPLE_CONFIG := MSAA_SAMPLES=NONE,MSAA_ENABLES=0xf,UNK12=0x0,UNK16=0x0
         PE.COLOR_STRIDE := 0x400
@@ -429,7 +429,7 @@ GC600 supports 1, 2, or 4 MSAA samples. Vivante's patent [1] on anti-aliasing ma
 Other differences when MSAA is enabled:
 
 - `TS.MEM_CONFIG` is different when MSAA is used (see descriptions for fields `MSAA` and `MSAA_FORMAT`). 
-- The TS surface belonging to the enlarged in the same way; just like there simply is a bigger render target.
+- The TS surface belonging to the render target is enlarged in the same way; it is treated as if there simply is a bigger render target.
 - It also looks like the PS gets an extra input/temporary when MSAA is enabled:
 
         -0x00001f02, /*   PS.INPUT_COUNT := COUNT=2,COUNT2=31 */
@@ -438,7 +438,7 @@ Other differences when MSAA is enabled:
         +0x00000003, /*   PS.TEMP_REGISTER_CONTROL := NUM_TEMPS=3 */
 
 Haven't yet checked what the value is that is passed in (XXX todo). The shader code itself is unaffected the same so the extra
-input is likely added to the end.
+input is added to the end.
 
 - When resolving the supersampled surface to another (normal pixmap) surface, flag `SOURCE_MSAA` must be configured appropriately to
   un-subsample the surface. `WINDOW_SIZE` for this resolve is the *doubled* window size as above, so 512x512 for a 256x256 render
@@ -459,7 +459,8 @@ When rendering points (PRIMITIVE_TYPE_POINTS) there are some differences:
   The VS output associated to this varying in `VS_OUTPUT` is discarded, so can be set 
   to any output register.
 
-- `rasterizer.point_size_per_vertex` affects number of vs outputs (just like MSAA!).
+- `rasterizer.point_size_per_vertex` affects number of vs outputs (just like MSAA!). If point 
+  size per vertex is not set, the value in `PA.POINT_SIZE` is used.
 
 - Distinction between sprite coordinate origin `UPPER_LEFT` / `LOWER_LEFT` is implemented by adding 
   a 1.0-y instruction when glPointCoord is used. XXX figure out what is the default.
@@ -471,22 +472,21 @@ Vertex samplers live in the same space as fragment samplers. The blob uses a fix
 sampler 0..7 are used as fragment samplers and 8..11 are used as vertex samplers. 
 
 The shaders themselves refer to the absolute shader number; so tex8 is the first texture unit used in a
-vertex shader.
+vertex shader. The normal TEX instruction can be used to sample textures from a vertex shader.
 
-It appears that Vivante hw has two texture caches that need to be
-flushed separately, one for fragment shaders one for vertex shaders 
-(GL.FLUSH_CACHE.TEXTURE and GL.FLUSH_CACHE.TEXTUREVS respectively).
+Vivante hw has two texture caches that need to be flushed separately, one for fragment shaders and
+one for vertex shaders (bits `GL.FLUSH_CACHE.TEXTURE` and `GL.FLUSH_CACHE.TEXTUREVS` respectively).
 
-This solves a problem with running cubemap_sphere after displacement demo;
+This solves a problem with running `cubemap_sphere` after the `displacement` demo;
 it seemed that some leftover cache from using a texture in displacement
-caused the texture in cubemap_sphere (which is only 1x1x6) to be messed
-up.
+caused the texture in `cubemap_sphere` (which is only 1x1x6) to be messed
+up (due to containing old values).
+
+All texture filtering options are allowed for vertex texture fetch.
 
 XXX maybe figure out if the sampler units are shared between fragment and vertex shaders and thus interchangeable. This is 
   not important for GL/Gallium because it already lives with the assumption that vertex and fragment shaders
   are distinct.
 
-XXX figure out which texture filtering options are allowed for vertex texture fetch
-
-XXX figure out wether normal texture2D works or that texture2DLod is needed
+XXX figure out whether normal texture2D works or that texture2DLod is needed.
 
diff --git a/doc/kernel_interface.md b/doc/kernel_interface.md
index 8ed2b12..4121843 100644
--- a/doc/kernel_interface.md
+++ b/doc/kernel_interface.md
@@ -55,7 +55,9 @@ At startup, the application connects to galcore device using `open` with the dev
 - `/dev/galcore`, or
 - `/dev/graphics/galcore`
 
-Immediately after connecting the entire chunk of contiguous memory, after requesting its address and size, is mapped into user space using `mmap`.
+After connecting to the device, the address and size of the contiguous memory chunk are requested and the entire chunk
+is mapped into user space using `mmap`. The kernel will return addresses in this range when the user space driver allocates
+contiguous (unified) memory used for communication with the GPU.
 
 Ioctl
 -------
@@ -92,8 +94,8 @@ uses the fields in `interface->u.AllocateLinearVideoMemory` to pass in the numbe
 also to pass out the number of bytes actually allocated. 
 
 What is curious about the ioctl protocol is that the communication structures contains fields that are not 
-used by the kernel at all, but only in user-space. There is no good reason why these values would need 
-to be present in kernel-facing structures at all. The line is blurry sometimes.
+used by the kernel at all. There is no good reason why these values would need 
+to be present in kernel-facing structures. The line is blurry sometimes.
 It also appears that the structure has been designed with platform-independence in mind, and so some of the fields are not used in the Linux
 drivers such as `status`, `handle`, `pid`.
 
diff --git a/native/include/etna/common.xml.h b/native/include/etna/common.xml.h
index 67c6ad9..a05a61b 100644
--- a/native/include/etna/common.xml.h
+++ b/native/include/etna/common.xml.h
@@ -8,11 +8,11 @@ http://0x04.net/cgit/index.cgi/rules-ng-ng
 git clone git://0x04.net/rules-ng-ng
 
 The rules-ng-ng source files this header was generated from are:
-- /home/orion/projects/etna_viv/rnndb/state.xml    (  15002 bytes, from 2013-02-23 18:59:36)
+- /home/orion/projects/etna_viv/rnndb/state.xml    (  15001 bytes, from 2013-03-03 11:55:38)
 - /home/orion/projects/etna_viv/rnndb/common.xml   (  15985 bytes, from 2013-02-10 12:55:03)
 - /home/orion/projects/etna_viv/rnndb/state_hi.xml (  11825 bytes, from 2013-02-01 10:14:08)
 - /home/orion/projects/etna_viv/rnndb/state_2d.xml (  30577 bytes, from 2013-02-01 10:02:25)
-- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  49672 bytes, from 2013-02-20 18:04:17)
+- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  50943 bytes, from 2013-03-04 16:52:52)
 
 Copyright (C) 2013
 */
diff --git a/native/include/etna/state.xml.h b/native/include/etna/state.xml.h
index 8d52114..073f912 100644
--- a/native/include/etna/state.xml.h
+++ b/native/include/etna/state.xml.h
@@ -8,11 +8,11 @@ http://0x04.net/cgit/index.cgi/rules-ng-ng
 git clone git://0x04.net/rules-ng-ng
 
 The rules-ng-ng source files this header was generated from are:
-- /home/orion/projects/etna_viv/rnndb/state.xml    (  15002 bytes, from 2013-02-23 18:59:36)
+- /home/orion/projects/etna_viv/rnndb/state.xml    (  15001 bytes, from 2013-03-03 11:55:38)
 - /home/orion/projects/etna_viv/rnndb/common.xml   (  15985 bytes, from 2013-02-10 12:55:03)
 - /home/orion/projects/etna_viv/rnndb/state_hi.xml (  11825 bytes, from 2013-02-01 10:14:08)
 - /home/orion/projects/etna_viv/rnndb/state_2d.xml (  30577 bytes, from 2013-02-01 10:02:25)
-- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  49672 bytes, from 2013-02-20 18:04:17)
+- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  50943 bytes, from 2013-03-04 16:52:52)
 
 Copyright (C) 2013
 */
diff --git a/native/include/etna/state_2d.xml.h b/native/include/etna/state_2d.xml.h
index 1d8cf7e..be887e8 100644
--- a/native/include/etna/state_2d.xml.h
+++ b/native/include/etna/state_2d.xml.h
@@ -8,11 +8,11 @@ http://0x04.net/cgit/index.cgi/rules-ng-ng
 git clone git://0x04.net/rules-ng-ng
 
 The rules-ng-ng source files this header was generated from are:
-- /home/orion/projects/etna_viv/rnndb/state.xml    (  15002 bytes, from 2013-02-23 18:59:36)
+- /home/orion/projects/etna_viv/rnndb/state.xml    (  15001 bytes, from 2013-03-03 11:55:38)
 - /home/orion/projects/etna_viv/rnndb/common.xml   (  15985 bytes, from 2013-02-10 12:55:03)
 - /home/orion/projects/etna_viv/rnndb/state_hi.xml (  11825 bytes, from 2013-02-01 10:14:08)
 - /home/orion/projects/etna_viv/rnndb/state_2d.xml (  30577 bytes, from 2013-02-01 10:02:25)
-- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  49672 bytes, from 2013-02-20 18:04:17)
+- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  50943 bytes, from 2013-03-04 16:52:52)
 
 Copyright (C) 2013
 */
diff --git a/native/include/etna/state_3d.xml.h b/native/include/etna/state_3d.xml.h
index a1e849c..0403990 100644
--- a/native/include/etna/state_3d.xml.h
+++ b/native/include/etna/state_3d.xml.h
@@ -8,11 +8,11 @@ http://0x04.net/cgit/index.cgi/rules-ng-ng
 git clone git://0x04.net/rules-ng-ng
 
 The rules-ng-ng source files this header was generated from are:
-- /home/orion/projects/etna_viv/rnndb/state.xml    (  15002 bytes, from 2013-02-23 18:59:36)
+- /home/orion/projects/etna_viv/rnndb/state.xml    (  15001 bytes, from 2013-03-03 11:55:38)
 - /home/orion/projects/etna_viv/rnndb/common.xml   (  15985 bytes, from 2013-02-10 12:55:03)
 - /home/orion/projects/etna_viv/rnndb/state_hi.xml (  11825 bytes, from 2013-02-01 10:14:08)
 - /home/orion/projects/etna_viv/rnndb/state_2d.xml (  30577 bytes, from 2013-02-01 10:02:25)
-- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  49672 bytes, from 2013-02-20 18:04:17)
+- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  50943 bytes, from 2013-03-04 16:52:52)
 
 Copyright (C) 2013
 */
@@ -202,6 +202,16 @@ Copyright (C) 2013
 
 #define VIVS_VS_UNK00858					0x00000858
 
+#define VIVS_VS_RANGE						0x0000085c
+#define VIVS_VS_RANGE_LOW__MASK					0x0000ffff
+#define VIVS_VS_RANGE_LOW__SHIFT				0
+#define VIVS_VS_RANGE_LOW(x)					(((x) << VIVS_VS_RANGE_LOW__SHIFT) & VIVS_VS_RANGE_LOW__MASK)
+#define VIVS_VS_RANGE_HIGH__MASK				0xffff0000
+#define VIVS_VS_RANGE_HIGH__SHIFT				16
+#define VIVS_VS_RANGE_HIGH(x)					(((x) << VIVS_VS_RANGE_HIGH__SHIFT) & VIVS_VS_RANGE_HIGH__MASK)
+
+#define VIVS_VS_NEW_UNK00860					0x00000860
+
 #define VIVS_VS_INST_MEM(i0)				       (0x00004000 + 0x4*(i0))
 #define VIVS_VS_INST_MEM__ESIZE					0x00000004
 #define VIVS_VS_INST_MEM__LEN					0x00000400
@@ -210,27 +220,27 @@ Copyright (C) 2013
 #define VIVS_VS_UNIFORMS__ESIZE					0x00000004
 #define VIVS_VS_UNIFORMS__LEN					0x00000400
 
-#define VIVS_THREAD_WALKER					0x00000000
+#define VIVS_CL							0x00000000
 
-#define VIVS_THREAD_WALKER_UNK00900				0x00000900
+#define VIVS_CL_UNK00900					0x00000900
 
-#define VIVS_THREAD_WALKER_UNK00904				0x00000904
+#define VIVS_CL_UNK00904					0x00000904
 
-#define VIVS_THREAD_WALKER_UNK00908				0x00000908
+#define VIVS_CL_UNK00908					0x00000908
 
-#define VIVS_THREAD_WALKER_UNK0090C				0x0000090c
+#define VIVS_CL_UNK0090C					0x0000090c
 
-#define VIVS_THREAD_WALKER_UNK00910				0x00000910
+#define VIVS_CL_UNK00910					0x00000910
 
-#define VIVS_THREAD_WALKER_UNK00914				0x00000914
+#define VIVS_CL_UNK00914					0x00000914
 
-#define VIVS_THREAD_WALKER_UNK00918				0x00000918
+#define VIVS_CL_UNK00918					0x00000918
 
-#define VIVS_THREAD_WALKER_UNK0091C				0x0000091c
+#define VIVS_CL_UNK0091C					0x0000091c
 
-#define VIVS_THREAD_WALKER_KICKER				0x00000920
+#define VIVS_CL_KICKER						0x00000920
 
-#define VIVS_THREAD_WALKER_UNK00924				0x00000924
+#define VIVS_CL_UNK00924					0x00000924
 
 #define VIVS_PA							0x00000000
 
@@ -307,6 +317,8 @@ Copyright (C) 2013
 
 #define VIVS_PA_VIEWPORT_UNK00A84				0x00000a84
 
+#define VIVS_PA_VIEWPORT_UNK00A8C				0x00000a8c
+
 #define VIVS_SE							0x00000000
 
 #define VIVS_SE_SCISSOR_LEFT					0x00000c00
@@ -375,6 +387,14 @@ Copyright (C) 2013
 
 #define VIVS_PS_START_PC					0x00001018
 
+#define VIVS_PS_RANGE						0x0000101c
+#define VIVS_PS_RANGE_LOW__MASK					0x0000ffff
+#define VIVS_PS_RANGE_LOW__SHIFT				0
+#define VIVS_PS_RANGE_LOW(x)					(((x) << VIVS_PS_RANGE_LOW__SHIFT) & VIVS_PS_RANGE_LOW__MASK)
+#define VIVS_PS_RANGE_HIGH__MASK				0xffff0000
+#define VIVS_PS_RANGE_HIGH__SHIFT				16
+#define VIVS_PS_RANGE_HIGH(x)					(((x) << VIVS_PS_RANGE_HIGH__SHIFT) & VIVS_PS_RANGE_HIGH__MASK)
+
 #define VIVS_PS_INST_MEM(i0)				       (0x00006000 + 0x4*(i0))
 #define VIVS_PS_INST_MEM__ESIZE					0x00000004
 #define VIVS_PS_INST_MEM__LEN					0x00000400
@@ -559,17 +579,17 @@ Copyright (C) 2013
 
 #define VIVS_PE_UNK0145C					0x0000145c
 
-#define VIVS_PE_UNK01460(i0)				       (0x00000000 + 0x4*(i0))
-#define VIVS_PE_UNK01460__ESIZE					0x00000004
-#define VIVS_PE_UNK01460__LEN					0x00000008
+#define VIVS_PE_PIPE(i0)				       (0x00000000 + 0x4*(i0))
+#define VIVS_PE_PIPE__ESIZE					0x00000004
+#define VIVS_PE_PIPE__LEN					0x00000008
 
-#define VIVS_PE_UNK01460_ADDR_UNK01460(i0)		       (0x00001460 + 0x4*(i0))
+#define VIVS_PE_PIPE_COLOR_ADDR(i0)			       (0x00001460 + 0x4*(i0))
 
-#define VIVS_PE_UNK01460_ADDR_UNK01480(i0)		       (0x00001480 + 0x4*(i0))
+#define VIVS_PE_PIPE_DEPTH_ADDR(i0)			       (0x00001480 + 0x4*(i0))
 
-#define VIVS_PE_UNK01460_ADDR_UNK01500(i0)		       (0x00001500 + 0x4*(i0))
+#define VIVS_PE_PIPE_ADDR_UNK01500(i0)			       (0x00001500 + 0x4*(i0))
 
-#define VIVS_PE_UNK01460_ADDR_UNK01520(i0)		       (0x00001520 + 0x4*(i0))
+#define VIVS_PE_PIPE_ADDR_UNK01520(i0)			       (0x00001520 + 0x4*(i0))
 
 #define VIVS_PE_STENCIL_CONFIG_EXT				0x000014a0
 #define VIVS_PE_STENCIL_CONFIG_EXT_REF_BACK__MASK		0x000000ff
@@ -685,6 +705,7 @@ Copyright (C) 2013
 #define VIVS_RS_SOURCE_STRIDE_STRIDE__MASK			0x0003ffff
 #define VIVS_RS_SOURCE_STRIDE_STRIDE__SHIFT			0
 #define VIVS_RS_SOURCE_STRIDE_STRIDE(x)				(((x) << VIVS_RS_SOURCE_STRIDE_STRIDE__SHIFT) & VIVS_RS_SOURCE_STRIDE_STRIDE__MASK)
+#define VIVS_RS_SOURCE_STRIDE_MULTI				0x40000000
 #define VIVS_RS_SOURCE_STRIDE_TILING				0x80000000
 
 #define VIVS_RS_DEST_ADDR					0x00001610
@@ -693,6 +714,7 @@ Copyright (C) 2013
 #define VIVS_RS_DEST_STRIDE_STRIDE__MASK			0x0003ffff
 #define VIVS_RS_DEST_STRIDE_STRIDE__SHIFT			0
 #define VIVS_RS_DEST_STRIDE_STRIDE(x)				(((x) << VIVS_RS_DEST_STRIDE_STRIDE__SHIFT) & VIVS_RS_DEST_STRIDE_STRIDE__MASK)
+#define VIVS_RS_DEST_STRIDE_MULTI				0x40000000
 #define VIVS_RS_DEST_STRIDE_TILING				0x80000000
 
 #define VIVS_RS_WINDOW_SIZE					0x00001620
@@ -735,15 +757,21 @@ Copyright (C) 2013
 
 #define VIVS_RS_UNK016B4					0x000016b4
 
-#define VIVS_RS_UNK016C0(i0)				       (0x00000000 + 0x4*(i0))
-#define VIVS_RS_UNK016C0__ESIZE					0x00000004
-#define VIVS_RS_UNK016C0__LEN					0x00000008
+#define VIVS_RS_PIPE(i0)				       (0x00000000 + 0x4*(i0))
+#define VIVS_RS_PIPE__ESIZE					0x00000004
+#define VIVS_RS_PIPE__LEN					0x00000008
 
-#define VIVS_RS_UNK016C0_UNK016C0(i0)			       (0x000016c0 + 0x4*(i0))
+#define VIVS_RS_PIPE_SOURCE_ADDR(i0)			       (0x000016c0 + 0x4*(i0))
 
-#define VIVS_RS_UNK016C0_UNK016E0(i0)			       (0x000016e0 + 0x4*(i0))
+#define VIVS_RS_PIPE_DEST_ADDR(i0)			       (0x000016e0 + 0x4*(i0))
 
-#define VIVS_RS_UNK016C0_UNK01700(i0)			       (0x00001700 + 0x4*(i0))
+#define VIVS_RS_PIPE_OFFSET(i0)				       (0x00001700 + 0x4*(i0))
+#define VIVS_RS_PIPE_OFFSET_X__MASK				0x0000ffff
+#define VIVS_RS_PIPE_OFFSET_X__SHIFT				0
+#define VIVS_RS_PIPE_OFFSET_X(x)				(((x) << VIVS_RS_PIPE_OFFSET_X__SHIFT) & VIVS_RS_PIPE_OFFSET_X__MASK)
+#define VIVS_RS_PIPE_OFFSET_Y__MASK				0xffff0000
+#define VIVS_RS_PIPE_OFFSET_Y__SHIFT				16
+#define VIVS_RS_PIPE_OFFSET_Y(x)				(((x) << VIVS_RS_PIPE_OFFSET_Y__SHIFT) & VIVS_RS_PIPE_OFFSET_Y__MASK)
 
 #define VIVS_TS							0x00000000
 
@@ -762,6 +790,9 @@ Copyright (C) 2013
 #define VIVS_TS_MEM_CONFIG_MSAA_FORMAT_R5G6B5			0x00000200
 #define VIVS_TS_MEM_CONFIG_MSAA_FORMAT_A8R8G8B8			0x00000300
 #define VIVS_TS_MEM_CONFIG_MSAA_FORMAT_X8R8G8B8			0x00000400
+#define VIVS_TS_MEM_CONFIG_HDEPTH__MASK				0x00003000
+#define VIVS_TS_MEM_CONFIG_HDEPTH__SHIFT			12
+#define VIVS_TS_MEM_CONFIG_HDEPTH(x)				(((x) << VIVS_TS_MEM_CONFIG_HDEPTH__SHIFT) & VIVS_TS_MEM_CONFIG_HDEPTH__MASK)
 
 #define VIVS_TS_COLOR_STATUS_BASE				0x00001658
 
@@ -779,10 +810,12 @@ Copyright (C) 2013
 
 #define VIVS_TS_DEPTH_AUTO_DISABLE_COUNT			0x00001674
 
-#define VIVS_TS_HDEPTH_BASE					0x000016a4
+#define VIVS_TS_HDEPTH_STATUS_BASE				0x000016a4
 
 #define VIVS_TS_HDEPTH_CLEAR_VALUE				0x000016a8
 
+#define VIVS_TS_HDEPTH_SIZE					0x000016ac
+
 #define VIVS_TS_SAMPLER(i0)				       (0x00000000 + 0x4*(i0))
 #define VIVS_TS_SAMPLER__ESIZE					0x00000004
 #define VIVS_TS_SAMPLER__LEN					0x00000008
@@ -1011,12 +1044,6 @@ Copyright (C) 2013
 
 #define VIVS_SH							0x00000000
 
-#define VIVS_SH_UNK0085C					0x0000085c
-
-#define VIVS_SH_UNK00860					0x00000860
-
-#define VIVS_SH_UNK0101C					0x0000101c
-
 #define VIVS_SH_UNK20000(i0)				       (0x00020000 + 0x4*(i0))
 #define VIVS_SH_UNK20000__ESIZE					0x00000004
 #define VIVS_SH_UNK20000__LEN					0x00002000
diff --git a/native/include/etna/state_hi.xml.h b/native/include/etna/state_hi.xml.h
index 66b98fe..ea7aed6 100644
--- a/native/include/etna/state_hi.xml.h
+++ b/native/include/etna/state_hi.xml.h
@@ -8,11 +8,11 @@ http://0x04.net/cgit/index.cgi/rules-ng-ng
 git clone git://0x04.net/rules-ng-ng
 
 The rules-ng-ng source files this header was generated from are:
-- /home/orion/projects/etna_viv/rnndb/state.xml    (  15002 bytes, from 2013-02-23 18:59:36)
+- /home/orion/projects/etna_viv/rnndb/state.xml    (  15001 bytes, from 2013-03-03 11:55:38)
 - /home/orion/projects/etna_viv/rnndb/common.xml   (  15985 bytes, from 2013-02-10 12:55:03)
 - /home/orion/projects/etna_viv/rnndb/state_hi.xml (  11825 bytes, from 2013-02-01 10:14:08)
 - /home/orion/projects/etna_viv/rnndb/state_2d.xml (  30577 bytes, from 2013-02-01 10:02:25)
-- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  49672 bytes, from 2013-02-20 18:04:17)
+- /home/orion/projects/etna_viv/rnndb/state_3d.xml (  50943 bytes, from 2013-03-04 16:52:52)
 
 Copyright (C) 2013
 */
diff --git a/rnndb/state_3d.xml b/rnndb/state_3d.xml
index 4a2f604..b6550ea 100644
--- a/rnndb/state_3d.xml
+++ b/rnndb/state_3d.xml
@@ -247,7 +247,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">
         </reg32>
     </stripe>
 
-    <stripe name="THREAD_WALKER" brief="Thread walker states">
+    <stripe name="CL" brief="Thread walker states">
         <doc>The thread walker drives shaders in a predefined grid for GPGPU computing (OpenCL).
         These states are not used for normal rendering.</doc>
         <reg32 offset="0x00900" name="UNK00900" value="0x00000000"/>
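
Note on the new `VIVS_VS_RANGE` / `VIVS_PS_RANGE` definitions added above: each register carries two 16-bit fields, LOW in bits 0-15 and HIGH in bits 16-31. The following is a minimal sketch of how the generated macros might be used to build and decode such a value. The macro definitions are copied from the header in this patch; the helper functions `vs_range_pack`, `vs_range_low` and `vs_range_high` are hypothetical names introduced only for illustration and are not part of etna_viv. What the hardware does with LOW and HIGH is not stated in the patch, so the sketch only demonstrates the bit packing.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the generated state_3d.xml.h shown in this patch. */
#define VIVS_VS_RANGE               0x0000085c
#define VIVS_VS_RANGE_LOW__MASK     0x0000ffff
#define VIVS_VS_RANGE_LOW__SHIFT    0
#define VIVS_VS_RANGE_LOW(x)        (((x) << VIVS_VS_RANGE_LOW__SHIFT) & VIVS_VS_RANGE_LOW__MASK)
#define VIVS_VS_RANGE_HIGH__MASK    0xffff0000
#define VIVS_VS_RANGE_HIGH__SHIFT   16
#define VIVS_VS_RANGE_HIGH(x)       (((x) << VIVS_VS_RANGE_HIGH__SHIFT) & VIVS_VS_RANGE_HIGH__MASK)

/* Hypothetical helper: combine two 16-bit fields into one register value.
 * The macros silently mask out-of-range input, so check the width first. */
static inline uint32_t vs_range_pack(uint32_t low, uint32_t high)
{
    assert(low <= 0xffff && high <= 0xffff);
    return VIVS_VS_RANGE_LOW(low) | VIVS_VS_RANGE_HIGH(high);
}

/* Hypothetical helper: extract the LOW field (bits 0-15). */
static inline uint32_t vs_range_low(uint32_t value)
{
    return (value & VIVS_VS_RANGE_LOW__MASK) >> VIVS_VS_RANGE_LOW__SHIFT;
}

/* Hypothetical helper: extract the HIGH field (bits 16-31). */
static inline uint32_t vs_range_high(uint32_t value)
{
    return (value & VIVS_VS_RANGE_HIGH__MASK) >> VIVS_VS_RANGE_HIGH__SHIFT;
}
```

The resulting word would be written to state address `VIVS_VS_RANGE` (0x0085c) through whatever state-upload path the driver uses. The `VIVS_PS_RANGE` macros at 0x0101c have the same layout, so the same pattern should apply with the `PS` names substituted.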