initial commit

author: Wladimir J. van der Laan <laanwj@gmail.com> 2012-12-31 20:40:34 +0100
committer: Wladimir J. van der Laan <laanwj@gmail.com> 2012-12-31 21:35:11 +0100
commit: 9d07cb1b2cfee55390de0194db9052530a73a0c2 (patch)
tree: db94cfbe4cc117dbb523d34525400994b9699e5d /doc/kernel_interface.md
1 files changed, 200 insertions, 0 deletions
diff --git a/doc/kernel_interface.md b/doc/kernel_interface.md
new file mode 100644
index 0000000..6952e37
--- /dev/null
+++ b/doc/kernel_interface.md
@@ -0,0 +1,200 @@
+Devices
+=======================
+
+At startup, the application connects to galcore device using open with the device
+
+- `/dev/galcore`, or
+- `/dev/graphics/galcore`
+
+Then it figures out what the base and size of the contigous memory is using the "query video memory" call.
+
+After that it mmaps the complete contiguous memory (128MB in my case) from device into memory space.
+
+Allocations
+------------
+Two types of memory are allocated:
+
+Contiguous memory
+  Used for command buffers
+  Reserved system memory that is contiguous (not fragmented by MMU) and mapped into GPU memory
+  It looks like the blob driver also allocates a signal for each contigous memory block, how does this get used?
+
+Linear video memory
+  Used for render targets, textures, surfaces, vertex buffers, bitmaps (see `gcvSURF_*`) ...
+
+  Device memory, from one of the pools (default, local, unified or contiguous system memory)
+  The available pools depend on the hardware; many of the devices have no local memory, and simply 
+  use a part of system memory as video memory.
+
+`LOCK_VIDEO_MEMORY` locks the memory both into the GPU memory so that it can be used by the GPU as
+into CPU memory so that the application can read/write. It is interesting that these are done by
+the same call.
+
+User signal API
+----------------
+Commands with op `USER_SIGNAL` are used for signaling between the kernel and userspace
+driver.
+The subcommands are:
+
+- `USER_SIGNAL_CREATE` Create a new signal
+  Inputs: 
+     - manualReset
+     If set to gcvTRUE, the `SIGNAL` command must be used with state false to
+     reset the signal. If set to gcvFALSE, the signal automatically resets
+     after waiting for it with `WAIT`.
+  Outputs: id
+
+- `USER_SIGNAL_DESTROY` Destroy the signal
+  Inputs: id
+  Outputs: N/A
+
+- `USER_SIGNAL_SIGNAL` Signal the signal
+  Inputs: id, state
+    - id    Signal id to signal
+    - state If gcvTRUE, the signal will be set to signaled state, if gcvFALSE
+             the signal will be set to nonsignaled state.
+  Outputs: N/A
+
+- `USER_SIGNAL_WAIT` Wait on the signal (block current thread)
+  Inputs: 
+    - id     Signal id to wait for
+    - wait   Maximum duration to wait (in milliseconds)
+  Outputs: N/A
+
+- `USER_SIGNAL_MAP` Map the signal
+  Inputs: id
+  Outputs: N/A
+
+- `USER_SIGNAL_UNMAP` Same as destroy
+  Inputs: id
+  Outputs: N/A
+
+This is used to synchronize GPU and CPU
+Signals can be scheduled to be signalled/unsignalled when the GPU finished a certain operation (using an Event).
+They are also used for inter-thread synchronization by the EGL driver.
+
+The event queue schedules kernel operations to happen in the future.
+
+Events queues are sent to the kernel using `gcvHAL_EVENT_COMMIT`. Types of interfaces that can be sent using an event are:
+
+- `FREE_NON_PAGED_MEMORY`
+- `FREE_CONTIGUOUS_MEMORY`
+- `FREE_VIDEO_MEMORY`
+- `WRITE_DATA`
+- `UNLOCK_VIDEO_MEMORY`
+- `SIGNAL`
+- `UNMAP_USER_MEMORY`
+
+User can then wait for them using `gcvHAL_USER_EVENT`.
+
+Anatomy of a small rendering test
+----------------------------------
+
+See `native/replay` examples.
+
+- Get GPU base address
+- Get chip identity
+- Create user signals for synchronization
+- Query video memory
+- Allocate contiguous memory A of 0x8000 bytes, physical cdd30b40 logical 484ab000
+  -> Command buffer queue
+- Allocate contiguous memory B of 0x8000 bytes, physical cde41e40 logical 484f0000
+  -> Spare command buffer queue?
+- Allocate contiguous memory C of 0x8000 bytes, physical ce699d80 logical 4854b000
+  -> Spare command buffer queue?
+- Allocate contiguous memory D of 0x8000 bytes, physical cdd30440 logical 485a4000
+  -> Spare command buffer queue?
+- Allocate linear vidmem E of 0x70000 bytes, type `RENDER_TARGET`, node cf85a2e0
+    Main render target
+- Allocate linear vidmem F of 0x700 bytes, type `TILE_STATUS`, node d09ab6a8
+    looks like the tile status is an auxilary structure, of render target size /0x100 rounded up to 0x100
+- Lock vidmem E, address 7f4f4100, memory 477e2100
+- Lock vidmem F, address 7a003300, memory 422f1300
+- Allocate linear vidmem G  of 0x38000 bytes, type `DEPTH`, node cf8571b0
+    Depth surface of main render target
+- Allocate linear vidmem H  of 0x400 bytes, type `TILE_STATUS`, node cf8633a8
+    Tile status of depth surface
+- Lock vidmem G, address 7e468000, memory 46756000
+- Lock vidmem H, address 7a002900, memory 422f0900
+- Allocate linear vidmem I  of 0x60000 bytes, type `VERTEX`, node cf85f830
+    Vertex buffer
+- Lock vidmem I, address 7c061d80, memory 4434fd80
+- Allocate linear vidmem J  of 0x4000 bytes, type `RENDER_TARGET`, node cf8633e0 (pool SYSTEM)
+    What is this? (64x64 aux render target?)
+- Allocate linear vidmem K  of 0x100 bytes, type `TILE_STATUS`, node d09a4250
+    Tile status of J aux render target
+- Lock vidmem J, address 7f284000, memory 47572000
+- Lock vidmem K, address 7a002f00, memory 422f0f00
+- Build and commit the command buffer
+
+Command buffers
+==================
+Command buffers are built in user space by the driver in a `gcoCMDBUF` structure, then submitted to the kernel with the 
+`gcvHAL_COMMIT` command. 
+
+The following structure fields of `gcoCMDBUF` are used by the kernel:
+
+- object: marks the type of object (`gcvOBJ_COMMANDBUFFER`)
+- physical: physical address of command buffer
+- logical: logical (user space) address of command buffer
+- bytes: size of command buffer memory block in bytes
+- startOffset: offset at which to start sending command buffer (in bytes)
+- offset: end offset (in bytes)
+- free: number of free bytes in command buffer
+
+Context switching
+==================
+Clients manage their own context, which is passed to COMMIT in case a context switch is needed.
+
+It appears that context switching is manual. Every process has to keep its own context structure for 
+context switching, and pass this to COMMIT. In case this is needed the kernel will then load the state
+from the context buffer.
+
+The context contains a copy of all state that should be preserved when the context has been switched
+(when multiple programs are using the GPU).
+
+This has the form of a giant command stream buffer, accompanied by a state map (an array of offsets
+into the command stream buffer for every known state), and the address where to put a link
+to the main command buffer.
+
+The state `FE.VERTEX_ELEMENT_CONFIG` is handled specially: write only the elements that are used, starting from 0x00600
+
+Used fields in `_gcoCONTEXT` from the kernel:
+
+- id is used
+    [in] This id is used to determine wether to switch context
+    [out] A unique id for the context is generated the first time a COMMIT is done, with context->id==0
+- `hint*` only used when SECURE_USER
+- logical and bufferSize are used
+- pipe2DIndex is used
+    if this is set, "we have to check pipes", and the pipe is set to initialPipe if needed
+- entryPipe is used
+    this is the pipe that has to be active on entering the passed command buffer
+    (and that holds at the end of the context buffer)
+- initialPipe is used 
+    this is the pipe that has to be active on entering the context command buffer
+- currentPipe is used
+    this is the pipe that is active after the passed command buffer
+- inUse is used
+    value at this address is set to gcvTRUE, to mark the context as used. The context is "used" when
+    a context switch happened.
+
+All command buffers are padded with 4 NOPs at the beginning to make place for a PIPE command if needed.
+At the end of the command buffer must be place for a LINK (1 NOP + padding).
+
+What are these used for? Seems they are the last parameters of a 'LOAD_STATE' command so that it
+  can be extended, but why? Was this only used for building or does the kernel also use it?
+- lastAddress
+- lastSize
+- lastIndex
+- lastFixed
+- postCommit
+- buffer (userspace buffer for what is put into logical)
+   same as logical, except that the PIPE2D command at the end is nopped out
+   including accompanying semaphore and stall
+   (probably because we're using the 3D pipe)
+
+At least in the v2 kernel driver they are not used. They are used for building the buffer from the 
+userspace driver, but not for using it.
+
+
author	Wladimir J. van der Laan <laanwj@gmail.com>	2012-12-31 20:40:34 +0100
committer	Wladimir J. van der Laan <laanwj@gmail.com>	2012-12-31 21:35:11 +0100
commit	9d07cb1b2cfee55390de0194db9052530a73a0c2 (patch)
tree	db94cfbe4cc117dbb523d34525400994b9699e5d /doc/kernel_interface.md