Diffstat (limited to 'ext/ogg/README')
-rw-r--r-- | ext/ogg/README | 366 |
1 files changed, 0 insertions, 366 deletions
diff --git a/ext/ogg/README b/ext/ogg/README
deleted file mode 100644
index 557e9d50..00000000
--- a/ext/ogg/README
+++ /dev/null
@@ -1,366 +0,0 @@
-This document describes some things to know about the Ogg format, as well
-as implementation details in GStreamer.
-
-INTRODUCTION
-============
-
-ogg and the granulepos
-----------------------
-
-An ogg stream contains pages with a serial number and a granulepos.
-The granulepos is a 64-bit signed integer.  It is a value that in some way
-represents a time since the start of the stream.
-The interpretation as such is, however, both codec-specific and
-stream-specific.
-
-ogg has no notion of time: it only knows about bytes and granulepos values
-on pages.
-
-The granule position is just a number; the only guarantee for a valid ogg
-stream is that within a logical stream, this number never decreases.
-
-While logically a granulepos value can be constructed for every ogg packet,
-the page is marked with only one granulepos value: the granulepos of the
-last packet to end on that page.
-
-theora and the granulepos
--------------------------
-
-The granulepos in theora is an encoding of the frame number of the last
-key frame ("i frame"), and the number of frames since the last key frame
-("p frame").  The granulepos is constructed as the sum of the first number,
-shifted left by granuleshift bits, and the second number:
-granulepos = (iframe << granuleshift) + pframe
-
-(This means that given a frame number or a timestamp, one cannot generate
- the one and only granulepos for that page; several granulepos values
- correspond to this frame number.  You also need the last keyframe, as well
- as the granuleshift.
- However, given a granulepos, the theora codec can still map that to a
- unique timestamp and frame number for that theora stream.)
-
- Note: currently theora stores the "presentation time" as the granulepos;
-       i.e. a first data page with one packet contains one video frame and
-       will be marked with 0/0.  Changing that to be 1/0 (so that it
Changing that to be 1/0 (so that it - represents the number of decodable frames up to that point, like - for Vorbis) is being discussed. - -vorbis and granulepos ---------------------- - -In Vorbis, the granulepos represents the number of samples that can be -decoded from all packets up to that point. - -In GStreamer, the vorbisenc elements produces a stream where: -- OFFSET is the time corresponding to the granulepos - number of bytes produced before -- OFFSET_END is the granulepos of the produced vorbis buffer -- TIMESTAMP is the timestamp matching the begin of the buffer -- DURATION is set to the length in time of the buffer - -Ogg media mapping ------------------ - -Ogg defines a mapping for each media type that it embeds. - -For Vorbis: - - - 3 header pages, with granulepos 0. - - 1 page with 1 packet header identification - - N pages with 2 packets comments and codebooks - - granulepos is samplenumber of next page - - one packet can contain a variable number of samples but one frame - that should be handed to the vorbis decoder. - -For Theora - - - 3 header pages, with granulepos 0. - - 1 page with 1 packet header identification - - N pages with 2 packets comments and codebooks - - granulepos is framenumber of last packet in page, where framenumber - is a combination of keyframe number and p frames since keyframe. - - one packet contains 1 frame - - - - -DEMUXING -======== - -ogg demuxer ------------ - -This ogg demuxer has two modes of operation, which both share a significant -amount of code. The first mode is the streaming mode which is automatically -selected when the demuxer is connected to a non-getrange based element. When -connected to a getrange based element the ogg demuxer can do full seeking -with great efficiency. - -1) the streaming mode. - -In this mode, the ogg demuxer receives buffers in the _chain() function which -are then simply submited to the ogg sync layer. 
Pages are then processed when -the sync layer detects them, pads are created for new chains and packets are -sent to the peer elements of the pads. - -In this mode, no seeking is possible. This is the typical case when the -stream is read from a network source. - -In this mode, no setup is done at startup, the pages are just read and decoded. -A new logical chain is detected when one of the pages has the BOS flag set. At -this point the existing pads are removed and new pads are created for all the -logical streams in this new chain. - - -2) the random access mode. - - In this mode, the ogg file is first scanned to detect the position and length -of all chains. This scanning is performed using a recursive binary search -algorithm that is explained below. - - find_chains(start, end) - { - ret1 = read_next_pages (start); - ret2 = read_prev_page (end); - - if (WAS_HEADER (ret1)) { - } - else { - } - - } - - a) read first and last pages - - start end - V V - +-----------------------+-------------+--------------------+ - | 111 | 222 | 333 | - BOS BOS BOS EOS - - - after reading start, serial 111, BOS, chain[0] = 111 - after reading end, serial 333, EOS - - start serialno != end serialno, binary search start, (end-start)/2 - - start bisect end - V V V - +-----------------------+-------------+--------------------+ - | 111 | 222 | 333 | - - - after reading start, serial 111, BOS, chain[0] = 111 - after reading end, serial 222, EOS - - while ( - - - -testcases ---------- - - a) stream without BOS - - +----------------------------------------------------------+ - 111 | - EOS - - b) chained stream, first chain without BOS - - +-------------------+--------------------------------------+ - 111 | 222 | - BOS EOS - - - c) chained stream - - +-------------------+--------------------------------------+ - | 111 | 222 | - BOS BOS EOS - - - d) chained stream, second without BOS - - +-------------------+--------------------------------------+ - | 111 | 222 | - BOS EOS - -What can an 
ogg demuxer do? ---------------------------- - -An ogg demuxer can read pages and get the granulepos from them. -It can ask the decoder elements to convert a granulepos to time. - -An ogg demuxer can also get the granulepos of the first and the last page of a -stream to get the start and end timestamp of that stream. -It can also get the length in bytes of the stream -(when the peer is seekable, that is). - -An ogg demuxer is therefore basically able to seek to any byte position and -timestamp. - -When asked to seek to a given granulepos, the ogg demuxer should always convert -the value to a timestamp using the peer decoder element conversion function. It -can then binary search the file to eventually end up on the page with the given -granule pos or a granulepos with the same timestamp. - -Seeking in ogg currently ------------------------- - -When seeking in an ogg, the decoders can choose to forward the seek event as a -granulepos or a timestamp to the ogg demuxer. - -In the case of a granulepos, the ogg demuxer will seek back to the beginning of -the stream and skip pages until it finds one with the requested timestamp. - -In the case of a timestamp, the ogg demuxer also seeks back to the beginning of -the stream. For each page it reads, it asks the decoder element to convert the -granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until -the page has a timestamp bigger or equal to the requested one. - -It is therefore important that the decoder elements in vorbis can convert a -granulepos into a timestamp or never seek on timestamp on the oggdemuxer. - -The default format on the oggdemuxer source pads is currently defined as a the -granulepos of the packets, it is also the value of the OFFSET field in the -GstBuffer. - -MUXING -====== - -Oggmux ------- - -The ogg muxer's job is to output complete Ogg pages such that the absolute -time represented by the valid (ie, not -1) granulepos values on those pages -never decreases. 
This has to be true for all logical streams in the group at -the same time. - -To achieve this, encoders are required to pass along the exact time that the -granulepos represents for each ogg packet that it pushes to the ogg muxer. -This is ESSENTIAL: without this exact time representation of the granulepos, -the muxer can not produce valid streams. - -The ogg muxer has a packet queue per sink pad. From this queue a page can -be flushed when: - - total byte size of queued packets exceeds a given value - - total time duration of queued packets exceeds a given value - - total byte size of queued packets exceeds maximum Ogg page size - - eos of the pad - - encoder sent a command to flush out an ogg page after this new packet - (in 0.8, through a flush event; in 0.10, with a GstOggBuffer) - - muxer wants a flush to happen (so it can output pages) - -The ogg muxer also has a page queue per sink pad. This queue collects -Ogg pages from the corresponding packet queue. Each page is also marked -with the timestamp that the granulepos in the header represents. - -A page can be flushed from this collection of page queues when: -- ideally, every page queue has at least one page with a valid granulepos - -> choose the page, from all queues, with the lowest timestamp value -- if not, muxer can wait if the following limits aren't reached: - - total byte size of any page queue exceeds a limit - - total time duration of any page queue exceeds a limit -- if this limit is reached, then: - - request a page flush from packet queue to page queue for each queue - that does not have pages - - now take the page from all queues with the lowest timestamp value - - make sure all later-coming data is marked as old, either to be still - output (but producing an invalid stream, though it can be fixed later) - or dropped (which means it's gone forever) - -The oggmuxer uses the offset fields to fill in the granulepos in the pages. 
-
-GStreamer implementation details
---------------------------------
-As said before, the basic rule is that the ogg muxer needs an exact time
-representation for each granulepos.  This needs to be provided by the encoder.
-
-Potential problems are:
-  - initial offsets for a raw stream need to be preserved somehow.  Example:
-    if the first audio sample has time 0.5 s, the granulepos in the vorbis
-    encoder needs to be adjusted to take this into account.
-  - initial offsets may need to be on rate boundaries.  Example:
-    if the framerate is 5 fps, and the first video frame has time 0.1 s, the
-    granulepos cannot correctly represent this timestamp.
-    This can be handled out-of-band (initial offset in another muxing format,
-    skeleton track with initial offsets, ...)
-
-Given that the basic rule for muxing is that the muxer needs an exact timestamp
-matching the granulepos, we need some way of communicating this time value
-from encoders to the Ogg muxer, that is, a mechanism to communicate
-a granulepos and its time representation for each GstBuffer.
-
-(This is an instance of a more generic problem: having a way to attach
- more fields to a GstBuffer.)
-
-Possible ways:
-- setting TIMESTAMP to this value: bad, because this value represents the end
-  time of the buffer, and thus conflicts with GStreamer's idea of what
-  TIMESTAMP is.  This would cause problems when muxing the encoded stream in
-  other muxing formats, or for streaming.  Note that this is what was done in
-  GStreamer 0.8.
-- setting DURATION to GP_TIME - TIMESTAMP: bad, because this breaks the
-  concept of duration for this frame.  Take the video example above; each
-  buffer would have a correct timestamp, but always a 0.1 s duration as
-  opposed to the correct 0.2 s duration.
-- subclassing GstBuffer: clean, but requires a common header shared between
-  the ogg muxer and all encoders that can be muxed into ogg.  Also, what if
-  a format can be muxed into more than one container, and they each have
-  their own "extra" info to communicate?
-- adding key/value pairs to GstBuffer: clean, but requires changes to
-  core.  Also, the overhead of allocating e.g. a GstStructure for *each*
-  buffer may be expensive.
-- "cheating":
-  - abuse OFFSET to store the timestamp matching this granulepos
-  - abuse OFFSET_END to store the granulepos value
-  The drawback here is that OFFSET and OFFSET_END previously made sense as
-  storage for a byte count.  Given that this byte count is not used for
-  anything critical (you can't store a raw theora or vorbis stream in a file
-  anyway), this is what's being done for now.
-
-In practice
------------
-- all encoders of formats that can be muxed into Ogg produce a stream where:
-  - OFFSET is abused to be the timestamp corresponding exactly to the
-    granulepos
-  - OFFSET_END is abused to be the granulepos of the encoded buffer (e.g. a
-    theora buffer)
-  - TIMESTAMP is the timestamp matching the beginning of the buffer
-  - DURATION is the length in time of the buffer
-
-- initial delays should be handled in the GStreamer encoders by mangling
-  the granulepos of the encoded packet to take the delay into account as
-  best as possible and storing that in OFFSET;
-  this then brings TIMESTAMP + DURATION to within less
-  than a frame period of the granulepos's time representation.
-  The ogg muxer will then create new ogg packets with this OFFSET as
-  the granulepos.  So in effect, the granulepos produced by the encoders
-  does not get used directly.
-
-TODO
-----
-- decide on a proper mechanism for communicating extra per-buffer fields
-- the ogg muxer sets timestamp and duration on outgoing ogg pages based on
-  the timestamp/duration of incoming ogg packets.
-  Note that:
-  - since the ogg muxer *has* to output pages sorted by granulepos time,
-    which represents the end time of the page, the buffer timestamps are not
-    necessarily monotonically increasing
-  - timestamp + duration of buffers don't match up; the duration represents
-    the length of the ogg page *for that stream*.  Hence, for a normal
Hence, for a normal - two-stream file, the sum of all durations is twice the length of the - muxed file. - -TESTING -------- -Proper muxing can be tested by generating test files with command lines like: -- video and audio start from 0: -gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg - -- video starts after audio: -gst-launch -v videotestsrc timestamp-offset=500000000 ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg - -- audio starts after video: -gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc timestamp-offset=500000000 ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg - -The resulting files can be verified with oggz-validate for correctness. |