VMD Resource

Source: http://wiki.multimedia.cx/index.php?title=VMD

This page is about the VMD format used in Sierra computer games. See the Internet Wave page for information about a web reference format with the extension VMD.

This page is based on the document 'Description of the Sierra Video and Music Data (VMD) Format' by Mike Melanson and Vladimir "VAG" Gneushev at http://multimedia.cx/vmd-format.txt.

Extension: vmd
Company: Sierra Entertainment
Samples: http://samples.mplayerhq.hu/game-formats/sierra-vmd/

VMD is the file extension of a multimedia file format used in a number of Sierra CD-ROM computer games. The extension stands for Video and Music Data. The format is most notable for its use in Sierra's beloved 7-CD classic, Phantasmagoria, and is also used in other multimedia-heavy Sierra titles.

Format/Specifications

All multi-byte numbers are stored in little-endian format.

A VMD file starts with the following 816- (0x330-)byte header:

bytes 0-1      length of header, not including this length field; this
               length should be 0x32E (814)
bytes 2-3      placeholder for VMD handle
bytes 4-5      unknown
bytes 6-7      number of blocks in table of contents
bytes 8-9      top corner coordinate of video frame
bytes 10-11    left corner coordinate of video frame
bytes 12-13    width of video frame
bytes 14-15    height of video frame
bytes 16-17    flags
bytes 18-19    frames per block
bytes 20-23    absolute file offset of multimedia data
bytes 24-27    unknown (Urban Runner samples contain "iv32" there)
bytes 28-795   256 RGB palette entries, 3 bytes/entry in R-G-B order
bytes 796-799  recommended size (bytes) of data frame load buffer
bytes 800-803  recommended size (bytes) of unpack buffer for video decoding
bytes 804-805  audio sample rate
bytes 806-807  audio frame length/sample resolution
bytes 808-809  number of sound buffers
bytes 810-811  audio flags
bytes 812-815  absolute file offset of table of contents
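
The header layout above maps directly onto fixed-size unpacking. The following is a minimal Python sketch, not a reference implementation; the function name and dictionary keys are invented for illustration, and the audio frame length is read as a signed value on the assumption that this is how a negative length (used to mark 16-bit sound) is represented.

```python
import struct

VMD_HEADER_SIZE = 0x330  # 816 bytes, including the 2-byte length field


def parse_vmd_header(data: bytes) -> dict:
    """Unpack the header fields listed above; key names are illustrative."""
    if len(data) < VMD_HEADER_SIZE:
        raise ValueError("a VMD header needs at least 816 bytes")
    # Bytes 0-23: ten 16-bit fields followed by one 32-bit offset.
    (length, handle, unknown_4, block_count, top, left, width, height,
     flags, frames_per_block, data_offset) = struct.unpack_from("<10HI", data, 0)
    # Bytes 796-815: buffer sizes, audio parameters and the TOC offset.
    # The audio frame length is read as signed (assumption, see lead-in).
    (load_buf_size, unpack_buf_size, sample_rate, audio_frame_length,
     sound_buffers, audio_flags, toc_offset) = struct.unpack_from(
        "<IIHhHHI", data, 796)
    return {
        "header_length": length,
        "block_count": block_count,
        "top": top, "left": left, "width": width, "height": height,
        "flags": flags,
        "frames_per_block": frames_per_block,
        "data_offset": data_offset,
        "unknown_24": data[24:28],
        "palette": data[28:796],          # 256 raw 6-bit R-G-B triplets
        "load_buffer_size": load_buf_size,
        "unpack_buffer_size": unpack_buf_size,
        "sample_rate": sample_rate,
        "audio_frame_length": audio_frame_length,
        "sound_buffers": sound_buffers,
        "audio_flags": audio_flags,
        "toc_offset": toc_offset,
    }
```

A caller would typically read the first 816 bytes of the file, pass them to this function, and check that header_length is 0x32E before trusting the other fields.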

Note that the RGB color components are 6-bit VGA palette components which means that they range from 0..63. The components need to be scaled if they are to be used in rendering typical RGB images where the components are 8 bits.
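
As an illustration of the scaling, the Python sketch below expands 6-bit components to 8 bits by replicating the top bits into the low bits, so that 63 maps to 255. This is one common choice rather than something the format mandates; a plain left shift by 2 is the simpler alternative. The function name is hypothetical.

```python
def scale_vga_palette(raw: bytes) -> list:
    """Convert 6-bit R-G-B triplets into 8-bit (r, g, b) tuples."""
    if len(raw) % 3 != 0:
        raise ValueError("palette data must be a whole number of triplets")

    def scale(component: int) -> int:
        component &= 0x3F                      # keep the 6 valid bits
        return (component << 2) | (component >> 4)

    return [
        (scale(raw[i]), scale(raw[i + 1]), scale(raw[i + 2]))
        for i in range(0, len(raw), 3)
    ]
```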

A VMD file has a table of contents describing all of the file's block and frame information. The absolute file offset of the table of contents is given in the file header and usually points to the end of the file. The table of contents contains 2 parts: the block offset table and the frame information table.

Blocks and frames in VMD are different concepts. A frame contains audio or video. A block contains both a video frame and an audio frame.

The block offset table consists of a series of 6-byte records. Each record has the following format:

bytes 0-1      unknown
bytes 2-5      absolute file offset of block

The number of entries in this table is specified by bytes 6-7 in the file header. After the block offset table is the frame information table. The frame information table consists of a series of 16-byte records with the following format:

byte 0         frame data type
  1 = audio frame
  2 = video frame
byte 1         unknown
bytes 2-5      frame data length

The meaning of the frame's remaining data depends on the frame type.
if this is an audio frame:
  byte 6       audio flags
  bytes 7-15   unknown

if this is a video frame:
  bytes 6-7    left coordinate of video frame
  bytes 8-9    top coordinate of video frame
  bytes 10-11  right coordinate of video frame
  bytes 12-13  bottom coordinate of video frame
  byte 14      unknown
  byte 15      bit 1 (byte[15] & 0x02) indicates a new palette

Generally, a frame information record needs to be made available to the audio or video decoder units of a VMD decoding application as the information is relevant to the decoding process.
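
The two tables can be walked with a few lines of Python. This sketch assumes the frame information table holds (number of blocks) x (frames per block) records, which the text above does not state explicitly; the function name and dictionary keys are likewise invented for illustration.

```python
import struct


def parse_toc(toc: bytes, block_count: int, frames_per_block: int):
    """Split a table of contents into block offsets and frame records."""
    pos = 0
    block_offsets = []
    for _ in range(block_count):
        _unknown, offset = struct.unpack_from("<HI", toc, pos)
        block_offsets.append(offset)
        pos += 6

    frames = []
    # Assumption: one 16-byte record per frame of every block.
    for _ in range(block_count * frames_per_block):
        record = toc[pos:pos + 16]
        if len(record) < 16:
            raise ValueError("truncated frame information table")
        frame = {
            "type": record[0],
            "length": struct.unpack_from("<I", record, 2)[0],
            "record": record,   # kept so decoders can see the raw bytes
        }
        if record[0] == 1:      # audio frame
            frame["audio_flags"] = record[6]
        elif record[0] == 2:    # video frame
            left, top, right, bottom = struct.unpack_from("<4H", record, 6)
            frame.update(left=left, top=top, right=right, bottom=bottom,
                         new_palette=bool(record[15] & 0x02))
        frames.append(frame)
        pos += 16
    return block_offsets, frames
```

Keeping the raw 16-byte record alongside the parsed fields makes it easy to hand the whole record to an audio or video decoder, as the paragraph above recommends.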

Video Format

The VMD video coding method uses the Lempel-Ziv (LZ77) algorithm, run length encoding (RLE), and interframe differencing to compress 8-bit palettized video data.

VMD video embodies both intraframes (a.k.a. keyframes) and interframes. Intraframes update the entire frame. Interframes only update portions of the frame that have changed from the previous frame. The first video frame of a VMD file is implicitly intracoded (the first frame has to paint the entire viewing area). The successive frames are all intercoded.

The frame record for a video frame specifies the frame coordinates of the rectangular region that will be updated. For example, if the file header specifies that the video is 200 pixels wide and 100 pixels high, the left, top, right, and bottom coordinates of the rectangular update region will be 0, 0, 199, and 99, respectively, if the entire frame is to be updated. A subsequent interframe may choose to leave much of the previous frame unchanged and only update the block from (100, 10) -> (150, 40). In this case, the coordinates 100, 10, 150, and 40 would be encoded in the frame record.
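
Because the coordinates in the example are inclusive (a 200-pixel-wide frame runs from 0 to 199), the size of the update region is one more than the difference of the coordinates. A small Python illustration, with a hypothetical helper name:

```python
def update_region_size(left: int, top: int, right: int, bottom: int):
    """Return (width, height) of an inclusive update rectangle."""
    if right < left or bottom < top:
        raise ValueError("rectangle coordinates are out of order")
    return right - left + 1, bottom - top + 1
```

For the two examples above, this gives (200, 100) for the full-frame update and (51, 31) for the partial update.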

The initial palette for decoding VMD video is transported in the main VMD file header. If bit 1 of frame record byte 15 (byte[15] & 0x02) is set to 1, the compressed video data chunk begins with a new palette. The palette data is transported as 770 bytes. The first byte contains the first palette index to modify. The second byte contains the number of palette entries to change. The remaining 768 bytes are 256 R-G-B palette triplets. Again, these are stored as 6-bit VGA DAC values and should be scaled accordingly.
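
The 770-byte palette prefix can be separated from the rest of the chunk as in the Python sketch below. The function name is hypothetical, and the sketch only splits the fields; whether a decoder applies all 256 triplets or only the range given by the first two bytes is left to the caller.

```python
PALETTE_UPDATE_SIZE = 770  # 2 header bytes + 256 * 3 component bytes


def read_palette_update(chunk: bytes):
    """Split a video chunk that starts with a palette update.

    Returns (first_index, count, triplets, remaining_chunk), where
    triplets holds 256 raw 6-bit (r, g, b) tuples.
    """
    if len(chunk) < PALETTE_UPDATE_SIZE:
        raise ValueError("chunk too short to hold a palette update")
    first_index = chunk[0]
    count = chunk[1]
    triplets = [tuple(chunk[2 + 3 * i: 5 + 3 * i]) for i in range(256)]
    return first_index, count, triplets, chunk[PALETTE_UPDATE_SIZE:]
```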

The compressed video data begins with a byte describing how the data is encoded. The byte specifies the following information:

bit 7      specifies that data chunk is LZ compressed
bits 6-0   specifies 1 of 3 rendering methods

If bit 7 is 1, the data chunk must be passed through the LZ decoder before progressing to the rendering phase. If bit 7 is 0, the data chunk is passed directly to the rendering phase.
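
In code, the coding byte splits into a flag and a method number. A minimal Python illustration (the function name is made up for this example):

```python
def split_coding_byte(coding_byte: int):
    """Return (is_lz_compressed, rendering_method) for a coding byte."""
    is_lz_compressed = bool(coding_byte & 0x80)   # bit 7
    rendering_method = coding_byte & 0x7F         # bits 6-0
    return is_lz_compressed, rendering_method
```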

The VMD LZ decoding algorithm takes the compressed video buffer (after the coding method byte described above) as input and outputs a buffer of decoded bytes. The output buffer must be as large as indicated in bytes 800-803 of the main VMD header. The VMD LZ decoding algorithm operates as follows:

allocate a circular queue of 4096 (0x1000) bytes and initialize all
  elements to 0x20; note that the queue is addressed with a 12-bit
  number
initialize variable dataleft as the 32-bit number stored in the first 4
  bytes of the block
if the next 4 bytes are (0x34 0x12 0x78 0x56)
  advance stream over the 4 marker bytes
  initialize queue position (qpos) to 0x111
  initialize special chain length (speclen) to 18
else
  initialize qpos to 0xFEE
  initialize speclen to nothing (any value above 18 will suffice in
    this example)
proceed to main decode loop...
while there is more data left (dataleft > 0)
  tag = the next byte in the stream
  if (tag is 0xFF) and (dataleft > 8)
    take the next 8 bytes from the stream and place them in both the
      output buffer and the circular queue
    subtract 8 from dataleft
  else
    foreach bit in tag byte, reading from right -> left
      if (bit is 1)
        take the next byte from the stream and place it in both the
          output buffer and the circular queue
        decrement dataleft
      else
        move a chain of bytes from the circular queue to the output
        get the length and beginning offset of the chain from the next
          2 bytes in the stream:
          byte 0: bits 7-0: lower 8 bits of beginning offset
          byte 1: bits 7-4: upper 4 bits of beginning offset
                  bits 3-0: length of chain, minus 3
        thus, add 3 to the length to obtain the actual length
        if (length is equal to speclen)
          length is 18 (max ordinary chain length) + next byte in
            stream
        copy the byte chain from the circular queue to the output;
          in the process, add the chain back into the queue
        subtract length from dataleft
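
The LZ procedure can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not a hardened decoder: the function name is invented, the value 100 stands in for "no special chain length", and truncated input simply raises IndexError instead of being handled.

```python
def vmd_lz_unpack(src: bytes) -> bytes:
    """Decode a VMD LZ buffer (the data after the coding method byte)."""
    queue_mask = 0xFFF
    queue = bytearray(b"\x20" * 0x1000)   # circular queue, filled with 0x20
    out = bytearray()
    dataleft = int.from_bytes(src[0:4], "little")
    pos = 4
    if src[pos:pos + 4] == b"\x34\x12\x78\x56":
        pos += 4
        qpos, speclen = 0x111, 18
    else:
        qpos, speclen = 0xFEE, 100        # 100 can never match (max is 18)

    def emit(value: int) -> None:
        """Write one byte to both the output and the circular queue."""
        nonlocal qpos
        out.append(value)
        queue[qpos] = value
        qpos = (qpos + 1) & queue_mask

    while dataleft > 0 and pos < len(src):
        tag = src[pos]
        pos += 1
        if tag == 0xFF and dataleft > 8:
            for value in src[pos:pos + 8]:   # eight literal bytes
                emit(value)
            pos += 8
            dataleft -= 8
            continue
        for _ in range(8):                   # tag bits, right to left
            if dataleft <= 0:
                break
            if tag & 1:                      # literal byte
                emit(src[pos])
                pos += 1
                dataleft -= 1
            else:                            # chain copied from the queue
                chainofs = src[pos] | ((src[pos + 1] & 0xF0) << 4)
                chainlen = (src[pos + 1] & 0x0F) + 3
                pos += 2
                if chainlen == speclen:
                    chainlen = 18 + src[pos]
                    pos += 1
                for _ in range(chainlen):
                    emit(queue[chainofs & queue_mask])
                    chainofs += 1
                dataleft -= chainlen
            tag >>= 1
    return bytes(out)
```

Copying the chain one byte at a time matters: a chain may overlap the bytes it is currently writing, which is how short patterns get repeated.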

There are 3 rendering methods that a frame can use to paint the raw or LZ-decoded data (referred to as the video data buffer) onto the final output frame.

Method 1 iterates through each line in the output frame, as indicated by the dimensions specified in the frame information record. For each line:

offset = 0
repeat
  length = next byte in video data buffer
  if (bit 7 of length byte is 1)
    mask off bit 7 of length and add 1 (length = (length & 0x7F) + 1)
    copy length bytes from the video data buffer to the output frame
    advance offset by length
  else
    increment length
    copy length bytes from the same position in the previous frame to
      the current frame
    advance offset by length
while (offset < frame width)
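
A single line of method 1 might be decoded as in this Python sketch. The function name and the convention of returning the new read position are choices made for the example; the previous frame's line is passed in as prev_line, and the offset is advanced after both kinds of copy.

```python
def render_method1_line(data: bytes, pos: int, prev_line: bytes, width: int):
    """Decode one line of rendering method 1.

    Returns (line, new_pos), where new_pos is the read position in the
    video data buffer after this line.
    """
    line = bytearray(width)
    offset = 0
    while True:
        length = data[pos]
        pos += 1
        if length & 0x80:
            length = (length & 0x7F) + 1
            if offset + length > width or pos + length > len(data):
                raise ValueError("literal run overflows the line or buffer")
            line[offset:offset + length] = data[pos:pos + length]
            pos += length
        else:
            length += 1
            if offset + length > width or offset + length > len(prev_line):
                raise ValueError("interframe copy overflows the line")
            # Unchanged pixels: take them from the previous frame.
            line[offset:offset + length] = prev_line[offset:offset + length]
        offset += length
        if offset >= width:
            break
    return bytes(line), pos
```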

Method 2 simply copies the entire video data buffer onto the output frame. This is the simplest rendering method, but be sure to take into account the frame's specific decoding dimensions as specified in the frame record.

Method 3 operates just like method 1 except for one small change. When bit 7 of the length byte is 1 and the length byte has been masked and incremented, the next byte in the video data buffer is examined. If the byte is not 0xFF, perform the same copy as in method 1. If the byte is 0xFF, apply a RLE decoding algorithm to unpack the data from the video data buffer into the output frame. The RLE unpacking algorithm operates as follows:

if the length is odd, copy the next byte from the video data buffer to
  the output frame and decrement length
divide length by 2 (length now counts 2-byte pairs)
while (length > 0)
  fetch the next byte from the video data buffer
  if the top bit of the byte is 1 (byte & 0x80)
    drop the top bit of the byte and shift left by 1:
      byte = (byte & 0x7F) << 1
    copy (byte) bytes from video data buffer to output frame
    subtract (byte / 2) from length
  else
    repeat (byte) times
      copy the next 2 bytes from video data buffer to output frame; in
        other words, bytes A and B from the video data buffer will be
        repeated n times: ABABABAB...
    advance past the 2 bytes in the video data buffer
    subtract (byte) from length
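
The RLE step can be sketched in Python as below. The pseudocode leaves the loop bookkeeping implicit, so this sketch assumes that each code byte reduces the remaining length by the number of 2-byte pairs it produces; the function name and return convention are also illustrative.

```python
def vmd_rle_unpack(data: bytes, pos: int, length: int):
    """Unpack `length` output bytes of method 3 RLE data.

    Returns (unpacked_bytes, new_pos).
    """
    out = bytearray()
    if length & 1:                  # odd length: one leading literal byte
        out.append(data[pos])
        pos += 1
        length -= 1
    pairs_left = length // 2
    while pairs_left > 0:
        code = data[pos]
        pos += 1
        if code & 0x80:
            count = (code & 0x7F) << 1      # literal run, in bytes
            out += data[pos:pos + count]
            pos += count
            pairs_left -= code & 0x7F
        else:
            out += data[pos:pos + 2] * code  # pair A,B repeated code times
            pos += 2
            pairs_left -= code
    return bytes(out), pos
```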

Audio Format

8-bit audio is stored as raw PCM samples; all 16-bit audio is 2:1 DPCM-encoded. To determine whether a VMD file contains sound, check bit 12 (& 0x1000) of the flags field in the file header; a non-zero bit indicates that the file has sound. The audio sample rate and audio frame length fields hold the playback rate and the size of a single (compressed) sound block, respectively. A negative audio frame length indicates 16-bit sound data; in this case, negate the field to get the actual block length.

The audio flags field holds further flags: bit 15 (& 0x8000) indicates old-style stereo sound, while bit 9 (& 0x200) indicates the new stereo sound format (introduced in the game Shivers 2). The two formats differ slightly in the meaning of several fields, so the original playback core is not backward compatible: Shivers 2 cannot play old videos properly. The recommended approach is to check bit 15 first and, if it is zero, to check bit 9 as well to determine the number of channels. The main difference between the formats is that old VMD files treat the audio frame length field as the number of samples for both channels, whereas the new version treats it as the number of samples for a single channel (i.e. it must be multiplied by 2 for stereo sound).
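
The header checks described in this section can be collected into one helper. The Python sketch below is illustrative only: the function name and result keys are invented, and the audio frame length is assumed to have been read as a signed 16-bit value.

```python
def describe_audio(header_flags: int, audio_frame_length: int,
                   audio_flags: int):
    """Summarize the audio parameters of a VMD header.

    Returns None when the file has no sound.
    """
    if not header_flags & 0x1000:       # bit 12: file contains sound
        return None
    bits = 8
    if audio_frame_length < 0:          # negative length marks 16-bit sound
        bits = 16
        audio_frame_length = -audio_frame_length
    if audio_flags & 0x8000:            # bit 15: old-style stereo
        channels, new_stereo = 2, False
    elif audio_flags & 0x0200:          # bit 9: new stereo format
        channels, new_stereo = 2, True
    else:
        channels, new_stereo = 1, False
    # New-style stereo counts samples per channel, so double it.
    samples = audio_frame_length * 2 if new_stereo else audio_frame_length
    return {
        "bits": bits,
        "channels": channels,
        "new_stereo": new_stereo,
        "block_length": audio_frame_length,
        "samples_per_block": samples,
    }
```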

When you encounter a frame information record of type 1, proceed to audio decoding. First, examine the frame's audio flags byte. It indicates one of the following:

- normal sound block
Decompress a single audio block and continue to the next frame.
- multiple sound and silence blocks
Read the next 4 bytes of the frame's data. These form a bit mask. Starting from bit 0, each non-zero bit indicates a silence block and each zero bit indicates a normal audio block. Iterate once per sound buffer (the number of sound buffers is given in the file header) to fill the chain of sound and/or silence blocks.
- single silence block
Fill the whole block with silence.
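
For the "multiple sound and silence blocks" case, the mask can be expanded into a per-block plan as in this Python sketch. The function name is hypothetical, and the 4 mask bytes are assumed to be little-endian like the other multi-byte numbers in the format.

```python
def classify_sound_blocks(frame_data: bytes, sound_buffers: int) -> list:
    """Expand the 4-byte sound mask into a list of block kinds.

    A set bit means a silence block and a clear bit means a normal audio
    block, starting from bit 0.
    """
    if len(frame_data) < 4:
        raise ValueError("frame data too short to hold the sound mask")
    mask = int.from_bytes(frame_data[:4], "little")
    return [
        "silence" if (mask >> bit) & 1 else "audio"
        for bit in range(sound_buffers)
    ]
```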

The decompression scheme is simple. As stated above, it is used only if the file contains 16-bit audio. First, obtain one or two initial samples, depending on the number of channels. For mono and new-style stereo sound, the initial samples are the first word(s) of the audio stream data. Old-style stereo sound has no stored initial samples; instead, they are initialized to zero at the beginning of playback and carried between successive frames. This can make random seeking difficult in such files. Starting from these samples, decode the remaining bytes of the chunk using this formula:

if code & 0x80  sample = sample - Table[code & 0x7F]
else            sample = sample + Table[code & 0x7F]

Here, code is each byte of the packed data. For stereo sound, decoding alternates between the left and right channels.

The delta table is:

   0,    8,   16,   32,   48,   64,   80,   96,  112,  128,  144,  160,  176,  192,  208,  224
 240,  256,  272,  288,  304,  320,  336,  352,  368,  384,  400,  416,  432,  448,  464,  480
 496,  512,  520,  528,  536,  544,  552,  560,  568,  576,  584,  592,  600,  608,  616,  624
 632,  640,  648,  656,  664,  672,  680,  688,  696,  704,  712,  720,  728,  736,  744,  752
 760,  768,  776,  784,  792,  800,  808,  816,  824,  832,  840,  848,  856,  864,  872,  880
 888,  896,  904,  912,  920,  928,  936,  944,  952,  960,  968,  976,  984,  992, 1000, 1008
1016, 1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920
1984, 2048, 2304, 2560, 2816, 3072, 3328, 3584, 3840, 4096, 5120, 6144, 7168, 8192,12288,16384
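
Putting the formula and the table together, a mono chunk could be decoded as in the Python sketch below. Several details are assumptions of this example rather than statements from the description above: the function name, emitting the initial sample as the first output sample, and clamping results to the signed 16-bit range.

```python
DELTA_TABLE = [
       0,    8,   16,   32,   48,   64,   80,   96,  112,  128,  144,  160,  176,  192,  208,  224,
     240,  256,  272,  288,  304,  320,  336,  352,  368,  384,  400,  416,  432,  448,  464,  480,
     496,  512,  520,  528,  536,  544,  552,  560,  568,  576,  584,  592,  600,  608,  616,  624,
     632,  640,  648,  656,  664,  672,  680,  688,  696,  704,  712,  720,  728,  736,  744,  752,
     760,  768,  776,  784,  792,  800,  808,  816,  824,  832,  840,  848,  856,  864,  872,  880,
     888,  896,  904,  912,  920,  928,  936,  944,  952,  960,  968,  976,  984,  992, 1000, 1008,
    1016, 1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920,
    1984, 2048, 2304, 2560, 2816, 3072, 3328, 3584, 3840, 4096, 5120, 6144, 7168, 8192, 12288, 16384,
]


def decode_dpcm_mono(chunk: bytes) -> list:
    """Decode one mono DPCM chunk into a list of 16-bit samples."""
    if len(chunk) < 2:
        raise ValueError("chunk too short to hold the initial sample")
    # The first word of the chunk is the initial sample.
    sample = int.from_bytes(chunk[0:2], "little", signed=True)
    samples = [sample]                    # assumption: it is also output
    for code in chunk[2:]:
        delta = DELTA_TABLE[code & 0x7F]
        sample = sample - delta if code & 0x80 else sample + delta
        sample = max(-32768, min(32767, sample))   # assumption: clamp
        samples.append(sample)
    return samples
```

A stereo decoder would keep one running sample per channel and alternate between them for successive code bytes.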

Games Using VMD

These are some of the Sierra computer games that are known to use the VMD file format:

Versions

VMD
