SCI Specifications: Chapter 5 - The SCI Virtual Machine

Table of Contents

Introduction
Interpreter initialization and the main execution loop
The SCI Heap
The Sierra PMachine
Kernel functions

Introduction

Script resources

Like any processor, the SCI virtual machine is virtually useless without code to execute. This code is provided by script resources, which constitute the logic behind any SCI game.

Before the VM can operate on script resources, they first have to be loaded onto the heap. The heap is the only memory space that the VM can work on directly (with some restrictions); all other memory spaces have to be used implicitly or explicitly by using kernel calls. The heap also contains a stack, which is heavily used by SCI bytecode.

Each script resource may contain one or several of various script objects, listed here:

Type 1: Object
Type 2: Code
Type 3: Synonym word lists
Type 4: Said specs
Type 5: Strings
Type 6: Class
Type 7: Exports
Type 8: Relocation table
Type 9: Preload text (a flag, rather than a real section)
Type 10: Local variables

Standard SCI0 scripts (of post-0.000.396 SCI0, approximately) consist of a four-byte header, followed by a list of bytes:

[00][01]: Block type as LE 16 bit value, or 0 to terminate script resource
[02][03]: Block size as LE 16 bit value; includes header size
[04]...: Data
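
The block list described above can be walked with a few lines of code. The following is a minimal sketch in Python, not the interpreter's own loader; the function name iter_script_blocks and the sample bytes are made up for illustration, and it assumes the layout given here, in which the block size includes the four-byte header.

```python
import struct


def iter_script_blocks(data):
    """Yield (block_type, payload) for each block of an SCI0 script resource."""
    pos = 0
    while pos + 2 <= len(data):
        (block_type,) = struct.unpack_from("<H", data, pos)
        if block_type == 0:              # type 0 terminates the script resource
            break
        (block_size,) = struct.unpack_from("<H", data, pos + 2)
        if block_size < 4:
            raise ValueError("block size must include the 4-byte header")
        yield block_type, data[pos + 4:pos + block_size]
        pos += block_size


# Hypothetical sample: a strings block (type 5) holding "hi\0" plus a pad byte,
# a preload text flag (type 9, header only), then the terminator.
sample = (struct.pack("<HH", 5, 8) + b"hi\x00\x00"
          + struct.pack("<HH", 9, 4)
          + struct.pack("<H", 0))
blocks = list(iter_script_blocks(sample))
```

The pad byte in the sample keeps the next block 16 bit aligned.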

The code blocks contain the SCI bytecode that actually gets executed. The export block (of which there may be only one (or none at all)) contains script-relative pointers to exported functions, which can be called by the SCI operations calle and callb. The local variables block, which stores one of the four variable types, is used to share variables among the objects and classes of one script.

But the most important script members are Objects and Classes. As in the usual OOP terms, Classes refer to object prototypes, and Objects are instantiated Classes. However, unlike most OOP languages, SCI treats classes very similarly to objects, so that they may actually get called by the SCI bytecode. Therefore, they also have their own space for selectors (see below). Also, each object or class knows which class it inherits from and which class it was instantiated from (in the case of objects).

Note that all script segments are optional and 16 bit aligned; they are described in more detail below:

Object segments

Objects look like this (LE 16 bit values):

[00][01]: Magic number 0x1234
[02][03]: Local variable offset (filled in at run-time)
[04][05]: Offset of the function selector list, relative to its own position
[06][07]: Number of variable selectors (= #vs)
[08][09]: The 'species' selector
[0a][0b]: The 'superClass' selector
[0c][0d]: The '--info--' selector
[0e][0f]: The 'name' selector (object/class name)
[10]...: (#vs-4) more variable selectors
[08+ #vs*2][09+ #vs*2]: Number of function selectors (= #fs)
[0a+ #vs*2]...: Selector IDs for the functions
[08+ #vs*2 + #fs*2][09+ #vs*2 + #fs*2]: 0
[0a+ #vs*2 + #fs*2]...: Function selector code pointers

For objects, the selectors are simply the values for the selector IDs specified in their species class. The species is referenced by its offset when in memory and by its class ID when in the script file; the same holds for the species' superclass (the superClass selector). Info typically has one of the following values (although this does not appear to be relevant for SCI):

0x0000: Normal (static) object
0x0001: Clone
0x8000: Class

Other values are used, but do not appear to be of relevance ^{[Note 1]}

Code segments

Code segments contain free-form SCI bytecode. Pointers into this code are held by objects, classes, and export entries; these entries are, in turn, referenced in the export segment.

Synonym word list segments

Inside these, synonyms for certain words may be found. A synonym is a tuple (a, b), where both a and b are word groups, and b is the replacement for a if this synonym is in use. They are stored as 16 bit LE values in sequence (first a, then b). Synonyms must be set explicitly by the kernel function SetSynonyms() (as described in the Section called Kernel function 0x26: SetSynonyms(DblList)). It is not possible to enable only some of the synonyms.

Said spec segments

This section contains said specs (explained in the Section called Said specs in Chapter 6), tightly grouped.

String segments

This segment contains a sequence of asciiz strings describing class and object names, debug information, and (occasionally) game text.

Class segments

Classes look similar to objects:

[00][01]: Magic number 0x1234
[02][03]: Local variable offset (filled in at run-time)
[04][05]: Offset of the function selector list, relative to its own position
[06][07]: Number of variable selectors (= #vs)
[08][09]: The 'species' selector
[0a][0b]: The 'superClass' selector
[0c][0d]: The '--info--' selector
[0e][0f]: The 'name' selector (object/class name)
[10]...: (#vs-4) more variable selectors
[08+ #vs*2][09+ #vs*2]: Selector ID of the first varselector (0)
[0a+ #vs*2]...: Selector ID of the second etc. varselectors
[08+ #vs*4][09+ #vs*4]: Number of function selectors (#fs)
[0a+ #vs*4]...: Function selector code pointers
[08+ #vs*4 + #fs*2][09+ #vs*4 + #fs*2]: 0
[0a+ #vs*4 + #fs*2]...: Selector ID of the first etc. funcselectors

Simply put, they look like objects with each selector section followed by a list of selector IDs.

Export segments

External symbols are contained herein, the number of which is described by the first (16 bit LE) value in the segment. All the values that follow point to addresses that the program counter will jump to when a calle operation is invoked. An exception is script 0, entry 0, which points to the first object whose 'play' method should be invoked during startup (a magical entry point like C's 'main()' function).
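
As a sketch of how such a segment could be read (in Python, with the hypothetical helper name read_exports and made-up sample values), the count word is followed by that many script-relative 16 bit pointers:

```python
import struct


def read_exports(payload):
    """Return the list of script-relative export pointers.

    payload is the data of a type 7 block, without its 4-byte block header.
    """
    (count,) = struct.unpack_from("<H", payload, 0)
    return list(struct.unpack_from("<%dH" % count, payload, 2))


# Hypothetical export block with two entries.
exports = read_exports(struct.pack("<HHH", 2, 0x0040, 0x0123))
```

A dispatch index used by calle or callb would then simply be an index into the returned list.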

Relocation tables

This section contains script-relative pointers pointing to pointers inside the script. These refer to script-relative addresses and need to be relocated when the script is loaded to the heap; this is done by adding the offset of the first byte of the script on the heap to each of the values referenced in this section ^{[Note 2]}.

The section itself starts with a 16 bit LE value containing the number of pointers that follow, with each of the script-relative 16 bit pointers beyond having semantics as described above.
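
The relocation step can be sketched as follows. This is an illustration under stated assumptions rather than the interpreter's code: the heap is modelled as a Python bytearray, and the names relocate, script_base and table_ofs are invented for the example.

```python
import struct


def relocate(heap, script_base, table_ofs):
    """Add script_base to every pointer listed in a relocation table.

    heap        -- bytearray holding the loaded script
    script_base -- heap address of the first byte of the script
    table_ofs   -- script-relative offset of the table's count word
    """
    table = script_base + table_ofs
    (count,) = struct.unpack_from("<H", heap, table)
    for i in range(count):
        (ptr_ofs,) = struct.unpack_from("<H", heap, table + 2 + i * 2)
        where = script_base + ptr_ofs
        (value,) = struct.unpack_from("<H", heap, where)
        struct.pack_into("<H", heap, where, (value + script_base) & 0xFFFF)


# Hypothetical script loaded at heap address 0x10: one pointer with the
# script-relative value 6 at script offset 0, then a table with one entry.
heap = bytearray(64)
struct.pack_into("<HHH", heap, 0x10, 0x0006, 1, 0x0000)
relocate(heap, 0x10, 2)
relocated = struct.unpack_from("<H", heap, 0x10)[0]
```

After the call, the pointer holds the absolute heap address 0x16 instead of the script-relative value 6.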

The Preload Text flag

This is an actual script section, although it is always of size 4 (i.e. only consists of the script header). It is only checked for presence; if script.x is loaded and contains this section, the text.x resource is also loaded implicitly along with it ^{[Note 3]}.

Local variable segments

This section contains the script's local variable segment, which consists of a sequence of 16 bit little-endian values.

Selectors

Selectors are very important in SCI. They can be either methods or object/class-relative variables, and this makes the interpretation of SCI operations like send a bit tricky.

Each class comes with two two-dimensional tables. The first table contains selector values and selector indices^{[Note 4]} for each variable selector. The second table contains selector indices and script-relative method offsets. Objects look nearly identical, but they do not contain the list of selector indices for variable selectors, since those can be looked up at the class they were instantiated from (their "species", which happens to be one of the variable selectors).

Now, whenever a selector is sent for, the engine has to determine the right action to take. FreeSCI first determines whether the selector is a variable selector, by looking for it in the list of variable selector indices of the species class of the object that the "send" was sent to (classes use their own class number as their species class)^{[Note 5]}. If it is, the selector value is either read (if no parameter was provided to the send call) or set (if one parameter was provided). If the selector was not part of the variable selectors of the specified object, the object's methods are checked for this selector index. If they don't contain the selector index, either, then FreeSCI recurses into checking the method selectors of the object's superclasses. If it finds the selector value there, it calls the heap address corresponding to the selector index.
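
The lookup order described above can be modelled in a few lines. The sketch below is a toy model in Python, not FreeSCI's implementation: objects are plain Python instances rather than heap structures, methods are Python callables rather than heap addresses, and the class name SciObject, the function name send and the selector numbers are all assumptions made for the example.

```python
class SciObject:
    """Toy model of an object or class; not the on-heap layout."""

    def __init__(self, values, methods=None, var_ids=None,
                 species=None, superclass=None):
        self.values = list(values)            # variable selector values
        self.methods = dict(methods or {})    # selector index -> callable
        self.var_ids = list(var_ids or [])    # only classes carry this list
        # Classes use themselves as their species.
        self.species = species if species is not None else self
        self.superclass = superclass


def send(obj, selector, *args):
    """Read or write a variable selector, or invoke a method selector."""
    var_ids = obj.species.var_ids
    if selector in var_ids:
        slot = var_ids.index(selector)
        if args:
            obj.values[slot] = args[0]        # one parameter: set
            return None
        return obj.values[slot]               # no parameter: read
    holder = obj
    while holder is not None:                 # object first, then superclasses
        if selector in holder.methods:
            return holder.methods[selector](obj, *args)
        holder = holder.superclass
    raise LookupError("selector %#x not found" % selector)


# Hypothetical selector indices, chosen only for this example.
SEL_X, SEL_DOIT = 4, 60
base = SciObject(values=[0], var_ids=[SEL_X],
                 methods={SEL_DOIT: lambda self: "doit on base"})
actor = SciObject(values=[0], var_ids=[SEL_X], superclass=base)
ego = SciObject(values=[7], species=actor, superclass=actor)
```

Here send(ego, SEL_X) reads the variable through the species' index list, while send(ego, SEL_DOIT) falls through ego and actor before finding the method on base.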

Function invocation

SCI provides three distinct ways for invoking a function^{[Note 6]}:

Calling exported functions (calle, callb)
Calling selector methods (send, self, super)
Calling PC-relative addresses (call)

Exported functions are called by providing a script number and an exported function number (which is then looked up in the script's Type 7 block). They use the object they were called from to look up local variables and selectors for self and super.

Selector methods are called by providing an object and a selector index. The selector index gets looked up in the object's selector tables, and, if it is used for a method, this method gets invoked. The provided object is used for local references.

PC-relative calls only make sense inside scripts, since they jump to a position relative to the call opcode. The calling object is used for local references.

Variable types

SCI bytecode can address four types of variables (not counting the variable selectors). Those variable types are:

Local variables

These are the variables stored in Type 10 script blocks. They are shared between the objects and classes of each script.

Global variables

These variables are the local variables of script 0.

Temporary variables

Those variables are stored on the stack. They are relative to the stack frame of the current method, so space for them must be allocated before they can be used. This is commonly done by using the link operation.

Parameters

Parameters are stored on the stack below the current stack frame, as they technically belong to the calling function. They can be modified, if necessary.^{[Note 7]}

Interpreter initialization and the main execution loop

By Lars Skovlund
Version 1.0, 7. July 1999

When the interpreter initializes, it sets up a timer for 60 hertz (one that "ticks" 60 times per second). This timer does two things: it lets the so-called servers execute (most notably, the sound player and input manager) and it "feeds" the internal game clock. This 60 Hz "systick" is used all over the place. For example, it is accessible using the KGetTime kernel function. Some graphic effects depend on it, for example the "shake screen" effect. In SCI1, it is also used for timing in the palette fades. And naturally, it is used in the KWait kernel call.

Basically, the initialization proceeds as follows:

Initialize the heap and hunk.
Parse the config file and the command line.
Load the drivers specified in the config file.
Initialize the graphics subsystem.
Initialize the event manager.
Initialize the window manager.
Initialize the text parser (i.e. load the vocabulary files).
Initialize the music player.
Save the machine state for restarting the game later on^{[Note 8]}.
Allocate the PMachine stack on the heap.
Get a pointer to the game object.
And run, by executing the play or replay method.

The right game object is found by looking in the "dispatch table" of script 0. The dispatch table has block type 7, and is an array of words. The first entry is a pointer (script relative) to the game object, for instance SQ3. If the game was restarted, the interpreter executes the replay method, play otherwise.

After looking up the address of the method in the object block, execution is started. It can be viewed as a huge switch statement, which executes continuously. When the last ret statement (in the play or replay method) is met, the interpreter terminates.

The ExecuteCode function, which contains the mentioned switch statement, is called recursively. It lets other subroutines handle the object complexity; all the ExecuteCode function has is a pointer to the next instruction. Thus, it is easy to terminate the interpreter; just return from all running instances of ExecuteCode.

So, how does an SCI program execute? Well, the play method is defined in the Game class, and it is never overridden. It consists of a huge loop which calls Game::doit continuously, followed by a pause according to the selected animation speed. That is, the script, not the interpreter, handles animation speed. Notice how the debugger very often shows the statement sag $12 upon entering the debugger? This instruction resides in Game::play, and the break occurs here because of a KWait kernel call which is executed right before that instruction. This wait takes the most execution time, so the debug break is most likely to occur there.

A game programmer would then override Game::doit and place the game-specific main loop here (still, Game::doit is almost identical from game to game). Execution of the Game::play main loop stops when an event causes global variable 4 to be non-zero. The last ret instruction is met, and the interpreter terminates.

The SCI Heap

SCI0 (and probably SCI1 as well) uses a heap consisting of 0xffff bytes of memory; this size corresponds to the size of one i386 real-mode memory segment minus one.^{[Note 9]}

Heap structure

The original heap starts with 200 separate entries with a size of four bytes. Each of those entries appears to be a pointer to "hunk" memory, which is separate from the heap and not covered here. The actual heap base pointer points to the first byte that is not part of these pointers.

Memory handles

A memory handle consists of two consecutive unsigned 16 bit integers:

The memory block size
The heap address of the next memory handle

in this sequence.

Memory handles are stored inside of the heap; they delimit the holes in the heap by indexing each other, with the exception of the last handle, which points to zero.

Initialization

The list is initialized to 0. Memory handle #0 is set to contain 0xffff minus the size required by the memory handles (800 bytes), for a total of 0xfcdf; the pointer to the next free index is set to 0x0.

Memory allocation

The memory allocation function takes one parameter; this is the requested allocation block size. If it is 0, the function aborts. Otherwise, the size is increased by 2 (and then again by 1, if it is odd, for alignment purposes).

After the memory allocation algorithm finds a sufficiently large memory hole, it allocates its memory by splitting the memory hole and allocating the lower part (or by swallowing the upper part if its size would be less than four). It adjusts the previous memory handle (which used to point to the start of the now allocated part of the heap) to point to the next hole, and then goes on to write its size to the first two bytes of its newly allocated home.

If no sufficiently large memory hole can be found, the function returns 0; otherwise, it returns a heap pointer to the start of the allocated block (i.e. to the two bytes that carry the block's size).

Memory deallocation does this process in reverse; it also merges adjacent memory holes to prevent memory fragmentation.
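
To make the allocation steps concrete, here is a minimal first-fit sketch in Python. It follows the description above but is not taken from the original interpreter: the class name Heap, its method names and the separate first_free field are assumptions, and deallocation (with hole merging) is left out.

```python
import struct

HEAP_SIZE = 0xFFFF
HEAP_BASE = 800      # 200 four-byte entries precede the usable heap


class Heap:
    def __init__(self):
        self.mem = bytearray(HEAP_SIZE)
        self.first_free = HEAP_BASE
        # Handle #0: one hole of 0xfcdf bytes, no successor.
        self._write_handle(HEAP_BASE, HEAP_SIZE - HEAP_BASE, 0)

    def _read_handle(self, addr):
        return struct.unpack_from("<HH", self.mem, addr)

    def _write_handle(self, addr, size, nxt):
        struct.pack_into("<HH", self.mem, addr, size, nxt)

    def alloc(self, size):
        if size == 0:
            return 0
        size += 2                    # room for the block's size word
        size += size & 1             # keep blocks 16 bit aligned
        prev, hole = 0, self.first_free
        while hole:
            hole_size, nxt = self._read_handle(hole)
            if hole_size >= size:
                rest = hole_size - size
                if rest < 4:         # too small for a handle: swallow it
                    size, successor = hole_size, nxt
                else:                # split, keep the upper part as a hole
                    successor = hole + size
                    self._write_handle(successor, rest, nxt)
                if prev:
                    prev_size, _ = self._read_handle(prev)
                    self._write_handle(prev, prev_size, successor)
                else:
                    self.first_free = successor
                struct.pack_into("<H", self.mem, hole, size)
                return hole          # points at the size word
            prev, hole = hole, nxt
        return 0                     # no sufficiently large hole


heap = Heap()
a = heap.alloc(10)                   # 10 + 2 = 12 bytes
b = heap.alloc(5)                    # 5 + 2 = 7, rounded up to 8 bytes
```

The first block lands at the heap base (800) and the second directly behind it (812); the remaining hole starts at 820.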

The Sierra PMachine

Lars Skovlund, Dark Minister and Christoph Reichenbach
Version 1.0, 6. July 1999

This document describes the design of the Sierra PMachine (the virtual CPU used for executing SCI programs). It is a special CPU, in the sense that it is designed for object oriented programs.

There are three kinds of memory in SCI: Variables, objects, and stack space. The stack space is used in a Last-In-First-Out manner, and is primarily used for temporary space in a routine, as well as passing data from one routine to another. Note that the stack space is used bottom-up by the original interpreter, instead of the more usual top-down. I don't know if this has any significance for us.

Scripts are loaded into the PMachine by creating a memory image of it on the heap. For this reason, the script file format may seem a bit obscure at times. It is optimized for in-memory performance, not readability. It should be mentioned here that a lot of fix-up stuff is done by the interpreter. In the script files, all addresses are specified as script-relative. These are converted to absolute offsets. The species and superClass fields of all objects are converted into pointers to the actual class etc.

There are four types of variables. These are called global, local, temporary, and parameter. All four types are simple arrays of 16-bit words. A pointer is kept for each type, pointing to the list that is currently active. In fact, only the global variable list is constant in memory. The other pointers are changed frequently, as scripts are loaded/unloaded, routines called, etc. The variables are always referenced as an index into the variable list. I'll explain the four types below - the names in parentheses will be used occasionally in the rest of the text:

Local variables (LocalVar)

This variable type is called "local" because it belongs to a specific script. Each script may have its own set of local variables, defined by script block type 10. As long as the code from a specific script is running, the local variables for that script are "active" (pointed to by the mentioned pointer).

Global variables

These, like the local variables, reside in script space (in fact, they are the local variables of script 0!). But the pointer to them remains constant for the whole duration of the program.

Temporary variables

These are allocated by specific subroutines in a script. They reside on the PMachine stack and are allocated by the link opcode. The temp variables are automatically discarded when the subroutine returns.

Parameter variables

These variables also reside on the stack. They contain information passed from one routine to another. Any routine in SCI is capable of taking a variable number of parameters, if need be. This is possible because a list size is pushed as the first thing before calling a routine. In addition to this, a frame size is passed to the call* functions.

Objects

While two adjacent variables may be entirely unrelated, the contents of an object are always related to one task. The object, like the variable tables, provides storage space. This storage space is called properties. Depending on the instructions used, a property can be referred to by index into the object structure, or by property IDs (PIDs). For instance, the name property has the PID 17h, but the offset 6. The property IDs are assigned by the SCI compiler, and it is the "compatible" way of accessing object data. Whereas the offset method is used only internally by an object to access its own data, the PID method is used externally by objects to read/write the data fields of other objects. The PID method is also used to call methods in an object, either by the object itself, by another object, or by the SCI interpreter. Yes, this really happens sometimes.

The PMachine "registers"

The PMachine can be said to have a number of registers, although none of them can be accessed explicitly by script code. They are used/changed implicitly by the script opcodes:

Acc - the accumulator. Used for result storage and input for a number of opcodes.
IP - the instruction pointer.^{[Note 10]} Points to the currently executing instruction.
Vars - an array of 4 values, pointing to the current variables of each mentioned type
Object - points to the currently executing object.
SP - the current stack pointer. Note that the stack in the original SCI interpreter is used bottom-up instead of the more usual top-down.

The PMachine, apart from the actual instruction pointer, keeps a record of which object is currently executing.

The instruction set

The PMachine CPU potentially has 128 instructions (however, a couple of these are invalid and generate an error). Some of these instructions have a flag which specifies whether the opcode has byte- or word-sized operands (I will refer to this as variably-sized parameters, as opposed to constant parameters). Other instructions have only one calling form. These instructions simply disregard the operand size flag. Ideally, however, all script instructions should be prepared to take variably-sized operands. Yet another group of instructions take both a constant parameter and a variably-sized parameter. The format of an opcode byte is as follows:

bit 7-1	opcode number
bit 0	operand size flag
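
Splitting an opcode byte into these two fields is a matter of one shift and one mask. The helper below is a small illustrative sketch in Python (the name decode_opcode_byte is made up); judging by the opcode listing in this chapter, where 0x2e is "bt W" and 0x2f is "bt B", a set flag selects byte-sized operands.

```python
def decode_opcode_byte(byte):
    """Return (opcode number, operand size flag) for one opcode byte."""
    return (byte >> 1) & 0x7F, byte & 1


number, flag = decode_opcode_byte(0x2F)   # "bt B relpos"
```

Both 0x2e and 0x2f decode to opcode number 0x17, which matches the statement that the first 23 instructions precede bt.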

Relative addresses

Certain instructions (in particular, branching ones) take relative addresses as a parameter. The actual address is calculated based on the instruction after the branching instruction itself. In this example, the bnt instruction, if the branch is made, jumps over the ldi instruction.

        eq?
        bnt +2
        ldi byte 2
        push

Note: Relative addresses are signed values.
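
As a worked sketch of this calculation (in Python; the function names and the addresses are invented for the example), the target is the address of the following instruction plus the sign-extended operand:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(op_addr, op_len, relpos, bits):
    """Address reached by a taken branch located at op_addr."""
    next_op = op_addr + op_len
    return (next_op + sign_extend(relpos, bits)) & 0xFFFF


# "bnt +2" in its two-byte form at a hypothetical address 0x100:
# the next instruction is at 0x102, so the branch lands on 0x104,
# skipping the two-byte "ldi byte 2".
target = branch_target(0x100, 2, 2, 8)
```

A byte operand of 0xfe is read as -2, so the same branch would then jump back to its own address.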

Dispatch addresses

The callb and calle instructions take a so-called dispatch index as a parameter. This index is used to look up an actual script address, using the so-called dispatch table. The dispatch table is located in script block type 7 in the script file. It is a series of words - the first one, as in so many other places in the script file, is the number of entries.

Frame sizes

In every call instruction, a value is included which determines the size of the parameter list, as an offset into the stack. This value discounts the list size pushed by the SCI code. For instance, consider this example from real SCI code:

     pushi 3 ; three parameters passed
     pushi 4 ; the screen flag
     pTos x ; push the x property
     pTos y ; push the y property
     callk OnControl, 6

Notice that, although the callk line specifies 6 bytes of parameters, the kernel routine has access to the list size (which is at offset 8)!
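
The arithmetic can be illustrated with a short Python sketch. The stack is modelled as a list of 16 bit words, the helper name kernel_frame is hypothetical, and 120 and 80 are made-up stand-ins for the x and y property values.

```python
def kernel_frame(stack_words, framesize):
    """Split the top of the stack into (list size, parameters).

    framesize is the byte count given to the call instruction; it does
    not include the list size word pushed first.
    """
    count = framesize // 2 + 1      # parameters plus the list size word
    frame = stack_words[-count:]
    return frame[0], frame[1:]


# Stack contents after the four pushes in the example above.
stack = [3, 4, 120, 80]
argc, argv = kernel_frame(stack, 6)
```

The frame therefore spans framesize + 2 = 8 bytes, with the list size as its first word.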

PErrors

These are internal errors in the interpreter. They are usually caused by buggy script code. The PErrors end up displaying an "Oops!" box in the original interpreter (it is interesting to see how Sierra likes to believe that PErrors are caused by the user - judging by the message "You did something we weren't expecting"!). In the original interpreter, specifying -d on the command line causes it to give more detailed information about PErrors, as well as activating the internal debugger if one occurs.

Class numbers and addresses

The key to finding a specific class lies in the class table. This class table resides in VOCAB.996, and contains the numbers of scripts that carry classes. If a script has more than one class definition, the script number is repeated as necessary. Notice how each script number is followed by a zero word? When the interpreter loads a script, it checks to see if the script has classes. If it does, a pointer to the object structure is put in this empty space.

The instructions

The instructions are described below. I have used Dark Minister's text on the subject as a starting point, but many things have changed; stuff explained more thoroughly, errors corrected, etc. The first 23 instructions (up to, but not including, bt) take no parameters.

These functions are used in the pseudocode explanations:

pop(): sp -= 2; return *sp;
push(x): *sp = x; sp += 2; return x;
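
For readers who prefer something executable, the two helpers and a few of the opcodes below can be modelled as follows. This is a minimal sketch in Python under the assumption of an upward-growing stack of 16 bit words; the class name MiniVM and its method names are invented here and do not appear in the original interpreter.

```python
import struct


class MiniVM:
    def __init__(self, stack_bytes=32):
        self.stack = bytearray(stack_bytes)
        self.sp = 0          # byte offset of the next free stack word
        self.acc = 0
        self.prev = 0

    def push(self, x):
        struct.pack_into("<H", self.stack, self.sp, x & 0xFFFF)
        self.sp += 2
        return x

    def pop(self):
        self.sp -= 2
        return struct.unpack_from("<H", self.stack, self.sp)[0]

    def op_sub(self):        # acc = pop() - acc
        self.acc = (self.pop() - self.acc) & 0xFFFF

    def op_eq(self):         # prev = acc; acc = (acc == pop())
        self.prev = self.acc
        self.acc = int(self.acc == self.pop())


vm = MiniVM()
vm.push(10)
vm.acc = 3
vm.op_sub()                  # 10 - 3
after_sub = vm.acc
vm.push(7)
vm.op_eq()                   # 7 == 7
```

Note that sub takes the stack value as the left operand and the accumulator as the right one.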

The following rules apply to opcodes:

Parameters are signed, unless stated otherwise. Sign extension is performed.
Jumps are relative to the position of the next operation.
*TOS refers to the TOS (Top Of Stack) element.
"tmp" refers to a temporary register that is used for explanation purposes only.

op 0x00: bnot (1 byte)

op 0x01: bnot (1 byte)

Binary not:

acc ^= 0xffff;

op 0x02: add (1 byte)

op 0x03: add (1 byte)

Addition:

acc += pop();

op 0x04: sub (1 byte)

op 0x05: sub (1 byte)

Subtraction:

acc = pop() - acc;

op 0x06: mul (1 byte)

op 0x07: mul (1 byte)

Multiplication:

acc *= pop();

op 0x08: div (1 byte)

op 0x09: div (1 byte)

Division:

acc = pop() / acc;

Division by zero is caught => acc = 0.

op 0x0a: mod (1 byte)

op 0x0b: mod (1 byte)

Modulo:

acc = pop() % acc;

Modulo by zero is caught => acc = 0.

op 0x0c: shr (1 byte)

op 0x0d: shr (1 byte)

Shift Right logical:

acc = pop() >> acc;

op 0x0e: shl (1 byte)

op 0x0f: shl (1 byte)

Shift Left logical:

acc = pop() << acc;

op 0x10: xor (1 byte)

op 0x11: xor (1 byte)

Exclusive or:

acc ^= pop();

op 0x12: and (1 byte)

op 0x13: and (1 byte)

Logical and:

acc &= pop();

op 0x14: or (1 byte)

op 0x15: or (1 byte)

Logical or:

acc |= pop();

op 0x16: neg (1 byte)

op 0x17: neg (1 byte)

Sign negation:

acc = -acc;

op 0x18: not (1 byte)

op 0x19: not (1 byte)

Boolean not:

acc = !acc;

op 0x1a: eq? (1 byte)

op 0x1b: eq? (1 byte)

Equals?:

prev = acc;

acc = (acc == pop());

op 0x1c: ne? (1 byte)

op 0x1d: ne? (1 byte)

Is not equal to?

prev = acc;

acc = !(acc == pop());

op 0x1e: gt? (1 byte)

op 0x1f: gt? (1 byte)

Greater than?

prev = acc;

acc = (pop() > acc);

op 0x20: ge? (1 byte)

op 0x21: ge? (1 byte)

Greater than or equal to?

prev = acc;

acc = (pop() >= acc);

op 0x22: lt? (1 byte)

op 0x23: lt? (1 byte)

Less than?

prev = acc;

acc = (pop() < acc);

op 0x24: le? (1 byte)

op 0x25: le? (1 byte)

Less than or equal to?

prev = acc;

acc = (pop() <= acc);

op 0x26: ugt? (1 byte)

op 0x27: ugt? (1 byte)

Unsigned: Greater than?

acc = (pop() > acc);

op 0x28: uge? (1 byte)

op 0x29: uge? (1 byte)

Unsigned: Greater than or equal to?

acc = (pop() >= acc);

op 0x2a: ult? (1 byte)

op 0x2b: ult? (1 byte)

Unsigned: Less than?

acc = (pop() < acc);

op 0x2c: ule? (1 byte)

op 0x2d: ule? (1 byte)

Unsigned: Less than or equal to?

acc = (pop() <= acc);

op 0x2e: bt W relpos (3 bytes)

op 0x2f: bt B relpos (2 bytes)

Branch relative if true

if (acc) pc += relpos;

op 0x30: bnt W relpos (3 bytes)

op 0x31: bnt B relpos (2 bytes)

Branch relative if not true

if (!acc) pc += relpos;

op 0x32: jmp W relpos (3 bytes)

op 0x33: jmp B relpos (2 bytes)

Jump

pc += relpos;

op 0x34: ldi W data (3 bytes)

op 0x35: ldi B data (2 bytes)

Load data immediate

acc = data;

Sign extension is done for 0x35 if required.

op 0x36: push (1 byte)

op 0x37: push (1 byte)

Push to stack

push(acc)

op 0x38: pushi W data (3 bytes)

op 0x39: pushi B data (2 bytes)

Push immediate

push(data)

Sign extension for 0x39 is performed where required.

op 0x3a: toss (1 byte)

op 0x3b: toss (1 byte)

TOS subtract

pop();

For confirmation: Yes, this simply tosses the TOS value away.

op 0x3c: dup (1 byte)

op 0x3d: dup (1 byte)

Duplicate TOS element

push(*TOS);

op 0x3e: link W size (3 bytes)

op 0x3f: link B size (2 bytes)

sp += (size * 2);

op 0x40: call W relpos, B framesize (4 bytes)

op 0x41: call B relpos, B framesize (3 bytes)

Call inside script.

(See description below)

sp -= (framesize + 2 + &rest_modifier);

&rest_modifier = 0;

This calls a script subroutine at the relative position relpos, setting up the ParmVar pointer first. ParmVar points to sp-framesize (but see also the &rest operation). The number of parameters is stored at word offset -1 relative to ParmVar.

op 0x42: callk W kfunct, B kparams (4 bytes)

op 0x43: callk B kfunct, B kparams (3 bytes)

Call kernel function (see the Section called Kernel functions)

sp -= (kparams + 2 + &rest_modifier);

&rest_modifier = 0;

(call kernel function kfunct)

op 0x44: callb W dispindex, B framesize (4 bytes)

op 0x45: callb B dispindex, B framesize (3 bytes)

Call base script

(See description below)

sp -= (framesize + 2 + &rest_modifier);


&rest_modifier = 0;

This operation starts a new execution loop at the beginning of script 0, public method dispindex (Each script comes with a dispatcher list (type 7) that identifies public methods). Parameters are handled as in the call operation.

op 0x46: calle W script, W dispindex, B framesize (5 bytes)

op 0x47: calle B script, B dispindex, B framesize (4 bytes)

Call external script

(See description below)
sp -= (framesize + 2 + &rest_modifier);
&rest_modifier = 0;

This operation starts a new execution loop at the beginning of public method dispindex of the script whose number is given by script. The dispatcher list (the script's type 7 object) is used to dereference the requested method. Parameters are handled as described for the call operation.

op 0x48: ret (1 byte)

op 0x49: ret (1 byte)

Return: returns from an execution loop started by call, calle, callb, send, self or super.

op 0x4a: send B framesize (2 bytes)

op 0x4b: send B framesize (2 bytes)

Send for one or more selectors. This is the most complex SCI operation (together with self and class).

Send looks up the supplied selector(s) in the object pointed to by the accumulator. If the selector is a variable selector, it is read (to the accumulator) if it was sent for with zero parameters. If a parameter was supplied, this selector is set to that parameter. Method selectors are called with the specified parameters.

The selector(s) and parameters are retrieved from the stack frame. Send first looks up the selector ID at the bottom of the frame, then retrieves the number of parameters, and, eventually, the parameters themselves. This algorithm is iterated until all of the stack frame has been "used up". Example:

     ; This is an example for usage of the SCI send operation
     pushi x      ; push the selector ID of x
     push1        ; 1 parameter: x is supposed to be set
     pushi 42     ; That's the value x will get set to
     pushi moveTo ; In this example, moveTo is a method selector.
     push2        ; It will get called with two parameters-
     push         ; The accumulator...
     lofss 17     ; ...and PC-relative address 17.
     pushi foo    ; Let's assume that foo is another variable selector.
     push0        ; This will read foo and return the value in acc.
     send 12      ; This operation does three quite different things.
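
The iteration over the stack frame can be sketched like this. The Python below only splits a frame into its individual sends; it does not perform them. The function name split_send_frame and all selector numbers and values are assumptions made for the example.

```python
def split_send_frame(words):
    """Split a send frame into (selector, [parameters]) tuples."""
    calls = []
    i = 0
    while i + 1 < len(words):
        selector, argc = words[i], words[i + 1]
        calls.append((selector, words[i + 2:i + 2 + argc]))
        i += 2 + argc
    return calls


# Hypothetical selector IDs standing in for x, moveTo and foo, plus made-up
# values for the accumulator and the address pushed by lofss.
SEL_X, SEL_MOVETO, SEL_FOO = 4, 61, 90
frame = [SEL_X, 1, 42,
         SEL_MOVETO, 2, 0x1111, 0x2222,
         SEL_FOO, 0]
calls = split_send_frame(frame)
```

For the example above this yields three entries: a write to x, a call to moveTo with two parameters, and a read of foo.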

op 0x4c

op 0x4d

op 0x4e

op 0x4f

These opcodes don't exist in SCI.

op 0x50: class W function (3 bytes)

op 0x51: class B function (2 bytes)

Get class address. Sets the accumulator to the memory address of the specified function of the current object.

op 0x52

op 0x53

These opcodes don't exist in SCI.

op 0x54: self B stackframe (2 bytes)

op 0x55: self B stackframe (2 bytes)

Send to self. This operation is the same as the send operation, except that it sends to the current object instead of the object pointed to by the accumulator.

op 0x56: super W class, B stackframe (4 bytes)

op 0x57: super B class, B stackframe (3 bytes)

Send to any class. This operation is the same as the send operation, except that it sends to an arbitrary class.

op 0x58: &rest W paramindex (3 bytes)

op 0x59: &rest B paramindex (2 bytes)

Pushes all or part of the ParmVar list on the stack. The number specifies the first parameter variable to be pushed. I'll give a small example. Suppose we have two functions:

function a(y,z) and function b(x,y,z)

function b wants to call function a with its own y and z parameters. Easy job, using the normal lsp instruction. Now suppose that both function a and b are designed to take a variable number of parameters:

function a(y,z,...) and function b(x,y,z,...)

Since lsp does not support register indirection, we can't just push the variables in a loop (as we would in C). Instead this function is used. In this case, the instruction would be &rest 2, since we want the copying to start from y (inclusive), the second parameter.

Note that the values are copied to the stack immediately. The &rest_modifier is set to the number of variables pushed afterwards.

op 0x5a: lea W type, W index ( bytes)op 0x5b: lea B type, B index ( bytes)Load Effective AddressThe variable type is a bit-field used as follows:bit 0

unused
bit 1-2

the number of the variable list to use

0 - globalVar
2 - localVar
4 - tempVar
6 - parmVar

bit 3

unused
bit 4

set if the accumulator is to be used as additional indexBecause it is so hard to explain, I have made a transcription of it here:

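  short *vars[4];   /* base pointers: globalVar, localVar, tempVar, parmVar */

  int acc;          /* accumulator */

  /* vt: variable type bit-field, vi: variable index */
  short *lea(int vt, int vi)
  {
    return &((vars[(vt >> 1) & 3])[vt & 0x10 ? vi+acc : vi]);
  }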

op 0x5c: selfID (1 bytes)
op 0x5d: selfID (1 bytes)

Get 'self' identity: SCI uses heap pointers to identify objects, so this operation sets the accumulator to the address of the current object.

acc = object

op 0x5e
op 0x5f

These opcodes don't exist in SCI.

op 0x60: pprev (1 bytes)
op 0x61: pprev (1 bytes)

Push prev: Pushes the value of the prev register, set by the last comparison bytecode (eq?, lt?, etc.), on the stack.

push(prev)

op 0x62: pToa W offset (3 bytes)
op 0x63: pToa B offset (2 bytes)

Property To Accumulator: Copies the value of the specified property (in the current object) to the accumulator. The property is specified as an offset into the object structure.

op 0x64: aTop W offset (3 bytes)op 0x65: aTop B offset (2 bytes)

Accumulator To Property: Copies the value of the accumulator into the specified property (in the current object). The property number is specified as an offset into the object structure.

op 0x66: pTos W offset (3 bytes)
op 0x67: pTos B offset (2 bytes)

Property To Stack: Same as pToa, but pushes the property value on the stack instead.

op 0x68: sTop W offset (3 bytes)
op 0x69: sTop B offset (2 bytes)

Stack To Property: Same as aTop, but gets the new property value from the stack instead.

op 0x6a: ipToa W offset (3 bytes)
op 0x6b: ipToa B offset (2 bytes)

Increment Property and copy To Accumulator: Increments the value of the specified property of the current object and copies it into the accumulator. The property number is specified as an offset into the object structure.

op 0x6c: dpToa W offset (3 bytes)
op 0x6d: dpToa B offset (2 bytes)

Decrement Property and copy to Accumulator: Decrements the value of the specified property of the current object and copies it into the accumulator. The property number is specified as an offset into the object structure.
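The property operations described so far can be sketched in C as shown here. This is a rough illustration only: the PMachine struct and all function names are hypothetical, and the sketch assumes that the offset is a byte offset into the object structure and that each property is a 16 bit little-endian value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file; the names are illustrative only. */
typedef struct {
    int16_t acc;      /* accumulator */
    uint8_t *object;  /* heap address of the current object */
} PMachine;

/* Assumption: offset is a byte offset to a 16 bit little-endian property. */
int16_t prop_read(const PMachine *vm, unsigned offset)
{
    assert(vm->object != 0);
    uint16_t raw = (uint16_t)(vm->object[offset] | (vm->object[offset + 1] << 8));
    return (int16_t)raw;
}

void prop_write(PMachine *vm, unsigned offset, int16_t value)
{
    uint16_t raw = (uint16_t)value;
    vm->object[offset] = (uint8_t)(raw & 0xff);
    vm->object[offset + 1] = (uint8_t)(raw >> 8);
}

/* pToa: property to accumulator */
void op_pToa(PMachine *vm, unsigned offset)
{
    vm->acc = prop_read(vm, offset);
}

/* aTop: accumulator to property */
void op_aTop(PMachine *vm, unsigned offset)
{
    prop_write(vm, offset, vm->acc);
}

/* ipToa: increment the property, then copy it to the accumulator */
void op_ipToa(PMachine *vm, unsigned offset)
{
    prop_write(vm, offset, (int16_t)(prop_read(vm, offset) + 1));
    vm->acc = prop_read(vm, offset);
}

/* dpToa: decrement the property, then copy it to the accumulator */
void op_dpToa(PMachine *vm, unsigned offset)
{
    prop_write(vm, offset, (int16_t)(prop_read(vm, offset) - 1));
    vm->acc = prop_read(vm, offset);
}
```

The stack variants would differ only in where the value is read from or written to, so they are left out of the sketch.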

op 0x6e: ipTos W offset (3 bytes)
op 0x6f: ipTos B offset (2 bytes)

Increment Property and push to Stack: Same as ipToa, but pushes the result on the stack instead.

op 0x70: dpTos W offset (3 bytes)
op 0x71: dpTos B offset (2 bytes)

Decrement Property and push to stack: Same as dpToa, but pushes the result on the stack instead.

op 0x72: lofsa W offset (3 bytes)
op 0x73: lofsa B offset (2 bytes)

Load Offset to Accumulator:

acc = pc + offset

Adds a value to the post-operation pc and stores the result in the accumulator.

op 0x74: lofss W offset (3 bytes)
op 0x75: lofss B offset (2 bytes)

Load Offset to Stack:

push(pc + offset)

Adds a value to the post-operation pc and pushes the result on the stack.

op 0x76: push0 (1 bytes)
op 0x77: push0 (1 bytes)

Push 0:

push(0)

op 0x78: push1 (1 bytes)
op 0x79: push1 (1 bytes)

Push 1:

push(1)

op 0x7a: push2 (1 bytes)
op 0x7b: push2 (1 bytes)

Push 2:

push(2)

op 0x7c: pushSelf (1 bytes)
op 0x7d: pushSelf (1 bytes)

Push self:

push(object)

op 0x7e
op 0x7f

These operations don't exist in SCI.

op 0x80 - 0xfe: [ls+-][as][gltp]i? W index (3 bytes)
op 0x81 - 0xff: [ls+-][as][gltp]i? B index (2 bytes)

The remaining SCI operations work on one of the four variable types. The variable is located by taking the heap pointer for the specified variable type and adding the index and possibly the accumulator; the operation is then executed according to the following table:

Bit 0: Used as with all other opcodes with variably-sized parameters:

0: 16 bit parameter
1: 8 bit parameter

Bits 1,2: The type of variable to operate on:

0: Global
1: Local
2: Temporary
3: Parameter

Bit 3: Whether to use the accumulator or the stack for operations:

0: Accumulator
1: Stack

Bit 4: Whether to use the accumulator as a modifier to the supplied index:

0: Don't use accumulator as an additional index
1: Use the accumulator as an additional index

Bits 5,6: The type of execution to perform:

0: Load the variable to the accumulator or stack
1: Store the accumulator or stack in the variable
2: Increment the variable, then load it into acc or on the stack
3: Decrement the variable, then load it into acc or on the stack

Bit 7: Always 1 (identifier for these opcodes)

Example: "sagi 2" would Store the Accumulator in the Global variable indexed with 2 plus the current accumulator value (this rarely makes sense, obviously). "+sp 6" would increment the parameter at offset 6 (the third parameter, not counting the argument counter), and push it on the stack.
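The bit layout in the table above can be turned into a small decoder. The following C sketch is illustrative only: the VarOp struct and the function names are hypothetical, while the bit positions and the mnemonic letters follow the table and the [ls+-][as][gltp]i? pattern given above.

```c
#include <assert.h>

/* Decoded fields of a variable opcode (0x80 - 0xff); names are illustrative. */
typedef struct {
    int byte_operand;  /* bit 0: 1 = 8 bit index, 0 = 16 bit index */
    int var_type;      /* bits 1,2: 0 global, 1 local, 2 temporary, 3 parameter */
    int use_stack;     /* bit 3: 0 = accumulator, 1 = stack */
    int acc_index;     /* bit 4: 1 = add the accumulator to the index */
    int exec_type;     /* bits 5,6: 0 load, 1 store, 2 increment, 3 decrement */
} VarOp;

/* Returns 1 and fills *out if opcode is a variable opcode, 0 otherwise. */
int decode_var_op(unsigned char opcode, VarOp *out)
{
    assert(out != 0);
    if (!(opcode & 0x80))
        return 0;  /* bit 7 clear: not one of these opcodes */
    out->byte_operand = opcode & 1;
    out->var_type = (opcode >> 1) & 3;
    out->use_stack = (opcode >> 3) & 1;
    out->acc_index = (opcode >> 4) & 1;
    out->exec_type = (opcode >> 5) & 3;
    return 1;
}

/* Writes the mnemonic (for example "sagi") into buf, which must hold 5 chars. */
void var_op_mnemonic(const VarOp *op, char *buf)
{
    int n = 0;

    buf[n++] = "ls+-"[op->exec_type];
    buf[n++] = op->use_stack ? 's' : 'a';
    buf[n++] = "gltp"[op->var_type];
    if (op->acc_index)
        buf[n++] = 'i';
    buf[n] = '\0';
}

/* Total instruction length in bytes: opcode plus an 8 or 16 bit index. */
int var_op_length(const VarOp *op)
{
    return op->byte_operand ? 2 : 3;
}
```

Under this layout, opcode 0xb1 decodes to "sagi" with an 8 bit index, and opcode 0xce decodes to "+sp" with a 16 bit index.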

Notes


[1] See SQ3's inventory objects for an example.

[2] Thanks to Francois Boyer for this information.

[3] This is ignored by FreeSCI at the moment, since all resources are present in memory all the time.

[4] Those can be used as an index into vocab.997, where the selector names are stored as strings.

[5] In practice, send looks up the heap position of the requested class in a global class table.

[6] Of course, "manual" invocation (using push and jump operations) could also be used, but there are no special provisions for it, and it does not appear to be used in the existing SCI bytecode.

[7] Obviously, SCI uses a call-by-value model for primitives and call-by-reference for objects.

[8] Interestingly, the KRestartGame kernel call is implemented using a simple setjmp/longjmp pair.

[9] This appears to be the maximum size; the games generally require less heap space.

[10] FreeSCI calls this the "Program Counter" or PC, which is the more general term.

 
