CEF32 Component Writer's Guide
Last edit: 10-February-2007
This document describes how to quickly create CEF-compliant components that will work with CEF32. The emphasis here is pragmatic, and any unimportant philosophical design issues will be ignored in the interest of clarity and speed of implementation. The intended audience is programmers who are writing CEF components - specifically to work with CEF32.
We will attempt to avoid quoting extensively from the CEF specification, so you should have a basic familiarity with the specification, and a copy of it handy. Also, the CEF32 document describes technical issues with the CEF32 environment, such as font file formats, documentation on existing components, and how to write emulator (.cef) files.
Finally, CEF32 was written to provide several ready-to-use components for emulator builders, but the sources are included to provide examples of how to write compliant, working CEF components. Use the sources as templates for new components that you write.
Language of choice
Although many CEF32 components are written in Delphi (Pascal), there is no need for this other than programmer preference. However, the use of the COM standard in CEF will require a language with COM support. The main language that supports this is C++.
General considerations
It should be noted that the goal of CEF is to provide useful emulation. To that end, one should make sure that the emulation is low-level enough that the user cannot tell how low-level the implementation may be. But it should not emulate any detail lower than that, as it would cause unnecessarily poor performance. An excellent example of this abstraction of hardware is the CEF TKeyboard component. This does not work exactly like any keyboard that the author is aware of. The CEF abstraction is necessary to provide a general keyboard interface that will work with any emulation. But it is likely that emulators making use of TKeyboards (such as a TRS-80 emulator) will have to reinterpret the data from the keyboard in a way that provides a true emulation to the rest of the system.
In some cases, the use of an alternate interface, due to the software nature of the emulation, may even be preferable for the user. For instance, instead of a baud rate knob with 2 choices, allow a drop-down menu with multiple choices - one of which should always be "Auto". You can provide the knob as well, but that should be of secondary importance to the easier and more general menu control.
Components are implemented in DLL files.
Assume that multiple instances of your component will be created, and implement it accordingly. In particular, do not use global data in a DLL to hold state that belongs to a specific instance of a component. Also, for Delphi programmers, the first form created is marked by VCL as the "application main form". When this specific form is destroyed, the entire DLL image unloads. If there are multiple instances of the component supplied by the DLL, all of them will become invalid, and calls to them will almost certainly abort the whole application. To handle this situation, create an invisible form in the DLL initialization code. This becomes the "application main form" and is never touched elsewhere.
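As a minimal sketch of the per-instance rule, the following C++ fragment keeps all state inside the component object rather than in DLL-level globals. The Terminal class, its methods, and the Make_Terminal factory are hypothetical names used only for illustration; they are not part of the CEF specification.

```cpp
#include <memory>

// Avoid: a DLL-level global such as `static int g_baud;` would be shared
// by every instance of the component created from this DLL.

// Hypothetical component: all per-instance state lives in the object.
class Terminal {
public:
    void Set_Baud(int baud) { baud_ = baud; }
    int Baud() const { return baud_; }
private:
    int baud_ = 0;  // per-instance setting; 0 is used here to mean "Auto"
};

// Hypothetical factory: each call returns an independent instance.
std::unique_ptr<Terminal> Make_Terminal() {
    return std::unique_ptr<Terminal>(new Terminal);
}
```

With this layout, changing the baud rate of one terminal instance has no effect on any other instance created from the same DLL.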
CEF32 is NOT thread-safe. Do NOT call it from multiple threads unless you are sure that only one thread can ever be calling it at any given time.
Signal handling
Hardware uses many signals to synchronize activities between various hardware components. This is also true of CEF emulation. Certain very common signals are implied - such as memory read/write and I/O, which are inherent in the Read and Write methods of components. Other signals have no impact on the software environment and can be left out of the emulation. Sometimes more than one hardware component is emulated by a single CEF component (such as a CPU/FPU combination), in which case the signals between them are either not needed, or are entirely internal to the CEF component. On the other hand, some signals need to be communicated between various CEF components. A common example is Interrupt.
One way to propagate signals between components is to have the components simply call the Signal_Change_Notice method of the components that they are connected to. This is problematic in systems where the components are indirectly connected through other components. Consider the example in Figure 1.
Figure 1
In this example, if Component 3 sends a signal to Component 2, then Component 2 must propagate the signal on to Component 1 or else Component 1 will never see the signal. Obviously, Component 2 also needs to forward signals from Component 1 to Component 3.
This approach, however, has some drawbacks. First, if you are writing general-purpose components for use by anyone, all of your components need to be able to pass on all signals they receive, since you don't know how they will be connected to other components in someone else's emulator. Second, it complicates the writing of components, in that each component must have code both to propagate the signals and to handle "ringing" (that is, knowing to ignore signals received while propagating signals). Ringing can occur because when a component receives a signal from another component and propagates it to all connected components, it may pass it back to the sending component. This can result in the same signal being sent back and forth between components indefinitely. A better way to handle signals is to use the UI_Interface object, as in Figure 2.
Figure 2
In this example, Component 3 sends a signal to the UI_Interface, which then passes it on to all components that have asked to be notified of signals (by calling the UI_Interface's Want_Signals method). This approach has the advantages of making component writing simpler, eliminating ringing (since the UI_Interface won't pass the signal back to the sending component), and being more efficient, since components which don't need/want signals do not have signals passed to them. Note that the UI_Interface makes two calls to each component: Signal_Change_Notice, followed by Set_Signal. However, Set_Signal is only called if the target component supports the signal. The UI_Interface determines this by attempting to convert the signal name to a signal index for the component. If the component returns -1 for the index, then the UI_Interface doesn't call the Set_Signal method.
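The dispatch just described can be sketched in C++ as follows. This is only an illustration of the logic, not the real CEF interfaces: the method names Want_Signals, Signal_Change_Notice, and Set_Signal come from the text above, but their signatures, the Signal_Index and Signal_Change names, and the UI_Hub and Recorder classes are assumptions made for this example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CEF component; the real signatures are
// defined by the CEF specification and may differ.
struct Component {
    virtual ~Component() = default;
    // Returns this component's index for a signal name, or -1 if unsupported.
    virtual int Signal_Index(const std::string& name) = 0;
    virtual void Signal_Change_Notice(Component* source,
                                      const std::string& name, bool state) = 0;
    virtual void Set_Signal(const std::string& name, bool state) = 0;
};

// Hypothetical stand-in for the signal-routing part of the UI_Interface.
class UI_Hub {
public:
    // A component asks to be notified of signals.
    void Want_Signals(Component* c) { listeners_.push_back(c); }

    // Called by the component whose signal changed (name is an assumption).
    void Signal_Change(Component* source, const std::string& name, bool state) {
        for (Component* c : listeners_) {
            if (c == source) continue;  // never echo to the sender: no ringing
            c->Signal_Change_Notice(source, name, state);
            if (c->Signal_Index(name) != -1)  // only if the target supports it
                c->Set_Signal(name, state);
        }
    }

private:
    std::vector<Component*> listeners_;
};

// Hypothetical component that supports one signal and counts the calls made.
struct Recorder : Component {
    std::string supported;
    int notices = 0;
    int sets = 0;
    explicit Recorder(std::string s) : supported(std::move(s)) {}
    int Signal_Index(const std::string& name) override {
        return name == supported ? 0 : -1;
    }
    void Signal_Change_Notice(Component*, const std::string&, bool) override {
        ++notices;
    }
    void Set_Signal(const std::string&, bool) override { ++sets; }
};
```

In this sketch, a component that registered with Want_Signals but does not support a given signal still receives Signal_Change_Notice, while Set_Signal is skipped for it, and the sender receives neither call.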
Anatomy of an emulator
Before we write an emulator for any computer, we must identify the main hardware components of that computer. Let's take the simple example of a DEC Rainbow-100 computer from the 1980s. It came with either a color RGB monitor or a monochrome monitor, a DEC LK201 keyboard, and a system box. In the system box were two processors: a Z80 and an 8088. It had a maximum memory capacity of 1 MB (896 KB of RAM, and the rest was ROM), a serial port, and dual RX50 floppy disk drives. An optional graphics card was available. Also, an optional hard disk controller and drive were available. For simplicity of the initial emulator, we will ignore optional hardware.
We will need an LK201 keyboard component, which is included in the standard CEF32 distribution.
We will also need a monitor component. In addition, we will need a font file that contains the character set used by the Rainbow. For a first step, we can make use of the generic Panel component.
We will need up to 896 KB of RAM. For this, we can use an existing CEF32 memory component. The ROM, however, is the difficult part of creating any emulator. Even on obsolete hardware, the ROM images are usually still owned by someone and it may be difficult or impossible to obtain a legal copy of one. However, if you have a working model of the computer being emulated, it should not be too difficult to extract this information directly from that machine. In the worst-case scenario, you may need to write an equivalent ROM program yourself.
The 8088 is simply an 8086 with an 8-bit bus, so we can get away with a CPU emulator for either the 8088 or the 8086. A Z80 CPU emulator already exists for CEF32, so we can use that for the Z80.
We will need an RX50 controller component.
Finally, we will need to create a Rainbow-100 system hardware emulator component. Next to CPU components, these are often the most difficult to create because they do so much work. In the case of the Rainbow, we need the system component to provide the serial port, the MHFU circuit, all the logic necessary to switch control between the two CPUs, etc. This is where having detailed technical descriptions of the hardware is essential.
Once we have created, or found, all the required components, we need to write an emulator command file (.cef) which CEF32 can use to automatically construct our virtual computer. The CEF32 document describes these files in detail so we will not cover that here. Basically, the command file indicates which components to connect to each other, how to initialize them, and where to start execution.
Once we have a working Rainbow emulator, we may wish to add the optional graphics card, disk controller, and even an LA50 printer.
We will discuss individual components in the following sections, but now you should have an idea of what is required for emulation of a computer system.
Terminals and printers
Terminals should be implemented as TCable components as a general rule. Why? Because the only access to a terminal from any other part of a computer system is via the serial (or parallel) interface. Since that interface is a cable end, it is simplest to implement the terminal as a cable. It is a good idea to make a User_Interface interface to your terminal as well.
Printers and hardcopy terminals are similar to video terminals in all respects except the following:
Make use of the CEFUtil utility class TCharacter_Set so that the user can make use of any font he wishes for the device, unless this is clearly unworkable due to oddities in the emulated hardware.
Default settings for terminals and printers (unless specified otherwise by the emulated hardware):
Baud = 0 (Auto)
Flow control = None
Keyboards
Keyboard components are one of the simplest CEF components to implement. However, there are a few useful hints that can be provided here.
First, realize that the point of a keyboard component is to allow the user to enter special keystrokes, or to remind them of the available options. Ideally, the user will use the keyboard of the computer which is doing the emulation. This may include key-mapping function keys.
CEF uses a convention of a "NKP_" prefix for otherwise normal characters that occur on a numeric keypad. Thus "1" indicates a normal "1" key, but "NKP_1" indicates a "1" key from a numeric keypad. Likewise, "LEFT_" prior to a key name indicates the left of two instances of a key (for instance, SHIFT), and "RIGHT_" indicates the right of the two keys.
It is useful to allow the user to specify which keyboard is used for a given emulation. But to maximize this capability, follow these rules: When receiving data from a keyboard, the receiver should try to handle simple variants of key names. For instance, "SETUP", "SET-UP", and "SET UP" should all be recognized as the same "key". Of course, if the receiver doesn't handle any form of "SETUP", then this can be ignored. But consider that the same condition exists with other keys, such as "LINEFEED" and "LINE FEED". Therefore, it is best to have the receiver strip all spaces and dashes from key names and then compare against that normalized value.
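A minimal C++ sketch of such a normalization step is shown below. The function name Normalize_Key_Name is hypothetical, and folding the name to upper case is an extra assumption that the text above does not require. Underscores are left in place so that prefixes such as "NKP_" and "LEFT_" survive.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: drop spaces and dashes from a key name and fold it
// to upper case (the case folding is an assumption of this sketch).
std::string Normalize_Key_Name(const std::string& name) {
    std::string result;
    for (unsigned char ch : name) {
        if (ch == ' ' || ch == '-') continue;  // ignore simple separators
        result.push_back(static_cast<char>(std::toupper(ch)));
    }
    return result;
}
```

The receiver would then compare the normalized name against its own list of normalized key names, so that "SETUP", "SET-UP", and "SET UP" all match the same entry.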
CPUs
CPUs are, by far, the most difficult CEF component to write - as they are also the most complicated piece of any computer. To make it easier to create these components, many features (or aspects of features) are optional. However, the more optional features that are provided, the better the user experience. Some of the features that can be provided to the user include disassembly view (a view of source associated with object code), immediate mode execution, register view, watchpointing, and stack views.
There are four major, independent functions of a CPU component:
I. Execution
Proper execution of the instructions is the most essential function of a CPU component. Without this, nothing else matters. The first concern is the register set. The registers are accessed via Examine/Deposit, execution, and watchpoints. For these reasons, it is best to provide Get and Set functions for each register so that watchpoints can easily be implemented. However, one should be careful in the implementation of instruction execution so that no more than one watchpoint is triggered on a given register per instruction execution. As a simple instance, an instruction to increment a register would require reading the register, incrementing the value and writing it back out. In this case, since an increment is logically only changing the register we don't want to trigger a read watchpoint, but we may trigger a write watchpoint. So we read the register variable directly to avoid watchpoint checks in the Get function, but use the Set function to change the value. Also, by convention, we only trigger a watchpoint when the value is changed from its current value. Thus, if we update a register with its current value, no watchpoint should be triggered. Finally, Examine and Deposit should never trigger a watchpoint.
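The watchpoint conventions above can be sketched in C++ along the following lines. The Register class, its 16-bit width, and the hit counters are assumptions made for illustration; a real component would report watchpoints to the UI rather than count them.

```cpp
#include <cstdint>

// Hypothetical 16-bit register with optional read/write watchpoints.
class Register {
public:
    bool watch_read = false;
    bool watch_write = false;
    int read_hits = 0;   // stands in for reporting a read watchpoint
    int write_hits = 0;  // stands in for reporting a write watchpoint

    // Used by instructions that logically read the register.
    std::uint16_t Get() {
        if (watch_read) ++read_hits;
        return value_;
    }

    // Used by instructions that write the register. By convention, a write
    // watchpoint triggers only when the value actually changes.
    void Set(std::uint16_t v) {
        if (v == value_) return;
        value_ = v;
        if (watch_write) ++write_hits;
    }

    // Examine and Deposit never trigger watchpoints.
    std::uint16_t Examine() const { return value_; }
    void Deposit(std::uint16_t v) { value_ = v; }

    // Increment reads the variable directly (no read watchpoint) and writes
    // through Set, so at most one watchpoint triggers for the instruction.
    void Increment() { Set(static_cast<std::uint16_t>(value_ + 1)); }

private:
    std::uint16_t value_ = 0;
};
```

Routing every instruction-level write through Set keeps the "changed values only" rule in one place, while direct access to the underlying variable is reserved for cases where a read watchpoint would be misleading.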
Instructions each take a certain amount of time to execute. In order to support software which makes assumptions about the amount of time it takes to execute code (such as timing loops), as well as to support profiling, the master clock should be called after each instruction executes to block the CPU for that time (in nanosecond increments). The CPU then sets a flag and doesn't execute any other instructions until the clock unblocks it. The clock will call the CPU's Wake method to indicate that it is no longer blocked. It is especially important that the CPU's block flag be set before calling the clock's Block method since, depending on the clock's mode, Wake may be called before the Block method returns to the CPU. If the flag is set after the call to Block, it will be set after Wake has already cleared it, and the CPU will remain blocked forever.
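The ordering issue can be illustrated with a small C++ sketch. The Block and Wake names come from the text above, but their signatures, the Immediate_Clock class (which models a clock mode that wakes the CPU before Block returns), and the Finish_Instruction methods are assumptions made for this example.

```cpp
#include <cstdint>

struct CPU;

// Hypothetical clock that, in this mode, wakes the caller before Block returns.
struct Immediate_Clock {
    std::int64_t now_ns = 0;
    void Block(CPU& cpu, std::int64_t nanoseconds);
};

// Hypothetical CPU showing the correct and the incorrect ordering.
struct CPU {
    bool blocked = false;

    void Wake() { blocked = false; }

    // Correct: set the flag first, so an early Wake can clear it.
    void Finish_Instruction(Immediate_Clock& clock, std::int64_t ns) {
        blocked = true;
        clock.Block(*this, ns);
    }

    // Incorrect: Wake has already run by the time the flag is set, so
    // nothing will ever clear it again.
    void Finish_Instruction_Wrong(Immediate_Clock& clock, std::int64_t ns) {
        clock.Block(*this, ns);
        blocked = true;
    }
};

inline void Immediate_Clock::Block(CPU& cpu, std::int64_t nanoseconds) {
    now_ns += nanoseconds;  // advance emulated time
    cpu.Wake();             // this mode unblocks the CPU immediately
}
```

With the correct ordering the CPU is runnable again as soon as Block returns; with the incorrect ordering it is left with its block flag set and no pending Wake.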
Instructions are normally executed from memory; however, the CPU's Execute method can also be passed a data stream object. The stream is a source of read-only serial data that is used to execute CPU instructions under certain conditions, such as immediate mode execution (see following text).
The execution can also be specified as single-step, which means only one instruction will execute before the function returns. Note that execution from a data stream is always a single instruction execution. Some CPUs may support single-step operations inherently, which is something different than CEF single-stepping. CPUs with inherent single-step support have state information to support the feature (for debuggers running on that CPU, for example). When the single-step flag is passed to the Execute routine, the routine must not affect the CPU's single-step state.
II. Assembly
Assembly of instruction mnemonics is not strictly required. However, if this feature is not implemented, the person using the emulator will also have to have a separate cross-assembler. Fortunately, CEF helps us significantly in writing an assembler due to the support of the Master Assembler, an instance of which is provided by CEF32. Note that the assembler is also used to determine instruction length and layout for the disassembly view. Finally, if the UI supports immediate mode execution, as does CEF32, the assembler is required. Immediate mode execution allows the user to enter an instruction mnemonic, have the assembler assemble it to a data stream, and then have the CPU execute the instruction from that data stream.
In fact, all assembly is to a data stream. This is used to provide a level of indirection so that the assembled code can be directed anywhere the caller desires, whether into memory, into a file, or into a buffer for immediate mode execution.
For CPUs which provide virtual address support (where the physically addressable memory exceeds the memory available to a program running on that CPU without remapping operations), it is recommended that the size of the Program Counter (Instruction Pointer) used for assembly be as large as the maximum physical memory so that code can be assembled into higher memory addresses.
When assembling an instruction, the assembler needs to save segment information. This is information that can be queried from the UI for better disassembly display. This ability is not required, but is useful. Many instructions are made up of both op-codes and one, or more, operands. Often these start on byte or word boundaries, but this is not universal. Let's take the case of an imaginary CPU with a ZERO instruction that takes one memory address. We'll say that this instruction consists of a 1-byte op-code and a 4-byte address. For the disassembly view to properly show the op-code and address as separate items, the assembler needs to be able to indicate that the instruction has two segments: an 8-bit (op-code) and a 32-bit (address). The segment information always applies only to the most recently assembled instruction. Without this information, the disassembly view would display a single 5-byte (40-bit) value.
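The imaginary ZERO instruction can be sketched in C++ as follows. Everything here is illustrative: the Assembled structure, the op-code value 0x5A, and the little-endian address layout are assumptions, and the real mechanism for reporting segments is defined by the CEF specification.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result of assembling one instruction.
struct Assembled {
    std::vector<std::uint8_t> bytes;  // data written to the output stream
    std::vector<int> segment_bits;    // segment widths, most recent instruction only
};

// Assemble the imaginary ZERO instruction: a 1-byte op-code followed by a
// 4-byte address. The op-code value and byte order are assumptions.
Assembled Assemble_Zero(std::uint32_t address) {
    Assembled out;
    out.bytes.push_back(0x5A);  // assumed op-code for ZERO
    for (int shift = 0; shift < 32; shift += 8)
        out.bytes.push_back(static_cast<std::uint8_t>((address >> shift) & 0xFF));
    out.segment_bits = {8, 32};  // op-code segment, then address segment
    return out;
}
```

A disassembly view that reads segment_bits can show the instruction as an 8-bit value followed by a 32-bit value instead of one 40-bit value.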
The assembler is single-pass. This means that forward references (references to undefined symbols) must be specified to the master assembler so that the proper values can be backpatched when the assembly is finished. However, in assembly of immediate-mode execution instructions, the assembler must deliver a complete, valid instruction for immediate execution. Thus, a reference to an undefined symbol is an error condition in immediate mode.
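The bookkeeping behind backpatching can be sketched in C++ as shown below. This is not the Master Assembler's API: the Tiny_Assembler class, its methods, and the one-byte operand size are assumptions chosen to keep the example short. It only illustrates recording a forward reference, patching it at the end, and rejecting undefined symbols in immediate mode.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical single-pass assembler core with one-byte symbol operands.
class Tiny_Assembler {
public:
    explicit Tiny_Assembler(bool immediate_mode) : immediate_(immediate_mode) {}

    void Define(const std::string& symbol, std::uint8_t value) {
        symbols_[symbol] = value;
    }

    // Emit a one-byte operand for a symbol. Returns false on error.
    bool Emit_Reference(const std::string& symbol) {
        auto it = symbols_.find(symbol);
        if (it != symbols_.end()) {
            output_.push_back(it->second);
            return true;
        }
        if (immediate_) return false;  // undefined symbol: error in immediate mode
        pending_.push_back({symbol, output_.size()});  // remember where to patch
        output_.push_back(0);                          // placeholder byte
        return true;
    }

    // At the end of assembly, backpatch all recorded forward references.
    // Returns false if any referenced symbol is still undefined.
    bool Finish() {
        for (const Pending& p : pending_) {
            auto it = symbols_.find(p.symbol);
            if (it == symbols_.end()) return false;
            output_[p.offset] = it->second;
        }
        pending_.clear();
        return true;
    }

    const std::vector<std::uint8_t>& Output() const { return output_; }

private:
    struct Pending {
        std::string symbol;
        std::size_t offset;
    };
    bool immediate_;
    std::map<std::string, std::uint8_t> symbols_;
    std::vector<Pending> pending_;
    std::vector<std::uint8_t> output_;
};
```

In normal assembly the placeholder is overwritten once the symbol is defined and Finish is called; in immediate mode the same reference fails at once, because the instruction must be complete before it can be executed.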
III. Disassembly
This is actually more important than assembly since the CEF32 interface has a disassembly tab that allows the user to view the contents of program memory. This view is rendered nearly useless without disassembly support.
IV. Component support
This is a broad category of functionality which includes normal component support for CEF compliance, plus optional profiling.