In-Circuit Emulation: How the Microprocessor Evolved Over Time

Thursday, June 11, 2009

Microprocessors weren't always designed with in-circuit emulation in mind. But as the microprocessors evolved, the need to support in-circuit emulation within the microprocessors became obvious. Without microprocessor support, it would be very difficult, if not impossible, to halt the microprocessor anywhere on a specified breakpoint event, let alone reconstruct an instruction disassembly trace. As time went on, many more emulation features were built into the microprocessor. On the 80186 a few pins were implemented. On the 80286, a few pins and a few instructions were implemented. The 80386 expanded these support pins, added a few more instructions, some debug registers, and a few special bus cycles. The 80486 refined these same features, while the Pentium scrapped them all, and completely redesigned all of the ICE features.

In the July "Undocumented Corner," I outlined the differences and advantages that in-circuit emulators held over their software-based debugging counterparts. I delved into the history of Intel's ICE offerings, and promised to continue my ICE discussion by describing how the microprocessor design has evolved with in-circuit emulation in mind. This month, I will continue my discussion of ICEs by describing these changes that occurred within the microprocessor.

ICE Evolution: Internal Hardware Support

Intel's x86 microprocessors seem to have made three evolutionary changes for ICE support. In the beginning, the ICE used a standard microprocessor. This type of ICE achieved its capabilities strictly by monitoring the microprocessor bus. I'm not sure exactly how breakpoints were signaled to the CPU. Most likely they were signaled through some type of non-maskable interrupt, or by asserting a pin that puts the CPU into a hold state. This type of ICE would include the 8088/8086.

The next type of ICE used a modified CPU, commonly called a "bond-out CPU." The bond-out CPU was an ordinary microprocessor with some extraordinary capabilities. This type of CPU was given its name because certain pins were bonded out from the silicon to the external microprocessor pins. These pins were marked as no-connects by Intel, and were usually not connected to anything on the ordinary production silicon. The bond-out version of the microprocessor connected some of these pins to pads of the silicon, giving the microprocessor its special ICE features. This category includes the bond-out CPUs produced for all Intel x86 processors from the 80186/88 to the 80486.

The third evolutionary change occurred with the advent of the Pentium processor, and remains in the Pentium Pro processor as well. These microprocessors have "Probe Mode," a special debug mode which is implemented by connecting a handful of signals to the ICE host. These signals are collectively called the probe port, or debug port. (Intel manuals most commonly use the term debug port.) Using the debug port, the ICE may communicate with the CPU at any time, but not necessarily perform all ICE functions until the CPU is in "Probe Mode."

Clearly, Intel was searching for the perfect balance of cost and features needed to support in-circuit emulation. A drastic evolutionary change occurred between the 8086 and 80186. Another big evolutionary change occurred between the 80486 and the Pentium. I believe Intel hit gold with their current ICE implementation. It's a very simple, elegant, and powerful solution. Intel reduced complexity without sacrificing a single ICE-mode feature - the best kind of trade-off.

ICE Evolution: Triggering and Resuming

I know very little about the earliest days of in-circuit emulation support for Intel x86 microprocessors. I believe the 8086 didn't have any direct support for in-circuit emulation. Instead, I believe that breakpoints were triggered by monitoring the bus, and signaling the CPU to hold, or generate a certain type of non-maskable interrupt. Minimal ICE support didn't begin until the 80186 implementation. Full-blown ICE support didn't begin until the 80286 implementation.

The 80286 didn't have any means to trigger a breakpoint on an exact trigger event. Instead, a breakpoint event was specified by the user to the ICE software. The ICE hardware would monitor the bus for this event and assert a special microprocessor pin to indicate when a breakpoint match had occurred. In return, the 80286 would halt emulation and enter ICE mode. Because of the lack of microprocessor breakpoint support, it was impossible to halt emulation at the exact point of the breakpoint event. Instead, the halt occurred a few instructions later. This was the best the 80286 could do. Once in ICE mode, the 80286 stored its internal microprocessor state to a hard-coded memory location within the ICE host hardware. For the purposes of this discussion, I'll call this special memory location the "state save map." The ICE host software could query and modify the internal processor state by modifying values in the state save map.

Resuming from ICE mode was accomplished by executing the undocumented LOADALL instruction. LOADALL restores the microprocessor state from the state save map that is saved during the transition from user mode to ICE mode. LOADALL loads enough of the microprocessor state to ensure return to any processor operating mode. LOADALL is a very powerful instruction in its own right, which explains why it has been the topic of many magazine articles (mine included), and chapters in books. After LOADALL is executed, the CPU exits ICE mode, and returns to user mode.
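The save, modify, and resume cycle described above can be pictured with a toy model. This is only a sketch: the class, the register names, and the use of a dictionary for the state save map are all assumptions made for illustration, and the real 80286 state save map has a fixed hardware layout that is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCpu:
    """Hypothetical stand-in for a CPU with an ICE mode (not a real 80286 model)."""
    registers: dict = field(default_factory=lambda: {"ip": 0, "flags": 0x0002})
    in_ice_mode: bool = False

    def enter_ice_mode(self, state_save_map: dict) -> None:
        # On a breakpoint, the CPU dumps its internal state into the state save map.
        state_save_map.clear()
        state_save_map.update(self.registers)
        self.in_ice_mode = True

    def loadall(self, state_save_map: dict) -> None:
        # LOADALL reloads the whole state from the map and returns to user mode.
        self.registers = dict(state_save_map)
        self.in_ice_mode = False
```

In this model the ICE host software edits the dictionary between `enter_ice_mode` and `loadall`, which is how a changed register value ends up in the resumed program.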

The 80386 greatly expanded the ICE support features. Like the 80286, the 80386 ICEs supported bus event breakpoints (those events that required monitoring the microprocessor bus signals to detect breakpoint events). However, the 80386 expanded its breakpoint capabilities by introducing a series of breakpoint event registers (collectively known as debug registers) and other debugging support features. The debug registers could store up to four different breakpoint events. When a code execution breakpoint event occurred, the microprocessor immediately stopped execution without overshooting the breakpoint event. This was a dramatic improvement over the 80286 bus-snooping method. The debug registers also had the ability to trigger a breakpoint on a limited set of memory-access events. In addition to the debug register breakpoints, the 80386 also had other hard-coded breakpoint events. There are various breakpoint events that can occur on the 80386 (and later) microprocessors, including:

  • An event match on any of the four debug registers.
  • An attempt to write to any of the debug registers when the DR7.GD bit equals one (bit 13 of debug register 7). This breakpoint type was primarily designed to prevent user code from modifying the breakpoint events (in the debug registers) while the ICE was in use.
  • Switching to a 386-style task whose T-Bit is set in the task state segment. (The T-bit is unique to 32-bit task state segments.)
  • Executing any instruction while the EFLAGS Trap Flag bit is set (EFLAGS.TF=1).
  • Executing the ICE BreakPoint instruction (ICEBP). This undocumented instruction was implemented specifically to allow for a convenient way to halt the ICE. It's strange that it has never been officially documented.
  • Executing an INT-01 instruction (opcode CD 01).
Under normal operating conditions (when an ICE isn't connected), any of these events will cause the microprocessor to invoke the debug exception handler (exception type 01). However, when the ICE is connected and has been instructed to halt emulation, any of these events causes the ICE to halt emulation. In that case, these exceptions are diverted to the ICE instead of being vectored to the INT1 handler. This behavior is governed by an undocumented bit in the DR7 register: bit 12. This bit is automatically set by the ICE host software whenever the user issues a command that instructs the ICE to expect a breakpoint event to occur. Therefore, the user needn't know about undocumented debug register bits, or write any software to set them.
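The role of the two DR7 bits mentioned above can be sketched as plain bit arithmetic. The bit positions (12 and 13) come from the text; the constant and function names are invented for this sketch, and the "hang" outcome models the behavior of a production chip with bit 12 set and no ICE hardware to answer.

```python
DR7_ICE_REDIRECT = 1 << 12  # undocumented bit 12; the name is hypothetical
DR7_GD = 1 << 13            # DR7.GD: trap writes to the debug registers


def arm_for_ice(dr7: int) -> int:
    """Return a DR7 image with debug events diverted to the ICE and GD set."""
    return (dr7 | DR7_ICE_REDIRECT | DR7_GD) & 0xFFFFFFFF


def debug_event_target(dr7: int, ice_attached: bool) -> str:
    """Model where a debug event is delivered for a given DR7 value."""
    if dr7 & DR7_ICE_REDIRECT:
        # With bit 12 set the CPU tries to enter ICE mode; with no ICE
        # hardware present, nothing services the request.
        return "ice" if ice_attached else "hang"
    return "int1"
```

This is a model of the decision only. Actually writing DR7 requires privileged code on real hardware.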

External breakpoints are triggered when the ICE hardware asserts an undocumented microprocessor pin. When asserted, the CPU finishes executing its current instruction, then immediately enters ICE mode. Like the 80286, the ICE hardware connects to the entire microprocessor bus, and has the ability to recognize bus events (such as special CPU cycles) and trigger the CPU as a means to enter ICE mode.

Resuming from ICE mode is accomplished in the same way as the 80286. The ICE host executes the 80386 version of the LOADALL instruction to exit from ICE mode and return to user mode. The 80386 version of the LOADALL instruction has a different data format and opcode than its 80286 counterpart.

The 80486 has virtually the same ICE support as the 80386. Entering and exiting ICE mode is virtually identical between these processors. A similar set of undocumented pins supports ICE breakpoint events, and the same LOADALL instruction is used to resume from ICE mode to user mode. Even though the ICE support is virtually the same, the 80486 had one major change - the LOADALL instruction could no longer be executed outside of ICE mode. The A-step of the 80486 allowed LOADALL execution in user mode, but Intel decided that was too dangerous. Instead, attempting to execute LOADALL in user mode caused an invalid opcode exception. When executing in ICE mode, LOADALL worked like it always did. The Pentium processor slightly expanded the debug architecture, and completely redesigned the ICE support features.

The Pentium processor sported a new breakpoint event - I/O breakpoints. An I/O breakpoint could be triggered when an I/O instruction read, wrote, or accessed a specific I/O port register. This was a welcome addition, though it was very limited in its abilities. If you wanted to specify a breakpoint on a specific port value, or port size (byte, word, or double-word), then a full-blown ICE with bus event recognition circuitry was needed.

The ICE support features of the Pentium processor were completely redesigned. The Pentium still has an internal and external means to enter ICE mode, but it is completely different from its older x86 brethren. From the outside world, special probe-mode instructions trigger the entrance and exit of ICE mode. Internally, the Pentium can be triggered into ICE mode in much the same fashion as the 80386 and 80486 (though the undocumented bits in DR7 don't have a role in this function). I'll cover the Pentium's new ICE features in my next column.

ICE Evolution: ICE Mode

Once a breakpoint is triggered on processors ranging from the 80186 to 80486, the CPU enters an alternate operating state called "ICE mode." During ICE mode, all memory cycles appear to stop, and the CPU appears to be dormant. This behavior is only a facade. Actually, the CPU is executing special ICE software. The CPU appears to be dormant because the normal pins that indicate bus activity - ADS and RDY - aren't used. In ICE mode, the microprocessor uses alternate ADS and RDY (bus cycle) signals. When viewing bus activity on a logic analyzer, the lack of the normal ADS/RDY handshake gives the appearance of the lack of bus activity.

Once in ICE mode, the microprocessor asserts an ICE-MODE output pin. This output pin is decoded by the ICE host hardware. Its purpose is to indicate to the ICE hardware to use an alternate address space. The state save map is saved to this alternate address space, and the ICE kernel code resides there too. Some people have described the ICE-MODE pin as a 33rd address line (which would be A32, in CPU nomenclature). When A32 is asserted, all memory accesses occur to this alternate memory space.

Even though ICE mode executes out of a protected memory space, the ICE kernel has the ability to read and write memory in the user's memory space. This is done through the undocumented UMOV instruction. UMOV was introduced on the 80386, and remained on the 80486. UMOV transferred data between a register and user memory. When executed, UMOV always asserted the standard ADS/RDY signals for the memory transfer. Using the standard ADS/RDY handshake guaranteed that user memory was always accessed. As a side-effect of using ADS/RDY, UMOV would operate like any other MOV instruction when executed outside of the ICE mode environment.
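A small model may help show how the two address spaces and UMOV fit together. It is a sketch under stated assumptions: the class and its method names are invented, memory is represented as dictionaries, and the ICE-MODE pin is reduced to a boolean.

```python
class IceMemoryModel:
    """Toy model of user memory plus the alternate ICE address space."""

    def __init__(self) -> None:
        self.user = {}
        self.ice = {}
        self.ice_mode = False  # stands in for the ICE-MODE output pin

    def _space(self, umov: bool) -> dict:
        # Ordinary accesses in ICE mode use the alternate ADS/RDY signals and
        # land in ICE memory. UMOV drives the standard ADS/RDY handshake, so
        # it always reaches user memory.
        if self.ice_mode and not umov:
            return self.ice
        return self.user

    def write(self, address: int, value: int, umov: bool = False) -> None:
        self._space(umov)[address] = value

    def read(self, address: int, umov: bool = False) -> int:
        return self._space(umov).get(address, 0)
```

Outside ICE mode the `umov` flag makes no difference in this model, which matches the observation that UMOV then behaves like any other MOV.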

ICE Mode on the Pentium is completely different than any of its predecessors, and I will discuss it in my next column.

ICE Evolution: Code Tracing

Most ICEs have the ability to reconstruct an instruction trace, which is done with the aid of the microprocessor. There are two components that aid in trace reconstruction:

  • A special (undocumented) bus cycle is generated - called a Branch Trace Message (BTM). The address and data bus of this special cycle contain the source and/or destination address of any instruction discontinuities. This data aids in trace reconstruction by allowing the trace reconstruction software to ignore spurious instruction pre-fetches that contain code that was never executed.
  • The length of the instruction is also helpful, though I don't think it is absolutely necessary for trace reconstruction.

The destination branch is emitted by the microprocessor as a BTM. BTMs are officially documented in the Pentium and Pentium Pro manuals, but they remain undocumented for the 80386 and 80486 microprocessors. The 80386 and 80486 both implement branch trace messages as special microprocessor bus cycles using the ICE-ADS signal as a qualifier. Qualifying BTMs with ICE-ADS enables the trace reconstruction software to easily recognize these special cycles, which are stored in the ICE trace memory. Generating these cycles is optional to the microprocessor, and may be enabled and disabled. For normal operation (without an ICE connected), the default operation is BTMs disabled. They are enabled and disabled on the 80386 and 80486 by writing to an undocumented bit in DR7 (bit 14 of debug register 7), or by writing to a bit in the TR12 register in the Pentium.

Knowing the instruction length is also very helpful in reconstructing the instruction execution trace. For this purpose, the 80286 through 80486 have a set of four pins that are sampled at every clock boundary. When an instruction completes execution, these four pins are encoded with the instruction opcode length. Because x86 instructions cannot exceed 15 bytes, four pins are precisely enough for a binary encoding of 0 to 15. These pins are always inactive on non-execution boundaries (when an instruction is currently executing). As far as I know, this feature was removed on the Pentium processor because the internal execution of the Pentium runs at a faster speed than its external input clock. The asynchronous relationship of the input clock to the core clock makes it impossible to sample a similar set of pins on an instruction boundary.
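To show how branch trace messages and instruction lengths combine, here is a minimal reconstruction sketch. The event format is an assumption made for illustration: each captured event is either `("len", n)` for an instruction that completed with a length of n bytes, or `("btm", destination)` for a branch trace message, and a BTM is assumed to follow the completion of the branching instruction. Real trace memory is far messier than this.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, int]  # hypothetical capture format, see the lead-in


def reconstruct_trace(start: int, events: Iterable[Event]) -> List[int]:
    """Return the address of each executed instruction, in order."""
    addresses = []
    pc = start
    for kind, value in events:
        if kind == "len":
            if not 1 <= value <= 15:
                raise ValueError("instruction length must be 1 to 15 bytes")
            addresses.append(pc)
            pc += value  # fall through to the next sequential instruction
        elif kind == "btm":
            pc = value   # discontinuity: continue at the branch destination
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return addresses
```

Because the walk follows only completed instructions and reported destinations, prefetched bytes that were never executed do not show up in the result.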

ICE Evolution: Instruction Set Support

The 80286 was the first Intel x86 microprocessor to have special bond-out instructions. The LOADALL instruction was used for in-circuit emulation support, and possibly chip testing purposes. LOADALL continued its evolution on the 80386 and 80486 microprocessors. For these processors, LOADALL took on a new opcode (it changed from 0F05 to 0F07), had expanded functionality (saved and restored 32-bit registers, and so on), and had slightly different semantics (ES:EDI pointed to the LOADALL table instead of the table being hard-coded at a fixed address). The 80386 also added two more instructions that aided ICEs - ICEBP provided a software mechanism to trigger an ICE breakpoint event (though it remains undocumented today), and UMOV provided a means to exchange memory between ICE mode and USER mode. Most of this was removed in the Pentium and replaced with a completely different ICE architecture (though the ICEBP instruction still remains).
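The opcodes mentioned in this column can be collected into a small lookup for spotting ICE-related instructions at the start of a byte sequence. The LOADALL and INT 01 encodings come from the text. The single-byte F1 encoding for ICEBP is not given in this article and is included here as the commonly reported value. The table and function names are invented for the sketch, and prefixes and operands are ignored.

```python
from typing import Optional

ICE_OPCODES = {
    bytes([0x0F, 0x05]): "LOADALL (80286)",
    bytes([0x0F, 0x07]): "LOADALL (80386/80486)",
    bytes([0xCD, 0x01]): "INT 01",
    bytes([0xF1]): "ICEBP",  # assumption: commonly reported encoding
}


def identify(code: bytes) -> Optional[str]:
    """Name the ICE-related instruction that starts `code`, if any."""
    # Try the longer encodings first so two-byte opcodes win over one-byte ones.
    for length in (2, 1):
        name = ICE_OPCODES.get(bytes(code[:length]))
        if name is not None:
            return name
    return None
```

For example, `identify(bytes.fromhex("0f07"))` returns the 80386/80486 LOADALL entry, while an ordinary NOP (`90`) returns `None`.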

Conclusion

As you can see, the microprocessor has undergone many changes in support of in-circuit emulation. Special versions of the microprocessor, called bond-out versions, provided these specialized functions that appear to be non-existent in the production versions of the chip. In truth, the production versions are the same as the bond-out versions, except these ICE features aren't bonded out to the external pins of the microprocessor package. Just try setting DR7.bit12=1 and executing an INT-01 and see how fast you hang your 80386 or 80486. It hangs because the CPU is attempting to invoke the ICE hardware.

As the CPU evolved, extra pins were added, and extra opcodes were implemented - all in support of in-circuit emulation. Even though these features required quite a lot of work on Intel's part, they threw all of it away when they implemented the Pentium processor. The Pentium abandoned all of these prior ICE functions, and completely redesigned the ICE architecture. Next time, I will continue my ICE discussion by examining the Pentium, and discussing its version of ICE support.
