COVER FEATURE
PA-RISC to IA-64: Transparent Execution, No Recompilation
Cindy Zheng
Carol Thompson
Hewlett-Packard
Dynamic translator
The dynamic translator translates a basic block of PA-RISC instructions
into dyncode and stores the translated code into the Aries code cache
for subsequent use. It has four subcomponents.
PA-RISC preprocessor. The PA-RISC preprocessor scans through each PA-RISC
instruction in the block and records useful information for
subsequent code generation. It also performs some pretranslation
optimizations that are specific to the PA-RISC architecture. For example,
most PA-RISC arithmetic instructions generate carry/borrow bits that
are rarely used. To minimize the redundant generation of carry/borrow
bits, the preprocessor tracks information about where a resource
(like a register) is defined and where it is being used in an
execution sequence. Thus, the code generator produces only the
necessary carry/borrow bits. The preprocessor also performs constant
and copy propagation to reduce dependencies among PA-RISC
instructions so that the scheduler can exploit more ILP.
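The carry/borrow tracking described above can be pictured as a backward liveness pass over a block. The Python sketch below is only an illustration under simplifying assumptions: the Insn record, its two flags, and the conservative live_out default are hypothetical and are not taken from Aries.

```python
from dataclasses import dataclass

@dataclass
class Insn:
    op: str                   # mnemonic, used only for display
    sets_carry: bool = False  # instruction defines the carry/borrow bits
    uses_carry: bool = False  # instruction reads the carry/borrow bits

def mark_needed_carry(block, live_out=True):
    """Return one flag per instruction: must its carry/borrow bits be generated?

    The block is walked backward. A carry definition is needed only if a later
    instruction reads it before the next definition overwrites it. live_out is
    a conservative assumption that code after the block may read the bits.
    """
    needed = [False] * len(block)
    carry_live = live_out
    for i in range(len(block) - 1, -1, -1):
        insn = block[i]
        if insn.sets_carry:
            needed[i] = carry_live
            carry_live = False   # this definition hides all earlier ones
        if insn.uses_carry:
            carry_live = True    # the use reads whatever was defined earlier
    return needed

block = [
    Insn("add", sets_carry=True),
    Insn("add", sets_carry=True),
    Insn("addc", sets_carry=True, uses_carry=True),
    Insn("sub", sets_carry=True),
]
print(mark_needed_carry(block))  # [False, True, False, True]
```

In this example only the second add (read by the add-with-carry) and the final sub (possibly read after the block) need their carry/borrow bits; a code generator could skip the other two.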
Code generator. The code generator translates the preprocessed PA-RISC instructions
into native IA-64 instructions. Because Aries maps all PA-RISC
general registers onto designated IA-64 registers in dyncode, the
code generator can use the corresponding IA-64 registers to reference
the PA-RISC general registers directly. This eliminates the need to
fetch register values from memory. The code generator also resolves
any mode differences between PA-RISC and IA-64 processes. For
example, if Aries is emulating a 32-bit PA-RISC application, it must
adjust address references to 64 bits. For each memory-related PA-RISC
instruction, the code generator must generate an extra IA-64
instruction, addp4, to do the conversion before it generates a load
or store instruction. This process is called address swizzling.
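As a rough illustration of address swizzling, the Python sketch below models the effect of addp4 as the IA-64 architecture describes it: a 32-bit sum, zero-extended to 64 bits, with bits 31:30 of the pointer operand copied into bits 62:61 of the result. It also shows the two-instruction sequence a code generator might emit for a 32-bit load. The function names, the temporary register name, and the textual instruction format are assumptions made for this example, not Aries internals.

```python
MASK32 = (1 << 32) - 1

def addp4(r2, r3):
    """Model of addp4 r1 = r2, r3: 32-bit pointer add producing a 64-bit address."""
    low = (r2 + r3) & MASK32      # sum truncated to 32 bits, upper bits zero
    region = (r3 >> 30) & 0x3     # top two bits of the 32-bit pointer
    return low | (region << 61)   # copied into bits 62:61 of the result

def gen_load32(target, base, disp, tmp="r_tmp"):
    """Emit a swizzle followed by a 4-byte load (textual form, illustrative only)."""
    return [
        f"addp4 {tmp} = {disp}, {base}",
        f"ld4 {target} = [{tmp}]",
    ]

print(hex(addp4(0, 0x40001000)))  # 0x2000000040001000
print(gen_load32("r20", "r21", 8))
```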
Optimizer and scheduler. The code generator then passes the IA-64 instructions
to the lightweight optimizer for optimization and scheduling. Because
the optimizer runs at runtime, its optimizations must be fast and
effective. Aries uses several techniques to promote efficient
optimization, including, among others:
• Dead code elimination. Aries removes redundant instructions to
reduce the final translated code size.
• Address swizzling reduction. Aries replaces certain addp4/load or
addp4/store instruction pairs with a single load or store instruction
to reduce total code size and total execution cycles. Figure 4 shows
how the optimizer uses address swizzling reduction to optimize a
sequence of consecutive memory access instructions.
• Memory aliasing reduction. Aries distinguishes the memory access instructions it generates for
register fetching from the normal memory instructions it generates
for the emulated PA-RISC application. These two types of memory
instructions access different memory segments and do not overlap.
Aries can therefore safely move instructions of one type across
instructions of the other type to improve ILP.
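The memory aliasing rule in the last item can be expressed as a small ordering test that a scheduler consults before reordering two memory operations. The sketch below assumes each operation is tagged with the segment it touches; the MemOp record and the segment labels "context" and "app" are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemOp:
    kind: str     # "load" or "store"
    segment: str  # "context": emulator's register save area; "app": emulated program data

def must_keep_order(first, second):
    """Return True if the two memory operations may not be reordered."""
    if first.segment != second.segment:
        return False  # the segments are disjoint, so the accesses cannot alias
    # Within one segment, be conservative: any pair involving a store stays ordered.
    return first.kind == "store" or second.kind == "store"

app_store = MemOp("store", "app")
ctx_load = MemOp("load", "context")
app_load = MemOp("load", "app")

print(must_keep_order(app_store, ctx_load))  # False: free to reorder
print(must_keep_order(app_store, app_load))  # True: may alias
```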
Aries’ list scheduler bundles
instructions for each generated IA-64 block so that all instructions
fit into IA-64 templates. It starts by building a directed acyclic
graph (DAG) to capture all IA-64-specific and
microarchitecture-specific dependencies such as write-after-read
(WAR), read-after-write (RAW), and write-after-write (WAW) hazards
between instructions. On the basis of the DAG, it then selects
instructions that are free to be scheduled in each cycle. Finally,
the scheduler uses a state machine to bundle instructions in each
cycle, inserting NOPs (no operations) as necessary. It also
uses a heuristic to reduce the number of NOPs inserted.
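A heavily simplified list scheduler along these lines is sketched below in Python. It builds the dependence graph from register definitions and uses, issues ready instructions cycle by cycle, and pads each three-slot group with NOPs. The sketch ignores IA-64 template and execution-unit constraints, treats every dependence as requiring a later cycle, and assumes unit latency; these simplifications, and all names in the code, are assumptions of the example rather than a description of Aries.

```python
def build_dag(block):
    """block: list of (name, dest, sources). Returns preds[j], the set of
    earlier instructions that instruction j must follow."""
    preds = [set() for _ in block]
    for j, (_, dest_j, srcs_j) in enumerate(block):
        for i in range(j):
            _, dest_i, srcs_i = block[i]
            raw = dest_i is not None and dest_i in srcs_j  # read-after-write
            war = dest_j is not None and dest_j in srcs_i  # write-after-read
            waw = dest_i is not None and dest_i == dest_j  # write-after-write
            if raw or war or waw:
                preds[j].add(i)
    return preds

def list_schedule(block, width=3):
    """Return one list of instruction names per cycle, padded with NOPs."""
    preds = build_dag(block)
    done, cycles = set(), []
    while len(done) < len(block):
        # An instruction is ready once all its predecessors issued in earlier cycles.
        ready = [i for i in range(len(block)) if i not in done and preds[i] <= done]
        issue = ready[:width]
        names = [block[i][0] for i in issue]
        names += ["nop"] * (width - len(names))
        cycles.append(names)
        done.update(issue)
    return cycles

block = [
    ("add", "r1", ("r2", "r3")),
    ("ld", "r4", ("r1",)),
    ("sub", "r5", ("r6", "r7")),
    ("st", None, ("r4", "r5")),
]
for cycle in list_schedule(block):
    print(cycle)
```

Here the independent sub is hoisted next to the add, which removes one NOP from the first group; a NOP-reducing heuristic would look for such opportunities more systematically.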
Instruction packer. The instruction packer packs the scheduled IA-64
instructions into binary code and writes it into the Aries code
cache. The Aries runtime module then updates the address map table to
reflect the state change for the translated PA-RISC block. For
subsequent emulations of that block, the Aries runtime will use the
translated code instead of invoking the interpreter.
Aries implements a backpatch technique that allows a dyncode block
to directly branch to another dyncode block without going through a
target lookup, making it more efficient to transition between blocks.
The Aries runtime module keeps track of the dyncode block that has
just been executed. If the next block to be executed is the target
block of the previous one, Aries modifies the final branch
instruction in the previously executed dyncode block so that it can
jump directly to the target dyncode block.
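The effect of backpatching can be illustrated with a small model in which each translated block caches a direct link to its successor after the first address-map lookup. The DynBlock and Runtime classes below are hypothetical stand-ins; in real dyncode the "link" would be a rewritten branch instruction rather than a Python reference.

```python
class DynBlock:
    """A translated block in the code cache (hypothetical structure)."""
    def __init__(self, pa_addr, exit_target):
        self.pa_addr = pa_addr          # PA-RISC address this block translates
        self.exit_target = exit_target  # PA-RISC address its final branch targets
        self.direct_link = None         # patched-in successor, once known

class Runtime:
    def __init__(self):
        self.address_map = {}  # PA-RISC address -> DynBlock
        self.lookups = 0       # number of slow-path lookups performed

    def add_block(self, block):
        self.address_map[block.pa_addr] = block

    def successor(self, prev):
        if prev.direct_link is not None:
            return prev.direct_link       # fast path: already backpatched
        self.lookups += 1                 # slow path: consult the address map
        target = self.address_map.get(prev.exit_target)
        if target is not None:
            prev.direct_link = target     # backpatch the previous block
        return target                     # None would mean "not translated yet"

def run(runtime, start, steps):
    block = start
    for _ in range(steps):
        block = runtime.successor(block)
    return block

rt = Runtime()
a = DynBlock(0x1000, 0x2000)
b = DynBlock(0x2000, 0x1000)
rt.add_block(a)
rt.add_block(b)
run(rt, a, 10)
print(rt.lookups)  # 2: one lookup per block, every later transition is direct
```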
Aries can also translate dynamically generated code and self-modifying code
in an emulated PA-RISC application, treating both types of code as
regular PA-RISC blocks in an emulated application. When Aries
encounters a sync instruction, which indicates the existence of
self-modifying code, it simply erases the current content of the code
cache so that subsequent emulations will not use any translations of
the old code.
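A toy model shows why discarding the cache is sufficient for correctness. In the Python sketch below, the CodeCache class, its translate callback, and the on_sync hook are illustrative assumptions; "translation" is reduced to a function applied to the bytes at an address.

```python
class CodeCache:
    """Maps PA-RISC block addresses to translations (toy model)."""
    def __init__(self, translate):
        self._translate = translate
        self._blocks = {}
        self.translations = 0

    def lookup(self, pa_addr, memory):
        if pa_addr not in self._blocks:
            # Miss: translate the code currently in memory and cache the result.
            self._blocks[pa_addr] = self._translate(memory[pa_addr])
            self.translations += 1
        return self._blocks[pa_addr]

    def on_sync(self):
        # The emulated program may have rewritten its own code: drop everything.
        self._blocks.clear()

memory = {0x100: "old code"}
cache = CodeCache(str.upper)
first = cache.lookup(0x100, memory)   # translated from the old code
memory[0x100] = "new code"            # program modifies itself
stale = cache.lookup(0x100, memory)   # still the old translation
cache.on_sync()                       # sync instruction seen: flush
fresh = cache.lookup(0x100, memory)   # retranslated from the new code
print(first, stale, fresh)
```

Flushing everything trades retranslation cost for simplicity: no bookkeeping is needed to work out which cached blocks overlap the modified code.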
Environment emulation module
The most common system services that the environment emulation module
must handle are system calls and signal delivery.
System calls. All HP-UX system calls enter kernel space through a common
system-call-gateway page. The environment emulation module captures
system calls made in an emulated PA-RISC application at the gateway
page and calls the corresponding emulation routines. Most system-call
emulation routines are simple stubs that invoke the native system
calls directly on the IA-64/HP-UX platform. Other system calls
require special handling before the native system calls are made. For
example, when a thread in a multithreaded PA-RISC application
requests the operating system to suspend another thread, Aries cannot
simply pass this request to the underlying IA-64 kernel because it
could cause a deadlock on shared Aries resources. Aries must first
acquire all the Aries shared resources before sending the native
suspension request to the kernel.
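The split between pass-through stubs and specially handled calls can be sketched as a dispatch table. In the Python model below, the SyscallEmulator class, the "thread_suspend" call name, and the two locks standing in for shared emulator resources are all hypothetical; the native callable is a placeholder for the host kernel.

```python
import threading

class SyscallEmulator:
    def __init__(self, native):
        self.native = native  # callable(name, *args), stands in for the host kernel
        # Hypothetical shared emulator resources, each guarded by a lock.
        self.shared_locks = [threading.Lock(), threading.Lock()]
        self.special = {"thread_suspend": self._suspend}

    def dispatch(self, name, *args):
        handler = self.special.get(name)
        if handler is not None:
            return handler(*args)         # needs work before the native call
        return self.native(name, *args)   # simple stub: pass straight through

    def _suspend(self, tid):
        # Hold every shared resource so the target thread cannot be suspended
        # while it owns one of them, which could otherwise deadlock the emulator.
        for lock in self.shared_locks:
            lock.acquire()
        try:
            return self.native("thread_suspend", tid)
        finally:
            for lock in reversed(self.shared_locks):
                lock.release()

calls = []
emu = SyscallEmulator(
    lambda name, *args: calls.append(
        (name, args, all(lock.locked() for lock in emu.shared_locks))
    )
)
emu.dispatch("getpid")
emu.dispatch("thread_suspend", 7)
print(calls)
```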
Signal delivery. Signal delivery also requires special handling. The
HP-UX operating system can deliver both synchronous and asynchronous
signals to a PA-RISC application. It delivers a synchronous signal,
such as a protection violation on a load, immediately to an
application at the instruction that caused the exception. An
asynchronous signal, such as a kill or suspend, on the other hand, is
not associated with a particular instruction, so the system can
deliver it to the application at any time, and the time at which it
arrives may differ from run to run.
Aries registers a master signal handler to handle the delivery of all
signals it receives, whether synchronous or asynchronous, to an
emulated PA-RISC application. When the HP-UX kernel detects an
exception that a PA-RISC application generates, it delivers a signal
to the Aries process that is emulating the PA-RISC application by
invoking the Aries master signal handler. The signal handler then
determines how the signal should be delivered to the emulated
application. Aries does not always deliver asynchronous signals as
soon as they occur. Instead, it queues up the asynchronous signals it
receives and delivers them to the emulated application at the
earliest locations where it can construct a correct PA-RISC signal
context for the emulated application. Aries delivers synchronous
signals immediately. When Aries receives a synchronous signal
in dyncode, where the PA-RISC signal context may not be up to date,
Aries constructs a recovery block for the dyncode. It then executes
that recovery block to synchronize the PA-RISC context before
delivering the signal to the emulated application.
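The two delivery policies can be summarized in a short model: asynchronous signals are queued until a point where the PA-RISC context is known to be correct, while synchronous signals are delivered at once, after a recovery step if the context is stale. The Python sketch below is an assumption-laden illustration; the class, the fixed set of "synchronous" signal names, and the recover callback are not taken from Aries.

```python
from collections import deque

# Signals treated as synchronous in this toy model (an assumption).
SYNCHRONOUS = {"SIGSEGV", "SIGBUS", "SIGFPE", "SIGILL"}

class MasterSignalHandler:
    def __init__(self, deliver):
        self.deliver = deliver     # callable that hands a signal to the emulated app
        self.pending = deque()     # queued asynchronous signals
        self.context_valid = True  # False while running translated code

    def on_signal(self, sig, recover=None):
        if sig in SYNCHRONOUS:
            if not self.context_valid and recover is not None:
                recover()                  # run the recovery block first
                self.context_valid = True
            self.deliver(sig)              # then deliver immediately
        else:
            self.pending.append(sig)       # defer until a safe point

    def at_safe_point(self):
        # Called where a correct PA-RISC signal context can be constructed.
        self.context_valid = True
        while self.pending:
            self.deliver(self.pending.popleft())

delivered = []
handler = MasterSignalHandler(delivered.append)
handler.context_valid = False                       # executing dyncode
handler.on_signal("SIGTERM")                        # asynchronous: queued
handler.on_signal("SIGSEGV", recover=lambda: delivered.append("recover"))
handler.at_safe_point()
print(delivered)  # ['recover', 'SIGSEGV', 'SIGTERM']
```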
Verifying emulation and translation
One of Aries’ most important goals is to emulate all user-level
PA-RISC applications on IA-64 platforms—including applications not
yet developed. It is impossible to even run all the available PA-RISC
applications on Aries to verify its emulation correctness. Moreover,
most existing PA-RISC applications are compiler generated. Because
compilers use only a subset of the ISA to generate executables, it
would be hard to get 100 percent coverage on ISA emulation using
application testing. We therefore adopted other ways to verify Aries.
We developed a random testing framework to stress test the correctness
of Aries ISA emulation. We used this framework to randomly generate
PA-RISC instruction sequences and then execute each instruction
sequence twice—once on a PA-RISC processor and once under Aries
running on an IA-64 system. We compared the final states and labeled
any inconsistency between them as an Aries emulation failure. With
this framework, we thoroughly verified all ISA emulations, including
scenarios that can never be generated in a real application.
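The general shape of such random differential testing is sketched below on a toy four-operation register machine. Nothing here reproduces the actual framework: the instruction set, the reference_run function standing in for real hardware, and the deliberately buggy emulator are all inventions for the example.

```python
import random

MASK32 = 0xFFFFFFFF
OPS = ("add", "sub", "and", "or")

def random_sequence(rng, length, nregs=4):
    """Generate random (op, dest, src1, src2) instructions."""
    return [
        (rng.choice(OPS), rng.randrange(nregs), rng.randrange(nregs), rng.randrange(nregs))
        for _ in range(length)
    ]

def reference_run(seq, regs):
    """Stand-in for executing the sequence on the reference processor."""
    regs = list(regs)
    for op, d, a, b in seq:
        x, y = regs[a], regs[b]
        regs[d] = {"add": x + y, "sub": x - y, "and": x & y, "or": x | y}[op] & MASK32
    return regs

def differential_test(emulate, trials=200, seed=0):
    """Run random sequences on both sides and collect any final-state mismatches."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        seq = random_sequence(rng, 8)
        regs = [rng.getrandbits(32) for _ in range(4)]
        if emulate(seq, regs) != reference_run(seq, regs):
            failures.append((seq, regs))
    return failures

def buggy_emulator(seq, regs):
    # Deliberate bug for demonstration: sub is emulated as add.
    patched = [("add" if op == "sub" else op, d, a, b) for op, d, a, b in seq]
    return reference_run(patched, regs)

print(len(differential_test(reference_run)))       # 0
print(len(differential_test(buggy_emulator)) > 0)
```

Because sequences are generated at random rather than by a compiler, instruction combinations that real applications never produce are exercised as well.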
We also built a runtime cross-verification mechanism into Aries so that
we would know the exact location of any emulation failure. This is
important when the application is large and complex, because a
failure may not show up in an identifiable format (such as output)
until some time after it has occurred. We can identify the
instruction block where Aries emulation failed. With this mechanism,
Aries can run any large and complex application and report emulation
failure at the exact place it occurred—without user intervention.
In contrast, verifying translation correctness has been a challenge,
because we did not have a real IA-64 system when we developed Aries.
To overcome this problem, we injected an IA-64 instruction emulator
into Aries to act as an execution bed only for dyncode. We built the
rest of Aries’ emulation components—the interpreter, dynamic
translator, environment emulation module, and runtime module—as a
PA-RISC application and ran them on a PA-RISC platform. This
verification approach improved our dyncode testing efficiency by up
to 300 times, compared to the traditional verification method using a
full-blown IA-64 simulator.
Aries can emulate most user-level applications built for HP-UX/PA-RISC
systems, including ones still under development. However, there are a
few exceptions. For example, it cannot correctly emulate a debugger
built for HP-UX/PA-RISC systems because of optimizations in the
translated code. Also, it cannot yet emulate applications that link
in both PA-RISC and IA-64 shared libraries.
We view dynamic translation as an important migration method, but
software migration is only one of the many areas that could benefit.
This technology could aid runtime instrumentation and profile
gathering in a performance analysis toolkit, for example. It could
also become central to runtime optimization in products such as the
Java virtual machine. In the meantime, we see it as essential in
helping HP customers enjoy an effortless and successful transition to
more powerful IA-64 systems.
Cindy (Qinghua) Zheng is a senior software design engineer in the Adaptive Systems section
at Hewlett-Packard’s Enterprise Java Lab. She received a BSc and an
MEng in electrical engineering and computer science from the
Massachusetts Institute of Technology. Contact her at cindy_zheng@hp.com.
Carol Thompson is the C/C++ compiler architect in Hewlett-Packard’s
Development Environment Solutions Lab. Her background includes
optimization and architecture definition for the IA-64 and PA-RISC
architectures. She received an MS in computer science from the
University of California, Davis. Contact her at carol_thompson@hp.com.
This site and all
contents (unless otherwise noted) are Copyright
© 2000, Institute of Electrical and Electronics Engineers, Inc. All
rights reserved.