Trace-based Just-in-Time Type Specialization for Dynamic Languages

Andreas Gal∗+, Brendan Eich∗, Mike Shaver∗, David Anderson∗, David Mandelin∗, Mohammad R. Haghighat$, Blake Kaplan∗, Graydon Hoare∗, Boris Zbarsky∗, Jason Orendorff∗, Jesse Ruderman∗, Edwin Smith#, Rick Reitmaier#, Michael Bebenita+, Mason Chang+#, Michael Franz+

∗Mozilla Corporation, {gal,brendan,shaver,danderson,dmandelin,mrbkap,graydon,bz,jorendorff,jruderman}@mozilla.com
#Adobe Corporation, {edwsmith,rreitmai}@adobe.com
$Intel Corporation, {mohammad.r.haghighat}@intel.com
+University of California, Irvine, {mbebenit,changm,franz}@uci.edu
Abstract

Dynamic languages such as JavaScript are more difficult to compile than statically typed ones. Since no concrete type information is available, traditional compilers need to emit generic code that can handle all possible type combinations at runtime. We present an alternative compilation technique for dynamically-typed languages that identifies frequently executed loop traces at run-time and then generates machine code on the fly that is specialized for the actual dynamic types occurring on each path through the loop. Our method provides cheap inter-procedural type specialization, and an elegant and efficient way of incrementally compiling lazily discovered alternative paths through nested loops. We have implemented a dynamic compiler for JavaScript based on our technique and we have measured speedups of 10x and more for certain benchmark programs.
Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors — Incremental compilers, code generation.

General Terms Design, Experimentation, Measurement, Performance.

Keywords JavaScript, just-in-time compilation, trace trees.
1. Introduction

Dynamic languages such as JavaScript, Python, and Ruby are popular since they are expressive, accessible to non-experts, and make deployment as easy as distributing a source file. They are used for small scripts as well as for complex applications. JavaScript, for example, is the de facto standard for client-side web programming
and is used for the application logic of browser-based productivity applications such as Google Mail, Google Docs and Zimbra Collaboration Suite. In this domain, in order to provide a fluid user experience and enable a new generation of applications, virtual machines must provide a low startup time and high performance.
Compilers for statically typed languages rely on type information to generate efficient machine code. In a dynamically typed programming language such as JavaScript, the types of expressions may vary at runtime. This means that the compiler can no longer easily transform operations into machine instructions that operate on one specific type. Without exact type information, the compiler must emit slower generalized machine code that can deal with all potential type combinations. While compile-time static type inference might be able to gather type information to generate optimized machine code, traditional static analysis is very expensive and hence not well suited for the highly interactive environment of a web browser.
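For illustration, consider how a single JavaScript operator maps to different machine-level operations depending on the runtime types of its operands (a minimal example of our own; the function add is hypothetical):

// One '+' operator, three different machine-level operations,
// selected by the runtime types of the operands.
function add(a, b) {
  return a + b;
}

add(1, 2);          // integer addition        -> 3
add(0.5, 0.25);     // floating-point addition -> 0.75
add("foo", "bar");  // string concatenation    -> "foobar"

// Without type information, the compiler must emit code that checks
// the operand types and dispatches among these cases on every call.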
We present a trace-based compilation technique for dynamic languages that reconciles speed of compilation with excellent performance of the generated machine code. Our system uses a mixed-mode execution approach: the system starts running JavaScript in a fast-starting bytecode interpreter. As the program runs, the system identifies hot (frequently executed) bytecode sequences, records them, and compiles them to fast native code. We call such a sequence of instructions a trace.
Unlike method-based dynamic compilers, our dynamic compiler operates at the granularity of individual loops. This design choice is based on the expectation that programs spend most of their time in hot loops. Even in dynamically typed languages, we expect hot loops to be mostly type-stable, meaning that the types of values are invariant. (12) For example, we would expect loop counters that start as integers to remain integers for all iterations. When both of these expectations hold, a trace-based compiler can cover the program execution with a small number of type-specialized, efficiently compiled traces.
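For instance, the following minimal examples of our own contrast a type-stable loop with a type-unstable one:

// Type-stable: i and sum hold integer values on every iteration, so
// one type-specialized trace can cover the whole loop.
var sum = 0;
for (var i = 0; i < 1000; ++i)
  sum += i;

// Type-unstable: x starts as an integer but ends as a string, so no
// single type mapping holds for all iterations.
var x = 0;
for (var j = 0; j < 10; ++j)
  x = (j < 9) ? x + 1 : "done";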
Each compiled trace covers one path through the program with one mapping of values to types. When the VM executes a compiled trace, it cannot guarantee that the same path will be followed or that the same types will occur in subsequent loop iterations. Hence, recording and compiling a trace speculates that the path and typing will be exactly as they were during recording for subsequent iterations of the loop.

Every compiled trace contains all the guards (checks) required to validate the speculation. If one of the guards fails (if control flow is different, or a value of a different type is generated), the trace exits. If an exit becomes hot, the VM can record a branch trace starting at the exit to cover the new path. In this way, the VM records a trace tree covering all the hot paths through the loop.
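The following sketch suggests the shape of such a trace for the inner loop of Figure 1. It is illustrative JavaScript pseudocode only: real traces are compiled from LIR to native code, and the helpers isInt and sideExit are hypothetical names for the guard and exit machinery:

function isInt(v) { return Number.isInteger(v); }   // hypothetical type guard
function sideExit(state) { return state; }          // hypothetical: resume in the interpreter

// Illustrative shape of a compiled trace for the inner loop of Figure 1.
function innerLoopTrace(state) {
  for (;;) {
    // Guards validate the speculated types and path; a failing guard
    // side-exits back to the interpreter.
    if (!isInt(state.k) || !isInt(state.i)) return sideExit(state);
    if (!(state.k < 100)) return sideExit(state);   // loop-exit guard

    state.primes[state.k] = false;   // type-specialized body: no dispatch
    state.k += state.i;              // integer addition, no type checks
  }
}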
Nested loops can be difficult to optimize for tracing VMs. In a naïve implementation, inner loops would become hot first, and the VM would start tracing there. When the inner loop exits, the VM would detect that a different branch was taken. The VM would try to record a branch trace, and find that the trace reaches not the inner loop header, but the outer loop header. At this point, the VM could continue tracing until it reaches the inner loop header again, thus tracing the outer loop inside a trace tree for the inner loop. But this requires tracing a copy of the outer loop for every side exit and type combination in the inner loop. In essence, this is a form of unintended tail duplication, which can easily overflow the code cache. Alternatively, the VM could simply stop tracing, and give up on ever tracing outer loops.
We solve the nested loop problem by recording nested trace trees. Our system traces the inner loop exactly as the naïve version does. The system stops extending the inner tree when it reaches an outer loop, but then it starts a new trace at the outer loop header. When the outer loop reaches the inner loop header, the system tries to call the trace tree for the inner loop. If the call succeeds, the VM records the call to the inner tree as part of the outer trace and finishes the outer trace as normal. In this way, our system can trace any number of loops nested to any depth without causing excessive tail duplication.
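Schematically, and reusing the hypothetical helpers from the sketch above, an outer-loop trace for Figure 1 invokes the inner loop's entire trace tree as a single unit:

// Illustrative shape of an outer-loop trace for Figure 1: the inner
// loop's whole trace tree is called as one unit, so the outer trace
// never duplicates the inner loop's code.
function outerLoopTrace(state) {
  for (;;) {
    if (!isInt(state.i)) return sideExit(state);    // type guard
    if (!(state.i < 100)) return sideExit(state);   // loop-exit guard
    if (state.primes[state.i]) {     // in a real trace this branch is also guarded
      state.k = state.i + state.i;
      innerLoopTrace(state);         // nested call into the inner trace tree
    }
    state.i += 1;
  }
}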
These techniques allow a VM to dynamically translate a program to nested, type-specialized trace trees. Because traces can cross function call boundaries, our techniques also achieve the effects of inlining. Because traces have no internal control-flow joins, they can be optimized in linear time by a simple compiler (10). Thus, our tracing VM efficiently performs the same kind of optimizations that would require interprocedural analysis in a static optimization setting. This makes tracing an attractive and effective tool to type specialize even complex function call-rich code.
We implemented these techniques for an existing JavaScript interpreter, SpiderMonkey. We call the resulting tracing VM TraceMonkey. TraceMonkey supports all the JavaScript features of SpiderMonkey, with a 2x-20x speedup for traceable programs.

This paper makes the following contributions:
• We explain an algorithm for dynamically forming trace trees to cover a program, representing nested loops as nested trace trees.

• We explain how to speculatively generate efficient type-specialized code for traces from dynamic language programs.

• We validate our tracing techniques in an implementation based on the SpiderMonkey JavaScript interpreter, achieving 2x-20x speedups on many programs.
The remainder of this paper is organized as follows. Section 3 is a general overview of the trace-tree-based compilation technique we use to capture and compile frequently executed code regions. In Section 4 we describe our approach of covering nested loops using a number of individual trace trees. In Section 5 we describe the speculative type specialization approach we use to generate efficient machine code from recorded bytecode traces. Our implementation of a dynamic type-specializing compiler for JavaScript is described in Section 6. In Section 7 we evaluate our dynamic compiler on a set of industry benchmarks, and related work is discussed in Section 8. The paper ends with conclusions in Section 9 and an outlook on future work in Section 10.
1  for (var i = 2; i < 100; ++i) {
2      if (!primes[i])
3          continue;
4      for (var k = i + i; k < 100; k += i)
5          primes[k] = false;
6  }

Figure 1. Sample program: sieve of Eratosthenes. primes is initialized to an array of 100 true values on entry to this code snippet.
[Figure 2: a state machine with the activities Interpret Bytecodes, Monitor, Record LIR Trace, Compile LIR Trace, Enter Compiled Trace, Execute Compiled Trace, and Leave Compiled Trace. Transition conditions include: loop edge, hot loop/exit, abort recording, finish at loop header, cold/blacklisted loop/exit, compiled trace ready, loop edge with same types, side exit to existing trace, and side exit with no existing trace. The symbol key distinguishes Overhead, Interpreting, and Native activities.]
Figure 2. State machine describing the major activities of TraceMonkey and the conditions that cause transitions to a new activity. In the dark box, TM executes JS as compiled traces. In the light gray boxes, TM executes JS in the standard interpreter. White boxes are overhead. Thus, to maximize performance, we need to maximize time spent in the darkest box and minimize time spent in the white boxes. The best case is a loop where the types at the loop edge are the same as the types on entry; then TM can stay in native code until the loop is done.
2. Overview: Example Tracing Run

This section provides an overview of our system by describing how TraceMonkey executes an example program. The example program, shown in Figure 1, computes the first 100 prime numbers with nested loops. The narrative should be read along with Figure 2, which describes the activities TraceMonkey performs and when it transitions between them.
TraceMonkey always begins executing a program in the bytecode interpreter. Every loop back edge is a potential trace point. When the interpreter crosses a loop edge, TraceMonkey invokes the trace monitor, which may decide to record or execute a native trace. At the start of execution, there are no compiled traces yet, so the trace monitor counts the number of times each loop back edge is executed until a loop becomes hot, currently after 2 crossings. Note that the way our loops are compiled, the loop edge is crossed before entering the loop, so the second crossing occurs immediately after the first iteration.
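A minimal sketch of this counting policy, in illustrative JavaScript with hypothetical names (the real monitor lives inside the VM and tracks considerably more state):

// Illustrative sketch of the trace monitor's hot-loop detection;
// HOT_LOOP_THRESHOLD, loopHits, and compiledTraces are hypothetical names.
var HOT_LOOP_THRESHOLD = 2;       // a loop becomes hot after 2 crossings
var loopHits = new Map();         // loop header -> back-edge crossing count
var compiledTraces = new Map();   // loop header -> compiled native trace

function onLoopEdge(loopHeader) {
  if (compiledTraces.has(loopHeader))
    return "enter-compiled-trace";
  var hits = (loopHits.get(loopHeader) || 0) + 1;
  loopHits.set(loopHeader, hits);
  return hits >= HOT_LOOP_THRESHOLD ? "start-recording" : "keep-interpreting";
}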
Here is the sequence of events broken down by outer loop iteration: