Preface
Sapphire is a toy compiler framework, heavily inspired by both LLVM and the Cranelift compiler in Wasmtime.
Both of those are much better projects to look at for modern optimizing compilers than this is, you can find the actual source code at the following:
API Documentation
This book does not contain any API documentation for the Sapphire library. If you want that, you need to go here.
SIR
SIR is the intermediary language that everything1 inside of Sapphire speaks in. All optimizing transformations
are SIR -> SIR, all analyses work on SIR, etc. SIR is your standard SSA-based IR with memory operations, the only
remotely interesting thing is that it uses block parameters for φ functions instead of a phi
-like instruction.
fn i32 @loopFactorial(i32) fastcc {
entry(i32 %n):
%0 = iconst i32 1
br loop.head(i32 %n, i32 %n)
loop.head(i32 %x, i32 %y):
%1 = icmp eq i32 %x, %0
condbr bool %1, loop.body, exit(%y)
loop.body:
%2 = isub i32 %x, %0
%3 = imul i32 %y, %2
br loop.head(%2, %3)
exit(i32 %result):
ret i32 %result
}
The details of the IR can be found here.
[1]: Except the back-end, that deals in a machine-specific IR instead
Compiler Internals
Details of the compiler's inner workings can be found in the Internals section.
Sapphire IR (SIR) Reference
type %boxed.i32 = { ptr }
fn ptr @boxed.allocate(i64 %0)
fn { ptr } @boxed.add.i32({ ptr } %0, { ptr } %1) {
entry:
%2 = extract ptr, { ptr } %0, 0
%3 = extract ptr, { ptr } %1, 0
%4 = load i32, ptr %2
%5 = load i32, ptr %3
%6 = iadd i32, %4, %5
%7 = iconst i64, 4
%8 = call @boxed.allocate(i64 %7)
store i32 %6, ptr %8
ret ptr %8
}
Structure
Memory Model
Bytes, Fundamental Sizes
Bytes in SIR are 8-bit, all integers are 2's complement and have bit-width/sizes that are powers of two.
Alignment
All objects are aligned to powers of 2. Unaligned access is undefined behavior, unless it is through specific means provided by the IR.
Formally, the alignment \( A \) of any object \( O \) is an integer \( 2^N \) such that \( N \in \mathbb{N} \).
Memory & Reachability
The memory available to a program is considered to be an array of i8
of an unspecified length. Each byte in memory is initially considered to be unreachable.
A given section (a contiguous subset of the larger available memory) of memory is considered to be reachable if one of the following conditions is true:
- The memory has been returned from an implementation-defined allocation function, such as
malloc
,alloca
or::operator new
. - A pointer to the memory is made available to the program through the IR, i.e. a global was used (globals are available to the program as
ptr
values), analloc
instruction evaluated to a givenptr
, etc. - The implementation exposes the memory through known integral addresses (e.g the implementation guarantees that all memory is ‘reachable’ because its freestanding, the implementation guarantees that a section is available at address
0xdeadbeef
, etc.)
Object Storage
A given section of memory can be used as the storage of an object, however SIR does not enforce any restrictions to what can be stored in a given section based on type alone (other than that the size of the object must be \( \le \) the size of the section).
Storage is simply considered to be a set of bytes that happen to store a given set of values.
Types
Function Signatures
These are not quite "types"
Aggregates v. Primitives
Primitive types are the atomic values of the IR, e.g. integers, booleans, floats and the like. Aggregates are complex types that are made up of one or more primitives (or other aggregates). Each value that is contained in an aggregate is said to be a member the aggregate, with each member being identified by an index (starting at zero).
bool
: Booleans
bool
models the pure idea of a boolean, with two states: true
and false
.
They are not 'integers' as they are in LLVM and some other IRs, although they can be
converted into integers and back easily with the btoi
and itob
instructions.
All bool
s are exactly byte sized and byte aligned, i.e. they are equivalent to an i8
in size, alignment, and other storage expectations.
iN
: Integers
iN
models the concept of two's complement integers, and acts the same way that
integers do on the vast majority of computers. Unlike integers in most languages, they
are sign-less. Note that this does not mean unsigned, instead, sign is determined by the
instruction. This more closely models the hardware, and allows more granular control over
the behavior / guarantees that are desired.
Integers are in the form iN
, such that \( N \in {8, 16, 32, 64} \). Support for
wider integers (i.e. i128
) or arbitrary-width integers (i.e. i37
) may come in the future.
The size of an integer is exactly as many bytes as is required to store \( N \) bits,
and the alignment of an integer is exactly the same as its size.
The endianness of integers is unspecified, but is defined by a given target.
fN
: Floats
fN
model floating-point numbers that follow the IEEE-754 standard. Currently, only two
are supported:
f32
: IEEE single-precision, i.e.binary32
f64
: IEEE double-precision, i.e.binary64
In the future, others may be supported (e.g. f128
for IEEE quads).
The size of a float is exactly as many bytes as is required to store \( N \) bits,
and the alignment of a float is exactly the same as its size.
ptr
: Pointers
ptr
models the idea of a pointer, and nothing else. Unlike many high-level languages, pointers
are untyped.
[T x N]
: Arrays
[T x N]
models contiguous blocks of storage, approximately equivalent to arrays in C. Arrays are considered to be aggregate types, and thus aggregate operations can be used on them. Each array index is considered to be a distinct aggregate member.
an independent element.
They take up exactly sizeof(T) * N
bytes of storage, with alignment that is equal to alignof(T)
.
Note that this means that arrays that satisfy one of the following conditions are zero-sized:
N == 0
sizeof(T) == 0
[i8, 0]
[{}, 64]
[i64, 512]
[[[i64, 16], 16], 16]
{ T... }
: Structures
{ T... }
models a struct
in C, and conform to the same ABI as C structures would for a given target. They are aggregate types, and thus each member of the structure is one member of the aggregate and can be accessed using aggregate instructions. Members are indexed according to their order, i.e. given the structure { ptr, i32, f64 }
, ptr
is at index 0
, i32
is at index 1
, and so on.
Structures are padded, and their size is thus determined by the order of each element, and the size/align of each element. Note that structures satisfying one of the following conditions are zero-sized:
- No elements
- Every element is zero-sized
{ ptr, i64, i64 }
{ }
{ i32, i32 }, f64, [i8; 64] }
Functions
Functions are made up of a name, a call signature, a list of stack slots, and a list of basic blocks.
Call Signature
Return Value
The return value can have the noalias
and nonnull
attributes. They work the same as with arguments.
Function Arguments
Function arguments can have a type and an attribute list, the following attributes exist and can all only be applied to pointers.
noalias
This says that a pointer will not alias with any other pointers accessible to the function it is passed to, i.e.
there is no way it can get another pointer that aliases the same memory (without itop
/offset
/etc).
If the function does get memory that aliases the
noalias
pointer, the behavior is undefined.
nonnull
This says that the pointer is guaranteed to not be null.
If the pointer is null, the behavior is undefined.
byval(<n>)
Note: This attribute is not intended for public use.
The reason is that this exists exclusively for internal ABI legalization code to generate, not for user code to generate.
If you want to get this behavior, just pass an aggregate type by-value and the backend will handle it.
This is a way of encoding a specific ABI constraint, i.e. "by-value" passing on the stack.
When this is used, any callers are expected to push <n>
bytes onto the stack, the pointer value
itself is the beginning of that stack allocation.
Stack Slots
Stack slots are how stack memory is allocated in SIR. They explicitly mark all the (static) stack memory that will be needed by a function, all this memory is allocated before the entry block of a function is entered.
$name = stack T
T
is what defines the specific stack slot, as the slot is allocated to have exactly enough space for a T
, and
has the correct alignment for a T
.
Note: While data of types besides
T
can be stored into/read from the slot, the type of the data must fit within the layout bounds ofT
. Do keep in mind that doing so will prevent the data from being promoted into virtual registers.If it does not, the behavior is undefined due to out-of-bounds accesses or unaligned accesses.
Pointers into this memory are obtained through the stackslot
instruction, which yields a ptr
to a specified slot.
Stack memory is not guaranteed to be maintained unless it escapes a function, it simply provides a way for languages
like C to easily represent variables and whatnot. Stack memory can be legally promoted into SSA values provided
the pointer value is not observed in any way (i.e. is not used in any way besides either load
ing from that memory
or store
ing to that memory).
Consider this implementation of max
in C:
int max(int x, int y) {
if (x < y) {
return y;
} else {
return x;
}
}
A C frontend could translate it naively into code that uses stack slots for every variable and the return value, and simply generates loads/stores when those values are accessed/modified. This allows the front-end to be much simpler, and the middle-end can use the correct algorithms it already has to promote these values into registers where possible.
One such translation looks like this:
fn i32 @max(i32, i32) {
$x = stack i32
$y = stack i32
$ret = stack i32
entry(i32 %0, i32 %1):
%x = stackslot $x
%y = stackslot $y
%ret = stackslot $ret
store i32 %0, ptr %x
store i32 %1, ptr %y
%5 = load i32, ptr %x
%6 = load i32, ptr %y
%7 = icmp slt i32 %5, %6
condbr bool %7, if.then, if.else
if.then:
%8 = load i32, ptr %y
store i32 %8, ptr %ret
br exit
if.else:
%9 = load i32, ptr %x
store i32 %9, ptr %ret
br exit
exit:
%10 = load i32, ptr %ret
ret i32 %10
}
While this is very ugly code, it's also very fast for a front-end to generate and is obviously correct. This
can then be optimized by Sapphire with mem2reg
(the stack -> register promotion pass) into the code that the
front-end wanted:
fn i32 @max(i32, i32) {
entry(i32 %x, i32 %y):
%0 = icmp slt i32 %x, %y
condbr bool %7, if.then, if.else
if.then:
br exit(i32 %y)
if.else:
br exit(i32 %x)
exit(i32 %ret):
ret i32 %ret
}
Other optimizations could turn this into sel
, but the initial transform into SSA values (and therefore virtual registers)
is the key one.
Basic Blocks
A basic block is a container for a list of instructions, all of which are executed in the order they appear in a given block.
This effectively makes basic blocks a small-scale linear IR, since a given basic block contains exclusively IR that is executed linearly without branching.
; implements the quadratic formula, given f64 %a, f64 %b and f64 %c
block:
%0 = fneg f64 %b ; -b
%1 = fconst f64 2.0
%2 = call f64 @pow(f64 %b, f64 %1) ; b^2
%3 = fmul f64 %a, %c ; ac
%4 = fconst f64 4.0
%5 = fmul f64 %3, %4 ; 4ac
%6 = fsub f64 %2, %5 ; b^2 - 4ac
%7 = call f64 @sqrt(f64 %6) ; sqrt(b^2 - 4ac)
%8 = fmul f64 %a, %1 ; 2a
%9 = fadd f64 %0, %7 ; -b + sqrt(b^2 - 4ac)
%x1 = fdiv f64 %10, %1 ; (-b + sqrt(b^2 - 4ac)) / 2.0
%10 = fsub f64 %0, %7 ; -b - sqrt(b^2 - 4ac)
%x2 = fdiv f64 %10, %1 ; (-b - sqrt(b^2 - 4ac)) / 2.0
; x1 = + solution
; x2 = - solution
Basic blocks have exactly one terminator, these are the last instruction in a given basic block. They are some sort of branching-ish instruction that moves control somewhere else.
Basic block can also contain one or more parameters, these implement the φ (phi
) function found in SSA-based IRs. When
jumping to a block with a parameter, different control flow paths can pass different values for the parameter, effectively
implementing phi
s while automatically enforcing the ideal phi
properties just through the structure of the IR.
Note: This avoids the usual special-casing of instructions in transform passes, LLVM has to treat
phi
as magic and move it around differently than anything else, but it's still an instruction.This also lends well to "magic" instructions, so things like
landingpad
andinvoke
would be representable in a normal way instead of adding magical rules like LLVM had to.
Consider the following IR:
entry:
condbr bool %0, one, two
one:
%1 = iconst i32 16
br merge(%1)
two:
%2 = iconst i32 24
br merge(%2)
merge(i32 %3):
ret i32 %3
It correctly implements the following C code:
int x;
if (%0)
x = 16;
else
x = 24;
return x;
Instructions
Miscellaneous
'call
‘ - Call Function
Calls a function, passing zero or more arguments to that function and eventually returning when the called function returns.
fn i32 @add(i32, i32)
fn void @do_something()
fn i32 @something_else() {
%entry:
%0 = call i32 @add(i32 0, i32 1)
call void @do_something()
ret i32 %0
}
Calls to void
functions cannot be bound to a name, as they do not really have a 'value’.
Syntax:
(<val> =)? call <ty> <fn-name>((<ty> <val>) (, <ty> <val>)*)
'indirectcall
' - Indirect Call
Calls a function pointer. Otherwise, equivalent to call
but taking a function type and a pointer
instead of taking the name of a (declared) function.
%0 = globaladdr @printf
%1 = globaladdr @some_string
%2 = iconst i32 42
%1 = indirectcall i32 (ptr, ...), ptr %0(ptr %1, i32 %2)
Syntax:
(<val> =)? call <fn-sig>, <ty> <val>((<ty> <val>) (, <ty> <val>)*)
'icmp
‘ - Integral Comparison
Compares two given integer, boolean or pointer values using a given comparison operation, and returns a bool
representing the result of the comparison.
%0 = icmp eq i32 %a, %b
%1 = icmp sgt i32 %a, %b
%2 = and bool %0, %1
condbr bool %2, if.true, if.false
There are several possible comparisons:
Opcode | Operation |
---|---|
eq | \( = \) |
ne | \( \ne \) |
ugt | \( > \) (unsigned) |
ult | \( < \) (unsigned) |
uge | \( \ge \) (unsigned) |
ule | \( \le \) (unsigned) |
sgt | \( > \) (signed) |
slt | \( < \) (signed) |
sge | \( \ge \) (signed) |
sle | \( \le \) (signed) |
Note that for
icmp
usingbool
s,true
acts as-if it was the value1
andfalse
acts as-if it was the value0
. This means thattrue > false
whether the comparison is signed or unsigned.
Syntax:
<val> = icmp <opcode> <ty> <val>, <val>
‘fcmp
‘ - Floating-point Compare
Compares two given floating-point values using a given comparison operation, and returns a bool
representing the result of the comparison.
%0 = fcmp eq f32 %a, %b
%1 = fcmp ogt f32 %a, %b
%2 = and bool %0, %1
condbr bool %2, if.true, if.false
Comparisons can be ordered or unordered. Ordered comparisons check that neither operand is a NaN value, whereas unordered comparisons check if either operand is a NaN value.
There are several possible comparisons:
Opcode | Operation |
---|---|
ord | Ordered |
uno | Unordered |
oeq | Ordered and \( = \) |
one | Ordered and \( \ne \) |
ogt | Ordered and \( > \) |
olt | Ordered and \( > \) |
oge | Ordered and \( > \) |
ole | Ordered and \( > \) |
ueq | Unordered or \( = \) |
une | Unordered or \( \ne \) |
ugt | Unordered or \( > \) |
ult | Unordered or \( > \) |
uge | Unordered or \( > \) |
ule | Unordered or \( > \) |
Syntax:
<val> = fcmp <opcode> <ty> <val>, <val>
'sel
‘ - Select based on Condition
Selects one of two given values based on a bool
condition.
%0 = sel bool %b, i64, %a, i64 %b
This is effectively a hint to the backend to try to use a cmov
(x86), csel
(arm64) or similar.
Syntax:
<val> = sel <ty>, <ty> <val>, <val>, <val>
Terminators
‘br
’ - Unconditional Branch
Unconditionally moves execution to another basic block.
br next
This is effectively the goto
construct.
Syntax:
br <label>
‘condbr
’ - Conditional Branch
Conditionally chooses one of two branches to move execution to, depending on a bool
condition.
; equivalent to the following C code:
;
; if (%0)
; goto %if.true;
; else
; goto %if.false;
;
%0 = iconst i32 0
%1 = icmp eq i32 %x, %0
condbr bool %1, if.true, if.false
condbr
is only able to branch based on bool
values. The first branch is always taken for a true
value, and the second is always taken for a false
value.
if.true
and if.false
cannot be the same block (even with different arguments), they must be different blocks.
Syntax:
condbr <ty> <val>, <label1>( (<args>) )?, <label2>( (<args>) )?
‘unreachable
‘ - Unreachable instruction
Similar to __builtin_unreachable
: Effectively asserts that the end of the block cannot be reached in a valid program. If the containing block is reached and the unreachable
is executed, the program's behavior is undefined.
unreachable
This can be used for aggressive optimizations, as blocks that have
unreachable
as their terminator and do not call any other functions can be optimized into a block only containingunreachable
, and any branches that go to an empty (except forunreachable
) block can be assumed to be never taken.
This can lead to some extremely aggressive transformations. unreachable
is effectively "viral,” one block containing it can "infect" many blocks that go to it (directly or not).
When optimizations are enabled, branches that are determined to be unreachable may be removed, or may remain as blocks that simply contain unreachable
.
When the actual compilation step is performed, unreachable
may be transformed into a trap instruction, or the block may simply be removed if it was unable to be removed during optimization.
‘ret
‘ - Return
Returns from a function to a given caller.
ret i32 64
ret
can return either void
(in which case nothing is returned), or a value of a given type.
Syntax:
ret ((void) | (<ty> <val>))
Bitwise Operations
‘and
’ - Bitwise AND
Performs a bitwise AND between two integer or boolean values.
%0 = and i32 %x, %y
The operands must have the same type. For booleans, this is equivalent to %x && %y
. For integers, each bit position in the new value is calculated using the AND truth table.
Syntax:
<result> = and <ty> <val>, <val>
‘or
’ - Bitwise OR
Performs a bitwise OR between two integer or boolean values.
%0 = or i32 %x, %y
The operands must have the same type. For booleans, this is equivalent to %x || %y
. For integers, each bit position in the new value is calculated using the OR truth table.
Syntax:
<result> = or <ty> <val>, <val>
‘xor
’ - Bitwise XOR
Performs a bitwise XOR between two integer or boolean values.
%0 = xor i32 %x, %y
The operands must have the same type. For booleans, this is equivalent to %x != %y
. For integers, each bit position in the new value is calculated using the XOR truth table.
Syntax:
<result> = xor <ty> <val>, <val>
‘shl
’ - Shift Left
Shifts a given value left by a number of bits.
%0 = shl i32 %a, %b
The operands must be the same type. Formally, this returns exactly \( ({a \cdot 2^{b}}) \) \( \mathrm{mod} \) \( 2^{N} \), where \( a \) is the first operand, \( b \) is the second, and \( N \) is the width (in bits) of the integer type.
It is undefined behavior if \( b > N \).
Syntax:
<result> = shl <ty> <val>, <val>
‘lshr
’ - Logical Shift Right
Performs a logical (unsigned) right shift. This evaluates to the first operand shifted right by the second operand, with zero fill.
%0 = lshr i32 %a, %b
The operands must be the same type. Formally, this returns exactly \( ({a \cdot 2^{b}}) \) \( \mathrm{mod} \) \( 2^{N} \), where \( a \) is the first operand, \( b \) is the second, and \( N \) is the width (in bits) of the integer type.
It is undefined behavior if \( b > N \).
Syntax:
<result> = lshr <ty> <val>, <val>
‘ashr
’ - Arithmetic Shift Right
Performs an arithmetic (signed) right shift. This evaluates to the first operand shifted right by the second operand, with the sign bit of the result being filled by the sign bit of the first operand.
%0 = ashr i32 %a, %b
The operands must be the same type.
It is undefined behavior if \( b > N \), where \( b \) is the second operand, and \( N \) is the width (in bits) of the operand type.
Syntax:
<result> = ashr <ty> <val>, <val>
Integer Arithmetic
‘iadd
’ - Integer Addition
Returns the sum of the two arguments.
%0 = iconst i32 32
%1 = iconst i32 4
%2 = iadd i32 %0, %1
‘isub
’ - Integer Subtraction
Returns the difference of the two arguments.
%0 = iconst i32 32
%1 = iconst i32 4
%2 = isub i32 %0, %1
‘imul
’ - Integer Multiplication
Returns the product of the two arguments.
%0 = iconst i32 32
%1 = iconst i32 4
%2 = imul i32 %0, %1
‘udiv'
- Integer Division (Unsigned)
Returns the quotient of the unsigned division of the two arguments.
%0 = iconst i32 32
%1 = iconst i32 4
%2 = udiv i32 %0, %1
‘sdiv
’ - Integer Division (Signed)
Returns the quotient of the unsigned division of the two arguments.
%0 = iconst i32 32
%1 = iconst i32 4
%2 = sdiv i32 %0, %1
‘urem
’ - Integer Remainder (Unsigned)
Returns the remainder of the unsigned division of the two arguments.
%0 = iconst i32 32
%1 = iconst i32 4
%2 = urem i32 %0, %1
‘srem
’ - Integer Remainder (Signed)
Returns the remainder of the signed division of the two arguments.
%0 = iconst i32 32
%1 = iconst i32 4
%2 = srem i32 %0, %1
Floating-Point Arithmetic
‘fneg
’ - Floating-point Negation
Returns the negation of a floating-point value.
%0 = fconst f64 -1.2
%1 = fneg f64 %0
‘fadd
’ - Floating-point Addition
Returns the sum of the two floating-point values.
%0 = fconst f64 -1.2
%1 = fconst f64 5.5532309
%2 = fadd f64 %0, %1
‘fsub
’ - Floating-point Subtraction
Returns the difference of the two floating-point values.
%0 = fconst f64 -1.2
%1 = fconst f64 5.5532309
%2 = fsub f64 %0, %1
‘fmul
’ - Floating-point Multiplication
Returns the product of the two floating-point arguments.
%0 = fconst f64 -1.2
%1 = fconst f64 5.5532309
%2 = fmul f64 %0, %1
‘fdiv
’ - Floating-point Division
Returns the quotient of the floating-point division on the two arguments.
%0 = fconst f64 -1.2
%1 = fconst f64 5.5532309
%2 = fdiv f64 %0, %1
‘frem
’ - Floating-point Remainder
Returns the remainder of the floating-point division on the two arguments.
%0 = fconst f64 -1.2
%1 = fconst f64 5.5532309
%2 = frem f64 %0, %1
Memory
'alloca
‘ - Dynamically Allocate in Stack Frame
Allocates memory in the current function’s stack frame. The memory is always automatically returned when the function in which the memory was allocated returns to its caller.
This is effectively the
alloca
function in C.
%0 = alloca [i64, 512]
Syntax:
<val> = alloca <ty>
‘load
’ - Load Value from Address
Loads a value of a given type from a given address.
; equivalent C code:
;
; int32_t x = *((int32_t*)p);
;
%x = load i32, ptr %p
Syntax:
<result> = load (volatile)? <ty>, <ty> <val>
Given load T, ptr %p
, the pointer %p
must be valid and point to an allocation of least sizeof(T)
bytes.
It is undefined behavior to store to a misaligned pointer, or to a pointer that points outside the bounds of objects allocated by the program.
A load can be marked as volatile
, which signals that the load must not be moved relative to any other volatile
loads or stores (or any operations that could potentially execute volatile
loads or stores, e.g. calling an optimizer-opaque function). It also signals that the load cannot be omitted under any circumstances.
‘store
’ - Store Value to Address
Stores a value of a given type to a given address.
; equivalent C code:
;
; *((int32_t*)p) = x;
;
store i32 %x, ptr %p
Syntax:
store (volatile)? <ty> <val>, <ty> <val>
Given store T %t, ptr %p
, the pointer %p
must be valid and point to at least sizeof(T)
bytes.
It is undefined behavior to store to a misaligned pointer, or to a pointer that points outside the bounds of objects allocated by the program.
A store can be marked as volatile
, which signals that the store must not be moved relative to any other volatile
loads or stores. It also signals that the store cannot be omitted under any circumstances.
Example:
%0 = call ptr @malloc(i64 4)
store i32 %x, ptr %0
‘offset
’ - Calculate Pointer Offset
Performs C-like pointer arithmetic.
; equivalent C code:
;
; void* new = ((int32_t*)p) + offset;
;
%new = offset i32, ptr %p, i64 %offset
There are no restrictions on what pointer values may be computed, but keep in mind that the resulting pointer may point outside the bounds of allocated objects may be misaligned, etc. Loading or storing to such addresses is undefined, but this is irrelevant to offset
.
The second operand must be an integer. If the integer is smaller than the native pointer size, sign-extending occurs. If the integer is larger, truncation is performed.
Aggregate Access
‘extract
’ - Extract Value from Aggregate
Extracts a value from an aggregate value at a given index.
%0 = extract i64, { i64, i64, i64 } %obj, 0
The index is always a constant, even for arrays. If runtime indexing is required, the array/structure must
be stored in memory and the offset
instruction should be used.
Syntax:
<val> = extract <ty>, <ty> <val>, <index>
‘insert
’ - Insert Value into Aggregate
Inserts a value into an aggregate at a given index.
%0 = undef { ptr, i8, f64 }
%1 = insert { ptr, i8, f64 } %0, f64 %x, 2
The index is always a constant, even for arrays. If runtime indexing is required, the array/structure must
be stored in memory and the offset
instruction should be used.
‘elemptr
’ - Get Pointer to Element
This is the way of getting pointers to the members of in-memory aggregates.
%0 = alloca { i64, i32, ptr, i8 }
%1 = elemptr { i64, i32, ptr, i8 }, ptr %0, 3
Conversions
‘sext
’ - Sign-Extend Integer
Sign-extends an integer of a smaller width to one of a larger width.
%0 = iconst i8 -15
%1 = sext i32, i8 %0
‘zext
’ - Zero-Extend Integer
Zero-extends an integer to a larger type.
‘trunc
’ - Truncate Integer
Truncates an integer to a smaller integer type.
‘itob
‘- Integer to Boolean
Converts an integer into a bool
. Non-zero values ⇒ true
, while zero ⇒ false
%0 = iconst i32 15
%1 = itob bool, i32 %0
‘btoi
‘ - Boolean to Integer
Converts a boolean into an integer. true
⇒ 1
while false
⇒ 0
.
%0 = bconst bool true
%1 = btoi i32, bool %0
‘sitof
‘- Signed Integer to Floating-point
Converts a signed integer into the nearest floating-point value.
%0 = iconst i32 -3
%1 = sitof f32, i32 %0
‘uitof
‘- Unsigned Integer to Floating-point
Converts an unsigned integer into the nearest floating-point value.
%0 = iconst i32 16
%1 = uitof f32, i32 %0
‘ftosi
‘ - Floating-point to Signed Integer
Converts a floating-point value into the nearest signed integer
%0 = fconst f32 -1.2
%1 = ftosi i32, f32 %0
‘ftoui
‘ - Floating-point to Unsigned Integer
Converts a floating-point value into the nearest unsigned integer
%0 = fconst f32 1.4
%1 = ftoui i32, f32 %0
'fext
' - Floating-point Extend
Extends a smaller floating-point type to a larger floating-point type.
%0 = fconst f32 1.4
%1 = fext f64, f32 %0
'ftrunc
' - Floating-point Truncate
Truncates a larger floating-point type to a smaller floating-point type.
%0 = fconst f64 1.4
%1 = ftrunc f32, f32 %0
‘itop
‘ - Integer to Pointer
Converts an integer into a pointer with the equivalent bit-pattern. If the integer is smaller than the native pointer size, zero-extending occurs. If the integer is larger, truncation is performed.
%1 = iconst i32 15
%0 = itop ptr, i32 %1
‘ptoi
‘ - Pointer to Integer
Converts a pointer into the equivalent bit-pattern in an integer. If the integer result type is smaller, truncation is performed. If the result type is larger, zero-extending occurs.
%0 = ptoi i64, ptr %1
Constant Materialization
'stackslot
' - Pointer to Stack Memory
Materializes a ptr
that points to memory allocated by a stack
slot. These will always
produce the same ptr
for the duration that a given function is executing (the value across
multiple calls to the function containing this is unspecified).
; given that $x = stack i32
%0 = stackslot $x
Syntax:
<val> = stackslot <stack slot name>
'globaladdr
' - Pointer to Global
Materializes a ptr
that points to some global entity. This could be a function, a global
variable, etc.
; given that fn i32 @printf(ptr, ...)
%0 = globaladdr @printf
Syntax:
<val> = globaladdr <global name>
'bconst
' - Boolean Constant
Materializes a bool
with either true
or false
.
%0 = bconst bool true
%1 = bonst bool false
Syntax:
<val> = bconst <ty> ((true) | (false))
'iconst
' - Integer Constant
Materializes an integer with a given constant value.
Integer constants are made up of digits and an optional prefix, and can be in one of four forms:
- Binary, must have prefix
0b
and is made up of0
and1
- Octal, must have prefix
0o
and is made up of[0-7]
. - Decimal, must have no prefix and no leading zeroes and is made up of
[0-9]
. They can have a leading-
which will make them negative. - Hex, must have prefix
0x
and is made up of[0-9a-zA-Z]
Constants can be negative or positive, but they are converted to their unsigned bit-pattern regardless of sign. Since integers are two’s complement, this means that the constant -1
is equivalent to 0xFF
or 255
for i8
, 0xFFFF
for i16
, etc.
%0 = iconst i32 0b11
%1 = iconst i64 0xFA
%2 = iconst i8 -2
%3 = iconst i16 16
%4 = iconst i32 0o777
Syntax:
<val> = iconst <ty> (-)?((0b) | (0o) | (0x))[0-9a-fA-F]+
'fconst
' - Floating-point Constant
Materializes a floating-point constant from a given floating-point literal.
Floating-point literals can be in decimal form with a .
, scientific notation, C's hex float format, or raw hex (prefix 0f
to make distinct from hex integer values) denoting the underlying byte values.
- Standard decimal form:
([0-9]+).([0-9]+)
(ex.0.0039
) - Scientific notation:
.([0-9]+)(.[0-9]+)?e(+|-)([0-9]+)
(ex.1.749e-3
) - Raw hex:
0f([0-9a-fA-F]+)
(ex.0f3FD55558B21DC9EA
) NaN
for an unspecified NaN value
%0 = fconst f64 0f3FD55558B21DC9EA
%1 = fconst f32 3.14195
%3 = fconst f32 1.3e100
‘undef
’ - Undefined Value
Materializes an uninitialized object. Reading the value yields a non-deterministic value, but it is not undefined behavior.
Reading the same undef
value multiple times will yield the same value.
%0 = undef { i32, i32 }
%1 = undef ptr
%2 = undef f64
'null
' - Null Value
Materializes a zero-ed object. Reading the value results in whatever an all-zero-bits representation means for that type.
%0 = null ptr
%1 = null i32
%2 = null { ptr, i64, i64 }
Compiler Internals
SIR Parser
Parsing SIR from input text is a effectively divided into two stages:
- Running a Pest-generated parser to get the input text into a parse tree
- Running a dual type-checker/parser on the parse tree to generate SIR from it
The second part is the part actually implemented in the compiler.
Preprocessing
The parser effectively runs multiple stages of pre-processing instead of walking the parse tree a single time. This pre-processing enables use-before-definition at both the function and the block level.
Each time a function subtree is found, the function is declared and the function's body is put onto a work list, and once every function in the file is declared the parser goes through the list and actually parses the bodies (and properly defines the functions).
The same is done at the basic-block level inside function bodies to make it possible to branch to blocks that are defined after the branch instruction that targets them. This also makes it easier to preserve the block order that was parsed, even though technically this is irrelevant (as long as the entry block is in the right place).
What this actually means for the instruction parsing code is that all valid block/function names are known by the time they execute, so the names not existing can only mean that the input is invalid.
Names -> Values
The parser relies on the SSA property of the program to map names (e.g. %x
) back to
the actual values that it has generated for those names. While this property is checked (when
new values are introduced it's checked to make sure they did not exist before), it is checked
while instructions are being parsed.
When an instruction or block parameter is parsed, its name (referred to by a LocalIdent
) is
put into the resolver
map inside of SIRParser
. That entry is then mapped to the value referred
to by the name, so that later that value can be found.
SIR In-Memory Representation
SIR is represented as a (relatively obscured) graph in-memory. Representation for a given SIR function is divided between two different types:
DataFlowGraph
: owns all the "things" in a function, and models data-flow between valuesLayout
: orders the entities in a data-flow graph into lists of blocks, and lists of instructions in those blocks.
DataFlowGraph
: The DFG
This is basically a massive overly-complicated lookup table. It stores lots of arenas that store everything in a function that matters, whether it be instructions, blocks, referenced function signatures, etc. Everything is stored inside of arenas, and 'references' to those entities is passed around by indices into those arenas.
Layout
: The Layout
Codegen Pipeline
Register Allocation
Register allocators in Sapphire operate on MIR in a very specific format. This document will use x86-64 MIR syntax, but the basic idea applies to all architectures.
Expected Input
The expectation is that the MIR input is in non-SSA form using an unlimited set of virtual registers (v-regs).
All φ nodes are expected to have already been eliminated and replaced with mov
instructions. Effectively,
it is assumed that if the MIR was executed on a machine that had an infinite number of registers, it would
have the behavior of the original SIR.
φ-elimination can be performed however the backend sees fit (e.g. the x86-64 backend performs critical edge
splitting over the SIR before lowering and translates φ-nodes into naive copies), but the behavior must be modeled
through copies (and therefore be visible to the register allocator with MachInst::is_move
).
Constraints
Register constraints are expected to be expressed via instructions that operate directly on specific physical registers. The register allocator is expected to obey these constraints.
Stack Frames
It is expected that the prologue/epilogue code not be present (if necessary), as the rewriter will emit the prologue and epilogue where necessary.
Parameters & Return Values
Parameters located in registers are the most rigid expectation the register allocator has, they are expected to be
implemented via mov
s from physical registers into v-regs. The stack frame stores which ones are used for this.
Return values must work in a similar way if they write to registers, they are expected to be mov
s
from v-regs into physical registers as determined by the ABI.
Example:
foo:
mov vi4, edi ; parameter `mov`s start here
mov vi3, esi ; .
mov vi2, edx ; .
mov vi1, ecx ; .
mov vi0, vi1 ; regular codegen starts here
add vi0, vi2
; ...
mov eax, vi0 ; return `mov`s start here
ret ; regular codegen continues here
Tools
Sapphire comes with several CLI tools for working with SIR, this is the documentation on how those tools work and how to use them.
filetest
: A file-driven testing toolsirc
: A CLI interface to the Sapphire backend
filetest
: File-driven Test Runner
This is a very similar (albeit extremely cut-down) tool to FileCheck
from LLVM. It's a simple, fast way of writing
data-driven regression tests for the compiler.
Tests and Runners
Tests are defined in terms of "subtests" and "cases" for those subtests. Subtests are subsets of
the complete list of test cases that are run through a specific "runner" that is built into
filetest
. These runners are written in the runners/
subdirectory of filetest/
, and are
enabled in runner.rs
.
Note: The reason it is built this way is to allow better access to compiler APIs in regression tests, as writing the runners in Rust gives full access to both the Sapphire API and the Rust language.
This tool may end up changing to more of a command-based model where tests are defined in terms of commands to execute that output data to be matched on, but as of right now runners are hard-coded.
Runners operate on a subset of the total list of test cases, each subdirectory in the tests/
directory
is considered to correspond to a subtest. tests/parse/
=> parse
subtest, tests/domtree/
=> domtree
subtest, etc. These are automatically discovered when filetest
is executed.
Format
filetest
works with the idea of "checks," and it provides several types of checks
that can be useful in different circumstances.
All filetest
test cases (files) must start with a header of the following format:
; <CHECK TYPE>
where <CHECK TYPE>
is one of the following:
STANDARD
: NormalCHECK
directive style, the output should contain the content of each CHECK lineMATCH-ENTIRE
: The output of a given test should exactly match the rest of the file (starting at the beginning of the next line)MATCH-SECTION
: The output should contain a given output
Note that the entire file is still given as input for the test, comments are not stripped out. This should not matter for your test.
Check Types
STANDARD
This is very similar to CHECK
directives in LLVM's FileCheck
. Any line that (after whitespace
is trimmed from the beginning) follows the following format is considered to be a CHECK
directive:
; CHECK: <check content>
Each of these checks must appear in order in the output of the test, although they may have other lines inserted between them. Consider the following tests:
Test Case | Output |
---|---|
llvm<br/>; STANDARD<br/>; CHECK: a<br/>; CHECK: b | whatever<br/>a<br/>whatever<br/>b |
This would be considered to pass the test. While a
and b
are not one-after-the-other in the test's output,
they are in the correct order relative to one another, and they are both present.
Test Case | Output |
---|---|
llvm<br/>; STANDARD<br/>; CHECK: a<br/>; CHECK: b | whatever<br/>b<br/>whatever<br/>a |
This would be considered to not pass the test. While a
and b
are both present in the output, they
are not in the correct relative order.
If it is necessary to not have anything between checks, use MATCH-SECTION
.
MATCH-ENTIRE
This directive is simple. It takes everything in the file after the ; MATCH-ENTIRE
directive
and expects the output to match this exactly. This is really only used for the parser regression tests
where the parser/writer should output the same thing that was input.
; MATCH-ENTIRE
fn i32 @main(i32, ptr) {
entry(i32 %argc, ptr %argv):
%0 = iconst i32 0
ret i32 %0
}
MATCH-SECTION
This directive allows defining an exact textual structure that should appear somewhere in the output. After
the MATCH-SECTION
directive, an ;;
line denotes the end of the section to look for in the output, and
the first line after is expected to be an empty ;
line.
Note: While a bit odd, this makes it easier to read the test cases and allows blank lines to be part of the tests.
; MATCH-SECTION
;
; fn void @test() {
; entry:
; ret void
; }
;;
fn void @test() {
entry:
ret void
}
sirc
: SIR Compiler
x86-64 Backend
This section documents the different constraints, assumptions and limitations of the current x86-64 backend. It also documents the specific implementation quirks of the backend.
φ Elimination
φ elimination (also known as SSA destruction) is performed in the naive "translate into copies" method, but this has
known limitations (see lost copy problem). It is assumed that the split-crit-edges
pass has been run over the SIR module before codegen begins, if this is not done
incorrect code can be emitted.
Type Representation
bool
bool
is represented the same way as an i8
(i.e. in byte
-sized partial registers), with false
being zero
and true
being 1
.
condbr
and icmp
/fcmp
Interactions
Currently, the backend has two states for implementing condbr
:
- The comparison is the instruction directly before the
condbr
- The comparison is not directly before the
condbr
This is a bit of a hack, but it's to ensure that CPU flags are maintained correctly. The two states are implemented like so:
Directly Before
This is how you'd want it to be emitted, a cmp
-like instruction followed by
a jump that models the condition in the comparison. You get code like this emitted:
cmp rax, rdx
jl .TRUE
jmp .FALSE
In this case, the icmp
/fcmp
does not have a constrained register, and is
only emitted to manipulate CPU flags.
Not Directly Before
This is the case that needs to be optimized better, but for now this is what it does. The condbr
places a register constraint on the comparison instruction, and it emits instructions to jump to the
true branch if that register is 1, and jump to the false branch otherwise. It effectively
generates this code:
test R, R
jnz .TRUE
jmp .FALSE
When the comparison is emitted, it is expected to also emit a set*
instruction to copy the condition
to a register, so that the test
previously emitted will work. The end result should look like this:
cmp rax, rdx
xor R, R
setle R
...
test R, R
jnz .TRUE
jmp .FALSE
Calls
Calls are treated as-if they didn't affect the stack layout at all, and are lowered such that they don't affect the prologue/epilogue at all.
All calls that require stack space for parameter passing and whatnot are expected to generate their own stack manipulation code.
System V
For System V, this is implemented via push
instructions and
an add rsp, N
after the call
that fixes back up the stack.
ABI Legalization
The x86-64 backend expects a specific subset of SIR as input.
The gist of it is the follows:
- All aggregates that would be put into registers according to calling convention rules remain as aggregates in parameters
- All other aggregates that would be passed in memory