mirror of
				https://github.com/golang/go.git
				synced 2025-10-31 08:40:55 +00:00 
			
		
		
		
	 012917afba
			
		
	
	
			The architecture-specific details will be updated and expanded in a subsequent CL (or series thereof). Update #10096 Change-Id: I59c6be1fcc123fe8626ce2130e6ffe71152c87af Reviewed-on: https://go-review.googlesource.com/11954 Reviewed-by: Russ Cox <rsc@golang.org>
		
			
				
	
	
		
| <!--{
 | ||
| 	"Title": "A Quick Guide to Go's Assembler",
 | ||
| 	"Path":  "/doc/asm"
 | ||
| }-->
 | ||
| 
 | ||
| <h2 id="introduction">A Quick Guide to Go's Assembler</h2>
 | ||
| 
 | ||
| <p>
 | ||
| This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler.
 | ||
| The document is not comprehensive.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail
 | ||
| <a href="http://plan9.bell-labs.com/sys/doc/asm.html">elsewhere</a>.
 | ||
| If you plan to write assembly language, you should read that document, although much of it is Plan 9-specific.
 | ||
| The current document provides a summary of the syntax and the differences with
 | ||
| what is explained in that document, and
 | ||
| describes the peculiarities that apply when writing assembly code to interact with Go.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
 | ||
| Some of the details map precisely to the machine, but some do not.
 | ||
| This is because the compiler suite (see
 | ||
| <a href="http://plan9.bell-labs.com/sys/doc/compiler.html">this description</a>)
 | ||
| needs no assembler pass in the usual pipeline.
 | ||
| Instead, the compiler operates on a kind of semi-abstract instruction set,
 | ||
| and instruction selection occurs partly after code generation.
 | ||
| The assembler works on the semi-abstract form, so
 | ||
| when you see an instruction like <code>MOV</code>
 | ||
| what the tool chain actually generates for that operation might
 | ||
| not be a move instruction at all, perhaps a clear or load.
 | ||
| Or it might correspond exactly to the machine instruction with that name.
 | ||
| In general, machine-specific operations tend to appear as themselves, while more general concepts like
 | ||
| memory move and subroutine call and return are more abstract.
 | ||
| The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| The assembler program is a way to parse a description of that
 | ||
| semi-abstract instruction set and turn it into instructions to be
 | ||
| input to the linker.
 | ||
| If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
 | ||
| are many examples in the sources of the standard library, in packages such as
 | ||
| <a href="/pkg/runtime/"><code>runtime</code></a> and
 | ||
| <a href="/pkg/math/big/"><code>math/big</code></a>.
 | ||
| You can also examine what the compiler emits as assembly code
 | ||
| (the actual output may differ from what you see here):
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| $ cat x.go
 | ||
| package main
 | ||
| 
 | ||
| func main() {
 | ||
| 	println(3)
 | ||
| }
 | ||
| $ GOOS=linux GOARCH=amd64 go tool compile -S x.go        # or: go build -gcflags -S x.go
 | ||
| 
 | ||
| --- prog list "main" ---
 | ||
| 0000 (x.go:3) TEXT    main+0(SB),$8-0
 | ||
| 0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB)
 | ||
| 0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB)
 | ||
| 0003 (x.go:4) MOVQ    $3,(SP)
 | ||
| 0004 (x.go:4) PCDATA  $0,$8
 | ||
| 0005 (x.go:4) CALL    ,runtime.printint+0(SB)
 | ||
| 0006 (x.go:4) PCDATA  $0,$-1
 | ||
| 0007 (x.go:4) PCDATA  $0,$0
 | ||
| 0008 (x.go:4) CALL    ,runtime.printnl+0(SB)
 | ||
| 0009 (x.go:4) PCDATA  $0,$-1
 | ||
| 0010 (x.go:5) RET     ,
 | ||
| ...
 | ||
| </pre>
 | ||
| 
 | ||
| <p>
 | ||
| The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
 | ||
| for use by the garbage collector; they are introduced by the compiler.
 | ||
| </p> 
 | ||
| 
 | ||
| <!-- Commenting out because the feature is gone but it's popular and may come back.
 | ||
| 
 | ||
| <p>
 | ||
| To see what gets put in the binary after linking, add the <code>-a</code> flag to the linker:
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| $ go tool 6l -a x.6        # or: go build -ldflags -a x.go
 | ||
| codeblk [0x2000,0x1d059) at offset 0x1000
 | ||
| 002000	main.main            | (3)	TEXT	main.main+0(SB),$8
 | ||
| 002000	65488b0c25a0080000   | (3)	MOVQ	2208(GS),CX
 | ||
| 002009	483b21               | (3)	CMPQ	SP,(CX)
 | ||
| 00200c	7707                 | (3)	JHI	,2015
 | ||
| 00200e	e83da20100           | (3)	CALL	,1c250+runtime.morestack00
 | ||
| 002013	ebeb                 | (3)	JMP	,2000
 | ||
| 002015	4883ec08             | (3)	SUBQ	$8,SP
 | ||
| 002019	                     | (3)	FUNCDATA	$0,main.gcargs·0+0(SB)
 | ||
| 002019	                     | (3)	FUNCDATA	$1,main.gclocals·0+0(SB)
 | ||
| 002019	48c7042403000000     | (4)	MOVQ	$3,(SP)
 | ||
| 002021	                     | (4)	PCDATA	$0,$8
 | ||
| 002021	e8aad20000           | (4)	CALL	,f2d0+runtime.printint
 | ||
| 002026	                     | (4)	PCDATA	$0,$-1
 | ||
| 002026	                     | (4)	PCDATA	$0,$0
 | ||
| 002026	e865d40000           | (4)	CALL	,f490+runtime.printnl
 | ||
| 00202b	                     | (4)	PCDATA	$0,$-1
 | ||
| 00202b	4883c408             | (5)	ADDQ	$8,SP
 | ||
| 00202f	c3                   | (5)	RET	,
 | ||
| ...
 | ||
| </pre>
 | ||
| 
 | ||
| -->
 | ||
| 
 | ||
| <h3 id="constants">Constants</h3>
 | ||
| 
 | ||
| <p>
 | ||
| Although the assembler takes its guidance from the Plan 9 assemblers,
 | ||
| it is a distinct program, so there are some differences.
 | ||
| One is in constant evaluation.
 | ||
| Constant expressions in the assembler are parsed using Go's operator
 | ||
| precedence, not the C-like precedence of the original.
 | ||
| Thus <code>3&1<<2</code> is 4, not 0—it parses as <code>(3&1)<<2</code>
 | ||
| not <code>3&(1<<2)</code>.
 | ||
| Also, constants are always evaluated as 64-bit unsigned integers.
 | ||
| Thus <code>-2</code> is not the integer value minus two,
 | ||
| but the unsigned 64-bit integer with the same bit pattern.
 | ||
| The distinction rarely matters but
 | ||
| to avoid ambiguity, division or right shift where the right operand's
 | ||
| high bit is set is rejected.
 | ||
| </p>
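<p>
Because the assembler borrows Go's operator precedence, an ordinary Go program can serve as a sanity check for both points.
This is only an illustrative sketch: the function names are invented here, and it exercises the Go compiler's constant evaluation, not the assembler's own parser.
</p>

```go
package main

import "fmt"

// goPrecedence evaluates 3&1<<2 under Go's rules, where & and << share
// one precedence level and associate left to right: (3&1)<<2.
func goPrecedence() uint64 {
	return 3 & 1 << 2
}

// cPrecedence spells out the C-like parse, 3&(1<<2), for comparison.
func cPrecedence() uint64 {
	return 3 & (1 << 2)
}

// asUnsigned64 reinterprets a signed value as the unsigned 64-bit
// integer with the same bit pattern.
func asUnsigned64(v int64) uint64 {
	return uint64(v)
}

func main() {
	fmt.Println(goPrecedence())           // 4
	fmt.Println(cPrecedence())            // 0
	fmt.Printf("%#x\n", asUnsigned64(-2)) // 0xfffffffffffffffe
}
```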
 | ||
| 
 | ||
| <h3 id="symbols">Symbols</h3>
 | ||
| 
 | ||
| <p>
 | ||
| Some symbols, such as <code>R1</code> or <code>LR</code>,
 | ||
| are predefined and refer to registers.
 | ||
| The exact set depends on the architecture.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| There are four predeclared symbols that refer to pseudo-registers.
 | ||
| These are not real registers, but rather virtual registers maintained by
 | ||
| the tool chain, such as a frame pointer.
 | ||
| The set of pseudo-registers is the same for all architectures:
 | ||
| </p>
 | ||
| 
 | ||
| <ul>
 | ||
| 
 | ||
| <li>
 | ||
| <code>FP</code>: Frame pointer: arguments and locals.
 | ||
| </li>
 | ||
| 
 | ||
| <li>
 | ||
| <code>PC</code>: Program counter:
 | ||
| jumps and branches.
 | ||
| </li>
 | ||
| 
 | ||
| <li>
 | ||
| <code>SB</code>: Static base pointer: global symbols.
 | ||
| </li>
 | ||
| 
 | ||
| <li>
 | ||
| <code>SP</code>: Stack pointer: top of stack.
 | ||
| </li>
 | ||
| 
 | ||
| </ul>
 | ||
| 
 | ||
| <p>
 | ||
| All user-defined symbols are written as offsets to the pseudo-registers
 | ||
| <code>FP</code> (arguments and locals) and <code>SB</code> (globals).
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
 | ||
| is the name <code>foo</code> as an address in memory.
 | ||
| This form is used to name global functions and data.
 | ||
| Adding <code><></code> to the name, as in <span style="white-space: nowrap"><code>foo<>(SB)</code></span>, makes the name
 | ||
| visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
 | ||
| Adding an offset to the name refers to that offset from the symbol's address, so
 | ||
| <code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| The <code>FP</code> pseudo-register is a virtual frame pointer
 | ||
| used to refer to function arguments.
 | ||
| The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
 | ||
| Thus <code>0(FP)</code> is the first argument to the function,
 | ||
| <code>8(FP)</code> is the second (on a 64-bit machine), and so on.
 | ||
| However, when referring to a function argument this way, it is necessary to place a name
 | ||
| at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
 | ||
| (The meaning of the offset, an offset from the frame pointer, is distinct
 | ||
| from its use with <code>SB</code>, where it is an offset from the symbol.)
 | ||
| The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
 | ||
| The actual name is semantically irrelevant but should be used to document
 | ||
| the argument's name.
 | ||
| It is worth stressing that <code>FP</code> is always a
 | ||
| pseudo-register, not a hardware
 | ||
| register, even on architectures with a hardware frame pointer.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
 | ||
| and offsets match.
 | ||
| On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
 | ||
| a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
 | ||
| If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
 | ||
| </p>
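<p>
The naming convention for <code>FP</code> references can be sketched as a small Go helper that formats them.
The helper names <code>fpRef</code> and <code>fpRefs64On32</code> are hypothetical and exist only for this illustration; placing the low half at the lower offset simply mirrors the <code>arg_lo+0(FP)</code> and <code>arg_hi+4(FP)</code> example in the text.
</p>

```go
package main

import "fmt"

// fpRef formats a named reference to a function argument or result,
// such as first_arg+0(FP). The name only documents the argument; the
// offset is measured from the FP pseudo-register.
func fpRef(name string, offset int) string {
	return fmt.Sprintf("%s+%d(FP)", name, offset)
}

// fpRefs64On32 returns the two references used for a 64-bit value on a
// 32-bit system: the low half at offset, the high half 4 bytes later.
func fpRefs64On32(name string, offset int) (lo, hi string) {
	return fpRef(name+"_lo", offset), fpRef(name+"_hi", offset+4)
}

func main() {
	fmt.Println(fpRef("first_arg", 0))
	fmt.Println(fpRef("second_arg", 8))
	lo, hi := fpRefs64On32("arg", 0)
	fmt.Println(lo, hi)
}
```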
 | ||
| 
 | ||
| <p>
 | ||
| The <code>SP</code> pseudo-register is a virtual stack pointer
 | ||
| used to refer to frame-local variables and the arguments being
 | ||
| prepared for function calls.
 | ||
| It points to the top of the local stack frame, so references should use negative offsets
 | ||
| in the range [−framesize, 0):
 | ||
| <code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
 | ||
| </p>
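<p>
A minimal Go sketch of the offset rule, assuming a hypothetical helper <code>spRef</code> that is not part of any tool: it formats a frame-local reference and rejects offsets outside the range given above.
</p>

```go
package main

import (
	"errors"
	"fmt"
)

// spRef formats a named reference to a frame-local variable, such as
// x-8(SP). It rejects offsets outside [-framesize, 0), the range the
// text gives for references through the SP pseudo-register.
func spRef(name string, offset, framesize int) (string, error) {
	if offset < -framesize || offset >= 0 {
		return "", errors.New("offset outside [-framesize, 0)")
	}
	return fmt.Sprintf("%s%d(SP)", name, offset), nil
}

func main() {
	for _, off := range []int{-8, -4, 0, -24} {
		ref, err := spRef("x", off, 16)
		if err != nil {
			fmt.Printf("offset %d: %v\n", off, err)
			continue
		}
		fmt.Println(ref)
	}
}
```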
 | ||
| 
 | ||
| <p>
 | ||
| On architectures with a hardware register named <code>SP</code>,
 | ||
| the name prefix distinguishes
 | ||
| references to the virtual stack pointer from references to the architectural
 | ||
| <code>SP</code> register.
 | ||
| That is, <code>x-8(SP)</code> and <code>-8(SP)</code>
 | ||
| are different memory locations:
 | ||
| the first refers to the virtual stack pointer pseudo-register,
 | ||
| while the second refers to the
 | ||
| hardware's <code>SP</code> register.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| On machines where <code>SP</code> and <code>PC</code> are
 | ||
| traditionally aliases for a physical, numbered register,
 | ||
| in the Go assembler the names <code>SP</code> and <code>PC</code>
 | ||
| are still treated specially;
 | ||
| for instance, references to <code>SP</code> require a symbol,
 | ||
| much like <code>FP</code>.
 | ||
| To access the actual hardware register use the true <code>R</code> name.
 | ||
| For example, on the ARM architecture the hardware
 | ||
| <code>SP</code> and <code>PC</code> are accessible as
 | ||
| <code>R13</code> and <code>R15</code>.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| Branches and direct jumps are always written as offsets to the PC, or as
 | ||
| jumps to labels:
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| label:
 | ||
| 	MOVW $0, R1
 | ||
| 	JMP label
 | ||
| </pre>
 | ||
| 
 | ||
| <p>
 | ||
| Each label is visible only within the function in which it is defined.
 | ||
| It is therefore permitted for multiple functions in a file to define
 | ||
| and use the same label names.
 | ||
| Direct jumps and call instructions can target text symbols,
 | ||
| such as <code>name(SB)</code>, but not offsets from symbols,
 | ||
| such as <code>name+4(SB)</code>.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| Instructions, registers, and assembler directives are always in UPPER CASE to remind you
 | ||
| that assembly programming is a fraught endeavor.
 | ||
| (Exception: the <code>g</code> register renaming on ARM.)
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| In Go object files and binaries, the full name of a symbol is the 
 | ||
| package path followed by a period and the symbol name:
 | ||
| <code>fmt.Printf</code> or <code>math/rand.Int</code>.
 | ||
| Because the assembler's parser treats period and slash as punctuation,
 | ||
| those strings cannot be used directly as identifier names.
 | ||
| Instead, the assembler allows the middle dot character U+00B7
 | ||
| and the division slash U+2215 in identifiers and rewrites them to
 | ||
| plain period and slash.
 | ||
| Within an assembler source file, the symbols above are written as
 | ||
| <code>fmt·Printf</code> and <code>math∕rand·Int</code>.
 | ||
| The assembly listings generated by the compilers when using the <code>-S</code> flag
 | ||
| show the period and slash directly instead of the Unicode replacements
 | ||
| required by the assemblers.
 | ||
| </p>
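<p>
The character rewriting described above can be modeled in a few lines of Go.
This is a sketch of the mapping only; the function name <code>objectName</code> is made up for the example and does not correspond to code in the tool chain.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// objectName maps a symbol as written in assembler source to the name
// stored in object files: the middle dot (U+00B7) becomes a period and
// the division slash (U+2215) becomes a plain slash.
func objectName(asmName string) string {
	return strings.NewReplacer("\u00b7", ".", "\u2215", "/").Replace(asmName)
}

func main() {
	fmt.Println(objectName("fmt\u00b7Printf"))
	fmt.Println(objectName("math\u2215rand\u00b7Int"))
}
```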
 | ||
| 
 | ||
| <p>
 | ||
| Most hand-written assembly files do not include the full package path
 | ||
| in symbol names, because the linker inserts the package path of the current
 | ||
| object file at the beginning of any name starting with a period:
 | ||
| in an assembly source file within the math/rand package implementation,
 | ||
| the package's Int function can be referred to as <code>·Int</code>.
 | ||
| This convention avoids the need to hard-code a package's import path in its
 | ||
| own source code, making it easier to move the code from one location to another.
 | ||
| </p>
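<p>
As a rough model of this rule, the following Go sketch expands a name that begins with a period by prepending a package path.
The function <code>fullName</code> and its parameters are assumptions made for illustration; the real work is done by the linker.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// fullName models the rule described above: a name that starts with a
// period (written as a middle dot in assembler source) gets the package
// path of the current object file inserted in front of it.
func fullName(pkgPath, name string) string {
	name = strings.ReplaceAll(name, "\u00b7", ".")
	if strings.HasPrefix(name, ".") {
		return pkgPath + name
	}
	return name
}

func main() {
	fmt.Println(fullName("math/rand", "\u00b7Int"))       // short form
	fmt.Println(fullName("math/rand", "fmt\u00b7Printf")) // already qualified
}
```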
 | ||
| 
 | ||
| <h3 id="directives">Directives</h3>
 | ||
| 
 | ||
| <p>
 | ||
| The assembler uses various directives to bind text and data to symbol names.
 | ||
| For example, here is a simple complete function definition. The <code>TEXT</code>
 | ||
| directive declares the symbol <code>runtime·profileloop</code> and the instructions
 | ||
| that follow form the body of the function.
 | ||
| The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
 | ||
| (If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.)
 | ||
| After the symbol, the arguments are flags (see below)
 | ||
| and the frame size, a constant (but see below):
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| TEXT runtime·profileloop(SB),NOSPLIT,$8
 | ||
| 	MOVQ	$runtime·profileloop1(SB), CX
 | ||
| 	MOVQ	CX, 0(SP)
 | ||
| 	CALL	runtime·externalthreadhandler(SB)
 | ||
| 	RET
 | ||
| </pre>
 | ||
| 
 | ||
| <p>
 | ||
| In the general case, the frame size is followed by an argument size, separated by a minus sign.
 | ||
| (It's not a subtraction, just idiosyncratic syntax.)
 | ||
| The frame size <code>$24-8</code> states that the function has a 24-byte frame
 | ||
| and is called with 8 bytes of arguments, which live on the caller's frame.
 | ||
| If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
 | ||
| the argument size must be provided.
 | ||
| For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
 | ||
| argument size is correct.
 | ||
| </p>
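<p>
To make the syntax concrete, here is a Go sketch that splits a size operand such as <code>$24-8</code> into its two parts.
The function <code>parseFrame</code> is hypothetical and far simpler than the assembler's real parser; it only shows that the minus sign acts as a separator.
</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFrame splits a TEXT size operand such as "$24-8" into the frame
// size and the argument size. The minus sign is a separator, not a
// subtraction. hasArgs is false when no argument size is present.
func parseFrame(operand string) (frame, args int, hasArgs bool, err error) {
	if !strings.HasPrefix(operand, "$") {
		return 0, 0, false, fmt.Errorf("size operand %q does not start with $", operand)
	}
	s := operand[1:]
	// Search after the first character so that a leading minus sign
	// stays part of the frame size.
	sep := -1
	if len(s) > 1 {
		if i := strings.IndexByte(s[1:], '-'); i >= 0 {
			sep = i + 1
		}
	}
	if sep < 0 {
		frame, err = strconv.Atoi(s)
		return frame, 0, false, err
	}
	if frame, err = strconv.Atoi(s[:sep]); err != nil {
		return 0, 0, false, err
	}
	if args, err = strconv.Atoi(s[sep+1:]); err != nil {
		return 0, 0, false, err
	}
	return frame, args, true, nil
}

func main() {
	for _, op := range []string{"$24-8", "$8", "$0-8"} {
		frame, args, hasArgs, err := parseFrame(op)
		fmt.Println(op, frame, args, hasArgs, err)
	}
}
```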
 | ||
| 
 | ||
| <p>
 | ||
| Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
 | ||
| static base pseudo-register <code>SB</code>.
 | ||
| This function would be called from Go source for package <code>runtime</code> using the
 | ||
| simple name <code>profileloop</code>.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| Global data symbols are defined by a sequence of initializing
 | ||
| <code>DATA</code> directives followed by a <code>GLOBL</code> directive.
 | ||
| Each <code>DATA</code> directive initializes a section of the
 | ||
| corresponding memory.
 | ||
| The memory not explicitly initialized is zeroed.
 | ||
| The general form of the <code>DATA</code> directive is
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| DATA	symbol+offset(SB)/width, value
 | ||
| </pre>
 | ||
| 
 | ||
| <p>
 | ||
| which initializes the symbol memory at the given offset and width with the given value.
 | ||
| The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| The <code>GLOBL</code> directive declares a symbol to be global.
 | ||
| The arguments are optional flags and the size of the data being declared as a global,
 | ||
| which will have initial value all zeros unless a <code>DATA</code> directive
 | ||
| has initialized it.
 | ||
| The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| For example,
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| DATA divtab<>+0x00(SB)/4, $0xf4f8fcff
 | ||
| DATA divtab<>+0x04(SB)/4, $0xe6eaedf0
 | ||
| ...
 | ||
| DATA divtab<>+0x3c(SB)/4, $0x81828384
 | ||
| GLOBL divtab<>(SB), RODATA, $64
 | ||
| 
 | ||
| GLOBL runtime·tlsoffset(SB), NOPTR, $4
 | ||
| </pre>
 | ||
| 
 | ||
| <p>
 | ||
| declares and initializes <code>divtab<></code>, a read-only 64-byte table of 4-byte integer values,
 | ||
| and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
 | ||
| contains no pointers.
 | ||
| </p>
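<p>
The effect of a run of <code>DATA</code> directives followed by <code>GLOBL</code> can be pictured with a Go sketch that fills a zeroed byte slice.
The type <code>dataDirective</code> and the function <code>buildSymbol</code> are invented for this example, and little-endian byte order is an assumption of the sketch, not something the text states.
</p>

```go
package main

import "fmt"

// dataDirective is a hypothetical model of one line of the form
//	DATA symbol+offset(SB)/width, value
type dataDirective struct {
	offset int
	width  int
	value  uint64
}

// buildSymbol models a GLOBL of the given size preceded by DATA
// directives. Little-endian byte order is an assumption of this sketch.
func buildSymbol(size int, data []dataDirective) ([]byte, error) {
	mem := make([]byte, size) // anything not initialized below stays zero
	prev := -1
	for _, d := range data {
		if d.width != 1 && d.width != 2 && d.width != 4 && d.width != 8 {
			return nil, fmt.Errorf("unsupported width %d", d.width)
		}
		if d.offset <= prev {
			return nil, fmt.Errorf("offset %#x: offsets must increase", d.offset)
		}
		if d.offset+d.width > size {
			return nil, fmt.Errorf("offset %#x: width %d runs past size %d", d.offset, d.width, size)
		}
		// Store the value one byte at a time, least significant first.
		for i := 0; i < d.width; i++ {
			mem[d.offset+i] = byte(d.value >> (8 * uint(i)))
		}
		prev = d.offset
	}
	return mem, nil
}

func main() {
	mem, err := buildSymbol(64, []dataDirective{
		{0x00, 4, 0xf4f8fcff},
		{0x04, 4, 0xe6eaedf0},
		{0x3c, 4, 0x81828384},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", mem[:8])   // the two initialized words
	fmt.Printf("% x\n", mem[8:12]) // never initialized, so zero
}
```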
 | ||
| 
 | ||
| <p>
 | ||
| There may be one or two arguments to the directives.
 | ||
| If there are two, the first is a bit mask of flags,
 | ||
| which can be written as numeric expressions, added or or-ed together,
 | ||
| or can be set symbolically for easier absorption by a human.
 | ||
| Their values, defined in the standard <code>#include</code> file <code>textflag.h</code>, are:
 | ||
| </p>
 | ||
| 
 | ||
| <ul>
 | ||
| <li>
 | ||
| <code>NOPROF</code> = 1
 | ||
| <br>
 | ||
| (For <code>TEXT</code> items.)
 | ||
| Don't profile the marked function.  This flag is deprecated.
 | ||
| </li>
 | ||
| <li>
 | ||
| <code>DUPOK</code> = 2
 | ||
| <br>
 | ||
| It is legal to have multiple instances of this symbol in a single binary.
 | ||
| The linker will choose one of the duplicates to use.
 | ||
| </li>
 | ||
| <li>
 | ||
| <code>NOSPLIT</code> = 4
 | ||
| <br>
 | ||
| (For <code>TEXT</code> items.)
 | ||
| Don't insert the preamble to check if the stack must be split.
 | ||
| The frame for the routine, plus anything it calls, must fit in the
 | ||
| spare space at the top of the stack segment.
 | ||
| Used to protect routines such as the stack splitting code itself.
 | ||
| </li>
 | ||
| <li>
 | ||
| <code>RODATA</code> = 8
 | ||
| <br>
 | ||
| (For <code>DATA</code> and <code>GLOBL</code> items.)
 | ||
| Put this data in a read-only section.
 | ||
| </li>
 | ||
| <li>
 | ||
| <code>NOPTR</code> = 16
 | ||
| <br>
 | ||
| (For <code>DATA</code> and <code>GLOBL</code> items.)
 | ||
| This data contains no pointers and therefore does not need to be
 | ||
| scanned by the garbage collector.
 | ||
| </li>
 | ||
| <li>
 | ||
| <code>WRAPPER</code> = 32
 | ||
| <br>
 | ||
| (For <code>TEXT</code> items.)
 | ||
| This is a wrapper function and should not count as disabling <code>recover</code>.
 | ||
| </li>
 | ||
| <li>
 | ||
| <code>NEEDCTXT</code> = 64
 | ||
| <br>
 | ||
| (For <code>TEXT</code> items.)
 | ||
| This function is a closure so it uses its incoming context register.
 | ||
| </li>
 | ||
| </ul>
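<p>
Since each flag occupies its own bit, adding and or-ing them give the same mask.
The Go sketch below copies the values from the list above; the helper <code>describe</code> is hypothetical and only turns a numeric mask back into symbolic names.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values copied from the list above.
const (
	NOPROF   = 1
	DUPOK    = 2
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
)

// flagNames pairs each bit with its symbolic name, lowest bit first.
var flagNames = []struct {
	bit  int
	name string
}{
	{NOPROF, "NOPROF"},
	{DUPOK, "DUPOK"},
	{NOSPLIT, "NOSPLIT"},
	{RODATA, "RODATA"},
	{NOPTR, "NOPTR"},
	{WRAPPER, "WRAPPER"},
	{NEEDCTXT, "NEEDCTXT"},
}

// describe renders a numeric flag mask as its symbolic names joined by "|".
func describe(mask int) string {
	var names []string
	for _, f := range flagNames {
		if mask&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	return strings.Join(names, "|")
}

func main() {
	fmt.Println(DUPOK|NOSPLIT == DUPOK+NOSPLIT) // distinct bits, so + and | agree
	fmt.Println(describe(NOSPLIT | WRAPPER))
	fmt.Println(describe(6))
}
```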
 | ||
| 
 | ||
| <h3 id="runtime">Runtime Coordination</h3>
 | ||
| 
 | ||
| <p>
 | ||
| For garbage collection to run correctly, the runtime must know the
 | ||
| location of pointers in all global data and in most stack frames.
 | ||
| The Go compiler emits this information when compiling Go source files,
 | ||
| but assembly programs must define it explicitly.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| A data symbol marked with the <code>NOPTR</code> flag (see above)
 | ||
| is treated as containing no pointers to runtime-allocated data.
 | ||
| A data symbol with the <code>RODATA</code> flag
 | ||
| is allocated in read-only memory and is therefore treated
 | ||
| as implicitly marked <code>NOPTR</code>.
 | ||
| A data symbol with a total size smaller than a pointer
 | ||
| is also treated as implicitly marked <code>NOPTR</code>.
 | ||
| It is not possible to define a symbol containing pointers in an assembly source file;
 | ||
| such a symbol must be defined in a Go source file instead.
 | ||
| Assembly source can still refer to the symbol by name
 | ||
| even without <code>DATA</code> and <code>GLOBL</code> directives.
 | ||
| A good general rule of thumb is to define all non-<code>RODATA</code>
 | ||
| symbols in Go instead of in assembly.
 | ||
| </p>
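<p>
The three cases in which a data symbol is treated as pointer-free can be summarized as one predicate.
This Go sketch restates the rules above; <code>treatedAsNoPtr</code> and its <code>ptrSize</code> parameter are assumptions for illustration, not part of the runtime.
</p>

```go
package main

import "fmt"

// Flag values from textflag.h, as listed earlier in this document.
const (
	RODATA = 8
	NOPTR  = 16
)

// treatedAsNoPtr reports whether a data symbol is treated as containing
// no pointers: it is marked NOPTR, it is marked RODATA, or its total
// size is smaller than a pointer.
func treatedAsNoPtr(flags, size, ptrSize int) bool {
	return flags&NOPTR != 0 || flags&RODATA != 0 || size < ptrSize
}

func main() {
	fmt.Println(treatedAsNoPtr(RODATA, 64, 8)) // read-only table
	fmt.Println(treatedAsNoPtr(NOPTR, 4, 8))   // explicitly marked
	fmt.Println(treatedAsNoPtr(0, 4, 8))       // too small to hold a pointer
	fmt.Println(treatedAsNoPtr(0, 16, 8))      // may contain pointers
}
```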
 | ||
| 
 | ||
| <p>
 | ||
| Each function also needs annotations giving the location of
 | ||
| live pointers in its arguments, results, and local stack frame.
 | ||
| For an assembly function with no pointer results and
 | ||
| either no local stack frame or no function calls,
 | ||
| the only requirement is to define a Go prototype for the function
 | ||
| in a Go source file in the same package. The name of the assembly
 | ||
| function must not contain the package name component (for example,
 | ||
| function <code>Syscall</code> in package <code>syscall</code> should
 | ||
| use the name <code>·Syscall</code> instead of the equivalent name
 | ||
| <code>syscall·Syscall</code> in its <code>TEXT</code> directive).
 | ||
| For more complex situations, explicit annotation is needed.
 | ||
| These annotations use pseudo-instructions defined in the standard
 | ||
| <code>#include</code> file <code>funcdata.h</code>.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| If a function has no arguments and no results,
 | ||
| the pointer information can be omitted.
 | ||
| This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
 | ||
| on the <code>TEXT</code> instruction.
 | ||
| Otherwise, pointer information must be provided by
 | ||
| a Go prototype for the function in a Go source file,
 | ||
| even for assembly functions not called directly from Go.
 | ||
| (The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
 | ||
| At the start of the function, the arguments are assumed
 | ||
| to be initialized but the results are assumed uninitialized.
 | ||
| If the results will hold live pointers during a call instruction,
 | ||
| the function should start by zeroing the results and then 
 | ||
| executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
 | ||
| This instruction records that the results are now initialized
 | ||
| and should be scanned during stack movement and garbage collection.
 | ||
| It is typically easier to arrange that assembly functions do not
 | ||
| return pointers or do not contain call instructions;
 | ||
| no assembly functions in the standard library use
 | ||
| <code>GO_RESULTS_INITIALIZED</code>.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| If a function has no local stack frame,
 | ||
| the pointer information can be omitted.
 | ||
| This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
 | ||
| on the <code>TEXT</code> instruction.
 | ||
| The pointer information can also be omitted if the
 | ||
| function contains no call instructions.
 | ||
| Otherwise, the local stack frame must not contain pointers,
 | ||
| and the assembly must confirm this fact by executing the 
 | ||
| pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
 | ||
| Because stack resizing is implemented by moving the stack,
 | ||
| the stack pointer may change during any function call:
 | ||
| even pointers to stack data must not be kept in local variables.
 | ||
| </p>
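<p>
The rules in the last two paragraphs reduce to a small decision table, sketched here in Go.
The function <code>annotations</code> is hypothetical; it takes the frame and argument sizes from a <code>TEXT</code> directive of the form <code>$frame-args</code> and reports which annotations the text says are still required.
</p>

```go
package main

import "fmt"

// annotations reports which pointer annotations a TEXT declared with
// $frame-args still needs, following the rules in the text above.
// needPrototype: the function has arguments or results, so a Go
// prototype must supply their pointer information.
// needNoLocalPointers: the function has a local frame and makes calls,
// so it must execute NO_LOCAL_POINTERS.
func annotations(frame, args int, hasCalls bool) (needPrototype, needNoLocalPointers bool) {
	needPrototype = args != 0
	needNoLocalPointers = frame != 0 && hasCalls
	return needPrototype, needNoLocalPointers
}

func main() {
	fmt.Println(annotations(0, 8, true))   // $0-8: no local frame
	fmt.Println(annotations(24, 0, true))  // $24-0: no arguments or results
	fmt.Println(annotations(24, 8, false)) // frame, but no call instructions
}
```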
 | ||
| 
 | ||
| <h2 id="architectures">Architecture-specific details</h2>
 | ||
| 
 | ||
| <p>
 | ||
| It is impractical to list all the instructions and other details for each machine.
 | ||
| To see what instructions are defined for a given machine, say 32-bit Intel x86,
 | ||
| look in the top-level header file for the corresponding linker, in this case <code>8l</code>.
 | ||
| That is, the file <code>$GOROOT/src/cmd/8l/8.out.h</code> contains a C enumeration, called <code>as</code>,
 | ||
| of the instructions and their spellings as known to the assembler and linker for that architecture.
 | ||
| In that file you'll find a declaration that begins
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| enum	as
 | ||
| {
 | ||
| 	AXXX,
 | ||
| 	AAAA,
 | ||
| 	AAAD,
 | ||
| 	AAAM,
 | ||
| 	AAAS,
 | ||
| 	AADCB,
 | ||
| 	...
 | ||
| </pre>
 | ||
| 
 | ||
| <p>
 | ||
| Each instruction begins with an initial capital <code>A</code> in this list, so <code>AADCB</code>
 | ||
| represents the <code>ADCB</code> (add carry byte) instruction.
 | ||
| The enumeration is in alphabetical order, plus some late additions (<code>AXXX</code> occupies
 | ||
| the zero slot as an invalid instruction).
 | ||
| The sequence has nothing to do with the actual encoding of the machine instructions.
 | ||
| Again, the linker takes care of that detail.
 | ||
| </p>
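<p>
The naming convention is simple enough to express in one line of Go.
The helper <code>mnemonic</code> is made up for this sketch and just removes the leading <code>A</code> from an enumeration entry.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// mnemonic strips the single leading capital A that each entry of the
// enumeration carries, so AADCB becomes ADCB.
func mnemonic(enumName string) string {
	return strings.TrimPrefix(enumName, "A")
}

func main() {
	for _, name := range []string{"AAAA", "AAAD", "AADCB"} {
		fmt.Println(name, "->", mnemonic(name))
	}
}
```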
 | ||
| 
 | ||
| <p>
 | ||
| One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
 | ||
| <code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>.
 | ||
| This convention applies even on architectures where the usual mode is the opposite direction.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| Here follow some descriptions of key Go-specific details for the supported architectures.
 | ||
| </p>
 | ||
| 
 | ||
| <h3 id="x86">32-bit Intel 386</h3>
 | ||
| 
 | ||
| <p>
 | ||
| The runtime pointer to the <code>g</code> structure is maintained
 | ||
| through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
 | ||
| An OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes
 | ||
| an architecture-dependent header file, like this:
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| #include "zasm_GOOS_GOARCH.h"
 | ||
| </pre>
 | ||
| 
 | ||
| <p>
 | ||
| Within the runtime, the <code>get_tls</code> macro loads its argument register
 | ||
| with a pointer to the <code>g</code> pointer, and the <code>g</code> struct
 | ||
| contains the <code>m</code> pointer.
 | ||
| The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this:
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| get_tls(CX)
 | ||
| MOVL	g(CX), AX     // Move g into AX.
 | ||
| MOVL	g_m(AX), BX   // Move g->m into BX.
 | ||
| </pre>
 | ||
| 
 | ||
| <h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>
 | ||
| 
 | ||
| <p>
 | ||
| The assembly code to access the <code>m</code> and <code>g</code>
 | ||
| pointers is the same as on the 386, except it uses <code>MOVQ</code> rather than
 | ||
| <code>MOVL</code>:
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| get_tls(CX)
 | ||
| MOVQ	g(CX), AX     // Move g into AX.
 | ||
| MOVQ	g_m(AX), BX   // Move g->m into BX.
 | ||
| </pre>
 | ||
| 
 | ||
| <h3 id="arm">ARM</h3>
 | ||
| 
 | ||
| <p>
 | ||
| The registers <code>R10</code> and <code>R11</code>
 | ||
| are reserved by the compiler and linker.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| <code>R10</code> points to the <code>g</code> (goroutine) structure.
 | ||
| Within assembler source code, this pointer must be referred to as <code>g</code>;
 | ||
| the name <code>R10</code> is not recognized.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| To make it easier for people and compilers to write assembly, the ARM linker
 | ||
| allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
 | ||
| that may not be expressible using a single hardware instruction.
 | ||
| It implements these forms as multiple instructions, often using the <code>R11</code> register
 | ||
| to hold temporary values.
 | ||
| Hand-written assembly can use <code>R11</code>, but doing so requires
 | ||
| being sure that the linker is not also using it to implement any of the other
 | ||
| instructions in the function.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
 | ||
| tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
 | ||
| </p>
 | ||
| 
 | ||
| <p>
 | ||
| The name <code>SP</code> always refers to the virtual stack pointer described earlier.
 | ||
| For the hardware register, use <code>R13</code>.
 | ||
| </p>
 | ||
| 
 | ||
| <h3 id="unsupported_opcodes">Unsupported opcodes</h3>
 | ||
| 
 | ||
| <p>
 | ||
| The assemblers are designed to support the compiler, so not all hardware instructions
 | ||
| are defined for all architectures: if the compiler doesn't generate it, it might not be there.
 | ||
| If you need to use a missing instruction, there are two ways to proceed.
 | ||
| One is to update the assembler to support that instruction, which is straightforward
 | ||
| but only worthwhile if it's likely the instruction will be used again.
 | ||
| Instead, for simple one-off cases, it's possible to use the <code>BYTE</code>
 | ||
| and <code>WORD</code> directives
 | ||
| to lay down explicit data into the instruction stream within a <code>TEXT</code>.
 | ||
| Here's how the 386 runtime defines the 64-bit atomic load function:
 | ||
| </p>
 | ||
| 
 | ||
| <pre>
 | ||
| // uint64 atomicload64(uint64 volatile* addr);
 | ||
| // so actually
 | ||
| // void atomicload64(uint64 *res, uint64 volatile *addr);
 | ||
| TEXT runtime·atomicload64(SB), NOSPLIT, $0-8
 | ||
| 	MOVL	ptr+0(FP), AX
 | ||
| 	LEAL	ret_lo+4(FP), BX
 | ||
| 	BYTE $0x0f; BYTE $0x6f; BYTE $0x00	// MOVQ (%EAX), %MM0
 | ||
| 	BYTE $0x0f; BYTE $0x7f; BYTE $0x03	// MOVQ %MM0, 0(%EBX)
 | ||
| 	BYTE $0x0F; BYTE $0x77			// EMMS
 | ||
| 	RET
 | ||
| </pre>
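<p>
When an instruction has to be laid down by hand, typing out the <code>BYTE</code> directives is error-prone.
The Go sketch below formats a byte sequence in the style used above; <code>byteDirectives</code> is a hypothetical helper, and the encodings passed to it are the ones from the example, not output from a real encoder.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// byteDirectives formats raw machine-code bytes as BYTE directives
// separated by semicolons, one directive per byte.
func byteDirectives(code []byte) string {
	parts := make([]string, len(code))
	for i, b := range code {
		parts[i] = fmt.Sprintf("BYTE $0x%02x", b)
	}
	return strings.Join(parts, "; ")
}

func main() {
	// Byte sequences taken from the atomicload64 example above.
	fmt.Println(byteDirectives([]byte{0x0f, 0x6f, 0x00}))
	fmt.Println(byteDirectives([]byte{0x0f, 0x7f, 0x03}))
	fmt.Println(byteDirectives([]byte{0x0f, 0x77}))
}
```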
 |