Title: Assembly Club
Author: Bob Schmidt
Date: Sat, 04 May 2019 23:29:37 +01:00
Summary: Ian Bruntlett compares dialects of assembly code.
Body:
The first rule of Assembly Club is that no-one writes assembly. This article is not intended to teach anyone assembly language, but to help them on that journey. For these particular adventures, I used Ubuntu 18.04.2 LTS (64 bit, x86_64) on a refurbished ThinkPad.
Here are the packages I installed:
- emacs – the one true text editor
- make – for building executables
- nasm – a nice assembler
- yasm – an even nicer assembler
- gas – the assembler we have to put up with because it is available everywhere
- gdb – the GNU debugger.
When it comes to assembler, there are two major dialects: Intel and AT&T. Intel syntax is used by the bulk of the tutorials, but the GNU tools default to AT&T syntax. However, they can be persuaded to accept Intel syntax (experience indicates it is a bit of a compromise).
The other reasons why programmers prefer Intel syntax over AT&T syntax can be found in the GNU assembler manual:
- AT&T immediate operands are preceded by $; Intel immediate operands are undelimited (Intel ‘push 4’ is AT&T ‘pushl $4’). AT&T register operands are preceded by %; Intel register operands are undelimited. AT&T absolute (as opposed to PC relative) jump/call operands are prefixed by *; they are undelimited in Intel syntax.
- AT&T and Intel syntax use the opposite order for source and destination operands. Intel ‘add eax, 4’ is ‘addl $4, %eax’. The ‘source, dest’ convention is maintained for compatibility with previous Unix assemblers. Note that instructions with more than one source operand, such as the ‘enter’ instruction, do not have reversed order.
- In AT&T syntax, the size of memory operands is determined from the last character of the instruction mnemonic. Mnemonic suffixes of b, w, l and q specify byte (8-bit), word (16-bit), long (32-bit) and quadruple word (64-bit) memory references. Intel syntax accomplishes this by prefixing memory operands (not the instruction mnemonics) with byte ptr, word ptr, dword ptr and qword ptr. Thus, Intel ‘mov al, byte ptr foo’ is ‘movb foo, %al’ in AT&T syntax.
- Immediate form long jumps and calls are ‘lcall/ljmp $section, $offset’ in AT&T syntax; the Intel syntax is ‘call/jmp far section:offset’. Also, the far return instruction is ‘lret $stack-adjust’ in AT&T syntax; Intel syntax is ‘ret far stack-adjust’.
- The AT&T assembler does not provide support for multiple section programs. Unix style systems expect all programs to be single sections.
I have a basic rule of thumb for handling the differences between Intel code and AT&T code. For AT&T syntax code, think of the comma between operands as ‘to’, and for Intel syntax code, think of the comma as ‘equals’.
How did I get here?
I learned assembly language on the Sinclair QL and Intel x86 processors. I am not an expert these days. I got to grips with Linux by reading How Linux Works [1] and The Linux Command Line [2]. I am currently reading two books and referring to online resources. The books are Introduction to 64 Bit Assembly Programming for Linux and OS X [3] and Low-Level Programming [4]. As I progress, I am assembling a scrap book consisting of my findings.
Versions of Hello World
- Listing 1 shows Hello World written for NASM/YASM
This program was built in stages. I used YASM to convert the assembler to an object file and the GNU linker (ld) to create an executable file. The parameters for YASM tell it to format its output as an ELF (Executable and Linking Format) file, with debug records in DWARF2 format (Debugging With Attributed Record Formats). The GNU linker takes the hello.o object file and creates an output executable called program.
    yasm -f elf64 -g dwarf2 hello.asm
    ld -o program hello.o
- Listing 2 shows Hello World written for the GNU assembler. It only runs on 64-bit Linux. I edited the original [5] to calculate string length when assembled.
I will explain how it gets built later on, using a makefile for GNU Make.
- Listing 3 shows Hello World, written in part-Intel part-AT&T syntax. This is so you can use the GNU assembler whilst almost writing your code in Intel format. It is a kind of poorly documented poor relation to the previous examples.
It is based on the ‘hello world’ program from Introduction to 64 Bit Assembly Programming for Linux and OS X [3].
- Listing 4 shows the Makefile for the previous examples. It is from the makefile for [5] and [6].
It actually does a bit more than that. Typing make hello-intel makes the Intel/AT&T variant; make hello makes the AT&T variant.
%include "syscalls.inc"

global _start

section .data
message: db 'hello, world!', 10

section .text
_start:
    ; 1 system call number should be stored in rax
    mov rax, __NR_write
    ; argument #1 in rdi: where to write (descriptor)?
    mov rdi, 1
    ; argument #2 in rsi: where does the string start?
    mov rsi, message
    ; argument #3 in rdx: how many bytes to write?
    mov rdx, 14
    ; this instruction invokes a system call
    syscall
quit:
    mov rax, __NR_exit  ; 60 exit
    mov rdi, 0          ; exit code
    syscall
Listing 1
#---------------------------------------------------------
# Writes "Hello, World" to the console.
# To assemble and run:
#     gcc -c hello.s && ld hello.o && ./a.out or
#     gcc -nostdlib hello.s && ./a.out or
#     as -a=hello.lis --gstabs -o hello.o hello.s
#     ld -o hello hello.o
#---------------------------------------------------------
        .include "syscalls-att.inc"
        .global _start

        .text
_start:
        # write(1, message, 13)
        mov     $__NR_write, %rax       # system call code
        mov     $1, %rdi                # file handle 1 is stdout
        mov     $message, %rsi          # address of string to output
        mov     $message_len, %rdx      # number of bytes
        syscall                         # invoke operating system to do the write

        mov     $__NR_exit, %rax        # system call code
        xor     %rdi, %rdi              # we want return code 0
        syscall                         # invoke operating system to exit

        .data
message:
        .ascii  "Hello, world\n"
        .equ    message_len, . - message
Listing 2
.intel_syntax
.global _start              # was global start

.data                       # was section .rodata
msg: .ascii "Hello, world!\n"
.equ msglen, . - msg

.text                       # was section.text
_start:
    mov %rax, 1             #; write(
    mov %rdi, 1             #;   STDOUT_FILENO,
    lea %rsi, msg           #;   "Hello, world!\n",
    mov %rdx, msglen        #;   sizeof("Hello, world!\n")
    syscall                 #; );
    mov %rax, 60            #; exit(
    mov %rdi, 0             #;   EXIT_SUCCESS
    syscall                 #; );
Listing 3
ASFLAGS= -a=$*.lis --gstabs

hello-intel : hello-intel.o
	ld -o hello-intel hello-intel.o

fib : fib.s
	gcc -ggdb -no-pie -o fib fib.s

hola : hola.s
	gcc -ggdb -no-pie -o hola hola.s

hello : hello.o
	ld -o hello hello.o

all : hello hola fib

clean:
	rm -f hello *.o *.lis
	rm -f printf hola fib hello-intel
Listing 4
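Listings 1 and 2 pull in syscalls.inc and syscalls-att.inc, which are not reproduced in this article. As a guess at the minimum they would need, the shell sketch below writes one file per assembler, defining only the two system call numbers the listings use. The contents are an assumption, not the author's files; the numbers 1 (write) and 60 (exit) are the x86-64 Linux values that Listing 3 hard-codes.

```shell
# Minimal stand-ins for the include files used by Listings 1 and 2.
# These contents are assumed for illustration, not taken from the article.

# NASM/YASM flavour: %define creates a single-line macro.
cat > syscalls.inc <<'EOF'
%define __NR_write 1
%define __NR_exit  60
EOF

# GNU assembler flavour: .equ gives a symbol a constant value.
cat > syscalls-att.inc <<'EOF'
        .equ __NR_write, 1
        .equ __NR_exit, 60
EOF

# Show what was written.
cat syscalls.inc syscalls-att.inc
```

Run it in a scratch directory, since it overwrites any existing files with those two names.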
Debugging – gdb
I am no expert on gdb. I have spent a lot of energy just getting things to build and run properly. However, I’ve downloaded a copy of the GDB Quick Reference and pasted it into my assembly scrap book. You start the debugger with gdb executable-name. If gdb reports that it is ‘reading symbols’, you have managed to create an executable with debug symbols available. You can also check this by typing file executable-name at a shell prompt. Once in the debugger, there are a whole load of commands available (see the online documentation). I tend to set things up with these commands:
    break _start
    start
    layout src
and then I use either the step command or the next command to trace through the code.
The future
There is a free book available online, x86-64 Assembly Language Programming with Ubuntu [7], that I will be working through. I will also be using Google and accu-general for help.
Thank you
I would like to thank Tom Hughes, Bill Somerville, Jonathan Wakely and Ahtu Truu for their patience and help on accu-general.
References
[1] Ward, Brian (2014) How Linux Works: What Every Superuser Should Know (2nd ed.), No Starch Press, ISBN-13: 978-1593275679
[2] Shotts, William E. Jr. (2019) The Linux Command Line: A Complete Introduction (2nd ed.), No Starch Press, ISBN-13: 978-1593279523
[3] Seyfarth, Ray (2014) Introduction to 64 Bit Assembly Programming for Linux and OS X (3rd ed.), CreateSpace Independent Publishing Platform, ISBN-13: 978-1484921906
[4] Zhirkov, Igor (2017) Low-Level Programming: C, Assembly, and Program Execution on Intel 64 Architecture, Apress, ISBN-13: 978-1484224021
[5] The source of the example, and also a learning resource: http://cs.lmu.edu/~ray/notes/gasexamples/
[6] https://www.devdungeon.com/content/how-mix-c-and-assembly
[7] x86-64 Assembly Language Programming with Ubuntu by Ed Jorgensen (2019), http://www.egr.unlv.edu/~ed/assembly64.pdf
Other resources used when learning assembly
- ABI for x64 architecture: http://refspecs.linuxbase.org/elf/index.html
- Assembly language manuals: https://software.intel.com/en-us/articles/intel-sdm
- ‘Bluff your way in x64 assembler’ by Roger Orr from ACCU Conference 2017, available on YouTube
- ‘Enough x86 assembly to be dangerous’ by Charles Bailey from CPPCON 2017, available on YouTube
- GNU gdb manual: https://www.gnu.org/software/gdb/documentation/
- GNU toolchain manuals (make, as, ld): https://www.gnu.org/manual/manual.html
- Introduction to Assembly: https://software.intel.com/en-us/articles/introduction-to-x64-assembly
- YASM manual: http://yasm.tortall.net/
On and off, Ian has been programming for some years. He is a volunteer system administrator (among other things) for a mental health charity called Contact (www.contactmorpeth.org.uk). He is learning low-level and other, higher-level, aspects of programming.