ARM Mode and THUMB Mode: A Few Practical Engineering Issues

☆ Introduction to ARM mode and THUMB mode

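The ARM architecture has a CPSR register whose bit 5 is the T bit (Thumb state flag): 1 means THUMB mode, 0 means ARM mode. The two modes differ in many ways. For reverse engineering, a simple way to think about it is that ARM mode uses 4-byte instructions throughout, while THUMB mode uses 2-byte instruction encodings wherever possible. That is far from rigorous, but it does no harm to the big picture. Only 32-bit ARM is considered here.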

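A piece of code can begin with ARM-mode instructions, then arrange to modify the T bit of CPSR to enter THUMB mode, execute a series of instructions there, and then modify the T bit again to return to ARM mode. The two modes can be mixed freely.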

☆ Determining the current CPU mode in GDB

How can you tell in GDB whether the CPU is currently in ARM mode or THUMB mode?

See:

《ARMv5 Architecture Reference Manual》

A1.1.3 Status registers (P31)
A2.5 Program status registers (P49)
A2.5.8 The T and J bits (P53)

Bit 5 of the CPSR register is the T bit (Thumb state flag): 1 means THUMB mode, 0 means ARM mode.

CPSR can be inspected directly in GDB:

(gdb) p/x $cpsr&0x20
$1 = 0x20

A result of 0x20 means THUMB mode; a result of 0 means ARM mode.
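
The same test can be written as a tiny helper, for example when post-processing register dumps. This is only a sketch: the function name cpu_mode and the two sample CPSR values are made up for illustration and do not come from a real session.

```python
T_BIT = 1 << 5  # CPSR bit 5, the Thumb state flag (0x20)


def cpu_mode(cpsr: int) -> str:
    """Return "THUMB" if the T bit is set in a CPSR value, else "ARM"."""
    return "THUMB" if cpsr & T_BIT else "ARM"


# Hypothetical CPSR values, chosen only so that bit 5 differs.
print(cpu_mode(0x60000030))  # bit 5 set
print(cpu_mode(0x60000010))  # bit 5 clear
```

The mask is the same 0x20 used in the "p/x $cpsr&0x20" command above.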

☆ Switching between ARM mode and THUMB mode

1) Switching schemes

See:

《ARMv5 Architecture Reference Manual》
A2.6 Exceptions
A2.8.1 Unaligned instruction fetches (P76)
A3.3 Branch instructions (P113)
A3.10.1 CPSR value (P127)
A4.1.10 BX (P170)
A6.1.1 Entering Thumb state (P496)
A6.1.2 Exceptions (P497)
A6.3.3 Branch with exchange (P501)
A7.1.19 BX (P548)
A7.1.49 POP (P598)

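Exceptions are always taken in ARM mode; once handling completes, the T bit is restored from SPSR.

There are several ways to modify the T bit of CPSR. The most common is the BX instruction, which always writes the T bit regardless of the current mode. Its pseudo-operation is: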

CPSR T bit = Rm[0]
PC = Rm AND 0xFFFFFFFE

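A lot of real-world code uses BX to switch from ARM mode to THUMB mode. The first pseudo-operation above leads many people to believe that bit 0 of the PC register determines the CPU mode. In fact only the T bit of CPSR determines it. It is merely that Rm[0] of BX may be 1, which is how the T bit gets set; when Rm is loaded into PC, Rm[0] is masked off by the AND, so bit 0 of PC is always 0 in either mode.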
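
The two pseudo-operations can be modelled in a few lines. This is a minimal sketch of the pseudo-code only, not of a real CPU; the function name bx is an assumption, and the two target values are the ones that arm_thumb_switch_1 later in this article ends up passing to BX.

```python
from typing import Tuple


def bx(rm: int) -> Tuple[int, int]:
    """Model the BX pseudo-operation: return (new T bit, new PC)."""
    t_bit = rm & 1          # CPSR T bit = Rm[0]
    pc = rm & 0xFFFFFFFE    # PC = Rm AND 0xFFFFFFFE
    return t_bit, pc


t, pc = bx(0x10071)         # odd target: enter THUMB mode
print(t, hex(pc))           # 1 0x10070
t, pc = bx(0x10080)         # even target: enter ARM mode
print(t, hex(pc))           # 0 0x10080
```

In both cases the PC that comes out has bit 0 clear; the mode information lives only in the returned T bit.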

There are many concrete ways to switch between ARM mode and THUMB mode. For people who write shellcode, here are two:

--------------------------------------------------------------------------
ARM->THUMB

.arm
add ip,pc,#1
bx ip
.thumb

THUMB->ARM

.thumb
mov ip,pc
bx ip
.arm
--------------------------------------------------------------------------

Neither sequence tries to avoid '\0' bytes or to stay within the printable character range.

If this looks like a minefield, see:

《ARMv5 Architecture Reference Manual》
A2.4.3 Register 15 and the program counter (P47)

When an instruction reads the PC, the value read depends on which instruction set it comes from:

For an ARM instruction, the value read is the address of the instruction plus 8 bytes. Bits [1:0] of this value are always zero, because ARM instructions are always word-aligned.

For a Thumb instruction, the value read is the address of the instruction plus 4 bytes. Bit [0] of this value is always zero, because Thumb instructions are always halfword-aligned.
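
Applied to the two switch sequences, the rule quoted above gives the BX targets directly. The sketch below assumes the helper name pc_read_value; the instruction addresses are taken from the objdump listing of arm_thumb_switch_1 shown later in this article.

```python
def pc_read_value(insn_addr: int, thumb: bool) -> int:
    """Value an instruction at insn_addr sees when it reads the PC."""
    return insn_addr + (4 if thumb else 8)


# ARM -> THUMB: "add r0,pc,#1" at 0x10068, followed by "bx r0" at 0x1006c.
arm_to_thumb = pc_read_value(0x10068, thumb=False) + 1

# THUMB -> ARM: "mov r0,pc" at 0x1007c, followed by "bx r0" at 0x1007e.
thumb_to_arm = pc_read_value(0x1007C, thumb=True)

print(hex(arm_to_thumb), hex(thumb_to_arm))  # 0x10071 0x10080
```

In both cases the value read from PC happens to be the address of the instruction right after the BX, which is why the sequences need no explicit label.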

2) arm_thumb_switch_1.s

$ vi arm_thumb_switch_1.s

--------------------------------------------------------------------------
.syntax divided

.arch armv5te

.section .text

.globl _start

_start:

.arm

mov r2,#14
adr r1,msg_0
mov r0,#1
mov r7,#4
svc #0

add r0,pc,#1
bx r0

.thumb

mov r2,#12
add r1,pc,#0x38
add r1,#2
mov r0,#1
mov r7,#4
svc #0x2f

mov r0,pc
bx r0

.align 2

.arm

mov r2,#10
adr r1,msg_2
mov r0,#1
mov r7,#4
svc #0

eor r0,r0,r0
mov r7,#1
svc #0

msg_0:

.ascii "Hello, world.\n"

msg_1:

.ascii "thumb mode.\n"

msg_2:

.ascii "arm mode.\n"
--------------------------------------------------------------------------

$ as -o arm_thumb_switch_1.o arm_thumb_switch_1.s
$ ld -N -o arm_thumb_switch_1 arm_thumb_switch_1.o

$ ./arm_thumb_switch_1
Hello, world.
thumb mode.
arm mode.

arm_thumb_switch_1 starts out in ARM mode and calls:

write( stdout, msg_0, 14 )

It then uses BX to modify the T bit of CPSR and enter THUMB mode, where it calls:

write( stdout, msg_1, 12 )

It uses BX once more to modify the T bit of CPSR and return to ARM mode, where it calls:

write( stdout, msg_2, 10 )
_exit( 0 )

3) ELF for the ARM Architecture

$ file -b arm_thumb_switch_1
ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped

I deliberately did not pass "-s" to ld; with a stripped binary the experiments below turn out differently.

$ readelf -Wa arm_thumb_switch_1
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x10054
Start of program headers: 52 (bytes into file)
Start of section headers: 684 (bytes into file)
Flags: 0x5000200, Version5 EABI, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 5

Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00010054 000054 000070 00 WAX 0 0 4
[ 2] .ARM.attributes ARM_ATTRIBUTES 00000000 0000c4 00001b 00 0 0 1
[ 3] .symtab SYMTAB 00000000 0000e0 000130 10 4 11 4
[ 4] .strtab STRTAB 00000000 000210 00006b 00 0 0 1
[ 5] .shstrtab STRTAB 00000000 00027b 000031 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)

There are no section groups in this file.

Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000054 0x00010054 0x00010054 0x00070 0x00070 RWE 0x4

Section to Segment mapping:
Segment Sections...
00 .text

There is no dynamic section in this file.

There are no relocations in this file.

There are no unwind sections in this file.

Symbol table '.symtab' contains 19 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00010054 0 SECTION LOCAL DEFAULT 1
2: 00000000 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 FILE LOCAL DEFAULT ABS arm_thumb_switch_1.o
4: 00010054 0 NOTYPE LOCAL DEFAULT 1 $a // arm code
5: 000100a0 0 NOTYPE LOCAL DEFAULT 1 msg_0
6: 00010070 0 NOTYPE LOCAL DEFAULT 1 $t // thumb code
7: 00010080 0 NOTYPE LOCAL DEFAULT 1 $a // arm code
8: 000100ba 0 NOTYPE LOCAL DEFAULT 1 msg_2
9: 000100a0 0 NOTYPE LOCAL DEFAULT 1 $d // literal data
10: 000100ae 0 NOTYPE LOCAL DEFAULT 1 msg_1
11: 000100c4 0 NOTYPE GLOBAL DEFAULT 1 _bss_end__
12: 000100c4 0 NOTYPE GLOBAL DEFAULT 1 __bss_start__
13: 000100c4 0 NOTYPE GLOBAL DEFAULT 1 __bss_end__
14: 00010054 0 NOTYPE GLOBAL DEFAULT 1 _start
15: 000100c4 0 NOTYPE GLOBAL DEFAULT 1 __bss_start
16: 000100c4 0 NOTYPE GLOBAL DEFAULT 1 __end__
17: 000100c4 0 NOTYPE GLOBAL DEFAULT 1 _edata
18: 000100c4 0 NOTYPE GLOBAL DEFAULT 1 _end

No version information found in this file.
Attribute Section: aeabi
File Attributes
Tag_CPU_name: "5TE"
Tag_CPU_arch: v5TE
Tag_ARM_ISA_use: Yes
Tag_THUMB_ISA_use: Thumb-1

$ objdump -d arm_thumb_switch_1

arm_thumb_switch_1: file format elf32-littlearm


Disassembly of section .text:

00010054 <_start>:
10054: e3a0200e mov r2, #14
10058: e28f1040 add r1, pc, #64 ; 0x40
1005c: e3a00001 mov r0, #1
10060: e3a07004 mov r7, #4
10064: ef000000 svc 0x00000000
10068: e28f0001 add r0, pc, #1
1006c: e12fff10 bx r0
10070: 220c movs r2, #12
10072: a10e add r1, pc, #56 ; (adr r1, 100ac <msg_0+0xc>)
10074: 3102 adds r1, #2
10076: 2001 movs r0, #1
10078: 2704 movs r7, #4
1007a: df2f svc 47 ; 0x2f
1007c: 4678 mov r0, pc
1007e: 4700 bx r0
10080: e3a0200a mov r2, #10
10084: e28f102e add r1, pc, #46 ; 0x2e
10088: e3a00001 mov r0, #1
1008c: e3a07004 mov r7, #4
10090: ef000000 svc 0x00000000
10094: e0200000 eor r0, r0, r0
10098: e3a07001 mov r7, #1
1009c: ef000000 svc 0x00000000

000100a0 <msg_0>:
100a0: 6c6c6548 .word 0x6c6c6548
100a4: 77202c6f .word 0x77202c6f
100a8: 646c726f .word 0x646c726f
100ac: 0a2e .short 0x0a2e

000100ae <msg_1>:
100ae: 6874 .short 0x6874
100b0: 20626d75 .word 0x20626d75
100b4: 65646f6d .word 0x65646f6d
100b8: 0a2e .short 0x0a2e

000100ba <msg_2>:
100ba: 7261 .short 0x7261
100bc: 6f6d206d .word 0x6f6d206d
100c0: 0a2e6564 .word 0x0a2e6564
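
The PC-relative operands in the listing above can be checked by hand. The sketch below redoes the arithmetic; the helper names are made up, and the Thumb rule used here (the PC value has bits [1:0] cleared before the immediate is added in "add Rd, pc, #imm") is my reading of the Thumb ADD description in the ARMv5 manual rather than something stated in this article.

```python
def arm_pc_rel(insn_addr: int, imm: int) -> int:
    """ARM "add Rd, pc, #imm": PC reads as the instruction address + 8."""
    return insn_addr + 8 + imm


def thumb_pc_rel(insn_addr: int, imm: int) -> int:
    """Thumb "add Rd, pc, #imm": PC reads as address + 4, word-aligned."""
    return ((insn_addr + 4) & ~3) + imm


msg_0 = arm_pc_rel(0x10058, 0x40)          # add r1, pc, #64
msg_1 = thumb_pc_rel(0x10072, 0x38) + 2    # add r1, pc, #56 ; adds r1, #2
msg_2 = arm_pc_rel(0x10084, 0x2E)          # add r1, pc, #46

print(hex(msg_0), hex(msg_1), hex(msg_2))  # 0x100a0 0x100ae 0x100ba
```

The three results match the msg_0, msg_1 and msg_2 symbol values reported by readelf.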

objdump recognizes the THUMB-mode code in the middle (0x10070) because the symbol table contains a few special symbols ($a, $t):

$ objdump --sym --special-syms arm_thumb_switch_1

arm_thumb_switch_1: file format elf32-littlearm

SYMBOL TABLE:
00010054 l d .text 00000000 .text
00000000 l d .ARM.attributes 00000000 .ARM.attributes
00000000 l df *ABS* 00000000 arm_thumb_switch_1.o
00010054 l .text 00000000 $a // arm code
000100a0 l .text 00000000 msg_0
00010070 l .text 00000000 $t // thumb code
00010080 l .text 00000000 $a // arm code
000100ba l .text 00000000 msg_2
000100a0 l .text 00000000 $d // literal data
000100ae l .text 00000000 msg_1
000100c4 g .text 00000000 _bss_end__
000100c4 g .text 00000000 __bss_start__
000100c4 g .text 00000000 __bss_end__
00010054 g .text 00000000 _start
000100c4 g .text 00000000 __bss_start
000100c4 g .text 00000000 __end__
000100c4 g .text 00000000 _edata
000100c4 g .text 00000000 _end

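arm_thumb_switch_1 keeps the $a and $t symbols generated by as, and objdump relies on them to recognize the THUMB-mode code in the middle (0x10070).

For the $a, $t and $d symbols in the .symtab section, see: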

《ELF for the ARM Architecture》
《ARM Mapping Symbols》
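
How a tool might use these symbols can be sketched as a lookup of the nearest mapping symbol at or below an address. This is a simplified model under stated assumptions: the table is copied from the symbol listing above, the names MAPPING and region_kind are hypothetical, and real tools also take section boundaries into account.

```python
from bisect import bisect_right

# (address, mapping symbol) pairs from the symbol table above, sorted by address.
MAPPING = [(0x10054, "$a"), (0x10070, "$t"), (0x10080, "$a"), (0x100A0, "$d")]
KIND = {"$a": "ARM code", "$t": "THUMB code", "$d": "data"}


def region_kind(addr: int) -> str:
    """Classify addr by the last mapping symbol at or below it."""
    starts = [start for start, _ in MAPPING]
    i = bisect_right(starts, addr) - 1
    if i < 0:
        return "unknown"  # no mapping symbol precedes this address
    return KIND[MAPPING[i][1]]


for a in (0x10054, 0x10072, 0x10080, 0x100AE):
    print(hex(a), region_kind(a))
```

With an empty MAPPING every address comes back as "unknown", which corresponds to the stripped binary examined next.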

$ cp arm_thumb_switch_1 arm_thumb_switch_1_strip
$ strip arm_thumb_switch_1_strip
$ file -b arm_thumb_switch_1_strip
ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, stripped

arm_thumb_switch_1_strip has been stripped and has no $a or $t.

$ objdump --sym --special-syms arm_thumb_switch_1_strip

arm_thumb_switch_1_strip: file format elf32-littlearm

SYMBOL TABLE:
no symbols

Now objdump can no longer recognize the THUMB-mode code in the middle:

$ objdump -d arm_thumb_switch_1_strip

arm_thumb_switch_1_strip: file format elf32-littlearm


Disassembly of section .text:

00010054 <.text>:
10054: e3a0200e mov r2, #14
10058: e28f1040 add r1, pc, #64 ; 0x40
1005c: e3a00001 mov r0, #1
10060: e3a07004 mov r7, #4
10064: ef000000 svc 0x00000000
10068: e28f0001 add r0, pc, #1
1006c: e12fff10 bx r0
10070: a10e220c tstge lr, ip, lsl #4
10074: 20013102 andcs r3, r1, r2, lsl #2
10078: df2f2704 svcle 0x002f2704
1007c: 47004678 smlsdxmi r0, r8, r6, r4
10080: e3a0200a mov r2, #10
10084: e28f102e add r1, pc, #46 ; 0x2e
10088: e3a00001 mov r0, #1
1008c: e3a07004 mov r7, #4
10090: ef000000 svc 0x00000000
10094: e0200000 eor r0, r0, r0
10098: e3a07001 mov r7, #1
1009c: ef000000 svc 0x00000000
100a0: 6c6c6548 cfstr64vs mvdx6, [ip], #-288 ; 0xfffffee0
100a4: 77202c6f strvc r2, [r0, -pc, ror #24]!
100a8: 646c726f strbtvs r7, [ip], #-623 ; 0xfffffd91
100ac: 68740a2e ldmdavs r4!, {r1, r2, r3, r5, r9, fp}^
100b0: 20626d75 rsbcs r6, r2, r5, ror sp
100b4: 65646f6d strbvs r6, [r4, #-3949]! ; 0xfffff093
100b8: 72610a2e rsbvc r0, r1, #188416 ; 0x2e000
100bc: 6f6d206d svcvs 0x006d206d
100c0: 0a2e6564 beq 0xba9658

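The THUMB-mode code at 0x10070 has been disassembled as ARM-mode code. Note that the presence or absence of $a and $t only affects how ELF tools disassemble the file, not how it actually executes.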
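
The two listings differ only in how the same bytes are grouped. As a small illustration, the four bytes at 0x10070 can be unpacked either as one little-endian 32-bit word (the ARM view) or as two 16-bit halfwords (the THUMB view); the byte string below is derived from the objdump output above.

```python
import struct

raw = bytes.fromhex("0c220ea1")  # the four bytes stored at 0x10070..0x10073

(arm_word,) = struct.unpack("<I", raw)        # one 32-bit ARM opcode
thumb_halfwords = struct.unpack("<2H", raw)   # two 16-bit THUMB opcodes

print(hex(arm_word))                          # 0xa10e220c
print([hex(h) for h in thumb_halfwords])      # ['0x220c', '0xa10e']
```

0xa10e220c is what the stripped listing decodes as "tstge", while 0x220c and 0xa10e are the "movs r2, #12" and "add r1, pc, #56" opcodes from the unstripped listing.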

4) Switching between ARM mode and THUMB mode in GDB

set arm fallback-mode (arm|thumb|auto)

gdb uses the symbol table, when available, to determine whether instructions are ARM or Thumb. This command controls gdb’s default behavior when the symbol table is not available. The default is auto, which causes gdb to use the current execution mode (from the T bit in the CPSR register).

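GDB first looks for $a and $t; when they are missing, it picks the mode according to this setting. If the setting is auto, GDB takes the T bit from CPSR to determine the mode.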

set arm force-mode (arm|thumb|auto)

This command overrides use of the symbol table to determine whether instructions are ARM or Thumb. The default is auto, which causes gdb to use the symbol table and then the setting of “set arm fallback-mode”.

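If this setting is auto, GDB uses the symbol table and then "set arm fallback-mode"; otherwise it ignores the symbol table entirely and forces the specified mode. If you are reversing raw firmware that is not in ELF format, forget about the symbol table.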
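
The precedence described by the two manual excerpts can be summarized as a small decision function. This is only a model of the documented behaviour, not GDB's actual code; the function and parameter names are assumptions, and symbol_mode stands for whatever the $a/$t symbols say about the address (None when there is no symbol table).

```python
def disassembly_mode(force_mode, symbol_mode, fallback_mode, cpsr_t):
    """Pick "arm" or "thumb" following the documented precedence."""
    if force_mode != "auto":
        return force_mode              # force-mode overrides everything
    if symbol_mode is not None:
        return symbol_mode             # symbol table, when available
    if fallback_mode != "auto":
        return fallback_mode           # explicit fallback
    return "thumb" if cpsr_t else "arm"  # current execution mode (T bit)


# Stripped binary, default settings, CPU in ARM mode:
print(disassembly_mode("auto", None, "auto", 0))    # arm
# Same situation after "set arm force-mode thumb":
print(disassembly_mode("thumb", None, "auto", 0))   # thumb
```

The second call corresponds to forcing the mode by hand when the T bit does not match the code you want to look at.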

If the CPU is in THUMB mode but the code at some address is actually ARM-mode code, run "set arm force-mode arm" and then "x/5i".
--------------------------------------------------------------------------

$ gdb -q -nx ./arm_thumb_switch_1_strip

(gdb) starti
Starting program: /tmp/arm_thumb_switch_1_strip

Program stopped.
0x00010054 in ?? ()
(gdb) display/5i $pc
1: x/5i $pc
=> 0x10054: mov r2, #14
0x10058: add r1, pc, #64 ; 0x40
0x1005c: mov r0, #1
0x10060: mov r7, #4
0x10064: svc 0x00000000
(gdb) p/x $cpsr&0x20
$1 = 0x0

The CPU is currently in ARM mode. Try disassembling the THUMB-mode code at 0x10070:

(gdb) x/8i 0x10070
0x10070: tstge lr, r12, lsl #4
0x10074: andcs r3, r1, r2, lsl #2
0x10078: svcle 0x002f2704
0x1007c: ; <UNDEFINED> instruction: 0x47004678
0x10080: mov r2, #10
0x10084: add r1, pc, #46 ; 0x2e
0x10088: mov r0, #1
0x1008c: mov r7, #4
(gdb) set arm force-mode thumb
(gdb) x/8i 0x10070
0x10070: movs r2, #12
0x10072: add r1, pc, #56 ; (adr r1, 0x100ac)
0x10074: adds r1, #2
0x10076: movs r0, #1
0x10078: movs r7, #4
0x1007a: svc 47 ; 0x2f
0x1007c: mov r0, pc
0x1007e: bx r0

(gdb) tb *0x1007e
Temporary breakpoint 1 at 0x1007e
(gdb) c
Continuing.
Hello, world.
thumb mode.

Temporary breakpoint 1, 0x0001007e in ?? ()
1: x/5i $pc
=> 0x1007e: bx r0
0x10080: movs r0, #10
0x10082: b.n 0x107c6
0x10084: asrs r6, r5, #32
0x10086: b.n 0x105a8
(gdb) p/x $cpsr&0x20
$2 = 0x20
(gdb) i r r0
r0 0x10080 65664

The CPU is currently in THUMB mode. Try disassembling the ARM-mode code at 0x10080:

(gdb) x/5i 0x10080
0x10080: movs r0, #10
0x10082: b.n 0x107c6
0x10084: asrs r6, r5, #32
0x10086: b.n 0x105a8
0x10088: movs r1, r0
(gdb) set arm force-mode arm
(gdb) x/5i 0x10080
0x10080: mov r2, #10
0x10084: add r1, pc, #46 ; 0x2e
0x10088: mov r0, #1
0x1008c: mov r7, #4
0x10090: svc 0x00000000

 

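Both "set arm force-mode" and "set arm fallback-mode" default to auto. If you are debugging live and "x/5i $pc" at the current address looks fine, there is nothing to set by hand: GDB picks the mode from the T bit in CPSR. Without touching these settings, GDB also has an offbeat trick for viewing THUMB-mode code while in ARM mode: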

$ gdb -q -nx ./arm_thumb_switch_1_strip

(gdb) starti
(gdb) x/8i 0x10070+1
0x10071: movs r2, #12
0x10073: add r1, pc, #56 ; (adr r1, 0x100ac)
0x10075: adds r1, #2
0x10077: movs r0, #1
0x10079: movs r7, #4
0x1007b: svc 47 ; 0x2f
0x1007d: mov r0, pc
0x1007f: bx r0

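You may not find this in the GDB manual. Note that every displayed address has its lowest bit set to 1.

5) Specifying ARM mode or THUMB mode in IDA

Put the cursor on the instruction and press Alt-G; the "Segment Register Value" dialog pops up. Select T and set Value:

0 ARM mode, the disassembly window shows "CODE32"
1 THUMB mode, the disassembly window shows "CODE16"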

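☆ A few pitfalls in ARM assembly programming

Consider this section homework for readers whose curiosity never runs out. A few small questions about arm_thumb_switch_1.s:

1) What happens if ".syntax divided" is replaced with ".syntax unified"?
2) What happens if ".arch armv5te" is replaced with ".arch armv6t2"?
3) What is the purpose of directives such as ".arm" and ".thumb"?
4) If _start begins with ".thumb" right away, how should the ld command be written?
5) Why was it not written as "add r1,pc,#0x3a"?
6) What is the purpose of ".align 2"?

☆ References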

[1] 《ARMv5 Architecture Reference Manual》
https://developer.arm.com/docs/ddi0100/latest/armv5-architecture-reference-manual

[2] 《ELF for the ARM Architecture》
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044e/IHI0044E_aaelf.pdf

[3] 《ARM Mapping Symbols》
https://sourceware.org/binutils/docs/as/ARM_002dDependent.html
