Miasm is a free and open source (GPLv2) reverse engineering framework. Miasm aims to analyze / modify / generate binary programs. Here is a non exhaustive list of features:
- Opening / modifying / generating PE / ELF 32 / 64 LE / BE using Elfesteem
- Assembling / Disassembling X86 / ARM / MIPS / SH4 / MSP430
- Representing assembly semantic using intermediate language
- Emulating using JIT (dynamic code analysis, unpacking, ...)
- Expression simplification for automatic de-obfuscation
- ...
Basic examples
Assembling / Disassembling
Import Miasm x86 architecture:
>>> from miasm2.arch.x86.arch import mn_x86
>>> from miasm2.core.locationdb import LocationDB
>>> loc_db = LocationDB()
>>> l = mn_x86.fromstring('XOR ECX, ECX', loc_db, 32)
>>> print l
XOR ECX, ECX
>>> mn_x86.asm(l)
['1\xc9', '3\xc9', 'g1\xc9', 'g3\xc9']
>>> l.args[0] = mn_x86.regs.EAX
>>> print l
XOR EAX, ECX
>>> a = mn_x86.asm(l)
>>> print a
['1\xc8', '3\xc1', 'g1\xc8', 'g3\xc1']
>>> print mn_x86.dis(a[0], 32)
XOR EAX, ECX
Machine
abstraction:>>> from miasm2.analysis.machine import Machine
>>> mn = Machine('x86_32').mn
>>> print mn.dis('\x33\x30', 32)
XOR ESI, DWORD PTR [EAX]
>>> mn = Machine('mips32b').mn
>>> print mn.dis('97A30020'.decode('hex'), "b")
LHU V1, 0x20(SP)
Intermediate representation
Create an instruction:
>>> machine = Machine('arml')
>>> instr = machine.mn.dis('002088e0'.decode('hex'), 'l')
>>> print instr
ADD R2, R8, R0
>>> ira = machine.ira(loc_db)
>>> ircfg = ira.new_ircfg()
>>> ira.add_instr_to_ircfg(instr, ircfg)
>>> for lbl, irblock in ircfg.blocks.items():
... print irblock.to_string(loc_db)
loc_0:
R2 = R8 + R0
IRDst = loc_4
>>> for lbl, irblock in ircfg.blocks.iteritems():
... for assignblk in irblock:
... rw = assignblk.get_rw()
... for dst, reads in rw.iteritems():
... print 'read: ', [str(x) for x in reads]
... print 'written:', dst
... print
...
read: ['R8', 'R0']
written: R2
read: []
written: IRDst
Emulation
Giving a shellcode:
00000000 8d4904 lea ecx, [ecx+0x4]
00000003 8d5b01 lea ebx, [ebx+0x1]
00000006 80f901 cmp cl, 0x1
00000009 7405 jz 0x10
0000000b 8d5bff lea ebx, [ebx-1]
0000000e eb03 jmp 0x13
00000010 8d5b01 lea ebx, [ebx+0x1]
00000013 89d8 mov eax, ebx
00000015 c3 ret
>>> s = '\x8dI\x04\x8d[\x01\x80\xf9\x01t\x05\x8d[\xff\xeb\x03\x8d[\x01\x89\xd8\xc3'
Container
abstraction:>>> from miasm2.analysis.binary import Container
>>> c = Container.from_string(s)
>>> c
<miasm2.analysis.binary.ContainerUnknown object at 0x7f34cefe6090>
0
:>>> from miasm2.analysis.machine import Machine
>>> machine = Machine('x86_32')
>>> mdis = machine.dis_engine(c.bin_stream)
>>> asmcfg = mdis.dis_multiblock(0)
>>> for block in asmcfg.blocks:
... print block.to_string(asmcfg.loc_db)
...
loc_0
LEA ECX, DWORD PTR [ECX + 0x4]
LEA EBX, DWORD PTR [EBX + 0x1]
CMP CL, 0x1
JZ loc_10
-> c_next:loc_b c_to:loc_10
loc_10
LEA EBX, DWORD PTR [EBX + 0x1]
-> c_next:loc_13
loc_b
LEA EBX, DWORD PTR [EBX + 0xFFFFFFFF]
JMP loc_13
-> c_to:loc_13
loc_13
MOV EAX, EBX
RET
>>> jitter = machine.jitter(jit_type='python')
>>> jitter.init_stack()
>>> run_addr = 0x40000000
>>> from miasm2.jitter.csts import PAGE_READ, PAGE_WRITE
>>> jitter.vm.add_memory_page(run_addr, PAGE_READ | PAGE_WRITE, s)
def code_sentinelle(jitter):
jitter.run = False
jitter.pc = 0
return True
>>> jitter.add_breakpoint(0x1337beef, code_sentinelle)
>>> jitter.push_uint32_t(0x1337beef)
>>> jitter.set_trace_log()
>>> jitter.init_run(run_addr)
>>> jitter.continue_run()
RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000000 RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
RIP 0000000040000000
40000000 LEA ECX, DWORD PTR [ECX+0x4]
RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
....
4000000e JMP loc_0000000040000013:0x40000013
RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
RIP 0000000040000013
40000013 MOV EAX, EBX
RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
RIP 0000000040000013
40000015 RET
>>>
>>> jitter.vm
ad 1230000 size 10000 RW_ hpad 0x2854b40
ad 40000000 size 16 RW_ hpad 0x25e0ed0
>>> hex(jitter.cpu.EAX)
'0x0L'
>>> jitter.cpu.ESI = 12
Symbolic execution
Initializing the IR pool:
>>> ira = machine.ira(loc_db)
>>> ircfg = ira.new_ircfg_from_asmcfg(asmcfg)
>>> from miasm2.ir.symbexec import SymbolicExecutionEngine
>>> sb = SymbolicExecutionEngine(ira)
>>> symbolic_pc = sb.run_at(ircfg, 0)
>>> print symbolic_pc
((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
>>> sb = SymbolicExecutionEngine(ira, machine.mn.regs.regs_init)
>>> symbolic_pc = sb.run_at(ircfg, 0, step=True)
Instr LEA ECX, DWORD PTR [ECX + 0x4]
Assignblk:
ECX = ECX + 0x4
________________________________________________________________________________
ECX = ECX + 0x4
________________________________________________________________________________
Instr LEA EBX, DWORD PTR [EBX + 0x1]
Assignblk:
EBX = EBX + 0x1
________________________________________________________________________________
EBX = EBX + 0x1
ECX = ECX + 0x4
________________________________________________________________________________
Instr CMP CL, 0x1
Assignblk:
zf = (ECX[0:8] + -0x1)?(0x0,0x1)
nf = (ECX[0:8] + -0x1)[7:8]
pf = parity((ECX[0:8] + -0x1) & 0xFF)
of = ((ECX[0:8] ^ (ECX[0:8] + -0x1)) & (ECX[0:8] ^ 0x1))[7:8]
cf = (((ECX[0:8] ^ 0x1) ^ (ECX[0:8] + -0x1)) ^ ((ECX[0:8] ^ (ECX[0:8] + -0x1)) & (ECX[0:8] ^ 0x1)))[7:8]
af = ((ECX[0:8] ^ 0x1) ^ (ECX[0:8] + -0x1))[4:5]
________________________________________________________________________________
af = (((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[4:5]
pf = parity((ECX + 0x4)[0:8] + 0xFF)
zf = ((ECX + 0x4)[0:8] + 0xFF)?(0x0,0x1)
ECX = ECX + 0x4
of = ((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1))[7:8]
nf = ((ECX + 0x4)[0:8] + 0xFF)[7:8]
cf = (((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1)) ^ ((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[7:8]
EBX = EBX + 0x1
________________________________________________________________________________
Instr JZ loc_key_1
Assignblk:
IRDst = zf?(loc_key_1,loc_key_2)
EIP = zf?(loc_key_1,loc_key_2)
________________________________________________________________________________
af = (((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[4:5]
EIP = ((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
pf = parity((ECX + 0x4)[0:8] + 0xFF)
IRDst = ((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
zf = ((ECX + 0x4)[0:8] + 0xFF)?(0x0,0x1)
ECX = ECX + 0x4
of = ((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1))[7:8]
nf = ((ECX + 0x4)[0:8] + 0xFF)[7:8]
cf = (((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1)) ^ ((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[7:8]
EBX = EBX + 0x1
________________________________________________________________________________
>>>
>>> from miasm2.expression.expression import ExprInt
>>> sb.symbols[machine.mn.regs.ECX] = ExprInt(-3, 32)
>>> symbolic_pc = sb.run_at(ircfg, 0, step=True)
Instr LEA ECX, DWORD PTR [ECX + 0x4]
Assignblk:
ECX = ECX + 0x4
________________________________________________________________________________
af = (((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[4:5]
EIP = ((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
pf = parity((ECX + 0x4)[0:8] + 0xFF)
IRDst = ((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
zf = ((ECX + 0x4)[0:8] + 0xFF)?(0x0,0x1)
ECX = 0x1
of = ((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1))[7:8]
nf = ((ECX + 0x4)[0:8] + 0xFF)[7:8]
cf = (((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1)) ^ ((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[7:8]
EBX = EBX + 0x1
________________________________________________________________________________
Instr LEA EBX, DWORD PTR [EBX + 0x1]
Assignblk:
EBX = EBX + 0x1
________________________________________________________________________________
af = (((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[4:5]
EIP = ((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
pf = parity((ECX + 0x4)[0:8] + 0xFF)
IRDst = ((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
zf = ((ECX + 0x4)[0:8] + 0xFF)?(0x0,0x1)
ECX = 0x1
of = ((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1))[7:8]
nf = ((ECX + 0x4)[0:8] + 0xFF)[7:8]
cf = (((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1)) ^ ((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[7:8]
EBX = EBX + 0x2
________________________________________________________________________________
Instr CMP CL, 0x1
Assignblk:
zf = (ECX[0:8] + -0x1)?(0x0,0x1)
nf = (ECX[0:8] + -0x1)[7:8]
pf = parity((ECX[0:8] + -0x1) & 0xFF)
of = ((ECX[0:8] ^ (ECX[0:8] + -0x1)) & (ECX[0:8] ^ 0x1))[7:8]
cf = (((ECX[0:8] ^ 0x1) ^ (ECX[0:8] + -0x1)) ^ ((ECX[0:8] ^ (ECX[0:8] + -0x1)) & (ECX[0:8] ^ 0x1)))[7:8]
af = ((ECX[0:8] ^ 0x1) ^ (ECX[0:8] + -0x1))[4:5]
________________________________________________________________________________
af = 0x0
EIP = ((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
pf = 0x1
IRDst = ((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
zf = 0x1
ECX = 0x1
of = 0x0
nf = 0x0
cf = 0x0
EBX = EBX + 0x2
________________________________________________________________________________
Instr JZ loc_key_1
Assignblk:
IRDst = zf?(loc_key_1,loc_key_2)
EIP = zf?(loc_key_1,loc_key_2)
________________________________________________________________________________
af = 0x0
EIP = 0x10
pf = 0x1
IRDst = 0x10
zf = 0x1
ECX = 0x1
of = 0x0
nf = 0x0
cf = 0x0
EBX = EBX + 0x2
________________________________________________________________________________
Instr LEA EBX, DWORD PTR [EBX + 0x1]
Assignblk:
EBX = EBX + 0x1
________________________________________________________________________________
af = 0x0
EIP = 0x10
pf = 0x1
IRDst = 0x10
zf = 0x1
ECX = 0x1
of = 0x0
nf = 0x0
cf = 0x0
EBX = EBX + 0x3
________________________________________________________________________________
Instr LEA EBX, DWORD PTR [EBX + 0x1]
Assignblk:
IRDst = loc_key_3
________________________________________________________________________________
af = 0x0
EIP = 0x10
pf = 0x1
IRDst = 0x13
zf = 0x1
ECX = 0x1
of = 0x0
nf = 0x0
cf = 0x0
EBX = EBX + 0x3
________________________________________________________________________________
Instr MOV EAX, EBX
Assignblk:
EAX = EBX
________________________________________________________________________________
af = 0x0
EIP = 0x10
pf = 0x1
IRDst = 0x13
zf = 0x1
ECX = 0x1
of = 0x0
nf = 0x0
cf = 0x0
EBX = EBX + 0x3
EAX = EBX + 0x3
________________________________________________________________________________
Instr RET
Assignblk:
IRDst = @32[ESP[0:32]]
ESP = {ESP[0:32] + 0x4 0 32}
EIP = @32[ESP[0:32]]
________________________________________________________________________________
af = 0x0
EIP = @32[ESP]
pf = 0x1
IRDst = @32[ESP]
zf = 0x1
ECX = 0x1
of = 0x0
nf = 0x0
cf = 0x0
EBX = EBX + 0x3
ESP = ESP + 0x4
EAX = EBX + 0x3
________________________________________________________________________________
>>>
How does it work?
Miasm embeds its own disassembler, intermediate language and instruction semantic. It is written in Python.
To emulate code, it uses LLVM, GCC, Clang or Python to JIT the intermediate representation. It can emulate shellcodes and all or parts of binaries. Python callbacks can be executed to interact with the execution, for instance to emulate library functions effects.
Documentation
TODO
An auto-generated documentation is available here.
Obtaining Miasm
- Clone the repository: Miasm on GitHub
- Get one of the Docker images at Docker Hub
Software requirements
Miasm uses:
- python-pyparsing
- python-dev
- elfesteem from Elfesteem
- optionally python-pycparser (version >= 2.17)
- GCC
- Clang
- LLVM with Numba llvmlite, see below
- Z3, the Theorem Prover
Configuration
- Install elfesteem
git clone https://github.com/serpilliere/elfesteem.git elfesteem
cd elfesteem
python setup.py build
sudo python setup.py install
- GCC (any version)
- Clang (any version)
- LLVM
- Debian (testing/unstable): Not tested
- Debian stable/Ubuntu/Kali/whatever:
pip install llvmlite
or install from llvmlite - Windows: Not tested
- Build and install Miasm:
$ cd miasm_directory
$ python setup.py build
$ sudo python setup.py install
Windows & IDA
Most of Miasm's IDA plugins use a subset of Miasm functionnality. A quick way to have them working is to add:
elfesteem
directory andpyparsing.py
toC:\...\IDA\python\
orpip install pyparsing elfesteem
miasm2/miasm2
directory toC:\...\IDA\python\
Testing
Miasm comes with a set of regression tests. To run all of them:
cd miasm_directory/test
python test_all.py
- Mono threading:
-m
- Code coverage instrumentation:
-c
- Only fast tests:
-t long
(excludes the long tests)
They already use Miasm
Tools
- Sibyl: A function divination too
- R2M2: Use miasm2 as a radare2 plugin
- CGrex : Targeted patcher for CGC binaries
- ethRE Reversing tool for Ethereum EVM (with corresponding Miasm2 architecture)
Blog posts / papers / conferences
- Deobfuscation: recovering an OLLVM-protected program
- Taming a Wild Nanomite-protected MIPS Binary With Symbolic Execution: No Such Crackme
- Génération rapide de DGA avec Miasm: Quick computation of DGA (French article)
- Enabling Client-Side Crash-Resistance to Overcome Diversification and Information Hiding: Detect undirected call potential arguments
- Miasm: Framework de reverse engineering (French)
- Tutorial miasm (French video)
- Graphes de dépendances : Petit Poucet style: DepGraph (French)
Books
- Practical Reverse Engineering: X86, X64, Arm, Windows Kernel, Reversing Tools, and Obfuscation: Introduction to Miasm (Chapter 5 "Obfuscation")
- BlackHat Python - Appendix: Japan security book's samples
Misc
- Man, does miasm has a link with rr0d?
- Yes! crappy code and uggly documentation.