While the current Pixie interpreter works well it's pretty much just like any other mutable interpreter. But on the other hand we have the PyPy JIT generator that allows us to create interpreters modeled almost any way we want. What would a interpreter look like in a "perfect world"?
For answers to such questions I suggest reading up on the language Eff (http://arxiv.org/pdf/1203.1539v1.pdf). This language specifies an interesting language design, one that makes a distinction between side-effect-free computations, and effects that modify the environment or the system. Having an interpreter that makes this distinction has several benefits:
-
Any computation in the system could be hinted to the JIT as being pure, thus removable if the arguments to the computation are constants.
-
Support for Generators, Exceptions, and Lightweight threads (among other things) could be added via this approach.
-
With the ability to "pause" a thread and with all blocking IO segregated inside effects, the vm could easily integrate async IO libraries such as libuv, removing the need for a GIL.
-
If foreign functions are considered effects (as they should be) then FFI functions could be called in separate threads, allowing for parallel execution of C functions.
-
The upcoming STM solution in PyPy works best for computations that do not perform IO. Once again, this effects system separates code, allowing for better optimizations.
-
If the interpreter is immutable, then really weird things can be done at runtime, including: forking a interpreter, resuming exceptions, re-running a step in the interpreter, even saving the interpreter's state to disk may be possible.
-
Tail Call Optimization is a side-effect of all of this.
-
Deep generators. It's possible to call
yieldwith this approach deep inside collection code. In additionyieldcan be used as inversion of control for Transducers.
Most bytecode interpreters are implemented via mutable state. They often mirror a virtual CPU that has a stack, a stack pointer, an instruction pointer, and perhaps a hashmap of locals. As the program is interpreted, these values are mutated. Each execution of a bytecode advances the instruction pointer, pushing an item to the stack increments the stack pointer, etc. But imagine if all this was immutable, if executing a bytecode created a new instance of the interpreter. What if local maps were implemented via persistent hash maps?
If this could be accomplished, then it would be possible to run an instruction with the state of the interpreter, and then if desired, run it again producing the same results. That is to say, calling interpret(state) with the same state over and over would always produce the same value.
It turns out that this approach is not quite optimal, as having an immutable stack may require splicing stack, copying large sections of the stack, etc. Instead we take the approach of an AST interpreter. The code is represented as immutable trees of structs, each representing an operation. To each of these AST nodes we pass an immutable map of locals. Thus, a let binding becomes as simple as interpret(self._child, locals.assoc(name, interpret(self._binding, locals). Locals are immutable thus we maintain proper scoping.
In this approach (inspired by eclj https://github.com/brandonbloom/eclj/blob/master/src/eclj/interpret/cps.clj) we (ab)use thunks to support effects and TCO. An example implementation can be seen in this repository.
As of today, this POC is finished. It works, and the JIT produces very promising results (on par with the current JIT).
Thus it is now just a matter of refactoring the guts of Pixie to match the new coding styles. Thankfully not much in the
.lisp files of pixie have to change.
- Collections - Anything that calls into pixie or RT will need to be refactored with
@cps - Reader - Will need to be completely refactored. Calling
rdr.read()can now be an effect, that change will ripple through the rest of the system. - Compiler - Completely re-written, the new interpreter is an AST interpreter. This will however drastically simplify the compiler as it will no longer have to track stacks, save constants, etc.
- Interpreter - The AST nodes will replace the interpreter
- RT - Completely rewritten, but since it uses vars, it should be simple to perform this
No! Development of pixie can still continue, the changes will be merged in as possible to the new development branch.
This project contains a very 'touchy' code transformer, known as @cps. This will take an RPython function or method and
transform it into an immutable state-machine via CPS transformation. As is expected with this sort of code mangling there
are many caveats to using this transformer, but they are mostly simple to remember:
-
Calls to functions or methods that end with
_Efare considered to be effect functions (functions that will either return Answer, Effect or a Thunk). Thus at every call to such a function the transformer will create a continuation. -
Generators/Iterators should be avoided. A single step of the function may be run many times, thus it is important to clone any mutable state.
-
The call stack is not persisted across continuations, so be sure that the stack position is 0 at every effect function call. For example:
@cps def foo(x): r = invoke_Ef(x) z = invoke_Ef(r) return z
Is fine, while the following is not:
@cps
def foo(x):
return invoke_Ef(invoke_Ef(x))
Since the outer invoke_ will be loaded before the inner_.
-
Calls to effect functions that immediately return will be turned into tail calls, so prefer this style when possible:
@cps def foo(x): return invoke_Ef(x)
-
Currently (until this restriction is removed) effect calls that take anything but locals as arguments are not supported.
@cps def foo(x): # works x = invoke_Ef(something) # doesn't work x = invoke_Ef(something._zing)
-
breakandcontinuerequire the stack to operate, and as such are not supported -
Since functions are RPython and internally function locals are converted class fields, locals can only have one type. Unlike RPython that supports locals with conflicting types as long as they are redefined between usages.