diff --git a/psudocode.txt b/psudocode.txt new file mode 100644 index 0000000..823b0f9 --- /dev/null +++ b/psudocode.txt @@ -0,0 +1,212 @@ + +Key Contributions to look out for that make this work in practical time: + 1. First class environments that: + a. Have IDs + b. Can either be "real", in which case it maps symbols to values, + or "fake", in which case it maps symbols to themselves, but with the env ID as it's for-progress + c. Chain up to an upper environment that may be fake or real + 2. AST nodes that maintain on-node: + a. The IDs of environments that, if "real", can be used to make progress in this subtree + b. The hashes of infinite recursive calls that were detected and stopped - if this hash isn't in the current call chain, this subtree can make progress + c. Extra IDs of environments that are "real" but have "fake" environments in their chain - this is used to make return value checking fast O(1 or log n, depending) + 3. Combiners, both user-defined and built in (including that maintain a "wrap level" that: + a. Is a property of this function value, *not* the function itself + * meaning that if wrap_level > 1, you can evaluate each parameter and decrement wrap_level, even if you can't execute the call + 4. The return value of a combiner is checked for: + a. If it is a value, in which case it is good to be returned if it doesn't contain a reference to the envID of the function it is being returned from + b. If it is (veval something env) where env doesn't contain a reference to the envID of the function it is being returned from + c. If it is a call to a function (func params...) and func doesn't take in a dynamic environment and params... are all good to be returned + This makes it so that combiner calls can return partially-evaluated code - any macro-like combiner would calculate the new code and return + (eval dynamic_env), which would do what partial evaluation it could and either become a value or a call like case "b" above. + Case "b" allows this code essentially "tagged" with the environment it should be evaluated in to be returned out of "macro-like" combiners, + and this dovetails with the next point + 5. The (veval something env) form essentially "tags" a piece of code with the environment it should be evaluated in. At each stage where + it is possible, the system checks for redundent constructions like these, where the env in (veval something env) is the currently active env. + In this case, it unwraps it to just "something" and continues on - this completes the second half of the macro-like combiner evaluation where + after being returned to the calling function the code is essentially spliced in. + 6. The compiler can emit if/else branches on the wrap_level of combiners and in each branch further compile/partial eval if appropriate, allowing + dynamic calls to either functions or combiners with the overhead of a single branch + +Note that points 4&5 make it so that any macro written as a combiner in "macro-style" will be expanded just like a macro would and cause no runtime overhead! +Additionally, point 6 makes it so that functions (wrap level 1 combiners) and non-parameter-evaluating (wrap level 0) combiners can be dynamically passed around and called with very minimal overhead. +Combine them together and you get a simpler but more flexiable semantics than macro based (pure functional) languages with little-to-no overhead. + +Additional tricky spots to look out for: + 1. If you don't do the needed-for-progress tracking, you have exponential runtime + 2. If you aren't careful about storing analysis information on the AST node itself or memoize, a naive tree traversal of the DAG has exponential runtime + 3. Infinite recursion can hide in sneaky places, including the interply between the partial evaluator and the compiler, and careful use of multiple recursion blockers / memoization is needed to prevent all cases + 4. The invarients needed to prevent mis-evaluation are non-trivial to get right. Our invarients: + a. All calls to user-combiners have the parameters as total values, thus not moving something that needs a particular environment underneath a different environment + b. All return values from functions must not depend on the function's environment (there are a couple of interesting cases here, see combiner_return_ok(func_result, env_id)) + c. All array values are made up of total values + d. Some primitive combiners don't obey "a", but they must be written with extreme care, and often partially evaluate only some of their parameters and have to keep track of which. + + + +Everything operates on AST nodes, an ADT: + * val - integers, strings, booleans + * marked_array + * marked_symbol + * comb + * prim_comb + * marked_env + +Each AST node contains a hash representing it&it's subtree. + +fun needed_for_progress(ast_node) -> (progress_IDs, rec_stopping_hashes, extra_IDs): + returns + - environment IDs (stored in each AST node for it and it's children) + that must have real values if the partial evaluation of the subtree rooted at + this node is going to make progress partial evaluating. + + progress_IDs is either true (meaning it will make progress no matter what), an + intset of env IDs (the ones that will cause it to make progress), or an empty + set, meaning it can't make forward progress no matter what + - hashes that if you're not inside the evaluation of, it could make progress + - extra IDs for envs it contains that don't count as forward progress IDs because the + env does have values, but envs in it's parent chain doesn't have values. + +The calculation for needed_for_progress is straightforward-ish, with some tricky bits at comb and array. + +Under these definitions, we call an AST subtree a "total val" if it is either a val or it's needed-for-progress IDs is nil. + +fun mark(x, eval_pos): + x is env -> error + x is combiner -> error + x is symbol -> if x == true than MarkedVal(true) + else if x == false than MarkedVal(false) + else MarkedSymbol(x, needed_IDs=if eval_pos true else nil) + x is array -> + MarkedArray(is_val=!eval_pos, attempted=false, resume_hashes=nil, + values = [mark(x[0], eval_pos)] + [mark(xi, false) for xi in x[1:]]) + true -> MarkedVal(x) + +fun strip(x) -> value: + if X is an AST node representing a value, it returns the value. + May strip recursively in the case of an array value, etc. + Errors on env, comb (but not prim_comb!) non value symbols or arrays + +fun try_unval(x) -> Result: + //Removes one level of "value-ness". + x is Array -> if !x.array_is_val Error() + else Ok(MarkedArray(is_value=false, + values = [try_unval(x.values[0])] + x.values[1:])) + x is Symbol -> if !x.symbol_is_val Error() + else Ok(MarkedSymbol(symbol=x.symbol, is_value=false)) + true -> Ok(x) + +fun check_for_env_id_in_result(env_id, x): + return env_id in + if either progress_IDs or extra_IDs is true, then we have a fallback, but + that doesn't get called even on large testcases so it's either rare or impossible. + Fallback is slow though, whereas this is just a check for set membership + +// We only allow returning a value out of a combiner if the return value +// doesn't reference the environment of the combiner +fun combiner_return_ok(func_result, env_id): + func_result isn't later -> !check_for_env_id_in_result(env_id, func_result) + // special cases now + (veval body {env}) => (combiner_return_ok {env}) + // The reason we don't have to check body is that this form is only creatable in ways that body was origionally a value and only need {env} + // Either it's created by eval, in which case it's fine, or it's created by something like (eval (array veval x de) de2) and the array has checked it, + // or it's created via literal vau invocation, in which case the body is a value. + (func ...params) => func doesn't take dynamic env && all params are combiner_return_ok + otherwise -> false + +// We may end up in situations where the value/code we care about is wrapped up in +// a redundent call to veval, namely after sucessfully returning based on combiner_return_ok above. +// This call may prevent other optimizations though, so we should unwrap the redundent call if possible, +// and if it causes a change we should re-partially-evaluate to make further progress if we can +fun drop_redundent_veval(x, dynamic_env, env_stack, memostuff): + (veval node env) if env.id == dynamic_env.id -> drop_redundent_veval(node, dynamic_env, env_stack, memostuff) + (comb params...) if comb.wrap_level != -1 -> map drop_redundent_veval over params and if any change: partial_eval( (comb new_params...), dynamic_env, env_stack, memostuff) + else: x + else -> x + +fun make_tmp_inner_env(params, de?, ue, env_id): + ... + + +fun partial_eval_helper(x, only_head, env, env_stack, memostuff, force): + needed, hashes, _extra = needed_for_partial_eval(x) + if force || one of hashes is not in memostuff || needed == true || set_intersection(needed, env_stack.set_of_ids_that_are_vals) != empty_set: + x is MarkedVal -> x + x is MarkedEnv -> find(x.env_id == it.env_id, env_stack) ?: x + x is MarkedComb -> if !env.is_real && !x.se.is_real // both aren't real, re-evaluation of closure creation site + || env.is_real && !x.se.is_real // new env real, but se isn't - the creation of the closure! + then let inner_env = make_tmp_inner_env(x.params, x.de?, env, x.env_id) + in MarkedComb(se=env, body=partial_eval_helper(body, false, inner_env, add inner_env to env_stack, memostuff, false)) + x is MarkedPrimComb -> x + x is MarkedSymbol -> if x.is_val then x + else env_lookup_helper(x, env) + x is MarkedArray -> if x.is_val then x + else ...TODO... + + +And then we define a root_env with PrimComb versions of all of the standard functions. +The ones that are most interesting and interact the most with partial evaluation are + vau eval cond +The other key is that array only takes in values, that is an array value never hides something that isn't a total value and needs more partial-evaluation + (this makes a lot of things simpler in other places since we can treat array values as values no matter what and know things aren't hiding in sneaky places) + +fun needs_params_prim(...): + ... +fun give_up_params_prim(...): + ... +fun veval_inner(...): + ... +root_env = { + eval: ... + vapply: ... + lapply: ... + vau: .... + wrap: ... + unwrap: ... + cond: ... + symbol?: needs_params_prim(symbol?) + int?: needs_params_prim(int?) + string?: needs_params_prim(string?) + combiner?: ... + env?: ... + nil?: needs_params_prim(nil?) + bool?: needs_params_prim(bool?) + str-to-symbol: needs_params_prim(str-to-symbol) + get-text: needs_params_prim(get-text) + array?: ... + array: ... + len: ... + idx: ... + slice: ... + concat: ... + +: needs_params_prim(+) + -: needs_params_prim(-) + *: needs_params_prim(*) + /: needs_params_prim(/) + %: needs_params_prim(%) + band: needs_params_prim(band) + bor: needs_params_prim(bor) + bnot: needs_params_prim(bnot) + bxor: needs_params_prim(bxor) + <<: needs_params_prim(<<) + >>: needs_params_prim(>>) + =: needs_params_prim(=) + !=: needs_params_prim(!=) + <: needs_params_prim(<) + <=: needs_params_prim(<=) + >: needs_params_prim(>) + >=: needs_params_prim(>=) + str: needs_params_prim(true_str) + log: give_up_params_prim(log) + error: give_up_params_prim(error) + read-string: needs_params_prim(read-string) + empty_env: MarkedEnv() +} + +fun compile(...): + ... + Note that when it's compiling a call, it compiles an if/else chain on the wrap level of the combiner being called. + in the 0 branch, it emits the parameters as constant data + in the 1 branch, it unval's and partial evals all of the parameters before compiling them. + - note that this must be robust to partial-eval errors, as this branch might not ever happen at runtime and be nonsense code! + - if the partial evaluation errors, it emits a value that will cause an error at runtime into the compiled code + in the > 1 branch, it errors