Started brain-dumping pseudocode and descriptions of interesting points/contributions, the reason all macro-like combiners should be partially evaluated away, the invariants we must maintain, etc.
212
psudocode.txt
Normal file
@@ -0,0 +1,212 @@
Key contributions to look out for that make this work in practical time:
1. First-class environments that:
   a. Have IDs
   b. Can be either "real", in which case they map symbols to values,
      or "fake", in which case they map symbols to themselves, but with the env ID as the symbol's needed-for-progress ID
   c. Chain up to an upper environment that may be fake or real
2. AST nodes that maintain on-node:
   a. The IDs of environments that, if "real", can be used to make progress in this subtree
   b. The hashes of infinite recursive calls that were detected and stopped - if this hash isn't in the current call chain, this subtree can make progress
   c. Extra IDs of environments that are "real" but have "fake" environments in their chain - this is used to make return value checking fast (O(1) or O(log n), depending)
3. Combiners, both user-defined and built-in, that maintain a "wrap level" that:
   a. Is a property of this function value, *not* the function itself
      * meaning that if wrap_level > 1, you can evaluate each parameter and decrement wrap_level, even if you can't execute the call
4. The return value of a combiner is checked for:
   a. If it is a value, in which case it is good to be returned if it doesn't contain a reference to the envID of the function it is being returned from
   b. If it is (veval something env) where env doesn't contain a reference to the envID of the function it is being returned from
   c. If it is a call to a function (func params...) and func doesn't take in a dynamic environment and params... are all good to be returned
   This makes it so that combiner calls can return partially-evaluated code - any macro-like combiner would calculate the new code and return
   (eval <constructed-code> dynamic_env), which would do what partial evaluation it could and either become a value or a call like case "b" above.
   Case "b" allows this code, essentially "tagged" with the environment it should be evaluated in, to be returned out of "macro-like" combiners,
   and this dovetails with the next point
5. The (veval something env) form essentially "tags" a piece of code with the environment it should be evaluated in. At each stage where
   it is possible, the system checks for redundant constructions like these, where the env in (veval something env) is the currently active env.
   In this case, it unwraps it to just "something" and continues on - this completes the second half of the macro-like combiner evaluation where,
   after being returned to the calling function, the code is essentially spliced in.
6. The compiler can emit if/else branches on the wrap_level of combiners and in each branch further compile/partial eval if appropriate, allowing
   dynamic calls to either functions or combiners with the overhead of a single branch
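
As a rough illustration of contribution 1 (real vs. fake environments), here is a minimal Python sketch. The names Env, Unknown, bindings and names are made up for this example and are not taken from the actual implementation; the only thing it tries to show is that a fake env answers a lookup with the symbol itself, tagged with the env ID it is waiting on, and that lookups chain upward through envs that may be real or fake.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Unknown:
    """Hypothetical placeholder: a symbol tagged with the env ID it is waiting on."""
    symbol: str
    needs_env_id: int


@dataclass
class Env:
    env_id: int
    is_real: bool
    bindings: dict = field(default_factory=dict)  # used when real: symbol -> value
    names: frozenset = frozenset()                # used when fake: symbols it will bind
    upper: Optional["Env"] = None                 # may itself be real or fake

    def lookup(self, symbol):
        if self.is_real:
            if symbol in self.bindings:
                return self.bindings[symbol]
        elif symbol in self.names:
            # A fake env maps the symbol to itself, tagged with this env's ID.
            return Unknown(symbol, self.env_id)
        if self.upper is None:
            raise KeyError(symbol)
        return self.upper.lookup(symbol)


root = Env(env_id=0, is_real=True, bindings={"x": 10})
call = Env(env_id=1, is_real=False, names=frozenset({"arg"}), upper=root)
```

Here call.lookup("x") resolves through the chain to the real root env, while call.lookup("arg") comes back as an Unknown carrying env ID 1, which is the ID a subtree would record as needed for progress.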

Note that points 4 & 5 make it so that any macro written as a combiner in "macro-style" will be expanded just like a macro would and cause no runtime overhead!
Additionally, point 6 makes it so that functions (wrap level 1 combiners) and non-parameter-evaluating (wrap level 0) combiners can be dynamically passed around and called with very minimal overhead.
Combine them together and you get a simpler but more flexible semantics than macro-based (pure functional) languages with little-to-no overhead.

Additional tricky spots to look out for:
1. If you don't do the needed-for-progress tracking, you have exponential runtime
2. If you aren't careful about storing analysis information on the AST node itself, or memoizing, a naive tree traversal of the DAG has exponential runtime
3. Infinite recursion can hide in sneaky places, including the interplay between the partial evaluator and the compiler, and careful use of multiple recursion blockers / memoization is needed to prevent all cases
4. The invariants needed to prevent mis-evaluation are non-trivial to get right. Our invariants:
   a. All calls to user-combiners have the parameters as total values, thus not moving something that needs a particular environment underneath a different environment
   b. All return values from functions must not depend on the function's environment (there are a couple of interesting cases here, see combiner_return_ok(func_result, env_id))
   c. All array values are made up of total values
   d. Some primitive combiners don't obey "a", but they must be written with extreme care, and often partially evaluate only some of their parameters and have to keep track of which.
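
Tricky spot 2 is easy to underestimate, so here is a small Python sketch of why it matters. The Node class and both traversal functions are hypothetical stand-ins, not the real analysis: they just collect env IDs over a DAG in which every node uses the same child twice, once naively and once with the result cached on the node.

```python
class Node:
    def __init__(self, env_ids=(), children=()):
        self.env_ids = frozenset(env_ids)
        self.children = tuple(children)
        self.cached = None  # analysis result stored on the node itself


def needed_naive(node, counter):
    """Re-walks shared subtrees every time they are reached."""
    counter[0] += 1
    out = set(node.env_ids)
    for child in node.children:
        out |= needed_naive(child, counter)
    return frozenset(out)


def needed_cached(node, counter):
    """Computes each node once and reuses the on-node result afterwards."""
    if node.cached is None:
        counter[0] += 1
        out = set(node.env_ids)
        for child in node.children:
            out |= needed_cached(child, counter)
        node.cached = frozenset(out)
    return node.cached


def shared_chain(depth):
    """Build a DAG where each level refers to the level below it twice."""
    node = Node(env_ids=[7])
    for _ in range(depth):
        node = Node(children=[node, node])
    return node
```

On shared_chain(12) there are only 13 distinct nodes, but the naive walk performs 2^13 - 1 = 8191 visits, while the cached walk performs 13.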

Everything operates on AST nodes, an ADT:
* val - integers, strings, booleans
* marked_array
* marked_symbol
* comb
* prim_comb
* marked_env

Each AST node contains a hash representing it & its subtree.

fun needed_for_progress(ast_node) -> (progress_IDs, rec_stopping_hashes, extra_IDs):
    returns
    - environment IDs (stored in each AST node for it and its children)
      that must have real values if the partial evaluation of the subtree rooted at
      this node is going to make progress.

      progress_IDs is either true (meaning it will make progress no matter what), an
      intset of env IDs (the ones that will cause it to make progress), or an empty
      set, meaning it can't make forward progress no matter what
    - hashes of stopped recursive calls; if you're not inside the evaluation of one of them, the subtree could make progress
    - extra IDs for envs it contains that don't count as forward progress IDs because the
      env does have values, but envs in its parent chain don't have values.

The calculation for needed_for_progress is straightforward-ish, with some tricky bits at comb and array.

Under these definitions, we call an AST subtree a "total val" if it is either a val or its needed-for-progress IDs are nil.
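
To make the three-state progress_IDs and the "total val" definition concrete, here is a small Python sketch. It assumes nil is modelled as an empty frozenset, and it assumes a parent's IDs are formed by unioning its children's IDs with true absorbing everything; the notes only say the IDs are stored "for it and its children", so that combination rule is a guess, and the tricky cases at comb and array are not modelled. The function names are made up.

```python
EMPTY = frozenset()  # stands in for "nil": cannot make progress no matter what


def combine_progress(a, b):
    """Assumed merge rule: True absorbs, otherwise union the env ID sets."""
    if a is True or b is True:
        return True
    return a | b


def is_total_val(kind, progress_ids):
    """A subtree is a total val if it is a val or its progress IDs are nil."""
    if kind == "val":
        return True
    return progress_ids is not True and len(progress_ids) == 0


merged = combine_progress(frozenset({1}), frozenset({2}))
```

With this encoding, an array value whose elements need nothing is a total val, while an unresolved symbol in evaluation position (progress_IDs = true) is not.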

fun mark(x, eval_pos):
    x is env      -> error
    x is combiner -> error
    x is symbol   -> if x == true then MarkedVal(true)
                     else if x == false then MarkedVal(false)
                     else MarkedSymbol(x, needed_IDs=if eval_pos true else nil)
    x is array    ->
        MarkedArray(is_val=!eval_pos, attempted=false, resume_hashes=nil,
                    values = [mark(x[0], eval_pos)] + [mark(xi, false) for xi in x[1:]])
    true          -> MarkedVal(x)
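
The mark pseudocode above translates fairly directly into runnable Python. This is a sketch under assumptions: raw code is modelled as Python lists, ints, strings and a Symbol class, RawEnv and RawCombiner are empty placeholder classes, and the empty-array case (which the pseudocode does not address) is handled by simply producing an empty values tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str


class RawEnv:
    """Placeholder for an unmarked environment."""


class RawCombiner:
    """Placeholder for an unmarked combiner."""


@dataclass(frozen=True)
class MarkedVal:
    value: object


@dataclass(frozen=True)
class MarkedSymbol:
    name: str
    needed_ids: object  # True in eval position, None (nil) otherwise


@dataclass(frozen=True)
class MarkedArray:
    is_val: bool
    attempted: bool
    resume_hashes: object
    values: tuple


def mark(x, eval_pos):
    if isinstance(x, (RawEnv, RawCombiner)):
        raise TypeError("cannot mark an env or a combiner")
    if isinstance(x, Symbol):
        if x.name == "true":
            return MarkedVal(True)
        if x.name == "false":
            return MarkedVal(False)
        return MarkedSymbol(x.name, True if eval_pos else None)
    if isinstance(x, list):
        # Only the head inherits eval_pos; the parameters are marked as values.
        head = [mark(x[0], eval_pos)] if x else []  # empty-array handling is an assumption
        rest = [mark(xi, False) for xi in x[1:]]
        return MarkedArray(is_val=not eval_pos, attempted=False,
                           resume_hashes=None, values=tuple(head + rest))
    return MarkedVal(x)


marked = mark([Symbol("+"), 1, Symbol("x")], True)
```

Marking (+ 1 x) in evaluation position gives a non-value array whose head symbol needs progress, while the parameter x stays a value symbol until something unvals it.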

fun strip(x) -> value:
    If x is an AST node representing a value, it returns the value.
    May strip recursively in the case of an array value, etc.
    Errors on env, comb (but not prim_comb!), and non-value symbols or arrays

fun try_unval(x) -> Result<ASTNode>:
    // Removes one level of "value-ness".
    x is Array  -> if !x.array_is_val Error()
                   else Ok(MarkedArray(is_value=false,
                                       values = [try_unval(x.values[0])] + x.values[1:]))
    x is Symbol -> if !x.symbol_is_val Error()
                   else Ok(MarkedSymbol(symbol=x.symbol, is_value=false))
    true        -> Ok(x)
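
Here is try_unval as a small runnable Python sketch. Python has no built-in Result type, so this version raises a (made-up) UnvalError where the pseudocode returns Error(). The node classes are simplified stand-ins with a single is_val flag, and the empty-array case is an assumption since the pseudocode does not cover it.

```python
from dataclasses import dataclass, replace


class UnvalError(Exception):
    """Raised where the pseudocode would return Error()."""


@dataclass(frozen=True)
class MarkedVal:
    value: object


@dataclass(frozen=True)
class MarkedSymbol:
    symbol: str
    is_val: bool


@dataclass(frozen=True)
class MarkedArray:
    is_val: bool
    values: tuple


def try_unval(x):
    """Remove one level of value-ness: a quoted form becomes code to evaluate."""
    if isinstance(x, MarkedArray):
        if not x.is_val:
            raise UnvalError("array is not a value")
        if not x.values:  # assumption: an empty array just flips its flag
            return replace(x, is_val=False)
        # Only the head is unval'd; the parameters stay as they are.
        head = try_unval(x.values[0])
        return MarkedArray(is_val=False, values=(head,) + x.values[1:])
    if isinstance(x, MarkedSymbol):
        if not x.is_val:
            raise UnvalError("symbol is not a value")
        return replace(x, is_val=False)
    return x


quoted = MarkedArray(True, (MarkedSymbol("+", True), MarkedVal(1), MarkedSymbol("x", True)))
code = try_unval(quoted)
```

Calling try_unval a second time on code raises, since only one level of value-ness is there to remove.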

fun check_for_env_id_in_result(env_id, x):
    return env_id in <either progress_IDs or extra_IDs in needed_for_progress(x)>
    If either progress_IDs or extra_IDs is true, then we have a fallback, but
    that doesn't get called even on large testcases, so it's either rare or impossible.
    The fallback is slow though, whereas this is just a check for set membership

// We only allow returning a value out of a combiner if the return value
// doesn't reference the environment of the combiner
fun combiner_return_ok(func_result, env_id):
    func_result isn't later -> !check_for_env_id_in_result(env_id, func_result)
    // special cases now
    (veval body {env}) => combiner_return_ok({env}, env_id)
    // The reason we don't have to check body is that this form is only creatable in ways where body was originally a value and only needs {env}
    // Either it's created by eval, in which case it's fine, or it's created by something like (eval (array veval x de) de2) and the array has checked it,
    // or it's created via literal vau invocation, in which case the body is a value.
    (func ...params) => func doesn't take dynamic env && all params are combiner_return_ok
    otherwise -> false
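
The three cases of combiner_return_ok can be sketched in Python as follows. The node classes (Value, Veval, Call, Func) are toy stand-ins that carry their progress and extra ID sets directly instead of computing them with needed_for_progress, and the slow fallback for the "true" case is left out, so treat this as an illustration of the shape of the check rather than the real thing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    progress_ids: frozenset = frozenset()
    extra_ids: frozenset = frozenset()


@dataclass(frozen=True)
class Func:
    takes_dynamic_env: bool


@dataclass(frozen=True)
class Veval:
    body: object
    env: Value


@dataclass(frozen=True)
class Call:
    func: Func
    params: tuple


def check_for_env_id_in_result(env_id, x):
    # Plain set membership; the fallback for progress_ids == True is omitted here.
    return env_id in x.progress_ids or env_id in x.extra_ids


def combiner_return_ok(result, env_id):
    if isinstance(result, Value):
        return not check_for_env_id_in_result(env_id, result)
    if isinstance(result, Veval):
        # Only the env is checked; body is known to have started life as a value.
        return combiner_return_ok(result.env, env_id)
    if isinstance(result, Call):
        return (not result.func.takes_dynamic_env
                and all(combiner_return_ok(p, env_id) for p in result.params))
    return False


outer_env = Value(extra_ids=frozenset({1}))      # does not mention env 2
inner_env = Value(progress_ids=frozenset({2}))   # mentions env 2
```

Returning (veval body outer_env) out of the combiner whose env ID is 2 is fine, while returning (veval body inner_env) is not, because the tagged env still refers to the environment being left.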

// We may end up in situations where the value/code we care about is wrapped up in
// a redundant call to veval, namely after successfully returning based on combiner_return_ok above.
// This call may prevent other optimizations though, so we should unwrap the redundant call if possible,
// and if it causes a change we should re-partially-evaluate to make further progress if we can
fun drop_redundent_veval(x, dynamic_env, env_stack, memostuff):
    (veval node env) if env.id == dynamic_env.id -> drop_redundent_veval(node, dynamic_env, env_stack, memostuff)
    (comb params...) if comb.wrap_level != -1 -> map drop_redundent_veval over params and if any change: partial_eval( (comb new_params...), dynamic_env, env_stack, memostuff)
                                                 else: x
    else -> x
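
A runnable Python sketch of the unwrapping step, keeping the pseudocode's function name. The Veval and Call classes are toy stand-ins, env_stack and memostuff are dropped, and the re-partial-evaluation is passed in as a callback (partial_eval) so the example stays self-contained; all of that is simplification for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Veval:
    node: object
    env_id: int


@dataclass(frozen=True)
class Call:
    comb_name: str
    wrap_level: int
    params: tuple


def drop_redundent_veval(x, dynamic_env_id, partial_eval):
    if isinstance(x, Veval) and x.env_id == dynamic_env_id:
        # Tagged with the env we are already in: the tag is redundant.
        return drop_redundent_veval(x.node, dynamic_env_id, partial_eval)
    if isinstance(x, Call) and x.wrap_level != -1:
        new_params = tuple(drop_redundent_veval(p, dynamic_env_id, partial_eval)
                           for p in x.params)
        if new_params != x.params:
            # Something changed, so give the partial evaluator another go.
            return partial_eval(Call(x.comb_name, x.wrap_level, new_params))
        return x
    return x


reevaluated = []


def recording_partial_eval(node):
    """Stand-in for the real partial evaluator: records the node and returns it."""
    reevaluated.append(node)
    return node


call = Call("+", 1, (Veval("a", 5), Veval("b", 9)))
result = drop_redundent_veval(call, 5, recording_partial_eval)
```

In this run only the first parameter is tagged with the active env (ID 5), so it is unwrapped to "a", the second parameter keeps its tag, and the rebuilt call is handed to the partial evaluator once.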

fun make_tmp_inner_env(params, de?, ue, env_id):
    ...


fun partial_eval_helper(x, only_head, env, env_stack, memostuff, force):
    needed, hashes, _extra = needed_for_progress(x)
    if force || one of hashes is not in memostuff || needed == true || set_intersection(needed, env_stack.set_of_ids_that_are_vals) != empty_set:
        x is MarkedVal      -> x
        x is MarkedEnv      -> find(x.env_id == it.env_id, env_stack) ?: x
        x is MarkedComb     -> if !env.is_real && !x.se.is_real    // both aren't real, re-evaluation of closure creation site
                               || env.is_real && !x.se.is_real    // new env real, but se isn't - the creation of the closure!
                               then let inner_env = make_tmp_inner_env(x.params, x.de?, env, x.env_id)
                                    in MarkedComb(se=env, body=partial_eval_helper(x.body, false, inner_env, add inner_env to env_stack, memostuff, false))
        x is MarkedPrimComb -> x
        x is MarkedSymbol   -> if x.is_val then x
                               else env_lookup_helper(x, env)
        x is MarkedArray    -> if x.is_val then x
                               else ...TODO...
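
The guard at the top of partial_eval_helper is what keeps the evaluator from re-walking subtrees that cannot change. Here it is pulled out as a standalone Python predicate. The function name should_attempt and its parameter names are invented for this sketch, memostuff is simplified to a set of hashes currently being evaluated, and nil is modelled as an empty frozenset.

```python
def should_attempt(needed, stop_hashes, active_hashes, real_env_ids, force=False):
    """Return True if partially evaluating this subtree could make progress."""
    if force:
        return True
    # A recursion that was stopped earlier can be retried once we are no
    # longer inside the evaluation that stopped it.
    if any(h not in active_hashes for h in stop_hashes):
        return True
    # True means "will make progress no matter what".
    if needed is True:
        return True
    # Otherwise we need at least one of the awaited envs to have become real.
    return bool(needed & real_env_ids)


waiting_on_env_3 = frozenset({3})
```

So a subtree waiting on env 3 is skipped while only env 4 is real, and is attempted as soon as env 3 shows up in the set of real env IDs.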

And then we define a root_env with PrimComb versions of all of the standard functions.
The ones that are most interesting and interact the most with partial evaluation are
vau, eval, and cond.
The other key is that array only takes in values; that is, an array value never hides something that isn't a total value and needs more partial evaluation
(this makes a lot of things simpler in other places since we can treat array values as values no matter what and know things aren't hiding in sneaky places)

fun needs_params_prim(...):
    ...
fun give_up_params_prim(...):
    ...
fun veval_inner(...):
    ...
root_env = {
    eval: ...
    vapply: ...
    lapply: ...
    vau: ...
    wrap: ...
    unwrap: ...
    cond: ...
    symbol?: needs_params_prim(symbol?)
    int?: needs_params_prim(int?)
    string?: needs_params_prim(string?)
    combiner?: ...
    env?: ...
    nil?: needs_params_prim(nil?)
    bool?: needs_params_prim(bool?)
    str-to-symbol: needs_params_prim(str-to-symbol)
    get-text: needs_params_prim(get-text)
    array?: ...
    array: ...
    len: ...
    idx: ...
    slice: ...
    concat: ...
    +: needs_params_prim(+)
    -: needs_params_prim(-)
    *: needs_params_prim(*)
    /: needs_params_prim(/)
    %: needs_params_prim(%)
    band: needs_params_prim(band)
    bor: needs_params_prim(bor)
    bnot: needs_params_prim(bnot)
    bxor: needs_params_prim(bxor)
    <<: needs_params_prim(<<)
    >>: needs_params_prim(>>)
    =: needs_params_prim(=)
    !=: needs_params_prim(!=)
    <: needs_params_prim(<)
    <=: needs_params_prim(<=)
    >: needs_params_prim(>)
    >=: needs_params_prim(>=)
    str: needs_params_prim(true_str)
    log: give_up_params_prim(log)
    error: give_up_params_prim(error)
    read-string: needs_params_prim(read-string)
    empty_env: MarkedEnv()
}

fun compile(...):
    ...
    Note that when it's compiling a call, it compiles an if/else chain on the wrap level of the combiner being called.
    In the 0 branch, it emits the parameters as constant data
    In the 1 branch, it unvals and partial evals all of the parameters before compiling them.
        - note that this must be robust to partial-eval errors, as this branch might not ever happen at runtime and be nonsense code!
        - if the partial evaluation errors, it emits a value that will cause an error at runtime into the compiled code
    In the > 1 branch, it errors
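
To show the shape of the wrap-level branch that compile emits at a call site, here is a Python sketch where a closure stands in for the compiled code. Everything here is hypothetical: Combiner, compile_call_site and eval_param are names invented for the example, parameters are evaluated at call time rather than being partially evaluated and compiled ahead of time, and the handling of partial-eval errors in the 1 branch is not modelled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Combiner:
    wrap_level: int
    fn: Callable  # receives the list of parameters


def compile_call_site(raw_params, eval_param):
    """Return a closure standing in for the compiled call site."""
    constants = list(raw_params)  # wrap level 0: parameters as constant data

    def site(comb, env):
        if comb.wrap_level == 0:
            # Non-parameter-evaluating combiner: hand over the raw code.
            return comb.fn(list(constants))
        elif comb.wrap_level == 1:
            # Ordinary function: parameters are evaluated first.
            return comb.fn([eval_param(p, env) for p in raw_params])
        else:
            raise RuntimeError("unsupported wrap level")

    return site


def eval_param(p, env):
    """Toy evaluator: strings are variable names, everything else is a literal."""
    return env[p] if isinstance(p, str) else p


site = compile_call_site(["x", 2], eval_param)
add = Combiner(1, lambda ps: ps[0] + ps[1])
quote_all = Combiner(0, lambda ps: ps)
```

The same site can then be called with either kind of combiner: site(add, {"x": 40}) takes the 1 branch and adds the evaluated parameters, while site(quote_all, {"x": 40}) takes the 0 branch and receives the unevaluated ["x", 2]. The only cost of supporting both at runtime is the one branch on wrap_level.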