Started brain-dumping pseudocode and descriptions of interesting points/contributions, the reason all macro-like combiners should be partially-evaled away, the invariants we must maintain, etc

This commit is contained in:
Nathan Braswell
2022-03-21 01:03:54 -04:00
parent 0c554078bd
commit b3122f62d1

psudocode.txt (new file, 212 lines)

@@ -0,0 +1,212 @@
Key Contributions to look out for that make this work in practical time:
1. First class environments that:
a. Have IDs
b. Can either be "real", in which case it maps symbols to values,
or "fake", in which case it maps symbols to themselves, but with the env ID as it's for-progress
c. Chain up to an upper environment that may be fake or real
2. AST nodes that maintain on-node:
a. The IDs of environments that, if "real", can be used to make progress in this subtree
b. The hashes of infinite recursive calls that were detected and stopped - if this hash isn't in the current call chain, this subtree can make progress
c. Extra IDs of environments that are "real" but have "fake" environments in their chain - this is used to make return-value checking fast (O(1) or O(log n), depending)
3. Combiners, both user-defined and built in (the prim_combs), that maintain a "wrap level" that:
a. Is a property of this function value, *not* the function itself
* meaning that if wrap_level > 1, you can evaluate each parameter and decrement wrap_level, even if you can't execute the call
4. The return value of a combiner is checked for:
a. If it is a value, in which case it is good to be returned as long as it doesn't contain a reference to the env ID of the function it is being returned from
b. If it is (veval something env) where env doesn't contain a reference to the env ID of the function it is being returned from
c. If it is a call to a function (func params...) where func doesn't take in a dynamic environment and params... are all good to be returned
This makes it so that combiner calls can return partially-evaluated code - any macro-like combiner calculates the new code and returns
(eval <constructed-code> dynamic_env), which does whatever partial evaluation it can and either becomes a value or a (veval ...) form like case "b" above.
Case "b" allows this code, essentially "tagged" with the environment it should be evaluated in, to be returned out of "macro-like" combiners,
and this dovetails with the next point
5. The (veval something env) form essentially "tags" a piece of code with the environment it should be evaluated in. At each stage where
it is possible, the system checks for redundant constructions like these, where the env in (veval something env) is the currently active env.
In that case, it unwraps it to just "something" and continues on - this completes the second half of macro-like combiner evaluation, where
after being returned to the calling function the code is essentially spliced in.
6. The compiler can emit if/else branches on the wrap_level of combiners and in each branch further compile/partial eval if appropriate, allowing
dynamic calls to either functions or combiners with the overhead of a single branch
Note that points 4&5 make it so that any macro written as a combiner in "macro-style" will be expanded just like a macro would and cause no runtime overhead!
Additionally, point 6 makes it so that functions (wrap level 1 combiners) and non-parameter-evaluating (wrap level 0) combiners can be dynamically passed around and called with very minimal overhead.
Combine them together and you get a simpler but more flexiable semantics than macro based (pure functional) languages with little-to-no overhead.
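As a concrete companion to point 1, here is a minimal Python sketch (purely illustrative - the class and field names are invented, not the real implementation) of environments that have IDs, are either "real" (symbol -> value) or "fake" (symbol -> itself, reporting the env ID needed for progress), and chain up to a parent that may itself be real or fake.

import itertools

_next_env_id = itertools.count()

class Env:
    """Toy first-class environment: has an ID, may be real or fake, chains to a parent."""
    def __init__(self, bindings=None, parent=None, real=True):
        self.env_id = next(_next_env_id)
        self.real = real
        self.bindings = dict(bindings or {})
        self.parent = parent

    def lookup(self, symbol):
        """Return (value, needed_env_id). A fake env maps the symbol to itself
        and reports its own ID as the one needed to make progress."""
        env = self
        while env is not None:
            if symbol in env.bindings:
                if env.real:
                    return env.bindings[symbol], None
                return symbol, env.env_id      # fake: symbol maps to itself
            env = env.parent
        raise KeyError(symbol)

# usage: a real global env with a fake env layered on top (e.g. unknown params)
global_env = Env({"x": 42})
param_env  = Env({"y": "y"}, parent=global_env, real=False)
assert param_env.lookup("x") == (42, None)                 # resolvable now
assert param_env.lookup("y") == ("y", param_env.env_id)    # needs this env to become real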
Additional tricky spots to look out for:
1. If you don't do the needed-for-progress tracking, you have exponential runtime
2. If you aren't careful about storing analysis information on the AST node itself or memoizing it, a naive tree traversal of the DAG has exponential runtime (sketched below)
3. Infinite recursion can hide in sneaky places, including the interplay between the partial evaluator and the compiler, and careful use of multiple recursion blockers / memoization is needed to prevent all cases
4. The invariants needed to prevent mis-evaluation are non-trivial to get right. Our invariants:
a. All calls to user-combiners have the parameters as total values, thus not moving something that needs a particular environment underneath a different environment
b. All return values from functions must not depend on the function's environment (there are a couple of interesting cases here, see combiner_return_ok(func_result, env_id))
c. All array values are made up of total values
d. Some primitive combiners don't obey "a", but they must be written with extreme care, and often partially evaluate only some of their parameters and have to keep track of which.
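To make tricky spot 2 concrete, a small Python sketch (Node/analyze are invented names): on a DAG with shared subtrees, a naive recursive analysis revisits shared nodes once per path and blows up exponentially, while caching the result on the node keeps it linear.

class Node:
    def __init__(self, children=()):
        self.children = list(children)
        self.cached_analysis = None    # analysis stored on the node itself

def analyze_naive(node):
    # exponential on a DAG: shared children are re-visited once per path
    return 1 + sum(analyze_naive(c) for c in node.children)

def analyze_memo(node):
    if node.cached_analysis is None:
        node.cached_analysis = 1 + sum(analyze_memo(c) for c in node.children)
    return node.cached_analysis

# build a diamond-shaped DAG 20 levels deep: naive visits ~2^n nodes, memoized ~n
shared = Node()
for _ in range(20):
    shared = Node([shared, shared])
print(analyze_memo(shared))    # fast; analyze_naive(shared) would take millions of visits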
Everything operates on AST nodes, an ADT:
* val - integers, strings, booleans
* marked_array
* marked_symbol
* comb
* prim_comb
* marked_env
Each AST node contains a hash representing it and its subtree.
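A rough Python sketch of part of that ADT, using dataclasses and a structural hash that covers each node and its subtree; the field names are guesses for illustration, not the real ones.

from dataclasses import dataclass, field
import hashlib

def _h(*parts):
    return hashlib.sha256("\x00".join(map(str, parts)).encode()).hexdigest()

@dataclass
class Val:                         # integers, strings, booleans
    value: object
    hash: str = field(init=False)
    def __post_init__(self):
        self.hash = _h("val", self.value)

@dataclass
class MarkedSymbol:
    symbol: str
    is_val: bool
    hash: str = field(init=False)
    def __post_init__(self):
        self.hash = _h("sym", self.symbol, self.is_val)

@dataclass
class MarkedArray:
    is_val: bool
    values: tuple
    hash: str = field(init=False)
    def __post_init__(self):
        # the node's hash covers itself and its whole subtree
        self.hash = _h("arr", self.is_val, *(v.hash for v in self.values))

# comb, prim_comb and marked_env would follow the same pattern: hash their own
# fields plus the hashes of any child AST nodes they reference
node = MarkedArray(is_val=True, values=(Val(1), MarkedSymbol("x", True)))
print(node.hash[:16])    # stable structural hash of the node and its subtree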
fun needed_for_progress(ast_node) -> (progress_IDs, rec_stopping_hashes, extra_IDs):
returns
- environment IDs (stored in each AST node for it and its children)
that must have real values if the partial evaluation of the subtree rooted at
this node is going to make progress partial evaluating.
progress_IDs is either true (meaning it will make progress no matter what), an
intset of env IDs (the ones that will cause it to make progress), or an empty
set, meaning it can't make forward progress no matter what
- hashes such that, if you're not inside the evaluation of one of them, it could make progress
- extra IDs for envs it contains that don't count as forward-progress IDs because the
env does have values, but some env in its parent chain doesn't have values.
The calculation for needed_for_progress is straightforward-ish, with some tricky bits at comb and array.
Under these definitions, we call an AST subtree a "total val" if it is either a val or its needed-for-progress IDs are nil.
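The description above can be made a bit more concrete with a hedged sketch of how a parent node might merge its children's (progress_IDs, rec_stopping_hashes, extra_IDs) triples; the real calculation, especially at comb and array nodes, is trickier than this.

def merge_progress(child_results):
    # progress IDs are either True ("always makes progress") or a set of env IDs;
    # hashes and extra IDs are simply unioned in this toy version
    ids, hashes, extra, always = set(), set(), set(), False
    for child_ids, child_hashes, child_extra in child_results:
        if child_ids is True:
            always = True          # one child can progress, so the parent can too
        else:
            ids |= child_ids
        hashes |= child_hashes
        extra |= child_extra
    return (True if always else ids), hashes, extra

# example: one child always progresses, one needs env 3, one is stuck but carries an extra ID for env 7
print(merge_progress([(True, set(), set()),
                      ({3}, {"hash_of_rec_call"}, set()),
                      (set(), set(), {7})]))
# -> (True, {'hash_of_rec_call'}, {7})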
fun mark(x, eval_pos):
x is env -> error
x is combiner -> error
x is symbol -> if x == true then MarkedVal(true)
else if x == false then MarkedVal(false)
else MarkedSymbol(x, needed_IDs=if eval_pos true else nil)
x is array ->
MarkedArray(is_val=!eval_pos, attempted=false, resume_hashes=nil,
values = [mark(x[0], eval_pos)] + [mark(xi, false) for xi in x[1:]])
true -> MarkedVal(x)
fun strip(x) -> value:
if x is an AST node representing a value, it returns the value.
May strip recursively in the case of an array value, etc.
Errors on env, comb (but not prim_comb!), non-value symbols, and non-value arrays
fun try_unval(x) -> Result<ASTNode>:
//Removes one level of "value-ness".
x is Array -> if !x.array_is_val Error()
else Ok(MarkedArray(is_value=false,
values = [try_unval(x.values[0])] + x.values[1:]))
x is Symbol -> if !x.symbol_is_val Error()
else Ok(MarkedSymbol(symbol=x.symbol, is_value=false))
true -> Ok(x)
fun check_for_env_id_in_result(env_id, x):
return env_id in <either progress_IDs or extra_IDs in needed_for_progress(x)>
if either progress_IDs or extra_IDs is true, then we have a fallback, but
that doesn't get called even on large testcases so it's either rare or impossible.
Fallback is slow though, whereas this is just a check for set membership
// We only allow returning a value out of a combiner if the return value
// doesn't reference the environment of the combiner
fun combiner_return_ok(func_result, env_id):
func_result isn't later -> !check_for_env_id_in_result(env_id, func_result)
// special cases now
(veval body {env}) => (combiner_return_ok {env})
// The reason we don't have to check body is that this form is only creatable in ways where body was originally a value, so we only need to check {env}
// Either it's created by eval, in which case it's fine, or it's created by something like (eval (array veval x de) de2) and the array has checked it,
// or it's created via literal vau invocation, in which case the body is a value.
(func ...params) => func doesn't take dynamic env && all params are combiner_return_ok
otherwise -> false
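A hedged Python rendering of combiner_return_ok and check_for_env_id_in_result, using invented toy node attributes (is_later, kind, progress_ids, extra_ids) so the shape of the check is executable; it sketches the rule, not the real data layout.

from types import SimpleNamespace as NS

def check_for_env_id_in_result(env_id, node):
    # progress_ids / extra_ids are either True or sets of env IDs; True would
    # force the slow fallback mentioned above, so treat it as "might reference"
    for ids in (node.progress_ids, node.extra_ids):
        if ids is True or env_id in ids:
            return True
    return False

def combiner_return_ok(func_result, env_id):
    if not func_result.is_later:                 # case a: an actual value
        return not check_for_env_id_in_result(env_id, func_result)
    if func_result.kind == "veval":              # case b: (veval something env)
        return not check_for_env_id_in_result(env_id, func_result.env)
    if func_result.kind == "call":               # case c: (func params...)
        return (not func_result.func.takes_dynamic_env
                and all(combiner_return_ok(p, env_id) for p in func_result.params))
    return False

# usage: env 5 is the environment of the combiner doing the returning
value_node = NS(is_later=False, progress_ids=set(), extra_ids=set())
leaky_env  = NS(progress_ids={5}, extra_ids=set())
print(combiner_return_ok(value_node, 5))                                       # True
print(combiner_return_ok(NS(is_later=True, kind="veval", env=leaky_env), 5))   # False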
// We may end up in situations where the value/code we care about is wrapped up in
// a redundant call to veval, namely after successfully returning based on combiner_return_ok above.
// This call may prevent other optimizations though, so we should unwrap the redundant call if possible,
// and if doing so causes a change we should re-partially-evaluate to make further progress if we can
fun drop_redundent_veval(x, dynamic_env, env_stack, memostuff):
(veval node env) if env.id == dynamic_env.id -> drop_redundent_veval(node, dynamic_env, env_stack, memostuff)
(comb params...) if comb.wrap_level != -1 -> map drop_redundent_veval over params and if any change: partial_eval( (comb new_params...), dynamic_env, env_stack, memostuff)
else: x
else -> x
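The same unwrapping rule as a tiny executable sketch, with toy node types and the spelling corrected in the toy name; re-running partial_eval after a change is only indicated by a comment.

from collections import namedtuple

Veval = namedtuple("Veval", "env_id body")      # stands in for (veval body env)
Call  = namedtuple("Call", "comb params")       # stands in for (comb params...)
Comb  = namedtuple("Comb", "wrap_level")

def drop_redundant_veval(node, active_env_id):
    # unwrap (veval body env) when env is exactly the currently active env
    if isinstance(node, Veval) and node.env_id == active_env_id:
        return drop_redundant_veval(node.body, active_env_id)
    # for calls to combiners with wrap_level != -1, unwrap each parameter; the
    # real version re-runs partial_eval on the call if any parameter changed
    if isinstance(node, Call) and node.comb.wrap_level != -1:
        new_params = tuple(drop_redundant_veval(p, active_env_id) for p in node.params)
        return node._replace(params=new_params) if new_params != tuple(node.params) else node
    return node

print(drop_redundant_veval(Veval(env_id=7, body="x"), active_env_id=7))   # -> 'x'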
fun make_tmp_inner_env(params, de?, ue, env_id):
...
fun partial_eval_helper(x, only_head, env, env_stack, memostuff, force):
needed, hashes, _extra = needed_for_progress(x)
if force || one of hashes is not in memostuff || needed == true || set_intersection(needed, env_stack.set_of_ids_that_are_vals) != empty_set:
x is MarkedVal -> x
x is MarkedEnv -> find(x.env_id == it.env_id, env_stack) ?: x
x is MarkedComb -> if !env.is_real && !x.se.is_real // both aren't real, re-evaluation of closure creation site
|| env.is_real && !x.se.is_real // new env real, but se isn't - the creation of the closure!
then let inner_env = make_tmp_inner_env(x.params, x.de?, env, x.env_id)
in MarkedComb(se=env, body=partial_eval_helper(x.body, false, inner_env, add inner_env to env_stack, memostuff, false))
x is MarkedPrimComb -> x
x is MarkedSymbol -> if x.is_val then x
else env_lookup_helper(x, env)
x is MarkedArray -> if x.is_val then x
else ...TODO...
And then we define a root_env with PrimComb versions of all of the standard functions.
The ones that are most interesting and interact the most with partial evaluation are
vau, eval, and cond.
The other key is that array only takes in values; that is, an array value never hides something that isn't a total value and needs more partial evaluation
(this makes a lot of things simpler in other places, since we can treat array values as values no matter what and know nothing is hiding in sneaky places)
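A tiny hedged sketch of that invariant (4c above) from the array primitive's point of view: it only builds an array value when every element is already a total value, and otherwise leaves the call for later. The names here are placeholders.

def array_prim(args, is_total_val):
    # is_total_val stands for the check described earlier: a val, or a node
    # whose needed-for-progress IDs are empty
    if all(is_total_val(a) for a in args):
        return ("array_value", tuple(args))   # safe: nothing partial hides inside
    return None                               # leave the call residual for later

print(array_prim([1, 2, 3], lambda a: isinstance(a, int)))   # ('array_value', (1, 2, 3))
print(array_prim([1, "x"], lambda a: isinstance(a, int)))    # None -> stays a call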
fun needs_params_prim(...):
...
fun give_up_params_prim(...):
...
fun veval_inner(...):
...
root_env = {
eval: ...
vapply: ...
lapply: ...
vau: ...
wrap: ...
unwrap: ...
cond: ...
symbol?: needs_params_prim(symbol?)
int?: needs_params_prim(int?)
string?: needs_params_prim(string?)
combiner?: ...
env?: ...
nil?: needs_params_prim(nil?)
bool?: needs_params_prim(bool?)
str-to-symbol: needs_params_prim(str-to-symbol)
get-text: needs_params_prim(get-text)
array?: ...
array: ...
len: ...
idx: ...
slice: ...
concat: ...
+: needs_params_prim(+)
-: needs_params_prim(-)
*: needs_params_prim(*)
/: needs_params_prim(/)
%: needs_params_prim(%)
band: needs_params_prim(band)
bor: needs_params_prim(bor)
bnot: needs_params_prim(bnot)
bxor: needs_params_prim(bxor)
<<: needs_params_prim(<<)
>>: needs_params_prim(>>)
=: needs_params_prim(=)
!=: needs_params_prim(!=)
<: needs_params_prim(<)
<=: needs_params_prim(<=)
>: needs_params_prim(>)
>=: needs_params_prim(>=)
str: needs_params_prim(true_str)
log: give_up_params_prim(log)
error: give_up_params_prim(error)
read-string: needs_params_prim(read-string)
empty_env: MarkedEnv()
}
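needs_params_prim and give_up_params_prim are elided above; one plausible reading of needs_params_prim, sketched here as a guess rather than the real definition, is a wrapper that only runs the host operation once every argument has become a total value, and otherwise hands back a residual so the call can be retried later.

def needs_params_prim(host_fn):
    def prim(args, strip, is_total_val):
        # strip and is_total_val are assumed to be the helpers described above
        if all(is_total_val(a) for a in args):
            return ("value", host_fn(*[strip(a) for a in args]))
        return ("later", args)    # can't run yet: leave a residual call
    return prim

plus = needs_params_prim(lambda a, b: a + b)
print(plus([1, 2], strip=lambda a: a, is_total_val=lambda a: isinstance(a, int)))
# -> ('value', 3)
print(plus([1, "x"], strip=lambda a: a, is_total_val=lambda a: isinstance(a, int)))
# -> ('later', [1, 'x'])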
fun compile(...):
...
Note that when it's compiling a call, it compiles an if/else chain on the wrap level of the combiner being called.
in the 0 branch, it emits the parameters as constant data
in the 1 branch, it unvals and partially evaluates all of the parameters before compiling them.
- note that this must be robust to partial-eval errors, as this branch might never actually be taken at runtime and could be nonsense code!
- if the partial evaluation errors, it emits a value that will cause an error at runtime into the compiled code
in the > 1 branch, it errors