Clean up and rearrange
doc/psudocode.txt
@@ -0,0 +1,306 @@
Key contributions to look out for that make this work in practical time:
1. First-class environments that:
    a. Have IDs
    b. Can be either "real", in which case they map symbols to values,
       or "fake", in which case they map symbols to themselves, but with the env ID as the symbol's needed-for-progress ID
    c. Chain up to an upper environment that may be fake or real
2. AST nodes that maintain on-node:
    a. The IDs of environments that, if "real", can be used to make progress in this subtree
    b. The hashes of infinite recursive calls that were detected and stopped - if this hash isn't in the current call chain, this subtree can make progress
    c. Extra IDs of environments that are "real" but have "fake" environments in their chain - this is used to make return value checking fast (O(1) or O(log n), depending)
3. Combiners, both user-defined and built-in, that maintain a "wrap level" that:
    a. Is a property of this function value, *not* the function itself
        * meaning that if wrap_level > 1, you can evaluate each parameter and decrement wrap_level, even if you can't execute the call
4. The return value of a combiner is checked for:
    a. If it is a value, in which case it is good to be returned if it doesn't contain a reference to the envID of the function it is being returned from
    b. If it is (veval something env) where env doesn't contain a reference to the envID of the function it is being returned from
    c. If it is a call to a function (func params...) and func doesn't take in a dynamic environment and params... are all good to be returned
   This makes it so that combiner calls can return partially-evaluated code - any macro-like combiner would calculate the new code and return
   (eval <constructed-code> dynamic_env), which would do what partial evaluation it could and either become a value or a call like case "b" above.
   Case "b" allows this code, essentially "tagged" with the environment it should be evaluated in, to be returned out of "macro-like" combiners,
   and this dovetails with the next point.
5. The (veval something env) form essentially "tags" a piece of code with the environment it should be evaluated in. At each stage where
   it is possible, the system checks for redundant constructions like these, where the env in (veval something env) is the currently active env.
   In this case, it unwraps it to just "something" and continues on - this completes the second half of the macro-like combiner evaluation where,
   after being returned to the calling function, the code is essentially spliced in.
6. The compiler can emit if/else branches on the wrap_level of combiners and in each branch further compile/partial eval if appropriate, allowing
   dynamic calls to either functions or combiners with the overhead of a single branch

Note that points 4 & 5 make it so that any macro written as a combiner in "macro-style" will be expanded just like a macro would and cause no runtime overhead!
Additionally, point 6 makes it so that functions (wrap level 1 combiners) and non-parameter-evaluating (wrap level 0) combiners can be dynamically passed around and called with very minimal overhead.
Combine them together and you get a simpler but more flexible semantics than macro-based (pure functional) languages, with little-to-no overhead.
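
As a rough illustration of point 1, here is a minimal Python sketch of environments with IDs that are either "real" or "fake" and chain to an upper environment. The class name Env, the tuple-shaped results of lookup, and the helper has_fake_in_chain are assumptions made for this sketch; they are not taken from the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Env:
    env_id: int
    is_real: bool
    # Real envs map symbol -> value; fake envs map symbol -> itself.
    bindings: dict = field(default_factory=dict)
    upper: Optional["Env"] = None

    def lookup(self, symbol):
        """Walk the chain; a hit in a fake env stays symbolic."""
        env = self
        while env is not None:
            if symbol in env.bindings:
                if env.is_real:
                    return ("value", env.bindings[symbol])
                # Tag the symbol with the env ID that must become real
                # before this lookup can make progress.
                return ("later", symbol, env.env_id)
            env = env.upper
        raise KeyError(symbol)

    def has_fake_in_chain(self):
        """True if any upper env is fake (the situation behind the "extra IDs")."""
        env = self.upper
        while env is not None:
            if not env.is_real:
                return True
            env = env.upper
        return False


root = Env(0, True, {"x": 1})
fake = Env(1, False, {"y": "y"}, upper=root)
inner = Env(2, True, {"z": 3}, upper=fake)
```

Here inner is real but sits under a fake environment, so looking up "y" through it yields a symbolic result tagged with env ID 1 rather than a value.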

Additional tricky spots to look out for:
1. If you don't do the needed-for-progress tracking, you have exponential runtime
2. If you aren't careful to store analysis information on the AST node itself or to memoize, a naive tree traversal of the DAG has exponential runtime
3. Infinite recursion can hide in sneaky places, including the interplay between the partial evaluator and the compiler, and careful use of multiple recursion blockers / memoization is needed to prevent all cases
4. The invariants needed to prevent mis-evaluation are non-trivial to get right. Our invariants:
    a. All calls to user-combiners have the parameters as total values, thus not moving something that needs a particular environment underneath a different environment
    b. All return values from functions must not depend on the function's environment (there are a couple of interesting cases here, see combiner_return_ok(func_result, env_id))
    c. All array values are made up of total values
    d. Some primitive combiners don't obey "a", but they must be written with extreme care, and often partially evaluate only some of their parameters and have to keep track of which.
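
To make tricky spot 2 concrete, the following Python sketch compares a naive traversal with one that caches its result on the node. The Node class, the "size" analysis, and build_shared_chain are hypothetical stand-ins chosen for illustration; the real analysis is needed_for_progress, not a size count.

```python
class Node:
    def __init__(self, children=()):
        self.children = tuple(children)
        self.cached_size = None  # analysis result stored on the node itself


def size_naive(node, counter):
    """Recompute the analysis at every visit; shared subtrees are revisited."""
    counter[0] += 1
    return 1 + sum(size_naive(c, counter) for c in node.children)


def size_cached(node, counter):
    """Compute the analysis once per distinct node and reuse it afterwards."""
    if node.cached_size is None:
        counter[0] += 1
        node.cached_size = 1 + sum(size_cached(c, counter) for c in node.children)
    return node.cached_size


def build_shared_chain(depth):
    """Build a DAG where each level references the level below twice."""
    node = Node()
    for _ in range(depth):
        node = Node([node, node])
    return node
```

With depth 10 the DAG has only 11 distinct nodes, but the naive traversal does 2047 units of work while the cached one does 11.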



Everything operates on AST nodes, an ADT:
    * val - integers, strings, booleans
    * marked_array
    * marked_symbol
    * comb
    * prim_comb
    * marked_env

Each AST node contains a hash representing it and its subtree.
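
One way to get such a per-node hash cheaply is to compute it at construction time from the node's own data plus the already-computed hashes of its children. The sketch below assumes SHA-256 and a hypothetical Node(tag, payload, children) shape; the document does not say which hash function or node layout is actually used.

```python
import hashlib


class Node:
    def __init__(self, tag, payload="", children=()):
        self.tag = tag
        self.payload = payload
        self.children = tuple(children)
        h = hashlib.sha256()
        h.update(repr((tag, payload)).encode())
        for child in self.children:
            # Children already carry their subtree hash, so no re-traversal.
            h.update(child.hash)
        self.hash = h.digest()


call_a = Node("array", children=[Node("symbol", "+"), Node("val", "1"), Node("symbol", "x")])
call_b = Node("array", children=[Node("symbol", "+"), Node("val", "1"), Node("symbol", "x")])
call_c = Node("array", children=[Node("symbol", "+"), Node("val", "1"), Node("symbol", "y")])
```

Structurally equal subtrees (call_a and call_b) get the same hash, which is what makes the hash usable as a memoization or recursion-stopping key.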

fun needed_for_progress(ast_node) -> (progress_IDs, rec_stopping_hashes, extra_IDs):
    returns
    - environment IDs (stored in each AST node for it and its children)
      that must have real values if the partial evaluation of the subtree rooted at
      this node is going to make progress.

      progress_IDs is either true (meaning it will make progress no matter what), an
      intset of env IDs (the ones that will cause it to make progress), or an empty
      set, meaning it can't make forward progress no matter what
    - hashes such that, if you're not inside the evaluation of one of them, it could make progress
    - extra IDs for envs it contains that don't count as forward progress IDs because the
      env does have values, but envs in its parent chain don't have values.

The calculation for needed_for_progress is straightforward-ish, with some tricky bits at comb and array.

Under these definitions, we call an AST subtree a "total val" if it is either a val or its needed-for-progress IDs is nil.
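
The three-way shape of progress_IDs (true, a set of env IDs, or the empty set) can be sketched in Python as below. Combining children by union, and the names union_progress, combine, and can_progress, are assumptions for illustration; the real calculation has the tricky cases at comb and array mentioned above.

```python
from functools import reduce


def union_progress(a, b):
    """Combine two progress_IDs values; True ("always progresses") absorbs sets."""
    if a is True or b is True:
        return True
    return a | b


def combine(children_ids):
    """Fold the progress_IDs of all children into one value for the parent."""
    return reduce(union_progress, children_ids, frozenset())


def can_progress(progress_ids, real_env_ids):
    """Progress is possible if forced (True) or if a needed env is now real."""
    if progress_ids is True:
        return True
    return bool(progress_ids & real_env_ids)
```

The can_progress check mirrors the "needed == true || set_intersection(...) != empty_set" part of the guard at the top of partial_eval_helper.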

fun mark(x, eval_pos):
    x is env      -> error
    x is combiner -> error
    x is symbol   -> if x == true then MarkedVal(true)
                     else if x == false then MarkedVal(false)
                     else MarkedSymbol(x, needed_IDs=if eval_pos true else nil)
    x is array    ->
        MarkedArray(is_val=!eval_pos, attempted=false, resume_hashes=nil,
                    values = [mark(x[0], eval_pos)] + [mark(xi, false) for xi in x[1:]])
    true          -> MarkedVal(x)

fun strip(x) -> value:
    If x is an AST node representing a value, it returns the value.
    May strip recursively in the case of an array value, etc.
    Errors on env, comb (but not prim_comb!), and non-value symbols or arrays

fun try_unval(x) -> Result<ASTNode>:
    // Removes one level of "value-ness".
    x is Array  -> if !x.array_is_val Error()
                   else Ok(MarkedArray(is_val=false,
                                       values = [try_unval(x.values[0])] + x.values[1:]))
    x is Symbol -> if !x.symbol_is_val Error()
                   else Ok(MarkedSymbol(symbol=x.symbol, is_val=false))
    true        -> Ok(x)
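
The relationship between mark and try_unval can be sketched in Python as follows. This is a simplified model under stated assumptions: symbols carry a single is_val flag instead of needed_IDs, the attempted/resume_hashes fields are dropped, errors are raised as ValueError rather than returned as a Result, empty arrays are handled explicitly, and the Sym class for raw symbols is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass
class MarkedVal:
    value: object


@dataclass
class MarkedSymbol:
    name: str
    is_val: bool


@dataclass
class MarkedArray:
    is_val: bool
    values: list


def mark(x, eval_pos):
    """Tag raw data; only the array itself and its head inherit eval_pos."""
    if isinstance(x, Sym):
        if x.name == "true":
            return MarkedVal(True)
        if x.name == "false":
            return MarkedVal(False)
        return MarkedSymbol(x.name, is_val=not eval_pos)
    if isinstance(x, list):
        if not x:
            return MarkedArray(is_val=not eval_pos, values=[])
        head = mark(x[0], eval_pos)
        rest = [mark(xi, False) for xi in x[1:]]
        return MarkedArray(is_val=not eval_pos, values=[head] + rest)
    return MarkedVal(x)


def try_unval(x):
    """Remove one level of "value-ness"; raise if x is not currently a value."""
    if isinstance(x, MarkedArray):
        if not x.is_val:
            raise ValueError("array is not a value")
        if not x.values:
            return MarkedArray(is_val=False, values=[])
        return MarkedArray(is_val=False,
                           values=[try_unval(x.values[0])] + x.values[1:])
    if isinstance(x, MarkedSymbol):
        if not x.is_val:
            raise ValueError("symbol is not a value")
        return MarkedSymbol(x.name, is_val=False)
    return x


code = [Sym("+"), 1, Sym("x")]
```

In this model, unvaling data that was marked as a value gives the same node as marking it in evaluation position directly: try_unval(mark(code, False)) equals mark(code, True).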

fun check_for_env_id_in_result(env_id, x):
    return env_id in <either progress_IDs or extra_IDs in needed_for_progress(x)>
    If either progress_IDs or extra_IDs is true, then we have a fallback, but
    that doesn't get called even on large testcases, so it's either rare or impossible.
    The fallback is slow though, whereas this is just a check for set membership

// We only allow returning a value out of a combiner if the return value
// doesn't reference the environment of the combiner
fun combiner_return_ok(func_result, env_id):
    func_result isn't later -> !check_for_env_id_in_result(env_id, func_result)
    // special cases now
    (veval body {env}) => combiner_return_ok({env}, env_id)
    // The reason we don't have to check body is that this form is only creatable in ways where body was originally a value and only needs {env}
    // Either it's created by eval, in which case it's fine, or it's created by something like (eval (array veval x de) de2) and the array has checked it,
    // or it's created via literal vau invocation, in which case the body is a value.
    (func ...params) => func doesn't take dynamic env && all params are combiner_return_ok
    otherwise -> false
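
A compact Python sketch of the return check follows. It assumes each non-later node carries a precomputed set of referenced env IDs (standing in for progress_IDs plus extra_IDs), and it uses hypothetical Val, Veval, and Call classes; the slow fallback path is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Val:
    # Env IDs this value references (progress_IDs and extra_IDs merged).
    ids: frozenset = frozenset()


@dataclass
class Veval:
    body: object
    env: object


@dataclass
class Call:
    func_takes_de: bool
    params: list


def check_for_env_id_in_result(env_id, node):
    """Plain set membership on the IDs stored on the node."""
    return env_id in node.ids


def combiner_return_ok(result, env_id):
    if isinstance(result, Val):  # "isn't later"
        return not check_for_env_id_in_result(env_id, result)
    if isinstance(result, Veval):
        # Only the env needs checking; body is not inspected.
        return combiner_return_ok(result.env, env_id)
    if isinstance(result, Call):
        return (not result.func_takes_de) and all(
            combiner_return_ok(p, env_id) for p in result.params)
    return False
```

For example, a Veval whose env references only env 3 may be returned from the combiner with env ID 7 even if its body mentions env 7, but not from the combiner with env ID 3.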

// We may end up in situations where the value/code we care about is wrapped up in
// a redundant call to veval, namely after successfully returning based on combiner_return_ok above.
// This call may prevent other optimizations though, so we should unwrap the redundant call if possible,
// and if it causes a change we should re-partially-evaluate to make further progress if we can
fun drop_redundent_veval(x, dynamic_env, env_stack, memostuff):
    (veval node env) if env.id == dynamic_env.id -> drop_redundent_veval(node, dynamic_env, env_stack, memostuff)
    (comb params...) if comb.wrap_level != -1 -> map drop_redundent_veval over params and if any change: partial_eval( (comb new_params...), dynamic_env, env_stack, memostuff)
                                                 else: x
    else -> x

fun make_tmp_inner_env(params, de?, ue, env_id):
    ...


fun partial_eval_helper(x, only_head, env, env_stack, memostuff, force):
    needed, hashes, _extra = needed_for_progress(x)
    if force || one of hashes is not in memostuff || needed == true || set_intersection(needed, env_stack.set_of_ids_that_are_vals) != empty_set:
        x is MarkedVal      -> x
        x is MarkedEnv      -> find(x.env_id == it.env_id, env_stack) ?: x
        x is MarkedComb     -> if    !env.is_real && !x.se.is_real   // both aren't real, re-evaluation of closure creation site
                                  ||  env.is_real && !x.se.is_real   // new env real, but se isn't - the creation of the closure!
                               then let inner_env = make_tmp_inner_env(x.params, x.de?, env, x.env_id)
                                    in MarkedComb(se=env, body=partial_eval_helper(x.body, false, inner_env, <add inner_env to env_stack>, memostuff, false))
        x is MarkedPrimComb -> x
        x is MarkedSymbol   -> if x.is_val then x
                               else env_lookup_helper(x, env)
        x is MarkedArray    -> if x.is_val then x
                               else let
            comb = partial_eval_helper(x.values[0], only_head=true, env, env_stack, memostuff, false)
            params = x.values[1:]
            if later_head?(comb) return MarkedArray(values=[comb]+params)
            if comb.needed_for_progress == true:
                comb = partial_eval_helper(comb, only_head=false, ...)

            // If not -1, we always partial eval; if >0 we also unval/partial eval to do one full round of eval
            wrap_level = comb.wrap_level
            while wrap_level >= 0:
                if wrap_level >= 1:
                    params = map(unval, map(\x. partial_eval_helper(x, ...), params))
                params = map(\x. partial_eval_helper(x, ...), params)
                wrap_level -= 1
                if <any of the above error, or couldn't be unvaled yet>:
                    return MarkedArray(values=[comb.with_wrap_level(wrap_level)] + <params at whatever level they were successfully evaluated to>)

            if comb is MarkedPrimComb:
                result = comb.impl(params)
                if result == 'LATER:
                    return MarkedArray(values=[comb.with_wrap_level(wrap_level)] + params)
                else:
                    return result

            if comb.is_variadic:
                params = params[:comb.params.len-1] + [ params[comb.params.len-1:] ]

            inner_env = MarkedEnv(id=comb.env_id, possible_de_symbol=comb.de?, possible_de=env, symbols=comb.params, values=params, upper=comb.se)

            rec_stop_hash = combine_hash(inner_env.hash, comb.body.hash)
            if rec_stop_hash in memostuff:
                return MarkedArray(values=[comb] + params, transient_needed_env_id=true, rec_stopping_hash=rec_stop_hash)

            memostuff.add(rec_stop_hash)
            result = partial_eval_helper(comb.body, false, inner_env, <add inner_env to env_stack>, memostuff, false)
            memostuff.remove(rec_stop_hash)

            if !combiner_return_ok(result, comb.env_id):
                transiently_needed = if comb.de? != nil then env.id else nil
                return MarkedArray(values=[comb] + params, transient_needed_env_id=transiently_needed, rec_stopping_hash=rec_stop_hash)

            return drop_redundent_veval(result, env, env_stack, memostuff)
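
The add/check/remove pattern around rec_stop_hash can be shown in isolation. The Python sketch below is a toy, not the partial evaluator: guarded_call and unfold are hypothetical names, the "hash" is just a definition name, and "unfolding" simply inlines named definitions. What it does share with the pseudocode is the shape: a call whose key is already in the in-progress set is left as a "later" placeholder instead of being entered again.

```python
def guarded_call(key, in_progress, compute):
    """Run compute() unless key is already being evaluated further up the chain."""
    if key in in_progress:
        return ("later", key)  # recursion detected: stop and leave a residual
    in_progress.add(key)
    try:
        return compute()
    finally:
        # The key only blocks calls nested inside this one.
        in_progress.remove(key)


def unfold(name, defs, in_progress):
    """Inline definitions by name; mutually recursive ones stop at the cycle."""
    def compute():
        return [unfold(item, defs, in_progress)
                if isinstance(item, str) and item in defs else item
                for item in defs[name]]
    return guarded_call(name, in_progress, compute)


defs = {"f": ["g", 1], "g": ["f", 2]}
in_progress = set()
result = unfold("f", defs, in_progress)
```

Because the key is removed on the way out, the same definition can still be unfolded later from a different call chain, which matches the note that a stopped subtree can make progress when its hash is no longer in the current call chain.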

And then we define a root_env with PrimComb versions of all of the standard functions.
The ones that are most interesting and interact the most with partial evaluation are
    vau eval cond
The other key is that array only takes in values; that is, an array value never hides something that isn't a total value and needs more partial evaluation
(this makes a lot of things simpler in other places since we can treat array values as values no matter what and know things aren't hiding in sneaky places)

fun needs_params_prim(...):
    ...
fun give_up_params_prim(...):
    ...

fun veval_inner(only_head, de, env_stack, memostuff, params):
    body = params[0]
    implicit_env = len(params) != 2
    eval_env = if implicit_env { de } else { partial_eval_helper(params[1], only_head, de, env_stack, memostuff, false) }
    evaled_body = partial_eval_helper(body, only_head, eval_env, env_stack, memostuff, false)
    if implicit_env or combiner_return_ok(evaled_body, eval_env.id):
        return drop_redundent_veval(evaled_body, de, env_stack, memostuff)
    else:
        return drop_redundent_veval(MarkedArray(values=[MarkedPrimComb('veval, wrap_level=-1, val_head_ok=true, handler=veval_inner), evaled_body, eval_env]), de, env_stack, memostuff)

root_env = {
    eval: MarkedPrimComb('eval, wrap_level=1, val_head_ok=true, handler=lambda(only_head, de, env_stack, memostuff, params):
        let
            body = params[0]
            implicit_env = len(params) != 2
        return veval_inner(only_head, de, env_stack, memostuff, if implicit_env { [try_unval(body)] } else { [try_unval(body), params[1]] })
    )
    vapply: MarkedPrimComb('vapply, wrap_level=1, val_head_ok=true, handler=lambda(only_head, de, env_stack, memostuff, [func params env]):
        return veval_inner(only_head, de, env_stack, memostuff, [MarkedArray(values=[func]+params), env])
    )
    lapply: MarkedPrimComb('lapply, wrap_level=1, val_head_ok=true, handler=lambda(only_head, de, env_stack, memostuff, [func params env]):
        return veval_inner(only_head, de, env_stack, memostuff, [MarkedArray(values=[func.offset_wrap_level(-1)]+params), env])
    )
    vau: MarkedPrimComb('vau, wrap_level=0, val_head_ok=true, handler=lambda(only_head, de, env_stack, memostuff, params):
        let
            de? = if len(params) == 3 { params[0].symbol_value } else { nil }
            params = map(lambda(x): x.symbol_value, if de? { params[1] } else { params[0] })
            variadic = '& in params
            params.remove('&)
            implicit_env = len(params) != 2
            body = try_unval(if de? { params[2] } else { params[1] })
            env_id = <new_id>
        if !only_head:
            inner_env = make_tmp_inner_env(params, de?, upper=de, id=env_id)
            body = partial_eval_helper(body, false, inner_env, <add inner_env to env_stack>, memostuff, false)
        return MarkedComb(wrap_level=0, id=env_id, de?=de?, static_env=de, variadic=variadic, params=params, body=body)
    )
    wrap: ...<returns new MarkedPrimComb/MarkedComb with incremented wrap_level>...
    unwrap: ...<returns new MarkedPrimComb/MarkedComb with decremented wrap_level>...
    cond: ...
        ...Oddly tricky - is wrap_level 0, but...
        ...    1. unvals & partially evaluates starting from the first condition
        ...    2. if this condition is true, return the unvaled & partially evaluated corresponding arm
        ...    3. if this condition is false, drop the arm and return to 1
        ...    4. otherwise, we have an unknown between true & false
        ...    5. check to see if combine_hash(x.hash, env.hash) is in memostuff (prevents infinite recursion blocked on a cond guard!)
        ...    6. if the hash was in memostuff, return MarkedArray(later_hash=the_hash,
        ...                         values=[MarkedPrimComb('vcond,wraplevel=-1,...)] + map(unval, <remaining preds/arms>))
        ...    7. else new_preds_arms = map(partial_eval..., map(unval, <remaining preds/arms>))
        ...    <TODO: 8. remove arms/preds now guaranteed to be false, remove all arms/preds after first true>
        ...    9. return MarkedArray(values=[MarkedPrimComb('vcond,wraplevel=-1,...)] + new_preds_arms)
        ...
        ...The vcond is like cond but doesn't do any unvaling (as it's already been done) (and wrap_level is set to -1 so the function call machinery doesn't touch the params either)
        ...
    symbol?: needs_params_prim(symbol?)
    int?: needs_params_prim(int?)
    string?: needs_params_prim(string?)
    combiner?: ...
    env?: ...
    nil?: needs_params_prim(nil?)
    bool?: needs_params_prim(bool?)
    str-to-symbol: needs_params_prim(str-to-symbol)
    get-text: needs_params_prim(get-text)
    array?: ...
    array: ...
    len: ...
    idx: ...
    slice: ...
    concat: ...
    +: needs_params_prim(+)
    -: needs_params_prim(-)
    *: needs_params_prim(*)
    /: needs_params_prim(/)
    %: needs_params_prim(%)
    band: needs_params_prim(band)
    bor: needs_params_prim(bor)
    bnot: needs_params_prim(bnot)
    bxor: needs_params_prim(bxor)
    <<: needs_params_prim(<<)
    >>: needs_params_prim(>>)
    =: needs_params_prim(=)
    !=: needs_params_prim(!=)
    <: needs_params_prim(<)
    <=: needs_params_prim(<=)
    >: needs_params_prim(>)
    >=: needs_params_prim(>=)
    str: needs_params_prim(true_str)
    log: give_up_params_prim(log)
    error: give_up_params_prim(error)
    read-string: needs_params_prim(read-string)
    empty_env: MarkedEnv()
}
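
Steps 1 to 4 of the cond handler in the root_env listing above can be sketched in Python as a scan over predicate/arm pairs. The names partial_cond and UNKNOWN, the tuple-shaped results, and the empty residual returned when every predicate is false are assumptions for this sketch; the recursion check (steps 5 and 6) and the re-evaluation of the remaining arms (steps 7 to 9) are left out.

```python
UNKNOWN = object()  # marker for a predicate that can't be decided yet


def partial_cond(pairs, eval_pred):
    """pairs: list of (pred, arm); eval_pred returns True, False, or UNKNOWN."""
    for i, (pred, arm) in enumerate(pairs):
        result = eval_pred(pred)
        if result is True:
            return ("arm", arm)          # step 2: take this arm
        if result is False:
            continue                     # step 3: drop the arm, try the next
        return ("vcond", pairs[i:])      # step 4: unknown, keep the rest
    return ("vcond", [])                 # assumption: nothing left to decide


def make_pred_evaluator(known):
    """Build an eval_pred from a dict of already-known predicate values."""
    return lambda pred: known.get(pred, UNKNOWN)


pairs = [("a", 1), ("b", 2), ("c", 3)]
```

With a known false and b known true the result is arm 2; with only a known false the residual vcond keeps the b and c pairs.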

fun compile(...):
    ...
    ... tagged words, etc
    ... eval
    ... vau / vau helper closure
    ...
    Note that when it's compiling a call, it compiles an if/else chain on the wrap level of the combiner being called.
        in the 0 branch, it emits the parameters as constant data
        in the 1 branch, it unvals and partial evals all of the parameters before compiling them.
            - note that this must be robust to partial-eval errors, as this branch might not ever happen at runtime and be nonsense code!
            - if the partial evaluation errors, it emits a value that will cause an error at runtime into the compiled code
        in the > 1 branch, it errors
    ...
    ...
    Must be careful about infinite recursion, including tricky cases that infinitely ping back and forth between
    partial eval and compile even though both have individual internal recursion checks
    ...
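
The if/else chain on wrap level that the compiler emits for a call can be pictured as the runtime dispatch below. This is an interpreter-style Python sketch, not generated code: Combiner, evaluate, and call_combiner are hypothetical names, and evaluate is a toy that only looks up string symbols in a dict.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Combiner:
    wrap_level: int
    impl: Callable


def evaluate(expr, env):
    """Toy evaluator: strings are symbols, everything else is self-evaluating."""
    return env[expr] if isinstance(expr, str) else expr


def call_combiner(comb, raw_params, env):
    if comb.wrap_level == 0:
        # Operative: parameters are passed as unevaluated constant data.
        args = raw_params
    elif comb.wrap_level == 1:
        # Ordinary function: parameters are evaluated once before the call.
        args = [evaluate(p, env) for p in raw_params]
    else:
        raise RuntimeError("wrap_level > 1 is an error in this branch")
    return comb.impl(args, env)


quote = Combiner(0, lambda args, env: args[0])
add = Combiner(1, lambda args, env: args[0] + args[1])
env = {"x": 40}
```

The same call site handles both kinds of combiner at the cost of one branch on wrap_level: quote receives the symbol "x" untouched, while add receives its value.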