Clean up and rearrange

2022-05-07 16:09:16 -04:00
parent 08c01257f3
commit ca68826fbc
52 changed files with 577 additions and 344 deletions
--- a/doc/psudocode.txt
+++ b/doc/psudocode.txt
@@ -0,0 +1,306 @@
+
+Key Contributions to look out for that make this work in practical time:
+    1. First class environments that:
+        a. Have IDs
+        b. Can either be "real", in which case it maps symbols to values,
+                      or "fake", in which case it maps symbols to themselves, but with the env ID as its needed-for-progress ID
+        c. Chain up to an upper environment that may be fake or real
+    2. AST nodes that maintain on-node:
+        a. The IDs of environments that, if "real", can be used to make progress in this subtree
+        b. The hashes of infinite recursive calls that were detected and stopped - if this hash isn't in the current call chain, this subtree can make progress
+        c. Extra IDs of environments that are "real" but have "fake" environments in their chain - this is used to make return value checking fast (O(1) or O(log n), depending)
+    3. Combiners, both user-defined and built-in, that maintain a "wrap level" that:
+        a. Is a property of this function value, *not* the function itself
+            * meaning that if wrap_level >= 1, you can evaluate each parameter and decrement wrap_level, even if you can't execute the call
+    4. The return value of a combiner is checked for:
+        a. If it is a value, in which case it is good to be returned if it doesn't contain a reference to the envID of the function it is being returned from
+        b. If it is (veval something env) where env doesn't contain a reference to the envID of the function it is being returned from
+        c. If it is a call to a function (func params...) and func doesn't take in a dynamic environment and params... are all good to be returned
+        This makes it so that combiner calls can return partially-evaluated code - any macro-like combiner would calculate the new code and return
+            (eval <constructed-code> dynamic_env), which would do what partial evaluation it could and either become a value or a call like case "b" above.
+            Case "b" allows this code essentially "tagged" with the environment it should be evaluated in to be returned out of "macro-like" combiners,
+            and this dovetails with the next point
+    5. The (veval something env) form essentially "tags" a piece of code with the environment it should be evaluated in. At each stage where
+        it is possible, the system checks for redundant constructions like these, where the env in (veval something env) is the currently active env.
+        In this case, it unwraps it to just "something" and continues on - this completes the second half of the macro-like combiner evaluation where
+        after being returned to the calling function the code is essentially spliced in.
+    6. The compiler can emit if/else branches on the wrap_level of combiners and in each branch further compile/partial eval if appropriate, allowing
+        dynamic calls to either functions or combiners with the overhead of a single branch
+
+Note that points 4 and 5 make it so that any macro written as a combiner in "macro-style" will be expanded just like a macro would and cause no runtime overhead!
+Additionally, point 6 makes it so that functions (wrap level 1 combiners) and non-parameter-evaluating (wrap level 0) combiners can be dynamically passed around and called with minimal overhead.
+Combine them together and you get simpler but more flexible semantics than macro-based (pure functional) languages, with little-to-no overhead.
+
+Additional tricky spots to look out for:
+    1. If you don't do the needed-for-progress tracking, you have exponential runtime
+    2. If you aren't careful about storing analysis information on the AST node itself or memoizing, a naive tree traversal of the DAG has exponential runtime
+    3. Infinite recursion can hide in sneaky places, including the interplay between the partial evaluator and the compiler, and careful use of multiple recursion blockers / memoization is needed to prevent all cases
+    4. The invariants needed to prevent mis-evaluation are non-trivial to get right. Our invariants:
+        a. All calls to user-combiners have the parameters as total values, thus not moving something that needs a particular environment underneath a different environment
+        b. All return values from functions must not depend on the function's environment (there are a couple of interesting cases here, see combiner_return_ok(func_result, env_id))
+        c. All array values are made up of total values
+        d. Some primitive combiners don't obey "a", but they must be written with extreme care, and often partially evaluate only some of their parameters and have to keep track of which.
+
+
+
+Everything operates on AST nodes, an ADT:
+    * val - integers, strings, booleans
+    * marked_array
+    * marked_symbol
+    * comb
+    * prim_comb
+    * marked_env
+
+Each AST node contains a hash representing it and its subtree.
+
+fun needed_for_progress(ast_node) -> (progress_IDs, rec_stopping_hashes, extra_IDs):
+    returns
+    - environment IDs (stored in each AST node for it and its children)
+        that must have real values if the partial evaluation of the subtree rooted at
+        this node is going to make progress.
+
+        progress_IDs is either true (meaning it will make progress no matter what), an
+            intset of env IDs (the ones that will cause it to make progress), or an empty
+            set, meaning it can't make forward progress no matter what
+    - hashes that if you're not inside the evaluation of, it could make progress
+    - extra IDs for envs it contains that don't count as forward progress IDs because the
+        env does have values, but envs in its parent chain don't have values.
+
+The calculation for needed_for_progress is straightforward-ish, with some tricky bits at comb and array.
+
+Under these definitions, we call an AST subtree a "total val" if it is either a val or its needed-for-progress IDs are nil.
+
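As a rough illustration of the definitions above, the following Python sketch models the triple returned by needed_for_progress and the two checks built on it: the "total val" test and the gate that partial_eval_helper applies before doing any work. The `Needed` class and both function names are hypothetical and not taken from the implementation, and the `force` flag is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Literal, Union

# True means "will make progress no matter what"; an empty set means "stuck".
ProgressIDs = Union[Literal[True], FrozenSet[int]]


@dataclass(frozen=True)
class Needed:
    """Hypothetical container for the needed_for_progress triple."""
    progress_ids: ProgressIDs = frozenset()
    rec_stopping_hashes: FrozenSet[int] = frozenset()
    extra_ids: FrozenSet[int] = frozenset()


def is_total_val(needed: Needed) -> bool:
    # "total val": no env ID could ever unblock this subtree.
    return needed.progress_ids is not True and len(needed.progress_ids) == 0


def can_make_progress(needed: Needed, real_env_ids, active_hashes) -> bool:
    # Mirrors the guard in partial_eval_helper, minus the `force` flag.
    if needed.progress_ids is True:
        return True
    # A stopped recursive call can be retried once we are outside of it.
    if any(h not in active_hashes for h in needed.rec_stopping_hashes):
        return True
    # Otherwise at least one needed env must currently hold real values.
    return bool(needed.progress_ids & real_env_ids)
```

With set-valued IDs, the gate is a single intersection test, which is what keeps repeated visits to an already-stuck subtree cheap.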
+fun mark(x, eval_pos):
+    x is env -> error
+    x is combiner -> error
+    x is symbol -> if x == true  then MarkedVal(true)
+            else if x == false then MarkedVal(false)
+            else               MarkedSymbol(x, needed_IDs=if eval_pos true else nil)
+    x is array ->
+        MarkedArray(is_val=!eval_pos, attempted=false, resume_hashes=nil,
+                    values = [mark(x[0], eval_pos)] + [mark(xi, false) for xi in x[1:]])
+    true -> MarkedVal(x)
+
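The mark pass above can be sketched in Python as follows. This is a simplified model under stated assumptions: raw arrays are Python lists, raw symbols use a hypothetical `Sym` class, a single `is_val` flag stands in for the `needed_IDs` bookkeeping, the `attempted` and `resume_hashes` fields are dropped, and the env/combiner error cases are omitted because this toy input cannot contain them. Handling of the empty array is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Sym:
    """Hypothetical raw (unmarked) symbol."""
    name: str


@dataclass
class MarkedVal:
    value: Any


@dataclass
class MarkedSymbol:
    name: str
    is_val: bool  # False means "still needs to be looked up"


@dataclass
class MarkedArray:
    is_val: bool  # False means "this array is a call to evaluate"
    values: List[Any]


def mark(x, eval_pos: bool):
    if isinstance(x, Sym):
        if x.name == "true":
            return MarkedVal(True)
        if x.name == "false":
            return MarkedVal(False)
        return MarkedSymbol(x.name, is_val=not eval_pos)
    if isinstance(x, list):
        if not x:  # assumption: an empty array has no head to mark
            return MarkedArray(is_val=not eval_pos, values=[])
        # Only the head inherits eval_pos; the rest are marked as values.
        head = mark(x[0], eval_pos)
        rest = [mark(xi, False) for xi in x[1:]]
        return MarkedArray(is_val=not eval_pos, values=[head] + rest)
    return MarkedVal(x)
```

Marking the tail as values is what lets the call machinery decide later, based on wrap level, whether each parameter should be unvaled and evaluated.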
+fun strip(x) -> value:
+    If x is an AST node representing a value, it returns the value.
+    May strip recursively in the case of an array value, etc.
+    Errors on env, comb (but not prim_comb!), and non-value symbols or arrays
+
+fun try_unval(x) -> Result<ASTNode>:
+    //Removes one level of "value-ness".
+    x is Array -> if !x.array_is_val Error()
+                  else Ok(MarkedArray(is_value=false,
+                                   values = [try_unval(x.values[0])] + x.values[1:]))
+    x is Symbol -> if !x.symbol_is_val Error()
+                   else Ok(MarkedSymbol(symbol=x.symbol, is_value=false))
+    true -> Ok(x)
+
+fun check_for_env_id_in_result(env_id, x):
+    return env_id in <either progress_IDs or extra_IDs in needed_for_progress(x)>
+        If either progress_IDs or extra_IDs is true, then we use a fallback, but
+        that doesn't get called even on large test cases, so it's either rare or impossible.
+        The fallback is slow, though, whereas this is just a check for set membership.
+
+// We only allow returning a value out of a combiner if the return value
+// doesn't reference the environment of the combiner
+fun combiner_return_ok(func_result, env_id):
+    func_result isn't later -> !check_for_env_id_in_result(env_id, func_result)
+    // special cases now
+    (veval body {env}) => combiner_return_ok({env}, env_id)
+    //    The reason we don't have to check body is that this form is only creatable in ways where body was originally a value, so we only need to check {env}
+    //        Either it's created by eval, in which case it's fine, or it's created by something like (eval (array veval x de) de2) and the array has checked it,
+    //        or it's created via literal vau invocation, in which case the body is a value.
+    (func ...params) => func doesn't take dynamic env && all params are combiner_return_ok
+    otherwise -> false
+
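A minimal Python sketch of the three return cases handled by combiner_return_ok, assuming a toy node model. `Value`, `Comb`, `VEval` and `Call` are hypothetical classes, and `env_ids` stands in for the union of progress_IDs and extra_IDs that check_for_env_id_in_result consults.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Value:
    """A finished (non-"later") node; env_ids = progress_IDs | extra_IDs."""
    env_ids: FrozenSet[int] = frozenset()


@dataclass
class Comb(Value):
    takes_dynamic_env: bool = False


@dataclass
class VEval:
    body: object
    env: Value


@dataclass
class Call:
    func: object
    params: List[object]


def combiner_return_ok(result, env_id: int) -> bool:
    if isinstance(result, Value):
        # Plain value: fine unless it references the returning env.
        return env_id not in result.env_ids
    if isinstance(result, VEval):
        # Only the tagging env needs checking, not the body.
        return combiner_return_ok(result.env, env_id)
    if isinstance(result, Call):
        func = result.func
        return (isinstance(func, Comb)
                and not func.takes_dynamic_env
                and all(combiner_return_ok(p, env_id) for p in result.params))
    return False
```

The membership test on `env_ids` is the fast path described above; the slow fallback for the "true" case is not modeled here.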
+// We may end up in situations where the value/code we care about is wrapped up in
+// a redundant call to veval, namely after successfully returning based on combiner_return_ok above.
+// This call may prevent other optimizations though, so we should unwrap the redundant call if possible,
+// and if it causes a change we should re-partially-evaluate to make further progress if we can
+fun drop_redundent_veval(x, dynamic_env, env_stack, memostuff):
+    (veval node env) if env.id == dynamic_env.id -> drop_redundent_veval(node, dynamic_env, env_stack, memostuff)
+    (comb params...) if comb.wrap_level != -1 -> map drop_redundent_veval over params and  if any change: partial_eval( (comb new_params...), dynamic_env, env_stack, memostuff)
+                                                                                                    else: x
+    else -> x
+
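The unwrapping step can be sketched in Python like this. The node classes are hypothetical, and the `reeval` callback is an assumption standing in for the partial_eval call made when any parameter changed; by default it returns the rebuilt call untouched.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Env:
    id: int


@dataclass
class VEval:
    node: Any
    env: Env


@dataclass
class Call:
    head: Any
    wrap_level: int
    params: List[Any]


def drop_redundent_veval(x, dynamic_env: Env, reeval=lambda call: call):
    # (veval node env) tagged with the active env is just `node`.
    if isinstance(x, VEval) and x.env.id == dynamic_env.id:
        return drop_redundent_veval(x.node, dynamic_env, reeval)
    # wrap_level == -1 marks calls whose params must not be touched.
    if isinstance(x, Call) and x.wrap_level != -1:
        new_params = [drop_redundent_veval(p, dynamic_env, reeval)
                      for p in x.params]
        changed = any(n is not o for n, o in zip(new_params, x.params))
        if changed:
            return reeval(Call(x.head, x.wrap_level, new_params))
        return x
    return x
```

Returning the original object when nothing changed lets the caller detect a no-op with an identity check instead of a deep comparison.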
+fun make_tmp_inner_env(params, de?, ue, env_id):
+    ...
+
+
+fun partial_eval_helper(x, only_head, env, env_stack, memostuff, force):
+    needed, hashes, _extra = needed_for_progress(x)
+    if force || one of hashes is not in memostuff || needed == true || set_intersection(needed, env_stack.set_of_ids_that_are_vals) != empty_set:
+        x is MarkedVal -> x
+        x is MarkedEnv -> find(x.env_id == it.env_id, env_stack) ?: x
+        x is MarkedComb -> if !env.is_real && !x.se.is_real // both aren't real, re-evaluation of closure creation site
+                           ||  env.is_real && !x.se.is_real // new env real, but se isn't - the creation of the closure!
+                           then let inner_env = make_tmp_inner_env(x.params, x.de?, env, x.env_id)
+                           in MarkedComb(se=env, body=partial_eval_helper(body, false, inner_env, <add inner_env to env_stack>, memostuff, false))
+        x is MarkedPrimComb -> x
+        x is MarkedSymbol -> if x.is_val then x
+                                         else env_lookup_helper(x, env)
+        x is MarkedArray -> if x.is_val then x
+                                        else let
+                                            comb = partial_eval_helper(x.values[0], only_head=true, env, env_stack, memostuff, false)
+                                            params = x.values[1:]
+                                            if later_head?(comb) return MarkedArray(values=[comb]+params)
+                                            if comb.needed_for_progress == true:
+                                                comb = partial_eval_helper(comb, only_head=false, ...)
+
+                                            // If not -1, we always partial eval, if >0 we also unval/partial eval to do one full round of eval
+                                            wrap_level = comb.wrap_level
+                                            while wrap_level >= 0:
+                                                if wrap_level >= 1:
+                                                    params = map(unval, map(\x. partial_eval_helper(x, ...), params))
+                                                params = map(\x. partial_eval_helper(x, ...), params)
+                                                wrap_level -= 1
+                                            if <any of the above error, or couldn't be unvaled yet>:
+                                                return MarkedArray(values=[comb.with_wrap_level(wrap_level)] + <params at whatever level they were successfully evaluated to>)
+
+                                            if comb is MarkedPrimComb:
+                                                result = comb.impl(params)
+                                                if result == 'LATER:
+                                                    return MarkedArray(values=[comb.with_wrap_level(wrap_level)] + params)
+                                                else:
+                                                    return result
+
+                                            if comb.is_varadic:
+                                                params = params[:comb.params.len-1] + [ params[comb.params.len-1:] ]
+
+                                            inner_env = MarkedEnv(id=comb.env_id, possible_de_symbol=comb.de?, possible_de=env, symbols=comb.params, values=params, upper=comb.se)
+
+                                            rec_stop_hash = combine_hash(inner_env.hash, comb.body.hash)
+                                            if rec_stop_hash in memostuff:
+                                                return MarkedArray(values=[comb] + params, transient_needed_env_id=true, rec_stopping_hash=rec_stop_hash)
+
+                                            memostuff.add(rec_stop_hash)
+                                            result = partial_eval_helper(body, false, inner_env, <add inner_env to env_stack>, memostuff, false)
+                                            memostuff.remove(rec_stop_hash)
+
+                                            if !combiner_return_ok(result, comb.env_id):
+                                                transiently_needed = if comb.de? != nil then env.id else nil
+                                                return MarkedArray(values=[comb] + params, transient_needed_env_id=transiently_needed, rec_stopping_hash=rec_stop_hash)
+
+                                            return drop_redundent_veval(result, env, env_stack, memostuff)
+
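The recursion-stopping part of partial_eval_helper (the rec_stop_hash check around the body evaluation) can be illustrated in isolation. In this Python sketch the key `(name, arg)` stands in for combine_hash(inner_env.hash, comb.body.hash), `bodies` is a hypothetical table of combiner bodies, and a try/finally is used so the key is removed even if evaluation raises; the pseudocode above uses a plain add/remove pair.

```python
def eval_call(name, arg, active, bodies):
    """Evaluate a call unless the same (body, env) pair is already in progress."""
    key = (name, arg)
    if key in active:
        # Same call is already on the stack: stop and leave it for later.
        return ("later", name, arg)
    active.add(key)
    try:
        # The body receives a callback so it can make nested calls.
        return bodies[name](arg, lambda n, a: eval_call(n, a, active, bodies))
    finally:
        active.remove(key)


# Toy bodies: `loop` recurses with identical arguments and would never finish;
# `countdown` recurses with a changing argument and terminates normally.
BODIES = {
    "loop": lambda arg, call: call("loop", arg),
    "countdown": lambda n, call: 0 if n == 0 else call("countdown", n - 1),
}
```

Because the key includes the environment, recursion on a changing argument is still unrolled, while a call that would repeat itself exactly is cut off and returned as residual code.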
+And then we define a root_env with PrimComb versions of all of the standard functions.
+The ones that are most interesting and interact the most with partial evaluation are
+    vau eval cond
+The other key is that array only takes in values; that is, an array value never hides something that isn't a total value and needs more partial evaluation
+     (this makes a lot of things simpler in other places since we can treat array values as values no matter what and know things aren't hiding in sneaky places)
+
+fun needs_params_prim(...):
+    ...
+fun give_up_params_prim(...):
+    ...
+
+fun veval_inner(only_head, de, env_stack, memostuff, params):
+    body = params[0]
+    implicit_env = len(params) != 2
+    eval_env = if implicit_env { de } else { partial_eval_helper(params[1], only_head, de, env_stack, memostuff, false) }
+    evaled_body = partial_eval_helper(body, only_head, eval_env, env_stack, memostuff, false)
+    if implicit_env or combiner_return_ok(evaled_body, eval_env.id):
+        return drop_redundent_veval(evaled_body, de, env_stack, memostuff)
+    else:
+        return drop_redundent_veval(MarkedArray(values=[MarkedPrimComb('veval, wrap_level=-1, val_head_ok=true, handler=veval_inner), evaled_body, eval_env]), de, env_stack, memostuff)
+
+root_env = {
+    eval: MarkedPrimComb('eval, wrap_level=1, val_head_ok=true, handler=lambda(only_head, de, env_stack, memostuff, params):
+                let
+                    body = params[0]
+                    implicit_env = len(params) != 2
+                    return veval_inner(only_head, de, env_stack, memostuff, if implicit_env { [try_unval(body)] } else { [try_unval(body), params[1]] })
+          )
+    vapply: MarkedPrimComb('vapply, wrap_level=1, val_head_ok=true, handler=lambda(only_head, de, env_stack, memostuff, [func params env]):
+                    return veval_inner(only_head, de, env_stack, memostuff, [MarkedArray(values=[func]+params), env])
+          )
+    lapply: MarkedPrimComb('lapply, wrap_level=1, val_head_ok=true, handler=lambda(only_head, de, env_stack, memostuff, [func params env]):
+                    return veval_inner(only_head, de, env_stack, memostuff, [MarkedArray(values=[func.offset_wrap_level(-1)]+params), env])
+          )
+    vau: MarkedPrimComb('vau, wrap_level=0, val_head_ok=true, handler=lambda(only_head, de, env_stack, memostuff, params):
+                let
+                    de? = if len(params) == 3 { params[0].symbol_value } else { nil }
+                    // param_syms is kept separate from params so the body can still be read from params below
+                    param_syms = map(lambda(x): x.symbol_value, if de? { params[1] } else { params[0] })
+                    varadic = '& in param_syms
+                    param_syms.remove('&)
+                    body = try_unval(if de? { params[2] } else { params[1] })
+                    env_id = <new_id>
+                    if !only_head:
+                        inner_env = make_tmp_inner_env(param_syms, de?, upper=de, id=env_id)
+                        body = partial_eval_helper(body, false, inner_env, <add inner_env to env_stack>, memostuff, false)
+                    return MarkedComb(wrap_level=0, id=env_id, de?=de?, static_env=de, variadic=varadic, params=param_syms, body=body)
+          )
+    wrap: ...<returns new MarkedPrimComb/MarkedComb with incremented wrap_level>...
+    unwrap: ...<returns new MarkedPrimComb/MarkedComb with decremented wrap_level>...
+    cond: ...
+          ...Oddly tricky - is wrap_level 0, but...
+          ...                 1. unvals & partially evaluates starting from the first condition
+          ...                   2. if this condition is true, return the unvaled & partially evaluated corresponding arm
+          ...                   3. if this condition is false, drop the arm and return to 1
+          ...                 4. Otherwise, the condition is unknown (neither true nor false)
+          ...                   5. check to see if combine_hash(x.hash, env.hash) is in memostuff (prevent infinite recursion blocked on a cond guard!)
+          ...                       6. if the hash was in memostuff, return MarkedArray(later_hash=the_hash,
+          ...                                                                           values=[MarkedPrimComb('vcond,wrap_level=-1,...)] + map(unval, <remaining preds/arms>))
+          ...                       7. else new_preds_arms = map(partial_eval..., map(unval, <remaining preds/arms>))
+          ...                       <TODO: 8. remove arms/preds now guaranteed to be false, remove all arms/preds after first true>
+          ...                       9. return MarkedArray(values=[MarkedPrimComb('vcond,wrap_level=-1,...)] + new_preds_arms)
+          ...
+          ...The vcond is like cond but doesn't do any unvaling (as it's already been done) (and wrap_level is set to -1 so the function call machinery doesn't touch the params either)
+          ...
+    symbol?: needs_params_prim(symbol?)
+    int?: needs_params_prim(int?)
+    string?: needs_params_prim(string?)
+    combiner?: ...
+    env?: ...
+    nil?: needs_params_prim(nil?)
+    bool?: needs_params_prim(bool?)
+    str-to-symbol: needs_params_prim(str-to-symbol)
+    get-text: needs_params_prim(get-text)
+    array?: ...
+    array: ...
+    len: ...
+    idx: ...
+    slice: ...
+    concat: ...
+    +: needs_params_prim(+)
+    -: needs_params_prim(-)
+    *: needs_params_prim(*)
+    /: needs_params_prim(/)
+    %: needs_params_prim(%)
+    band: needs_params_prim(band)
+    bor: needs_params_prim(bor)
+    bnot: needs_params_prim(bnot)
+    bxor: needs_params_prim(bxor)
+    <<: needs_params_prim(<<)
+    >>: needs_params_prim(>>)
+    =: needs_params_prim(=)
+    !=: needs_params_prim(!=)
+    <: needs_params_prim(<)
+    <=: needs_params_prim(<=)
+    >: needs_params_prim(>)
+    >=: needs_params_prim(>=)
+    str: needs_params_prim(true_str)
+    log: give_up_params_prim(log)
+    error: give_up_params_prim(error)
+    read-string: needs_params_prim(read-string)
+    empty_env: MarkedEnv()
+}
+
+fun compile(...):
+    ...
+    ... tagged words, etc
+    ... eval
+    ... vau / vau helper closure
+    ...
+    Note that when it's compiling a call, it compiles an if/else chain on the wrap level of the combiner being called.
+    In the 0 branch, it emits the parameters as constant data.
+    In the 1 branch, it unvals and partially evaluates all of the parameters before compiling them.
+        - note that this must be robust to partial-eval errors, as this branch might never happen at runtime and might be nonsense code!
+        - if the partial evaluation errors, it emits a value that will cause an error at runtime into the compiled code
+    In the > 1 branch, it errors.
+    ...
+    ...
+    Must be careful about infinite recursion, including tricky cases that infinitely ping back and forth between
+    partial eval and compile even though both have individual internal recursion checks
+    ...
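The wrap-level branching described for compile can be sketched as follows. This is a toy model, not the compiler: "compiled code" is a Python closure, `Comb` and `compile_call` are hypothetical names, and `partial_eval` is any caller-supplied function that may raise. The point it illustrates is that a partial-evaluation failure in the wrap-level-1 branch is deferred to runtime rather than aborting compilation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Comb:
    wrap_level: int
    impl: Callable[[List[Any]], Any]


def compile_call(param_exprs, partial_eval):
    # Branch 0: parameters are passed through as constant data.
    const_params = list(param_exprs)

    # Branch 1: parameters are evaluated ahead of time. If that fails,
    # emit code that raises only if this branch is taken at runtime.
    try:
        evaled = [partial_eval(p) for p in param_exprs]

        def branch1():
            return evaled
    except Exception as err:
        message = str(err)

        def branch1():
            raise RuntimeError("partial evaluation failed: " + message)

    def run(comb: Comb):
        if comb.wrap_level == 0:
            return comb.impl(const_params)
        if comb.wrap_level == 1:
            return comb.impl(branch1())
        raise RuntimeError("wrap_level > 1 is not supported")

    return run
```

A call site compiled this way costs one branch on wrap_level at runtime, whichever kind of combiner ends up being passed in.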