Remember when I said that I’d probably be formalizing some more deciders next? Well, looks like I lied, because Skelet #1 managed to pull me in the very next day.
The Coq development stands at a chonky 1628 lines – almost as much as all the other Coq code in the repository combined. The main theorems are as follows:
- first, we show that the machine quickly enters a configuration that can be described with the high-level symbolic representation:
Lemma init : c0 -->* L <: C1 |> P :> R. Proof. unfold L, C1, R. execute. Qed. Definition initial: conf := (right, [l_C1], [r_P]). Lemma init' : c0 -->* lift initial. Proof. exact init. Qed.
- then, we have the function
fullstep
, which attempts to make progress using either the @uni-cycle acceleration, the stride acceleration, or a simple symbolic rule. Here is a theorem that states that whatever this function returns is a valid transition:Lemma fullstep_spec : forall c c', fullstep c = Some c' -> lift c -->* lift c'.
- the final cycle is described as follows:
Three things to note here:Lemma infinite_cycle : forall l, lift (right, l_C0 :: l, K) -->+ lift (right, l_C0 :: F ++ l, K).
-->+
indicates that we have taken special care to make sure that this transition takes at least one step – this is necessary for the proof of non-halting to go through- my simulator only 982 steps to simulate this cycle, as opposed to @uni’s 1357. This is because the stride acceleration doesn’t require a
C3
symbol to trigger – something like|> D x^n D x^n P $
is enough. - I have chosen a different point in the cycle than the one in Shawn’s blogpost, as my simulator was skipping right over that point (because, again, the stride acceleration is more general).
- the cycling configuration is detected by the function
is_cycling
. If it returns true, then the machine doesn’t halt when started in this configuration.Theorem is_cycling_spec : forall c, is_cycling c = true -> ~ halts tm (lift c).
This is all extracted to OCaml, and driven by the following tiny program:
open Skelet1
let rec doit n c =
if is_cycling c then
Printf.printf "cycle found after %d steps\n" n
else begin
if n mod 1_000_000 == 0 then
Printf.printf "%d steps done\n%!" n;
match fullstep c with
| Some c' -> doit (n + 1) c'
| None ->
Printf.printf "unknown transition after %d steps\n" n
end
let _ = doit 0 initial
The results are as follows:
$ time ./skelet1
[...]
87000000 steps done
cycle found after 87637653 steps
real 1m36.329s
user 1m36.240s
sys 0m0.086s
Perhaps one of the more interesting parts of the proof is the uni-cycle, because it involves computation (simple_step
evaluates to this, so consider what happens in that configuration now) significantly interleaved with more abstract reasoning (we have an assumption that says we can perform T
strides on r
, so we can do a stride now and r
will advance by (say) 4096 strides).
This is all automated by perhaps the most complicated Ltac code I have written so far, culminating with a single line that takes over 10 seconds to execute (and then roughly another 10s at Qed
, when the Coq kernel type-checks the proof term). The general approach for strides look like this:
- compute
stride 0 1 (...)
for the current right side of the tape - if we can’t perform a stride, this will have evaluated to
None
, plain and simple. If we can, then it is some term that containsstride _ _ r
. - we have an assumption about
r
that we can use to rewrite this and finish evaluating
My first attempt was taking around 7 seconds for each stride, because the intermediate term had the call to stride _ _ r
buried under dozens of match
expressions. Apparently Coq doesn’t like those.
This was solved with a “tail-recursive” implementation of the stride
function, which uses manual continuation passing. This allows you to run the computation as far as it can go, and then capture the continuation into an Ltac variable.
Of course, all this complex machinery doesn’t need to be understood and checked to trust the proof, because the Coq kernel makes sure that whatever we did conforms to the rules of the particular type theory in use.
As a somewhat simpler example of a similar phenomenon, we have the max_stride
function, which computes how many strides can be performed on a tape. The value which it returns doesn’t matter for correctness, as the result is only used to decide what stride count to ask of stride
– if max_stride
returns a value that is too high, then stride
will return None and the relevant acceleration won’t be applied, affecting only performance.
I could talk at length about the formalization, with tidbits like the funky induction needed to prove the stride rule or the headache that is performant arithmetic in programs extracted from Coq (I make the slow but trivially correct option of using binary numbers directly from the Coq standard library, extracted to an OCaml ADT like any other Inductive
definition). But I probably should stop right about now – it is 6 AM, and therefore my sleep schedule is shifting forwards even more. God, Coq can be way too addictive.