[Decider] Finite Automata Reduction

UncombedCoconut · November 6, 2022, 5:30pm

Finite Automata Reduction: principle

This program finds and verifies proofs of the form:

The following finite-state machine, by construction, recognizes all halting tape configurations for this TM.
It doesn’t recognize the initial configuration. Therefore, the TM doesn’t halt from its initial configuration.

The decider correctness paper, available here, derives its algorithms from this requirement.

It’s also fair to call it a general co-CTL decider, where the “co-” means complement.
(The complement of a CTL is characterized by closure under time-reversed TM steps. Regardless, the goal is to tell the start config apart from halting ones.)

The basis of its proof format is a specific state-machine architecture.
If the required conditions hold, the state machine works as a proof.
It’s known that all possible (co-)CTL solutions are expressible in this way, though there may be translation overhead.
Moreover, the full automaton is efficiently constructible from the DFA it uses to scan one side of the tape.
This has three good consequences:

The DFA serves as verification data, in a compact and easily parsed format. @TonyG has helpfully implemented a verifier in an independent code base.
We get a powerful decider by searching the space of DFAs for one that works. That’s implemented in my code (--prover=direct) and also Tony’s.
A second decider, built on a state-of-the-art SAT solver (powered by cats), is automatically sound if its DFAs check out.

This second prover actually looks for for meet-in-the-middle DFA solutions—which translate into the above framework but are best understood on their own terms.
(Update: This method is not part of the 2023-04-09 decider acceptance.)
Historically, that one was developed earlier. The first public SAT-based implementation is this one by @djmati1111 on Discord.
@Mateon1 also shared many insights, discoveries, results, and a public but not freestanding implementation.
Huge thanks and credit to the above authors!

The methods coexist because direct is far superior at lower search depths, whereas mitm_dfa scales far better at high search depths.
(When feasible, direct-n covers mitm_dfa-n in scope.)

Decider examples and counterexamples

From a zoology perspective, this decider’s coverage is cross-cutting.
It doesn’t particularly care what the machines do, just what they won’t do (set themselves up to halt).
For grins, we can ask it to search the entire Skelet list single-threaded at low depth (abbreviating --limit and --ad-hoc on the command line):

$ target/release/decider-finite-automata-reduction -l 6 -l 0 -a 68329601 -a 55767995 -a 5950405 -a 6897876 -a 60581745 -a 58211439 -a 7196989 -a 7728246 -a 12554268 -a 3810716 -a 3810169 -a 4982511 -a 7566785 -a 31357173 -a 2204428 -a 20569060 -a 1365166 -a 15439451 -a 14536286 -a 347505 -a 9980689 -a 45615747 -a 6237150 -a 60658955 -a 47260245 -a 13134219 -a 7163434 -a 5657318 -a 6626162 -a 4986661 -a 56967673 -a 6957734 -a 11896833 -a 11896832 -a 11896831 -a 13609549 -a 7512832 -a 35771936 -a 9914965 -a 3841616 -a 5915217 -a 57874080 -a 5878998

this takes about 10 seconds to solve 8/43 of them and output JSON-formatted proofs:

Examples:
Skelet #4
Skelet #23
Skelet #25
Skelet #29
Skelet #32
Skelet #36
Skelet #37
Skelet #40
(Deeper searches add Skelet machines 2, 5, 6, 8, 9, 11, 12, 13, 16, 19, 24, 27, 38, and 42. See “Results”.)
Counterexamples:

Despite the halting-problem obstacles, proving any specific 5-state TM immune to CTL proofs was an open question for quite some time. This has been settled decisively by Iijil, and there are strong heuristic arguments that a total of 30 TMs in the seed DB are completely immune to CTL.

Basically, a heuristic argument gives a reason that a TM’s halting configurations can only be told apart from reachable ones by counting, and such arguments become proofs if we can invoke the pumping lemma.

Sometimes, CTL proofs exist but are only visible to my program at absurd search depths.
@Iijil reports that Bruteforce-CTL solved 7046428, whereas direct-10 and mitm_dfa-16 (~1 thread-hour each) can’t.
Similarly, the entire Halting Segment and CPS methods are covered in theory, but my program couldn’t replace them practice. (In CPS’s case, the idea is laughable.)

Decider code

Original home: bbchallenge-deciders/decider-finite-automata-reduction at finite-automata-reduction · UncombedCoconut/bbchallenge-deciders · GitHub

Pull request: Finite automata reduction by UncombedCoconut · Pull Request #11 · bbchallenge/bbchallenge-deciders · GitHub

Decider tests

The unit tests are in-line, and the src/core/ files responsible for proof-checking are extensively tested.

Results

April 9th 2023: 503,169 machines decided using up to direct-6 (6 DFA states). This takes a bit less than 30 minutes on a regular laptop.

The maximum potential is FAR greater, and has a dedicated thread.
The Decider Verification File there solves all but 38 TMs, and descending.
However, results this good are not even theoretically achievable with this version of the program (where proof search is limited to 25-state DFAs for 5-state TMs for efficiency reasons).
(Note: to verify this DVF with Tony’s program now requires raising MaxDFA_States in FAR.h.)

More practically, it should be quite easy to adopt MitM-DFA in order to solve thousands more TMs. Halving the undecided index to 15543 could be done almost immediately.
For a walkthrough of operating this decider in both modes, and reproducting its results exactly, see this Results Reproduction thread.
In addition, when the undecided population shrinks further, it becomes highly practical to run the FAR-direct decider again to higher depths, up to 9, without any tricks.

Database subset of application

index revision 758324e with 535801 machines.

Decider correctness

Formal presentation of the method here.

UncombedCoconut · November 7, 2022, 5:46am

Here are point-in-time index and dvf files, if anyone wants to cross-reference or try any independent verification.

At this time, the decider had solved 1526738 of the 1538624 machines.
There are 11868 to go. (That’s not a math error; it just kept solving while I realized I’d need to mess with an external file-sharing service. )

UncombedCoconut · November 10, 2022, 1:01pm

Judging by click counts, the community doesn’t need many of these, but here’s a checkpoint.
The dropbox links should be updated in-place, with 1529124 verifiably solved and 9500 unsolved.

The SAT-based prover is hitting more of a performance wall than I expected (based on smaller samples and Mateon1’s successes). I’m looking for a natural time to stop the program, add at least direct-9 on the smaller unsolved set, and call the solved-count final… at least until we know what new decider(s) shall cover the hard parts of BB(5) and whether slow FA-reduction can still fill any gaps.

BTW, I’ve pushed README updates which describe the direct algorithm in more detail, and I’m making cosmetic edits to the OP in this thread.

UncombedCoconut · November 13, 2022, 4:05am

I totally collided with a Decider Verification File “Decider Type” value TonyG was using already. Incremented it, and updated the proven machines (1531525 solved, 7099 unsolved).
For the record, Halting Segment (iijil) has 88 machines outside this solved-set. TonyG’s has those 88 plus 173 more. Mateon1’s cps_8_16 file has 4170 outside this set. Of course this decider gets tons of stuff those don’t, and its deficits are dropping, but not to 0.

UncombedCoconut · November 14, 2022, 1:07pm

OK, turns out TonyG tried a while ago to coordinate a DeciderType with me, but the email got eaten. So I’ve pushed another update (now at 1531683 solved, 6941 unsolved).

UncombedCoconut · December 12, 2022, 6:44am

Update #1:
PR for derivation / correctness proof is here.

Update #2:
I rebased to the post-halting-segment set of 535816 (available here).
I’ve optimized the pipeline of proof searches, and done a reproducer run to clarify the cost-benefit tradeoffs of reproducing a given search depth and applying the results.

Progress indicator (NOT FOR REPRODUCTION, EVEN THE COUNTS, JUST FYI)
530852+ 4964 ████████████████████████████████████████████████████████████████████████████████████████████████████████░ All Deciders: 2d 16:49:59
535816/ 535816 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ direct-1 : 00:00:00
482842/ 482842 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ direct-2 : 00:00:00
411355/ 411355 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ direct-3 : 00:00:03
315080/ 315080 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ direct-4 : 00:00:13
167339/ 167339 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ direct-5 : 00:01:10
71221/ 71221 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ direct-6 : 00:10:50
32744/ 32744 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ mitm_dfa-8 : 00:20:00
21059/ 21059 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ mitm_dfa-9 : 00:56:32
15554/ 15554 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ direct-7 : 00:45:05
12369/ 12369 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ mitm_dfa-10 : 02:29:52
9468/ 9468 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ mitm_dfa-11 : 09:13:37
7360/ 7360 ~00:00:00 ████████████████████████████████████████████████████████████████████████████████████████████████████████████ direct-8 : 08:45:49
6058/ 6058 ~00:00:00 █████████████████████████████████████████████████████████████████████████████████████████████████████████ mitm_dfa-12 : 1d 14:50:22
119/ 4991 ~7d 07:11:19 ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ direct-9 : 03:19:34^C
[I cancelled here and reused the exact results I already had.]

This was with a 14-core “cluster” of laptops + 1 PC + 1 paperweight, aged 11y to 1y. I cancelled direct-9.. I don't think we'll want to run direct-9` on 4991 TMs anyway: better to pre-filter.

The below counts are exact and exactly reproducible by running the same pipeline:

pass	found	Σfound	left
direct-1	57644	57644	478172
direct-2	75823	133467	402349
direct-3	102241	235708	300108
direct-4	140760	376468	159348
direct-5	88779	465247	70569
direct-6	37931	503178	32638
mitm_dfa-8	11584	514762	21054
mitm_dfa-9	5505	520267	15549
direct-7	3182	523449	12367
mitm_dfa-10	2900	526349	9467
mitm_dfa-11	2111	528460	7356
direct-8	1299	529759	6057
mitm_dfa-12	1066	530825	4991
direct-9	759	531584	4232

Individual stage results and combined repro.{index,dvf} files are available here.

TonyG · December 14, 2022, 6:53pm

I have put your 4232 hold-out machines through my Deciders, and got the number down to 694. See bbchallenge/Holdouts at main · TonyGuil/bbchallenge · GitHub for details, including Decider Verification Files.

My broader segment-matching criterion for the HaltingSegments Decider fails to find any new non-halting machines here. So if @UncombedCoconut’s results are correct, there is no need for it.

cosmo · December 14, 2022, 6:55pm

Great news, thank you!

TonyG · December 22, 2022, 12:42pm

Seven more Bouncers found with improved Decider code. So the number of holdouts is now down to 687. See my github Holdouts page for details.

TonyG · January 11, 2023, 4:10pm

I have written a C++ Verifier for UncombedCoconut’s binary Decider Verification File format. It is available at bbchallenge/NFA at main · TonyGuil/bbchallenge · GitHub. It verifies all 531,584 machines in 30 seconds.

TonyG · January 17, 2023, 10:46pm

My C++ Decider and Verifier are now available on https://github.com/TonyGuil/bbchallenge/tree/main/FAR. Basically, all the Decider does is reproduce Justin’s results for smaller values of n (up to n=6, say). It can generate DFA-only and DFA-with-NFA output. The Verifier can handle both these formats, for any value of n.

UncombedCoconut · January 17, 2023, 11:07pm

Thanks very much, Tony - I’m excited.

UncombedCoconut · January 20, 2023, 12:30am

Thanks again, Tony, I’m getting around to trying your version.
Could you please double-check if the FAR.h header was successfully committed?
In the near future I will also ask for a favor, namely implementing the heuristic I wrote about on Discord – it can wait, if you like, til I’ve committed the same to my repo and the pseudocode/results subsections of the bbchallenge-proofs write-up.
This will most likely get you reproducing up to n=7 comfortably.

Intended next steps:

Depending on how beloved or otherwise the writeup is, we might have the ingredients very soon to apply the results of up to “direct-7” (reproduced by Tony’s program) and “mitm_dfa-11” (reproduced by djmati1111’s program). I think that that will come to 528254 decided and 7547 to go, relative to the Dec. 25 undecided index.

The maximum potential of this decider family is greater – my proofs file (index/DVF) has 531669 machines in the current index, leaving 4132. It takes high and costly parameters (or knowing what you’re looking for) to find these, so most likely we finalize the considerable progress on CPS and Bouncers before returning to these. It’s just good to know what we’ve got in reserve.

For now I will try to get the new heuristic committed/documented properly, then refresh the original post - updating a few minor points, the command to run, and the timing/result details.

TonyG · January 20, 2023, 9:38am

Hi Justin, sorry about that – the header files are up now. I have two questions:

As I understand it, your new heuristic (unlike the original version) doesn’t find all machines, only about 99% of them. Is this right? And does that 99% figure stay more or less the same as you increase the depth? I would certainly like to try it myself, but I’m not sure I have all the info – is your Discord description (“if the first k states of the n-state DFA aren’t sinks, that k <= 2^floor(log2(n))”) all there is to it? I know you posted your Rust code, but I am not fluent in that direction.

Secondly: why is it important to reproduce the search process for direct-7 and higher? Surely it’s enough to reproduce the process of verifying your dvf’s?

Oh, and thirdly: my GPU idea was a washout – any time the execution flow depends on the input data, the threads desynchronize and you lose all the benefit. Oh well, I learned a lot.

UncombedCoconut · January 20, 2023, 2:39pm

Hi, thanks!

Right - it’s an assumption about which searches are futile to continue, which is false but has two redeeming qualities: (1) it can’t be wrong enough to block a solution entirely, only at fixed depths, (2) statistically, it saves far more time than it wastes by making you re-search machines.

I doubt it, but it was a vast majority at all depths up to 9 (100% for power-of-2 depths).

Pretty much! For state i to be a “sink state” means DFA[i][0] == i && DFA[i][1] == i. (Consequence: if DFA[i][0] is known and DFA[i][1] is under construction, you can’t be sure i is a sink but you might be sure it isn’t.)
The Rust code computes this power-of-2 bound by extracting the high bit of n, tracks the run of non-sink-states starting at i=0, and rejects the DFA (same as if ExtendNFA returned false) if the count exceeds the power-of-2 bound. For fear of off-by-one errors I confirm that “the count” is e.g. 1 if the run is from i=0 to i=0 inclusive.

This same debate arose in HS acceptance discussions. To summarize: this thread and sub-forum is the place to debate that. I think everyone’s trying to set fair and non-dogmatic rules, just with varying ideas about what balance to strike. I’ll say this, though: when it’s practical, it’s very clean for the BB challenge’s ultimate work product to say: “we followed the method described here, deciding 1,234,567 of the remaining machines, and so can you, and nothing more needs to be said about those machines.”
I can’t find the quote right now, but I think even cosmo was open to shifting to a proof-data approach – but for a small machine population.

Drat. I wasn’t really expecting it to work, but you’d already beaten my expectations, so I was hoping.

UncombedCoconut · January 22, 2023, 5:48am

@TonyG – I see why n <= 6 was the expectation you wanted to set.
But the code is quite reasonable - I suspect performance differences are mainly about abstraction penalties the compiler couldn’t eliminate in BoolAlgebra. (For that matter, -march=native -flto compiler flags seem to help, ~10% faster here - and it’s cheating if my program can do it and yours can’t. )

I want to be careful with code feedback on an independent reproducer.
I’m very sure this much is OK, giving it here for transparency though:

It improves compiler compatibility if we use the & operator on FiniteAutomataReduction::ThreadFunction in DecideFAR.cpp, as for the sibling programs.
Any flat array of dimension MaxDFA_States or MaxDFA_States + 1 (RStack, aStack, m) should be double-checked – the question literally being, “am I indexing this by up to double MaxDFA_States?”

Best,
Justin

TonyG · January 22, 2023, 11:37am

I’ve implemented your two suggestions. There were indeed array indexing bugs. Thanks for the feedback! BTW, I have left this topic for now, and am thinking about Bells.

UncombedCoconut · January 22, 2023, 3:07pm

Fair enough, thanks for the work done, once this decider submission is on solid footing I will turn to a bouncers-repro. (We shall see if Uni’s ideas for a simpler inductive prover pan out or not - I’ll give it some space.)

cosmo · April 7, 2023, 10:16am

I’ve been running direct-6 both with your C++ implem and Justin’s rust program. The set of decided machines matches.

However, weirdly, it seems that your implem does not always find the smallest DFA, would you know why?

Here is an example with machine #75:

Your DFA: [[0, 1], [2, 3], [1, 3], [3, 4], [3, 5], [3, 3]]
Justin’s: [[0, 1], [2, 3], [1, 3], [3, 3]]

EDIT: @UncombedCoconut answered the question on discord. Your program searches DFAs with exactly 6 states, while Justin does <= 6.

UncombedCoconut · April 7, 2023, 10:36am

Yes. For that matter, my program also searches DFAs with exactly 6 states on the “direct-6” iteration.
It’s just a matter of what we encourage in our respective CLIs. I make it so you’re always asking for a range of “direct” or “mitm_dfa” runs, so just “direct-6” looks like “-p direct -x 5 -l 6”. It’s more convenient to do “direct” from 1 to 6, which is hopefully what the user wants anyway. Tony’s approach is simpler, so good in a different way.