#7410754 does not halt

Hi! The bbchallenge.org front page currently names Machine #7,410,754 first among the “scary undecided machines”.
I’ve observed that it doesn’t halt; I’d like to sketch a proof and ask whether any of the ideas here might be new and fit into the overall program.

I’ve now come across the term “Closed Tape Language”, and the idea here is similar (sadly with some notation differences).
Since a TM steps through many intermediate states, whose descriptions would clutter a closed language, let’s analyze equivalent String Rewriting Systems instead.

We model a TM’s state as a word in the language [01]*[a-eA-E][01]*.
Lower-case a-e represent the TM head in state A-E on top of a 0 bit.
Upper-case A-E represent the TM head in state A-E on top of a 1 bit.
A “Word” is just a string (with designated left/state/right components).
The TM states are Words modulo leading/trailing zeros.
Words (NOT modulo leading/trailing zeros) can also represent TM-state infixes.

Our little friend’s machine code is thus expressible as these rewrite rules:
a0→1b, a1→1B, 0A→d0, 1A→D0, 0b→c1, 1b→C1, B0→1c, B1→1C, 0c→a1, 1c→A1, C0→0c, C1→0C, 0D→e0, 1D→E0, e0→0b, e1→0B, 0E→d1, 1E→D1
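
To make the encoding concrete, here is a minimal sketch (not the real script) that stores these 18 rules in a Python dict and runs a few steps from the start state; it pads the word with blank 0s before matching, since words are taken modulo leading/trailing zeros:

RULES = {
    "a0": "1b", "a1": "1B", "0A": "d0", "1A": "D0",
    "0b": "c1", "1b": "C1", "B0": "1c", "B1": "1C",
    "0c": "a1", "1c": "A1", "C0": "0c", "C1": "0C",
    "0D": "e0", "1D": "E0", "e0": "0b", "e1": "0B",
    "0E": "d1", "1E": "D1",
}

def step(word: str) -> str:
    """Apply the unique applicable rule, treating leading/trailing 0s as blank tape."""
    padded = "0" + word + "0"   # every left-hand side above is 2 chars,
                                # so one 0 of context on each side suffices
    for lhs, rhs in RULES.items():
        i = padded.find(lhs)
        if i != -1:
            return (padded[:i] + rhs + padded[i + len(lhs):]).strip("0")
    raise RuntimeError("no rule applies: a halt word (here, d) has been reached")

word = "a"                      # start state: head in state A on an all-zero tape
for _ in range(8):
    word = step(word)
    print(word)                 # 1b, C1, C, c, a1, 1B, 11c, 1A1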

The benefit is that we can compose rules and obtain a system with the same behavior (transforming a TM state into a future state) while bypassing the intermediate steps and states.
I found it convenient to formalize these rewrite (not exactly Semi-Thue) systems as follows:

  • Binary → relation on words (modulo leading/trailing zeros) generated from finitely many infix substitutions.
  • Designated halt words ({d} here): if one appears as an infix, the machine will halt.
  • Designated start state (a here).
  • The system is deterministic: every TM state matches exactly one rewrite or halt-word.
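
For reference, the data in such a system is small; here is a minimal sketch of the formalization (field names are illustrative, not taken from the real script):

from dataclasses import dataclass

@dataclass
class RewriteSystem:
    rules: dict[str, str]    # finitely many infix substitutions, lhs -> rhs
    halt_words: set[str]     # any of these occurring as an infix means "will halt"
    start: str               # designated start word (modulo leading/trailing 0s)

    def is_halting(self, word: str) -> bool:
        """True if some designated halt word occurs as an infix of `word`."""
        return any(h in word for h in self.halt_words)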

Determinism helps us prove certain transformations are sound.
A designated start state helps us prune unreachable rules. (Formally: turn the pruned rules’ left-hand sides into halt words, and call the two systems equivalent as long as the start state’s trajectory is unchanged.)
Imagine you diagrammed the system: a vertex per rewrite rule, a directed edge between rules which (partially) compose. The first rewrite applied to the start state serves as a root. Only vertices reachable from the root matter; furthermore, we can advance the start state until this root belongs to a cycle.
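
To make “compose” concrete: in the easiest case, one rule’s whole right-hand side contains the next rule’s left-hand side, and the two steps fuse into one. Determinism is what makes the fused rule sound, since the second rule is then the only one that can fire next. A toy sketch (again, not the real script) that collapses a0→1b→C1→0C into the a0→0C rule appearing in step 1 below:

def compose(rule1, rule2):
    """Fuse rule1 followed by rule2, if rule2's LHS is an infix of rule1's RHS."""
    (lhs1, rhs1), (lhs2, rhs2) = rule1, rule2
    i = rhs1.find(lhs2)
    if i == -1:
        return None                      # the rules don't compose this simply
    return (lhs1, rhs1[:i] + rhs2 + rhs1[i + len(lhs2):])

rule = compose(("a0", "1b"), ("1b", "C1"))   # ("a0", "C1")
rule = compose(rule, ("C1", "0C"))           # ("a0", "0C"), as in step 1
print(rule)

The “split” steps below handle the harder case, where the overlap is only partial, by first specializing a rule with extra context.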

So #7410754 simplifies as follows. (Messy but doable with heuristics.)

  1. Compose rewrites, where possible: a0→0C, a1→1B, 1A→D0, 0b→c1, 1b→0C, B0→A1, B1→1C, 0c→1B, 1c→A1, C0→1B, C1→0C, 0D→c1, 1D→E0, e0→c1, e1→0B, 1E→D1; start at a; halt on 0A|d|0E
  2. Advance start & prune a0→0C, a1,0b,1b,e0,e1 rules: 1A→D0, B0→A1, B1→1C, 0c→1B, 1c→A1, C0→1B, C1→0C, 0D→c1, 1D→E0, 1E→D1; start at C; halt on a|0A|b|d|e|0E
  3. Split rules whose targets overlap with halt-words 0A, 0E: 1A→D0, 1B0→1A1→D01, B1→1C, 0c→1B, 11c→1A1→D01, C0→1B, C1→0C, 0D→c1, 11D→1E0→D10, 1E→D1; start at C; halt on a|0A|b|0B0|01c|d|01D|e|0E
  4. Prune 1A→D0, 1E→D1: 1B0→D01, B1→1C, 0c→1B, 11c→D01, C0→1B, C1→0C, 0D→c1, 11D→D10; start at C; halt on a|0A|1A|b|0B0|01c|d|01D|e|0E|1E = a|A|b|0B0|01c|d|01D|e|E
  5. Split on halt-word 01c, compose: 1B0→D01, B1→1C, 0c→1B, 11c→D01, C0→1B, C1→0C, 00D→0c1→11C, 110D→11c1→D011, 11D→D10; start at C; halt on a|A|b|0B0|01c|d|01D|010D|e|E
  6. Prune: 1B0→D01, B1→1C, C0→1B, C1→0C, 00D→11C, 110D→D011, 11D→D10; start at C; halt on a|A|b|0B0|c|d|01D|010D|e|E
  7. Split on substitutable 1B0, compose, prune: C00→D01, C01→11C, C1→0C, 00D→11C, 110D→D011, 11D→D10; start at C; halt on a|A|b|B|c|d|01D|010D|e|E

By inspection, the modified system only inserts 1’s on the left in pairs, so the halt-words are unreachable!
That is, the regular language (0|11)*[CD].* separates reachable states from halting states.
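
For anyone who’d rather not redo the inspection, here is a small brute-force sanity check (bounded word length, so evidence rather than a proof). It tests both halves of the claim: words in the language never contain a halt word, and every applicable rule keeps them inside the language.

import itertools, re

RULES = [("C1", "0C"), ("11D", "D10"), ("00D", "11C"),
         ("110D", "D011"), ("C00", "D01"), ("C01", "11C")]
HALT = ["a", "A", "b", "B", "c", "d", "01D", "010D", "e", "E"]
LANG = re.compile(r"(0|11)*[CD][01]*")      # the claimed separating language

def in_lang(word):
    return LANG.fullmatch(word.strip("0")) is not None

ok = True
for n_left, n_right in itertools.product(range(5), range(4)):
    for left in itertools.product("01", repeat=n_left):
        for right in itertools.product("01", repeat=n_right):
            for head in "CD":
                word = "".join(left) + head + "".join(right)
                if not in_lang(word):
                    continue                     # only test words in the language
                padded = "000" + word + "000"    # implicit blank (0) tape on both sides
                # (a) no halt word occurs in any word of the language
                ok &= not any(h in padded for h in HALT)
                # (b) every applicable rewrite lands back inside the language
                for lhs, rhs in RULES:
                    i = padded.find(lhs)
                    if i != -1:
                        ok &= in_lang(padded[:i] + rhs + padded[i + len(lhs):])
print("closed under the rules, no halt words" if ok else "claim FAILS")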

One thing that has me concerned about the approach: the simplifications helped me find a “closed tape language”, but what’s the algorithm, and could it have tackled the original TM just as easily? (By unwinding the above construction, we see that a CTL for the original TM exists.)

The RewriteSystem simplification is mostly implemented in Python. It needs work before it’s obviously correct, readable, and able to choose “split” steps automatically. If the approach doesn’t sound like a dead end, I’ll get it polished and shared.

2 Likes

Actually, the corresponding CTL for the original TM simplifies to a | 1b | (0|11)* ( a1 | 1A1 | 0b | 1B | c | C | D | e0 | 1E0) [01]* in the above notation.
That’s really not that bad – it suggests that the above approach is overkill for at least this machine, and that some not-too-complicated way of “truncating” TMs into finite-state variants should succeed here.
So now I’m not sure which approach I’ll finish working on first!

Nice find and I like your formalization of this process in a syntactic rewriting language! This is one of the ~7k TMs that I have not been able to prove automatically, so finding some way to automate reasoning like this seems really valuable.

I can confirm by direct evaluation that the following is a CTL for this machine:

(0|11)* (a1 | 1A | 0b | 1B | c | C | D | e0 | 1E) .*

and that it enters this CTL after 2 steps (I’ve simplified this slightly 1A1.* -> 1A.* and 1E0.* -> 1E.*).

I’m guessing that my automated CTL system cannot find rules like this because of the variable block length in (0|11)*. I have seen rules like this a handful of times and it would be great to come up with an automated way to detect and prove such rules!

Are you on our Discord?

Just joined. (I’d given up when the invite link in the stickied thread didn’t work. Apparently that was just a weird one-off issue.)

1 Like

Hi all,

I’ve been thinking for a while about deciding problems from the challenge using automatic pattern matching. The example from this thread really motivated me to look deeper into this approach, because it seems that “simple” solutions are possible even if the TM output looks complex.
I saw that there is existing Python code on GitHub for the CTL approach by @sligocki; I haven’t looked at it in detail yet (still on my list).
In order to get a better understanding myself, I hacked together a Python prototype consisting of two components:

  • One tool that can transform whole regular expressions (not only words) based on the syntax introduced in this thread. It can also decide whether one regular expression represents a subset of another, so it can be used to deduce that a list of regular expressions is closed under the given TM transformation rules. (The product-automaton idea behind such a subset test is sketched after this list.)

  • The other component applies a very simple heuristic to find the best-matching pattern for a given TM.
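
For illustration only (not my prototype’s actual code): containment of regular languages can be decided by compiling both expressions to DFAs and searching the product automaton for a word one accepts and the other rejects. A self-contained toy sketch with two hand-built DFAs:

from collections import deque

def subset(dfa_a, dfa_b, alphabet="01"):
    """True iff L(A) is a subset of L(B): search the product automaton for a
    reachable pair where A accepts and B rejects (a counterexample)."""
    start = (dfa_a["start"], dfa_b["start"])
    seen, todo = {start}, deque([start])
    while todo:
        qa, qb = todo.popleft()
        if qa in dfa_a["accept"] and qb not in dfa_b["accept"]:
            return False
        for s in alphabet:
            nxt = (dfa_a["delta"][qa, s], dfa_b["delta"][qb, s])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True

# L(A) = words with an even number of 1s; L(B) = all words over {0, 1}.
even_ones = {"start": 0, "accept": {0},
             "delta": {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}}
anything = {"start": 0, "accept": {0}, "delta": {(0, "0"): 0, (0, "1"): 0}}
print(subset(even_ones, anything), subset(anything, even_ones))   # True False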

So far I’ve tested this combination only on a couple of examples. I still don’t have a feel for how easily it scales up, because it needs to know roughly what to look for.
Surprisingly, knowing the minimal expression from above, I could also make it work on TM #7410754 from this thread and automatically reproduce a set of expressions that describes this machine.
I will try to clean up my code so that I can share it and also test on more TMs.
I think this regular expression / CTL approach could be developed into an additional, independent decider for the project.

2 Likes

Here’s a version of my rewrite-system simplification script I’m not embarrassed to share. :slight_smile:

It’s in Python (with a pypy3 shebang so I can pretend it’s fast).
Usage example below.
Search for a closed language is unimplemented. (It isn’t necessarily any easier than searching against the TM itself, but in theory the search space would be exponentially smaller…)
If tremlin has a decider that makes it obsolete, great!

$ ./string_rewrite.py --help
usage: string_rewrite.py [-h] [--db DB] [seeds ...]

Try to simplify a TM as a string rewriting system.

positional arguments:
  seeds       DB seed numbers

optional arguments:
  -h, --help  show this help message and exit
  --db DB     Path to DB file
$ ./string_rewrite.py 7410754
======================================== 7410754 ========================================
TM Rewrite System: start C, halting (01D|010D), unreach (a|b|e|d|E|c|A|B), rules: 
    C1→0C
    11D→D10
    00D→11C
    110D→D011
    C00→D01
    C01→11C
1 Like

Added a dumb Deterministic Finite Automaton / Closed Tape Language decider to the above repo. It should already be crushed by others’ tools, but it shows that 84 lines of Python (~60 actually implementing the search) can handle #7410754 without introducing the above terminology. Instead: take a DFA that eats the tape from the left edge to the head, take another that eats from the right edge to the head, and close off the set of accepted (left-state, TM-head-state, TM-head-bit, right-state) tuples.

Such simplicity has a cost: this machine takes ~0.1 s to prove non-halting with the optimal parameters, and drastically longer without knowing them.
Obviously, running Kleene’s Algorithm to output REs would take more code.

$ time python dumb_dfa_decider.py -l 4 -r 2
======================================== 7410754 ========================================
DFAs: L=[0, 1, 2, 3, 2, 2, 0, 1], R=[0, 1, 0, 1], accept: 0:A@0:0 0:A@0:1 0:B@0:0 0:B@0:1 0:C@0:0 0:C@0:1 0:C@1:0 0:C@1:1 0:D@1:0 0:D@1:1 0:E@0:0 1:A@1:1 1:B@0:0 1:B@0:1 1:B@1:0 1:B@1:1 1:E@1:0 3:A@0:1 3:C@0:0 3:C@0:1 3:C@1:0 3:C@1:1 3:D@1:0 3:D@1:1 3:E@0:0

While dumb_dfa_decider searches for preimages of the transition functions, inverting them is just a sort operation. When preimages are easy, closing accept-states w.r.t. rewrite rules is basically the same as doing it for TM rules. Thus, this rewrite-system stuff might not improve the combinatorics. The one reason for hope is that smaller automata could handle simplified systems (the drop in this case being from 4+2 to 3+1).
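
“Inverting them is just a sort operation” looks like this in code (a toy transition table, not the decider’s own representation): group the DFA’s transitions by their target state.

from collections import defaultdict

delta = {(0, "0"): 0, (0, "1"): 1,      # (source state, symbol) -> target state
         (1, "0"): 2, (1, "1"): 0,
         (2, "0"): 2, (2, "1"): 1}

preimage = defaultdict(set)             # (target state, symbol) -> source states
for (src, sym), dst in delta.items():
    preimage[dst, sym].add(src)
print(dict(preimage))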

2 Likes

This is an interesting approach. And it’s still fascinating to me that the difficulty of tackling machines differs so much even though their outputs are very similar.
Is there an easy way to reconstruct a regular expression for the original tape language? I can’t yet see how the DFA quadruples correspond to it.

I got automata-lib to output REs without too much trouble; however, they look like vomit.
Pushed the update anyway. :slight_smile:

$ python dumb_dfa_decider.py -l 4 -r 2 --re 7410754
======================================== 7410754 ========================================
((1(11)*10|0)*(1(11)*(1(a|c|C|D)|(A|b|B))|(a|b|c|C|D))1*1|(1(11)*10|0)*(1(11)*(1(c|C|D|e)|(b|B|E))|(a|b|c|C|D|e)))(01*1|0)*

I don’t really blame the library – IIUC, perfect NFA minimization is a hopeless PSPACE-complete problem, and even heuristic RE minimization is a bit of a mess. Besides, why throw stones when my code to translate my automata to its format is also ugly?

The above can be written as

(
 ( 1(11)*10 | 0 )*                                           
 ( 1(11)*(1[acCD] | [AbB]) | [abcCD] )
 1*1
 |
 ( 1(11)*10 | 0 )*
 ( 1(11)*(1 [cCDe] | [bBE]) | [abcCDe] )
)
(01*1 | 0)*

and then simplified further using observations such as (01*1 | 0) = 01* and ( 1(11)*10 | 0 ) = (11)*0.
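
Those identities are easy to sanity-check by brute force over short words:

import itertools, re

# Brute-force check of the two identities over all words of length <= 8.
for a, b in [(r"01*1|0", r"01*"), (r"1(11)*10|0", r"(11)*0")]:
    agree = all(
        bool(re.fullmatch(a, w)) == bool(re.fullmatch(b, w))
        for n in range(9)
        for w in map("".join, itertools.product("01", repeat=n))
    )
    print(f"{a}  ==  {b} : {agree}")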

1 Like