Avoiding Additive Powers in Words by Jonathan Andrade a thesis submitted in partial fulfillment of the requirements for the degree of Bachelor of Science (Hons.) in the Department of Mathematics & Statistics ©Jonathan Andrade 2024 We accept this thesis as conforming to the required standards: Lucas Mol Dept. of Mathematics & Statistics Thesis Supervisor Sean McGuinness Dept. of Mathematics & Statistics Kyle Schlitt Dept. of Mathematics & Statistics Dated May 15th, 2024, Kamloops, British Columbia, Canada THOMPSON RIVERS UNIVERSITY DEPARTMENT OF MATHEMATICS & STATISTICS Permission is herewith granted to Thompson Rivers University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon request of individuals or institutions. -------------------------------Signature of Author the author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author’s written permission. the author attests that permission has been obtained for the use of any copy-righted material appearing in this thesis (other than brief excerpts requiring only proper acknowledgement in scholarly righting) and that all such use is clearly acknowledged. ii Abstract This thesis discusses an implementation of an algorithm for deciding if certain words are additive k-power-free. We use the algorithm to prove new results and to reprove several old results. In particular we construct an infinite additive 5-power-free binary rich word. We reprove that certain infinite binary words found by others are additive 4-power-free. iii Acknowledgements I would like to thank Lucas Mol for his invaluable support over the last year. I would particularly like to thank him for his vast amount of time spent trying to dissect my code, his insightful guidance, and his meticulous editing assistance throughout the thesis writing process. I would like to say thank you to Saeed Rahmati for helping me register for this honours course, organizing my defence and answering any questions I had. I would like to say thank you to Sean McGuinness and Kyle Schlitt for generously agreeing to be examiners and providing feedback for my thesis. Their feedback was invaluable, and I am grateful for their insightful comments, constructive criticism, and dedication throughout the evaluation process. I am especially grateful to my parents, who supported me emotionally and financially over the last year. Their support and belief in me were a huge driving force behind my completion of this thesis. For that I am forever grateful. iv Contents Abstract iii Acknowledgements iv 1 Introduction 1 2 Background 2.1 Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Morphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Ordinary Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Abelian Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Additive Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Rich Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7 Walnut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7.1 Walnut Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3 3 5 6 7 8 9 10 3 Decision Algorithm 3.1 Theory of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Description of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 13 19 4 Applications 24 5 Open Problems 28 References 30 v Chapter 1 Introduction Long sequences over fixed alphabets appear in many different areas. Examples include: DNA strings, lattice paths, and binary strings in computing. In a sequence, objects are arranged one after another in successive order. As such, sequences are easy to create. Simply stacking varying coloured blocks on top of one another creates a particular sequence. A natural question then arises about what properties this sequence has; does a certain pattern ever occur? If we added another coloured block, what would change? These are the types of questions that combinatorics on words aims to answer. Combinatorics on words is the study of words and their various properties. A word over an alphabet A is a finite or infinite sequence of letters from A. In this thesis, we are interested in long words over small alphabets. We place a primary interest in pattern avoidance in infinite words, i.e, over some fixed alphabet, which patterns can be avoided by some infinite word, and which patterns must inevitably occur? Combinatorics on words as a field of research dates back to the start of the 20th century with the work of Axel Thue [30], [31]. Some simple examples of patterns are squares and cubes. A square (resp. cube) is a word of the form xx (resp. xxx), where x is a nonempty word. Words that do not contain squares (resp. cubes) are called square-free (resp. cube-free). While it is straightforward to check that there are only finitely many square-free words over the binary alphabet, Thue showed that there exists an infinite square-free word over the ternary alphabet and an infinite cube-free word over the binary alphabet [31] (see the translation of Thue’s work by Berstel [2]). Both of Thue’s constructions can be obtained by starting with a letter and repeatedly applying a special type of map called a morphism. This type of construction has continued to be important in combinatorics on words, and is fundamental to this thesis. A large portion of combinatorics on words is about proving results that are analogous to the results of Thue described above. Here, we focus on a variation of the squares and cubes studied by Thue. Over an alphabet where it makes sense to sum the letters, an additive square is a nonempty word w = w0 w1 where w0 and w1 have the same length and the same sum of letters. It is unknown whether or not there is an infinite additive square-free word over a finite subset of Z. An additive cube is a nonempty word w = w0 w1 w2 where all of w0 , w1 and w2 have the same length and the sum of the letters of w0 , w1 , and w2 are all equal. A word w is 1 2 called additive cube-free if it contains no additive cubes. Cassaigne et al. [5] showed that there exists an infinite word over the alphabet {0, 1, 3, 4} that is additive cube-free. This was done with a similar idea to Thue through the use of an iterated morphism. In general, an additive k-power is a nonempty word w = w0 w1 . . . wk−1 such that all of the wi ’s have the same length and the same sum of letters. In this thesis, we implement an algorithm first described by Currie et al. [7] which decides, under certain conditions, whether or not a word produced by iterating a morphism is additive k-power free. This algorithm follows what is known as the template method. The early version of the template method was first developed by Currie and Rampersad [10]. A much more powerful version was developed by Rao and Rosenfeld when attempting to find a construction of an infinite additive square-free words over a finite subset of Z [25]. Using the algorithm described in the previous paragraph along with the automatic theorem-proving software Walnut, we reprove several old results and prove some new results. • We reprove a result of Currie, Mol, Rampersad and Shallit [7], which says that a certain infinite binary word is additive 4-power-free. • We reprove a result of Dekking [12], which says that a certain infinite binary word is additive 4-power-free. • We construct an infinite binary word that is additive 5-power-free and rich. • We construct an infinite ternary word that is additive 4-power-free and conjectured to be rich. In Chapter 2, we cover the necessary background on combinatorics on words. Chapter 3 begins with a discussion of the theory behind the algorithm. Next, a detailed description of how the algorithm functions is given. In Chapter 4, we use the algorithm to prove the results described above. In Chapter 5, we discuss some related open problems and conjectures in combinatorics on words. Chapter 2 Background 2.1 Words An alphabet is a nonempty finite set of symbols, which we refer to as letters. Let X be an alphabet. A word over X is a finite or infinite sequence of letters from X. We let X ∗ denote the set of all finite words over the alphabet X. The length of a word w is the number of letters that make up w, denoted by |w|. We let ε denote the unique word of length 0. We call ε the empty word. For words x and y, the concatenation of x and y, denoted by xy, is the word consisting of all of the letters of x followed by all of the letters of y. For example, if x = book and y = shelf, then xy = bookshelf. Let u, w ∈ X ∗ . We say that u is a factor of w if there exist words x, z ∈ X ∗ such that w = xuz. If w can be written in this form with x = ε (resp. z = ε), then u is called a prefix (resp. suffix ) of w. If w can be written in this form with x ̸= ε and z ̸= ε, then u is called an internal factor of w. For example, the English word interrelated has late as a factor, inter as a prefix, related as a suffix and relate as an internal factor. 2.2 Morphisms For alphabets X and Y , a morphism from X ∗ to Y ∗ is a function h : X ∗ → Y ∗ that satisfies h(uv) = h(u)h(v) for all words u, v ∈ X ∗ . In particular, it follows that h(ε) = ε for every morphism h. In fact, a morphism h : X ∗ → Y ∗ can be described completely by its images on the letters of X. For example, we can define the morphisms λ : {0, 1, 2}∗ → {0, 1, 2}∗ and µ : {0, 1}∗ → {0, 1}∗ by λ(0) = 012, µ(0) = 01, λ(1) = 02, µ(1) = 10. λ(2) = 1, 3 MORPHISMS 4 Then for example, we have λ(012) = λ(01)λ(2) definition of morphism = λ(0)λ(1)λ(2) definition of morphism = 012021 definition of λ on letters. Note that the morphism λ, and the famous Thue-Morse morphism µ, were studied by Thue [30]. We will be using λ and µ to illustrate the definitions in the remainder of this chapter. Let h : X ∗ → Y ∗ be a morphism and a ∈ X. For all words x ∈ X ∗ , we define h0 (x) = x, n and h (x) = h(hn−1 (x)) for all integers n ≥ 1. We say that the morphism h is prolongable on a if h(a) = ax for some nonempty x ∈ X ∗ . For example, the morphism λ is prolongable on 0 as λ(0) = 012, but it is not prolongable on 1 as λ(1) = 02. If h is prolongable on a, then repeatedly applying the morphism h to a, we obtain the infinite fixed point of h around a: hω (a) = lim hn (a) = axh(x)h2 (x)h3 (x) · · · . n→∞ For example, we can compute the infinite fixed point of λ around 0 as follows: λ(0) = 012 λ2 (0) = 012021 λ3 (0) = 012021012102 .. . λω (0) = 012021012102012021020121 · · · . In fact, the infinite word λω (0) is square-free. This was first proven by Thue [31]. Similarly, we can compute the fixed point of µ around 0 as follows: µ(0) = 01 µ2 (0) = 0110 µ3 (0) = 01101001 .. . µω (0) = 011010011001 · · · . The word µω (0) is known as the Thue-Morse word. A morphism h : X ∗ → Y ∗ is called strictly growing if |h(a)| ≥ 2 for all a ∈ X. For instance, the morphism µ is strictly growing as we have |µ(a)| = 2 for all a ∈ {0, 1}. The morphism λ is not strictly growing because |λ(2)| = 1 ≱ 2. For an integer k ≥ 1, we say that a morphism h : X ∗ → Y ∗ is k-uniform if |h(a)| = k for all a ∈ X. Note that the morphism µ is a 2-uniform morphism as |µ(0)| = |µ(1)| = 2. Meanwhile, the morphism λ is non-uniform, i.e., it is not k-uniform for any integer k. ORDINARY POWERS 2.3 5 Ordinary Powers The work done in this thesis revolves mainly around the idea of what we call powers in words. A square is a word of the form xx, where x is a nonempty word. An example of a square in the English language is murmur with x = mur. We call a word w square-free if w contains no squares as factors. For example, the word 1021 is square-free, while the words 00123012301 and 01210121012 are not square-free. A backtracking algorithm shows that the longest square-free word over 2 letters is of length 3. Over alphabets of size at least 3, it is possible to construct arbitrarily long square-free words [30]. A cube is a nonempty word of the form xxx. An example of a cube in the English Language is the word shshsh (an admonition to be quiet) [29]. More generally, for every integer k ≥ 2, a k-power is a nonempty word of the form xk = xx · · · x where x is a word concatenated with itself k times. Note that a 2-power is a square, and a 3-power is a cube. There is no reason to restrict ourselves to integer powers of words. Dejean [11] was the first to generalize to fractional powers. Let p ∈ Z such that 1 ≤ p ≤ n, and let w = w1 w2 · · · wn , where the wi ’s are letters. We say that p is a period of w if wi = wi+p for i = 1, ..., n − p. For example, the word alfalfa has periods 3, 6, and 7. The exponent of w is its length divided by its minimum period. For example, alfalfa has length 7 and minimum period 3, so it has exponent 7/3. A word of exponent α is also called an α-power. Words that do not contain a factor with exponent greater than or equal to α are known as α-power-free. For example, alfalfa is 3-power-free as it contains no cube, but it is not 2-power-free, because it contains the square alfalf. For a finite or infinite word w, we define the critical exponent of w as sup{α ∈ Q | there is a factor of w with exponent α}. For example, the Thue-Morse sequence has critical exponent 2, because it contains factors of exponent 2, but no factors of exponent greater than 2 [31]. Let α ≥ 1. A word that has exponent greater than α is known as an α+ -power. A word w is α+ -power-free if it contains no α+ -power as a factor. For example, the Thue-Morse word is 2+ -power-free, but not 2-power+ free. The word sausage contains a 35 -power, but is 35 -power-free. The repetition threshold for k letters, denoted RT(k), is defined as: inf{α ∈ R | there exists an infinite α-power free word over k letters}. Dejean’s theorem describes the repetition threshold for k letters for all k ≥ 2:   2, if k = 2;    7, if k = 3; RT(k) = 47 5, if k = 4;     k , if k ≥ 5. k−1 Note that we have RT(2) = 2 since every binary word of length greater than 3 contains a square, but the Thue-Morse word has critical exponent 2, i,.e., it is 2+ -power-free. Similarly, ABELIAN POWERS 6 we have RT(3) = 47 since every sufficiently long ternary word contains a 47 -power, but the word + δ ω (0) is 74 -power-free, where δ : {0, 1, 2}∗ → {0, 1, 2}∗ is defined by δ(0) = 0120212012102120210, δ(1) = 1201020120210201021, δ(2) = 2012101201021012102. The morphism δ was found by Dejean [11], who proved that δ ω (0) has critical exponent 7/4. The proof of Dejean’s theorem for larger alphabets relies on more complicated constructions, and took the work of many different authors [8], [9], [20], [23]. Carpi [4] made a major contribution in proving Dejean’s theorem by providing a proof for k ≥ 33. 2.4 Abelian Powers We will now explore another type of power known as an abelian power. Two words x and y are called abelian equivalent if they are anagrams of each other. For example, the English words dog and god are abelian equivalent, as are the words add and dad. An abelian square is a nonempty word w = xy such that x and y are abelian equivalent. Some examples of abelian squares in the English language include mesosome, reappear, and intestines [29]. An abelian cube is a nonempty word w = xyz such that x, y, and z are abelian equivalent. An example in the English language is deeded [29]. In general, for an integer k ≥ 2, an abelian k-power is a nonempty word of the form w = w0 w1 . . . wk−1 where wi and wj are abelian equivalent for all i, j ∈ {0, 1, . . . , k − 1}. A word w is abelian k-power-free if it contains no abelian k-powers as factors. For example, the word w = 010 is abelian square-free and the word w = 0012102 is abelian cube-free. Meanwhile, the words w = 011200 and w = 21302121 are not abelian 2-power-free. Dekking [12] showed the existence of an infinite binary word that is abelian 4-powerfree. In particular, Dekking showed that the infinite fixed point around 0 of the morphism h : {0, 1}∗ → {0, 1}∗ defined by h(0) = 011, h(1) = 0001, is abelian 4-power free. This result is best possible in the sense that there are only finitely many abelian cube-free binary words; a backtracking algorithm shows that the longest abelian cubefree binary word has length 9. Dekking also constructed an infinite abelian cube-free ternary word. He showed that the infinite fixed point around 0 of the morphism h : {0, 1, 2}∗ → {0, 1, 2}∗ defined by h(0) = 0012, h(1) = 112, h(2) = 022, ADDITIVE POWERS 7 is abelian cube-free. Note that this result is best possible in the sense that there are only finitely many abelian square-free ternary words; a backtracking search shows that the longest abelian square-free ternary word has length 537. Erdős [14] asked if there exists an infinite abelian square-free word over some finite alphabet. Evdokimov [15] answered this question in the positive by constructing an infinite abelian square-free word over an alphabet of size 25. The size of the alphabet was improved upon by Pleasants [22], who was able to reduce it down to size 5. This was even further reduced down to an alphabet of size 4 by Keränen [17]. The result by Keränen is the best possible since backtracking shows that over three letters, there is no abelian square-free word of length greater than 7. Keränen used the 85-uniform morphism g : {0, 1, 2, 3}∗ → {0, 1, 2, 3}∗ defined by: g(0) = 0120232123203231301020103101213121021232021013010203212320231210212320232132303132120, g(1) = 1231303230310302012131210212320232132303132120121310323031302321323031303203010203231, g(2) = 2302010301021013123202321323031303203010203231232021030102013032030102010310121310302, g(3) = 3013121012132120230313032030102010310121310302303132101213120103101213121021232021013. 2.5 Additive Powers We now introduce one more variation on powers in words, namely additive powers. Note that we restrict to words over alphabets where there is some notion of sum, so that we can sum all of the letters in a word. The sum of w, denoted by S(w), is the sum of all the letters that make up w. An additive square is a nonempty word w = w0 w1 , where both w0 and w1 have the same length and the same sum, i.e., |w0 | = |w1 | and S(w0 ) = S(w1 ). A word w is said to be additive square-free if it contains no additive squares as factors. For example, the word w = 0120 is additive square-free, and the word w = 013121 contains an additive square. Additive square-free words were studied by Rao and Rosenfeld [25], who were able to show that there exists an infinite additive square-free word over a finite subset of Z2 . It is still unknown whether or not there exists such a word over some finite subset of Z. An additive cube is a nonempty word w = w0 w1 w2 where all of w0 , w1 , and w2 , have the same length and the same sum. A word w is called additive cube-free if it contains no additive cubes as factors. For example, the word w = 020312 is additive cube-free, but it contains the additive square u = 0312. Cassaigne et al. [5] showed that there exists an infinite word over the alphabet {0, 1, 3, 4} that is additive cube-free. Lietard and Rosenfeld [18] showed that there exists an infinite additive cube-free word over all subsets of Z of size 4 except possibly subsets equivalent to {0, 1, 2, 3}, i.e., any arithmetic progression of length 4. Rao [24] was able to show that there exist infinite additive cube-free words over several integer alphabets of size 3 including {0, 1, 5}, {0, 3, 7}, and {0, 2, 9}. Note that it is still unknown if such words exist over the alphabets {0, 1, 2}, {0, 1, 3}, {0, 1, 4}, and {0, 2, 5}. In general, an additive k-power is a nonempty word w = w0 w1 . . . wk−1 such that all of the wi ’s have the same length and the same sum. Recall from Section 2.4 that Dekking RICH WORDS 8 produced an infinite abelian 4-power-free binary word. Over the binary alphabet {0, 1}, it is straightforward to show that a word w is an abelian k-power if and only if it is an additive k-power. Thus, we can extend Dekking’s earlier conclusion and see that there exists an infinite additive 4-power-free word over the binary alphabet. A backtracking search shows that this result is best possible as over the binary alphabet, there is no additive 3-power-free word of length greater than 9. 2.6 Rich Words A palindrome is a word w that reads the same backward as forward. Some examples in the English language include racecar, level, and kayak. It is known that every word of length n contains at most n + 1 palindromic factors, including the empty word [13]. This fact led to the study of words that are “rich” in palindromes. We say that a finite word w is rich if it contains |w| + 1 distinct palindromic factors. In other words, a word w is rich if it contains the maximum number of palindromic factors among all words of length |w|. An example of a rich word is 1001 since it has 5 palindromic factors, namely ε, 0, 1, 00 and 1001. Note that 012120 is not rich as it has only 6 palindromic factors: ε, 0, 1, 2, 121, 212. An infinite word w is rich if all of its finite factors are rich. For example, the Fibonacci word F = 01001010 . . . where F is the infinite fixed point of the morphism ϕ : {0, 1}∗ → {0, 1}∗ defined by ϕ(0) = 01 and ϕ(1) = 0 is known to be an infinite rich word [16]. The Thue-Morse word µω (0) = 0110100110010110 · · · is an example of an infinite non-rich word. For example, the factor 11010011 of µω (0) is non-rich as it has length 8 and 8 palindromic factors: ε, 0, 1, 00, 11, 101, 010, and 1001. A factor u of a word w is said to be unioccurrent in w if u occurs exactly once in w. For example, in the word 01, the factors 0, 1 and 01 are all unioccurrent. In the word 011001, the factor 01 is not unioccurrent, but 001 is. The following theorem gives a useful characterization of finite rich words in terms of unioccurrent factors. Theorem 2.6.1 (Glen et al. [16]). A finite word w is rich if and only if every prefix of w has a unioccurrent palindromic suffix. The result of Theorem 2.6.1 can be extended to a well-known result about infinite rich words. We provide a proof below. Theorem 2.6.2. An infinite word w is rich if and only if every finite prefix of w has a unioccurrent palindromic suffix. Proof. ⇒ Suppose the infinite word w is rich. Then by definition, all of its finite factors are rich. This implies by Theorem 2.6.1 that every one of the finite factors of w has a unioccurrent palindromic suffix. Thus, every finite prefix of w has a unioccurrent palindromic suffix. ⇐ Suppose that every finite prefix of the infinite word w has a unioccurrent palindromic suffix. Let x be a finite factor of w. Then there is some finite prefix p of w that contains 9 WALNUT x. Since every finite prefix of w has a unioccurrent palindromic suffix, we see by Theorem 2.6.1 that p must be rich. By [16, Corollary 2.5], since p is rich, every one of its factors must also be rich. Since x is contained in p, this implies that x is rich. Therefore, we conclude that w is rich. After being introduced implicitly by Droubay, Justin, and Pirillo [13], rich words have been well-studied, sometimes under the name full words (see [3]). We will now briefly summarize the known results about repetitions in rich words. Pelantová and Starosta [21] proved that every infinite rich word, over any finite alphabet, contains a square. Shallit and Baranwal [1] √ proved that the infinite binary word g(f ω (0)) is rich and has critical exponent 2 + 2/2, where f : {0, 1, 2}∗ → {0, 1, 2}∗ and g : {0, 1, 2}∗ → {0, 1}∗ are defined by f (0) = 01, g(0) = 0, f (1) = 02, g(1) = 01, f (2) = 022, g(2) = 011. Further, they conjectured that this word has the least critical exponent among √ all infinite binary rich words, i.e., that the repetition threshold for binary rich words is 2 + 2/2. Currie, Mol and Rampersad [6] proved this conjecture of Shallit and Baranawal by giving a structure theorem for infinite binary rich words that are 14 5 -power-free. In Section 4, we will show that there is an infinite additive 5-power-free binary rich word, and that this is best possible in the sense that there are only finitely many additive 4-power-free binary rich words. One can think of this result as an additive-power analogue of the result of Currie, Mol, and Rampersad. 2.7 Walnut Walnut is a free Java software that was originally developed by Hamoon Mousavi [19] and has been added to by Aseem Raj Baranwal, Laindon C. Burnett, Kai Hsiang Yang, and Anatoly Zavyalov. To download Walnut for free, visit https://cs.uwaterloo.ca/~shallit/walnut.html Walnut is an implementation of the algorithm described in chapter 6 of the Walnut book [28, Theorem 6.4.1]. Roughly speaking, this theorem says that there is an algorithm which can determine the truth value of all statements about automatic sequences that can be written in a certain first-order logic. The internal representation of natural numbers in Walnut by default is base-2 representation with the most significant digit coming first. Walnut is quite flexible and supports k-automatic sequences for integers k ≥ 2, Fibonacci and Tribonacciautomatic sequences, Pell-automatic sequences, and Ostrowski-automatic sequences based on arbitrary quadratic irrationals. 10 WALNUT 2.7.1 Walnut Syntax Walnut provides a user interface that accepts input in the form of first-order logical formulas. An automaton is created to represent these various inputs. Automata are one of the simplest forms of a computer. An automaton starts in an initial state and after reading the input provided, transitions to one of a finite number of states. An automaton takes as input strings and based on the symbols in the strings, moves from state to state. Each Walnut automaton accepts tuples of integers in a user specified numeration system. These automata are stored in .txt files with easy to read and understand syntax. All automata created by Walnut are deterministic and minimal. One can express first-order logical formulas in Walnut with a relatively simple syntax. First of all, words made up of capital letters (apart from the reserved letters A and E) are used to refer to sequences stored in the directory Word Automata Library. For example, some hard-coded sequences, namely the Thue-Morse and Fibonnaci sequences, have respective symbols T and F. Indexing sequences in Walnut is fairly easy. Note that all variables in Walnut have domain N = {0, 1, 2, . . .} and thus do not accept negative values. For i ∈ N, we write W[i] for the symbol at position i in W. For example, since T = 01101001 · · · , the input T[0] would return the symbol 0, and T[2] would give the symbol 1. Algebraic expressions may also be used to index sequences. For example, to access the symbol at position x+y in W we write W[x+y]. Note that Walnut supports addition, subtraction, multiplication by a natural number constant, and integer division, where x/y represents the floor of the quotient of x and y. The letter E is the Walnut symbol for the existential quantifier ∃. The letter A is the Walnut symbol for the universal quantifier ∀. The symbol => represents logical implication, & represents AND, | represents OR, and <=> represents the biconditional (if and only if). There is an extra command that is used to designate the numeration system for the automata involved in evaluating the command. The command ?msd n preceding a first-order logical formula specifies that it should be evaluated in base-n, where n ≥ 2 is some integer. Example 2.7.1. We will use Walnut to show that the Thue-Morse word T = 0110100110 · · · is cube-free and 2+ -power-free. In order to do this, we write a first-order logical formula that asserts the existence of a cube and 2+ -power in Thue-Morse. We give the construction for the cube. Recall that a cube is a word w of the form xxx for some nonempty word x (see Section 2.1). We define T [a . . . b] to represent the word starting at position a and ending at position b inclusive. The idea is that if the word T contains xxx starting at position i, we must have |x| = n for some n ≥ 1 (see Figure 2.1). So there is a cube in T if and only if there are natural numbers i and n with n ≥ 1 such that T[i . . . i + n − 1] = T[i + n . . . i + 2n − 1] = T[i + 2n . . . i + 3n − 1] 11 WALNUT T[0 . . . i − 1] x x x T[i . . . i + n − 1] T[i + n . . . i + 2n − 1] T[i + 2n . . . i + 3n − 1] T[i + 3n . . .] T Figure 2.1: An instance of a cube occurring in T. or equivalently, ∃i, n, (n ≥ 1) ∧ ∀s(0 ≤ s < 2n) =⇒ T[i + s] = T[i + n + s]. We can translate this into Walnut as: Ei, n(n>=1) & As (s<2*n) => T[i+s] = T[i+n+s]: Now that we have the Walnut syntax for checking T for cubes, all we need to do is simply evaluate it using Walnut. To do so we use the eval command followed by the statement name, and we include ?msd 2 immediately before the formula to indicate that we are using the base-2 numeration system with most significant digit first: eval containscubes "?msd_2 Ei, n(n>=1) & As (s<2*n) => T[i+s] = T[i+n+s]": Walnut returns FALSE, thus proving that the Thue-Morse word is cube-free. We can prove that the Thue-Morse word is 2+ -power-free using a similar command: eval containstwoplus "?msd_2 Ei, n(n>=1) & As (s<=n) => T[i+s] = T[i+n+s]": Walnut returns FALSE, thus proving that the Thue-Morse word is 2+ -power-free. It is possible to add your own automatic sequences to Walnut. For example, consider the morphism h : {0, 1}∗ → {0, 1}∗ defined by h(0) = 001 and h(1) = 100. To define h in Walnut, we use the following command: h "0->001 1->100": If the morphism is uniform, it can be “promoted” to an automaton through the use of the promote command. For example, using our morphism h as defined above, we can write promote H h: which creates an automaton H that generates hω (0). For words that are of the form g(hω (0)), we use the image command. For example, if g and h are uniform morphisms, then we can write: WALNUT 12 promote H h: # promotes h to an automaton H that generates the fixed point of h around 0 image GH g H: # creates the automaton GH which generates the word g(h^{w}(0)) The resulting automaton GH is defined over the same numeration system as the original automaton H. Chapter 3 Decision Algorithm In this chapter we discuss a decision algorithm that checks if certain types of morphic sequences are additive k-power-free. Currie et al. [7] devised this method, and described it for additive k-powers. They demonstrated that the words f ω (0) and g(f ω (0)) are additive 4-power-free, where f : {0, 1, 2}∗ → {0, 1, 2}∗ is the morphism defined by f (0) = 001, f (1) = 012, f (2) = 212, and g : {0, 1, 2}∗ → {0, 1}∗ is the morphism defined by g(0) = 0001001110010001100011, g(1) = 0001001110011101100011, g(2) = 0111001110011101100011. The algorithm is based on the template method. An early version of this method was described by Currie and Rampersad [10]. A more powerful version was described by Rao and Rosenfeld [25]. This chapter is an overview of the decision algorithm used in this thesis. In Section 3.1, we describe the theory behind the algorithm. In Section 3.2, we describe the algorithm in detail, and explain why it works. 3.1 Theory of the Algorithm The results and proofs given in this section were given by Currie et al. [7] for k = 4, and have been generalized to all k ≥ 2 below. We first introduce additive k-templates, which describe structures that are not “too far” from an additive k-power. Given an integer alphabet X, an additive k-template (for additive k-powers) is a 2k-tuple t = [a0 , a1 , . . . , ak , d0 , d1 , . . . , dk−2 ] 13 (3.1) THEORY OF THE ALGORITHM 14 where ai ∈ X ∗ , |ai | ≤ 1, and each of the di are in Z2 . For example, an additive 3-template is a 6-tuple t = [a0 , a1 , a2 , a3 , d0 , d1 ]. Recall that for a finite word w over an integer alphabet, the length of w is denoted |w|, and the sum of the letters of w is denoted S(w). We define σ(w) to be the vector [|w|, S(w)]T . For example, if w = 01210, then σ(w) = [5, 4]T . A word x is an instance of t if there exist x0 , x1 , . . . , xk−1 ∈ X ∗ such that x = a0 x0 a1 x1 · · · xk−1 ak and for each i, we have di = σ(xi+1 ) − σ(xi ). Observe that an additive k-power is an instance of the additive k-template t0 = [ε, ε, . . . , ε, [0, 0]T , [0, 0]T , . . . , [0, 0]T ]. The word v = 102122 is an instance of the template [1, ε, 2, [0, 1]T ] where x0 = 02, and x1 = 12. The word w = 1110202011 is an instance of the template [1, ε, ε, ε, 1, [0, 0]T , [0, 0]T , [0, −1]T ] where x0 = 11, x1 = 02, x2 = 02 and x3 = 01. Note that the word v is not “too far” from an additive 2-power and w is not “too far” from an additive 4-power. Let h : X ∗ → Y ∗ be a morphism. For A ∈ X ∗ with |A| ≤ 1, an h-split of A is a 3-tuple [p, a, s], where h(A) = pas, a ∈ Y ∗ , and |a| ≤ 1. One can think of an h-split of A as a way in which the word h(A) can be factored into three pieces, with the empty word or a single letter in the center. For example, let β : {0, 1, 2}∗ → {0, 1, 2}∗ be the morphism defined by β(0) = 001, β(1) = 012, β(2) = 212, and γ : {0, 1, 2}∗ → {0, 1}∗ be the morphism defined by γ(0) = 00001, γ(1) = 000111, γ(2) = 0111011. Then the β-splits of 0, 1, 2, and ε respectively are: • [ε, 0, 01], [0, 0, 1], [00, 1, ε], [ε, ε, 001], [0, ε, 01], [00, ε, 1], [001, ε, ε], • [ε, 0, 12], [0, 1, 2], [01, 2, ε], [ε, ε, 012], [0, ε, 12], [01, ε, 2], [012, ε, ε], • [ε, 2, 12], [2, 1, 2], [21, 2, ε], [ε, ε, 212], [2, ε, 12], [21, ε, 2], [212, ε, ε], • [ε, ε, ε], . The γ-splits of 0, 1, 2, and ε respectively are: • [ε, 0, 0001], [0, 0, 001], [00, 0, 01], [000, 0, 1], [0000, 1, ε], [ε, ε, 00001], [0, ε, 0001], [00, ε, 001], [000, ε, 01], [0000, ε, 1], [00001, ε, ε], THEORY OF THE ALGORITHM 15 • [ε, 0, 00111], [0, 0, 0111], [00, 0, 111], [000, 1, 11], [0001, 1, 1], [00011, 1, ε], [ε, ε, 000111], [0, ε, 00111], [00, ε, 0111], [000, ε, 111], [0001, ε, 11], [00011, ε, 1], [000111, ε, ε], • [ε, 0, 111011], [0, 1, 11011], [01, 1, 1011], [011, 1, 011], [0111, 0, 11][01110, 1, 1], [011101, 1, ε], [ε, ε, 0111011], [0, ε, 111011], [01, ε, 11011], [011, ε, 1011], [0111, ε, 011], [01110, ε, 11], [011101, ε, 1], [0111011, ε, ε], • [ε, ε, ε], A morphism h : X ∗ → Y ∗ where X, Y ⊆ Z is called an affine morphism if there exist a, b, c, d ∈ Z such that for all x ∈ X, we have |h(x)| = a + bx and S(h(x)) = c + dx. For an affine morphism h, we define the matrix   a b Mh = . c d For example, let β be the morphism defined previously. Then for all x ∈ {0, 1, 2}, we have |β(x)| = 3 + 0x and S(β(x)) = 1 + 2x, so β is affine, and we have   3 0 . Mβ = 1 2 Similarly for the morphism γ, we have |γ(x)| = 5 + x and S(γ(x)) = 1 + 2x for all x ∈ {0, 1}. Therefore, γ is affine, and we have   5 1 . Mγ = 1 2 Lemma 3.1.1. Let X and Y be finite subsets of Z and h : X ∗ → Y ∗ be an affine morphism with   a b . Mh = c d Then for any word w ∈ X ∗ we have σ(h(w)) = Mh σ(w). Thus, if Mh−1 exists, then σ(w) = Mh−1 σ(h(w)). Proof. Let w be a word in X ∗ . We proceed by induction on |w|. If |w| = 0, then w is empty, so we have σ(h(w)) = σ(h(ε)) = σ(ε) = [0, 0]T and Mh σ(w) = Mh [0, 0]T = [0, 0]T . Therefore the result holds when |w| = 0. Now suppose that |w| > 0, and that the statement holds for all words of length |w| − 1. Then we write w = w′ x, where x is the final letter in w, and we have Mh σ(w) = Mh (σ(w′ ) + σ(x)) ′ = Mh σ(w ) + Mh σ(x) ′ = σ(h(w )) + Mh σ(x)    a b 1 ′ = σ(h(w )) + c d x   a + bx ′ = σ(h(w )) + c + dx property of σ distributive property inductive hypothesis definition of Mh and σ matrix multiplication = σ(h(w′ )) + σ(h(x)) definition of affine = σ(h(w)) property of σ THEORY OF THE ALGORITHM 16 difference D0 A0 X0 A1 X1 A2 h h(A0 ) h(X0 ) h(A1 ) h(X1 ) h(A2 ) a0 x0 a1 x1 a2 difference d0 Figure 3.1: An instance of a template t = [a0 , a1 , a2 , d0 ] arising from an instance of its h-parent T = [A0 , A1 , A2 , D0 ]. Therefore, the result follows by induction. Let t = [a0 , a1 , . . . , ak , d0 , d1 , . . . , dk−2 ], and T = [A0 , A1 , . . . , Ak , D0 , D1 , . . . , Dk−2 ], be templates. We call T an h-parent of t if the Ai have h-splits [pi , ai , si ] such that di = Mh Di + bi , where bi = σ(si+1 pi+2 ) − σ(si pi+1 ). Roughly speaking, the next lemma says that an instance of t arises when we apply h to an instance of T (see Figure 3.1). Lemma 3.1.2. Let h : X ∗ → Y ∗ be an affine morphism and W ∈ X ∗ . Let t = [a0 , a1 , . . . , ak , d0 , d1 , . . . , dk−2 ] and T = [A0 , A1 , . . . , Ak , D0 , D1 , . . . , Dk−2 ] be additive k-templates such that T is an h-parent of t. If W contains an instance of T , then h(W ) contains an instance of t. Proof. Suppose that W contains an instance V = A0 X0 A1 X1 · · · Xk−1 Ak of T . Since T is an hparent of t, there are h-splits [p0 , a0 , s0 ], [p1 , a1 , s1 ], . . . , [pk , ak , sk ] of A0 , A1 , . . . , Ak respectively such that for all i ∈ {0, 1, . . . , k−2}, we have di = Mh Di +bi , where bi = σ(si+1 pi+2 )−σ(si pi+1 ). THEORY OF THE ALGORITHM 17 Thus w = h(W ) contains the factor v = a0 x0 a1 x1 · · · xk−1 ak , where xi = si h(Xi )pi+1 . We have σ(xi+1 ) − σ(xi ) = σ(si+1 h(Xi+1 )pi+2 ) − σ(si h(Xi )pi+1 ) = σ(h(Xi+1 )) + σ(si+1 pi+2 ) − σ(h(Xi )) − σ(si pi+1 ) substituting in for xi and xi+1 property of σ = Mh σ(Xi+1 ) + σ(si+1 pi+2 ) − Mh σ(Xi ) − σ(si pi+1 ) h is affine so σ(h(Xi )) = Mh σ(Xi ) = Mh (σ(Xi+1 ) − σ(Xi )) + σ(si+1 pi+2 ) − σ(si pi+1 ) factor out Mh = Mh Di + bi V is an instance of T , definition of bi = di definition of parent Hence, the word v is an instance of t. For a morphism h : X ∗ → Y ∗ , we define Wh = max |h(a)|. a∈X So for example if γ and β are the morphisms defined previously, then we have Wγ = 7 and Wβ = 3. Let h : X ∗ → Y ∗ be a morphism and t = [a0 , a1 , . . . , ak , d0 , d1 , . . . , dk−2 ] h iT (1) be an additive k-template. Then we write di = d(0) . Furthermore, we define the values , d i i (0) ∆(t) = max{|di | : i = 0, . . . , k − 2} and (k − 1)k ∆(t). 2 For example, if we look at the morphism γ and the additive 4-template Bh (t) = k + 2 + k(Wh − 2) + t = [ε, ε, ε, ε, ε, [1, 0]T , [2, 1]T , [0, 1]T ], then d0 = [1, 0]T , d1 = [2, 1]T , and d2 = [0, 1]T . Hence ∆(t) = max{1, 2, 0} = 2. Therefore, we have (4 − 1) · 4 · 2 = 38. 2 In an additive k-power, the k “blocks” must have the same length and sum. Here ∆(t) is a bound on the difference in length between successive “blocks” of an instance of t, and the quantity Bh (t) bounds the length of instances of t that will be relevant to our algorithm. Roughly speaking, the next lemma says that if h satisfies certain conditions, then every sufficiently long instance of a template in a word of the form h(W ) must have come from a shorter instance of a parent template in W . Bγ (t) = 4 + 2 + 4 · (7 − 2) + THEORY OF THE ALGORITHM 18 Lemma 3.1.3. Let h : X ∗ → Y ∗ be a strictly growing affine morphism such that Mh is invertible. Let W ∈ X ∗ and t = [a0 , a1 , . . . , ak , d0 , d1 , . . . , dk−2 ] be an additive k-template. If h(W ) contains an instance v of t where |v| ≥ Bh (t), then there is an h-parent T of t such that W contains an instance V of T and |V | < |v|. Proof. Let v = a0 x0 a1 x1 . . . ak−1 xk−1 ak be a factor of the word w = h(W ) and be an instance of t = [a0 , a1 , . . . , ak , d0 , d1 , . . . , dk−2 ]. Suppose that |xi | ≥ Wh − 1 for all 0 ≤ i ≤ k − 1. Then xi is not an internal factor of the h-image of any letter in X. Therefore, we can write each xi in the form xi = si h(Xi )pi+1 for some Xi ∈ X ∗ , where si is a suffix of h(Ai ), and pi+1 is a prefix of h(Ai+1 ) for some words A0 , A1 , . . . , Ak ∈ X ∗ with |Ai | ≤ 1. Therefore, the analysis in the proof of Lemma 3.1.2 may be reversed. This means that W contains an instance V = A0 X0 A1 X1 · · · Ak−1 Xk Ak of the template T = [A0 , A1 , . . . , Ak , D0 , D1 , . . . , Dk−2 ] where the Ai have h-splits [pi , ai , si ] for all 0 ≤ i ≤ k and Di = Mh−1 (di − bi ) for all 0 ≤ i ≤ k − 2. Note that since h is strictly growing, the instance V of T satisfies |V | < |v|. It now suffices to show that if |v| ≥ Bh (t), then for every i we have |xi | ≥ Wh − 1. We proceed by contradiction. Suppose that for some i we have |xi | ≤ Wh − 2. Let ζ0 , ζ1 , . . . , ζk−1 be the lengths of the xi ’s arranged in non-decreasing order, and so that ζ0 ≤ ζ1 ≤ · · · ≤ ζk−1 . Since |xi | ≤ Wh − 2, it follows that ζ0 ≤ Wh − 2. Using the definition of ∆(t), we see that ζi ≤ ζ0 + i∆(t) for 0 ≤ i ≤ k − 1. Hence, k−1 X i=0 |xi | = k−1 X ζi ≤ i=0 k−1 X (ζ0 + i∆(t)) = i=0 k−1 X ζ0 + i=0 k−1 X i∆(t) i=0 k−1 X = kζ0 + ∆(t) i i=0 k(k − 1) ∆(t) 2 k(k − 1) ≤ k(Wh − 2) + ∆(t). 2 = kζ0 + Therefore, this implies that |v| ≤ (k + 1) + k−1 X |xi | ≤ (k + 1) + k(Wh − 2) + i=0 which contradicts our assumption that |v| ≥ Bh (t). k(k − 1) ∆(t) < Bh (t) 2 DESCRIPTION OF THE ALGORITHM 19 Theorem 3.1.4. Let h : X ∗ → X ∗ be an affine morphism such that every eigenvalue λ of Mh has modulus |λ| > 1 and let t be an additive k-template. Then the set of h-ancestors of t is finite. Proof. Since the eigenvalues of Mh have modulus greater than 1, it follows that Mh is invertible (otherwise λ = 0 is an eigenvalue), and the eigenvalues of Mh−1 are smaller than 1 in absolute value. Let λ1 and λ2 be the eigenvalues of Mh−1 and let Mh−1 = P −1 JP , where J is the Jordan form of Mh−1 . Then we have     λ 0 λ 1 J= 1 when λ1 ̸= λ2 , or J = 1 when λ1 = λ2 . 0 λ2 0 λ1 Hence, Mh−n = P −1 and so X   n λ1 0 P 0 λn2 or Mh−n = P −1   n λ1 nλ1n−1 P, 0 λn1 Mh−i converges. Let i≥0 t = [a0 , a1 , . . . , ak , d0 , . . . , dk−2 ] be an additive k-template and T = [A0 , A1 , . . . , Ak , D0 , . . . , Dk−2 ] be an h-ancestor of t obtained by applying the h-parent relation ℓ times. When ℓ = 1, then we have Di = Mh−1 (di − bi ), where bi can be a finite number of values as there are only finitely many choices for the h-splits. For ℓ ≥ 1, we find by iteration that Di has the form −(ℓ−1) Di = cℓ Mh−ℓ + cℓ−1 Mh + · · · + c1 Mh−1 + c0 , where all of the cj are chosen from a finite set of vectors. We know that X Mh−i converges i≥0 and that Di ∈ Z2 , so there are only finitely many possible vectors Di . Thus, we conclude that there are only finitely many h-ancestors T of t. 3.2 Description of the Algorithm In this section, we use the theory from the previous section to prove the following theorem, which describes an algorithm to decide if the words h(g ω (a)) and g ω (a) are additive k-power-free. This algorithm was described briefly by Currie et al. [7], but we provide a more detailed description in this section. Theorem 3.2.1. Let g : X ∗ → X ∗ and h : X ∗ → Y ∗ be affine morphisms over the integer alphabets X and Y such that g is prolongable on a. If DESCRIPTION OF THE ALGORITHM 20 • h and g are strictly growing, • Mg and Mh are invertible, and • the eigenvalues λg of Mg satisfy |λg | > 1. then it is possible to decide whether or not g ω (a) (and h(g ω (a))) are additive k-power-free. The algorithm of Theorem 3.2.1 relies on two main sub-algorithms, which are described separately in the lemmas below. Lemma 3.2.2 (Finding parents). Let f : X ∗ → Y ∗ be an affine morphism such that Mf is invertible, and let t be an additive k-template. Then the set of all f -parents of t is effectively computable. Proof. Suppose that t = [a0 , a1 , . . . , ak , d0 , d1 , . . . , dk−2 ], and T = [A0 , A1 , . . . , Ak , D0 , D1 , . . . , Dk−2 ] is an f -parent of t. By the definition of f -parent, the Ai have f -splits [pi , ai , si ] such that di = Mh Di + bi , where bi = σ(si+1 pi+2 ) − σ(si pi+1 ). To enumerate all f -parents of t, we first note that there are finitely many choices for the Ai ’s since X is finite. Once the Ai ’s are chosen, there are only finitely many f -splits of Ai (with ai at the centre). We find all such f -splits. Once the f -splits are chosen, the Di ’s are completely determined, since we can calculate them as follows: Di = Mh−1 (di − bi ). If all Di have integer entries, then we have a valid h-parent template T = [A0 , A1 , . . . , Ak , D0 , D1 , . . . , Dk−2 ]. Lemma 3.2.3 (Enumerating Factors). Let g be a strictly growing morphism prolongable on a, and n be a non-negative integer. Then the set of all factors of g ω (a) of length n is effectively computable. Proof. Let w = g ω (a) and F denote the set of factors of length n of w. We know that every finite factor of w arises by applying g to a a finite number of times. Further, since g is strictly growing, every factor u of length n in w must be contained in g(v) for some other factor v of g ω (a) of length n. We start with a and apply g repeatedly until we obtain a prefix g k (a) of g ω (a) of length at least n. We add all factors of length n of g k (a) to F . We go through each element u in F and again apply the morphism g to u. Then, we look at these words g(u) and add any new factors of length n in g(u) to F . We repeat this process, each time checking for any new factors DESCRIPTION OF THE ALGORITHM 21 of length n and adding them to F . When no new factors are available to add to F (i.e., they are already present in F ), the process terminates. Note that there are only a finite number of words of length n over a finite alphabet, therefore the process will terminate. We claim that F contains all factors of g ω (a) of length n. Suppose otherwise that there is some factor of g ω (a) of length n that is not contained in F and let w be such a factor that appears earliest in g ω (a), i.e., starting at the smallest index. Let ℓ be the smallest integer such that w appears in g ℓ (a). Note that ℓ > k since all factors of length n of g k (a) were already added to F in the first step. Then w appears in the g-image of some factor v of length n which appears in g ℓ−1 (a). Note that if v were in F , then w would also be in F . Therefore, v is not in F . This however, contradicts the assumption that w is a factor of length n of g ω (a) that is not in F and appears earliest in g ω (a). Therefore, the list F now contains all factors of the word g ω (a) of length n. Proof of Theorem 3.2.1. We begin by proving that it is possible to decide whether g ω (a) is additive k-power-free. We will do this by showing that if g ω (a) contains an additive k-power, then we can find it after a finite amount of computation. So suppose that g ω (a) contains an additive k-power v. We know by Definition 3.1 that v is an instance of the template t0 = [ε, ε, . . . , ε, [0, 0]T , [0, 0]T , . . . , [0, 0]T ]. Initial Check: and We begin by applying Lemma 3.1.3 with g and t = t0 . We have ∆(t0 ) = 0 k · (k − 1)∆(t0 ) = k + 2 + k · (Wg − 2). 2 Hence if |v| ≥ Bg (t0 ), then there is a g-parent T of t0 such that g ω (a) contains an instance V of T and |V | < |v|. We deal with this case in the Calculating Ancestors step. On the other hand, if |v| < Bg (t0 ), then v can be found by enumerating all factors of g ω (a) of length less than Bg (t0 ), as described in the proof of Lemma 3.2.3. If g ω (a) contains no additive k-power of length less than Bg (t0 ), then we proceed onto calculating ancestors. Otherwise, the algorithm terminates with g ω (a) containing an additive k-power. Bg (t0 ) = k + 2 + k · (Wg − 2) + Calculating Ancestors: We begin by building a list A of all possible g-parents of t0 as described in the proof of Lemma 3.2.2. This is the end of the first generation. Now we run through all templates in A, and find all of their g-parents. Any new templates that we find are added to A. This is the end of the second generation. Now we repeat the process. By Theorem 3.1.4, the set of g-ancestors of t0 is finite and thus the process will terminate with A containing all of the g-ancestors of t0 . Let j be the number of generations required to find all of the templates in A. It follows by Lemma 3.1.2 that v is a factor of g i (V ) for some instance V of a template T in A and some i ∈ {0, 1, 2, . . . , j}. DESCRIPTION OF THE ALGORITHM 22 Final Check: Let M = max{∆(t) | t ∈ A}. We note that for every template t in A, we have ∆(t) ≤ M . Let k · (k − 1)M BM = k + 2 + k · (Wg − 2) + . 2 We now repeatedly apply Lemma 3.1.3 with g and Bg (t) ≤ BM and notice that if g ω (a) contains an instance of some template in A, then it contains an instance V of some template in A where |V | < BM . Lemma 3.1.2 assures us that g j (V ) contains an additive k-power. We now enumerate all factors V of g ω (a) of length less than BM , as described in the proof of Lemma 3.2.3. Lastly, we examine the words g j (V ) for each V and check to see if any of them contains an additive k-power. We now consider words of the form h(g ω (a)). We want to decide if the word h(g ω (a)) is additive k-power-free. We show that if h(g ω (a)) contains an additive k-power, then we can find it with a finite amount of computation. Suppose that h(g ω (a)) contains an additive k-power v. We know by definition that v is an instance of the template t0 = [ε, ε, . . . , ε, [0, 0]T , [0, 0]T , . . . , [0, 0]T ]. Initial Check: and We begin by applying Lemma 3.1.3 with h and t = t0 . We have ∆(t0 ) = 0 k · (k − 1)∆(t0 ) = k + 2 + k · (Wh − 2). 2 Hence if |v| ≥ Bh (t0 ), then there is an h-parent T of t0 such that g ω (a) contains an instance V of T and |V | < |v|. We deal with this case in the Calculating Ancestors step. On the other hand, if |v| < Bh (t0 ), then v can be found by enumerating all factors of h(g ω (a)) of length less than Bg (t0 ), as described in the proof of Lemma 3.2.3. If h(g ω (a)) contains no additive k-power of length less than Bh (t0 ), then we proceed onto calculating ancestors. Otherwise, the algorithm terminates with h(g ω (a)) containing an additive k-power. Bh (t0 ) = k + 2 + k · (Wh − 2) + Calculating Ancestors: We begin by building a list A of all possible h-parents of t0 as described in the proof of Lemma 3.2.2. This is the end of the zeroth generation. Now we run through all templates in A, and find all of their g-parents. Any new templates that we find are added to A. This is the end of the first generation. Now we repeat the process. By Theorem 3.1.4, the set of g-ancestors of any template is finite, and thus the process will terminate with A containing all of the g-ancestors of all h-parents of t0 . Let j be the number of generations required to find all of the templates in A. It follows by Lemma 3.1.2 that v is a factor of h(g i (V )) for some instance V of a template T in A and some i ∈ {0, 1, 2, . . . , j}. Final Check: Let M = max{∆(t) | t ∈ A}. We note that for every template t in A, we have ∆(t) ≤ M . Let k · (k − 1)M BM = k + 2 + k · (Wg − 2) + . 2 DESCRIPTION OF THE ALGORITHM 23 Applying Lemma 3.1.3 repeatedly, we see that if g ω (a) contains an instance V of some template in A, then it contains one where |V | < BM . Lemma 3.1.2 assures us that h(g j (V )) contains an additive k-power v. We now calculate every factor V of g ω (a) of length less than BM . Lastly, we examine the words h(g j (V )) for each V and check to see if any of them contains an additive k-power. Chapter 4 Applications In this section, the algorithm described in Chapter 3 is used to prove some new results, and to reprove some previously known results. The new constructions were found using a backtracking algorithm, i.e., a depth-first search through the tree of all words satisfying a given property. Theorem 4.0.1 (Currie, Mol, Rampersad, Shallit [7]). Let f : {0, 1, 2}∗ → {0, 1, 2}∗ be the morphism defined by f (0) = 001, f (1) = 012, f (2) = 212. The word f ω (0) is additive 4-power-free. Proof. We verify that f ω (0) is additive 4-power-free, by running the algorithm described in Section 3.2. By definition, an additive 4-power v in f ω (0) is an instance of the template t0 = [ε, ε, ε, ε, ε, [0, 0]T , [0, 0]T , [0, 0]T ]. For the initial check, we enumerate all factors of f ω (0) of length Bf (t0 ) = 10, and verify that none of them contains an additive 4-power. Next, we find the set A of all f -ancestors of t0 . We find a total of 17056 ancestors after 3 generations. Proceeding onto the final check, we find that for every template t in A, we have ∆(t) ≤ 2. Therefore, we have BM = k + 2 + k · (Wf − 2) + k(k − 1) 4·3 ·M =4+2+4·1+ · 2 = 22. 2 2 We now enumerate all factors V of f ω (0) of length 21. We examine the word f 3 (V ) for each such factor V and check that none of them contain an additive 4-power. Therefore, we conclude that f ω (0) is additive 4-power-free. Theorem 4.0.2 (Currie, Mol, Rampersad, Shallit [7]). Let f : {0, 1, 2}∗ → {0, 1, 2}∗ be the morphism defined by f (0) = 001, f (1) = 012, f (2) = 212 and g : {0, 1, 2}∗ → {0, 1}∗ be the 24 25 morphism defined by g(0) = 0001001110010001100011, g(1) = 0001001110011101100011, g(2) = 0111001110011101100011. The word g(f ω (0)) is additive 4-power-free and 3+ -power-free. This is best possible in the sense that there are only finitely many additive 4-power-free and ordinary 3-power-free binary words with the longest being length 39. Proof. First, we show that g(f ω (0)) is additive 4-power-free. By definition, an additive 4-power v in g(f ω (0)) is an instance of the template t0 = [ε, ε, ε, ε, ε, [0, 0]T , [0, 0]T , [0, 0]T ]. For the initial check, we enumerate all factors of g(f ω (0)) of length Bg (t0 ) = 85, and verify that none of them contains an additive 4-power. Next, we find the set A of all f -ancestors of all g-parents of t0 . We find a total of 17104 ancestors after 1 generation. Proceeding onto the final check, we find M = max{∆(t) | t ∈ A} = 2. Therefore, we have BM = k + 2 + k · (Wf − 2) + k(k − 1) 4·3 ·M =4+2+4·1+ · 2 = 22. 2 2 We now enumerate all factors V of f ω (0) of length 21. We examine the words g(f 1 (V )) for each factor V and check that none of them contain an additive 4-power. Therefore, we conclude that g(f ω (0)) is additive 4-power-free. To check that the word g(f ω (0)) is 3+ -power-free, we enter the following commands into Walnut: morphism g "0->0001001110010001100011 1->0001001110011101100011 2->0111001110011101100011": morphism f "0->001 1-> 012 2->212": promote aut f: image SA g aut: eval containsthreeplus "?msd_3 Ei, n (n>=1) & At (t<=2*n) => SA[i+t]=SA[i+n+t]": The predicate containsthreeplus checks to see if the word contains a 3+ -power , and Walnut returns FALSE. Theorem 4.0.3. The word hω (0) produced by the morphism h : {0, 1}∗ → {0, 1}∗ where h(0) = 00001, h(1) = 01101, is additive 5-power-free and rich. This is best possible in the sense that there are only finitely many additive 4-power-free binary rich words. 26 Proof. First we verify that hω (0) is additive 5-power-free, by running the algorithm described in Section 3.2. By definition, an additive 5-power v in hω (0) is an instance of the template t0 = [ε, ε, ε, ε, ε, ε, [0, 0]T , [0, 0]T , [0, 0]T , [0, 0]T ]. For the initial check, we enumerate all factors of hω (0) of length Bh (t0 ) = 22, and verify that none of them contains an additive 5-power. Next, we find the set A of all h-ancestors of t0 . We find a total of 15625 ancestors after 1 generation. Proceeding onto the final check, we find that for every template t in A, we have ∆(t) ≤ 2. Therefore, we have BM = k + 2 + k · (Wh − 2) + k(k − 1) 5·4 ·M =5+2+5·3+ · 2 = 42. 2 2 We now enumerate all factors V of hω (0) of length 41. We examine the word h1 (V ) for each such factor V and check that none of them contain an additive 5-power. Therefore, we conclude that hω (0) is additive 5-power-free. We will now prove that this word is rich. By Theorem 2.6.2, it suffices to show that every finite prefix of hω (0) has a unioccurrent palindromic suffix. We show this using Walnut. We adapt the Walnut commands used by Baranwal and Shallit [1] and Schaeffer and Shallit [27] in proving that similar infinite words are rich. morphism h "0->00001 1->01101": # creates the desired morphism h. promote H h: # creates an automaton based on the morphism h. def FactorEq "?msd_5 Ak (k(H[i+k]=H[j+k])": def Occurs "?msd_5 (m<= n) & ( Ek (k+m<=n) & $FactorEq(i,j+k,m))": def Palindrome "?msd_5 Aj,k ((k (H[i+k]=H[i+j])": def HisRich "?msd_5 An Ej $Palindrome(j,n-j) & ~$Occurs(j,0,n-j,n-1)": The predicate FactorEq takes 3 parameters i, j, n and returns true if the length-n factors of hω (0) starting at indices i and j are equal. The predicate Occurs takes 4 parameters i, j, m, n and returns true if the length-m factor of the word hω (0) starting at index i occurs in the length-n factor starting at index j. The predicate Palindrome takes 2 parameters i, n and returns true if the length-n factor of hω (0) starting at index i is a palindrome. The predicate HisRich returns true if all the finite prefixes of hω (0) have a unioccurrent palindromic suffix. When we run all of these commands, Walnut returns TRUE. Finally, to see that this is best possible, we use a backtracking algorithm to find that the longest additive 4-power-free binary rich word has length 2411. This was a long backtracking search, made possible by the use of the data structure eerTree of Rubinchik and Shur [26]. This data structure greatly increased the efficiency of searching through the tree of binary rich words. 27 Theorem 4.0.4 (Dekking [12]). The word hω (0) produced by the morphism h : {0, 1}∗ → {0, 1}∗ where h(0) = 0001, h(1) = 011, is additive 4-power-free. Proof. By definition, an additive 4-power v in hω (0) is an instance of the template t0 = [ε, ε, ε, ε, ε, [0, 0]T , [0, 0]T , [0, 0]T ]. For the initial check, we enumerate all factors of hω (0) of length Bh (t0 ) = 14, and verify that none of them contains an additive 4-power. Next, we find the set A of all h-ancestors of t0 . We find a total of 3123 ancestors after 1 generation. Proceeding onto the final check, we find that for every template t in A, we have ∆(t) ≤ 2. Therefore, we have BM = k + 2 + k · (Wh − 2) + k(k − 1) 4·3 ·M =4+2+4·2+ · 2 = 26. 2 2 We now enumerate all factors V of hω (0) of length 25. We examine the word h1 (V ) for each such factor V and check that none of them contain an additive 4-power. Therefore, we conclude that hω (0) is additive 4-power-free. Theorem 4.0.5. The word g ω (0) produced by the morphism g : {0, 1, 2}∗ → {0, 1, 2}∗ defined by g(0) = 01100, g(1) = 0110121, g(2) = 011012221, is additive 4-power-free. Proof. By definition, an additive 4-power v in g ω (0) is an instance of the template t0 = [ε, ε, ε, ε, ε, [0, 0]T , [0, 0]T , [0, 0]T ]. For the initial check, we enumerate all factors of g ω (0) of length Bg (t0 ) = 34, and verify that none of them contains an additive 4-power. Next, we find the set A of all g-ancestors of t0 . We find a total of 17404 ancestors after 1 generation. Proceeding onto the final check, we find that for every template t in A, we have ∆(t) ≤ 2. Therefore, we have BM = k + 2 + k · (Wh − 2) + k(k − 1) 4·3 ·M =4+2+4·2+ · 2 = 46. 2 2 We now enumerate all factors V of g ω (0) of length 45. We examine the word g 1 (V ) for each such factor V and check that none of them contain an additive 4-power. Therefore, we conclude that g ω (0) is additive 4-power-free. Chapter 5 Open Problems This thesis was focused on an algorithm for deciding if a word is additive k-power-free. We now discuss several open problems related to additive k-power-free words. Problem 1: In Theorem 4.0.4, we reprove that the infinite binary word discovered by Dekking is additive 4-power-free. The frequency of zeros in this word converges to ≈ 55%. We attempted to find a construction of an infinite additive 4-power-free binary word with a higher frequency of 0’s, but were unsuccessful. The open problem is stated as follows: What is the maximum possible frequency of 0’s in an infinite additive 4-power-free binary word? Problem 2: It is known that over many alphabets of size 3 such as {0, 3, 8}, {0, 2, 7}, and {0, 1, 8} there exist infinite additive cube-free words [24]. There are various alphabets of size 3 where it is unknown if such words exists. It is conjectured that for every alphabet A = {0, i, j} such that i and j are co-prime and j ≥ 6, there exists an infinite additive-cube-free word on the alphabet A [24]. The cases {0, 1, 2}, {0, 1, 3}, {0, 1, 4}, and {0, 2, 5} remain open. It is also unknown if an infinite additive cube-free word exists over the 4 letter alphabet {0, 1, 2, 3}. Rao constructed a word over {0, 1, 2, 3} of length 140000 that is additive cube-free. Problem 3: In Theorem 4.0.3, we proved that the infinite binary word g ω (0) is additive 5-power-free. Using Walnut, we also proved it was rich. Through backtracking, we found a morphism g : {0, 1, 2}∗ → {0, 1, 2}∗ defined by g(0) = 01100, g(1) = 0110121, g(2) = 011012221. We proved the word g ω (0) is additive 4-power-free in Theorem 4.0.5. We conjecture that this word is also rich. If our conjecture is correct, then this would be considered best possible in the sense that there are only finitely many additive cube-free ternary rich words. 28 29 Problem 4: Over the course of writing this thesis, the word hω (0) produced by the morphism h : {0, 1, 2, 3}∗ → {0, 1, 2, 3}∗ where h(0) = 01210, h(1) = 20103, h(2) = 20123, h(3) = 13231, was discovered through backtracking, and we conjecture that it is additive 5-power-free and square-free. We attempted to run our algorithm to prove the additive 5-power-freeness, but at the calculating ancestors step, we found 957318 ancestors of t0 , and it took 36 generations. This large number of generations required meant that completing the final check was not feasible. We were able to prove that hω (0) is square-free using Walnut. Here is the Walnut code for proving square-freeness: morphism h "0->01210 1->20103 2->20123 3->13231": promote H h: eval HContainsSquares "?msd_5 Ei,n n>=1 & As s H[i+s]=H[i+s+n]": # returns FALSE. Bibliography [1] A. R. Baranwal and J. Shallit, “Repetitions in infinite palindrome-rich words,” in Combinatorics on Words. Springer International Publishing, 2019, pp. 93–105. [2] J. Berstel, “Axel Thue’s Papers on Repetitions in Words: A Translation,” in Monographies du LaCIM, vol. 11, LaCIM, 1992, pp. 65–80. [3] S. Brlek, S. Hamel, M. Nivat, and C. Reutenauer, “On the palindromic complexity of infinite words,” International Journal of Foundations of Computer Science, vol. 15, pp. 293– 306, 2004. [4] A. Carpi, “On dejean’s conjecture over large alphabets,” Theoretical Computer Science, vol. 385, pp. 137–151, 2007. [5] J. Cassaigne, J. D. Currie, L. Schaeffer, and J. Shallit, “Avoiding three consecutive blocks of the same size and same sum,” Journal of the ACM, vol. 61, pp. 1–17, 2011. [6] J. D. Currie, L. Mol, and N. Rampersad, “The repetition threshold for binary rich words,” Discrete Mathematics & Theoretical Computer Science, vol. 22, 2020. [7] J. D. Currie, L. Mol, N. Rampersad, and J. Shallit, “Extending dekking’s construction of an infinite binary word avoiding abelian 4-powers,” pp. 1–8, 2021. arXiv: 2111.07857. [8] J. D. Currie and N. Rampersad, “A proof of dejean’s conjecture,” Mathematics of Computation, vol. 80, 2009. [9] J. D. Currie and N. Rampersad, “Dejean’s conjecture holds for N ≥ 27,” RAIRO Theoretical Informatics and Applications, vol. 43, pp. 775–778, 2009. [10] J. D. Currie and N. Rampersad, “Fixed points avoiding abelian k-powers,” Journal of Combinatorial Theory, Series A, vol. 119, pp. 942–948, 2012. [11] F. Dejean, “Surunth´eor‘emedethue,” Journal of Combinatorial Theory, Series A, pp. 90– 99, 1972. [12] F. Dekking, “Strongly non-repetitive sequences and progression-free sets,” Journal of Combinatorial Theory, Series A, vol. 27, pp. 181–185, 1979. [13] X. Droubay, J. Justin, and G. Pirillo, “Episturmian words and some constructions of De Luca and Rauzy,” Theoretical Computer Science, vol. 255, pp. 539–553, 2001. 30 BIBLIOGRAPHY 31 [14] P. Erdős, “Some unsolved problems,” Michigan Mathematical Journal, vol. 4, pp. 291– 300, 1957. [15] A. A. Evdokimov, “Strongly asymmetric sequences generated by a finite number of symbols,” Dokl. Akad. Nauk SSSR, vol. 179, pp. 1268–1271, 1968. [16] A. Glen, J. Justin, S. Widmer, and L. Q. Zamboni, “Palindromic richness,” European Journal of Combinatorics, vol. 30, pp. 510–531, 2009. [17] V. Keränen, “Abelian squares are avoidable on 4 letters,” in Automata, Languages and Programming, W. Kuich, Ed., Berlin, Heidelberg: Springer Berlin Heidelberg, 1992, pp. 41–52. [18] F. Lietard and M. Rosenfeld, “Avoidability of additive cubes over alphabets of four numbers,” in 2020, pp. 192–206. [19] H. Mousavi, “Automatic theorem proving in walnut,” preprint, vol. 2, pp. 1–34, 2016. [Online]. Available: http://arxiv.org/abs/1603.06017. [20] J. M. Ollagnier, “Proof of dejean’s conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11 letters,” Theoretical Computer Science, vol. 95, pp. 187–205, 1992. [21] E. Pelantová and Š. Starosta, “Languages invariant under more symmetries: Overlapping factors versus palindromic richness,” Discrete Mathematics, vol. 313, pp. 2432–2445, 2013. [22] P. A. B. Pleasants, “Non-repetitive sequences,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 68, pp. 267–274, 1970. [23] M. Rao, “Last cases of dejean’s conjecture,” Theoretical Computer Science, vol. 412, pp. 3010–3018, 2011. [24] M. Rao, “On some generalizations of abelian power avoidability,” Theoretical Computer Science, vol. 601, pp. 39–46, 2015. [25] M. Rao and M. Rosenfeld, “Avoiding two consecutive blocks of same size and same sum over Z2 ,” SIAM Journal on Discrete Mathematics, vol. 32, pp. 2381–2397, 2018. [26] M. Rubinchik and A. M. Shur, “Eertree: An efficient data structure for processing palindromes in strings,” European Journal of Combinatorics, vol. 2, 2015. [27] L. Schaeffer and J. Shallit, “Closed, palindromic, rich, privileged, trapezoidal, and balanced words in automatic sequences,” Electronic Journal of Combinatorics, pp. 1–19, 2016. [28] J. Shallit, The Logical Approach To Automatic Sequences: Exploring Combinatorics on Words with Walnut, 1st ed. Cambridge: Cambridge University Press, 2022. [29] J. Shallit. “Palindromes, squares, and other repitions in natural languages.” (), [Online]. Available: https://cs.uwaterloo.ca/~shallit/repetitions.html. [30] A. Thue, “Über unendliche zeichenreihen,” Norske Vid. Selsk. Skr. I. Mat. Nat. Kl., vol. No. 7, pp. 1–22, 1906. BIBLIOGRAPHY [31] 32 A. Thue, “Über die gegenseitige lage gleicher teile gewisser zeichenreihen,” Videnskapsselskapet: Skrifter; 1912,1, pp. 1–67, 1912.