h?2 lb2 Q7 :`T? AbQKQ`T?BbKb BM *?2KB+H :`T? h?2Q`v #v J2;M >MFb  i?2bBb bm#KBii2/ BM T`iBH 7mH7BHHK2Mi Q7 i?2 `2[mB`2K2Mib 7Q` i?2 /2;`22 Q7 "+?2HQ` Q7 a+B2M+2 U>QMbXV BM i?2 .2T`iK2Mi Q7 Ji?2KiB+b  aiiBbiB+b ÜJ2;M >MFb kyk8 q2 ++2Ti i?Bb i?2bBb b +QM7Q`KBM; iQ i?2 `2[mB`2/ biM/`/b, .`X _B+?`/ "`2rbi2` .2TiX Q7 Ji?2KiB+b  aiiBbiB+b h?2bBb amT2`pBbQ` .`X Gm+b JQH .2TiX Q7 Ji?2KiB+b  aiiBbiB+b .`X C2bbB+ HHBM;?K .2TiX Q7 *?2KBbi`v .i2/ CmHv k kyk8- EKHQQTb- "`BiBb? *QHmK#B- *M/ THOMPSON RIVERS UNIVERSITY DEPARTMENT OF MATHEMATICS & STATISTICS Permission is herewith granted to Thompson Rivers University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon request of individuals or institutions. -------------------------------Signature of Author the author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author’s written permission. the author attests that permission has been obtained for the use of any copy-righted material appearing in this thesis (other than brief excerpts requiring only proper acknowledgement in scholarly righting) and that all such use is clearly acknowledged. ii Abstract This thesis examines the use of graphs as models for chemical molecules and processes. We begin by defining chemical graphs and their properties. We then define the subgraph isomorphism problem and examine two algorithms which can be used to solve it: Ullmann’s algorithm and the SubGemini algorithm. We further discuss the application of these algorithms to chemical graph theory and provide examples for each. In addition to structural models, we describe an algorithm for the identification of intermediates and products of a reaction modeled as a network. The algorithm is demonstrated by means of an example. iii Acknowledgements I would like to extend my sincere gratitude to my supervisor, Dr. Richard Brewster. This project would not have been possible without his guidance, expertise and patience. I would also like to thank my committee members, Dr. Lucas Mol and Dr. Jessica Allingham for their time and feedback on this project. Finally I would like to thank the Department of Mathematics and Statistics at Thompson Rivers University, and all the faculty members from whom I have gained a deeper appreciation for mathematics and research. iv Contents Abstract iii Acknowledgements iv 1 Introduction 1 2 Background in Graph Theory 2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Other Types of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Digraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Chemical Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.3 Subgraphs and Induced Subgraphs . . . . . . . . . . . . . . . . . . . . . 2.2.4 Bipartite Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.1 Adjacency Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 Other Types of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 5 5 6 8 10 10 10 12 3 Background in Chemistry 3.1 Common Molecules and Functional Groups . . . . . . . . . . . . . . . . . . . . 3.2 R-Group Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Drawing Molecular Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Corey–Pauling–Koltun (CPK) Colouring . . . . . . . . . . . . . . . . . . 14 15 15 16 17 4 Graph Isomorphism 4.1 Graph Isomorphism Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 Similar and Permutation Similar Matrices . . . . . . . . . . . . . . . . . 4.1.2 Matrix Representation of an Isomorphism . . . . . . . . . . . . . . . . . 4.2 Ullman’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.2 Labeling Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.3 Neighbourhood Condition . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 19 20 22 24 25 27 27 27 v 4.3 The Gemini and SubGemini Algorithms . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Gemini and GeminiII . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.2 SubGemini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 30 36 5 Reaction Networks 5.1 Orlova’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Algorithm Components . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Reaction Network for Ethyl Acetoacetate . . . . . . . . . . . . . . . . . . . . . 5.2.1 Input: Initial Species . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.2 Input: Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.3 Input: Reaction Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.4 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.5 Output: List of Chemical Species . . . . . . . . . . . . . . . . . . . . . . 39 39 40 40 40 41 42 45 47 6 Conclusion 48 References 48 vi List of Figures 2.1 2.2 A graph on 5 vertices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The graph on the left is labeled from the set {C, O, N }. The graph on the right uses colours and shapes to distinguish vertices. . . . . . . . . . . . . . . . . . . 2.3 The same underlying graph with two different colourings. The colouring on the left is a proper colouring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 A directed graph showing a simple chemical reaction involving two reactants and a single product. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 A graph on 3 vertices, and a labeling of that same graph to model a molecule of water (H2 O). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Graphs representing carbon dioxyde (CO2 ) and ozone (O3 ). . . . . . . . . . . . 2.7 The structure of D-Galacturonic Acid. . . . . . . . . . . . . . . . . . . . . . . . 2.8 D-Galacturonic Acid with some hydrogen atoms removed. . . . . . . . . . . . . 2.9 The graph G from Figure 2.1 and two smaller graphs, G0 in the center and G00 on the right. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.10 Two induced subgraphs of G. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.11 Two drawings of a bipartite graph with with partite sets A = {a1 , a2 , a3 } and B = {b1 , b2 , b3 }. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.12 The graph G and its corresponding adjacency matrix A. . . . . . . . . . . . . . 2.13 Another graph, G0 with corresponding adjacency matrix B. . . . . . . . . . . . 2.14 The structure of P and P T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 3.2 3.3 3.4 3.5 3.6 4.1 The structure of (from left to right) a ketone, a carboxyl and an ester. . . . . . The General Form of a Carboxylic Acid and an Alcohol. . . . . . . . . . . . . . Methanol and Ethanol with the R groups shown in the box. . . . . . . . . . . . The structure of ethyl acetoacetate which include an ethyl group, shown in blue (dashed), an ester shown in red (solid), and a ketone group shown in green (dotted). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Kekulé and line-bond formulas for ethanol. . . . . . . . . . . . . . . . . . . Line-bond formulas for benzene and ethyl acetoacetate. . . . . . . . . . . . . . A graph G with V (G) = {a, b, c, d} and a second graph G0 with V (G0 ) = {w, x, y, z}. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii 3 4 4 5 6 7 7 8 9 9 10 11 11 12 15 15 16 16 17 17 19 4.2 Isomorphic bipartite graphs G and G0 with partite sets A = {a1 , a2 , a3 } and B = {b1 , b2 , b3 }. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Graph G on the left, and graph H on the right. . . . . . . . . . . . . . . . . . . 4.4 Graphs G and H. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Graph G corresponding to a molecule of D-Galaturonic acid. . . . . . . . . . . 4.6 The graph H corresponding to a hydroxyl functional group, and its adjacency matrix B. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7 Graph G corresponding to a molecule of D-Galaturonic acid with labels assigned based on type of atom. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.8 The graph H corresponding to a hydroxyl functional group, and its adjacency matrix B. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9 The structure of cis-2-butene on the left, and a benzene ring on the right. . . . 4.10 The partitioning of hydrogen atoms in a molecule of cis-2-butene. . . . . . . . . 4.11 Determining the structure of cis-2-butene from one of the labels. . . . . . . . . 4.12 An H-NMR of 2-cis-butene showing two distinct peaks [4]. . . . . . . . . . . . . 4.13 A graph representation of propen-2-ol using CPK colouring. . . . . . . . . . . . 4.14 A graph representation of 2-propen-1-ol shown using CPK colouring. . . . . . . 4.15 4-Nitrobenzyl Acetoacetate and the carbonyl pattern. The dashed lines and vertices labeled X are corrupted vertices. The subscripts are shown for ease of identification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 5.2 5.3 The same simple chemical system modeled as a CRN and as an RRN. . . . . . The structure of ethyl acetoacetate and methanol. . . . . . . . . . . . . . . . . The reaction network generated by Orlova’s Algorithm generated from ethyl acetoacetate and methanol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii 20 23 25 27 28 29 29 31 32 33 34 35 35 37 39 41 46 List of Tables 3.1 3.2 Common Functional Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CPK colourings for some common atoms. . . . . . . . . . . . . . . . . . . . . . 15 18 5.1 5.2 Summary of Reactions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The reaction intermediates from the reaction network for ethyl acetoacetate and methanol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Final products of the reaction network for ethyl acetoacetate and methanol. . . 43 5.3 ix 47 47 Chapter 1 Introduction Graph theory is an area of mathematics that studies structures composed of points and the connections between them. These structures, which are referred to as graphs, have many applications and are used for modeling the relationships between objects in a wide array of different fields. Some applications of graph theory include modeling social networks, route optimization, search and sort type algorithms and even language structure. One of the first mathematicians to employ this type of modeling technique was Arthur Cayley [3]. In 1874, he published a paper detailing a method for enumerating isomers using diagrams involving linked points. From these types of linked-point models came the field of chemical graph theory which applies graph theory to the modeling of molecules, chemical properties, and reactions. In this paper, we define chemical graphs and the ways in which graph theory can be applied to the study of chemical molecules and reactions. This includes the use of labeling and hydrogen-reduction to most efficiently model the structure of individual molecules, as well as the parts of a molecule which undergo reactions. We also discuss the subgraph isomorphism problem and how it relates to the identification of functional groups in a molecule. Two algorithms for subgraph isomorphism, Ullmann’s Algorithm [13] and the SubGemini Algorithm [10], are described along with conditions for their application to chemical graphs. Each algorithm is also applied to chemical examples. Finally, we describe reaction networks, and an algorithm for finding all the chemical species and transformations that occur within that network [11]. This algorithm is demonstrated using an example with two initial species: methanol and ethyl acetoacetate. 1 Chapter 2 Background in Graph Theory We will define key terms from graph theory and chemistry used in this thesis following Bondy and Murty for graph theory [2], and IUPAC (the International Union of Pure and Applied Chemisty) gold book for chemistry [9]. We refer the reader to these texts for terminology not defined in this thesis. 2.1 Basic Definitions Definition 2.1. A graph G = (V, E, G ) is a non-empty set of vertices V together with a set of edges, E. The incidence function G associates each edge in E with an unordered pair of vertices in V . If G (e) = {u, v} then we say that e joins u and v. For the purposes of this paper, a pair of vertices {u, v} will be denoted as uv. All graphs in this work are finite, which means that they contain a finite number of vertices and edges. Although a graph is an ordered triple, in this paper a graph will be denoted by G = (V, E) where the incidence function G is not explicitly stated. Example 2.2. The following is an example of a graph G with V (G) = {a, b, c, d, e}, E(G) = {e1 , e2 , e3 , e4 , e5 , e6 , e7 } and G is indicated by the diagram. For example G (e1 ) = ab, G (e2 ) = ac etc. 2 BASIC DEFINITIONS 3 b e1 a e4 e3 e2 c e7 e5 d e6 e Figure 2.1: A graph on 5 vertices. If (e) = uv, the edge e is said to be incident to u and v. The vertices u and v are the end points of e. In Figure 2.1 e2 , e3 and e5 are incedent to c. Two edges that share an endpoint are said to be adjacent, as are two vertices connected by an edge. In Figure 2.1 a, c and d are adjacent to b. Definition 2.3. The number of edges incident to a particular vertex v is called the degree of v, which is denoted as deg(v). The maximum degree of G, denoted (G), is the highest degree of any vertex in G. Conversely, the minimum degree of G, (G), is the lowest degree of any vertex in G. In the graph G drawn in Figure 2.1, deg(a) = 3 and deg(e) = 2. Since the maximum degree of any vertex in G is 3, we have (G) = 3. It is possible for a vertex to have a degree of zero, as would be the case for the lone vertex in the graph containing only one vertex and no edges. It is interesting to note that in early graph theory works, the degree was called the valence of a vertex. In chemistry, the valence of an atom is the number of bonds that the atom is able to form with other atoms. It is directly related to the valence electrons which are the outermost electrons of the atom. Definition 2.4. The number of vertices in a graph G is called the order of G. The number of edges in G is called the size of G. Unlike (G), the order of a graph must always be greater than or equal to 1, since the set of vertices V is non-empty. Many papers on chemical graph theory use the word size to refer to the number of vertices in a graph, however this paper will use the definitions as given above. Generally, the vertices of a graph are distinguished only by their connections to other vertices. However, it is possible to assign labels to further distinguish between vertices. Definition 2.5. Given a graph G = (V, E) and a set of labels L, a vertex labeled graph or simply a labeled graph is a triple G` = (`, V, E) where ` : V ! L assigns each vertex v a label BASIC DEFINITIONS 4 `(v). An edge labeled graph is similar, except that labels are assigned to edges. As an abuse of notation, we may write G to denote either the graph or the labeled graph. Note that the vertices in the graph G from Figure 2.1 have unique labels to distinguish them. This is not always the case, and more than one vertex in the same graph can have identical labels as in Figure 2.2. C C C N O C Figure 2.2: The graph on the left is labeled from the set {C, O, N }. The graph on the right uses colours and shapes to distinguish vertices. A labeled graph G` = (`, V, E) has an underlying graph G = (V, E). Two labeled graphs may have the same underlying graph and differ only in their labeling. A special case of graph labeling is graph colouring. Colours can be used simply to differentiate between vertices in the same way that labels do, however a proper graph colouring refers to a graph where colours are assigned to labels so that no two adjacent vertices have the same label. Consider the underlying graph G shown below. Figure 2.3: The same underlying graph with two different colourings. The colouring on the left is a proper colouring. Labeled graphs are particularly useful for modeling chemical molecules, as the labels OTHER TYPES OF GRAPHS 5 allow for differentiation between types of atoms in a structure. In reaction networks, nodes can be labeled as chemical species and reaction types. 2.2 Other Types of Graphs 2.2.1 Digraphs Generally, the edges of a graph are considered to be bidirectional, meaning it goes in both directions; however, there are times when it is useful to specify the direction of an edge. This can be thought of as the flow from one vertex to another. In all the graphs discussed so far in this paper, the edges are bidirectional. Such graphs are called undirected when needed, to distinguish from directed graphs. There are applications for this type of model, such as the flow of traffic along roads, or the flow of messages in a social network. Flow is important for chemical graph theory when it comes to modeling the flow of reactions within a chemical system. This direction of flow can be represented by an arrow, as in Figure 2.4. Reactant 2 Reaction 1 Product Reactant 1 Figure 2.4: A directed graph showing a simple chemical reaction involving two reactants and a single product. Formally a directed graph or digraph is a type of graph G which differs from an undirected graph in that edges between vertices are ordered pairs and are typically referred to as arcs. Each arc has a specified orientation. This is to say that if u, v 2 V (G) and uv is an arc in G, then u is the tail of uv and v is the head. Alternatively, if (e) = (u, v) then e is said to join u to v, but does not join v to u. A network is a digraph where vertices are considered to be sources, sinks or intermediate vertices. A source is a vertex that initiates a process and all edges incident to that vertex are directed away from it. A sink is a vertex that only has in directed arcs, while intermediate OTHER TYPES OF GRAPHS 6 vertices have arcs directed toward and away from them. In Example 2.4, the sources are Reactant 1 and Reactant 2, while the sink is the Product vertex. The only intermediate vertex is the Reaction 1 vertex. In the case of reaction networks, the sources will be any initial reactants in a system, while the sinks will be any final non-reactive products. Intermediate vertices will include reaction steps and intermediate chemical species. Figure 2.4 is an example of a very simple reaction route network or RRN. These will be discussed in greater detail in Chapter 5. 2.2.2 Chemical Graphs In addition to chemical reactions, chemical molecules can also be modeled as graphs. In some of his later works, Arthur Cayley referred to these diagrams representing chemical structures as plerograms. In chemical graphs, nodes represent atoms, and edges represent the bonds between atoms. Labeling is used to differentiate between different atoms and groupings of atoms. The following example shows the importance of labeling when modeling a chemical molecule as a graph. Example 2.6. The figure on the left is an example of a graph on three vertices with V (G) = {u, v, w} and E(G) = {uv, vw}. By labeling the vertices as `(u) = H, `(v) = O and `(w) = H, the graph can be used to model a molecule of water with chemical formula H2 O. In this figure colors have been added which correspond to the label of the vertices. u w H H v O Figure 2.5: A graph on 3 vertices, and a labeling of that same graph to model a molecule of water (H2 O). We can see that the labels allow for the distinction of different atoms in the molecule. The following are two more examples of graphs with the same set of vertices, but a different labeling. The molecules represented are carbon dioxide (CO2 ) and ozone (O3 ). Note the additional edges that have been added to reflect the bonds present in these two molecules. As a result, the underlying graph will have different edges with the same end points. Such edges are called parallel. OTHER TYPES OF GRAPHS 7 O O O O O+ C Figure 2.6: Graphs representing carbon dioxyde (CO2 ) and ozone (O3 ). For larger and more complex molecules, there are times when it is appropriate to omit certain atoms, such as hydrogen atoms, or to use a labeled vertex to represent multiple atoms rather than a single atom. In fact, in chemical graph theory, it is more common to omit hydrogen atoms than it is to use true plerograms which show every atom as an individual vertex [8]. Consider the molecular structure of D-Galacturonic Acid shown in Figure 2.7. O H H H O C O O C C C C O H H H C H O O H H H Figure 2.7: The structure of D-Galacturonic Acid. We can see that in the full molecule of Galacturonic acid we have 23 nodes, and therefore the corresponding adjacency matrix, which is defined in Section 2.3.1, is a 23 ⇥ 23 matrix. By removing the hydrogen atoms attached to the carbon atoms (shown in red),the reduced graph has 18 nodes and thus the adjacency matrix is an 18 ⇥ 18 matrix. OTHER TYPES OF GRAPHS 8 O H C O O H H O O C C C C C O H O H Figure 2.8: D-Galacturonic Acid with some hydrogen atoms removed. Although there are cases where this will not be possible, it is possible to remove hydrogen atoms from most models while preserving the relevant aspects of the molecules structure. In fact, Arthur Caley coined the term kenogram to refer to “hydrogen suppressed” graphs [1] and it has been noted that these are used far more often than plerograms [8]. If required, it is also possible to represent small groupings of atoms as a single vertex, for example using distinct labels for carbons based on the number of hydrogen atoms they are bonded to. Using this method, ( CH3 ) would become a single vertex as opposed to 4 vertices, and different labels would be used for ( CH2 ), ( CH ) and ( C ) groups. In the example of D-galacturonic acid, one could further reduce the number of vertices by using labels to differentiate between oxygen bonded to a hydrogen and the oxygen molecules bonded to two non-hydrogen atoms. When reducing the number of vertices in this manner, it is important to carefully consider which patterns and substructures will be significant. 2.2.3 Subgraphs and Induced Subgraphs Definition 2.7. A graph G0 is a subgraph of the graph G if V (G0 ) ✓ V (G) and E(G0 ) ✓ E(G) where for each edge uv 2 E(G0 ), we have u, v 2 V (G0 ). Consider the graph from Figure 2.1 and two graphs constructed on a subset of V (G). OTHER TYPES OF GRAPHS 9 b b a b a c a c d e c d d e Figure 2.9: The graph G from Figure 2.1 and two smaller graphs, G0 in the center and G00 on the right. The graph G0 is not a subgraph of G. This can be proved simply by observing that vertex c in G0 has degree 4, while (G) =3. Therefore, G0 cannot be a subgraph of G. The graph G00 , however, is a subgraph of G. Definition 2.8. A subgraph G0 is an induced subgraph of G if whenever u, v 2 V (G0 ) and uv 2 E(G) we have uv 2 E(G0 ). The subgraph G0 containing vertices S = V (G0 ) is called the subgraph of G induced by S and is denoted by G[S]. Of the previous examples, neither is an induced subgraph. G0 not a subgraph, and therefore cannot be an induced subgraph, while G00 is missing the edges ab and bd. The subgraphs of G induced by {b, c, d, e}, and {a, c, d, e} are shown below. b e4 e3 a c c d e7 e6 e e2 e5 e Figure 2.10: Two induced subgraphs of G. e6 e5 d MATRICES 2.2.4 10 Bipartite Graphs Definition 2.9. A graph or digraph, G = (V, E) is bipartite if the vertices of G can be partitioned into two sets, A and B, called partite sets, such that every edge in the graph has one end in A and the other in B. Bipartite graphs can be drawn in such a way as to make the two partite sets visually distinct using different shapes or colours for the vertices. a3 b3 a3 a2 b2 a1 b2 b1 b3 a2 a1 b1 Figure 2.11: Two drawings of a bipartite graph with with partite sets A = {a1 , a2 , a3 } and B = {b1 , b2 , b3 }. These types of graphs can be very useful in a number of applications, such as modeling the association of individuals with groups or affiliations within a social network, or the association between specific genes and types of cancer. In chemical graph theory, bipartite graphs are used to model reaction networks, where nodes are classified as either a chemical species node or a reaction node. Chemical species nodes have directed edges towards those reactions that they are able to participate in. Reaction nodes then have edges towards the products of those reactions. A very simple example is shown in Figure 2.4. Reaction networks are discussed in greater detail in Chapter 5. 2.3 Matrices 2.3.1 Adjacency Matrices A common way to represent a graph G, is as an adjacency matrix. If G has n vertices, then its adjacency matrix, A is an n ⇥ n matrix with one row and one column for each vertex. The entry aij is the number of edges that exist between vertex i and vertex j. For example, we have aij = 1 if there is a single edge between vertex i and vertex j, and if there are no edges between i and j, then aij = 0. The adjacency matrix A for a graph G used to represent a chemical molecule will have the following properties. MATRICES 11 1. aij = aji (as bonds are undirected). 2. A = AT (hence A is symmetric). 3. The diagonal entries of A are all zero. These are entries aij where i = j. Example 2.10. Consider the graph G with order 4 shown in Figure 2.12. The set of vertices is V (G) = {a, b, c, d} and the set of edges is E(G) = {ab, bc, bd}. The adjacency matrix for this graph is also shown in Figure 2.12. We can see that the edges in E correspond to the non-zero entries in A. a 2 a 0 b6 61 c4 0 d 0 c b a b 1 0 1 1 c 0 1 0 0 d 3 0 17 7 05 0 d Figure 2.12: The graph G and its corresponding adjacency matrix A. Consider the graph G0 with V (G0 ) = {w, x, y, z} shown in Figure 2.13. w w 0 x6 61 y4 1 z 1 2 w x y x 1 0 0 0 y 1 0 0 0 z 3 1 07 7 05 0 z Figure 2.13: Another graph, G0 with corresponding adjacency matrix B. We can see that both matrices A and B are symmetric and zero-diagonal, and that both contain the same number of non-zero entries. We also see a natural correspondence between the graphs: a ! x, b ! w, c ! y and d ! z. Despite the matrices looking different, we can obtain B from A by reordering the rows and columns as b, a, c, d. This is the heart of the “graph isomorphism problem” discussed in great detail in Chapter 4. MATRICES 2.3.2 12 Other Types of Matrices Definition 2.11. The identity matrix is a square matrix with 1’s as its diagonal entries and all other entries equal to 0. This is equivalent to saying that for an identity matrix I = [aij ], aij = 1 if i = j; otherwise, aij = 0. By the above definition, each row and column of the identity matrix has exactly one 1. By re-arranging the rows and columns of the identity matrix, we obtain a matrix that retains this property, however, the resulting matrix no longer has the condition that aij = 1 if i = j. We call the process of re-arranging rows and columns in a matrix permuting the rows and columns. Definition 2.12. A permutation matrix is an n ⇥ n matrix with exactly one 1 in each row and each column. Permutation matrices have the property, that multiplying a matrix A on the left by a permutation matrix P the rows of A are permuted. When multiplying A by P on the right, the columns of A are permuted. More formally, let P be a permutation matrix, and suppose that the unique 1 in row i is found in column j, which is to say pij = 1 and pik = 0, k 6= j. Let A be an n ⇥ n matrix and B = P A. Then bik = ⌃nt=1 pit · atk = pij · ajk = ajk , since pij is the only non-zero entry in row i of P . Thus the ith row of the matrix P A will be the jth row of A. By similar reasoning, P T has a unique 1 in column i at row j. Hence the ith column of of AP T will be the jth column of A. We record this as the following observation. 2 j 3 6 7 .. 6 7 . 6 7 7 P = i6 60 · · · 0 1 0 · · · 7 6 7 .. 4 5 . 2 6 6 6 T P = j6 60 · · · 0 6 4 Figure 2.14: The structure of P and P T . i 0 .. . 3 7 7 7 1 0 ··· 7 7 7 .. 5 . 0 Observation 1. Let P be a permutation matrix, and A another matrix. If pij = 1, then row j of P A is the same as row i of A. Further, if pij = 1, then pTji = 1 and column i of AP T = column j of A. Furthermore, we can use these together to get the following observation. Observation 2. Suppose for a permutation matrix P , pij = 1 and pkl = 1. Then the entry ajl in the matrix A is equal to the entry bik in the matrix B = P AP T . The following example shows this principle for a 3 ⇥ 3 matrix A. For this particular P , we have P = P T ; however, this is not always the case. First we see the effect of multiplying A MATRICES 13 by P on the left and by P T on the right. 2 32 3 2 3 1 0 0 a b c a b c P A = 4 0 0 15 4 d e f 5 = 4 g h i 5 0 1 0 g h i d e f 2 32 3 2 3 a b c 1 0 0 a c b AP T = 4d e f 5 40 0 15 = 4d f e 5 g h i 0 1 0 g i h Now, if both are applied simultaneously, we see a permutation of both rows and columns. 2 32 32 3 2 3 1 0 0 a b c 1 0 0 a c b P AP T = 40 0 15 4d e f 5 40 0 15 = 4g i h5 0 1 0 g h i 0 1 0 d f e Returning to the adjacency matrices in Figure 2.12 and Figure 2.13, we see that the matrix B can be obtained by multiplying A by a permutation matrix on the left and by its transpose on the right. 2 3 2 3 0 1 1 1 0 1 0 0 6 1 0 0 07 6 1 0 0 07 T 7 6 7 B=6 41 0 0 05 = 40 0 1 05AP 1 0 0 0 0 0 0 1 When B = P AP T for a permutation matrix P , we say A and B are permutation similar. In the case of adjacency matrices, this corresponds to relabeling or reordering the vertices, but it does not change the structure of the graph. Similar and permutation similar matrices will be discussed in greater detail in Section 4.1.1. It is also important to note that the transpose of a permutation matrix is also its inverse. Consider a permutation matrix P = [pij ] and inverse P 1 . By definition of the inverse, P P 1 = P 1 P = I where I is the identity matrix and, by definition of the transpose, P T = [pji ]. Suppose pij is the unique 1 in row i of P . If we take the product of P P T , then by Observation 1 the ith column of P P T is the jth column of P . Since the jth column of P has its unique 1 in the ith row, the unique 1 in column i of P P T will occur in row i. By this logic, all the 1’s in P P T will occur as diagonal entries, meaning P P T = I. The same can be shown for P T P . Chapter 3 Background in Chemistry In order to understand the application of graph theory to chemical molecules and reactions, some basic terminology is required. Definition 3.1. An atom is the smallest unit of a chemical element. Some common atoms, particularly in organic compounds are carbon, oxygen, hydrogen, nitrogen and metals like sodium or magnesium. Atoms are often referenced by their one or two letter abbreviation from the periodic table. The abbreviations for atoms referenced in this paper include C for carbon, O for oxygen, and H for hydrogen. Other atoms may have less obvious abbreviations such as Fe for iron or Na for sodium. Definition 3.2. Atoms bond with other atoms to form molecules. Molecules can contain atoms of the same element, or atoms of different elements. Chemical compounds are molecules which contain more than one type of atom. The terms molecule and compound are often used interchangeably, as most molecules are compounds; however, there a number of common molecules that are not compounds such as, H2 , N2 and O3 . The term chemical species includes molecules, compounds as well as individual atoms or other chemical complexes that may be involved in a chemical reaction. Definition 3.3. A functional group is a grouping of atoms within a larger molecule that exhibits a specific set of chemical properties. Functional groups play an important role in the reactivity of a molecule. Some common functional groups are listed in the next section. For the purposes of modeling chemical molecules, it will be useful to think of functional groups as subgraphs of the larger molecular graph. When thought of in this way, functional groups may be referred to as patterns. In the context of chemical graph theory, a pattern refers to the portion of the graph that will undergo a transformation during a particular reaction. A molecule may contain more than one pattern, and the pattern of interest may depend on the specific reaction being studied. In order to determine how a molecule will react, it is important to determine which patterns it contains. 14 COMMON MOLECULES AND FUNCTIONAL GROUPS 3.1 15 Common Molecules and Functional Groups There are several common functional groups with distinctive properties that are frequently seen in organic chemistry. Several of these groups are listed in Table 3.1. Molecular Formula -CH3 -CH2 CH3 -OH -COH 6 membered carbon ring IUPAC name methyl group ethyl group hydroxyl group aldehyde phenyl, benzene Chemical characteristic small, non-polar, non-reactive non-polar, non-reactive polar, reactive polar, reactive bulky, non-reactive Table 3.1: Common Functional Groups Other important functional groups include ketones, carboxyls and esters. These are shown below, with dotted lines showing where these groups would connect to the rest of the molecule. O ···C C O C··· ···C C O O ···C H C O··· Figure 3.1: The structure of (from left to right) a ketone, a carboxyl and an ester. 3.2 R-Group Notation When referring to chemical structures, it is common to use the letter R to denote any grouping of molecules where a carbon or hydrogen is directly bonded to the atom shown. Different R groups are differentiated through using dashes or apostrophes (R, R0 , R00 ). Consider the following example. Example 3.4. The molecules shown below including an R group are the general forms of a carboxylic acid and an alcohol. O R C O H R0 O H Figure 3.2: The General Form of a Carboxylic Acid and an Alcohol. If an alcohol has an R0 group that is a methyl group or an ethyl group, then we have the specific alcohol methanol or ethanol respectively. The structure of these two alcohols is shown in Figure 3.3. DRAWING MOLECULAR STRUCTURE 16 H H C O H H H H H C C H H O H Figure 3.3: Methanol and Ethanol with the R groups shown in the box. Molecules can also contain multiple function groups, as with ethyl acetoacetate shown in Figure 3.4 which contains an ethyl group, an ester and a ketone. H3 C O O C C CH2 CH2 O CH3 Figure 3.4: The structure of ethyl acetoacetate which include an ethyl group, shown in blue (dashed), an ester shown in red (solid), and a ketone group shown in green (dotted). 3.3 Drawing Molecular Structure There are several ways in which chemical molecules are commonly shown, three of which are described in this section. Figure 3.2 and 3.3 show Kekulé structures, where each atom is shown and bonds between atoms are drawn as lines between the atoms. This type of diagram is very useful for molecules that contain only a small number of atoms, and make it easy to visualize the arrangement of atoms within the molecules structure. A similar type of diagram called a line-bond formula or the skeletal formula also uses lines to represent bonds, however it generally does not explicitly include carbon or hydrogen atoms. Instead, it is assumed the each bond is connected to a carbon atom unless another atom is specified. Carbon atoms are also assumed to have a total of 4 bonds, with any bonds not explicitly shown being to hydrogen atoms. Hydrogen atoms are typically not shown, unless they are part of a hydroxyl group or they are required for a particular reaction step. Like Kekulé structures, line-bond formulas clearly show the connectivity and arrangement of atoms within a molecule. As an example, the Kekulé structure and line-bond formulas for ethanol are shown in Figure 3.5. The line-bond formula for benzene is shown in Figure 3.6. DRAWING MOLECULAR STRUCTURE H H H C C H H O 17 H OH Figure 3.5: The Kekulé and line-bond formulas for ethanol. When drawing line-bond formulas, benzene rings or phenol groups are typically shown with a circle in the center to show the delocalization of the pi electrons. Figure 3.6 shows the line-bond formula of a benzene ring and an ester. Although these models do not have explicit vertices, they are useful for visualizing large molecules and for visualizing the structure without C H bonds. O O O Figure 3.6: Line-bond formulas for benzene and ethyl acetoacetate. Unlike the previous two methods, condensed formulas simply list the type and quantity of atoms. In this type of model, most or all of the bonds in a molecule are omitted and the order in which the atoms are written implies the structure. Parentheses are used to differentiate substituents from the base structure when needed. Ethyl acetoacetate shown in Figure 3.6 has the condensed formula: CH3 CHOCH2 CHOOCH2 CH3 . The functional groups in Table 3.1 are shown as condensed formulas. 3.3.1 Corey–Pauling–Koltun (CPK) Colouring A common way to differentiate atoms in a chemical structure is to use a standard colouring system for specific atoms, namely the Corey–Pauling–Koltun (CPK) colouring system which originated from a paper published by Corey and Pauling in 1952 [5]. The colours for some common atoms are shown in Table 3.2. DRAWING MOLECULAR STRUCTURE Atom Carbon Oxygen Hydrogen Nitrogen Halogens Metals 18 Colour Black Red White Blue Green Silver Table 3.2: CPK colourings for some common atoms. Chapter 4 Graph Isomorphism This chapter provides an introduction to the graph and subgraph isomorphism problems, as well as several algorithms which can be used to solve these types of problems. 4.1 Graph Isomorphism Problem Let us consider the graphs G and G0 shown in Figure 4.1. c w b a d x y z Figure 4.1: A graph G with V (G) = {a, b, c, d} and a second graph G0 with V (G0 ) = {w, x, y, z}. In Section 2.3.1, we observed that we could permute the rows and columns of the adjacency matrix of G to get the adjacency matrix of G0 . This was done through matrix multiplication and permutation matrices. Just from looking at the structure of the graphs, it can be observed that there is a mapping from V (G) to V (G0 ) that preserves adjacency and non-adjacency. That is, ij is an edge in E(G) if and only if (i) (j) is an edge in E(G0 ) where, (a) ! x (b) ! w (c) ! y (d) ! z. 19 GRAPH ISOMORPHISM PROBLEM 20 This mapping is an example of an isomorphism between the two graphs. As another example, consider the bipartite graphs from Figure 2.11 (shown below). Visually these seem very different, but when we examine the connectivity between nodes, it can be seen that ax , by are adjacent in G, if an only if they are also adjacent in G0 . Therefore, there exists an isomorphism between these two graphs. a3 b3 a3 a2 b2 a1 b2 b1 b3 a2 a1 b1 Figure 4.2: Isomorphic bipartite graphs G and G0 with partite sets A = {a1 , a2 , a3 } and B = {b1 , b2 , b3 }. Definition 4.1. Let G and H be graphs. A mapping : V (G) ! V (H) is an isomorphism between G and H if it is a bijection, and preserves the edges and non-edges. That is, for all u, v 2 V (G), uv 2 E(G) if and only if (u) (v) 2 E(H). We say G and H are isomorphic if such a mapping exists. For labeled graphs G` = (`, V, E), H` = (`0 , V 0 , E 0 ) an isomorphism must also preserve labels. That is (`(u)) = `0 ( (u)). In the context of a chemical molecule, an isomorphism that preserves labels indicates that the bond connectivity is the same, and so the molecules are identical or stereoisomers of one another. 4.1.1 Similar and Permutation Similar Matrices Definition 4.2. Let A and B be n ⇥ n matrices. Then A is similar to B if there exists an invertible matrix P such that A = P BP 1 . Recall from Section 2.3.2 that a permutation matrix is an n ⇥ n matrix with exactly one 1 in each row and each column. Definition 4.3. If A and B are matrices where there exists a permutation matrix P such that A = P BP 1 then A is said to be permutation similar to B. Recall that for every permutation matrix P we have P 1 = P T . In particular, the inverse of a permutation matrix is a permutation matrix. Hence, if A and B are permutation similar, then we can write A = P BP T where P is a permutation matrix. An example of this operation is as follows. GRAPH ISOMORPHISM PROBLEM 21 Let 2 3 0 1 0 A = 4 1 0 15 0 1 0 and 2 2 3 0 0 1 B = 4 0 0 15 . 1 1 0 3 0 1 0 The permutation matrix P = 40 0 15 can be applied to show that A and B are 1 0 0 permutation similar. 2 32 32 3 2 3 0 1 0 0 0 1 0 0 1 0 1 0 P BP T = 40 0 15 40 0 15 41 0 05 = 41 0 15 = A 1 0 0 1 1 0 0 1 0 0 1 0 Thus A and B are permutation similar. Proposition 4.4. Permutation similarity is an equivalence relation on n ⇥ n matrices. Proof. We must show that permutation similarity is reflexive, symmetric and transitive. Since the identity matrix I contains exactly one 1 in each row and each column, it is a permutation matrix. Let A be an n ⇥ n matrix A, and I be the n ⇥ n identity matrix. Since IAI 1 = A, A is permutation similar to itself, and we conclude that permutation similarity is reflexive. Let A and B be n ⇥ n matrices, where A is permutation similar to B. Then there exists a permutation matrix P such that P BP 1 = A () P BP 1 P = AP () P 1 PB = P 1 AP () B = (P 1 )A(P 1 ) 1. Thus, if A is permutation similar to B, then B is permutation similar to A, so permutation similarity is symmetric. Suppose there are n ⇥ n matrices A, B, C where for some permutation matrices P, Q where P AP 1 =B and QBQ 1 = C. Then C = QBQ 1 1 = Q(P AP = QP AP 1 )Q 1 Q 1 = (QP )A(QP ) 1 GRAPH ISOMORPHISM PROBLEM 22 Note that QP is a permutation matrix, as permutation matrices are closed under taking products. Thus A and C are permutation similar, and permutation similarity is transitive. Therefore, we conclude that the relation is an equivalence relation. Observe that the above proof works for general matrices P and Q, thus similarity is also an equivalence relation. 4.1.2 Matrix Representation of an Isomorphism Let G be a graph on n vertices with adjacency matrix A and G0 be a graph on n vertices with adjacency matrix B. Both A and B are n ⇥ n matrices. Suppose both A and B have vertices {1, 2, ..., n}. If : G ! G0 is an isomorphism, then the permutation matrix P with pi (i) = 1 for each i shows B = P AP T , by Observation 2. Conversely, if A and B are permutation similar, then G and G0 are isomorphic. This, along with Observation 2 gives the following additional observation. Observation 3. Two graphs G and G0 are isomorphic if and only if their adjacency matrices are permutation similar. Suppose B and C are symmetric matrices and suppose B is permutation similar to C. Then there is a matrix P such that B =P CP T = P (P C T )T = P (P C)T . This reformation is used by Ullmann’s algorithm for the subgraph isomorphism in Section 4.2. Let G be a graph on n vertices with adjacency matrix A, and G0 be a graph on m vertices with adjacency matrix B where m  n. Let V (G0 ) = {j1 , j2 , ..., jm } ✓ {1, 2, ..., n} = V (G). Define S to be the m ⇥ n matrix with row i containing exactly one 1 in column ji . Then row i of SA is row ji of A. These are the rows of A correspondng to the subgraph G0 . When A is multiplied on the left by the m ⇥ n matrix S, the result will be an m ⇥ n matrix, and (SA)T will therefore be an n ⇥ m matrix. Multiplying this again by S on the left, the result will be an m ⇥ m matrix, which is the adjacency matrix C of G0 . To show a graph G00 with adjacency matrix B is isomorphic to G0 we find S such that B = P CP 1 = P (S(SA)T )P 1 = P (SAT S T )P 1 = (P S)AT (S T P T ) = (P S)AT (P S)T = (P S)((P S)A)T . To recap, if S is a matrix which contains exactly one 1 in each row, and at most one 1 in each column, the result of the operation S(SA)T is the adjacency matrix of a subgraph of GRAPH ISOMORPHISM PROBLEM 23 G. Thus, to identify a subgraph of A which is isomorphic to B, requires an m ⇥ m permutation matrix P and an m ⇥ n matrix S which meets the criteria listed above. That is, C = P S((P S)A)T or C = M (M A)T where M = P S. Observe M will have exactly one 1 in each row and at most one 1 in each column, as this is a property of S and is preserved by P . Consider again the graph from Figure 2.1, and a smaller graph H. b x a c y z d e Figure 4.3: Graph G on the left, and graph H on the right. Then we have a b c d e 2 3 a 0 1 1 0 1 7 b6 6 1 0 1 1 07 6 7 c 1 1 0 1 0 A= 6 7 4 d 0 1 1 0 15 e 1 0 0 1 0 the adjacency matrix of G, and x y z 2 3 x 0 1 0 B = y 4 1 0 15 the adjacency matrix of H. z 0 1 0 To show that there is an isomorphism between G[{a, d, e}] and H, we first multiply A by the 3 ⇥ 5 matrix S, with 1’s in the columns corresponding to the vertices a, d and e. ULLMAN’S ALGORITHM 24 2 0 1 0 0 0 0 6 61 T 4 5 SAS = 0 0 0 1 0 6 61 0 0 0 0 1 40 1 2 3 1 0 1 1 0 1 1 0 1 0 0 1 1 0 1 32 1 1 6 07 760 7 07 6 60 15 4 0 0 0 0 0 0 1 0 3 0 2 3 07 0 0 1 7 4 5 07 7= 0 0 1 =C 5 0 1 1 0 1 We can see that C 6= B, however they are permutation similar as there is a permutation matrix P such that 2 32 32 3 2 3 1 0 0 0 0 1 1 0 0 0 1 0 P CP T = 40 0 1540 0 1540 0 15 = 41 0 15 = B. 0 1 0 1 1 0 0 1 0 0 1 0 Thus there is an isomorphism between H and the induced subgraph G[{a, d, e}] of G. If we relax the condition that B = P CP T to simply whenever bij = 1, then the ij entry of P CP T is also 1, then we may conclude that H is isomorphic to a subgraph of G, but it may not be induced. Consider 2 3 1 0 0 0 0 S = 40 1 0 0 05. 0 0 1 0 0 Then while 2 3 0 1 1 C = SAS T = 41 0 15, 1 1 0 2 3 0 1 0 B = 4 1 0 15 . 0 1 0 Definition 4.5. Let B and C be m ⇥ m (0, 1)-matrices. If bij  cij for all i, j, then we say that C dominates B. 4.2 Ullman’s Algorithm In this section we describe an algorithm for finding all the isomorphisms between a given graph H and subgraphs of another graph G. The algorithm uses the adjacency matrices of H and G to identify possible isomorphisms, and then verify them. Checking all possible mappings from V (H) into V (G) is computationally expensive, so a neighborhood condition is added to reduce the number of mappings and improve the efficiency of the algorithm. For the remainder of this section, we consider a given graph G with adjacency matrix A and a second graph H with adjacency matrix B where |V (H)|  |V (G)|. Let |V (G)| = m and |V (H)| = n, so A is a m ⇥ m matrix and B is an n ⇥ n matrix. ULLMAN’S ALGORITHM 4.2.1 25 The Algorithm The algorithm first computes an n ⇥ m matrix M where the entries of M are determined by comparing the degrees of vertices in G with degrees of vertices in H. mij = 8 > <1, > : 0, if the sum of entries in column j in matrix A is greater than or equal to the sum of entries in column i of matrix B; otherwise. Observe that the sum of entries in a row or column of an adjacency matrix is equal to the degree of the corresponding vertex. Therefore mij = 1 means vertex i in H has degree less than or equal to vertex j in G. This is clearly a necessary condition for vertex i to map to vertex j under a subgraph isomorphism. Symbolically, mij = ( 1 0 if deg(j) deg(i) for j 2 V (G), i 2 V (H) otherwise The matrix M may have several 1’s in each row. Taking the matrix M the algorithm considers every matrix Mk , k = 1, 2, 3, .... generated from M by choosing exactly one 1 in each row so that there is at most one 1 in each column. The matrix C = Mk (Mk A)T is the adjacency matrix of a subgraph G0 of G. If C dominates B, then B is isomorphic to a subgraph of G0 and the matrix Mk describes this isomorphism. As an example, let G and H be the following graphs. c b a d x y e Figure 4.4: Graphs G and H. The adjacency matrices of G and H are z ULLMAN’S ALGORITHM a 2 a 0 b6 61 A = c6 60 d4 0 e 0 The corresponding matrix M is 26 b 1 0 1 1 0 c 0 1 0 1 0 d 0 1 1 0 1 e 3 0 07 7 07 7 15 0 and a x 1 M = y 40 z 1 b 1 1 1 a 2 x 0 M1 = y 4 0 z 1 b 0 0 0 2 x 2 x 0 B = y41 z 0 c 1 1 1 d 1 1 1 y 1 0 1 z 3 z 1 5. 0 e 3 1 05 . 1 We see that deg(y) = 2 and that deg(a) = deg(e) = 1 therefore the corresponding entries in M are 0. There are 3 ⇥ 4 ⇥ 3 = 36 choices for Mk , two of which are shown below. Giving c 1 0 0 d 0 1 0 2 e 3 0 05 0 3 0 1 0 C = M1 (M1 A)T = 41 0 05 0 0 0 Which maps z ! a, y ! d and x ! c, however, since a and d are not adjacent, but y and z are, {a, c, d} is not a subgraph isomorphic to H. This can be seen as B dominates C, specifically because c32 = 0 but b32 = 1. a x 0 M2 = y 4 0 z 0 2 b 0 0 1 c 1 0 0 d 0 1 0 e 3 0 05 0 Working through this example in more detail, we get 0 2 3 1T 2 3 2 3 0 1 0 0 0 2 3 C 0 0 1 0 0 B 0 1 1 0 0 1 0 0 6 1 0 1 1 07 B 6 7 C 7C 4 56 4 5 D = M2 (M2 A)T = 40 0 0 1 05B B 0 0 0 1 0 60 1 0 1 07 C = 1 0 1 @ 4 5 A 0 1 0 0 0 1 1 0 0 1 0 0 0 0 1 1 0 1 0 0 0 1 0 The matrix M2 is like M1 except that it maps z ! b. Here we observe that if bij =1, then dij = 1. In this case, adjacency is preserved and H is isomorphic to a subgraph of G on {b, c, d}. Note H is not isomorphic to G[{b, c, d}] as b13 = 0 but d13 = 1. ULLMAN’S ALGORITHM 4.2.2 27 Labeling Condition A labeling condition can be added to reduce the number of 1’s in the matrix M and therefore reduce the number of matrices Mk to check for the subgraph isomorphism described above. The application of a labeling condition is natural for the case of chemical graphs which are comprised of different types of atoms. When searching for copies of a particular functional group it is important to take into consideration the type of molecule. 4.2.3 Neighbourhood Condition Another method of reducing the number of 1’s in the matrix M is to add a neighborhood condition. The idea behind this condition is to impose the restriction that a vertex in H can only be matched to a vertex in G if their neighbours are the same. This can be done both in the context of labeling, and in the context of comparing vertex degrees. Let be a vertex in V (H) where 1 , 2 , ...., s are the vertices adjacent to , and let ↵ be a vertex in V (G) where ↵1 , ↵2 , ..., ↵t are the vertices adjacent to to ↵ in V (G). If there exists an isomorphism where maps to ↵, then for each x = 1, 2, ..., s there must be a vertex ↵y that is adjacent to ↵ such that x maps to ↵y . 4.2.4 Example Consider the following graph G representing a simplified sugar molecule, and the smaller graph H on 3 vertices. The adjacency matrix B which corresponds to H is shown. The adjacency matrix A which corresponds to G is an 18 ⇥ 18 matrix, and is not shown. We show how using a labelling conditions speeds up the search for the functional group H in G. m r g n l a b h c k f d e j q o i p Figure 4.5: Graph G corresponding to a molecule of D-Galaturonic acid. ULLMAN’S ALGORITHM 28 x z y x 2 x 0 B= y 4 1 z 0 y 1 0 1 z 3 0 15 0 Figure 4.6: The graph H corresponding to a hydroxyl functional group, and its adjacency matrix B. The Matrix M We begin by defining the matrix M using the conditions set by Ullman’s algorithm. Since deg(y) = 2, the zero entries in the second row of the matrix M correspond to those vertices in G with degree less than 2. From this matrix we can see that there are 13 ⇥ 17 ⇥ 16 = 3536 ways to choose exactly one 1 from each row, each from a distinct column, and therefore there would be many possible isomorphic mappings to be checked. a b c d e f g h i j k l m n o p q r 2 3 x 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M = y 4 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 05 z 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Applying the Labeling Condition Since G represents a sugar molecule, and H represents a hydroxyl group, the vertices can be labeled according to the type of atom they represent. Both G and H use the set of labels {C, O, H}. For this example, the subscripts are only shown for ease of reference and identification. Vertices labeled with the same letter are considered to have the same label regardless of subscript. ULLMAN’S ALGORITHM 29 O7 H O1 H O6 C C O2 C O5 C C C O4 H H O3 H Figure 4.7: Graph G corresponding to a molecule of D-Galaturonic acid with labels assigned based on type of atom. When H is labeled to represent a hydroxy group attached to a carbon atom, we get the following graph and adjacency matrix B` C H O C O H 3 C 0 1 0 B` = O 4 1 0 1 5 H 0 1 0 2 Figure 4.8: The graph H corresponding to a hydroxyl functional group, and its adjacency matrix B. The labeling condition requires that mij = 1 only if `(i) = `(j). The resulting matrix M` by adding the labeling condition to the original matrix M is shown below. C C 2 C 3 C 4 C 5 C 6 O 1 O 2 O 3 O 4 O 5 O 6 O 7 H1 H2 H3 H 4 H5 2 1 3 C 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 O4 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 05 H 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 We can see that in M` , the only non-zero entries occur when the column and row labels indicate the same type of atom. Adding this condition allows us to reduce the number ways to choose exactly one 1 in each row, and therefore reduce the number of possible mappings to 6 ⇥ 7 ⇥ 5 = 210. THE GEMINI AND SUBGEMINI ALGORITHMS 30 Applying the Neighbourhood Condition The neighborhood condition allows us to take the labelling condition one step further. If we take the matrix M` from the previous step, we can see from examining Figure 4.7 that the columns corresponding to hydrogen atoms and carbons will not be affected by this new condition. This is because, the only condition for both hydrogen atoms and carbon atoms is that they each be adjacent to an oxygen atom. However, if we examine the vertices labeled as oxygens, we can see that not all oxygen atoms in G are adjacent to a carbon and a hydrogen as in H. It is therefore possible to change the second row entries for O1 and O7 from 1 to 0. There are now 6 ⇥ 5 ⇥ 5 = 150 mappings to check. C C 2 C 3 C 4 C 5 C 6 O 1 O 2 O 3 O 4 O 5 O 6 O 7 H1 H2 H3 H 4 H5 2 1 3 C 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 O4 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 05 H 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 4.3 The Gemini and SubGemini Algorithms The SubGemini Algorithm [10] was created to determine whether a larger circuit contained copies of a smaller circuit. For this algorithm, circuits were represented as graphs, where the vertices represented devices and nets, which were connected by edges. Devices and nets were labeled based on type as well as degree. The algorithm is based on the Gemini and GeminiII algorithms [7] which were both designed to find isomorphisms between graphs of the same order. This was done through partitioning the vertices into sets, following the principle outline by [6]. The labeling associated with chemical graphs make them very well suited to this particular algorithm. 4.3.1 Gemini and GeminiII The Gemini and GeminiII algorithms can simply begin by partitioning the vertices based on the original label of each vertex. Recall that partitioning a set involves grouping elements of the set into non-empty subsets where each element of the set appears in exactly one of the subsets. The Gemini algorithm, which was developed for use on electric circuits represented by bipartite graphs, starts by partitioning vertices into partite sets. Chemical graphs are rarely bipartite, however, the natural first step is to partition vertices based on the type of atom they represent. Next, each vertex is re-labeled to include information about its neighbours. For a current label Lo , we obtain a new label L+ , where + denotes concatenation, by setting X L+ = Lo + c i Li (4.1) THE GEMINI AND SUBGEMINI ALGORITHMS 31 where Li is the original label of the neighbour node, and ci is the number of bonds, or edges, between the nodes. For example, the oxygen atom in water (Figure 2.5) would have the label O1H1H, while a carbon double bonded to another carbon and attached to two hydrogen atoms would have the label C2C1H1H. The labeling conventions could follow the IUPAC standard for priority of atoms and functional groups. This relabeling procedure can then be applied to neighbours of neighbours until each partition contains only a single element. From there, the algorithm would look for an isomorphic mapping between the partitions of one graph and the partitions of the other. If such a mapping exists, then the two graphs are isomorphic. There are some exceptional cases where two vertices may remain identical through this relabeling process. Some examples are shown below. In these cases, molecules possess at least one plane of symmetry. H H H C C C H H C H H H Figure 4.9: The structure of cis-2-butene on the left, and a benzene ring on the right. Example 4.6. Benzene and cis-2-butene (Figure 4.9) are examples of molecules that contain planes of symmetry. Applying the partitioning algorithm to cis-2-butene would not be able to differentiate between the two central carbons and the two terminal carbons. In the benzene ring, all the carbon atoms would receive the same label and thus be indistinguishable for the algorithm. Definition 4.7. H-NMR or proton nuclear magnetic resonance is a form of spectroscopy which can be used to determine the structure of a molecule. The results of H NMR show peaks, whose position and number indicate the arrangement of hydrogen atoms. Hydrogen atoms in the same partition after applying the labeling procedure described above, show a peak at the same position. If we take the structure of cis-2-butene and consider only the hydrogen atoms, we can see that each of the hydrogen atoms shown in blue would receive the same label, as would the two shown in red. For example, the red hydrogen atoms would receive the label H1C2C1C.... while the blue hydrogen atoms would receive the label H1C1C1H1H.... THE GEMINI AND SUBGEMINI ALGORITHMS H H H C C C H 32 H C H H H Figure 4.10: The partitioning of hydrogen atoms in a molecule of cis-2-butene. It is also possible to determine the structure of a molecule from one of the labels. Choosing one of the red hydrogen atoms, and applying the labeling algorithm, the label would be created according to the following sequence. H, H1C, H1C2C1C, H1C2C1C1C1H1H1H1H, H1C2C1C1C1H1H1H1H1H1H1H Given a label, we can reconstruct a graph using information about the types of atoms. We begin with a hydrogen which can form only one bond, in this case to a carbon. The carbon atom forms 4 bonds, which in this case includes a double bond to another carbon. The full process for this reconstruction is shown below. THE GEMINI AND SUBGEMINI ALGORITHMS 33 H H C H C C C H H H C C C C H H H H H C C C H H C H H H Figure 4.11: Determining the structure of cis-2-butene from one of the labels. Vertices with identical labels are in the same orbit under the automorphism group of a graph G. More information about group actions on a set, and orbits can be found in the text by Bondy and Murty [2]. In the case of cis-2-butene, there are two distinct partite sets containing hydrogen atoms, and so we would expect to see two distinct peaks on an H-NMR spectrum for cis-2benzene, as is shown in Figure 4.12. THE GEMINI AND SUBGEMINI ALGORITHMS 34 Figure 4.12: An H-NMR of 2-cis-butene showing two distinct peaks [4]. The first peak indicates the two red hydrogen atoms, and the taller peak indicates the six blue hydrogen atoms. The splitting of the peaks indicates the number of nearby, nonequivalent hydrogen atoms and gives further information about the structure. Most chemical molecules will have more than one equivalent hydrogen meaning they will always have the same labels. For this reason, the Gemini algorithm would require a condition that hydrogen atoms do not need to be in singleton partitions, or would require equivalent hydrogen atoms to be omitted altogether. Example: Propene Alcohol Isomers In chemistry, the term isomer is used to refer to two molecules with the same number of each type of atom, but with different bond connectivity. There are different types of isomers such as stereoisomers and structural isomers with different characteristics and properties. An example of a pair of isomers is propen-2-ol shown in Figure 4.13 and 2-propen-1-ol shown in Figure 4.14. Both of these species contain 1 oxygen, 3 carbon and 6 hydrogen atoms. We will use the Gemini algorithm to show that these two structural isomers are not isomorphic. For this example, each labeling step will use the order O, C, H and in the case of two THE GEMINI AND SUBGEMINI ALGORITHMS 35 atoms with the same initial label, the atoms connected by multiple bonds will be listed first. g d e a h b c i f j Figure 4.13: A graph representation of propen-2-ol using CPK colouring. We begin by labeling each vertex from the set {C, O, H} based on the atom type. C ={a, b, c} O ={d} H ={e, f, g, h, i, j} The partition of vertices with the label O contains only one vertex, so it does not require further labeling. Furthermore, we will omit the hydrogen atoms for the next labeling step, as both vertex a and vertex c clearly show equivalent hydrogen atoms. Next, we re-label the remaining carbons based on those atoms immediately adjacent. C2C1H1H ={a} C1O2C1C ={b} C1C1H1H1H ={c} Since each vertex has a unique label, we stop. u q t r v x s y w z Figure 4.14: A graph representation of 2-propen-1-ol shown using CPK colouring. As we did with the first molecule, we begin by partitioning based on the type of atom. O ={y} C ={q, r, s} H ={t, u, v, w, x, z} Once again the oxygen results in a singleton partition. Omitting the hydrogen, we proceed with the labeling process. THE GEMINI AND SUBGEMINI ALGORITHMS C2C1H1H ={q} C2C1C1H ={r} 36 C1O1C2H ={s} This result shows that {d} could be mapped to {y} and {a} could be mapped to {q}, however the remaining vertices have different labels and so have different bond connectivity. Therefore, we conclude that the two isomers are not isomorphic. 4.3.2 SubGemini Although the Gemini algorithm can identify isomorphisms between two graphs with the same number of vertices, the method of identifying an isomorphism between partite sets containing single vertices does not work for graphs of different order. The goal of the SubGemini algorithm is therefore to identify a key vertex in the smaller graph and use that to create a candidate vector of all possible vertices in the larger graph that it could map to. To visualize this, consider a large organic molecule containing mostly carbon and hydrogen atoms, with a small number of oxygen and other atoms as well. If the pattern of interest contained an oxygen molecule, the SubGemini algorithm would identify the oxygen as the key vertex, and then generate a vector of all oxygen molecules with the same adjacencies within the larger molecule. This approach would be more effective than checking every possible mapping that included a carbon or a hydrogen atom which are much more numerous. The process of identifying the key vertex uses the same basic principle as the Gemini algorithm, but with the important consideration that there will be one or more vertices in the smaller graph that will be mapped to a vertex in the larger graph with more edges. More specifically, let G and H be connected graphs such that |V (G)| |V (H)| and suppose that an isomorphism exists between H and the subgraph G0 of G. Then there is at least one vertex in G0 that is connected to vertices in G that are not in G0 . Subgemini accounts for this type of vertex by marking it as “corrupted” meaning that it does not undergo further labeling. These corrupted vertices can not be used as the key vertex, and if the next iteration of the labeling process for a particular vertex v would include a corrupted vertex, the labeling process terminates for v. At this point, the label for vertex v is complete. The outline of the algorithm described by Ohrich, Ebeling, Ginting and Sather [10] is as follows. • Use the labeling procedure to partition the vertices in H until all labels are complete and the parts contain only a single vertex. • Label the vertices in G using the same number of iterations used to label the vertices in H. • Choose the partition of G with the fewest elements to be the candidate vector. The vertex in H with the corresponding label is the key vertex K. • If there is more than one partition of V (G) with the smallest number of elements, K can be selected arbitrarily. THE GEMINI AND SUBGEMINI ALGORITHMS 37 • Check for isomorphisms between H and subgraphs of G containing vertices in the candidate vector. In some cases, identifying the key vertex in the smaller graph may be obvious just from the initial label; however, the labeling process is still valuable as it can further reduce the size of the candidate vector based on those vertices adjacent to it as is shown the following example. Example - Ethyl Acetoacetate O2 O3 C C O C C C C1 C O1 C C X X C2 O4 C C C N+ O5 Figure 4.15: 4-Nitrobenzyl Acetoacetate and the carbonyl pattern. The dashed lines and vertices labeled X are corrupted vertices. The subscripts are shown for ease of identification. Key Vertex To begin, the labeling process is applied to the carbonyl pattern using initial labels only, giving {O} and {C1 , C2 }. For the next step, the oxygen would receive the label O2C, however both carbons are corrupted at this step and so do not receive labels. Although it was possible to identify oxygen as the key vertex based solely on the initial labels, it is important to proceed with the labeling to reduce the number of vertices in the subsequent candidate vector. Candidate Vector Since the vertices in the candidate vector must have labels that begin with O, we will only consider the vertices in 4-nitrobenzyl acetoacetate which correspond to oxygen atoms. After the initial partitioning we have the partition. {O1 , O2 , O3 , O4 , O5 } Had we not fully labeled the key vertex, we would need to check for isomorphisms involving all of these oxygen atoms, however the additional labeling step allows us to eliminate some of THE GEMINI AND SUBGEMINI ALGORITHMS 38 these vertices. The next labeling iteration give us the following partitions. O2N ={O4 } O1N ={O5 } O1C1C ={O1 } O2C ={O2 , O3 } So our final candidate vector, {O2 , O3 } contains only 2 vertices as opposed to 5. From this it can be determined that there are multiple subgraph isomorphisms between 4-nitrobenzyl acetoacetate and the carbonyl pattern. Chapter 5 Reaction Networks Reactions networks are typically modeled in one of two ways. In a chemical reaction network, or CRN, the vertices represent the chemical species, and the edges represent the reactions occurring between them [14]. In a reaction route network, or RRN, a system is modeled as a bipartite digraph, where chemical species and chemical reactions are modeled as partite sets of nodes. In this model, species nodes are only adjacent to reaction nodes and vice versa. Species 1 Species 2 Reaction 1 Species 1 Species 2 Intermediate 1 Intermediate 1 Product 1 Product 2 Reaction 2 Reaction 3 Product 1 Product 2 Figure 5.1: The same simple chemical system modeled as a CRN and as an RRN. There are benefits to both models, depending on the specifics of the reaction and system being modeled. 5.1 Orlova’s Algorithm In this section we present the algorithm from Orlova’s thesis [11] which also appears in the paper [12]. The algorithm overview is as follows. 39 REACTION NETWORK FOR ETHYL ACETOACETATE 40 Algorithm 1: Algorithm input: list of initial chemical species and complete list of reaction rules/transformations 1 update = true; 2 while update=true do 3 update=false; 4 for 1  i  # of reaction rules do 5 Find all candidates to undergo reaction i from the current list of chemical species 6 Apply the transformation to create products 7 end 8 if A product is not in the list of chemical species then 9 Add the product or products to the list 10 update=true; 11 end 12 end 5.1.1 Algorithm Components Line 5: To find candidates for a particular reaction, the algorithm searches the current list of chemical species to determine which species have the requisite pattern or patterns. This is done by employing Ullmann’s algorithm to identify isomorphisms between the required pattern and subgraphs of the molecular graphs on specific vertices. Line 8: To verify if any of the species generated by the previous step have already been included in the list of chemical species the algorithm checks for graph isomorphism between the new and existing chemical molecules. Line 6: Pre-determined reaction rules and transformations are applied to reaction candidates by replacing or altering portions of the reacting species. An example of such a transformation is shown below. 5.2 Reaction Network for Ethyl Acetoacetate This section details an example of a reaction network constructed using the algorithm in Section 5.1. A complete list of the patterns and reaction rules is shown, as well as the reaction network that is generated. 5.2.1 Input: Initial Species For this example, we start with only two chemical species, ethyl acetoacetate and methanol. The structure of these two molecules is shown in Figure 5.2. REACTION NETWORK FOR ETHYL ACETOACETATE O O C C H3 C CH2 41 CH2 O CH3 H3 C O H Figure 5.2: The structure of ethyl acetoacetate and methanol. 5.2.2 Input: Patterns In order to create the list of reaction rules, we must identify the patterns capable of undergoing reactions in this system. Some molecules, like the initial species ethyl acetoacetate, may have more than one pattern and so be a candidate species for more than one reaction. Pattern 1. Ketone - Keto Form O R C R0 C Pattern 2. Alcohol R CH2 O H Pattern 3. Ketone - Enol Form O R C R0 C Pattern 4. Acetoacetate Enolate O O C C R C Pattern 5. Carbon Dioxide O O C Pattern 6. Carbon Chain CH3 (CH2 )n CH3 R0 O REACTION NETWORK FOR ETHYL ACETOACETATE 42 Pattern 7. Acetone Enolate O H2 C C CH3 Pattern 8. Diacetone Alcohol ··· O O C C ··· C H H C··· Pattern 9. Water O Pattern 10. Ester O R0 C R O Pattern 11. Carboxylic Acid O C R 5.2.3 H O Input: Reaction Rules From the patterns we generate a list of reaction rules. The Reaction number and the patterns involved in each reaction are summarized in Table 5.1. Colours are used in each reaction to show the parts of each pattern that are undergoing transformation. Reaction 1. Keto-Enol Tautomerization O R C O R0 Reaction 2. Decarboxylation ! R C R0 REACTION NETWORK FOR ETHYL ACETOACETATE 43 O O O C R0 C R C ! O R C + O O R 0 + H2 C C CH3 O R O C C CH3 ! C C C R00 ! R0 O C C R R R0 + HO R00 R00 O Reaction 1 2 3 4 5 6 R00 C ↵ R O Reaction 6. Ester Exchange (Transesterification) O C +H H O O R0 + HO C CH3 R0 Reaction 4. Aldol Condensation O O R C R0 Reaction 5. Alcoholysis O CH3 (CH2 )n CH3 O C C + C Reaction 3. Aldol Addition O R C C R00 R R00 + HO R0 O O ↵ C R Reactants Pattern 1 Pattern 4 Pattern 1 + Pattern 7 Pattern 8 Pattern 3 + Pattern 2 Pattern 10 + Pattern 2 R0 + HO O Products Pattern 3 Pattern 1 + Pattern 5 + Pattern 6 Pattern 8 Pattern 1 + Pattern 9 Pattern 1 Pattern 2 + Pattern 10 Table 5.1: Summary of Reactions. REACTION NETWORK FOR ETHYL ACETOACETATE 44 Example 5.1. The ester exchange transformation between ethyl acetate and methanol, and the resulting products. One of the reactions that occurs as a part of this chemical network is an ester exchange between ethyl acetate and methanol, shown below. In Line 2 of the algorithm, Ullman’s algorithm would identify pattern 2 in methanol and pattern 10 in ethyl acetate. Both species are shown below. O CH2 C H3 C O CH3 + HO CH3 The R groups of each molecule are highlighted in their corresponding adjacency matrices shown below. The adjacency matrix for methanol has the R group highlighted in blue and the atom where it attaches to the pattern is highlighted in green. C C C C O O 2 3 C 0 1 0 0 0 0 C O H 7 2 3 C6 61 0 0 0 2 17 C 0 1 0 6 7 C6 0 0 0 1 0 1 7 O4 1 0 1 5 7 C6 60 0 1 0 0 07 H 0 1 0 O4 0 2 0 0 0 0 5 O 0 1 1 0 0 0 In order to more easily visualize the transformation for ethyl acetate, we can permute the rows and columns of the adjacency matrix to move the submatrix corresponding to the R group to the top left hand corner of the matrix. Note that the only other non zero entry in the first two rows and columns is the atom where that group connects to the rest of the molecule. This entry is highlighted in green. C C C C O O 2 3 C 0 1 0 0 0 1 C O H 7 2 3 C6 61 0 0 0 0 07 C 0 1 0 7 C6 0 0 0 1 0 0 6 7 O4 1 0 1 5 7 C6 0 0 1 0 2 1 6 7 H 0 1 0 O4 0 0 0 2 0 0 5 O 1 0 0 1 0 0 After the candidate species have been identified, the transformation is applied. From the list of reaction rules, we know that the reaction results in the R groups of the two molecules being exchanged, resulting in the products shown below. REACTION NETWORK FOR ETHYL ACETOACETATE 45 O CH3 C H3 C C C C O O 3 C 0 0 0 0 1 7 C6 60 0 1 0 07 7 C6 0 1 0 2 1 6 7 4 O 0 0 2 0 05 O 1 0 1 0 0 2 5.2.4 O + HO CH2 CH3 C C O H 2 3 C 0 1 1 0 7 C6 61 0 0 0 7 O4 1 0 0 1 5 H 0 0 1 0 Network This reaction network is bipartite, with the chemical species shown in round nodes, while the reaction nodes are shown in green boxes. The input species, ethyl acetoacetate and methanol, are shown in purple, while the final nonreactive products are shown in red. The intermediate species, which are those chemical species that are generated by the algorithm but also having patterns which allow them to react further, are those species shown in the black circles. A few reactions have been omitted from this diagram because their products are already a part of the network. For instance, methyl acetoacetate is a candidate for Reaction 6 with methanol, however the products of that reaction would be methyl acetoacetate and methanol which are already in the network. REACTION NETWORK FOR ETHYL ACETOACETATE Ethyl Acetoacetate 46 Methanol R1 Ethyl Acetoacetate Enolate R6 R5 Methyl Acetoacetate Methyl Acetate R12 Methyl Acetoacetate Enolate R21 R2 Ethane Acetone CO2 Methane R11 Acetone Enolate R3 Diacetone Alcohol R4 H2 O Mesityl Oxide Figure 5.3: The reaction network generated by Orlova’s Algorithm generated from ethyl acetoacetate and methanol. REACTION NETWORK FOR ETHYL ACETOACETATE 5.2.5 47 Output: List of Chemical Species Intermediate Species Species Name Ethyl Acetoacetate Enolate Methyl Acetoacetate Methyl Acetoacetate Enolate Acetone Acetone Enolate Diacetone Alcohol Patterns Pattern 3, Pattern 4 Pattern 1 Pattern 7, Pattern 4 Pattern 1 Pattern 7 Pattern 8 Condensed Formula CH3 C(OH) = CHCOOC2 H5 CH3 COCH2 COOCH3 CH3 COCH = C(O )CH3 (CH3 )2 CO CH2 C(O)CH2 CH3 C(O)CH2 C(OH)(CH3 )2 Table 5.2: The reaction intermediates from the reaction network for ethyl acetoacetate and methanol. Non-Reactive Final Products Species Name Methyl Acetate* CO2 Ethane Methane H2 O Mesityl Oxide * * Patterns Pattern 1 Pattern 5 Pattern 6 Pattern 6 Pattern 9 Pattern 1 Condensed Formula CH3 COOCH3 CO2 CH3 CH3 CH4 H2 O (CH3 )2 C = CHCOCH3 Table 5.3: Final products of the reaction network for ethyl acetoacetate and methanol. *Methyl acetate can undergo Reaction 6 with methanol, however the products of that reaction will be the same as the inputs. ** Mesityl Oxide is a candidate for Reaction 1, however, its enol form does not react further in this network. Chapter 6 Conclusion Chemical graph theory is a field that requires an understanding of both graph theory and chemistry. Questions that are studied in one area, often have parallels to topics of interest in the other. An example of such a parallel is the subgraph isomorphism problem discussed in this paper. The subgraph isomorphism problem is directly related to chemistry, on account of the importance of functional groups in chemical reactions. Some features of chemical graphs, such as the natural partitioning of vertices into groupings based on atom type, make the algorithms used in graph theory particularily effective when applied to chemical graphs. The algorithms described in this paper focus primarily on the structural characteristics of chemical species, however there are applications of these concepts to other aspects of chemistry including reaction rates and kinetics. There may also be improvements to consider for the reaction network algorithm described by Orlova [11]. For instance, the current algorithm does not track which reactions have already been considered, and so each time a new chemical species is added to the list, each reaction rule will be applied to all species with the requisite patterns, even those that have already been checked. It would be possible to generate a list of species that have already had specific transformations applied so that when a new species is added, only those species that have not reacted with one another will be considered. It is possible, however, that the size of most reaction networks mean that improvements like this do not significantly improve time or efficiency of the algorithm. Graph theory provides many useful tools and techniques which can be used to effectively model molecular structure and chemical reactions. A few of these techniques were discussed in this paper, however there are many more applications to many different areas of chemistry. The algorithms described in this paper focus primarily on the structural characteristics of chemical species, however there are also applications of these concepts to areas such as reaction rates and reaction kinetics, and the physical and chemical properties of specific molecules. 48 Bibliography [1] N. Biggs, E.K. Lloyd, and R.J. Wilson. Graph Theory, 1736-1936. Clarendon Press, 1986. isbn: 9780198539162. url: https://books.google.ca/books?id=XqYTk0sXmpoC. [2] J. A. Bondy and U. S. R. Murty. Graph Theory With Applications. Elsevier Science Publishing Co., 1982. [3] Arthur Cayley. “On the mathematical theory of isomers”. In: The Collected Mathematical Papers. Cambridge Library Collection - Mathematics. Cambridge University Press, 2009, pp. 202–204. [4] 2-BUTENE(624-64-6) 1H NMR. Chemical Book, CAS DataBase List. url: https:// www.chemicalbook.com/SpectrumEN_590-18-1_1HNMR.htm. [5] Robert B. Corey and Linus Pauling. “Molecular Models of Amino Acids, Peptides, and Proteins”. In: Review of Scientific Instruments 24.8 (Aug. 1953), pp. 621–627. issn: 00346748. doi: 10.1063/1.1770803. eprint: https://pubs.aip.org/aip/rsi/articlepdf/24/8/621/19099249/621\_1\_online.pdf. url: https://doi.org/10.1063/1. 1770803. [6] D.G. Corneil and C.C. Gotlieb. “An Efficient Algorithm for Graph Isomorphism”. In: Journal of the Association for Computing Machinery (1970). [7] C. Ebeling. “GeminiII: a second generation layout validation program”. In: [1988] IEEE International Conference on Computer-Aided Design (ICCAD-89) Digest of Technical Papers. 1988, pp. 322–325. doi: 10.1109/ICCAD.1988.122520. [8] Ivan Gutman et al. “Why plerograms are not used in chemical graph theory? The case of terminal-Wiener index”. In: Chemical Physics Letters 568-569 (2013), pp. 195–197. issn: 0009-2614. doi: https : / / doi . org / 10 . 1016 / j . cplett . 2013 . 03 . 045. url: https://www.sciencedirect.com/science/article/pii/S0009261413003771. [9] Compiled by Alan D. McNaught IUPAC and Andrew Wilkinson. Compendium of Chemical Terminology: IUPAC Recommendations 2nd ed. Blackwell Scientific Publications, 2019. 49 BIBLIOGRAPHY 50 [10] Miles Ohlrich et al. “SubGemini: identifying subcircuits using a fast subgraph isomorphism algorithm”. In: Proceedings of the 30th International Design Automation Conference. DAC ’93. Dallas, Texas, USA: Association for Computing Machinery, 1993, pp. 31– 37. isbn: 0897915771. doi: 10.1145/157485.164556. url: https://doi.org/10.1145/ 157485.164556. [11] Yuliia Orlova. “Graph-theoretical approach to algorithmic construction of complex reaction networks”. PhD thesis. Sept. 2020. [12] Yuliia Orlova, Ivan Kryven, and Piet D. Iedema. “Automated reaction generation for polymer networks”. In: Computers & Chemical Engineering 112 (2018), pp. 37–47. issn: 0098-1354. doi: https : / / doi . org / 10 . 1016 / j . compchemeng . 2018 . 01 . 022. url: https://www.sciencedirect.com/science/article/pii/S0098135418300462. [13] J. R. Ullmann. “An Algorithm for Subgraph Isomorphism”. In: J. ACM 23.1 (Jan. 1976), pp. 31–42. issn: 0004-5411. doi: 10.1145/321921.321925. url: https://doi.org/10. 1145/321921.321925. [14] J.J.P. Veerman, T. Whalen-Wagner, and Ewan Kummel. “Chemical reaction networks in a Laplacian framework”. In: Chaos, Solitons Fractals 166 (2023), p. 112859. issn: 0960-0779. doi: https : / / doi . org / 10 . 1016 / j . chaos . 2022 . 112859. url: https : //www.sciencedirect.com/science/article/pii/S0960077922010384.