h?2 lb2 Q7 :`T? AbQKQ`T?BbKb BM *?2KB+H :`T?
h?2Q`v
#v

J2;M >MFb
 i?2bBb bm#KBii2/ BM T`iBH 7mH7BHHK2Mi
Q7 i?2 `2[mB`2K2Mib 7Q` i?2 /2;`22 Q7
"+?2HQ` Q7 a+B2M+2 U>QMbXV
BM i?2
.2T`iK2Mi Q7 Ji?2KiB+b  aiiBbiB+b

ÜJ2;M >MFb kyk8

q2 ++2Ti i?Bb i?2bBb b +QM7Q`KBM; iQ i?2 `2[mB`2/ biM/`/b,

.`X _B+?`/ "`2rbi2`
.2TiX Q7 Ji?2KiB+b  aiiBbiB+b
h?2bBb amT2`pBbQ`
.`X Gm+b JQH
.2TiX Q7 Ji?2KiB+b  aiiBbiB+b
.`X C2bbB+ HHBM;?K
.2TiX Q7 *?2KBbi`v
.i2/ CmHv k kyk8- EKHQQTb- "`BiBb? *QHmK#B- *M/

THOMPSON RIVERS UNIVERSITY
DEPARTMENT OF MATHEMATICS & STATISTICS
Permission is herewith granted to Thompson Rivers University to circulate and to have copied
for non-commercial purposes, at its discretion, the above title upon request of individuals or
institutions.

-------------------------------Signature of Author

the author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the
author’s written permission.
the author attests that permission has been obtained for the use of
any copy-righted material appearing in this thesis (other than brief excerpts
requiring only proper acknowledgement in scholarly righting) and that all
such use is clearly acknowledged.

ii

Abstract
This thesis examines the use of graphs as models for chemical molecules and processes. We
begin by defining chemical graphs and their properties.
We then define the subgraph isomorphism problem and examine two algorithms which
can be used to solve it: Ullmann’s algorithm and the SubGemini algorithm. We further discuss
the application of these algorithms to chemical graph theory and provide examples for each.
In addition to structural models, we describe an algorithm for the identification of
intermediates and products of a reaction modeled as a network. The algorithm is demonstrated
by means of an example.

iii

Acknowledgements
I would like to extend my sincere gratitude to my supervisor, Dr. Richard Brewster. This
project would not have been possible without his guidance, expertise and patience.
I would also like to thank my committee members, Dr. Lucas Mol and Dr. Jessica
Allingham for their time and feedback on this project.
Finally I would like to thank the Department of Mathematics and Statistics at Thompson Rivers University, and all the faculty members from whom I have gained a deeper appreciation for mathematics and research.

iv

Contents
Abstract

iii

Acknowledgements

iv

1 Introduction

1

2 Background in Graph Theory
2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Other Types of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.1 Digraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.2 Chemical Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.3 Subgraphs and Induced Subgraphs . . . . . . . . . . . . . . . . . . . . .
2.2.4 Bipartite Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.1 Adjacency Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.2 Other Types of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . .

2
2
5
5
6
8
10
10
10
12

3 Background in Chemistry
3.1 Common Molecules and Functional Groups . . . . . . . . . . . . . . . . . . . .
3.2 R-Group Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 Drawing Molecular Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3.1 Corey–Pauling–Koltun (CPK) Colouring . . . . . . . . . . . . . . . . . .

14
15
15
16
17

4 Graph Isomorphism
4.1 Graph Isomorphism Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.1 Similar and Permutation Similar Matrices . . . . . . . . . . . . . . . . .
4.1.2 Matrix Representation of an Isomorphism . . . . . . . . . . . . . . . . .
4.2 Ullman’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.1 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.2 Labeling Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.3 Neighbourhood Condition . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

19
19
20
22
24
25
27
27
27

v

4.3

The Gemini and SubGemini Algorithms . . . . . . . . . . . . . . . . . . . . . .
4.3.1 Gemini and GeminiII . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.2 SubGemini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

30
30
36

5 Reaction Networks
5.1 Orlova’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.1 Algorithm Components . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2 Reaction Network for Ethyl Acetoacetate . . . . . . . . . . . . . . . . . . . . .
5.2.1 Input: Initial Species . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.2 Input: Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.3 Input: Reaction Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.4 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.5 Output: List of Chemical Species . . . . . . . . . . . . . . . . . . . . . .

39
39
40
40
40
41
42
45
47

6 Conclusion

48

References

48

vi

List of Figures
2.1
2.2

A graph on 5 vertices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The graph on the left is labeled from the set {C, O, N }. The graph on the right
uses colours and shapes to distinguish vertices. . . . . . . . . . . . . . . . . . .
2.3 The same underlying graph with two different colourings. The colouring on the
left is a proper colouring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 A directed graph showing a simple chemical reaction involving two reactants
and a single product. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5 A graph on 3 vertices, and a labeling of that same graph to model a molecule
of water (H2 O). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.6 Graphs representing carbon dioxyde (CO2 ) and ozone (O3 ). . . . . . . . . . . .
2.7 The structure of D-Galacturonic Acid. . . . . . . . . . . . . . . . . . . . . . . .
2.8 D-Galacturonic Acid with some hydrogen atoms removed. . . . . . . . . . . . .
2.9 The graph G from Figure 2.1 and two smaller graphs, G0 in the center and G00
on the right. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.10 Two induced subgraphs of G. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.11 Two drawings of a bipartite graph with with partite sets A = {a1 , a2 , a3 } and
B = {b1 , b2 , b3 }. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.12 The graph G and its corresponding adjacency matrix A. . . . . . . . . . . . . .
2.13 Another graph, G0 with corresponding adjacency matrix B. . . . . . . . . . . .
2.14 The structure of P and P T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1
3.2
3.3
3.4
3.5
3.6
4.1

The structure of (from left to right) a ketone, a carboxyl and an ester. . . . . .
The General Form of a Carboxylic Acid and an Alcohol. . . . . . . . . . . . . .
Methanol and Ethanol with the R groups shown in the box. . . . . . . . . . . .
The structure of ethyl acetoacetate which include an ethyl group, shown in blue
(dashed), an ester shown in red (solid), and a ketone group shown in green
(dotted). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The Kekulé and line-bond formulas for ethanol. . . . . . . . . . . . . . . . . . .
Line-bond formulas for benzene and ethyl acetoacetate. . . . . . . . . . . . . .
A graph G with V (G) = {a, b, c, d} and a second graph G0 with V (G0 ) =
{w, x, y, z}. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vii

3
4
4
5
6
7
7
8
9
9
10
11
11
12
15
15
16
16
17
17
19

4.2

Isomorphic bipartite graphs G and G0 with partite sets A = {a1 , a2 , a3 } and
B = {b1 , b2 , b3 }. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3 Graph G on the left, and graph H on the right. . . . . . . . . . . . . . . . . . .
4.4 Graphs G and H. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.5 Graph G corresponding to a molecule of D-Galaturonic acid. . . . . . . . . . .
4.6 The graph H corresponding to a hydroxyl functional group, and its adjacency
matrix B. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.7 Graph G corresponding to a molecule of D-Galaturonic acid with labels assigned
based on type of atom. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.8 The graph H corresponding to a hydroxyl functional group, and its adjacency
matrix B. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.9 The structure of cis-2-butene on the left, and a benzene ring on the right. . . .
4.10 The partitioning of hydrogen atoms in a molecule of cis-2-butene. . . . . . . . .
4.11 Determining the structure of cis-2-butene from one of the labels. . . . . . . . .
4.12 An H-NMR of 2-cis-butene showing two distinct peaks [4]. . . . . . . . . . . . .
4.13 A graph representation of propen-2-ol using CPK colouring. . . . . . . . . . . .
4.14 A graph representation of 2-propen-1-ol shown using CPK colouring. . . . . . .
4.15 4-Nitrobenzyl Acetoacetate and the carbonyl pattern. The dashed lines and
vertices labeled X are corrupted vertices. The subscripts are shown for ease of
identification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1
5.2
5.3

The same simple chemical system modeled as a CRN and as an RRN. . . . . .
The structure of ethyl acetoacetate and methanol. . . . . . . . . . . . . . . . .
The reaction network generated by Orlova’s Algorithm generated from ethyl
acetoacetate and methanol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

viii

20
23
25
27
28
29
29
31
32
33
34
35
35
37
39
41
46

List of Tables
3.1
3.2

Common Functional Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
CPK colourings for some common atoms. . . . . . . . . . . . . . . . . . . . . .

15
18

5.1
5.2

Summary of Reactions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The reaction intermediates from the reaction network for ethyl acetoacetate and
methanol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Final products of the reaction network for ethyl acetoacetate and methanol. . .

43

5.3

ix

47
47

Chapter 1

Introduction
Graph theory is an area of mathematics that studies structures composed of points and the
connections between them. These structures, which are referred to as graphs, have many
applications and are used for modeling the relationships between objects in a wide array of
different fields. Some applications of graph theory include modeling social networks, route
optimization, search and sort type algorithms and even language structure. One of the first
mathematicians to employ this type of modeling technique was Arthur Cayley [3]. In 1874, he
published a paper detailing a method for enumerating isomers using diagrams involving linked
points. From these types of linked-point models came the field of chemical graph theory which
applies graph theory to the modeling of molecules, chemical properties, and reactions.
In this paper, we define chemical graphs and the ways in which graph theory can be
applied to the study of chemical molecules and reactions. This includes the use of labeling and
hydrogen-reduction to most efficiently model the structure of individual molecules, as well as
the parts of a molecule which undergo reactions.
We also discuss the subgraph isomorphism problem and how it relates to the identification of functional groups in a molecule. Two algorithms for subgraph isomorphism, Ullmann’s
Algorithm [13] and the SubGemini Algorithm [10], are described along with conditions for
their application to chemical graphs. Each algorithm is also applied to chemical examples.
Finally, we describe reaction networks, and an algorithm for finding all the chemical
species and transformations that occur within that network [11]. This algorithm is demonstrated using an example with two initial species: methanol and ethyl acetoacetate.

1

Chapter 2

Background in Graph Theory
We will define key terms from graph theory and chemistry used in this thesis following Bondy
and Murty for graph theory [2], and IUPAC (the International Union of Pure and Applied
Chemisty) gold book for chemistry [9]. We refer the reader to these texts for terminology not
defined in this thesis.

2.1

Basic Definitions

Definition 2.1. A graph G = (V, E, G ) is a non-empty set of vertices V together with a set
of edges, E. The incidence function G associates each edge in E with an unordered pair of
vertices in V . If G (e) = {u, v} then we say that e joins u and v. For the purposes of this
paper, a pair of vertices {u, v} will be denoted as uv.
All graphs in this work are finite, which means that they contain a finite number of
vertices and edges. Although a graph is an ordered triple, in this paper a graph will be denoted
by G = (V, E) where the incidence function G is not explicitly stated.

Example 2.2. The following is an example of a graph G with V (G) = {a, b, c, d, e}, E(G) =
{e1 , e2 , e3 , e4 , e5 , e6 , e7 } and G is indicated by the diagram. For example G (e1 ) = ab,
G (e2 ) = ac etc.

2

BASIC DEFINITIONS

3
b
e1
a

e4

e3
e2
c

e7

e5
d

e6

e
Figure 2.1: A graph on 5 vertices.
If (e) = uv, the edge e is said to be incident to u and v. The vertices u and v are
the end points of e. In Figure 2.1 e2 , e3 and e5 are incedent to c. Two edges that share an
endpoint are said to be adjacent, as are two vertices connected by an edge. In Figure 2.1 a, c
and d are adjacent to b.
Definition 2.3. The number of edges incident to a particular vertex v is called the degree of
v, which is denoted as deg(v). The maximum degree of G, denoted (G), is the highest degree
of any vertex in G. Conversely, the minimum degree of G, (G), is the lowest degree of any
vertex in G.
In the graph G drawn in Figure 2.1, deg(a) = 3 and deg(e) = 2. Since the maximum
degree of any vertex in G is 3, we have (G) = 3. It is possible for a vertex to have a degree
of zero, as would be the case for the lone vertex in the graph containing only one vertex and
no edges.
It is interesting to note that in early graph theory works, the degree was called the
valence of a vertex. In chemistry, the valence of an atom is the number of bonds that the atom
is able to form with other atoms. It is directly related to the valence electrons which are the
outermost electrons of the atom.
Definition 2.4. The number of vertices in a graph G is called the order of G. The number
of edges in G is called the size of G.
Unlike (G), the order of a graph must always be greater than or equal to 1, since the
set of vertices V is non-empty. Many papers on chemical graph theory use the word size to
refer to the number of vertices in a graph, however this paper will use the definitions as given
above. Generally, the vertices of a graph are distinguished only by their connections to other
vertices. However, it is possible to assign labels to further distinguish between vertices.
Definition 2.5. Given a graph G = (V, E) and a set of labels L, a vertex labeled graph or
simply a labeled graph is a triple G` = (`, V, E) where ` : V ! L assigns each vertex v a label

BASIC DEFINITIONS

4

`(v). An edge labeled graph is similar, except that labels are assigned to edges. As an abuse of
notation, we may write G to denote either the graph or the labeled graph.
Note that the vertices in the graph G from Figure 2.1 have unique labels to distinguish
them. This is not always the case, and more than one vertex in the same graph can have
identical labels as in Figure 2.2.
C

C

C

N

O

C

Figure 2.2: The graph on the left is labeled from the set {C, O, N }. The graph on the right
uses colours and shapes to distinguish vertices.
A labeled graph G` = (`, V, E) has an underlying graph G = (V, E). Two labeled graphs
may have the same underlying graph and differ only in their labeling.
A special case of graph labeling is graph colouring. Colours can be used simply to
differentiate between vertices in the same way that labels do, however a proper graph colouring
refers to a graph where colours are assigned to labels so that no two adjacent vertices have the
same label. Consider the underlying graph G shown below.

Figure 2.3: The same underlying graph with two different colourings. The colouring on the
left is a proper colouring.
Labeled graphs are particularly useful for modeling chemical molecules, as the labels

OTHER TYPES OF GRAPHS

5

allow for differentiation between types of atoms in a structure. In reaction networks, nodes
can be labeled as chemical species and reaction types.

2.2

Other Types of Graphs

2.2.1

Digraphs

Generally, the edges of a graph are considered to be bidirectional, meaning it goes in both
directions; however, there are times when it is useful to specify the direction of an edge. This
can be thought of as the flow from one vertex to another. In all the graphs discussed so far
in this paper, the edges are bidirectional. Such graphs are called undirected when needed, to
distinguish from directed graphs.
There are applications for this type of model, such as the flow of traffic along roads, or
the flow of messages in a social network. Flow is important for chemical graph theory when it
comes to modeling the flow of reactions within a chemical system. This direction of flow can
be represented by an arrow, as in Figure 2.4.

Reactant 2

Reaction 1

Product

Reactant 1

Figure 2.4: A directed graph showing a simple chemical reaction involving two reactants and
a single product.
Formally a directed graph or digraph is a type of graph G which differs from an undirected graph in that edges between vertices are ordered pairs and are typically referred to as
arcs. Each arc has a specified orientation. This is to say that if u, v 2 V (G) and uv is an arc
in G, then u is the tail of uv and v is the head. Alternatively, if (e) = (u, v) then e is said to
join u to v, but does not join v to u.
A network is a digraph where vertices are considered to be sources, sinks or intermediate
vertices. A source is a vertex that initiates a process and all edges incident to that vertex are
directed away from it. A sink is a vertex that only has in directed arcs, while intermediate

OTHER TYPES OF GRAPHS

6

vertices have arcs directed toward and away from them.
In Example 2.4, the sources are Reactant 1 and Reactant 2, while the sink is the
Product vertex. The only intermediate vertex is the Reaction 1 vertex. In the case of reaction
networks, the sources will be any initial reactants in a system, while the sinks will be any
final non-reactive products. Intermediate vertices will include reaction steps and intermediate
chemical species. Figure 2.4 is an example of a very simple reaction route network or RRN.
These will be discussed in greater detail in Chapter 5.

2.2.2

Chemical Graphs

In addition to chemical reactions, chemical molecules can also be modeled as graphs. In some
of his later works, Arthur Cayley referred to these diagrams representing chemical structures as
plerograms. In chemical graphs, nodes represent atoms, and edges represent the bonds between
atoms. Labeling is used to differentiate between different atoms and groupings of atoms. The
following example shows the importance of labeling when modeling a chemical molecule as a
graph.
Example 2.6. The figure on the left is an example of a graph on three vertices with V (G) =
{u, v, w} and E(G) = {uv, vw}. By labeling the vertices as `(u) = H, `(v) = O and `(w) = H,
the graph can be used to model a molecule of water with chemical formula H2 O. In this figure
colors have been added which correspond to the label of the vertices.
u

w

H

H

v
O
Figure 2.5: A graph on 3 vertices, and a labeling of that same graph to model a molecule of
water (H2 O).
We can see that the labels allow for the distinction of different atoms in the molecule.
The following are two more examples of graphs with the same set of vertices, but a different
labeling. The molecules represented are carbon dioxide (CO2 ) and ozone (O3 ). Note the
additional edges that have been added to reflect the bonds present in these two molecules. As
a result, the underlying graph will have different edges with the same end points. Such edges
are called parallel.

OTHER TYPES OF GRAPHS

7

O

O

O

O

O+

C

Figure 2.6: Graphs representing carbon dioxyde (CO2 ) and ozone (O3 ).
For larger and more complex molecules, there are times when it is appropriate to omit
certain atoms, such as hydrogen atoms, or to use a labeled vertex to represent multiple atoms
rather than a single atom. In fact, in chemical graph theory, it is more common to omit
hydrogen atoms than it is to use true plerograms which show every atom as an individual
vertex [8]. Consider the molecular structure of D-Galacturonic Acid shown in Figure 2.7.
O
H
H

H
O

C
O

O

C

C

C

C

O
H

H

H
C

H
O

O
H

H

H
Figure 2.7: The structure of D-Galacturonic Acid.
We can see that in the full molecule of Galacturonic acid we have 23 nodes, and therefore
the corresponding adjacency matrix, which is defined in Section 2.3.1, is a 23 ⇥ 23 matrix. By
removing the hydrogen atoms attached to the carbon atoms (shown in red),the reduced graph
has 18 nodes and thus the adjacency matrix is an 18 ⇥ 18 matrix.

OTHER TYPES OF GRAPHS

8
O

H

C
O

O
H

H

O

O

C

C

C

C
C

O
H

O
H
Figure 2.8: D-Galacturonic Acid with some hydrogen atoms removed.
Although there are cases where this will not be possible, it is possible to remove hydrogen atoms from most models while preserving the relevant aspects of the molecules structure.
In fact, Arthur Caley coined the term kenogram to refer to “hydrogen suppressed” graphs [1]
and it has been noted that these are used far more often than plerograms [8]. If required, it is
also possible to represent small groupings of atoms as a single vertex, for example using distinct labels for carbons based on the number of hydrogen atoms they are bonded to. Using this
method, ( CH3 ) would become a single vertex as opposed to 4 vertices, and different labels
would be used for ( CH2 ), ( CH ) and ( C ) groups. In the example of D-galacturonic
acid, one could further reduce the number of vertices by using labels to differentiate between
oxygen bonded to a hydrogen and the oxygen molecules bonded to two non-hydrogen atoms.
When reducing the number of vertices in this manner, it is important to carefully consider
which patterns and substructures will be significant.

2.2.3

Subgraphs and Induced Subgraphs

Definition 2.7. A graph G0 is a subgraph of the graph G if V (G0 ) ✓ V (G) and E(G0 ) ✓ E(G)
where for each edge uv 2 E(G0 ), we have u, v 2 V (G0 ).
Consider the graph from Figure 2.1 and two graphs constructed on a subset of V (G).

OTHER TYPES OF GRAPHS

9

b

b

a

b

a
c

a
c

d

e

c

d

d

e

Figure 2.9: The graph G from Figure 2.1 and two smaller graphs, G0 in the center and G00 on
the right.
The graph G0 is not a subgraph of G. This can be proved simply by observing that
vertex c in G0 has degree 4, while (G) =3. Therefore, G0 cannot be a subgraph of G. The
graph G00 , however, is a subgraph of G.
Definition 2.8. A subgraph G0 is an induced subgraph of G if whenever u, v 2 V (G0 ) and
uv 2 E(G) we have uv 2 E(G0 ). The subgraph G0 containing vertices S = V (G0 ) is called the
subgraph of G induced by S and is denoted by G[S].
Of the previous examples, neither is an induced subgraph. G0 not a subgraph, and
therefore cannot be an induced subgraph, while G00 is missing the edges ab and bd. The
subgraphs of G induced by {b, c, d, e}, and {a, c, d, e} are shown below.
b
e4

e3

a
c

c

d
e7

e6
e

e2

e5

e
Figure 2.10: Two induced subgraphs of G.

e6

e5
d

MATRICES

2.2.4

10

Bipartite Graphs

Definition 2.9. A graph or digraph, G = (V, E) is bipartite if the vertices of G can be
partitioned into two sets, A and B, called partite sets, such that every edge in the graph has
one end in A and the other in B.
Bipartite graphs can be drawn in such a way as to make the two partite sets visually
distinct using different shapes or colours for the vertices.
a3

b3
a3

a2

b2

a1

b2

b1

b3

a2
a1

b1

Figure 2.11: Two drawings of a bipartite graph with with partite sets A = {a1 , a2 , a3 } and
B = {b1 , b2 , b3 }.
These types of graphs can be very useful in a number of applications, such as modeling
the association of individuals with groups or affiliations within a social network, or the association between specific genes and types of cancer. In chemical graph theory, bipartite graphs are
used to model reaction networks, where nodes are classified as either a chemical species node
or a reaction node. Chemical species nodes have directed edges towards those reactions that
they are able to participate in. Reaction nodes then have edges towards the products of those
reactions. A very simple example is shown in Figure 2.4. Reaction networks are discussed in
greater detail in Chapter 5.

2.3

Matrices

2.3.1

Adjacency Matrices

A common way to represent a graph G, is as an adjacency matrix. If G has n vertices, then
its adjacency matrix, A is an n ⇥ n matrix with one row and one column for each vertex. The
entry aij is the number of edges that exist between vertex i and vertex j. For example, we
have aij = 1 if there is a single edge between vertex i and vertex j, and if there are no edges
between i and j, then aij = 0.
The adjacency matrix A for a graph G used to represent a chemical molecule will have
the following properties.

MATRICES

11

1. aij = aji (as bonds are undirected).
2. A = AT (hence A is symmetric).
3. The diagonal entries of A are all zero. These are entries aij where i = j.
Example 2.10. Consider the graph G with order 4 shown in Figure 2.12. The set of vertices
is V (G) = {a, b, c, d} and the set of edges is E(G) = {ab, bc, bd}. The adjacency matrix for this
graph is also shown in Figure 2.12. We can see that the edges in E correspond to the non-zero
entries in A.
a
2
a 0
b6
61
c4 0
d 0

c

b

a

b
1
0
1
1

c
0
1
0
0

d
3
0
17
7
05
0

d

Figure 2.12: The graph G and its corresponding adjacency matrix A.
Consider the graph G0 with V (G0 ) = {w, x, y, z} shown in Figure 2.13.
w
w 0
x6
61
y4 1
z 1
2

w

x

y

x
1
0
0
0

y
1
0
0
0

z
3
1
07
7
05
0

z

Figure 2.13: Another graph, G0 with corresponding adjacency matrix B.
We can see that both matrices A and B are symmetric and zero-diagonal, and that both
contain the same number of non-zero entries. We also see a natural correspondence between
the graphs: a ! x, b ! w, c ! y and d ! z. Despite the matrices looking different, we
can obtain B from A by reordering the rows and columns as b, a, c, d. This is the heart of the
“graph isomorphism problem” discussed in great detail in Chapter 4.

MATRICES

2.3.2

12

Other Types of Matrices

Definition 2.11. The identity matrix is a square matrix with 1’s as its diagonal entries and
all other entries equal to 0. This is equivalent to saying that for an identity matrix I = [aij ],
aij = 1 if i = j; otherwise, aij = 0.
By the above definition, each row and column of the identity matrix has exactly one 1.
By re-arranging the rows and columns of the identity matrix, we obtain a matrix that retains
this property, however, the resulting matrix no longer has the condition that aij = 1 if i = j.
We call the process of re-arranging rows and columns in a matrix permuting the rows and
columns.
Definition 2.12. A permutation matrix is an n ⇥ n matrix with exactly one 1 in each row
and each column.
Permutation matrices have the property, that multiplying a matrix A on the left by a
permutation matrix P the rows of A are permuted. When multiplying A by P on the right,
the columns of A are permuted.
More formally, let P be a permutation matrix, and suppose that the unique 1 in row i
is found in column j, which is to say pij = 1 and pik = 0, k 6= j. Let A be an n ⇥ n matrix and
B = P A. Then bik = ⌃nt=1 pit · atk = pij · ajk = ajk , since pij is the only non-zero entry in row
i of P . Thus the ith row of the matrix P A will be the jth row of A. By similar reasoning, P T
has a unique 1 in column i at row j. Hence the ith column of of AP T will be the jth column
of A. We record this as the following observation.
2

j

3

6
7
..
6
7
.
6
7
7
P = i6
60 · · · 0 1 0 · · · 7
6
7
..
4
5
.

2

6
6
6
T
P = j6
60 · · · 0
6
4

Figure 2.14: The structure of P and P T .

i
0
..
.

3

7
7
7
1 0 ··· 7
7
7
..
5
.
0

Observation 1. Let P be a permutation matrix, and A another matrix. If pij = 1, then row
j of P A is the same as row i of A. Further, if pij = 1, then pTji = 1 and column i of AP T =
column j of A.
Furthermore, we can use these together to get the following observation.
Observation 2. Suppose for a permutation matrix P , pij = 1 and pkl = 1. Then the entry
ajl in the matrix A is equal to the entry bik in the matrix B = P AP T .
The following example shows this principle for a 3 ⇥ 3 matrix A. For this particular P ,
we have P = P T ; however, this is not always the case. First we see the effect of multiplying A

MATRICES

13

by P on the left and by P T on the right.
2
32
3 2
3
1 0 0
a b c
a b c
P A = 4 0 0 15 4 d e f 5 = 4 g h i 5
0 1 0
g h i
d e f
2

32
3 2
3
a b c
1 0 0
a c b
AP T = 4d e f 5 40 0 15 = 4d f e 5
g h i
0 1 0
g i h

Now, if both are applied simultaneously, we see a permutation of both rows and
columns.
2
32
32
3 2
3
1 0 0
a b c
1 0 0
a c b
P AP T = 40 0 15 4d e f 5 40 0 15 = 4g i h5
0 1 0
g h i
0 1 0
d f e
Returning to the adjacency matrices in Figure 2.12 and Figure 2.13, we see that the
matrix B can be obtained by multiplying A by a permutation matrix on the left and by its
transpose on the right.
2
3 2
3
0 1 1 1
0 1 0 0
6 1 0 0 07 6 1 0 0 07
T
7 6
7
B=6
41 0 0 05 = 40 0 1 05AP
1 0 0 0
0 0 0 1

When B = P AP T for a permutation matrix P , we say A and B are permutation
similar. In the case of adjacency matrices, this corresponds to relabeling or reordering the
vertices, but it does not change the structure of the graph. Similar and permutation similar
matrices will be discussed in greater detail in Section 4.1.1.
It is also important to note that the transpose of a permutation matrix is also its
inverse. Consider a permutation matrix P = [pij ] and inverse P 1 . By definition of the
inverse, P P 1 = P 1 P = I where I is the identity matrix and, by definition of the transpose,
P T = [pji ]. Suppose pij is the unique 1 in row i of P . If we take the product of P P T , then by
Observation 1 the ith column of P P T is the jth column of P . Since the jth column of P has
its unique 1 in the ith row, the unique 1 in column i of P P T will occur in row i. By this logic,
all the 1’s in P P T will occur as diagonal entries, meaning P P T = I. The same can be shown
for P T P .

Chapter 3

Background in Chemistry
In order to understand the application of graph theory to chemical molecules and reactions,
some basic terminology is required.
Definition 3.1. An atom is the smallest unit of a chemical element.
Some common atoms, particularly in organic compounds are carbon, oxygen, hydrogen,
nitrogen and metals like sodium or magnesium. Atoms are often referenced by their one or
two letter abbreviation from the periodic table. The abbreviations for atoms referenced in this
paper include C for carbon, O for oxygen, and H for hydrogen. Other atoms may have less
obvious abbreviations such as Fe for iron or Na for sodium.
Definition 3.2. Atoms bond with other atoms to form molecules. Molecules can contain
atoms of the same element, or atoms of different elements. Chemical compounds are molecules
which contain more than one type of atom.
The terms molecule and compound are often used interchangeably, as most molecules
are compounds; however, there a number of common molecules that are not compounds such as,
H2 , N2 and O3 . The term chemical species includes molecules, compounds as well as individual
atoms or other chemical complexes that may be involved in a chemical reaction.
Definition 3.3. A functional group is a grouping of atoms within a larger molecule that
exhibits a specific set of chemical properties. Functional groups play an important role in the
reactivity of a molecule. Some common functional groups are listed in the next section.
For the purposes of modeling chemical molecules, it will be useful to think of functional
groups as subgraphs of the larger molecular graph. When thought of in this way, functional
groups may be referred to as patterns. In the context of chemical graph theory, a pattern refers
to the portion of the graph that will undergo a transformation during a particular reaction. A
molecule may contain more than one pattern, and the pattern of interest may depend on the
specific reaction being studied. In order to determine how a molecule will react, it is important
to determine which patterns it contains.
14

COMMON MOLECULES AND FUNCTIONAL GROUPS

3.1

15

Common Molecules and Functional Groups

There are several common functional groups with distinctive properties that are frequently
seen in organic chemistry. Several of these groups are listed in Table 3.1.
Molecular Formula
-CH3
-CH2 CH3
-OH
-COH
6 membered carbon ring

IUPAC name
methyl group
ethyl group
hydroxyl group
aldehyde
phenyl, benzene

Chemical characteristic
small, non-polar, non-reactive
non-polar, non-reactive
polar, reactive
polar, reactive
bulky, non-reactive

Table 3.1: Common Functional Groups
Other important functional groups include ketones, carboxyls and esters. These are
shown below, with dotted lines showing where these groups would connect to the rest of the
molecule.
O
···C

C

O
C···

···C

C

O
O

···C

H

C

O···

Figure 3.1: The structure of (from left to right) a ketone, a carboxyl and an ester.

3.2

R-Group Notation

When referring to chemical structures, it is common to use the letter R to denote any grouping
of molecules where a carbon or hydrogen is directly bonded to the atom shown. Different
R groups are differentiated through using dashes or apostrophes (R, R0 , R00 ). Consider the
following example.
Example 3.4. The molecules shown below including an R group are the general forms of a
carboxylic acid and an alcohol.
O
R

C

O

H

R0

O

H

Figure 3.2: The General Form of a Carboxylic Acid and an Alcohol.
If an alcohol has an R0 group that is a methyl group or an ethyl group, then we have
the specific alcohol methanol or ethanol respectively. The structure of these two alcohols is
shown in Figure 3.3.

DRAWING MOLECULAR STRUCTURE

16

H
H

C

O

H

H

H

H

H

C

C

H

H

O

H

Figure 3.3: Methanol and Ethanol with the R groups shown in the box.
Molecules can also contain multiple function groups, as with ethyl acetoacetate shown
in Figure 3.4 which contains an ethyl group, an ester and a ketone.

H3 C

O

O

C

C
CH2

CH2
O

CH3

Figure 3.4: The structure of ethyl acetoacetate which include an ethyl group, shown in blue
(dashed), an ester shown in red (solid), and a ketone group shown in green (dotted).

3.3

Drawing Molecular Structure

There are several ways in which chemical molecules are commonly shown, three of which are
described in this section.
Figure 3.2 and 3.3 show Kekulé structures, where each atom is shown and bonds between atoms are drawn as lines between the atoms. This type of diagram is very useful for
molecules that contain only a small number of atoms, and make it easy to visualize the arrangement of atoms within the molecules structure.
A similar type of diagram called a line-bond formula or the skeletal formula also uses
lines to represent bonds, however it generally does not explicitly include carbon or hydrogen
atoms. Instead, it is assumed the each bond is connected to a carbon atom unless another atom
is specified. Carbon atoms are also assumed to have a total of 4 bonds, with any bonds not
explicitly shown being to hydrogen atoms. Hydrogen atoms are typically not shown, unless
they are part of a hydroxyl group or they are required for a particular reaction step. Like
Kekulé structures, line-bond formulas clearly show the connectivity and arrangement of atoms
within a molecule. As an example, the Kekulé structure and line-bond formulas for ethanol
are shown in Figure 3.5. The line-bond formula for benzene is shown in Figure 3.6.

DRAWING MOLECULAR STRUCTURE

H

H

H

C

C

H

H

O

17

H

OH

Figure 3.5: The Kekulé and line-bond formulas for ethanol.
When drawing line-bond formulas, benzene rings or phenol groups are typically shown
with a circle in the center to show the delocalization of the pi electrons. Figure 3.6 shows the
line-bond formula of a benzene ring and an ester. Although these models do not have explicit
vertices, they are useful for visualizing large molecules and for visualizing the structure without
C H bonds.
O

O

O
Figure 3.6: Line-bond formulas for benzene and ethyl acetoacetate.
Unlike the previous two methods, condensed formulas simply list the type and quantity
of atoms. In this type of model, most or all of the bonds in a molecule are omitted and the
order in which the atoms are written implies the structure. Parentheses are used to differentiate
substituents from the base structure when needed. Ethyl acetoacetate shown in Figure 3.6 has
the condensed formula: CH3 CHOCH2 CHOOCH2 CH3 . The functional groups in Table 3.1
are shown as condensed formulas.

3.3.1

Corey–Pauling–Koltun (CPK) Colouring

A common way to differentiate atoms in a chemical structure is to use a standard colouring
system for specific atoms, namely the Corey–Pauling–Koltun (CPK) colouring system which
originated from a paper published by Corey and Pauling in 1952 [5]. The colours for some
common atoms are shown in Table 3.2.

DRAWING MOLECULAR STRUCTURE
Atom
Carbon
Oxygen
Hydrogen
Nitrogen
Halogens
Metals

18
Colour
Black
Red
White
Blue
Green
Silver

Table 3.2: CPK colourings for some common atoms.

Chapter 4

Graph Isomorphism
This chapter provides an introduction to the graph and subgraph isomorphism problems, as
well as several algorithms which can be used to solve these types of problems.

4.1

Graph Isomorphism Problem

Let us consider the graphs G and G0 shown in Figure 4.1.
c
w
b
a

d

x

y

z

Figure 4.1: A graph G with V (G) = {a, b, c, d} and a second graph G0 with V (G0 ) = {w, x, y, z}.
In Section 2.3.1, we observed that we could permute the rows and columns of the
adjacency matrix of G to get the adjacency matrix of G0 . This was done through matrix
multiplication and permutation matrices. Just from looking at the structure of the graphs, it
can be observed that there is a mapping from V (G) to V (G0 ) that preserves adjacency and
non-adjacency. That is, ij is an edge in E(G) if and only if (i) (j) is an edge in E(G0 ) where,
(a) ! x
(b) ! w
(c) ! y
(d) ! z.
19

GRAPH ISOMORPHISM PROBLEM

20

This mapping is an example of an isomorphism between the two graphs. As another
example, consider the bipartite graphs from Figure 2.11 (shown below). Visually these seem
very different, but when we examine the connectivity between nodes, it can be seen that
ax , by are adjacent in G, if an only if they are also adjacent in G0 . Therefore, there exists an
isomorphism between these two graphs.
a3

b3
a3

a2

b2

a1

b2

b1

b3

a2
a1

b1

Figure 4.2: Isomorphic bipartite graphs G and G0 with partite sets A = {a1 , a2 , a3 } and
B = {b1 , b2 , b3 }.
Definition 4.1. Let G and H be graphs. A mapping : V (G) ! V (H) is an isomorphism
between G and H if it is a bijection, and preserves the edges and non-edges. That is, for all
u, v 2 V (G), uv 2 E(G) if and only if (u) (v) 2 E(H). We say G and H are isomorphic if
such a mapping exists. For labeled graphs G` = (`, V, E), H` = (`0 , V 0 , E 0 ) an isomorphism
must also preserve labels. That is (`(u)) = `0 ( (u)).
In the context of a chemical molecule, an isomorphism that preserves labels indicates
that the bond connectivity is the same, and so the molecules are identical or stereoisomers of
one another.

4.1.1

Similar and Permutation Similar Matrices

Definition 4.2. Let A and B be n ⇥ n matrices. Then A is similar to B if there exists an
invertible matrix P such that A = P BP 1 .
Recall from Section 2.3.2 that a permutation matrix is an n ⇥ n matrix with exactly
one 1 in each row and each column.
Definition 4.3. If A and B are matrices where there exists a permutation matrix P such that
A = P BP 1 then A is said to be permutation similar to B.
Recall that for every permutation matrix P we have P 1 = P T . In particular, the
inverse of a permutation matrix is a permutation matrix. Hence, if A and B are permutation
similar, then we can write A = P BP T where P is a permutation matrix. An example of this
operation is as follows.

GRAPH ISOMORPHISM PROBLEM

21

Let
2

3
0 1 0
A = 4 1 0 15
0 1 0

and

2

2

3
0 0 1
B = 4 0 0 15 .
1 1 0

3
0 1 0
The permutation matrix P = 40 0 15 can be applied to show that A and B are
1 0 0
permutation similar.
2
32
32
3 2
3
0 1 0
0 0 1
0 0 1
0 1 0
P BP T = 40 0 15 40 0 15 41 0 05 = 41 0 15 = A
1 0 0
1 1 0
0 1 0
0 1 0
Thus A and B are permutation similar.

Proposition 4.4. Permutation similarity is an equivalence relation on n ⇥ n matrices.

Proof. We must show that permutation similarity is reflexive, symmetric and transitive. Since
the identity matrix I contains exactly one 1 in each row and each column, it is a permutation
matrix. Let A be an n ⇥ n matrix A, and I be the n ⇥ n identity matrix. Since IAI 1 = A,
A is permutation similar to itself, and we conclude that permutation similarity is reflexive.
Let A and B be n ⇥ n matrices, where A is permutation similar to B. Then there exists
a permutation matrix P such that
P BP

1

= A () P BP

1

P = AP () P

1

PB = P

1

AP () B = (P

1

)A(P

1

) 1.

Thus, if A is permutation similar to B, then B is permutation similar to A, so permutation similarity is symmetric.
Suppose there are n ⇥ n matrices A, B, C where for some permutation matrices P, Q
where
P AP

1

=B

and

QBQ 1 = C.

Then
C = QBQ 1
1

= Q(P AP
= QP AP

1

)Q 1

Q 1

= (QP )A(QP ) 1

GRAPH ISOMORPHISM PROBLEM

22

Note that QP is a permutation matrix, as permutation matrices are closed under taking
products. Thus A and C are permutation similar, and permutation similarity is transitive.
Therefore, we conclude that the relation is an equivalence relation.
Observe that the above proof works for general matrices P and Q, thus similarity is
also an equivalence relation.

4.1.2

Matrix Representation of an Isomorphism

Let G be a graph on n vertices with adjacency matrix A and G0 be a graph on n vertices with
adjacency matrix B. Both A and B are n ⇥ n matrices. Suppose both A and B have vertices
{1, 2, ..., n}. If : G ! G0 is an isomorphism, then the permutation matrix P with pi (i) = 1
for each i shows B = P AP T , by Observation 2. Conversely, if A and B are permutation
similar, then G and G0 are isomorphic. This, along with Observation 2 gives the following
additional observation.
Observation 3. Two graphs G and G0 are isomorphic if and only if their adjacency matrices
are permutation similar.
Suppose B and C are symmetric matrices and suppose B is permutation similar to C.
Then there is a matrix P such that
B =P CP T = P (P C T )T = P (P C)T .
This reformation is used by Ullmann’s algorithm for the subgraph isomorphism in
Section 4.2. Let G be a graph on n vertices with adjacency matrix A, and G0 be a graph on m
vertices with adjacency matrix B where m  n. Let V (G0 ) = {j1 , j2 , ..., jm } ✓ {1, 2, ..., n} =
V (G). Define S to be the m ⇥ n matrix with row i containing exactly one 1 in column ji . Then
row i of SA is row ji of A. These are the rows of A correspondng to the subgraph G0 . When
A is multiplied on the left by the m ⇥ n matrix S, the result will be an m ⇥ n matrix, and
(SA)T will therefore be an n ⇥ m matrix. Multiplying this again by S on the left, the result
will be an m ⇥ m matrix, which is the adjacency matrix C of G0 . To show a graph G00 with
adjacency matrix B is isomorphic to G0 we find S such that
B = P CP

1

= P (S(SA)T )P

1

= P (SAT S T )P

1

= (P S)AT (S T P T )
= (P S)AT (P S)T
= (P S)((P S)A)T .
To recap, if S is a matrix which contains exactly one 1 in each row, and at most one 1
in each column, the result of the operation S(SA)T is the adjacency matrix of a subgraph of

GRAPH ISOMORPHISM PROBLEM

23

G. Thus, to identify a subgraph of A which is isomorphic to B, requires an m ⇥ m permutation
matrix P and an m ⇥ n matrix S which meets the criteria listed above. That is,
C = P S((P S)A)T
or
C = M (M A)T
where M = P S. Observe M will have exactly one 1 in each row and at most one 1 in each
column, as this is a property of S and is preserved by P .
Consider again the graph from Figure 2.1, and a smaller graph H.
b

x

a
c

y

z

d

e
Figure 4.3: Graph G on the left, and graph H on the right.
Then we have
a b c d e
2
3
a 0 1 1 0 1
7
b6
6 1 0 1 1 07
6
7
c
1
1
0
1
0
A= 6
7
4
d 0 1 1 0 15
e 1 0 0 1 0
the adjacency matrix of G, and
x y z
2
3
x 0 1 0
B = y 4 1 0 15 the adjacency matrix of H.
z 0 1 0
To show that there is an isomorphism between G[{a, d, e}] and H, we first multiply A
by the 3 ⇥ 5 matrix S, with 1’s in the columns corresponding to the vertices a, d and e.

ULLMAN’S ALGORITHM

24
2

0
1 0 0 0 0 6
61
T
4
5
SAS = 0 0 0 1 0 6
61
0 0 0 0 1 40
1
2

3

1
0
1
1
0

1
1
0
1
0

0
1
1
0
1

32
1 1
6
07
760
7
07 6
60
15 4 0
0 0

0
0
0
1
0

3
0
2
3
07
0 0 1
7
4
5
07
7= 0 0 1 =C
5
0
1 1 0
1

We can see that C 6= B, however they are permutation similar as there is a permutation
matrix P such that
2
32
32
3 2
3
1 0 0 0 0 1 1 0 0
0 1 0
P CP T = 40 0 1540 0 1540 0 15 = 41 0 15 = B.
0 1 0 1 1 0 0 1 0
0 1 0
Thus there is an isomorphism between H and the induced subgraph G[{a, d, e}] of G.
If we relax the condition that B = P CP T to simply whenever bij = 1, then the ij
entry of P CP T is also 1, then we may conclude that H is isomorphic to a subgraph of G, but
it may not be induced. Consider
2
3
1 0 0 0 0
S = 40 1 0 0 05.
0 0 1 0 0
Then

while

2

3
0 1 1
C = SAS T = 41 0 15,
1 1 0
2

3
0 1 0
B = 4 1 0 15 .
0 1 0

Definition 4.5. Let B and C be m ⇥ m (0, 1)-matrices. If bij  cij for all i, j, then we say
that C dominates B.

4.2

Ullman’s Algorithm

In this section we describe an algorithm for finding all the isomorphisms between a given graph
H and subgraphs of another graph G. The algorithm uses the adjacency matrices of H and G
to identify possible isomorphisms, and then verify them. Checking all possible mappings from
V (H) into V (G) is computationally expensive, so a neighborhood condition is added to reduce
the number of mappings and improve the efficiency of the algorithm.
For the remainder of this section, we consider a given graph G with adjacency matrix
A and a second graph H with adjacency matrix B where |V (H)|  |V (G)|. Let |V (G)| = m
and |V (H)| = n, so A is a m ⇥ m matrix and B is an n ⇥ n matrix.

ULLMAN’S ALGORITHM

4.2.1

25

The Algorithm

The algorithm first computes an n ⇥ m matrix M where the entries of M are determined by
comparing the degrees of vertices in G with degrees of vertices in H.

mij =

8
>
<1,
>
:

0,

if the sum of entries in column j in matrix A is greater than or equal to the sum
of entries in column i of matrix B;
otherwise.

Observe that the sum of entries in a row or column of an adjacency matrix is equal to
the degree of the corresponding vertex. Therefore mij = 1 means vertex i in H has degree less
than or equal to vertex j in G. This is clearly a necessary condition for vertex i to map to
vertex j under a subgraph isomorphism.
Symbolically,

mij =

(

1
0

if deg(j) deg(i) for j 2 V (G), i 2 V (H)
otherwise

The matrix M may have several 1’s in each row. Taking the matrix M the algorithm
considers every matrix Mk , k = 1, 2, 3, .... generated from M by choosing exactly one 1 in each
row so that there is at most one 1 in each column. The matrix
C = Mk (Mk A)T
is the adjacency matrix of a subgraph G0 of G. If C dominates B, then B is isomorphic to a
subgraph of G0 and the matrix Mk describes this isomorphism.
As an example, let G and H be the following graphs.
c

b

a

d

x

y

e
Figure 4.4: Graphs G and H.

The adjacency matrices of G and H are

z

ULLMAN’S ALGORITHM

a
2
a 0
b6
61
A = c6
60
d4 0
e 0

The corresponding matrix M is

26

b
1
0
1
1
0

c
0
1
0
1
0

d
0
1
1
0
1

e
3
0
07
7
07
7
15
0

and

a
x 1
M = y 40
z 1

b
1
1
1

a
2
x 0
M1 = y 4 0
z 1

b
0
0
0

2

x
2
x 0
B = y41
z 0

c
1
1
1

d
1
1
1

y
1
0
1

z
3
z
1 5.
0

e
3
1
05 .
1

We see that deg(y) = 2 and that deg(a) = deg(e) = 1 therefore the corresponding
entries in M are 0. There are 3 ⇥ 4 ⇥ 3 = 36 choices for Mk , two of which are shown below.

Giving

c
1
0
0

d
0
1
0

2

e
3
0
05
0

3
0 1 0
C = M1 (M1 A)T = 41 0 05
0 0 0

Which maps z ! a, y ! d and x ! c, however, since a and d are not adjacent, but y and z are,
{a, c, d} is not a subgraph isomorphic to H. This can be seen as B dominates C, specifically
because c32 = 0 but b32 = 1.
a
x 0
M2 = y 4 0
z 0
2

b
0
0
1

c
1
0
0

d
0
1
0

e
3
0
05
0

Working through this example in more detail, we get
0
2
3 1T
2
3 2
3 0 1 0 0 0
2
3
C
0 0 1 0 0 B
0 1 1
0 0 1 0 0 6
1 0 1 1 07
B
6
7
C
7C
4
56
4
5
D = M2 (M2 A)T = 40 0 0 1 05B
B 0 0 0 1 0 60 1 0 1 07 C = 1 0 1
@
4
5
A
0 1 0 0 0
1 1 0
0 1 0 0 0 0 1 1 0 1
0 0 0 1 0
The matrix M2 is like M1 except that it maps z ! b. Here we observe that if bij =1,
then dij = 1. In this case, adjacency is preserved and H is isomorphic to a subgraph of G on
{b, c, d}. Note H is not isomorphic to G[{b, c, d}] as b13 = 0 but d13 = 1.

ULLMAN’S ALGORITHM

4.2.2

27

Labeling Condition

A labeling condition can be added to reduce the number of 1’s in the matrix M and therefore
reduce the number of matrices Mk to check for the subgraph isomorphism described above.
The application of a labeling condition is natural for the case of chemical graphs which are
comprised of different types of atoms. When searching for copies of a particular functional
group it is important to take into consideration the type of molecule.

4.2.3

Neighbourhood Condition

Another method of reducing the number of 1’s in the matrix M is to add a neighborhood
condition. The idea behind this condition is to impose the restriction that a vertex in H can
only be matched to a vertex in G if their neighbours are the same. This can be done both in
the context of labeling, and in the context of comparing vertex degrees.
Let be a vertex in V (H) where 1 , 2 , ...., s are the vertices adjacent to , and let
↵ be a vertex in V (G) where ↵1 , ↵2 , ..., ↵t are the vertices adjacent to to ↵ in V (G). If there
exists an isomorphism where maps to ↵, then for each x = 1, 2, ..., s there must be a vertex
↵y that is adjacent to ↵ such that x maps to ↵y .

4.2.4

Example

Consider the following graph G representing a simplified sugar molecule, and the smaller graph
H on 3 vertices. The adjacency matrix B which corresponds to H is shown. The adjacency
matrix A which corresponds to G is an 18 ⇥ 18 matrix, and is not shown. We show how using
a labelling conditions speeds up the search for the functional group H in G.
m
r
g

n
l

a

b

h

c

k
f

d

e

j
q

o
i
p

Figure 4.5: Graph G corresponding to a molecule of D-Galaturonic acid.

ULLMAN’S ALGORITHM

28

x

z

y

x
2
x 0
B= y 4 1
z 0

y
1
0
1

z
3
0
15
0

Figure 4.6: The graph H corresponding to a hydroxyl functional group, and its adjacency
matrix B.
The Matrix M
We begin by defining the matrix M using the conditions set by Ullman’s algorithm. Since
deg(y) = 2, the zero entries in the second row of the matrix M correspond to those vertices
in G with degree less than 2. From this matrix we can see that there are 13 ⇥ 17 ⇥ 16 = 3536
ways to choose exactly one 1 from each row, each from a distinct column, and therefore there
would be many possible isomorphic mappings to be checked.
a b c d e f g h i j k l m n o p q r
2
3
x 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
M = y 4 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 05
z 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Applying the Labeling Condition

Since G represents a sugar molecule, and H represents a hydroxyl group, the vertices can be
labeled according to the type of atom they represent. Both G and H use the set of labels
{C, O, H}. For this example, the subscripts are only shown for ease of reference and identification. Vertices labeled with the same letter are considered to have the same label regardless
of subscript.

ULLMAN’S ALGORITHM

29
O7
H
O1

H
O6

C

C

O2

C

O5
C

C

C

O4

H

H
O3
H

Figure 4.7: Graph G corresponding to a molecule of D-Galaturonic acid with labels assigned
based on type of atom.
When H is labeled to represent a hydroxy group attached to a carbon atom, we get
the following graph and adjacency matrix B`
C

H

O

C O H
3
C 0 1 0
B` = O 4 1 0 1 5
H 0 1 0
2

Figure 4.8: The graph H corresponding to a hydroxyl functional group, and its adjacency
matrix B.
The labeling condition requires that mij = 1 only if `(i) = `(j). The resulting matrix
M` by adding the labeling condition to the original matrix M is shown below.
C C 2 C 3 C 4 C 5 C 6 O 1 O 2 O 3 O 4 O 5 O 6 O 7 H1 H2 H3 H 4 H5
2 1
3
C 1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
O4 0
0
0
0
0
0
1
1
1
1
1
1
1
0
0
0
0
05
H 0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1

We can see that in M` , the only non-zero entries occur when the column and row labels
indicate the same type of atom. Adding this condition allows us to reduce the number ways
to choose exactly one 1 in each row, and therefore reduce the number of possible mappings to
6 ⇥ 7 ⇥ 5 = 210.

THE GEMINI AND SUBGEMINI ALGORITHMS

30

Applying the Neighbourhood Condition
The neighborhood condition allows us to take the labelling condition one step further. If
we take the matrix M` from the previous step, we can see from examining Figure 4.7 that
the columns corresponding to hydrogen atoms and carbons will not be affected by this new
condition. This is because, the only condition for both hydrogen atoms and carbon atoms is
that they each be adjacent to an oxygen atom. However, if we examine the vertices labeled as
oxygens, we can see that not all oxygen atoms in G are adjacent to a carbon and a hydrogen
as in H. It is therefore possible to change the second row entries for O1 and O7 from 1 to 0.
There are now 6 ⇥ 5 ⇥ 5 = 150 mappings to check.
C C 2 C 3 C 4 C 5 C 6 O 1 O 2 O 3 O 4 O 5 O 6 O 7 H1 H2 H3 H 4 H5
2 1
3
C 1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
O4 0
0
0
0
0
0
0
1
1
1
1
1
0
0
0
0
0
05
H 0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1

4.3

The Gemini and SubGemini Algorithms

The SubGemini Algorithm [10] was created to determine whether a larger circuit contained
copies of a smaller circuit. For this algorithm, circuits were represented as graphs, where
the vertices represented devices and nets, which were connected by edges. Devices and nets
were labeled based on type as well as degree. The algorithm is based on the Gemini and
GeminiII algorithms [7] which were both designed to find isomorphisms between graphs of the
same order. This was done through partitioning the vertices into sets, following the principle
outline by [6]. The labeling associated with chemical graphs make them very well suited to
this particular algorithm.

4.3.1

Gemini and GeminiII

The Gemini and GeminiII algorithms can simply begin by partitioning the vertices based on
the original label of each vertex. Recall that partitioning a set involves grouping elements of
the set into non-empty subsets where each element of the set appears in exactly one of the
subsets.
The Gemini algorithm, which was developed for use on electric circuits represented by
bipartite graphs, starts by partitioning vertices into partite sets. Chemical graphs are rarely
bipartite, however, the natural first step is to partition vertices based on the type of atom they
represent.
Next, each vertex is re-labeled to include information about its neighbours. For a
current label Lo , we obtain a new label L+ , where + denotes concatenation, by setting
X
L+ = Lo +
c i Li
(4.1)

THE GEMINI AND SUBGEMINI ALGORITHMS

31

where Li is the original label of the neighbour node, and ci is the number of bonds, or edges,
between the nodes. For example, the oxygen atom in water (Figure 2.5) would have the label
O1H1H, while a carbon double bonded to another carbon and attached to two hydrogen atoms
would have the label C2C1H1H. The labeling conventions could follow the IUPAC standard
for priority of atoms and functional groups.
This relabeling procedure can then be applied to neighbours of neighbours until each
partition contains only a single element. From there, the algorithm would look for an isomorphic mapping between the partitions of one graph and the partitions of the other. If such a
mapping exists, then the two graphs are isomorphic.
There are some exceptional cases where two vertices may remain identical through this
relabeling process. Some examples are shown below. In these cases, molecules possess at least
one plane of symmetry.
H
H

H
C

C

C
H

H
C

H

H

H

Figure 4.9: The structure of cis-2-butene on the left, and a benzene ring on the right.
Example 4.6. Benzene and cis-2-butene (Figure 4.9) are examples of molecules that contain
planes of symmetry. Applying the partitioning algorithm to cis-2-butene would not be able to
differentiate between the two central carbons and the two terminal carbons. In the benzene
ring, all the carbon atoms would receive the same label and thus be indistinguishable for the
algorithm.
Definition 4.7. H-NMR or proton nuclear magnetic resonance is a form of spectroscopy
which can be used to determine the structure of a molecule. The results of H NMR show
peaks, whose position and number indicate the arrangement of hydrogen atoms. Hydrogen
atoms in the same partition after applying the labeling procedure described above, show a
peak at the same position.
If we take the structure of cis-2-butene and consider only the hydrogen atoms, we can
see that each of the hydrogen atoms shown in blue would receive the same label, as would the
two shown in red. For example, the red hydrogen atoms would receive the label H1C2C1C....
while the blue hydrogen atoms would receive the label H1C1C1H1H....

THE GEMINI AND SUBGEMINI ALGORITHMS
H
H

H
C

C

C
H

32

H
C

H

H

H

Figure 4.10: The partitioning of hydrogen atoms in a molecule of cis-2-butene.
It is also possible to determine the structure of a molecule from one of the labels.
Choosing one of the red hydrogen atoms, and applying the labeling algorithm, the label would
be created according to the following sequence.
H, H1C, H1C2C1C, H1C2C1C1C1H1H1H1H, H1C2C1C1C1H1H1H1H1H1H1H
Given a label, we can reconstruct a graph using information about the types of atoms.
We begin with a hydrogen which can form only one bond, in this case to a carbon. The carbon
atom forms 4 bonds, which in this case includes a double bond to another carbon. The full
process for this reconstruction is shown below.

THE GEMINI AND SUBGEMINI ALGORITHMS

33

H

H
C
H
C

C

C
H

H

H

C

C

C

C

H

H
H

H

H
C

C

C
H

H
C

H

H

H

Figure 4.11: Determining the structure of cis-2-butene from one of the labels.
Vertices with identical labels are in the same orbit under the automorphism group of a
graph G. More information about group actions on a set, and orbits can be found in the text
by Bondy and Murty [2].
In the case of cis-2-butene, there are two distinct partite sets containing hydrogen
atoms, and so we would expect to see two distinct peaks on an H-NMR spectrum for cis-2benzene, as is shown in Figure 4.12.

THE GEMINI AND SUBGEMINI ALGORITHMS

34

Figure 4.12: An H-NMR of 2-cis-butene showing two distinct peaks [4].
The first peak indicates the two red hydrogen atoms, and the taller peak indicates the
six blue hydrogen atoms. The splitting of the peaks indicates the number of nearby, nonequivalent hydrogen atoms and gives further information about the structure. Most chemical
molecules will have more than one equivalent hydrogen meaning they will always have the
same labels. For this reason, the Gemini algorithm would require a condition that hydrogen
atoms do not need to be in singleton partitions, or would require equivalent hydrogen atoms
to be omitted altogether.
Example: Propene Alcohol Isomers
In chemistry, the term isomer is used to refer to two molecules with the same number of each
type of atom, but with different bond connectivity. There are different types of isomers such as
stereoisomers and structural isomers with different characteristics and properties. An example
of a pair of isomers is propen-2-ol shown in Figure 4.13 and 2-propen-1-ol shown in Figure
4.14. Both of these species contain 1 oxygen, 3 carbon and 6 hydrogen atoms. We will use the
Gemini algorithm to show that these two structural isomers are not isomorphic.
For this example, each labeling step will use the order O, C, H and in the case of two

THE GEMINI AND SUBGEMINI ALGORITHMS

35

atoms with the same initial label, the atoms connected by multiple bonds will be listed first.
g

d

e
a

h

b

c

i

f
j
Figure 4.13: A graph representation of propen-2-ol using CPK colouring.
We begin by labeling each vertex from the set {C, O, H} based on the atom type.
C ={a, b, c}

O ={d}

H ={e, f, g, h, i, j}

The partition of vertices with the label O contains only one vertex, so it does not require
further labeling. Furthermore, we will omit the hydrogen atoms for the next labeling step,
as both vertex a and vertex c clearly show equivalent hydrogen atoms. Next, we re-label the
remaining carbons based on those atoms immediately adjacent.
C2C1H1H ={a}

C1O2C1C ={b}

C1C1H1H1H ={c}

Since each vertex has a unique label, we stop.
u
q

t
r

v

x
s

y

w

z

Figure 4.14: A graph representation of 2-propen-1-ol shown using CPK colouring.
As we did with the first molecule, we begin by partitioning based on the type of atom.
O ={y}

C ={q, r, s}

H ={t, u, v, w, x, z}

Once again the oxygen results in a singleton partition. Omitting the hydrogen, we
proceed with the labeling process.

THE GEMINI AND SUBGEMINI ALGORITHMS

C2C1H1H ={q}

C2C1C1H ={r}

36

C1O1C2H ={s}

This result shows that {d} could be mapped to {y} and {a} could be mapped to {q},
however the remaining vertices have different labels and so have different bond connectivity.
Therefore, we conclude that the two isomers are not isomorphic.

4.3.2

SubGemini

Although the Gemini algorithm can identify isomorphisms between two graphs with the same
number of vertices, the method of identifying an isomorphism between partite sets containing
single vertices does not work for graphs of different order. The goal of the SubGemini algorithm
is therefore to identify a key vertex in the smaller graph and use that to create a candidate
vector of all possible vertices in the larger graph that it could map to. To visualize this,
consider a large organic molecule containing mostly carbon and hydrogen atoms, with a small
number of oxygen and other atoms as well. If the pattern of interest contained an oxygen
molecule, the SubGemini algorithm would identify the oxygen as the key vertex, and then
generate a vector of all oxygen molecules with the same adjacencies within the larger molecule.
This approach would be more effective than checking every possible mapping that included a
carbon or a hydrogen atom which are much more numerous.
The process of identifying the key vertex uses the same basic principle as the Gemini
algorithm, but with the important consideration that there will be one or more vertices in the
smaller graph that will be mapped to a vertex in the larger graph with more edges. More
specifically, let G and H be connected graphs such that |V (G)| |V (H)| and suppose that an
isomorphism exists between H and the subgraph G0 of G. Then there is at least one vertex
in G0 that is connected to vertices in G that are not in G0 . Subgemini accounts for this type
of vertex by marking it as “corrupted” meaning that it does not undergo further labeling.
These corrupted vertices can not be used as the key vertex, and if the next iteration of the
labeling process for a particular vertex v would include a corrupted vertex, the labeling process
terminates for v. At this point, the label for vertex v is complete.
The outline of the algorithm described by Ohrich, Ebeling, Ginting and Sather [10] is
as follows.
• Use the labeling procedure to partition the vertices in H until all labels are complete
and the parts contain only a single vertex.
• Label the vertices in G using the same number of iterations used to label the vertices in
H.
• Choose the partition of G with the fewest elements to be the candidate vector. The
vertex in H with the corresponding label is the key vertex K.
• If there is more than one partition of V (G) with the smallest number of elements, K can
be selected arbitrarily.

THE GEMINI AND SUBGEMINI ALGORITHMS

37

• Check for isomorphisms between H and subgraphs of G containing vertices in the candidate vector.
In some cases, identifying the key vertex in the smaller graph may be obvious just
from the initial label; however, the labeling process is still valuable as it can further reduce
the size of the candidate vector based on those vertices adjacent to it as is shown the following
example.
Example - Ethyl Acetoacetate
O2

O3

C
C

O
C

C
C

C1

C

O1

C

C

X

X
C2

O4
C

C
C

N+

O5
Figure 4.15: 4-Nitrobenzyl Acetoacetate and the carbonyl pattern. The dashed lines and
vertices labeled X are corrupted vertices. The subscripts are shown for ease of identification.
Key Vertex
To begin, the labeling process is applied to the carbonyl pattern using initial labels only, giving
{O} and {C1 , C2 }. For the next step, the oxygen would receive the label O2C, however both
carbons are corrupted at this step and so do not receive labels. Although it was possible to
identify oxygen as the key vertex based solely on the initial labels, it is important to proceed
with the labeling to reduce the number of vertices in the subsequent candidate vector.
Candidate Vector
Since the vertices in the candidate vector must have labels that begin with O, we will only
consider the vertices in 4-nitrobenzyl acetoacetate which correspond to oxygen atoms. After
the initial partitioning we have the partition.
{O1 , O2 , O3 , O4 , O5 }

Had we not fully labeled the key vertex, we would need to check for isomorphisms involving
all of these oxygen atoms, however the additional labeling step allows us to eliminate some of

THE GEMINI AND SUBGEMINI ALGORITHMS

38

these vertices. The next labeling iteration give us the following partitions.
O2N ={O4 }

O1N ={O5 }

O1C1C ={O1 }

O2C ={O2 , O3 }

So our final candidate vector, {O2 , O3 } contains only 2 vertices as opposed to 5. From
this it can be determined that there are multiple subgraph isomorphisms between 4-nitrobenzyl
acetoacetate and the carbonyl pattern.

Chapter 5

Reaction Networks
Reactions networks are typically modeled in one of two ways. In a chemical reaction network,
or CRN, the vertices represent the chemical species, and the edges represent the reactions
occurring between them [14]. In a reaction route network, or RRN, a system is modeled as a
bipartite digraph, where chemical species and chemical reactions are modeled as partite sets
of nodes. In this model, species nodes are only adjacent to reaction nodes and vice versa.
Species 1

Species 2

Reaction 1
Species 1

Species 2

Intermediate 1

Intermediate 1
Product 1

Product 2

Reaction 2

Reaction 3

Product 1

Product 2

Figure 5.1: The same simple chemical system modeled as a CRN and as an RRN.
There are benefits to both models, depending on the specifics of the reaction and system
being modeled.

5.1

Orlova’s Algorithm

In this section we present the algorithm from Orlova’s thesis [11] which also appears in the
paper [12]. The algorithm overview is as follows.

39

REACTION NETWORK FOR ETHYL ACETOACETATE

40

Algorithm 1: Algorithm
input: list of initial chemical species and complete list of reaction
rules/transformations
1 update = true;
2 while update=true do
3
update=false;
4
for 1  i  # of reaction rules do
5
Find all candidates to undergo reaction i from the current list of chemical
species
6
Apply the transformation to create products
7
end
8
if A product is not in the list of chemical species then
9
Add the product or products to the list
10
update=true;
11
end
12 end

5.1.1

Algorithm Components

Line 5: To find candidates for a particular reaction, the algorithm searches the current list
of chemical species to determine which species have the requisite pattern or patterns. This is
done by employing Ullmann’s algorithm to identify isomorphisms between the required pattern
and subgraphs of the molecular graphs on specific vertices.
Line 8: To verify if any of the species generated by the previous step have already been
included in the list of chemical species the algorithm checks for graph isomorphism between
the new and existing chemical molecules.
Line 6: Pre-determined reaction rules and transformations are applied to reaction candidates
by replacing or altering portions of the reacting species. An example of such a transformation
is shown below.

5.2

Reaction Network for Ethyl Acetoacetate

This section details an example of a reaction network constructed using the algorithm in Section
5.1. A complete list of the patterns and reaction rules is shown, as well as the reaction network
that is generated.

5.2.1

Input: Initial Species

For this example, we start with only two chemical species, ethyl acetoacetate and methanol.
The structure of these two molecules is shown in Figure 5.2.

REACTION NETWORK FOR ETHYL ACETOACETATE
O

O

C

C

H3 C

CH2

41

CH2
O

CH3

H3 C

O

H

Figure 5.2: The structure of ethyl acetoacetate and methanol.

5.2.2

Input: Patterns

In order to create the list of reaction rules, we must identify the patterns capable of undergoing
reactions in this system. Some molecules, like the initial species ethyl acetoacetate, may have
more than one pattern and so be a candidate species for more than one reaction.
Pattern 1. Ketone - Keto Form
O
R

C

R0

C

Pattern 2. Alcohol
R

CH2

O

H

Pattern 3. Ketone - Enol Form
O
R

C

R0

C

Pattern 4. Acetoacetate Enolate
O

O

C

C

R

C

Pattern 5. Carbon Dioxide
O

O
C

Pattern 6. Carbon Chain
CH3 (CH2 )n CH3

R0
O

REACTION NETWORK FOR ETHYL ACETOACETATE

42

Pattern 7. Acetone Enolate
O
H2 C

C

CH3

Pattern 8. Diacetone Alcohol
···

O

O

C

C

···

C

H

H

C···

Pattern 9. Water

O
Pattern 10. Ester
O
R0

C
R

O

Pattern 11. Carboxylic Acid
O
C
R

5.2.3

H
O

Input: Reaction Rules

From the patterns we generate a list of reaction rules. The Reaction number and the patterns
involved in each reaction are summarized in Table 5.1. Colours are used in each reaction to
show the parts of each pattern that are undergoing transformation.
Reaction 1. Keto-Enol Tautomerization
O
R

C

O
R0

Reaction 2. Decarboxylation

!

R

C

R0

REACTION NETWORK FOR ETHYL ACETOACETATE

43

O

O

O
C

R0

C

R

C

!

O

R

C

+

O

O

R 0 + H2 C

C

CH3

O

R

O

C

C

CH3 !

C

C

C
R00

!

R0

O

C

C

R

R

R0 + HO

R00

R00

O
Reaction
1
2
3
4
5
6

R00

C

↵

R
O
Reaction 6. Ester Exchange (Transesterification)
O
C

+H

H
O

O

R0 + HO

C

CH3

R0

Reaction 4. Aldol Condensation
O
O
R

C
R0
Reaction 5. Alcoholysis
O

CH3 (CH2 )n CH3

O

C

C

+

C

Reaction 3. Aldol Addition
O
R

C

C
R00

R

R00 + HO

R0

O
O

↵

C
R

Reactants
Pattern 1
Pattern 4
Pattern 1 + Pattern 7
Pattern 8
Pattern 3 + Pattern 2
Pattern 10 + Pattern 2

R0 + HO

O

Products
Pattern 3
Pattern 1 + Pattern 5 + Pattern 6
Pattern 8
Pattern 1 + Pattern 9
Pattern 1
Pattern 2 + Pattern 10

Table 5.1: Summary of Reactions.

REACTION NETWORK FOR ETHYL ACETOACETATE

44

Example 5.1. The ester exchange transformation between ethyl acetate and methanol, and
the resulting products.
One of the reactions that occurs as a part of this chemical network is an ester exchange
between ethyl acetate and methanol, shown below. In Line 2 of the algorithm, Ullman’s
algorithm would identify pattern 2 in methanol and pattern 10 in ethyl acetate. Both species
are shown below.
O
CH2

C
H3 C

O

CH3

+

HO

CH3

The R groups of each molecule are highlighted in their corresponding adjacency matrices shown below. The adjacency matrix for methanol has the R group highlighted in blue
and the atom where it attaches to the pattern is highlighted in green.
C C C C O O
2
3
C 0 1 0 0 0 0
C O H
7
2
3
C6
61 0 0 0 2 17
C
0 1 0
6
7
C6 0 0 0 1 0 1 7
O4 1 0 1 5
7
C6
60 0 1 0 0 07
H 0 1 0
O4 0 2 0 0 0 0 5
O 0 1 1 0 0 0
In order to more easily visualize the transformation for ethyl acetate, we can permute
the rows and columns of the adjacency matrix to move the submatrix corresponding to the R
group to the top left hand corner of the matrix. Note that the only other non zero entry in the
first two rows and columns is the atom where that group connects to the rest of the molecule.
This entry is highlighted in green.
C C C C O O
2
3
C 0 1 0 0 0 1
C O H
7
2
3
C6
61 0 0 0 0 07
C 0 1 0
7
C6
0
0
0
1
0
0
6
7
O4 1 0 1 5
7
C6
0
0
1
0
2
1
6
7
H 0 1 0
O4 0 0 0 2 0 0 5
O 1 0 0 1 0 0
After the candidate species have been identified, the transformation is applied. From
the list of reaction rules, we know that the reaction results in the R groups of the two molecules
being exchanged, resulting in the products shown below.

REACTION NETWORK FOR ETHYL ACETOACETATE

45

O
CH3

C
H3 C

C C C O O
3
C 0 0 0 0 1
7
C6
60 0 1 0 07
7
C6
0
1
0
2
1
6
7
4
O 0 0 2 0 05
O 1 0 1 0 0
2

5.2.4

O

+

HO

CH2

CH3

C C O H
2
3
C 0 1 1 0
7
C6
61 0 0 0 7
O4 1 0 0 1 5
H 0 0 1 0

Network

This reaction network is bipartite, with the chemical species shown in round nodes, while the
reaction nodes are shown in green boxes. The input species, ethyl acetoacetate and methanol,
are shown in purple, while the final nonreactive products are shown in red. The intermediate
species, which are those chemical species that are generated by the algorithm but also having
patterns which allow them to react further, are those species shown in the black circles.
A few reactions have been omitted from this diagram because their products are already
a part of the network. For instance, methyl acetoacetate is a candidate for Reaction 6 with
methanol, however the products of that reaction would be methyl acetoacetate and methanol
which are already in the network.

REACTION NETWORK FOR ETHYL ACETOACETATE
Ethyl Acetoacetate

46

Methanol

R1
Ethyl Acetoacetate Enolate
R6

R5

Methyl Acetoacetate

Methyl Acetate

R12
Methyl Acetoacetate Enolate
R21

R2
Ethane

Acetone

CO2

Methane

R11
Acetone Enolate
R3
Diacetone Alcohol
R4
H2 O

Mesityl Oxide

Figure 5.3: The reaction network generated by Orlova’s Algorithm generated from ethyl acetoacetate and methanol.

REACTION NETWORK FOR ETHYL ACETOACETATE

5.2.5

47

Output: List of Chemical Species

Intermediate Species
Species Name
Ethyl Acetoacetate Enolate
Methyl Acetoacetate
Methyl Acetoacetate Enolate
Acetone
Acetone Enolate
Diacetone Alcohol

Patterns
Pattern 3, Pattern 4
Pattern 1
Pattern 7, Pattern 4
Pattern 1
Pattern 7
Pattern 8

Condensed Formula
CH3 C(OH) = CHCOOC2 H5
CH3 COCH2 COOCH3
CH3 COCH = C(O )CH3
(CH3 )2 CO
CH2 C(O)CH2
CH3 C(O)CH2 C(OH)(CH3 )2

Table 5.2: The reaction intermediates from the reaction network for ethyl acetoacetate and
methanol.
Non-Reactive Final Products
Species Name
Methyl Acetate*
CO2
Ethane
Methane
H2 O
Mesityl Oxide * *

Patterns
Pattern 1
Pattern 5
Pattern 6
Pattern 6
Pattern 9
Pattern 1

Condensed Formula
CH3 COOCH3
CO2
CH3 CH3
CH4
H2 O
(CH3 )2 C = CHCOCH3

Table 5.3: Final products of the reaction network for ethyl acetoacetate and methanol.
*Methyl acetate can undergo Reaction 6 with methanol, however the products of that reaction
will be the same as the inputs.
** Mesityl Oxide is a candidate for Reaction 1, however, its enol form does not react
further in this network.

Chapter 6

Conclusion
Chemical graph theory is a field that requires an understanding of both graph theory and
chemistry. Questions that are studied in one area, often have parallels to topics of interest in
the other. An example of such a parallel is the subgraph isomorphism problem discussed in
this paper. The subgraph isomorphism problem is directly related to chemistry, on account of
the importance of functional groups in chemical reactions. Some features of chemical graphs,
such as the natural partitioning of vertices into groupings based on atom type, make the
algorithms used in graph theory particularily effective when applied to chemical graphs. The
algorithms described in this paper focus primarily on the structural characteristics of chemical
species, however there are applications of these concepts to other aspects of chemistry including
reaction rates and kinetics.
There may also be improvements to consider for the reaction network algorithm described by Orlova [11]. For instance, the current algorithm does not track which reactions have
already been considered, and so each time a new chemical species is added to the list, each
reaction rule will be applied to all species with the requisite patterns, even those that have
already been checked. It would be possible to generate a list of species that have already had
specific transformations applied so that when a new species is added, only those species that
have not reacted with one another will be considered. It is possible, however, that the size of
most reaction networks mean that improvements like this do not significantly improve time or
efficiency of the algorithm.
Graph theory provides many useful tools and techniques which can be used to effectively
model molecular structure and chemical reactions. A few of these techniques were discussed in
this paper, however there are many more applications to many different areas of chemistry. The
algorithms described in this paper focus primarily on the structural characteristics of chemical
species, however there are also applications of these concepts to areas such as reaction rates
and reaction kinetics, and the physical and chemical properties of specific molecules.

48

Bibliography
[1] N. Biggs, E.K. Lloyd, and R.J. Wilson. Graph Theory, 1736-1936. Clarendon Press, 1986.
isbn: 9780198539162. url: https://books.google.ca/books?id=XqYTk0sXmpoC.
[2] J. A. Bondy and U. S. R. Murty. Graph Theory With Applications. Elsevier Science
Publishing Co., 1982.
[3] Arthur Cayley. “On the mathematical theory of isomers”. In: The Collected Mathematical
Papers. Cambridge Library Collection - Mathematics. Cambridge University Press, 2009,
pp. 202–204.
[4] 2-BUTENE(624-64-6) 1H NMR. Chemical Book, CAS DataBase List. url: https://
www.chemicalbook.com/SpectrumEN_590-18-1_1HNMR.htm.
[5] Robert B. Corey and Linus Pauling. “Molecular Models of Amino Acids, Peptides, and
Proteins”. In: Review of Scientific Instruments 24.8 (Aug. 1953), pp. 621–627. issn: 00346748. doi: 10.1063/1.1770803. eprint: https://pubs.aip.org/aip/rsi/articlepdf/24/8/621/19099249/621\_1\_online.pdf. url: https://doi.org/10.1063/1.
1770803.
[6] D.G. Corneil and C.C. Gotlieb. “An Efficient Algorithm for Graph Isomorphism”. In:
Journal of the Association for Computing Machinery (1970).
[7] C. Ebeling. “GeminiII: a second generation layout validation program”. In: [1988] IEEE
International Conference on Computer-Aided Design (ICCAD-89) Digest of Technical
Papers. 1988, pp. 322–325. doi: 10.1109/ICCAD.1988.122520.
[8] Ivan Gutman et al. “Why plerograms are not used in chemical graph theory? The case
of terminal-Wiener index”. In: Chemical Physics Letters 568-569 (2013), pp. 195–197.
issn: 0009-2614. doi: https : / / doi . org / 10 . 1016 / j . cplett . 2013 . 03 . 045. url:
https://www.sciencedirect.com/science/article/pii/S0009261413003771.
[9] Compiled by Alan D. McNaught IUPAC and Andrew Wilkinson. Compendium of Chemical Terminology: IUPAC Recommendations 2nd ed. Blackwell Scientific Publications,
2019.

49

BIBLIOGRAPHY

50

[10] Miles Ohlrich et al. “SubGemini: identifying subcircuits using a fast subgraph isomorphism algorithm”. In: Proceedings of the 30th International Design Automation Conference. DAC ’93. Dallas, Texas, USA: Association for Computing Machinery, 1993, pp. 31–
37. isbn: 0897915771. doi: 10.1145/157485.164556. url: https://doi.org/10.1145/
157485.164556.
[11] Yuliia Orlova. “Graph-theoretical approach to algorithmic construction of complex reaction networks”. PhD thesis. Sept. 2020.
[12] Yuliia Orlova, Ivan Kryven, and Piet D. Iedema. “Automated reaction generation for
polymer networks”. In: Computers & Chemical Engineering 112 (2018), pp. 37–47. issn:
0098-1354. doi: https : / / doi . org / 10 . 1016 / j . compchemeng . 2018 . 01 . 022. url:
https://www.sciencedirect.com/science/article/pii/S0098135418300462.
[13] J. R. Ullmann. “An Algorithm for Subgraph Isomorphism”. In: J. ACM 23.1 (Jan. 1976),
pp. 31–42. issn: 0004-5411. doi: 10.1145/321921.321925. url: https://doi.org/10.
1145/321921.321925.
[14] J.J.P. Veerman, T. Whalen-Wagner, and Ewan Kummel. “Chemical reaction networks
in a Laplacian framework”. In: Chaos, Solitons Fractals 166 (2023), p. 112859. issn:
0960-0779. doi: https : / / doi . org / 10 . 1016 / j . chaos . 2022 . 112859. url: https :
//www.sciencedirect.com/science/article/pii/S0960077922010384.