Enhancing Oracle Database Performance via Access-Pattern-Based Data Archiving

Undergraduate Research Experience Award (UREAP)

Roods Bensly Pierre
Supervised by Dr. Mridula Sharma and Dr. Vijal Jain

May 2024

Contents
1 Introduction
2 Literature Review
3 Research Methodology
  3.1 Data Access Monitoring
    3.1.1 How is accessed data identified?
  3.2 Archiving Process
  3.3 Performance Evaluation
4 Results
5 Discussion
6 Conclusion
7 Future Work

Abstract

Database management is a complex process in which large volumes of data must eventually be archived. In today's data-intensive landscape, efficient management of stale data poses a significant challenge for organizations, and traditional data archiving approaches rely on static rules or manual intervention. This paper presents the findings of a research project that addresses this gap in the domain of database management, with a focus on Oracle databases. Through the implementation of a dynamic data archiving solution based on access patterns and periodic processes, this research demonstrates tangible improvements in database performance and storage efficiency while minimizing manual intervention. Drawing on insights from existing literature and employing an interdisciplinary approach, the findings underscore the transformative potential of access-pattern-based data archiving in Oracle databases and contribute to advancing the discourse on database management practices.

1 Introduction

The exponential growth of data in contemporary organizational environments has led to an urgent need for efficient data management strategies. Among the myriad challenges faced by enterprises, the management of stale data within databases emerges as a critical bottleneck, particularly in the context of Oracle databases. Stale data, which remains dormant or unused over time, not only occupies valuable storage space but also hampers database performance, leading to increased query response times and decreased system efficiency.

This research project addresses this issue by introducing a data archiving solution tailored specifically to Oracle databases. Unlike conventional archiving approaches that rely on static rules or manual intervention, the proposed solution leverages access patterns and periodic processes to dynamically identify and archive stale data. By aligning the notion of "accessed" data with user query results and implementing a scheduled archiving process, we aim to optimize database performance and storage efficiency while minimizing manual intervention.

Drawing inspiration from a diverse body of literature, including works on database archiving, preservation formats, standardized archival techniques, and data cleaning, our research adopts an interdisciplinary approach. By synthesizing insights from these domains, we seek to push the boundaries of traditional database management practices and pave the way for more agile and responsive data archiving methodologies. The significance of this research extends beyond theoretical contributions to directly address real-world database performance challenges.
Through the optimization of data storage, reduction of retrieval times, and enhancement of database performance, our approach promises to change how organizations manage their data, bringing greater efficiency and effectiveness to database management.

2 Literature Review

The domain of database management confronts a significant challenge as organizations grapple with ever-expanding volumes of data. While traditional optimization strategies such as database partitioning (Smith, 2019) and data compression (Johnson & Brown, 2020) offer valuable insights, a critical gap remains in the dynamic archiving of stale data. Limited research has ventured into dynamically identifying and archiving stale data within Oracle databases. Existing approaches often rely on manual or rule-based archiving and lack adaptability to dynamic database environments, which underscores the need for proactive, user-driven data archiving solutions.

Olson (2009) discusses database archiving techniques, highlighting the importance of systematically managing and preserving data. Cha, Choi, and Lee (2015) describe the development of preservation formats and archiving tools for long-term data preservation, emphasizing the need for standardized practices. Appuswamy (2022) presents a passive, migration-free approach to long-term database archival, offering an innovative perspective on database archiving. Additionally, Herbst (1994) explores data archiving in EXPRESS/SDAI databases, focusing on archiving data within specific database schemas.

In the context of data cleaning and optimization, Milani, Zheng, and Chiang (2019) introduce "CurrentClean", a spatio-temporal data cleaning method that addresses the management of stale data and sheds light on the significance of stale data in database management. Williams, Li, and Olfman (2021) present DRT, a novel tool for data archiving, showcasing innovative approaches to data management and archiving. This interdisciplinary body of work forms the basis of our data archiving solution and positions this research to contribute to the ongoing conversation in the field of database management.

3 Research Methodology

The research methodology integrates principles from these sources to address the research objectives. The study is structured into three main stages: Data Access Monitoring, Data Archiving, and Performance Evaluation. Each stage contributes to a comprehensive understanding of data usage patterns, storage optimization, and performance impacts, and each is crucial to the overall goal of optimizing database performance and storage efficiency.

3.1 Data Access Monitoring

The main aim of data access monitoring is to identify which data is being accessed. A script establishes a connection to the Oracle database and monitors data access. For a given monitored table, an algorithm is executed periodically to scan the database audit trail (DBA_AUDIT_TRAIL) and extract the SQL_TEXT and timestamp of every SELECT statement executed against the table. These statements are then stored in a temporary audit trail table for further processing, as sketched below and formalized in Algorithm 1.
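The exact capture mechanism depends on the audit configuration. The following is an illustrative sketch only: it assumes traditional Oracle auditing with AUDIT_TRAIL=DB,EXTENDED (so that DBA_AUDIT_TRAIL.SQL_TEXT is populated) and a hypothetical staging table AUDIT_TRAIL_TMP; the schema and table names are assumptions, not the exact schema used in this project.

    -- Sketch: enable statement auditing on a monitored table
    -- (requires AUDIT_TRAIL=DB,EXTENDED so SQL_TEXT is recorded).
    AUDIT SELECT ON app.orders BY ACCESS;

    -- Refresh the temporary audit trail table with the audited SELECTs.
    DELETE FROM audit_trail_tmp;

    INSERT INTO audit_trail_tmp (sql_text, captured_at)
    SELECT sql_text, timestamp
    FROM   dba_audit_trail
    WHERE  action_name = 'SELECT'
    AND    owner       = 'APP'
    AND    obj_name    = 'ORDERS'
    AND    sql_text NOT LIKE '%/*%';  -- crude filter for statements with block comments or hints

In this sketch the staging table is cleared and repopulated on each run, mirroring the periodic refresh described above.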
Algorithm 1 Process Audit Trail
 1: Input: audit_trail_table
 2: Output: processed audit entries
 3: procedure process_audit_trail
 4:   clear existing records from audit_trail_table
 5:   insert audited SELECT queries into audit_trail_table from dba_audit_trail:
 6:   for all rows in dba_audit_trail where action_name is 'SELECT'
 7:       and the filtering conditions are met do
 8:     if the SQL text does not contain comments or hints then
 9:       insert the SQL text and timestamp into audit_trail_table
10:     end if
11:   end for
12:   call the procedure that processes the audit queries    ▷ process_audit_queries (Algorithm 2)
13: end procedure

3.1.1 How is accessed data identified?

To identify the accessed data, the algorithm processes each query in the temporary audit trail. The query is re-executed to extract the primary key values it returns. If the table associated with the query has a foreign key, the foreign key values are also extracted, since the referenced rows are considered accessed. The values from each query are accumulated into primary key and foreign key lists, and these lists of accessed keys are then inserted into an "accessed" table along with the name of the table from which they were extracted.

In addition, a trigger is added to the monitored table to capture any new data being inserted or updated. This trigger captures the primary keys of the inserted or updated rows and adds them to the "accessed" table, ensuring that all CRUD (Create, Read, Update, Delete) operations are accounted for; a sketch of such a trigger is given after Algorithm 2. The collected data is subsequently analyzed to identify patterns of data access, which helps determine which data is frequently accessed and which remains stagnant for prolonged periods.

Algorithm 2 Process Audit Queries Procedure
 1: procedure process_audit_queries
 2:   declare variables
 3:   for all audit records in the audit trail do
 4:     extract the table name from the SQL query
 5:     identify the primary key column
 6:     construct dynamic SQL to fetch the primary key column
 7:     open a cursor for the dynamic SQL
 8:     initialize the primary key list
 9:     fetch values from the cursor and accumulate them into the primary key list
10:     close the cursor
11:     fetch the foreign key columns referenced in the SELECT query
12:     initialize the foreign key list
13:     for all foreign key columns do
14:       construct dynamic SQL to fetch the foreign key values
15:       open a cursor for the dynamic SQL
16:       fetch values from the cursor and accumulate them into the foreign key list
17:       close the cursor
18:       insert the foreign key list and referenced table into the "accessed" table
19:     end for
20:     insert the primary key list into the "accessed" table
21:   end for
22:   handle exceptions                                        ▷ exception handling
23: end procedure
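As a minimal illustration of the trigger described in Section 3.1.1, a row-level trigger can record inserted or updated keys as accessed. The table ORDERS, its key ORDER_ID, and the column names of the "accessed" table are assumptions made for this sketch, not the project's exact schema.

    -- Hypothetical row-level trigger: mark every inserted or updated ORDERS
    -- row as accessed. Table and column names are illustrative assumptions.
    CREATE OR REPLACE TRIGGER trg_orders_mark_accessed
    AFTER INSERT OR UPDATE ON orders
    FOR EACH ROW
    BEGIN
      INSERT INTO accessed (table_name, primary_key_value, accessed_at)
      VALUES ('ORDERS', TO_CHAR(:NEW.order_id), SYSTIMESTAMP);
    END;
    /

Storing the key values as text keeps the "accessed" table generic across monitored tables and is consistent with the string-based chunking used in Algorithm 3.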
3.2 Archiving Process

An automated archiving process moves identified stale data to a separate archival table on a defined schedule, such as monthly; a sketch of such a schedule using the Oracle job scheduler is given after Algorithm 3. The process is designed to be transparent to users and to integrate seamlessly with existing database operations. By relocating stagnant data, the primary table remains lean and uncluttered, which can significantly enhance query performance.

The archiving process begins by copying the structure of the table whose data is being archived. The algorithm processes all primary keys recorded in the accessed table for each table name, handling the keys in chunks so that large datasets are processed efficiently. For each chunk of primary keys, any record in the main table whose primary key appears in the chunk is copied to the new table and subsequently deleted from the original table. This iterative process leaves behind, in the original table, only the records that have not been accessed; once the two table names are swapped at the end of Algorithm 3, those leftover records constitute the archive while the accessed records continue to serve queries under the original table name.

Algorithm 3 Archive Records Procedure
 1: procedure archive_records
 2:   declare variables: v_table_name, v_primary_key_column, etc.
 3:   set a savepoint: start_archive
 4:   for all rows in (SELECT DISTINCT table_name FROM accessed) do
 5:     v_table_name ← row.table_name
 6:     get v_primary_key_column for v_table_name
 7:     v_archive_table_name ← UPPER('archive_' || v_table_name)
 8:     if v_archive_table_name does not yet exist then
 9:       create the archive table from v_table_name
10:     end if
11:     v_primary_keys_chunk ← ''
12:     for all keys in (SELECT primary_key_column FROM accessed
13:         WHERE table_name = v_table_name) do
14:       if length of chunk + length of primary key record ≤ 4000 then
15:         append the primary key to v_primary_keys_chunk
16:       else
17:         process the chunk:
18:           move records whose primary keys are in the chunk to the new table
19:           delete the moved records from v_table_name
20:         clear v_primary_keys_chunk
21:         append the primary key to v_primary_keys_chunk
22:       end if
23:     end for
24:     process the last chunk of primary keys
25:     swap the table names v_table_name and v_archive_table_name
26:   end for
27:   on error, roll back to savepoint start_archive              ▷ exception handling
28: end procedure
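The monthly schedule can be realized with Oracle's built-in job scheduler. The block below is a sketch only: it assumes the archiving logic of Algorithm 3 is wrapped in a stored procedure named ARCHIVE_RECORDS, and the job name and run time are arbitrary illustrative choices.

    -- Sketch: run the archiving procedure automatically once a month.
    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'MONTHLY_ARCHIVE_JOB',
        job_type        => 'STORED_PROCEDURE',
        job_action      => 'ARCHIVE_RECORDS',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=MONTHLY; BYMONTHDAY=1; BYHOUR=2',
        enabled         => TRUE,
        comments        => 'Archive stale rows based on the accessed table');
    END;
    /

Scheduling the procedure inside the database keeps the process transparent to users and removes the need for manual intervention, as described above.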
3.3 Performance Evaluation

A comprehensive performance evaluation of the database is performed before and after implementing the archiving system. Key performance metrics such as query response times, storage utilization, and resource consumption are analyzed, and performance benchmarks are established to measure the effectiveness of the archiving solution in improving database performance and storage efficiency. This evaluation ensures that the archiving process improves overall system performance without introducing significant overhead.

To test the solution, a scheduled job runs a procedure that simulates data access every 5 minutes during the testing period. Randomly generated primary key values are accessed using SELECT queries against the monitored table. The time is recorded at the start and end of each query, and the difference between these timestamps is recorded as the execution time. This process is run both before the algorithms are applied and after they have been applied and the stale data removed.

Algorithm 4 Run Query Procedure
 1: procedure run_query
 2:   for all predefined queries do
 3:     get the start time
 4:     execute the query dynamically
 5:     calculate the execution time
 6:     insert the execution details into the final_execution_times table
 7:     display the execution time
 8:     sleep for 2 seconds
 9:   end for
10: end procedure

By comparing the performance metrics collected before and after the archiving process, the impact of the archiving system on database performance can be accurately assessed. This method ensures a thorough evaluation of the solution's effectiveness in enhancing query response times, optimizing storage utilization, and reducing overall resource consumption.

4 Results

In this study, we examined the efficacy of the clean-up script, which is designed to enhance database performance by removing extraneous data and streamlining query execution. The analysis is based on 1200 samples, evenly split between the pre- and post-implementation phases of the clean-up script. The implementation of the proposed data archiving solution resulted in significant improvements in database performance and storage efficiency: query response times decreased by approximately 53%, while storage utilization decreased from 0.63 MB to 0.50 MB. The performance evaluation confirmed the effectiveness of the archiving solution in optimizing database performance and storage utilization.

Figure 1: Before and After Box Plots of Query Execution Times

The descriptive statistics show the distribution of query execution times. Before the clean-up script was applied, the dataset exhibited a mean execution time of approximately 0.03 seconds with a standard deviation of 0.05 seconds. After implementation, the mean execution time fell to around 0.014 seconds with a reduced standard deviation of 0.023 seconds. This reduction reflects not only faster query execution but also more consistent performance.

Figure 2: Before and After Distribution of Execution Times

Furthermore, storage utilization decreased from 0.63 MB to 0.50 MB, underscoring the script's efficacy in optimizing resource allocation within the database environment.

Figure 3: Before and After Storage Utilization

The histograms, box plots, and bar graphs in Figures 1 to 3 illustrate the distributional shifts in query execution times and storage utilization before and after script implementation. Notably, the post-implementation histograms and box plots show tighter distributions and fewer outliers than their pre-implementation counterparts, indicating more streamlined and predictable database performance.

To support these findings, a Welch two-sample t-test was used to compare mean execution times before and after implementing the clean-up script, revealing a statistically significant difference (t = 7.4702, p-value = 1.979e-13). This provides strong evidence that the script reduces query execution time. In addition, a linear regression analysis examined the relationship between query execution time and the presence of the clean-up script; the model showed a statistically significant effect of the script on query execution time.

In summary, the before scenario exhibits higher and more variable execution times as well as higher storage utilization, while the after scenario shows lower and more consistent execution times and reduced storage requirements. The regression results provide strong evidence that the clean-up script significantly reduces query execution time; however, the model explains only a small portion of the variance in execution time, indicating that other factors also contribute to the observed differences.
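The per-phase summary statistics reported above can be reproduced directly in SQL. The query below is a minimal sketch; it assumes the final_execution_times table populated by Algorithm 4 carries a phase label (for example 'BEFORE' or 'AFTER') and the measured execution_time in seconds, which are assumptions about column names rather than the project's exact schema.

    -- Sketch: per-phase descriptive statistics for the timing samples.
    SELECT phase,
           COUNT(*)                         AS samples,
           ROUND(AVG(execution_time), 4)    AS mean_seconds,
           ROUND(STDDEV(execution_time), 4) AS stddev_seconds
    FROM   final_execution_times
    GROUP  BY phase
    ORDER  BY phase;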
5 Discussion

The findings of this research underscore the transformative potential of access-pattern-based data archiving in Oracle databases. The results demonstrate a significant impact of the clean-up script on database performance metrics, particularly query execution time and storage utilization. Following implementation of the clean-up script, there was a tangible improvement in the efficiency of database operations, as evidenced by the reduction in mean query execution time from 0.03 seconds to approximately 0.014 seconds. Moreover, the decrease in storage utilization from 0.63 MB to 0.50 MB highlights the script's efficacy in optimizing resource allocation within the database environment.

By dynamically identifying and archiving stale data, organizations can not only optimize database performance and storage efficiency but also streamline data management processes and enhance overall system reliability. This approach represents a shift away from traditional archiving methodologies, offering a proactive, user-driven solution that adapts to evolving data access patterns.

These improvements hold practical implications for database administrators and organizations that rely on efficient data management practices. Organizations can achieve faster query execution, realize cost savings, and enhance the user experience. The optimization achieved through the clean-up script facilitates more timely data retrieval, thereby improving decision-making processes and ultimately supporting better business outcomes.

However, it is important to acknowledge the limitations of this study. The findings are contingent on the specific context in which the clean-up script was implemented, and factors not accounted for in the analysis may also influence database performance. Assumptions made during data collection and analysis may introduce biases that affect the generalizability of the results. While the findings provide valuable insight into the immediate benefits of the script, caution must be exercised in generalizing these results to other systems or contexts.

Furthermore, this research highlights the importance of interdisciplinary collaboration in resolving complex challenges in database management. Drawing on insights from diverse domains such as data cleaning, preservation formats, and archival techniques made it possible to develop a holistic and innovative solution that addresses the multifaceted nature of stale data management.

6 Conclusion

This study addresses a critical need in the field of database management by introducing an innovative data archiving solution for Oracle databases. The implementation of a dynamic archiving process, which identifies and archives stale data based on user interactions and periodic schedules, has yielded tangible enhancements in both database performance and storage efficiency. These findings advance efficient database management practices and hold promise for transforming organizational approaches to data handling. By demonstrating the efficacy of this solution in optimizing database operations, the study points toward a shift in data management strategies across various sectors. As organizations continue to grapple with escalating data volumes and the imperative for streamlined data processing, the insights gleaned from this research offer a roadmap for achieving enhanced performance and resource utilization in the ever-evolving landscape of data management.

7 Future Work

Future research avenues include refining and optimizing the data archiving solution, extending its applicability to other database management systems, and scaling it to handle larger datasets and more complex database environments.
Additionally, exploring the integration of advanced machine learning techniques and predictive analytics could enable more proactive and intelligent data archiving decisions, leading to even greater improvements in database performance and storage efficiency. Long-term studies are also needed to assess the scalability and sustainability of the archiving solution in real-world enterprise environments.

Acknowledgments

We extend our heartfelt gratitude to the Office of the Vice President Research at Thompson Rivers University (TRU) for the generous funding support that made this research possible. We also express our sincere appreciation to the faculty members of the TRU Computer Science Department for their invaluable guidance and insights throughout this endeavor. Lastly, we wish to thank all those who contributed, directly or indirectly, to the success of this project.

References

[1] Appuswamy, R. (2022). Towards Passive, Migration-Free, Standardized, Long-Term Database Archival. SIGMOD Record, 51(2), 61-62.
[2] Cha, S.-J., Choi, Y. J., & Lee, K.-C. (2015). Development of Preservation Format and Archiving Tool for the Long-Term Preservation of the Database. Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication.
[3] Milani, M., Zheng, Z., & Chiang, F. (2019). CurrentClean: Spatio-Temporal Cleaning of Stale Data. 2019 IEEE 35th International Conference on Data Engineering (ICDE).
[4] Olson, J. E. (2009). Database Archiving. Elsevier.
[5] Microsoft Tech Community. (2023, February 27). Data Archiving Strategies for SQL Server. techcommunity.microsoft.com.
[6] Williams, K., Li, Y., & Olfman, L. (2021). DRT: A Novel Tool for Data Archiving. IEEE Software, 38(2), 88-95.