Thompson Rivers University
Undergraduate Student Research Experience Award Program
Final Report

MEASURING AIR QUALITY - A COMPUTER FRAMEWORK FOR DATA COLLECTION AND ANALYSIS

Undergraduate Researcher: Pablo Ulloa
Primary Supervisor: Dr. Mila Kwiatkowska
Secondary Supervisor: Dr. Mahnhoon Lee
The Department of Computing Science

Table of Contents

Summary
1 Introduction
  1.1 Background
    1.1.1 Particulate Matter
    1.1.2 Motivation
    1.1.3 Impact of PM2.5 on human health
  1.2 Data Characteristics
    1.2.1 Spatial dimension of data: the locations of the PM2.5 sensors
    1.2.2 Climatic conditions in Kamloops
    1.2.3 Time dimension of data: data sets and time series
  1.3 PM2.5 data available in Kamloops
    1.3.1 Data from the government stations
    1.3.2 Data from the citizen run units
    1.3.3 Data from the experiments
2 Descriptions and visualizations
  2.1 The result of the Series of experiments
    2.1.1 Visualizing the relationship between the PM2.5 concentrations measured by PMS5003 and the intensity of the traffic
      2.1.1.1 Limitations of the experiments
      2.1.1.2 Observation about comparison
  2.2 Visualizing the data from the government stations and the PurpleAir units
    2.2.1 Comparison of data from two sources
References
Appendices
  Appendix A

Summary

In the summer of 2017, we designed, implemented, and tested a computer framework to collect, analyze, and visualize air quality data for concentrations of fine particulate matter (PM2.5).
Our project had three phases: (1) construction and testing of an outdoor/indoor low-cost portable device based on the PM sensor (Plantower PMS5003) and Arduino technology for data collection and data storage for short-term experiments, (2) building a data warehouse for the collection and aggregation of the PM2.5 data from multiple sources and locations within Kamloops, and (3) analysis and visualization of the PM2.5 data.

In the first phase, we tested the performance of the PMS5003 sensor in a series of experiments in two Kamloops locations (the low-traffic TRU campus and a high-traffic intersection). We designed several experiments to test the responsiveness of the PMS5003 sensor to PM2.5 pollution from vehicles. The collected data show some differences in PM2.5 levels at the intersection with and without heavy traffic and with varied background pollution (before and during the wildfires in BC). However, we observed that the data are influenced by several environmental variables, such as wind, temperature, and relative humidity.

In the second phase, we obtained several data sets for a multi-sensor analysis of the PM2.5 data from two sources in Kamloops: the two government-run stations and the 22 citizen-run PurpleAir units (based on PMS5003 sensors). We collected data from May 1 until August 23, 2017, stored the data in a database, and prepared the data for further analysis (marking missing data and errors).

In the third phase, we created several programs for data aggregation and visualization. We used the database from Phase 2 and produced a series of graphs to compare the PM2.5 concentrations measured by multiple sensors at various locations within the Kamloops area.
Our study was exploratory and had several limitations: the PM concentrations vary greatly according to location and environmental factors; the sensors use different approaches for data collection; the data are reported at multiple averaging times; and numerous data are missing from the citizen-run devices. The results provided in this report represent a first step towards an integrated framework for PM2.5 data analysis from environmental monitoring instruments (government stations) and from a network of low-cost sensors used by citizen scientists.

1 Introduction

1.1 Background

1.1.1 Particulate Matter

Particulate matter (PM) is an air pollutant consisting of a mixture of many minute solid or liquid particles floating in the air (Fine particulate matter - environmental reporting BC, 2015). PM particles can be categorized by origin, source and physicochemical properties, but for practical reasons they are usually categorized by size (aerodynamic diameter) (Englert, 2004). Particles with an aerodynamic diameter of 10 micrometers or less are called PM10 (Working Group on Monitoring and Reporting, 2011), and particles with an aerodynamic diameter of 2.5 micrometers or less are called PM2.5 or fine particulate matter (Working Group on Monitoring and Reporting, 2011). Some definitions of PM2.5 exclude particles of exactly 2.5 micrometers in aerodynamic diameter (Pinault, 2017); however, for our study we used the definition specified by the Ambient Air Monitoring Protocol for PM2.5 and Ozone (Working Group on Monitoring and Reporting, 2011), which includes particles of 2.5 micrometers in the definition of PM2.5.

1.1.2 Motivation

PM is a major pollutant in the atmosphere (Pinault, 2017; Shi et al., 2017).
Furthermore, PM is considered to have an important impact on global climate (Law & Stohl, 2007; Davis & Jixiang, 2000; Chen & Penner, 2005; Kanakidou et al., 2005) and a negative impact on human health and mortality (Apte, Marshall, Cohen & Brauer, 2015; Lim et al., 2012; Lu et al., 2015; Pope & Dockery, 2006). Because of the impacts of PM on humans and the environment, the measurements of PM have become "crucial for epidemiology and air quality management" (Apte et al., 2017). Since the greatest contributor to air pollution is PM2.5 (Kelly, 2016), the following subsection focuses on the importance of measuring PM2.5 by looking at its impact on human health.

1.1.3 Impact of PM2.5 on human health

This subsection reviews the associations between PM2.5 and health outcomes. First, it discusses the association between long-term (annual) exposure to PM2.5 and human health. Second, it examines the association between short-term (daily or hourly) exposure to PM2.5 and human health.

Long term: annual exposure

Annual exposure to PM2.5 has been strongly associated with higher levels of mortality. Studies have shown that high annual PM2.5 concentrations are associated with cardiovascular-related deaths, respiratory-related deaths and lung cancer (Pope et al., 1995; Fang, 2016; Hoek et al.; Kan et al., 2012). For example, a US study found that for every increase of 10 µg/m3 in the annual concentration of PM2.5, cardiovascular and lung cancer mortality increased by 6% and 8%, respectively (Pope et al., 1995). Another study of 74 cities in China calculated that in 2013, the high annual concentration of PM2.5 (minimum 26 µg/m3; maximum 160 µg/m3; average 72 µg/m3) was associated with a mortality rate of 1.9% (Fang, 2016). Fang also calculated that for every increase of 10 µg/m3 in PM2.5 the mortality increases by 5.37%.

Short term: daily and hourly exposures

Daily and hourly exposures to PM2.5 have been associated with higher mortality (Lin, 2017; Ostro, 2017; Pope, 2015; Atkinson et al., 2014; Madsen et al., 2012b). A number of epidemiological studies have found that daily levels of PM2.5 are associated with daily mortality and morbidity (Lin, 2017; Ostro, 2017; Pope, 2015; Atkinson et al., 2014). In fact, a meta-analysis of 110 time series studies found that for every 10 µg/m3 increment in daily PM2.5, mortality increased by between 0.25% and 2.08% (Atkinson et al., 2014). Furthermore, recent studies have also linked hourly PM2.5 concentrations to mortality, "suggesting that peak concentration may be a better exposure indicator than daily mean" (Lin H., 2017). For example, a Norwegian study reported an excess of 2.8% in mortality risk for every 10 µg/m3 increase in hourly PM2.5 (Madsen et al., 2012), and a study of six cities in China reported the hourly peak concentration of PM2.5 to be an important risk factor in mortality (Lin H., 2017).

The reviewed studies demonstrate that both long-term and short-term exposure to PM2.5 constitute a mortality risk. The authors of these studies also emphasize the importance of measuring PM2.5 concentrations at an hourly rate.

1.2 Data Characteristics

Our data warehousing framework for PM2.5 has two dimensions: a spatial dimension representing the various locations of the sensors, and a time (time-series) dimension reflecting the changes over time (in minutes and hours). The spatial dimension has a high granularity (the highest precision): the data warehouse maintains the specific location of each sensor. This approach allows the study of the spatial variability of PM data in the Kamloops airshed. The time dimension stores the PM data at high granularity (minutes and, if not available, hours). Our data analysis framework provides software packages for data aggregation along these two dimensions.
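
As an illustration of aggregation along the time dimension, the following minimal Python sketch groups minute-level readings into hourly means for each sensor location. It is not the framework's actual code: the function name `hourly_means`, the tuple layout of a reading, and the sample values are assumptions made only for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(readings):
    """Aggregate (site, timestamp, pm25) readings into hourly means per site.

    `readings` is an iterable of tuples (site name, datetime, PM2.5 in ug/m3).
    A PM2.5 value of None marks a missing measurement and is skipped.
    Returns a dict mapping (site, start of the hour) to the mean PM2.5.
    This layout is hypothetical and chosen only for illustration.
    """
    buckets = defaultdict(list)
    for site, timestamp, pm25 in readings:
        if pm25 is None:
            continue  # missing data: do not let it distort the average
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(site, hour_start)].append(pm25)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up sample readings (not measurements from the study).
readings = [
    ("Site A", datetime(2017, 7, 1, 22, 5), 10.0),
    ("Site A", datetime(2017, 7, 1, 22, 45), 20.0),
    ("Site A", datetime(2017, 7, 1, 23, 10), 30.0),
    ("Site B", datetime(2017, 7, 1, 22, 30), None),
    ("Site B", datetime(2017, 7, 1, 22, 50), 8.0),
]
result = hourly_means(readings)
for (site, hour), value in sorted(result.items()):
    print(site, hour.isoformat(), value)
```

Keeping the site name in the grouping key preserves the spatial dimension, so the same function could feed a comparison between sites for a given hour.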

1.2.1 Spatial dimension of data: the locations of the PM2.5 sensors

This study examines the concentration of PM2.5 across the city of Kamloops. The city is located in the province of British Columbia, Canada, with a population of 90,280 and an area of 299.25 km2 (Government of Canada, Statistics Canada, 2017). In order to manage the air quality of Kamloops, the Ministry of Environment of British Columbia has defined the Kamloops airshed. An airshed is an area "where the movement of air and thus all pollutants can be restricted by local landforms" (What is in the air we breathe, 2012). The airshed encapsulates the area of the city, the area of the TK'emlúps Indian Band, and the area of the North and South Thompson River valleys, totaling 942 km2 (What is in the air we breathe, 2012). Our analysis of PM2.5 data is based on twenty-two sites, which are located inside the Kamloops airshed area. Figure 1 shows the airshed area with its boundaries.

Figure 1: Kamloops Airshed (taken from What is in the air we breathe, 2012)

1.2.2 Climatic conditions in Kamloops

Climatic conditions are important factors in the behavior of air and the movement of PM2.5 and should be considered in the data analysis. Below are the overall climatic characteristics from 1971-2000 for the city of Kamloops (What is in the air we breathe, 2012):

• Daily Mean Temperature: January: -4.2° C; July: 21.0° C
• Average Annual Precipitation: 279 mm (27% is in the form of snow)
• Monthly Average Precipitation: 11.7 mm - 35.2 mm (heaviest in July and August)
• Bright Sunshine: 2,075 hours
• Fog: 60 or more days/year

1.2.3 Time dimension of data: data sets and time series

This subsection describes four different studies of PM2.5 concentrations and their specific data sets. Each study uses a different time frame and different data collection methods.

Study 1: We analyzed 36 hours of PM2.5 data starting on July 1, 2017 and ending on July 2, 2017.
The time frame was chosen to encapsulate a pollution event detected on the night of July 1, 2017 (fireworks used during the celebration of Canada Day). The pollution event lasted around 10 hours, spread across the city airshed, and even reached the sensor located in Lac Le Jeune.

Study 2: We analyzed PM2.5 data for two summer months starting on June 1, 2017 and ending on August 1, 2017. The time frame was chosen to encapsulate a series of extreme pollution events. During these events the city was completely covered in smoke from nearby forest fires. The pollution events varied in duration: the longest lasted 5 days and the shortest lasted 24 hours.

Study 3: We analyzed data from 17 exploratory experiments that tested the sensitivity of the PMS5003 sensor. Each experiment had a different time frame. On average, the experiments lasted approximately 35 minutes, with the longest lasting 50 minutes and the shortest lasting 16 minutes.

Study 4: We analyzed all data available for every sensor in the PurpleAir network, totaling around 11 million rows of data, each row with 24 values. Each PurpleAir unit has a different amount of data available. The oldest PurpleAir unit (the unit installed for the longest time) has data from September 2016 to August 2017. Table 1 shows the site locations, the installation dates, and the numbers of downloaded rows of data for the two sensors (sensor A and sensor B) inside the units. Each row represents data for varied intervals of time depending on the unit setup (e.g., 35 seconds and 5 minutes).

Table 1: Summary of PurpleAir sites within Kamloops with installation dates and the number of rows available for this study until July 31, 2017

 #   Site Name/Location               Installation Date   Number of Rows Sensor A   Number of Rows Sensor B
 1   Aberdeen Drive                   2017/01/15          237 809                   237 819
 2   Armour Place                     2016/10/07          510 124                   501 312
 3   Azure Place                      2017/03/31          103 779                   102 516
 4   Battle Street P1                 2016/10/08          367 894                   367 886
 5   Battle Street West               2017/03/31          136 778                   136 598
 6   Braemar Drive                    2016/09/22          268 347                   268 315
 7   Coldwater Drive                  2017/03/31          135 056                   133 921
 8   Glenmore Drive                   2017/01/16          286 759                   286 596
 9   High Schylea Drive               2017/03/23          123 750                   120 665
10   Hugh Allan Drive                 2016/12/10          325 892                   324 735
11   Lloyd George Elementary School   2017/03/31           88 046                    87 985
12   Lorne Street                     2017/03/31          142 874                   141 772
13   Monmouth Drive                   2016/10/18          473 781                   467 652
14   Mount Dufferin Crescent          2017/03/02          175 454                   175 454
15   Mulberry Avenue                  2016/09/22          502 811                   502 425
16   Nicola Street                    2017/03/31           74 567                    74 550
17   Nicola Street West               2017/02/26          174 130                   176 090
18   Ord Road                         2016/12/10          312 080                   312 061
19   Schubert Drive                   2017/03/23          138 526                   138 532
20   Strathcona Terrace               2017/01/15          171 243                   171 191
21   Sun Rivers                       2016/10/07          514 769                   514 401
22   Valleyview Drive                 2016/12/10          305 459                   305 447

1.3 PM2.5 data available in Kamloops

In our studies we used PM2.5 data from three sources. First, we used PM2.5 data from two government-run air monitoring stations. Second, we used PM2.5 data from twenty-two citizen-run air monitoring sites. Third, we used PM2.5 data from experiments performed with our own portable device based on a Plantower PMS5003 sensor and Arduino technology.

1.3.1 Data from the government stations

The Kamloops airshed has two air monitoring stations operated by the government of BC: the Kamloops Aberdeen station and the Kamloops Federal Building station (Southern interior air zone). Figure 2 indicates their locations with red stars.

Figure 2: Locations of the government air monitoring stations in Kamloops.

After communicating with the BC Ministry of Environment, we found that both stations use the Thermo SHARP 5030 sensor to measure PM2.5 concentrations (Robles J., personal communication, July 6, 2017).

1.3.1.1 The Thermo SHARP5030 sensor

The SHARP5030 uses two techniques to measure PM2.5 concentrations: a beta attenuation technique and an optical technique. The sensor combines both measurements to calculate a final, more accurate PM2.5 measurement (Robles J., personal communication, July 6, 2017; Thermo Fisher Scientific Inc, 2013). The sensor calculates the final PM2.5 measurement by multiplying the optical measurement by a calibration factor (Robles J., personal communication, July 6, 2017; Thermo Fisher Scientific Inc, 2013). The calibration factor is based on the long-term ratio between the optical measurements and the beta attenuation measurements (Robles J., personal communication, July 6, 2017; Thermo Fisher Scientific Inc, 2013). The final PM2.5 measurement has an hourly precision of ± 2 µg/m3 when the concentration of PM2.5 is below 80 µg/m3 and an hourly precision of ± 5 µg/m3 when the concentration is above 80 µg/m3 (Thermo Fisher Scientific Inc, 2013). Figure 3 shows the SHARP5030 sensor.

Figure 3: SHARP5030 sensor (Thermo Fisher Scientific Inc., 2013)

1.3.1.2 The Aberdeen Station

The Aberdeen station is located south of the city, at 2330 Pacific Way, at a latitude of 50.63694 degrees and a longitude of -120.37207 degrees. The station is in a suburban area and has an elevation of 857 m (BC Ministry of Environment, b). The data from the station is downloadable from the station's website (BC Ministry of Environment, a). The station website provides the option to download eight different types of measurements (BC Ministry of Environment, a; Robles J., personal communication, July 6, 2017):

1. TRS (Total Reduced Sulfur compounds; unit: parts per billion (ppb))
2. SO2 (Sulfur Dioxide; unit: ppb)
3. NO (Nitric Oxide; unit: ppb)
4. NO2 (Nitrogen Dioxide; unit: ppb)
5. O3 (Ozone; unit: ppb)
6. PM25 (the final PM2.5 measurement; unit: µg/m3)
7. OPTIC_SHARP (the PM2.5 measurement of the optical sensor inside the SHARP5030)
8. PM10_BAM (the PM10 measurement reported by the beta attenuation sensor inside the SHARP5030)

Observation: three of the eight measurements (TRS, SO2 and O3) were not available.

Downloading and storing data for the Aberdeen station: This study focuses on PM2.5 values, and therefore we stored the final calibrated PM2.5 measurement of the SHARP5030 (measurement 6) and the PM2.5 optical measurement of the SHARP5030 (measurement 7). The PM2.5 data from the station was stored in a comma-separated values (CSV) file with three values per row: a dateTime value in ISO 8601 standard format (YYYY-MM-DDTHH24:MI:SS) in the local time zone (UTC -7), the final calibrated PM2.5 value in µg/m3, and the optical PM2.5 value in µg/m3.

1.3.1.3 The Federal Building Station

The Federal Building station is located downtown, on the roof of the Federal Building (BC Ministry of Environment, e). The Federal Building is located at the intersection of 3rd Street and Seymour Avenue (BC Ministry of Environment, e). The station has an elevation of 381 m, a latitude of 50.67477 degrees and a longitude of -120.334016 degrees (BC Ministry of Environment, e). The data from the station is downloadable from the website (BC Ministry of Environment, c). The website provides the option to download ten different types of measurements (BC Ministry of Environment, c; Robles J., personal communication, July 6, 2017):

1. TRS (Total Reduced Sulfur compounds; unit: parts per billion (ppb))
2. SO2 (Sulfur Dioxide; unit: ppb)
3. PM2.5_BAM (the PM2.5 measurement of the beta attenuation sensor inside the SHARP5030; unit: µg/m3)
4. NO (Nitric Oxide; unit: ppb)
5. NO2 (Nitrogen Dioxide; unit: ppb)
6. O3 (Ozone; unit: ppb)
7. PM25 (the final PM2.5 measurement; unit: µg/m3)
8. OPTIC_SHARP (the PM2.5 measurement of the optical sensor inside the SHARP5030)
9. PM10_BAM (the PM10 measurement reported by the beta attenuation sensor inside the SHARP5030)
10. BETA_SHARP (the measurement of PM10 performed by the beta attenuation sensor inside the SHARP5030; unit: µg/m3)

Observation: one measurement (PM2.5_BAM) was not available for download and had no values (BC Ministry of Environment, c).

Downloading and storing data for the Federal Building station: This study focuses on PM2.5 values, and therefore we stored the final calibrated PM2.5 measurement of the SHARP5030 (measurement 7) and the PM2.5 optical measurement of the SHARP5030 (measurement 8). The PM2.5 data from the station was stored in a comma-separated values (CSV) file with three values per row: a dateTime value in ISO 8601 standard format (YYYY-MM-DDTHH24:MI:SS) in local time (UTC -7), the final calibrated PM2.5 value in µg/m3, and the optical PM2.5 value in µg/m3.

1.3.2 Data from the citizen run units

The citizen-run units are PurpleAir low-cost PM monitoring sensors. PurpleAir is a grassroots organization that designed a low-cost air monitoring unit and provides a data collection service. The unit can be bought from the PurpleAir website for 229 US dollars (PurpleAir, d). It is designed to measure PM pollution and communicate the results in real time (PurpleAir, d).

1.3.2.1 Location of the PurpleAir sites within Kamloops airshed

At the time of this study, there were twenty-three PurpleAir sites (later 24) around the city of Kamloops, each with one PurpleAir unit (PurpleAir, b). Twenty-two of the 23 sites are located inside the Kamloops airshed. The one site located outside the airshed is at Lac Le Jeune Provincial Park.
The locations of the twenty-two sites in the Kamloops airshed are shown in Figure 4. The sites outside of the Kamloops airshed are shown in Figure 5.

Figure 4: A map with the locations of the twenty-two PurpleAir sites inside the airshed of Kamloops (the Thompson Rivers University site is not shown). The sites are indicated with circles (the color indicates the PM2.5 level). Each site has one PM2.5 reading - the PM2.5 level in the center of the circle.

Figure 5: A map highlighting the two sites near the city of Kamloops: Lac Le Jeune and Armour Place

1.3.2.1 Components of a PurpleAir unit

Not all PurpleAir units are identical, but all units have the following main components: two Plantower particulate matter sensors, one temperature/humidity sensor, a Wi-Fi module and an enclosure (PurpleAir, c). Over its two years of existence, PurpleAir has been trying to improve its unit by changing two of its components: the enclosure and the pair of particulate matter sensors.

The enclosure used by the monitoring unit has changed. Currently, the unit uses a white enclosure as shown in Figure 7; however, it used to have a purple enclosure as shown in Figure 6 (Kelly et al. 2017). The PM sensor inside the PurpleAir unit has also changed. The first design used the Plantower PMS1003 sensor (Kelly et al. 2017), and as the Plantower company released new versions of the particulate matter sensor, PurpleAir started using them in its design (Kelly et al. 2017). All the PurpleAir units in Kamloops use a pair of PMS5003 sensors. The two components that have been changed may influence the accuracy and precision of the PM measurements. Kelly "identified potential interference caused by the sensor housing" (Kelly et al. 2017).

Figure 6: Enclosure previously used by PurpleAir (Kelly et al. 2017)

Figure 7: Current enclosure for PurpleAir monitoring unit (PurpleAir).

1.3.2.2 The properties of the Plantower PMS5003

The Plantower PMS5003 is a low-cost particulate matter sensor; it uses a laser-induced light and a photo-diode detector to measure PM (Yong, 2016b). We have not found any peer-reviewed paper or independent evaluation of the precision and accuracy of the PMS5003. However, the accuracy and precision of the PMS1003 and the PMS3003 have been evaluated in a peer-reviewed paper (Kelly et al., 2017). The paper compared the PMS1003 and the PMS3003 against “two federal equivalent (one tapered element oscillating microbalance and one beta attenuation monitor) and gravimetric federal reference methods (FEMs/FRMs) as well as one research-grade instrument (GRIMM)”. The results show that the 24-hour and hourly averages of the outdoor PM2.5 measurements of the PMS1003 and the PMS3003 correlated well with the measurements of the Thermo Scientific SHARP 5030 BAM sensor and the Thermo Scientific 1405-F sensor (TEOM) (R2 ≥ 0.83) (Kelly et al., 2017). However, the behavior of the PMS1003 was not consistent, and certain conditions caused the PMS1003 to behave differently. The most relevant conditions are the following (Kelly et al., 2017):

1. The results of the research paper indicated that the PMS1003 overestimated the PM2.5 levels when the levels exceeded 10 µg/m3. Furthermore, the overestimation varies with the size of the particle. For example, for particles of 0.3 µm, which are included in the PM2.5 measurements, the sensor overestimated the number of particles by a factor of 1.1–1.9. The overestimation increases with particle size, and for the largest size of 10 µm the overestimation had a factor of 30–500 (Kelly et al. 2017).

2. The best fit between the PMS1003 measurements and the TEOM measurements in the scatter plot changed when the PM2.5 levels exceeded 40 μg/m3. This led to the creation of two fits for the PMS1003 data (Kelly et al. 2017):
   a. The first fit, for PM2.5 up to 40 μg/m3: PM2.5_PMS = 1.81 × PM2.5_TEOM − 1.37
   b. The second fit, for PM2.5 higher than 40 μg/m3: PM2.5_PMS = 90.9 × e^(−0.0333 × PM2.5_TEOM) − 7.16

3. The paper found that for the PMS1003 “response to PM concentration varies with particle properties to a much greater degree than the research grade instrumentation” (Kelly et al. 2017).

For the PM2.5 data study, we assumed that the results of the evaluation of the PMS1003 also apply to the PMS5003. This assumption had to be made since we could not find any independent evaluation of the PMS5003; according to the manufacturer's datasheet, however, the PMS5003 is almost identical to the PMS1003, with only the enclosure and the dimensions of the sensor being different (Yong, 2016b; Yong, 2016a).

1.3.2.3 The internal operation of the PurpleAir unit

There is no published information about the algorithms used by the PurpleAir units. Thus, in order to study the PM2.5 data from the PurpleAir units, we communicated with the creator of the unit firmware, Adrian Dybwad (A. Dybwad, personal communication, March 3, 2017). The communications with Mr. Dybwad consisted of 17 emails and a phone interview. This subsection uses the communication with Mr. Dybwad as its main source of information.

Each unit has two sensors (a pair of sensors), and the data is produced by both of them. Each PurpleAir unit performs three main actions: gathering data, processing the data, and sending the data to a server (A. Dybwad, personal communication, April 24, 2017). To complete those three actions, each unit runs an algorithm that continuously repeats ten steps in the following order (A. Dybwad, personal communication, May 27, 2017):

1. Step 1 collects data from the first PMS5003 sensor for 5 seconds.
The first PMS5003 sensor continuously generates a set of twelve outputs (PM1.0_CF_ATM_µg/m3, PM2.5_CF_ATM_µg/m3, PM10.0_CF_ATM_µg/m3, 0.3um/dl, 0.5um/dl, 1.0um/dl, 2.5um/dl, 5.0um/dl, 10.0um/dl, PM1.0_CF_1_µg/m3, PM10_CF_1_µg/m3) every second, but these values are only recorded during the first step. During the five seconds of the first step, all of the outputs of the sensor are recorded. This can generate either four or five sets of twelve outputs.

2. Step 2 collects data from the second PMS5003 sensor for 5 seconds. The second PMS5003 sensor continuously generates a set of twelve outputs every second, but these values are only recorded during the second step. During the five seconds of the second step, all of the outputs of the sensor are recorded. This can generate either four or five sets of twelve outputs.

3. Step 3 collects the data from the temperature/humidity sensor, the Wi-Fi module and the ESP8266 chip. This step collects one value for the temperature and one value for the humidity from the temperature/humidity sensor. It also collects one value for the Wi-Fi signal strength from the Wi-Fi module and one value for the number of minutes the ESP8266 chip has been on.

4. Step 4 averages the data from the first PMS5003 sensor. During the five seconds of Step 1, four or five sets of 12 outputs are recorded, so the data has four or five values for every output of the sensor. These four or five values are averaged for each output. This process reduces the four or five sets of 12 outputs to a single set composed of 12 averages.

5. Step 5 averages the data from the second PMS5003 sensor. During the five seconds of Step 2, four or five sets of 12 outputs are recorded, so the data has four or five values for every output of the sensor. These four or five values are averaged for each output. This process reduces the four or five sets of 12 outputs to a single set composed of 12 averages.

6. Step 6 sends eight data values to the server.
During this step, all of the data gathered in Step 3 are sent together with four values from Step 4.

7. Step 7 sends the eight remaining values from Step 4 to the server.

8. Step 8 collects the data from the temperature/humidity sensor, the Wi-Fi module and the ESP8266 chip again. This step collects one value for the temperature and one value for the humidity from the temperature/humidity sensor. It also collects one value for the Wi-Fi signal strength from the Wi-Fi module and one value for the number of minutes the ESP8266 chip has been on.

9. Step 9 sends eight data values. All of the data values from Step 8 are sent, plus four values from Step 5.

10. Step 10 sends the eight remaining values from Step 5 to the server.

Figure 8 outlines the ten steps of the process.

Figure 8: The cycle of ten steps used by the PurpleAir algorithm, ordered in a timeline

1.3.2.3 The data from PurpleAir units

This study uses data from 23 identical PurpleAir units, all using the same enclosure and the PMS5003 sensor (PurpleAir, b). Each unit continuously repeats ten steps to gather, process and send the data to a server.
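
The averaging and sending steps of this cycle (Steps 4 to 7 for the first sensor) can be illustrated with a minimal Python sketch. It is not the PurpleAir firmware, whose code is unpublished: the function names, the order of the values inside each payload, and the sample numbers are assumptions made only for this example.

```python
from statistics import mean


def average_sets(sample_sets):
    """Reduce four or five sets of sensor outputs to one set of averages.

    Each element of `sample_sets` is one set of outputs recorded in a
    given second; the outputs are averaged position by position.
    """
    if not sample_sets:
        raise ValueError("at least one set of outputs is required")
    width = len(sample_sets[0])
    if any(len(s) != width for s in sample_sets):
        raise ValueError("all sets must have the same number of outputs")
    return [mean(column) for column in zip(*sample_sets)]


def build_payloads(averages, environment):
    """Split 12 averages and 4 environment values into two 8-value payloads.

    Assumption for illustration: the first payload carries the four
    environment values followed by the first four averages, and the
    second payload carries the remaining eight averages.
    """
    first = list(environment) + list(averages[:4])
    second = list(averages[4:])
    return first, second


# Made-up data: four sets of twelve outputs from one sensor.
sample_sets = [
    [float(i) for i in range(12)],
    [float(i + 2) for i in range(12)],
    [float(i + 1) for i in range(12)],
    [float(i + 1) for i in range(12)],
]
# Hypothetical temperature (F), humidity (%), RSSI (dBm), uptime (minutes).
environment = (70.0, 40.0, -60.0, 120.0)

averages = average_sets(sample_sets)
first, second = build_payloads(averages, environment)
print(first)
print(second)
```

The same two functions would be applied to the second sensor for Steps 5, 9 and 10, with the environment values collected again in Step 8.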
Overall, the units output data from five sources: two PMS5003 sensors, a temperature/humidity sensor, a Wi-Fi module, and an ESP8266 chip (PurpleAir, c). However, the units do not include an internal clock or internal memory, so they cannot record time or store more than a few data measurements (A. Dybwad, personal communication, March 3, 2017). The data storage and the time measurements are handled by the server, in this case the ThingSpeak server (A. Dybwad, personal communication, March 3, 2017). ThingSpeak is a platform that stores and analyzes data from any "internet of things" sensors (The MathWorks). The PurpleAir units send data to ThingSpeak every 20 to 40 seconds (A. Dybwad, personal communication, March 3, 2017). ThingSpeak records the time when the data were received and stores the data. The ThingSpeak service works with "channels"; a channel can store up to eight values and has a REST API for communication (The MathWorks). In total, each PurpleAir unit sends twenty-eight measurements via four ThingSpeak channels (A. Dybwad, personal communication, March 3, 2017). Each channel stores 8 specific values, plus the date and time at which the data were received (The MathWorks). Channels 1 and 2 of each unit store data related to the PM measurements of the first PM sensor. Channels 3 and 4 of each unit store data related to the PM measurements of the second PM sensor. The details of the operation of each channel are described below:
• Channel 1 receives 4 values from the first PM sensor (PM1.0_CF_ATM_µg/m3, PM2.5_CF_ATM_µg/m3, PM10.0_CF_ATM_µg/m3, PM2.5_CF_1_µg/m3), two values from the ESP8266 chip and the Wi-Fi module (UptimeMinutes, RSSI_dbm), and two values from the temperature/humidity sensor (Temperature_F, Humidity_%). These are the data sent in Step 6 of the algorithm.
• Channel 2 receives 8 values from the first PM sensor (0.3um/dl, 0.5um/dl, 1.0um/dl, 2.5um/dl, 5.0um/dl, 10.0um/dl, PM1.0_CF_1_µg/m3, PM10_CF_1_µg/m3).
These are the data sent by the unit in Step 7.
• Channel 3 receives 4 values from the second PM sensor (PM1.0_CF_ATM_µg/m3, PM2.5_CF_ATM_µg/m3, PM10.0_CF_ATM_µg/m3, PM2.5_CF_1_µg/m3), two values from the ESP8266 chip and the Wi-Fi module (UptimeMinutes, RSSI_dbm), and two values from the temperature/humidity sensor (Temperature_F, Humidity_%). These are the data sent by the unit in Step 9.
• Channel 4 receives 8 values from the second PM sensor (0.3um/dl, 0.5um/dl, 1.0um/dl, 2.5um/dl, 5.0um/dl, 10.0um/dl, PM1.0_CF_1_µg/m3, PM10_CF_1_µg/m3). These are the data sent by the unit in Step 10.

1.3.2.4 The files downloaded

The data from the PurpleAir units can be downloaded from the PurpleAir website upon request. The data from every PurpleAir unit are divided into four files: the "primary A" file, the "secondary A" file, the "primary B" file and the "secondary B" file. The four files are CSV files that represent the data from the four ThingSpeak channels (A. Dybwad, personal communication, March 3, 2017).

The "primary A" file has the data from Channel 1. It includes data from the first PM sensor, the Wi-Fi module and the temperature/humidity sensor. The file has the following columns:
1. dateAndTime
2. entry_id
3. pm1.0 CF=ATM in µg/m3
4. pm2.5 CF=ATM in µg/m3
5. pm10.0 CF=ATM in µg/m3
6. uptimeMinutes
7. RSSI_dbm in dBm
8. the temperature in F
9. the humidity in %
10. the PM2.5 with CF=1 in µg/m3

The "secondary A" file has the data from Channel 2. It only includes data from the first PM sensor. The file has the following columns:
1. dateAndTime
2. entry_id
3. 0.3um/0.1 L
4. 0.5um/0.1 L
5. 1.0um/0.1 L
6. 2.5um/0.1 L
7. 5.0um/0.1 L
8. 10.0um/0.1 L
9. PM1.0 CF=1 µg/m3
10. PM10 CF=1 µg/m3

The "primary B" file has the data from Channel 3. It includes data from the second PM sensor, the Wi-Fi module and the temperature/humidity sensor. The file has the following 10 columns:
1. dateAndTime
2. entry_id
3. pm1.0 CF=ATM in µg/m3
4.
pm2.5 CF=ATM in µg/m3
5. pm10.0 CF=ATM in µg/m3
6. uptimeMinutes
7. RSSI_dbm in dBm
8. the temperature in F
9. the humidity in %
10. the PM2.5 with CF=1 in µg/m3

The "secondary B" file has the data from Channel 4. It only includes data from the second PM sensor. The file has the following columns:
1. dateAndTime
2. entry_id
3. 0.3um/0.1 L
4. 0.5um/0.1 L
5. 1.0um/0.1 L
6. 2.5um/0.1 L
7. 5.0um/0.1 L
8. 10.0um/0.1 L
9. PM1.0 CF=1 µg/m3
10. PM10 CF=1 µg/m3

1.3.2.4.1 Definition of columns in the CSV files / Definition of PurpleAir data:

As explained in section 1.3.2.3, the data of every PurpleAir unit are divided into four ThingSpeak channels. Channels 1 and 2 store the data related to the first PM sensor. Channels 3 and 4 store the data related to the second PM sensor. Every unit has four CSV files in total, and each CSV file has the data from one ThingSpeak channel (section 1.3.2.4). The "primary A" file has the data from Channel 1, the "secondary A" file has the data from Channel 2, the "primary B" file has the data from Channel 3, and the "secondary B" file has the data from Channel 4. The four CSV files have a total of 20 different columns. We communicated with the creator of the PurpleAir firmware to find the definition of each value in the data (A. Dybwad, personal communication, March 3, 2017). Below are the definitions for each column:
1. The "created_at" column (listed above as dateAndTime) stores the date and time in Coordinated Universal Time (UTC) with the format YYYY-MM-DD HH24:MI:SS. The column is present in all four CSV files. This time does not point to the moment when the data were measured, but to the moment when the row of values arrived at the ThingSpeak server.
2. The "entry_id" column contains a value relative to the CSV file. The value indicates the row number or line number inside the CSV file. All four CSV files have an "entry_id" column.
3.
The "PM1.0_CF_ATM_µg/m3" column contains an estimate of the concentration in the air of particulate matter with an aerodynamic diameter of 1.0 um or less. The estimate is created by one of the PM sensors and is based on a correction factor called ATM. We were not able to find any information on how the ATM factor is created. The "primary A" file stores the value output by the first sensor and the "primary B" file stores the value output by the second sensor.
4. The "PM2.5_CF_ATM_µg/m3" column contains an estimate of the concentration in the air of particulate matter with an aerodynamic diameter of 2.5 um or less. The estimate is created by one of the PM sensors and is based on a correction factor called ATM (information on the ATM factor was not available). The "primary A" file stores the value output by the first sensor and the "primary B" file stores the value output by the second sensor.
5. The "PM10.0_CF_ATM_µg/m3" column contains an estimate of the concentration in the air of particulate matter with an aerodynamic diameter of 10.0 um or less. The estimate is created by one of the PM sensors and is calculated using a correction factor called ATM (information on the ATM factor was not available). The "primary A" file stores the value output by the first sensor and the "primary B" file stores the value output by the second sensor.
6. The "UptimeMinutes" column contains a value indicating the amount of time the unit's CPU has been running since it was powered up, measured in minutes. This column is present in the "primary A" and "primary B" files.
7. The "RSSI_dbm" column contains a value representing the Wi-Fi signal strength of the Wi-Fi module inside the unit. This value is in decibels. The column is present in the "primary A" and "primary B" files.
8. The "Humidity_%" column contains a value representing the humidity of the air. The humidity value is expressed as a percentage.
The column is present in the "primary A" and "primary B" files.
9. The "Temperature_F" column contains the temperature measured by the temperature/humidity sensor in degrees Fahrenheit. The column is present in two of the four CSV files: the "primary A" file and the "primary B" file.
10. The "PM10.0_CF_1_µg/m3" column contains an estimate of the concentration in the air of particulate matter with an aerodynamic diameter of 10.0 um or less. The estimate is output by one of the PM sensors and is calculated using no correction factor. The "secondary A" file stores the value output by the first sensor and the "secondary B" file stores the value output by the second sensor.
11. The "PM2.5_CF_1_µg/m3" column contains an estimate of the concentration in the air of particulate matter with an aerodynamic diameter of 2.5 um or less. The estimate is output by one of the PM sensors and is calculated using no correction factor. According to the column lists in section 1.3.2.4, the "primary A" file stores the value output by the first sensor and the "primary B" file stores the value output by the second sensor.
12. The "PM1.0_CF_1_µg/m3" column contains an estimate of the concentration in the air of particulate matter with an aerodynamic diameter of 1.0 um or less. The estimate is output by one of the PM sensors and is calculated using no correction factor. The "secondary A" file stores the value output by the first sensor and the "secondary B" file stores the value output by the second sensor.
13. The "0.3um/dl" column contains a value indicating the number of particles in 0.1 L of air with an aerodynamic diameter of 0.3 um or smaller. The unit of the column is particles/0.1 L.
14. The "0.5um/dl" column contains a measurement of the number of particles in 0.1 L of air with an aerodynamic diameter between 0.3 um and 0.5 um. The unit of the column is particles/0.1 L.
15.
The "1.0um/dl" column contains a measurement of the number of particles in 0.1 L of air with an aerodynamic diameter between 0.5 um and 1.0 um. The unit of the column is particles/0.1 L.
16. The "2.5um/dl" column contains a measurement of the number of particles in 0.1 L of air with an aerodynamic diameter between 1.0 um and 2.5 um. The unit of the column is particles/0.1 L.
17. The "5.0um/dl" column contains a measurement of the number of particles in 0.1 L of air with an aerodynamic diameter between 2.5 um and 5.0 um. The unit of the column is particles/0.1 L.
18. The "10.0um/dl" column contains a value indicating the number of particles in 0.1 L of air with an aerodynamic diameter of 10.0 um or smaller. The unit of the column is particles/0.1 L.

1.3.2.4.2 Time data of the PurpleAir Units

The PurpleAir units in this study do not have an internal clock, and the time stored with the PM measurements does not point to the moment when the data were measured but to the moment when the data were received by ThingSpeak. Consequently, the order in which the algorithm gathers, processes and sends its measurements to ThingSpeak is crucial. Even though only four timestamps are recorded in the data, the ten-step algorithm has eight important time points. Below is a description of each time point (the subscript "i" denotes the iteration number):
1. The first time point (t1start_i) corresponds to the time when the unit starts gathering outputs from the first sensor.
2. The second time point (t1end_i) corresponds to the time when the unit stops gathering outputs from the first sensor.
3. The third time point (t2start_i) corresponds to the time when the unit starts gathering outputs from the second sensor.
4. The fourth time point (t2end_i) corresponds to the time when the unit stops gathering outputs from the second sensor.
5. The fifth time point (tchannel1_i) corresponds to the time when the data of channel 1 were received by ThingSpeak.
6.
The sixth time point (tchannel2_i) corresponds to the time when the data of channel 2 were received by ThingSpeak.
7. The seventh time point (tchannel3_i) corresponds to the time when the data of channel 3 were received by ThingSpeak.
8. The eighth time point (tchannel4_i) corresponds to the time when the data of channel 4 were received by ThingSpeak.

According to the algorithm described in section 1.3.2.3, the times have the following order:
t1start_i < t1end_i < t2start_i < t2end_i < tchannel1_i < tchannel2_i < tchannel3_i < tchannel4_i

Experiment table (rows as far as they can be recovered):
• Date and time not recovered: traffic 0.185 cars/s; 52 min; count 576 cars.
• Date and time not recovered: high traffic, 0.664 cars/s; 28 min; count 1116 cars.
• July 4, night, 22:29 - 22:49: high winds; low traffic, 0.112 cars/s; no smoke from forest fire; 20 min; count 134 cars.
• July 5, night, 22:48 - 23:09: medium winds; low traffic, 0.121 cars/s; no smoke from forest fire, but a pedestrian passed by with a cigarette; 21 min; count 153 cars.
• July 8, early morning, 00:39 - 01:00: medium winds; very low traffic, 0.055 cars/s; no smoke from forest fire; 19 min; count 63 cars.
• July 13, 16:41 - 17:11: low winds; very high traffic, 0.700 cars/s; no smoke from forest fires; 30 min; count 1260 cars; double control before and after (1st control 4 min; 2nd control 5 min); none.
• July 19, 22:18 - 22:39: low/medium winds; 0.1095 cars/s; no smoke from forest fire, but people smoked in one corner; 21 min; count 138 cars; double control before and after (1st control 6 min; 2nd control 7 min); none.
• July 20, morning, 07:50 - 08:20: low winds; high traffic, 0.477 cars per second; no smoke from forest fire; 31 min; count 888 cars; double control before and after (1st control 6 min; 2nd control 7 min); none.
Table cells whose row could not be determined, in their extracted order: 16:45; High; High; medium/low; very low; Yes; No smoke from forest fire; Low winds; Yes: single control, performed 2 hours after measuring the PM2.5 of the intersections; Yes: single control, performed 50 minutes after measuring the PM2.5 of the intersections; Yes; 8.84 µg/m3; double control before and after (1st control 12 min; 2nd control 10 min); double control before and after (1st control 6 min; 2nd control 6 min); 5.03 µg/m3; 7.41 µg/m3; 24.68 µg/m3; 3.07 µg/m3; double control before and after (1st control 6 min; 2nd control 6 min); double control before and after (1st control 6 min; 2nd control 7 min); 4.52 µg/m3; 3.46 µg/m3.

In order to extract information from the experimental data, we created a Python program that calculates the averages and creates a series of graphs for each of the experiments.

2.1.1 Visualizing the relationship between the PM2.5 concentrations measured by PMS5003 and the intensity of the traffic

To visualize the relationship between the PM2.5 measurements and the traffic, we created a two-dimensional scatter plot. The plot did not use all of the data from the experiments; it only used the data from the experiments that measured the traffic with a traffic count and included a control. The unit of the y-axis of the plot is µg/m3 and the unit of the x-axis is cars/s. Each experiment created a single point on the plot:
• The x-component of each point was the traffic average recorded during the experiment.
• The y-component of each point was the difference between two averages: the PM2.5 average recorded at the intersection minus the PM2.5 average of the control values.
The plot also excluded one outlier, that of July 1st. This data point was excluded because the PM2.5 background level increased significantly while the experiment was being performed.

[Plot: "Additional PM2.5 pollution detected at the intersection as a function of traffic"; y-axis: difference between PM2.5 of intersection and its corresponding control in µg/m3; x-axis: traffic in cars/s]
Figure 15: Difference between PM2.5 level at the studied intersection and in the control location. Note that the negative values could be the result of the wind.

The experiments did not create enough data points to establish a strong correlation between the difference of PM2.5 averages and the amount of traffic.
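The coordinates of one point of the scatter plot can be computed as in the following minimal Python sketch. It is not the program used in this study: the function names are hypothetical and the PM2.5 readings in the last example are invented for illustration. The car counts and durations are those reported in the experiment table.

```python
from statistics import mean


def traffic_rate(car_count, duration_min):
    """x-component: average traffic in cars per second over the experiment."""
    return car_count / (duration_min * 60.0)


def excess_pm25(intersection_pm25, control_pm25):
    """y-component: mean PM2.5 at the intersection minus mean PM2.5 of the control."""
    return mean(intersection_pm25) - mean(control_pm25)


# Counts and durations taken from the experiment table.
print(round(traffic_rate(1116, 28), 3))  # 0.664 cars/s, as reported for the 28 min experiment
print(round(traffic_rate(134, 20), 3))   # 0.112 cars/s, as reported for July 4

# Invented PM2.5 readings (µg/m3), only to show how one point is assembled.
point = (traffic_rate(888, 31), excess_pm25([6.0, 7.0, 8.0], [5.0, 5.5]))
print(point)
```

A negative y-component simply means that the control average was higher than the intersection average for that experiment.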
2.1.1.1 Limitations of the experiments

The experiments were observational only. They did not include a measurement of the wind, and the control was not performed at the same time as the experiments. Furthermore, the traffic count does not take into account the fact that every vehicle is different and produces a different amount of PM2.5. For example, a truck with a diesel engine may produce more PM2.5 than a small car.

2.1.1.2 Observation about comparison

The experiments that had a "double control" indicated that the higher the background PM2.5 levels are, the more they vary between measurements. If the background concentration of PM2.5 was high enough and variable enough, the PMS5003 was not likely to detect a difference in pollution.

2.2 Visualizing the data from the government stations and the PurpleAir units

2.2.1 Comparison of data from two sources

This study created a series of graphs comparing the PurpleAir PM2.5 data (PD) (described in section 1.3.2) to the government PM2.5 data (GD) (described in section 1.3.1). The purpose of the graphs is to show how the PD relate to the GD. First, this study describes the limits of the comparison. Second, it describes the process we used to manage those limitations. Third, it describes the methods used to create the graphs.

2.1.2 Limitations

The comparison between the GD and the PD is limited. The government stations and the PurpleAir units measure PM2.5 at different locations, use different sensors to measure PM2.5 levels, and provide different data granularity. The government stations use the SHARP 5030 sensor whereas the PurpleAir units use the PMS5003 sensor.
Additionally, the data from the two sources have different granularities: the government stations only provide hourly averages, whereas the PurpleAir units provide five-second averages. The frequency of measurements also differs: a government station provides one hourly average every hour, but a PurpleAir unit provides a five-second average approximately every 40 seconds. The frequency at which a PurpleAir unit reports averages depends on the network and on the settings of the unit. While most units report every 40 seconds, some are configured differently and report data only every 5 minutes. Another major difference between the two data sources is their geographical location (spatial dimension). Each station and each unit measures PM2.5 levels in one specific location only. While the PM2.5 levels can on occasion be considered similar inside an airshed, they are not identical across the airshed, and the values of one sensor cannot be expected to equal the values of another sensor at a different location.

2.1.3 Methods

The graphs manage two major differences in the data: first, the data granularity and frequency, and second, the spatial differences between the stations and the units. First, the differences in time granularity and frequency of measurements were managed by averaging. The goal of using averages was to have the same number of data points, with the same timestamps, for both the government sensors and the PurpleAir units. The data from the relevant PurpleAir sensors were averaged to match the timestamps of the government data: if the GD were composed of daily averages, the PD were gathered and averaged by day, and if the GD were composed of hourly averages, the PD were gathered and averaged for every hour. Second, the spatial dimension of the study was managed by only including the data from the PurpleAir sites in close proximity to the government-run stations.
The graphs include data from the PurpleAir sensors that are no more than 1400 meters away from the government stations. This distance was chosen so that the data of at least four PurpleAir units were available for every government station. The distance between the stations and the sites was calculated using their GPS locations and the "haversine" formula. This formula gives the shortest distance between two points on a sphere while ignoring all obstacles or hills. The Earth radius used for the formula was 6,371,000 meters.

For the Federal Building station there are six PurpleAir sites that are less than 1400 meters from the station:
1. The Strathcona Terrace site, 1243.45 m away from the federal station.
2. The Nicola Street site, 367 m away from the federal station.
3. The Nicola Street West site, 547 m away from the federal station.
4. The Lorne Street site, 432 m away from the federal station.
5. The Lloyde George elementary School site, 1084 m away from the federal station.
6. The Battle Street P1 site, 1217 m away from the federal station.

For the Aberdeen station there are four PurpleAir sites with data that are less than 1400 meters from the station:
1. The Glenmohr Drive site, 1150.9 m away from the Aberdeen station.
2. The Aberdeen Drive site, 365.54 m away from the Aberdeen station.
3. The Huge Allan Drive site, 1352.05 m away from the Aberdeen station.
4. The Braemar Drive site, 1070.68 m away from the Aberdeen station.

This study created ten graphs comparing the PM2.5 measurements of the two government-run stations to the PM2.5 measurements of all the nearby PurpleAir sensors. Five graphs were created for each of the government stations, for a total of ten graphs.
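The distance filter described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the function names and the coordinates are hypothetical, and only the Earth radius (6,371,000 m) and the 1400 m limit come from the text.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # radius used in this study


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def sites_within(station, sites, max_m=1400):
    """Return {site name: distance} for the sites no more than max_m from the station."""
    nearby = {}
    for name, (lat, lon) in sites.items():
        distance = haversine_m(station[0], station[1], lat, lon)
        if distance <= max_m:
            nearby[name] = distance
    return nearby


# Hypothetical coordinates, chosen only to exercise the filter.
station = (50.6750, -120.3340)
sites = {"site_a": (50.6760, -120.3340), "site_b": (50.7000, -120.3340)}
print(sites_within(station, sites))  # only site_a is within 1400 m
```

Because the formula treats the Earth as a sphere and ignores elevation, the result is a straight-line surface distance, which is consistent with the description above.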
Each graph compares the PM2.5 measurements of one government station to the PM2.5 measurements of the PurpleAir sensors near that government station over a period of time. The same five time periods were used for the graphs of both government stations. The time periods were chosen to reflect the behavior under specific conditions:
1. The increasing levels of PM2.5 were represented for both government stations with two graphs that depicted measurements starting on August 12, 2017 at 00:00 and ending on August 18 at 18:00.
2. The decreasing levels of PM2.5 were represented for both government stations with two graphs that depicted measurements starting on July 20, 2017 at 00:00 and ending on July 26 at 19:00.
3. The very high levels of PM2.5 were represented for both government stations with two graphs that depicted measurements starting on July 31, 2017 at 23:00 and ending on August 6 at 18:00.
4. The low levels of PM2.5 were represented for both government stations with two graphs that depicted measurements starting on July 30, 2017 at 00:00 and ending on July 31 at 23:00.
5. The overall levels of PM2.5 over the summer were represented for the government stations with one graph that depicted measurements starting on May 18, 2017 and ending on August 22.

Each graph visualizes the PM2.5 measurements by plotting 3 graphs and a series of boxplots. The first graph represents the final PM2.5 average for an hour or a day at one government station. It was created by plotting a line between two consecutive hourly averages or two consecutive daily averages. The second graph represents the optical PM2.5 measurements produced by the government stations. Each government station has two internal sensors. The optical sensor uses a technique similar to that of the PMS5003, so we use this graph to analyze how the optical measurement of the government station compares to the PMS5003 values.
The third graph plots an "expected" hourly average. This expected hourly average represents the value that the PMS5003 should provide if it follows the fit created by the analysis of the PMS1003 (Kelly et al., 2017). The plot was created by taking the government data and applying the following function, provided in the paper describing the performance of the PMS1003 sensor:
• if the average PM2.5 is less than 40: ExpectedPM = 1.81PM government station − 1.37
• if the average PM2.5 is 40 or greater: ExpectedPM = 90(1-e1-0.003PM government station)-7.16

Finally, the data of the nearby PurpleAir sensors are represented by a series of boxplots. The data for every sensor were gathered and averaged for every hour. This produced a set of averages for every hour. The graph includes a boxplot for every hour, which shows the median, the mean, and the outliers of the set of averages. The median is represented by a light green line across the boxplot. The mean for the hour is represented by a green triangle over the boxplot. The outliers are represented by small circles "o". The boxplot had the following parameters:
1. The interquartile range (IQR) of the boxplot was composed of the values between the first quartile (Q1) and the third quartile (Q3).
2. The lower whisker of the boxplot was defined with the formula: Q1 - 1.5IQR
3. The top whisker of the boxplot was defined with the formula: Q3 + 1.5IQR
4. The outliers are drawn as circles and defined as any number below the lower whisker or any number above the top whisker.

2.1.4 Graphs

Overall measurements of the Federal Building sensor and the nearby PurpleAir sensors during 3 months of summer:
Figure 9: The graph plots daily averages of the PM2.5 measurements at or near the Federal station over a period of 3 months. The daily averages for the nearby PurpleAir sensors are drawn as boxplots in purple.
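The hourly averaging and the boxplot parameters described in section 2.1.3 can be sketched in Python as follows. This is a minimal illustration, not the plotting program used for the figures: the function names and sample values are hypothetical, and the quartile method (linear interpolation between data points) is an assumption, since the text does not state which one was used.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median, quantiles


def hourly_means(rows):
    """Average the readings of each sensor within each hour.

    rows: iterable of (sensor_id, timestamp, pm25). Returns {hour: {sensor_id: mean}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for sensor_id, timestamp, pm25 in rows:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour][sensor_id].append(pm25)
    return {hour: {s: mean(v) for s, v in per_sensor.items()}
            for hour, per_sensor in buckets.items()}


def box_stats(values):
    """Quartiles, 1.5*IQR limits, mean, median and outliers of one hour's set of averages."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # assumed quartile method
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {"q1": q1, "q3": q3, "lower": lower, "upper": upper,
            "median": median(values), "mean": mean(values),
            "outliers": [v for v in values if v < lower or v > upper]}


# Invented readings for two sensors "a" and "b".
rows = [("a", datetime(2017, 8, 12, 0, 5), 10.0), ("a", datetime(2017, 8, 12, 0, 45), 20.0),
        ("b", datetime(2017, 8, 12, 0, 30), 30.0), ("a", datetime(2017, 8, 12, 1, 10), 40.0)]
print(hourly_means(rows)[datetime(2017, 8, 12, 0)])  # {'a': 15.0, 'b': 30.0}
print(box_stats([1, 2, 3, 4, 100])["outliers"])       # [100]
```

Plotting libraries commonly draw each whisker at the most extreme data point that still lies inside these limits, so the drawn whisker can be shorter than the computed limit.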
Graphs comparing the measurements of the two types of sensors when levels of PM2.5 were increasing:
Aberdeen station:
Figure 10: Hourly averages of PM2.5 for sensors at or near the Aberdeen station, when PM2.5 levels are increasing.
Federal station:
Figure 11: Hourly averages of PM2.5 for sensors at or near the Federal station, when PM2.5 levels are increasing.

Graphs comparing the measurements of the two types of sensors when levels of PM2.5 were decreasing:
Aberdeen station:
Figure 12: Hourly averages of PM2.5 for sensors at or near the Aberdeen station, when PM2.5 levels are decreasing.
Federal station:
Figure 20: Hourly averages of PM2.5 for sensors at or near the Federal station, when PM2.5 levels are decreasing.

Graphs comparing the measurements of the two types of sensors when PM2.5 levels are very high:
Aberdeen station:
Figure 21: Hourly averages of PM2.5 for sensors at or near the Aberdeen station, when PM2.5 levels are very high.
Federal station:
Figure 22: Hourly averages of PM2.5 for sensors at or near the Federal station, when PM2.5 levels are very high.

Graphs comparing the measurements of the two types of sensors when PM2.5 levels are very low:
Aberdeen station:
Figure 23: Hourly averages of PM2.5 for sensors at or near the Aberdeen station, when PM2.5 levels are very low.
Federal station:
Figure 24: Hourly averages of PM2.5 for sensors at or near the Federal station, when PM2.5 levels are very low.

Observations:

These graphs give some interesting insights into the behavior of the PurpleAir units. Here are some observations we believe are important:
1. The graphs showed that the formula used to create the "expected levels of PM2.5" breaks down at high levels of PM2.5 (figure 16 -22). This is expected, since the fit formula used to create the graph was based on measured concentrations of up to 80 µg/m3.
2.
The optical measurements performed by the Thermo Scientific SHARP 5030 are, in general, significantly closer to the measurements of the PMS5003 than the final measurements.
3. In general, the calculated "expected" levels for the PMS5003, when below 80 µg/m3, were closer to the PMS5003 measurements than the final measurements of the Thermo Scientific SHARP 5030.
4. The averages of the PMS5003 sensors near the government-run stations registered most of the important increases and decreases in PM2.5 when compared against the government-run stations. The averages of the PMS5003 clearly follow the general "trends" of the government-run stations.
5. The distance between the PurpleAir units and the government-run stations seems to matter more at high levels of PM2.5.

The above observations about the behavior of the PurpleAir network in Kamloops require more extensive and rigorous mathematical analysis. There is a clear need for future research. However, forthcoming studies should be able to use our computer framework, our visualization programs, our findings about the algorithms used by PurpleAir, and our database of the PM2.5 measurements collected through the summer of 2017.

References:

Apte, J. S., Marshall, J. D., Cohen, A. J., & Brauer, M. (2015). Addressing global mortality from ambient PM2.5. Environmental Science & Technology, 49(13), 8057. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/26077815
Apte, J. S., Messier, K. P., Gani, S., Brauer, M., Kirchstetter, T. W., Lunden, M. M., . . . Hamburg, S. P. (2017). High-resolution air pollution mapping with Google street view cars: Exploiting big data. Environmental Science & Technology, 51(12), 6999-7008. doi:10.1021/acs.est.7b00891
Atkinson, R. W., Kang, S., Anderson, H. R., Mills, I. C., & Walton, H. A. (2014). Epidemiological time series studies of PM2.5 and daily mortality and hospital admissions: A systematic review and meta-analysis. Thorax, 69(7), 660-665. doi:10.1136/thoraxjnl-2013-204492
BC Ministry of Environment. (a).
Aberdeen station report. Retrieved from https://envistaweb.env.gov.bc.ca/StationReportFast.aspx?ST_ID=479
BC Ministry of Environment. (b). Description aberdeen station. Retrieved from https://envistaweb.env.gov.bc.ca/StationDetails.aspx?ST_ID=479
BC Ministry of Environment. (c). Federal building station report. Retrieved from https://envistaweb.env.gov.bc.ca/StationReportFast.aspx?ST_ID=267
BC Ministry of Environment. (d). Southern interior air zone. Retrieved from http://www2.gov.bc.ca/gov/content?id=52720D41CCFD4D42ADD11803E66904F3
BC Ministry of Environment. (e). Station description federal building station. Retrieved from https://envistaweb.env.gov.bc.ca/StationInfo3.aspx?ST_ID=267
Bart, O., Wen-Ying, F., Broadwin, R., Green, S. & Lipsett, M. (2007). The effects of components of fine particulate air pollution on mortality in California: Results from CALFINE. Environmental Health Perspectives, 115(1), 13-19. doi:10.1289/ehp.9281
Brauer, M., Freedman, G., Frostad, J., van Donkelaar, A., Martin, R. V., Dentener, F., . . . Cohen, A. (2016). Ambient air pollution exposure estimation for the global burden of disease 2013. Environmental Science & Technology, 50(1), 79. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/26595236
Carvalho, H. (2016). The air we breathe: Differentials in global air quality monitoring. The Lancet. Respiratory Medicine, 4(8), 603-605. doi:10.1016/S2213-2600(16)30180-1
Chen, Y., & Penner, J. E. (2005). Uncertainty analysis for estimates of the first indirect aerosol effect. Atmospheric Chemistry and Physics, 5(11), 2935-2948. doi:10.5194/acp-5-2935-2005
Davis, B. L., & Jixiang, G. (2000). Airborne particulate study in five cities of china. Atmospheric Environment, 34(17), 2703-2711. doi:10.1016/S1352-2310(99)00528-2
Englert, N. (2004). Fine particles and human health—a review of epidemiological studies. Toxicology Letters, 149(1), 235-242. doi:10.1016/j.toxlet.2003.12.035
Fine particulate matter - environmental reporting BC. (2015).
Retrieved from http://www.env.gov.bc.ca/soe/indicators/air/fine-pm.html
Government of Canada, Statistics Canada. (2017). Census profile, 2016 census - Kamloops, city [census subdivision], British Columbia and Thompson-Nicola, regional district [census division], British Columbia. Retrieved from http://www12.statcan.ca/census-recensement/2016/dp-pd/prof/details/page.cfm?Lang=E&Geo1=CSD&Code1=5933042&Geo2=CD&Code2=5933&Data=Count&SearchText=kamloops&SearchType=Begins&SearchPR=01&B1=All&TABID=1
Kanakidou, M., Seinfeld, J. H., Pandis, S. N., Barnes, I., Dentener, F. J., Facchini, M. C., . . . Wilson, J. (2005). Organic aerosol and global climate modelling: A review. Atmospheric Chemistry and Physics, 5(4), 1053-1123. doi:10.5194/acp-5-1053-2005
Law, K. S., & Stohl, A. (2007). Arctic air pollution: Origins and impacts. Science, 315(5818), 1537-1540. doi:10.1126/science.1137695
Lim, S. S., Vos, T., Flaxman, A. D., Danaei, G., Shibuya, K., . . . Adair-Rohani, H. (2012). A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990-2010: A systematic analysis for the global burden of disease study 2010. The Lancet, 380(9859), 2224. doi:10.1016/S0140-6736(12)61766-8
Lin, H., Ratnapradipa, K., Wang, X., Zhang, Y., Xu, Y., Yao, Z., . . . Ma, W. (2017). Hourly peak concentration measuring the PM2.5-mortality association: Results from six cities in the pearl river delta study. Atmospheric Environment, 161, 27-33. doi:10.1016/j.atmosenv.2017.04.015
Lu, F., Xu, D., Cheng, Y., Dong, S., Guo, C., Jiang, X., & Zheng, X. (2015). Systematic review and meta-analysis of the adverse health effects of ambient PM2.5 and PM10 pollution in the chinese population. Environmental Research, 136, 196-204. doi:10.1016/j.envres.2014.06.029
Madsen, C., Rosland, P., Hoff, D. A., Nystad, W., Nafstad, P. & Næss. Ø, E. (2012). The short-term effect of 24-h average and peak air pollution on mortality in Oslo, Norway.
European Journal of Epidemiology, 27(9), 717-727. doi:10.1007/s10654-012-9719-1 Pinault, L., van Donkelaar, A., & Martin, R. V. (2017). Exposure to fine particulate matter air pollution in Canada. (No. 28).Statistic Canada. Pope, C. A., & Dockery, D. W. (2006). Health effects of fine particulate air pollution: Lines that connect. Journal of the Air & Waste Management Association, 56(6), 709-742. doi:10.1080/10473289.2006.10464485 Pope, I., C, Turner, M., Burnett, R., Jerrett, M., Gapstur, S., Diver, W., . . . Brook, R. (2015). Relationships between fine particulate air pollution, cardiometabolic disorders, and cardiovascular mortality. Circulation Research, 116(1), 108-115. doi:10.1161/CIRCRESAHA.116.305060 PurpleAir. (a). PurpleAir downloads. Retrieved from https://map.purpleair.org/sensorlist PurpleAir. (b). PurpleAir map. Retrieved from https://www.purpleair.com/map?&zoom=12&lat=50.673134166532904&lng=120.34322673797607&size=50&orderby=L&latr=0.14447222026635842&lngr=0.417480468 75 55 PurpleAir. (c). PurpleAir technology. Retrieved from https://www.purpleair.com/technology? PurpleAir. (d). Sensors - PurpleAir  Retrieved from https://www.purpleair.com/sensors The MathWorks, I.ThingSpeak documentation; Retrieved from https://www.mathworks.com/help/thingspeak/ Shi, G., Peng, X., Huangfu, Y., Wang, W., Xu, J., Tian, Y., . . . Russell, A. G. (2017). Quantification of source impact to PM using three-dimensional weighted factor model analysis on multi-site data. Atmospheric Environment, 160, 89-96. doi:10.1016/j.atmosenv.2017.04.021 Thermo Fisher Scientific Inc. (2013). Model 5030 instruction manual Yong, Z. (2016a). Digital universal particle concentration sensor PMS1003 series data manual Yong, Z. (2016b). Digital universal particle concentration sensor Pms5003 series data manual Working Group on Monitoring and Reporting. (2011). Ambient air monitoring protocol for PM2.5 and ozone Canadian Council of Ministers of the Environment. 
Appendices:
Appendix A: Conceptual Model of the database for PM2.5 data collection.