SECTION 2.2.3

Supporting Reproducible Research

Gabriele Hayden, Tisha Mentnech, Vicky Rampin, and Franklin Sayre

Introduction

Claims are more likely to be credible—or found wanting—when they can be reviewed, critiqued, extended, and reproduced by others. All phases of the research process provide opportunities for assessing and improving the reliability and efficacy of scientific research.
—National Academies of Sciences, Engineering, and Medicine1

In the last decade there has been growing concern that many research studies are not reproducible. This has led to declarations that we are experiencing a "reproducibility crisis" and to questions about the veracity of studies used for decision-making in everything from public policy to patient care. Studies examining reproducibility have been conducted in psychology, biology, biomedicine, neuroscience, drug development, chemistry, climate science, economics, and education, and in no field were the majority of findings found to be reproducible.

Reproducibility is considered a fundamental characteristic of research and is a multifaceted concept that broadly refers to the ability of researchers to get the same results when repeating an analysis. There have been many proposed definitions of reproducibility and the related concept of replicability. Here we adopt the definitions proposed in 2015 by Bollen and Kaplan. Reproducibility is the ability to take the original methods and data and get the same results. Replicability entails the same methods but involves collecting new data and getting substantially the same results. (Because new data is collected, you wouldn't expect exactly the same results.)2 While this chapter deals primarily with reproducibility, many of the recommendations for improving reproducibility will also make replication easier.

In theory, all published and especially peer-reviewed research should be reproducible. After all, reproducibility doesn't involve anything more than redoing what was originally done, using the original methods and data. In actuality the story is usually more complicated. Experimental and qualitative research increasingly involves large teams, computational tools, and vast amounts of data. Research in fields that do not use experiments will often have some elements that cannot be reproduced.3 At the same time, scholarly communication has stayed more or less the same for centuries: authors write short narrative reports describing their research, often with strict space limits, which are then peer-reviewed by others who may or may not have access to the underlying research materials, and finally published in traditional journals. The underlying data is often carefully guarded, and methods are described in abstract terms with critical details left out, thus complicating reproducibility.

This book discusses many of the problems with traditional publishing and the advantages of adopting more open practices. The previous section discussed research data management (RDM) and data sharing and how those practices ensure that a study's underlying data can be found, understood, and used. Reproducibility requires that we go further and share as much detail as possible as transparently as possible. Reproducibility thus provides one of the strongest justifications for open scholarly communication practices such as data and code sharing, transparent methods reporting, and open publishing models and metrics.
Reproducibility requires that we adopt open and transparent methods, not just as an abstract good but as a fundamental part of research practice. Many of these areas are core aspects of academic librarianship and entail the expertise of librarians and other information professionals. They relate to the packaging of scholarship: how research is described and shared with others, how research is cited, and how impact is measured. The role of information professionals in supporting reproducibility is being increasingly recognized both within librarianship and by researchers, funders, and institutions. Stodden and colleagues highlighted the role librarians could play in "supporting a culture change toward reproducible …research," including using academic libraries' rich connections with departments to support and manage digital scholarly output.4 In 2017 Vicky Rampin (then Vicky Steeves) described the new field of reproducibility librarianship,5 and the National Institutes of Health advisory committee recommended that they "lead efforts to support and catalyze open science, data sharing, and research reproducibility."6 A conference about how librarians can support reproducibility, organized by several of the authors of this chapter in 2020, drew almost 200 attendees and presenters from around the world.7 Librarians aren't the only stakeholders engaged with this topic. Reproducibility is a complex and rapidly changing topic, with new studies, guidelines, policies, organizations, and technologies being announced regularly. As discussed in the previous section, reproducibility is also highly discipline-specific; in some areas it has received significant attention, and institutions such as journals and funders have started putting in place measures to address concerns. In other areas researchers are just starting to think through how to make their work more reproducible.

Providing a comprehensive guide to all these issues is beyond the scope of this section. Instead, we focus on five ways librarians can support reproducible research. These ideas are immediately actionable, build on existing services and expertise, and if widely implemented would have a major impact. These ideas are
1. Help researchers find and use reporting guidelines in order to improve reporting and transparency.
2. Promote and support preregistration of studies in order to improve evaluation of research.
3. Support researchers in creating computational pipelines in order to improve methods.
4. Preserve computational environments in order to improve sustainability.
5. Educate researchers about alternative and new scholarly metrics in order to shift incentives.

These aren't the only ways librarians and other information professionals can support reproducibility, and at the end of this section we briefly explore broader roles. We also provide an extensive guide to basic definitions, tools, and resources.

Five Big Ideas for Supporting Reproducible Research

HELP RESEARCHERS FIND AND USE REPORTING GUIDELINES

Many recommendations for improving reproducibility focus on improving the reporting of a study's methodology, analysis, and results. Traditionally, methods sections have been short descriptions of what the researcher did, with how that was communicated left almost entirely up to the author. Intentionally or unintentionally, details were often left out or described with too little detail to be reproducible by a reviewer or reader.
This has become an even greater problem as research has grown more complex and reliant on more people and tools. Reporting guidelines are detailed lists of what researchers need to report, and at what level of detail, in order for readers to fully understand, evaluate, and reproduce a study. These guidelines promote transparent and accurate reporting by helping researchers think about what they need to report, either in the text of their article or by publishing data, code, or supplemental files. Many journals now require that authors follow reporting guidelines when submitting an article. There are different reporting guidelines for different study designs because each design requires authors to report different things. Guidelines are usually created by groups of researchers who look at what needs to be reported for a methodology to be understood and then publish a consensus paper in a major journal that outlines how the guideline was developed and what it requires. The best resource for finding reporting guidelines is the EQUATOR Network (https://www.equator-network.org/reporting-guidelines/), an international collaboration of groups seeking to improve reporting by creating, publishing, and promoting guidelines. The EQUATOR Network also collects these guidelines into a single resource, with many of the most important guidelines hosted on its website. Many professional societies also offer discipline-specific guidelines; for example, the American Medical Association and the American Psychological Association include reporting guidelines as part of their style guides.8 Many of the best-known guidelines are for quantitative methodologies such as randomized controlled trials (RCTs), but a good example of a guideline for qualitative studies is the "Standards for Reporting Qualitative Research: A Synthesis of Recommendations," or SRQR.9 The SRQR is "a list of 21 items that we consider essential for complete, transparent reporting of qualitative research."10 As you can see from table 2.5, guidelines simply list information that should be included.

Librarians can help promote reporting guidelines by teaching researchers about them and promoting them on research guides, in workshops, and during consultations. Researchers often don't know about these guidelines until they are preparing a manuscript for submission and are happy to have a template for thinking about what they will need to report when they are writing up their research. Guidelines can also help new researchers evaluate other articles and think through what they need to do when designing their own studies. Health science librarians have a long history of promoting, using, and creating reporting guidelines due to their work with systematic reviews and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) checklist.11 The PRISMA checklist sets out elements that need to be reported in a systematic review for others to understand how the research was conducted. PRISMA has always asked researchers to report on aspects of their search strategy, but PRISMA-S, a recently published extension to the PRISMA Statement, adds specific elements required for understanding and reproducing the search.12

TABLE 2.5
"Standards for Reporting Qualitative Research: A Synthesis of Recommendations," or SRQR, Methods section. See full table at https://journals.lww.com/academicmedicine/_layouts/15/oaks.journals/ImageView.aspx?k=academicmedicine:2014:09000:00021&i=T1-21&year=2014&issue=09000&article=00021&type=Fulltext.
S6. Researcher characteristics and reflexivity—Researchers' characteristics that may influence the research, including personal attributes, qualifications/experience, relationship with participants, assumptions, and/or presuppositions; potential or actual interaction between researchers' characteristics and the research questions, approach, methods, results, and/or transferability
S7. Context—Setting/site and salient contextual factors; rationale*
S8. Sampling strategy—How and why research participants, documents, or events were selected; criteria for deciding when no further sampling was necessary (e.g., sampling saturation); rationale*
S9. Ethical issues pertaining to human subjects—Documentation of approval by an appropriate ethics review board and participant consent, or explanation for lack thereof; other confidentiality and data security issues
S10. Data collection methods—Types of data collected; details of data collection procedures including (as appropriate) start and stop dates of data collection and analysis, iterative process, triangulation of sources/methods, and modification of procedures in response to evolving study findings; rationale*
S11. Data collection instruments and technologies—Description of instruments (e.g., interview guides, questionnaires) and devices (e.g., audio recorders) used for data collection; if/how the instrument(s) changed over the course of the study
S12. Units of study—Number and relevant characteristics of participants, documents, or events included in the study; level of participation (could be reported in results)
S13. Data processing—Methods for processing data prior to and during analysis, including transcription, data entry, data management and security, verification of data integrity, data coding, and anonymization/deidentification of excerpts
S14. Data analysis—Process by which inferences, themes, etc., were identified and developed, including the researchers involved in data analysis; usually references a specific paradigm or approach; rationale*
S15. Techniques to enhance trustworthiness—Techniques to enhance trustworthiness and credibility of data analysis (e.g., member checking, audit trail, triangulation); rationale*

* The rationale should briefly discuss the justification for choosing that theory, approach, method, or technique rather than other options available, the assumptions and limitations implicit in those choices, and how those choices influence study conclusions and transferability. As appropriate, the rationale for several items might be discussed together.

PROMOTE AND SUPPORT PREREGISTRATIONS AND REGISTERED REPORTS

Preregistration is when authors explicitly and publicly share or publish their study hypothesis and design in a journal or online repository before they begin their research. This is very different from the traditional model, where the hypothesis and design are carefully guarded until publication out of fear of being scooped, though some repositories allow preregistrations to be embargoed (i.e., not made fully public for a limited period of time). Preregistration helps limit bias by reducing opportunities for authors to change their methodology after collecting data in order to get a positive result, also known as HARKing (hypothesizing after results are known). Authors can still make changes to their plan after registration, but because the original plan was already reported, they would need to explain and justify those changes on publication.
Preregistration also provides a framework for a research project and places an emphasis on transparency in planning and methodology from the start of the project. The Center for Open Science (COS) publicizes and offers model workflows for preregistration. More information on preregistration and preregistration templates can be found on the OSF Preregistration page (https://www.cos.io/initiatives/prereg). For decades, preregistration has been common and even mandated for some RCTs that are federally funded, and registrations are publicly available at ClinicalTrials.gov. Preregistration is also common for systematic reviews and publicly available in PROSPERO, a UK-based international prospective register of systematic reviews, and within some journals.

Registered reports are a new publication type that wraps a preregistration into a journal article. Registered reports are peer-reviewed before data collection and are accepted in principle; that is, the journal agrees to publish the final results of the study on the strength of the proposed methods and regardless of the outcome. This has benefits for both individual research groups and the field as a whole. It helps reduce positive results bias, a type of publication bias in which only positive results are published, thus systematically biasing the published literature toward novel results that may not be reproducible. It incentivizes researchers to share their methodology in order to get valuable feedback before data collection and a guaranteed publication once the registered report is accepted in principle. More information can be found on the COS Registered Reports page (https://www.cos.io/initiatives/registered-reports).

Library and information professionals can support and educate researchers about the benefits of preregistration and registered reports and how they promote transparency in research, as well as walk them through the available templates and tools. Library and information professionals are involved in supporting all areas of the research enterprise. Promoting and providing information on preregistrations and registered reports is another way to actively support transparency and reproducible research in scholarly communication.

HELP RESEARCHERS CREATE PIPELINES FOR COMPUTATIONAL REPRODUCIBILITY

A number of drivers of irreproducibility relate to issues with study design and analysis.* In the introduction we discussed reproducibility and how it requires transparent open methods and data. Section 2.2.2, on RDM, explored how to manage and share research data so it can be understood, used, and shared. Here we extend these concepts to discuss computational reproducibility and specifically how computational pipelines can support more open and transparent methods.

* Study design and statistical analysis are highly dependent on the discipline and method used and require disciplinary and methodological expertise. Issues related to study design and analysis are usually harder for librarians and information professionals to directly help address unless they have specific disciplinary and methodological expertise and are embedded on a project. There are some methodologies, such as systematic reviews, that librarians can directly support, but in many other cases it's not appropriate. Here we focus on ways that all LIS professionals can support better methods.
A computational pipeline is "an interdependent set of programs with manipulable parameters, including program version and input data, which output some usable result."13 For instance, an analysis script in a programming language like Python or R that takes some data as input and produces an output could be considered a computational pipeline. Another example could be a server-side web application from a digital humanities project that takes a database and displays a map for a GIS analysis. The goal of computational pipelines is to take any manual or isolated steps and turn them into holistic automated processes that can be documented, audited, and rerun.

Data processing with OpenRefine is a good example of a well-documented computational pipeline.14 OpenRefine is an open source tool that allows users to load in data in many formats (e.g., CSV, TSV, HTML), transform and process it quickly and accurately, and then export it. It keeps track of the processing steps that you use, in the order you run them, as a JSON file.15 JSON is a machine- and human-readable text format that others can use to re-create the pipeline exactly. You can see an example of this in figure 2.2, where the left side shows the steps in plain language and in order, and the right side shows the JSON version of the same workflow. If you give someone else this JSON file and your raw data, they will be able to reproduce your pipeline exactly.

Figure 2.2. Screenshot of OpenRefine's operation history.

Figure 2.3. Data workflow from Glenda M. Yenni et al., "Developing a Modern Data Workflow for Evolving Data," preprint, BioRxiv, July 24, 2018, p. 5, https://doi.org/10.1101/344804; used under CC-BY 4.0.

OpenRefine is one of the most reproducible data-processing tools available because of this ability to export, reuse, and share a full pipeline. Given that OpenRefine's pipeline steps are outlined in JSON, they can be easily version controlled, so we can see how the pipeline changes over time in ways that potentially affect the research output. This information provides more context for readers and reviewers. Computational pipelines can also be more complex when they involve additional computations, such as performing a calculation or creating a visualization. These computations add complexities to pipelines in terms of software and sometimes even hardware dependencies, as, for example, when the pipeline runs on high-performance computing systems. A great example of a more complex computational pipeline comes from Yenni and colleagues, who describe transitioning their lab from manual data processing (see figure 2.3) to a more automated computational pipeline.16 Their stated motivations were to 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers.
The workflow uses two tools from software development, version control and continuous integration, to create a modern data management system that automates the pipeline.17 Yenni and colleagues wanted to eliminate potential spots for human error, as well as create a reproducible, well-documented computational pipeline that others can use.18 This speaks to one of the goals of computational reproducibility: at the end of a project, make a research compendium, or a reproducible bundle of the pipeline.19 This is a package that contains all of the things necessary to reproduce your work, from data and code to the computational environment:

Research compendia are an increasingly used form of publication, which packages not only the research paper's text and figures, but also all data and software for better reproducibility.20

Computational pipelines enable computational reproducibility, which is the ability to rerun a pipeline using the original computational environment and dependencies, facilitated through the use of research compendia. This is harder than expected, as it requires using the precise version of the software originally used with the exact parameters and input data, and occasionally even particular hardware configurations.21 The terms data reproducibility or code reproducibility are also used to describe similar goals but fall short, because to reproduce others' work it's necessary to have not only the data and code, but also the entire computational setting in which the research takes place. Without this computational setting, code may fail unexpectedly or may produce subtly different results. For example, Gronenschild and colleagues found significant differences in the results of neuroscience analyses when using the analysis software FreeSurfer in different computational settings (e.g., Mac OS X 10.5 versus 10.6, an HP versus an Apple workstation, and two different versions of FreeSurfer).22 What we are actually trying to keep reproducible is the entire analysis pipeline, hence the name computational reproducibility. And of course, there are often multiple pipelines for a given project—for the data preparation, data analysis, data visualizations, and so on. So when we talk about something being computationally reproducible (or not, or reproducible to a varying degree), we're usually talking about one of those pipelines, not any particular object within them (such as data or code). The ideal situation is to package these computational pipelines into a research compendium that can later be rerun to verify research claims (by a peer reviewer, for instance), built upon for complementary use cases, and used to teach newcomers valuable methodologies with real-world research. Sandve and colleagues discuss some basic steps toward computational reproducibility that center on creating computational pipelines.23 It's worth noting here that computational reproducibility relies on following RDM best practices, just with a few extra steps (the computational part!). If the work can be rerun but not understood, there's limited utility to it. Following RDM best practices will make sure that your work is rerunnable and understandable not only by machines, but also by humans. Creating research compendia that include the computational pipeline is the next step for full computational reproducibility.
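To make the idea of a documented, rerunnable pipeline concrete, here is a minimal sketch (not drawn from any of the projects cited above) of a small Python script that reads raw data, applies one documented cleaning step, writes the result, and records the environment it ran in. The file names and the cleaning rule are hypothetical.

    # pipeline.py -- a minimal, hypothetical computational pipeline sketch.
    # It reads raw data, applies one documented cleaning step, writes the
    # result, and saves a provenance record so the run can be audited and rerun.
    import csv
    import json
    import platform
    import sys
    from datetime import datetime, timezone

    RAW = "raw_survey.csv"        # hypothetical input file
    CLEAN = "clean_survey.csv"    # hypothetical output file
    LOG = "pipeline_run.json"     # provenance record for this run

    def clean_rows(rows):
        """Drop rows with an empty 'age' field and cast age to an integer."""
        for row in rows:
            if row.get("age", "").strip():
                row["age"] = int(row["age"])
                yield row

    def main():
        with open(RAW, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            fieldnames = reader.fieldnames
            cleaned = list(clean_rows(reader))

        with open(CLEAN, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(cleaned)

        # Record what was run, on what, and when -- the seed of a research compendium.
        provenance = {
            "script": sys.argv[0],
            "input": RAW,
            "output": CLEAN,
            "rows_written": len(cleaned),
            "python_version": platform.python_version(),
            "platform": platform.platform(),
            "run_at": datetime.now(timezone.utc).isoformat(),
        }
        with open(LOG, "w", encoding="utf-8") as f:
            json.dump(provenance, f, indent=2)

    if __name__ == "__main__":
        main()

Shared alongside the raw data and a list of package versions (for example, a requirements.txt file), even a small script like this turns a manual cleaning step into something a reviewer can rerun and audit.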
Computational Reproducibility in Primarily Non-computational Fields

As computation and interdisciplinary work become widespread in fields that are still dominated by non-computational work, several issues arise. First, it becomes increasingly important that scholars who do not themselves use computational methods understand and value openness and computational reproducibility in order to engage with and assess the work of their colleagues. Second, the computational work practiced in primarily non-computational fields poses many of the same challenges to computational reproducibility, such as using proprietary software, building one-off programs that soon become orphaned or unsupported, or depending on graphical interfaces whose computational dependencies are challenging to document and archive.

Digital humanists are scholars trained in the humanities who study digital artifacts or who use computational methods to study topics traditional to the humanities. They offer one example of scholars who often have colleagues who practice non-computational work. In the past, prominent practitioners sometimes used this fact to lend obscurantist weight to their work, as when Franco Moretti described literary data that he did not share or make open as factual and "independent of interpretation."24 Even when transparency and computational reproducibility are valued, as they increasingly are, digital humanities projects often involve using or building programs with graphical user interfaces or other elements that do not lend themselves easily to the project of reproducibility. These programs can be proprietary, one-off, or not supported over the long term by the research group that created them. Even when they are open source, the complexity of their technology stack (the programs used to build them) makes them more challenging to reproduce than statistical analyses or visualizations built in R or Python.

A number of solutions have been proposed to address the challenges of hard-to-reproduce and hard-to-archive computational work. One solution used for archiving digital exhibits is web archiving technology. For example, many institutions use Archive-It for static websites and Webrecorder for interactive websites.25 However, these capture only images or video of the sites rather than the sites themselves. Another promising avenue is the use of ReproZip-Web,26 an extension of ReproZip (discussed below), to archive the software and code that creates digital journalism and other interactive data visualization websites.27

Qualitative social science researchers are another category of researchers whose work sometimes resists classical definitions of reproducibility and whose computational work faces reproducibility challenges. Many of the programs for coding qualitative data and managing media formats such as photos and video are proprietary. For qualitative social scientists, open source programs for qualitative coding such as Taguette and QCoder can simplify data sharing.28

PRESERVE COMPUTATIONAL ENVIRONMENTS TO ENSURE SUSTAINABLE REUSE OF RESEARCH

In section 2.2.2, you learned about RDM and data sharing, and above we discussed how computational pipelines can be used to ensure different research workflows can be made reproducible and shareable.
While software and data preservation are critical to ensure reproducibility, we also need to preserve the actual computational environment in which the research takes place (much as we need the environment for computational reproducibility!). Many modern research practices (and pipelines) rely on unique toolkits, and the output from these tools often depends on the actual software in which the research happens.29 Because of this, researchers need to be able to interact with research pipelines in their original computational environment to faithfully reproduce the work: "There is a very clear need to preserve not only digital objects, but reliable access to these objects, which means adopting one or more approaches toward software preservation."30

This work is done by those involved in software preservation, a subfield of digital preservation concerned with selecting, accessioning, ingesting, describing, accessing, and archiving software and associated contextual files (e.g., documentation). This is different from preserving source code: human-readable, uncompiled, plain-text files (e.g., script.py). Software refers to compiled code, such as an operating system or an application (e.g., a file with an .exe extension for Windows programs). Software preservation spans many types of professional activities, ranging from large-scale downloading and archiving of software to in-depth software curation and emulation efforts for specific pieces of software or hardware.31 Hong and colleagues describe several techniques for software preservation:

• Technical preservation (techno-centric)—Preserve original hardware and software in the same state
• Emulation (data-centric)—Emulate original hardware and operating environment, keeping software in the same state
• Migration (functionality-centric)—Update software as required to maintain the same functionality, porting/transferring before platform obsolescence
• Cultivation (process-centric)—Keep software "alive" by moving to a more open development model, bringing on board additional contributors and spreading knowledge of the process
• Hibernation (knowledge-centric)—Preserve the knowledge of how to resuscitate/recreate the exact functionality of the software at a later date32

Hong and colleagues also outline the considerations for each strategy as it relates to research software in service of reproducibility and archiving of the scholarly record.33 The authors highlight technical preservation and emulation as ways to continue to access research materials in the long term for reproducibility, and migration, cultivation, and hibernation as the most applicable strategies to promote software reuse.

One general digital preservation tool widely used for reproducible research efforts is BagIt.34 BagIt is a specification for hierarchical file system conventions. It was designed for the export of content normally kept in database structures that are likely to degrade or lose support, as well as to be shipped around to different storage locations. A bag, in BagIt terminology, has a payload and tags, which are metadata that describe the storage and transfer of the bag.35 This specification is widely used by computational reproducibility tools as a means of structuring the research compendia that can be exported out of the platform. Digital archivists use this format for storing data and code for the long term for manual curation processes.
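As an illustration of the specification, the short sketch below uses bagit-python, the Library of Congress's open source BagIt implementation, to turn a project directory into a bag and later validate it. The directory name and metadata values are hypothetical examples, not part of the specification itself.

    # make_bag.py -- a minimal sketch of creating and validating a BagIt bag
    # with bagit-python (pip install bagit). Directory and metadata are hypothetical.
    import bagit

    # Convert an existing project directory into a bag in place: files are moved
    # into a data/ payload folder, and checksum manifests plus a bag-info.txt
    # tag file are written alongside it.
    bag = bagit.make_bag(
        "my_research_compendium",
        {
            "Source-Organization": "Example University Libraries",
            "Contact-Name": "Jane Researcher",
            "External-Description": "Data, code, and documentation for a survey analysis.",
        },
    )

    # Later, or on another machine, confirm the payload still matches the manifests.
    bag = bagit.Bag("my_research_compendium")
    print("Bag is valid:", bag.is_valid())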
Exporting research compendia as bags is attractive because the format integrates well with archival repositories and version control systems, it can reference external data (i.e., data external to the bag), it includes awareness of provenance, it is flexible, and it is readable by humans (see figure 2.4).36

Figure 2.4. Whole Tale (described in appendix B), for instance, allows users to export their tale as a bag.

One project geared specifically toward software preservation that is taking active steps to be useful for reproducibility use cases is the Emulation-as-a-Service Infrastructure (EaaSI) project.37 This project seeks to scale up access to Emulation-as-a-Service (EaaS), a tool that spins up emulated computational environments whenever a user wants.38 EaaS uses configuration templates (stored in XML—which is also easily versioned and shared) to assemble emulated hardware and data as specified by the user. This is done either manually, through a web form, or automatically, by importing the specification from another tool (e.g., ReproZip bundles, Docker and Singularity containers). The user is presented with a virtual computing environment that mimics the behavior of a physical machine. This can be used as a virtual reading room for users who want to interact with legacy materials using the original software, but it can also be used to reproduce legacy research no matter the operating system, no matter how far in the future.39

Software preservation is directly tied to the ability to reproduce research in the long term, as well as being a valuable activity for preserving an important cultural artifact. However, "successfully collecting, preserving, and providing access to software as a research object will likely require significant policy and procedural development for research libraries."40 The costs associated with staff and resources, as well as legal and social challenges, make software preservation a difficult endeavor. However, it is critically important in ensuring the sustainability of research created today, yesterday, and tomorrow.

EDUCATE RESEARCHERS ABOUT CITATION PRACTICES TO CHANGE INCENTIVES

Challenges to reproducibility and replicability persist in part because the pressures of publication, promotion, and tenure are often in direct conflict with best practices for reproducibility. Researchers are encouraged to publish novel and surprising results in high-impact journals and to keep data and other artifacts for themselves in order to maximize their potential future utility. This incentivizes poor and opaque research practices. These incentives exist not only at the individual level but also in the business models of large, interlocking institutions. They are bound up with university prestige and rankings, journal conventions and journal ranking systems, and department- and university-level tenure criteria. At the level of the individual actor (a researcher) or even at the level of the individual institution (a university or journal), the pressure to follow existing conventions is immense. Despite institutional and cultural challenges, changing how research is incentivized to promote reproducible and transparent practices is one of the best ways to help fix the reproducibility crisis.41 Researchers across a broad range of disciplines are advocating for the practice of reproducible research within their own fields and making practices of reproducible research part of their research agenda.
Advocates often argue for discipline-specific reproducibility and are often responding to discipline-specific incentives that favor reproducibility.42 For example, social psychology became invested in reproducible research as a result of a crisis of legitimacy in the discipline.43 A decade later, the push for reproducible research in psychology has begun to spread to the social sciences generally.44 Biomedical research faces a complex set of competing incentives, but ultimately the demand for drugs, treatments, and medical devices that work helps drive support for reproducibility among funders and government agencies.45 Many of these researchers advocate for broadening what is considered a legitimate output of research to include things like openly shared code and data, preprints, and new types of publications like registered reports. As researchers in individual disciplines slowly change the consensus within their field regarding what counts as research outputs, this in turn changes criteria for tenure and promotion within that field.

Change can also come from the top down. The decision of major granting organizations like the National Science Foundation (NSF) to begin requiring data management plans (DMPs) in 2010 raised both the awareness and the practice of sharing data.46 It also helped drive the development of important infrastructure, such as data repositories. The position of research data management librarian exists in part due to the institutional need for someone to support faculty and principal investigators (PIs) writing DMPs for grants, and the recent move in libraries to support reproducibility similarly responds to demand for this support from funders.47 Thus, for example, when the National Institutes of Health launched new grant application instructions regarding rigor and reproducibility in 2015, the library faculty at the Spencer S. Eccles Health Sciences Library at the University of Utah responded by hosting a conference and symposium.48 This is one example of the ways changes in the research environment shape changes in libraries that support researchers. As librarians we have some ability to advocate for incremental change by increasing awareness and helping make reproducible practices easier in all the ways discussed in this section and throughout this book.49

One way librarians can support changes to incentives is through roles we already fill as educators about and providers of scholarly metrics. We often educate students and faculty on how scholarly metrics work and provide metrics to faculty committees responsible for promotion and tenure. We can use these opportunities to talk about the problems with existing metrics and offer alternatives. When asked to put together a collection of metrics for a tenure, grant, or promotion package, we can include nontraditional outputs such as data sets, code, and registered reports. For instance, the Office of Scholarly Communication at Texas A&M University Libraries offers one particularly successful model of scholarly communication outreach. Director Bruce Herbert, himself a senior tenured professor of geology and geophysics, meets with faculty before they go up for tenure, helping them develop and communicate metrics appropriate to their field.
For example, he helped one faculty member, a renowned poet, earn tenure by documenting the presence of his poetry on syllabi of universities across the world using the Open Syllabus Project.50 Librarians with less seniority can nevertheless offer valuable resources to researchers, whether through LibGuides and other web-based materials or through one-on-one consultations.51

How Else Libraries Can Support Reproducibility

As discussed in this section, improving reproducibility requires broad changes to how research is incentivized, conducted, and communicated. We've focused on five high-impact, immediately actionable interventions that build on the work information professionals already do. However, there are many other ways we can positively impact reproducibility. Table 2.6 outlines some of these interventions and includes citations of articles and other resources that discuss them further. The themes and interventions in this section are adapted from Sayre and Riegelman, who outline a broad array of interventions and supports that librarians can provide to help improve reproducibility.52 You may recognize the themes from the interventions outlined above. Among these interventions you will find roles for functional specialists, disciplinary liaisons, subject experts working within libraries, and anyone else who supports research and scholarly communication. The breadth of expertise required and disciplinary differences mean that any work done in these areas likely needs to involve collaboration between subject specialists, liaisons, and other specialized experts. Supporting reproducible research also requires institutional, national, and international infrastructure that librarians and information professionals are part of developing and supporting.

TABLE 2.6
Library services contributing to reproducibility. Adapted from Franklin Sayre and Amy Riegelman, "Replicable Services for Reproducible Research: A Model for Academic Libraries," College and Research Libraries 80, no. 2 (March 2019): 265.

Theme: Supporting Reproducible Methods
• Support for research methodologies with which LIS professionals have expertise, such as digital humanities, bibliometrics, and GIS.a
• Adoption of reproducible practices and transparency in our own research and work practice.
• Support for building computational pipelines for data processing, analysis, and visualization.b
• Support for systematic reviews and extending systematic review services to new disciplines outside the health sciences in order to improve researchers' understanding of previous research on a topic.c
• Support for active research data management and help for researchers in managing their research data before and during the research process.
• Work with quality assurance offices, and training for new lab members on best practices during the research process itself.
• Connection of researchers to methodological and statistical support units on campus.

Theme: Improving Reporting and Dissemination
• Help for researchers in finding and using guidelines and checklists (e.g., PRISMA) to improve methods reporting.
• Help for researchers in understanding preregistration and finding repositories for preregistration.d
• Provision of open access publishing services in order to increase publication of null results and reduce the effects of adverse incentives.
• Encouragement of replications through support, programming (e.g., reproducibility hackathons,e poster sessions featuring replication studies of graduate students), and institutional open access publishing.

a. Allison Campbell-Jensen, "Award-Winning Changemaker," Continuum (blog), University of Minnesota Libraries, September 30, 2020, https://www.continuum.umn.edu/2020/09/award-winning-changemaker/.
b. Ana Trisovic et al., "Advancing Computational Reproducibility in the Dataverse Data Repository Platform," in P-RECS '20: Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems (New York: ACM, 2020), 15–20, https://doi.org/10.1145/3391800.3398173; Daniel Nüst and Matthias Hinz, "Containerit: Generating Dockerfiles for Reproducible Research with R," Journal of Open Source Software 4, no. 40 (August 21, 2019): 1603, https://doi.org/10.21105/joss.01603; Reem Almugbel et al., "Reproducible Bioconductor Workflows Using Browser-Based Interactive Notebooks and Containers," Journal of the American Medical Informatics Association 25, no. 1 (January 2018): 4–12, https://doi.org/10.1093/jamia/ocx120; David L. Donoho, "An Invitation to Reproducible Computational Research," Biostatistics 11, no. 3 (July 2010): 385–88, https://doi.org/10.1093/biostatistics/kxq028; Carl Boettiger, "An Introduction to Docker for Reproducible Research," ACM SIGOPS Operating Systems Review 49, no. 1 (January 2015): 71–79, https://doi.org/10.1145/2723872.2723882.
c. Melissa L. Rethlefsen et al., "Librarian Co-authors Correlated with Higher Quality Reported Search Strategies in General Internal Medicine Systematic Reviews," Journal of Clinical Epidemiology 68, no. 6 (June 2015): 617–26, https://doi.org/10.1016/j.jclinepi.2014.11.025; Jonathan B. Koffel and Melissa L. Rethlefsen, "Reproducibility of Search Strategies Is Poor in Systematic Reviews Published in High-Impact Pediatrics, Cardiology and Surgery Journals: A Cross-Sectional Study," ed. Brett D. Thombs, PLOS ONE 11, no. 9 (September 26, 2016): e0163309, https://doi.org/10.1371/journal.pone.0163309.
d. Amy Riegelman, "A Primer on Preregistration (& Why I Think It Should Be a Submission Track in LIS Journals)" (presentation, Librarians Building Momentum for Reproducibility, virtual conference, January 28, 2020), https://osf.io/w4dfh/.
e. Kristina Hettne et al., "ReprohackNL 2019: How Libraries Can Promote Research Reproducibility through Community Engagement," IASSIST Quarterly 44, no. 1–2 (2020): 1–10, https://doi.org/10.29173/iq977.

TABLE 2.6 (continued)

Theme: Supporting Sustainable Reuse of Research (Data, Code, Environment)
• Support for data curation (see section 2.2.2).
• Support for data/code/methods sharing, including educating researchers, running institutional data repositories, and helping define standards for citation and sharing.
• Support for preserving computational environments.

Theme: Changing How Research Is Evaluated (Diversifying Peer Review)
• Education for researchers about new forms of peer review and publication, such as preprints, open peer review, and registered reports.
• Education for researchers about the benefits of preregistrations.
• Provision of support and repositories for preregistrations.
• Support for preprints and help for researchers in finding appropriate venues for depositing preprints, understanding journal guidelines (e.g., Sherpa Romeo) regarding copyright, and negotiating with journals.

Theme: Changing the Incentives that Drive the Scholarly Ecosystem (Rewarding Open and Reproducible Practices)
• Help in creating citation standards for data, code, research materials, etc.
• Teaching for faculty, researchers, and students about how different citation metrics work and the costs and benefits of each, as well as the longevity of scholar identity (e.g., ORCID).
• Provision of citation data to tenure and promotion committees: providing citation data for data, code, software, and materials to tenure and promotion committees and advocating for changes to academic incentives.

Definitions of reproducible research often imply that research means experiments that can be repeated or structured as a computational pipeline of software, code, and data that can be rerun. Yet for many disciplines, research involves something other than a controlled, repeatable computation or experiment.53 It could instead be the study of primary source documents, the interpretation of texts or works of art, the coding and analysis of interviews, or the documentation of events in time, from the eruption of Mount Vesuvius to the migration of a bird species during a particular year. As this last example suggests, even within STEM, research can be descriptive, exploratory, or documentary. Scholars argue about whether the concept of reproducibility should be applied to this work.54 However, there is broad agreement that openness or transparency with regard to methodology and data can allow research to be open to scrutiny and can make some elements of research processes and protocols reproducible.

Open methodologies—While not all research involves controlled experiments, all research does require a methodology—a series of steps that an experienced researcher takes to develop an argument or explore a claim. And while a qualitative researcher or humanities scholar generally does not think in terms of computational pipelines and research protocols, their work may involve protocols and computation. Even if scholarship is either not fully reproducible or makes claims that do not fit within a paradigm of reproducibility, its data (objects of study and evidentiary claims) should be shared and its methodology should be as transparent as possible. As we have emphasized throughout this section, openness—sharing as much as possible about the sources and methodology of research work—is what allows research to be questioned, verified, tested, or repeated. It is what allows work to enter the scholarly conversation.55

Understanding methodology and data and how they should be communicated is field-specific. Qualitative social scientists have engaged in long-standing scholarly conversations about research methodology and quality in qualitative and mixed-methods research that predate but importantly inform the push for reproducible research in these fields.
Some important concepts include generalizability, reliability, rigor, and validity (see the entries for each of these terms in Lewis-Beck and colleagues).56 Other tools for increasing the credibility of qualitative research include audit trails, decision trails, and reporting guidelines.57 Recent scholarly interest in preregistration for qualitative research has built on this existing work, and OSF has recently made available a preregistration form for qualitative research.58 Finally, as qualitative research archived in repositories has begun to be reused, scholars have begun to refine their understanding of how methodologies should be documented in order to better allow for reuse.59

Scholars of literature and culture do not often use the language of methodology to describe their work, preferring instead to talk about the theories that inform particular works of scholarship. As humanist scholars have moved toward interdisciplinary and digital work that uses methodologies drawn from other disciplines, however, it becomes essential to share methodologies, since readers can no longer be assumed to have been trained in the same unspoken but shared methodological practices.60 Recent work on improving the reproducibility of systematic search terms offers one useful model for documenting archival research practices in the humanities.61

Open data—Data sharing is essential to transparency, openness, and any form of reproducibility. For more details, see "RDM for Qualitative and Humanities Research" in section 2.2.2, "Managing, Sharing, and Publishing Data."

Conclusion

This chapter has outlined five interventions LIS professionals can implement to support rigor and reproducibility. They range from the relatively traditional—helping researchers find guidelines and publish preregistrations—to the highly technical, such as helping preserve computational environments. Also outlined are a range of other services that can impact rigor and reproducibility. All these interventions will improve the openness of the scholarly communication landscape generally. One of the best ways librarians can get involved in reproducibility is to adopt open, transparent, and reproducible practices in our own work. We can learn about and use tools like Markdown, R, Git, and Docker to make our own work more reproducible while also making it more efficient.62 This is usually the best way to learn about these tools so we can later help researchers employ them for their own work. When LIS professionals conduct research, we can and should preregister studies, follow reporting guidelines, use computational pipelines, and ensure that the computational environments of our own research are preserved, shared, and sustainable. We can also adopt incentives within our own communities that encourage the scholarly communication landscape we want to see by encouraging the sharing and citation of data, code, and other nontraditional publications. By doing so we learn about these processes, model their use, and can speak authentically about the value of open and reproducible scholarship.

Appendix A: Glossary: Definitions of Reproducibility Concepts

• Reproducibility
"the ability of a researcher to duplicate the results of a prior study using the same materials and procedures as were used by the original investigator"63
Reproducibility can have varying definitions depending on the discipline that defines it.
Several sources explore the different ways reproducibility can be defined and applied.64

• Types of reproducibility
– Empirical reproducibility—Traditional scientific notion of experimental researchers capturing descriptive information about (non-computational) aspects of their research protocols and methods
– Computational reproducibility—The computational details and other information necessary for others to replicate the findings65

• Additional definitions of reproducibility
– Methods reproducibility—The ability to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results
– Results reproducibility (aka replicability)—The production of corroborating results in a new study, having followed the same experimental methods
– Robustness—The stability of experimental conclusions to variations in either baseline assumptions or experimental procedures
– Generalizability (aka transportability)—The persistence of an effect in settings different from and outside of an experimental framework
– Inferential reproducibility—The making of knowledge claims of similar strength from a study replication or reanalysis66

• Replicability
"the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected"67

• Transparency
Transparency is reflected by clear and open communication about the methods and procedures used to obtain the research results and is foundational to reproducibility and replicability.68

• Rigor
Rigor is the strict application of the scientific method to ensure unbiased and well-controlled experimental design, methodology, analysis, interpretation, and reporting of results.

• Repeatability
The measurement can be obtained by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location, on multiple trials. For computational experiments, this means that a researcher can reliably repeat their own computation.69

• Research misconduct
Research misconduct is the "fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results."70

• Questionable research practices (QRPs)
Research practices that may give "false impressions about the replicability of empirical results and misleading evidence about the size of an effect"71 Types of QRPs could be
– P-hacking—"Occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant"72
– HARKing—HARKing is defined as "presenting a post hoc hypothesis (i.e., one based on or informed by one's results) in one's research report as if it were, in fact, an a priori hypothesis."73

• Preregistration
Registering a research project or study before the study is conducted. Registrations typically include the hypothesis, study methods, and the research protocol.74

• Registered reports
"Registered Reports is a publishing format used by over 250 journals that emphasizes the importance of the research question and the quality of methodology by conducting peer review prior to data collection. High quality protocols are then provisionally accepted for publication if the authors follow through with the registered methodology. This format is designed to reward best practices in adhering to the hypothetico-deductive model of the scientific method.
It eliminates a variety of questionable research practices, including low statistical power, selective reporting of results, and publication bias, while allowing complete flexibility to report serendipitous findings."75

Appendix B: Tools for Computational Reproducibility

This section outlines some open, scholar-led software projects that are aimed at helping researchers make their work computationally reproducible. While there are proprietary tools for computational reproducibility, they are not widely available, and this resource focuses on openly available tools as a matter of ethics. The options discussed here are all free and open source grassroots initiatives from scholars who are deeply invested in openness and reproducible research. Nüst and colleagues provide a wider survey of tools for computational reproducibility geared toward publishing computational research, which is inclusive of proprietary software as well as some of the open platforms described below.76

There are four classes of computational reproducibility tools that will be discussed in this section:
1. Containers—Lightweight, portable virtual operating systems
2. Web-based integrated development environments (IDEs)—Applications that provide code editing and execution and often have additional features for reproducibility
3. Web-based replay systems—Systems that support computational replay of materials hosted in a different place from the system
4. Packaging systems—Software that automatically captures dependencies and computational environments used at the time of executing a computational pipeline

CONTAINERS

The research community has been increasingly using and sharing containers in service of reproducibility. Containers are a popular way to create virtual operating systems, like sandboxes, separate from the physical infrastructure and native operating system.77 Two container systems, Singularity and Docker, are especially popular for research reproducibility.78 Docker was made to "pack, ship and run any application as a lightweight container," specifically with the advantage of working in most computational environments.79 It is widely used in software development to deploy software in the cloud as well as to ensure a common development environment among programmers. Several other tools described below rely on Docker in the backend to remain reproducible. Singularity was made for high-performance computing (HPC) work because of security considerations that both allow users full flexibility within the container and keep them from accessing parts of the HPC environment that administrators do not want users to access. Starting a Singularity container swaps out the host operating system environment for one the user controls without having root access and allows the user to run that application in its native environment.80 Singularity containers can then be shared to allow others to work in the same computational environment.

Containers, however, are best used for short-term reproducibility. There are several problems with their use for long-term sustainability. Containers have no notion of provenance or of the computational pipeline used—a container with code and data can be rendered virtually useless if not accompanied by extensive documentation about its inputs and workflow steps. Learning how to use containers is also difficult, and it is not always practical for researchers to create and use containers in their daily workflow.
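To give a flavor of how containers pin a computational environment, here is a minimal sketch using the Docker SDK for Python (the docker package). It assumes Docker is installed and running locally; the image tag, host path, and script name are hypothetical examples rather than part of any tool described in this appendix.

    # run_in_container.py -- a minimal sketch of running an analysis step inside
    # a pinned container image using the Docker SDK for Python (pip install docker).
    # The image tag, host path, and analysis script below are hypothetical.
    import docker

    client = docker.from_env()  # connect to the local Docker daemon

    # Run the pipeline script inside a specific, versioned image so the Python
    # version and installed libraries are fixed, rather than whatever happens
    # to be on the researcher's machine.
    output = client.containers.run(
        image="python:3.10-slim",                     # pinned environment
        command="python /work/pipeline.py",           # hypothetical analysis step
        volumes={"/home/researcher/project": {"bind": "/work", "mode": "rw"}},
        working_dir="/work",
        remove=True,                                  # clean up the container afterward
    )
    print(output.decode("utf-8"))

Even with an approach like this, the container records only the environment; as noted above, the provenance of the pipeline itself still needs to be documented separately.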
WEB-BASED INTEGRATED DEVELOPMENT ENVIRONMENTS (IDEs)
An integrated development environment (IDE) provides features for authoring, compiling, executing, and debugging code, as well as helpful functions like code completion, built-in support for version control, and syntax highlighting.81 These are especially helpful for new programmers who benefit from the visual cues and prompts. IDEs can be either desktop or web-based applications. The scholarly community has taken advantage of both containers and web-based IDEs to create a new class of application geared toward reproducible research. These systems often provide access to a coding environment in the browser, such as Jupyter notebooks or RStudio,82 or their own IDE, and allow users either to export their work as a research compendium or to share these environments directly to bolster reproducibility.

One such system is Whole Tale, an NSF-funded Data Infrastructure Building Block (DIBBS) initiative to build a scalable, open source, web-based, multi-user platform for reproducible research enabling the creation, publication, and execution of tales—executable research objects that capture data, code, and the complete software environment used to produce research findings.83 The website defines a tale as "an executable research object that combines data (references), code (computational methods), computational environment, and narrative (traditional science story)"—which we know is also called a research compendium (see figure 2.5).84

Figure 2.5 A beta version of the Whole Tale system is available at https://dashboard.wholetale.org.

When working in Whole Tale, users have the option to choose a type of environment from a list of options: RStudio, Jupyter Notebooks, OpenRefine 2.8, Jupyter Notebooks with Spark, and JupyterLab.85 Once within those environments, users can work as if they were on their local computer—importing and installing new libraries, adding data, and even running high-performance computing jobs. Whole Tale will keep track of the version of any software dependencies and relevant environmental variables. Once a tale is complete, it can be published to a repository like Dataverse,86 with descriptive metadata and a research compendium that can later be rerun in Whole Tale for reproducibility.87

However, web-based IDEs for reproducibility require that researchers work within a specific online platform, and that can be untenable for those who need to, for instance, work offline or work across multiple types of environments for collaboration or compliance purposes. Most, if not all, of the proprietary tools that are marketed for computational reproducibility fall into this category of web-based IDEs.

WEB-BASED REPLAY SYSTEMS
Because researchers are often hard-pressed to change their workflows and tools, web-based replay systems were created. These are applications that take a link to research materials hosted elsewhere, build the computational environment in-browser, and give the user some way of interacting with the materials, such as an instance of JupyterLab. This offloads the responsibility for hosting materials to platforms devoted to that task and gives researchers flexibility in how they work. Web-based replay systems allow any user to interact with reproducible compendia in a sandbox, modifying input data, parameters, or even code, and re-executing it.
They often ask the user to follow some structure for either the input files or the directory layout in order to work properly, and they use container systems in the backend to recreate the research compendia for researchers. There are two large-scale projects that allow for computational replay of research. One of those is Binder (see figure 2.6), from Project Jupyter.88 Binder uses repo2docker to reproduce the computational environment of research hosted on Git hosting platforms (e.g., GitLab, GitHub) or repositories (e.g., Zenodo, Dataverse).89 Users can replay materials in RStudio, Jupyter notebooks, JupyterLab, and Julia notebooks from Binder. When navigating to the Binder home page (https://mybinder.org), the user is prompted to enter a URL or DOI that leads to a directory that contains Jupyter notebooks, RMarkdown files, or Julia notebooks. Binder will then look through the directory of files for something that will tell it about the computational dependencies, like a requirements.txt file for a Python project or a Dockerfile. The user will then see the materials in the original computational environment, in the original interface.90 Binder also provides a reusable link to this page with the live materials that can be shared with others who want to reproduce the work.

Figure 2.6 Binder, a tool for reproducing the computational environment of research hosted on Git hosting platforms, is available at https://mybinder.org.

REANA is another example of a computational replay system, based in high-energy physics (HEP).91 Made by a team at CERN (the European Organization for Nuclear Research), REANA has the goal of helping researchers "structure their input data, analysis code, containerised environments and computational workflows so that the analysis can be instantiated and run on remote compute clouds."92 REANA relies heavily on the usage of the Common Workflow Language, "an open standard for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments."93 This, in combination with the multiple container systems available on REANA, allows for computational replay of HEP workflows. This idea and process could, however, be generalized for other domains as well.

PACKAGING SYSTEMS
The final category of computational reproducibility tools we'll cover is packaging systems. Packaging systems are desktop or server-based tools that automatically capture dependencies and computational environments at the time of executing a computational pipeline. The draw of packaging systems is their flexibility—you don't have to go into a project thinking about reproducibility to be able to use a packaging tool to create a record of the computational environment. As long as your pipeline runs, the packaging tool will work. One example is ReproZip.94 ReproZip works by running at the same time as a computational pipeline, tracing all the steps and dependencies while the pipeline runs as normal. Then it packages together input files, output files, parameters, environmental variables, executable code, and steps into a portable, generalized format: the RPZ (.rpz), or the ReproZip bundle (see figure 2.7), as sketched below.
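As a rough illustration of this trace-and-pack workflow, the following is a minimal sketch using a hypothetical Python script; the script and bundle names are placeholders, and real pipelines may involve several commands traced in sequence.

    # Run the pipeline once while ReproZip observes it and records dependencies
    reprozip trace python analysis.py
    # Package the traced inputs, outputs, parameters, and environment into a bundle
    reprozip pack analysis.rpz

    # Later, on another machine, the bundle can be set up and rerun with an unpacker,
    # for example the Docker unpacker provided by ReproUnzip (described below)
    reprounzip docker setup analysis.rpz ./analysis-docker
    reprounzip docker run ./analysis-docker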
These bundles are small (their size depends mainly on the size of the input and output data), portable (they can be deposited into a repository or e-mailed!), and self-contained (everything needed to reproduce the pipeline is there!).95 ReproZip bundles can be replayed locally on any operating system (using ReproUnzip) or in-browser (using ReproServer). These tools will take a ReproZip bundle and automatically unpack it, setting up all the dependencies and workflow steps for users so they can reproduce the contents in the original computational environment. ReproUnzip operates on a plug-in model, so users can choose which unpacker to use to reproduce the work, for example Docker or Vagrant. This can be expanded to include other container or virtual machine systems in the future because of the extensive metadata ReproZip captures.96

Figure 2.7 ReproZip ecosystem, created by Fernando Chirigati. Used with permission.

ReproZip also has an ecosystem of other open tools: ReproZip-Web (combines ReproZip with web archiving technology to capture complex server-client applications), reprozip-jupyter (a ReproZip plug-in for Jupyter notebooks; see the example videos), ReproUnzip (a tool to replay and interact with the computational pipelines archived in ReproZip bundles), and ReproServer (a way to replay ReproZip bundles in-browser).97 Right now, ReproZip, ReproZip-Web, and reprozip-jupyter can pack materials only on Linux (because of the extensive information captured and the fact that the OS needs to be recreated at will from ReproZip bundles), but users can install any of the other tools above on any operating system. However, installing ReproUnzip and another piece of software can be a big ask for some researchers. To that end, ReproServer was created; it allows users to either upload a ReproZip bundle (.rpz) or provide a link to one and then reproduce and interact with the contents of the RPZ file in-browser, drastically reducing the number of steps and the complexity. What's more, ReproServer integrates with repositories, such that users can create links like this—https://server.reprozip.org/osf.io/<5 character OSF link>—to immediately begin reproducing the work or to send to reviewers or collaborators for their input. ReproServer also provides a permanent URL to the unpacked environment and the results of rerunning the pipeline in the RPZ file.98

SUMMARY
Different reproducibility tools will work for different researchers and workflows. For instance, when processing and analyzing research materials, many people tend to use containers or web-based IDEs because the rapid-prototyping capabilities are useful for the more exploratory and error-prone processing step. One key reason they are especially useful in the analysis step is that they can also be ported to be compatible with web-based replay systems, which are useful in publishing your work. When the work is done and nearing publication, people tend to prepare, structure, or export their research for web-based replay systems. These are useful because of the near-instant replay of computational research for reviewers of publications or presentations, members of promotion committees, or any interested party. This brings wider accessibility to the reproducible work, which helps with post-publication review. Lastly, packaging tools are the most sustainable for long-term reproducibility, especially when combined with emulation technology.
Packaging tools are provenance-aware (e.g., they know the order in which research pipelines run), automatically capture dependencies, automatically write in-depth technical and administrative metadata, and are interoperable (in that they are built to work with a variety of other tools). These traits make them the most reliable for preservation and access purposes.

This appendix is meant to guide an understanding of the wider landscape of computational reproducibility tools. The four key classes of tools (containers, web-based IDEs, web-based replay systems, and packaging systems) and the examples discussed here reflect community-based efforts to scaffold the understandability and usability of research, teaching, and learning. These tools can be used both to make one's own work reproducible and to help a designated community make their work more reproducible and sustainable in the long term.

COI: Vicky Rampin contributes to the ReproZip project.

Appendix C: Examples of Computational Reproducibility

This appendix showcases some examples of how the software described in this chapter has been used to make research reproducible. Further examples can be found in the open book The Practice of Reproducible Research, which comprises case studies and workflows for reproducibility across various disciplines.99

The first example of computationally reproducible research comes from a machine-learning researcher, Logan Ward, using Whole Tale to promote reuse of their materials. Their tale is meant to allow others to reproduce the materials in a "2016 paper …on using machine learning to predict the properties of materials…. The notebooks within this tale recreate the validation tests from the paper and how the models were used to discover new materials."100 Users who want to reuse this tale will have to either (a) create an account on Whole Tale and copy the tale to their workspace to interact with it, or (b) download the tale to their local computer and try to get it running with containers.

Figure 2.8 Rerunning the Ward tale in my Whole Tale account successfully.

The next example is from the biological sciences, where Lewis and colleagues used the eLife journal's reproducible document stack (RDS) to provide an interactive version of their paper (figure 2.9), allowing others to directly rerun the code with the original data used for analysis and visualizations.101 The eLife RDS is based on Stencila,102 a tool meant to introduce reproducibility features (such as updating dependencies in real time between code cells as you change data or code) to everyday research tools (like Jupyter notebooks), and on Docker to keep the original computing environment. The code and data are then linked and can be downloaded and explored by readers in real time, augmenting their reading experience and allowing for open post-publication peer review.

Figure 2.9 A screenshot of the interactive paper L. Michelle Lewis et al., "Replication Study: Transcriptional Amplification in Tumor Cells with Elevated c-Myc," eLife 7 (2018): e30274, https://doi.org/10.7554/eLife.30274, used under CC BY 4.0.
Another example comes from the digital humanities, where Nick Wolf made the materials available for his 2015 Heaney Lecture, "National School System and the Irish Language."103 (The published essay appears as "The National-School System and the Irish Language in the Nineteenth Century."104) Wolf used ReproZip to make a reproducible research compendium of the R scripts that he wrote to analyze and visualize historical education data from Ireland. Wolf was able to package his research with two commands: reprozip trace Rscript NationalSchools_Wolf_2016.R and reprozip pack national-schools.rpz. The RPZ bundle was then uploaded to the Open Science Framework,105 where it can be either downloaded by secondary users for local interaction or unpacked with ReproServer in-browser for quick reproduction and inspection (see figure 2.10).

Figure 2.10 A screenshot of unpacking Wolf's RPZ bundle with ReproServer: https://server.reprozip.org/reproduce/osf.io/wfvqr. The results can be found at https://server.reprozip.org/results/fyjog.

The next example comes from earth science and mammalogy, using Jupyter notebooks with Binder for reproducibility. This notebook, "Analyzing Whale Tracks" by Dr. Roberto De Almeida (see figure 2.11), looks at ocean data to track the trajectories of migrating whales. He wanted to see if whales could benefit from the ocean currents when migrating across the world.106

Figure 2.11 A screenshot of the "Analyzing Whale Tracks" Jupyter notebook running in Binder: https://nbviewer.org/github/robertodealmeida/notebooks/blob/master/earth_day_data_challenge/Analyzing%20whale%20tracks.ipynb.

People can interact with the GitHub repository of Jupyter notebooks locally by installing Jupyter notebooks and all the requisite dependencies (e.g., the correct Python version and the correct Python library versions). They can also interact with the notebooks in Binder, which allows for simpler reproducibility in-browser. Users can interact with the notebooks with the same flexibility as if they were on their local computer, re-executing and editing code, adding their own data, importing and exporting files, and so on. These sandboxes do not persist, but they offer a great way to instantly replay research during the reading or reviewing process.

The final example comes from high-energy physics, where the REANA team created an example reproducible analysis pipeline of ATLAS data (see figure 2.12).107 The workflow that they made reproducible with REANA emulates a "Beyond Standard Model (BSM) search as performed in collider particle physics."108 This involves reading in observed data, fitting it against a statistical model, and computing the upper limit on the signal strength of the BSM process (the main output).

Figure 2.12 The workflow that was made reproducible with REANA. I would not want to manually recreate that! Used under the MIT License.

To create their reproducible analysis pipeline, the team had to create "runnable recipes" addressing (1) where the input data is, (2) what software was used to analyse the data, (3) which computing environments were used to run the software, and (4) which computational workflow steps were taken to run the analysis. This permits instantiating the analysis on the computational cloud and running it to obtain (5) the output results.109 The authors then put together a reana.yml file that configures the analysis structure with the correct computational pipeline steps, inputs, parameters, dependencies, and code.
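For orientation, the following is a minimal sketch of what the contents of such a file might look like for a simple one-step serial workflow. The overall shape (inputs, workflow specification, outputs) follows REANA's documented configuration format, but the file names, container image, and command here are placeholders, and the actual BSM search analysis uses a considerably more elaborate multi-step workflow.

    # Hypothetical reana.yml for a single-step serial analysis
    inputs:
      files:
        - code/fit.py          # analysis script (placeholder)
        - data/observed.csv    # observed data (placeholder)
    workflow:
      type: serial
      specification:
        steps:
          - environment: 'python:3.8-slim'   # container image the step runs in
            commands:
              - python code/fit.py --data data/observed.csv --out results/limit.json
    outputs:
      files:
        - results/limit.json   # main result to retrieve after the run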
The configured analysis can then be deployed to a REANA server, one of which is hosted by CERN for this purpose. This example is hard to reproduce without domain knowledge, or at least serious computational know-how.

Different reproducibility tools offer different functionality, which appeals to disciplines with varying norms. These cases offer some examples of how a few disciplines have used reproducibility tools to allow others to verify, extend, and interact with their work. By walking the walk, the authors above have provided great examples to follow in making research reproducibly accessible to all.

Notes
1. National Academies of Sciences, Engineering, and Medicine, Open Science by Design (Washington, DC: National Academies Press, 2018), 107, https://doi.org/10.17226/25116.
2. Kenneth Bollen et al., Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science (Alexandria, VA: National Science Foundation, May 2015), 3–4, https://www.nsf.gov/sbe/AC_Materials/SBE_Robust_and_Reliable_Research_Report.pdf.
3. Sabina Leonelli, "Rethinking Reproducibility as a Criterion for Research Quality," in Including a Symposium on Mary Morgan: Curiosity, Imagination, and Surprise, Research in the History of Economic Thought and Methodology, vol. 36B, ed. Luca Fiorito, Scott Scheall, and Carlos Eduardo Suprinyak (Bingley, UK: Emerald Publishing, 2018), 129–46, https://doi.org/10.1108/S0743-41542018000036B009.
4. Victoria Stodden et al., comps. and eds., "Setting the Default to Reproducible" (developed by participants in ICERM workshop "Reproducibility in Computational and Experimental Mathematics," Providence, RI, December 14, 2012), 5, https://stodden.net/icerm_report.pdf.
5. Vicky Steeves, "Reproducibility Librarianship," Collaborative Librarianship 9, no. 2 (2017): article 4, https://digitalcommons.du.edu/collaborativelibrarianship/vol9/iss2/4.
6. National Institutes of Health Advisory Committee to the Director, National Library of Medicine Working Group Final Report, NLM-06112015-ACD (National Institutes of Health, 2015), 17, https://acd.od.nih.gov/documents/reports/Report-NLM-06112015-ACD.pdf.
7. Franklin Sayre et al., Librarians Building Momentum for Reproducibility, virtual conference, January 29, 2020, https://vickysteeves.gitlab.io/librarians-reproducibility/.
8. American Medical Association, AMA Manual of Style, 11th ed. (New York: Oxford University Press, 2020); American Psychological Association, "Journal Article Reporting Standards (JARS)," APA Style, accessed November 24, 2020, https://apastyle.apa.org/jars.
9. Bridget C. O'Brien et al., "Standards for Reporting Qualitative Research: A Synthesis of Recommendations," Academic Medicine 89, no. 9 (September 2014): 1245–51, https://doi.org/10.1097/ACM.0000000000000388.
10. O'Brien et al., "Standards for Reporting Qualitative Research."
11. David Moher et al., "Preferred Reporting Items for Systematic Reviews and Meta-analyses: The PRISMA Statement," PLOS Medicine 6, no. 7 (2009): e1000097, https://doi.org/10.1371/journal.pmed.1000097.
12. Melissa L. Rethlefsen et al., "PRISMA-S: An Extension to the PRISMA Statement for Reporting Literature Searches in Systematic Reviews," Systematic Reviews 10, no. 1 (January 26, 2021): article 39, https://doi.org/10.1186/s13643-020-01542-z.
13.
Raoni Lourenço, Juliana Freire, and Dennis Shasha, “BugDoc: Algorithms to Debug Computational Processes,” in SIGMOD ’20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (New York: Association for Computing Machinery, 2020), 463-478, https://doi. org/10.1145/3318464.3389763. 14. OpenRefine, home page, https://openrefine.org/. 15. Wikipedia, s.v. “JSON,” last updated March 22, 2023, https://en.wikipedia.org/wiki/JSON. 16. Glenda M. Yenni et al., “Developing a Modern Data Workflow for Evolving Data,” preprint, BioRxiv, July 24, 2018, https://doi.org/10.1101/344804. 17. Yenni et al., “Developing a Modern Data Workflow,” 1 18. Yenni et al., “Developing a Modern Data Workflow.” 19. Research Compendium, home page, https://research-compendium.science/; Daniel Nüst, Carl Boettiger, and Ben Marwick, “How to Read a Research Compendium,” ArXiv:1806.09525 [Cs], June 11, 2018, https:// doi.org/10.48550/arXiv.1806.09525. 20. Nüst, Boettiger, and Marwick, “How to Read a Research Compendium.” 1. Supporting Reproducible Research 21. Victoria Stodden, “Resolving Irreproducibility in Empirical and Computational Research,” Institute of Mathematical Statistics, November 17, 2013, https://imstat.org/2013/11/17/ resolving-irreproducibility-in-empirical-and-computational-research/. 22. Ed H. B. M. Gronenschild et al., “The Effects of FreeSurfer Version, Workstation Type, and Macintosh Operating System Version on Anatomical Volume and Cortical Thickness Measurements,” PLOS ONE 7, no. 6 (2012): e38234, https://doi.org/10.1371/journal.pone.0038234. 23. Geir Kjetil Sandve et al., “Ten Simple Rules for Reproducible Computational Research,” PLOS Computational Biology 9, no. 10 (October 24, 2013): e1003285, https://doi.org/10.1371/journal.pcbi.1003285. 24. Katherine Bode, “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for DataRich Literary History,” Modern Language Quarterly 78, no. 1 (March 1, 2017): 77–106, https://doi. org/10.1215/00267929-3699787; Franco Moretti, Graphs, Maps, Trees (London: Verso, 2005). 25. Archive-It, home page, https://archive-it.org/; Webrecorder, home page, https://webrecorder.net/. 26. Saving Data Journalism, home page, https://savingjournalism.reprozip.org/. 27. Katherine Boss et al., “Saving Data Journalism: Using ReproZip-Web to Capture Dynamic Websites for Future Reuse” (presentation, Librarians Building Momentum for Reproducibility, virtual conference, January 28, 2020), slides: https://osf.io/nr9d8/, YouTube video, 13:16, https://youtu.be/xLdFaDL2VWc. 28. Taguette, home page, https://www.taguette.org/; Qcoder, GitHub, https://github.com/ropenscilabs/qcoder; Beth M. Duckles and Vicky Steeves, “Qualitative Research Using Open Tools” (presentation, CSV,conf,v4, Portland, Oregon, May 2019), slides: https://doi.org/10.5281/zenodo.2673016, YouTube video, 17:51: https://youtu.be/DwCunW19wcQ. 29. Mateusz Pawlik et al., “A Link Is Not Enough—Reproducibility of Data,” Datenbank-Spektrum 19, no. 2 (2019): 107–15, https://doi.org/10.1007/s13222-019-00317-8. 30. Michel Castagné, “Consider the Source: The Value of Source Code to Digital Preservation Strategies,” School of Information Student Research Journal (San José State University) 2, no. 2 (January 2013): article 5, p. 2, https://doi.org/10.31979/2575-2499.020205. 31. Digital Preservation Coalition, Insert Coin to Continue: A Briefing Day on Software Preservation (London, May 7, 2019), https://www.dpconline.org/events/past-events/software-preservation. 32. 
Neil Chue Hong et al., Sustainability and Preservation Framework (Edinburgh, UK: Software Sustainability Institute, December 7, 2010), https://www.software.ac.uk/sustainability-and-preservation-framework. 33. Hong et al., Sustainability and Preservation Framework. 34. J. Kunze et al., “The BagIt File Packaging Format (V1.0),” RFC Editor, October 2018, https://tools.ietf.org/ html/rfc8493. 35. Kunze et al., “BagIt File Packaging Format.” 36. Kyle Chard et al., “Application of BagIt-Serialized Research Object Bundles for Packaging and Re-execution of Computational Analyses,” in 2019 15th International Conference on 3Science (San Diego: IEEE, 2019), 514–21, https://doi.org/10.1109/eScience.2019.00068. 37. Educopia Institute, “Scaling Emulation as a Service Infrastructure (EaaSI) (subcontract),” 2017–2020, https://educopia.org/emulation-as-a-service-eaasi/. 38. Klaus Rechert et al., “bwFLA—A Functional Approach to Digital Preservation,” PIK— Praxis der Informationsverarbeitung und Kommunikation 35, no. 4 (November 2012): 259–67, https://doi.org/10.1515/ pik-2012-0044. 39. Julia Kim and Don Mennerich, “Jeremy Blake’s Time-Based Paintings: A Case Study,” Electronic Media Review 4 (2015–2016), https://resources.culturalheritage.org/emg-review/volume-4-2015-2016/kim/. 40. Chassanoff, A., & Altman, M. (2020). Curation as “Interoperability With the Future”: Preserving Scholarly Research Software in Academic Libraries. Journal of the Association for Information Science and Technology, 71(3), 325–337. https://doi.org/10.1002/asi.24244 41. Brian A. Nosek, Jeffrey R. Spies, and Matt Motyl, “Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth over Publishability,” Perspectives on Psychological Science 7, no. 6 (November 2012): 615–31, https://doi.org/10.1177/1745691612459058; Jere D. Odell, Heather L. Coates, and Kristi L. Palmer, “Rewarding Open Access Scholarship in Promotion and Tenure: Driving Institutional Change,” College and Research Libraries News 77, no. 7 (2016): 322–25, https://doi.org/10.7912/C2R60B. 42. For an overview of some issues in the hard sciences, see National Academies of Sciences, Engineering, and Medicine, Reproducibility and Replicability in Science (Washington, DC: National Academies Press, 2019), https://doi.org/10.17226/25303. 43. Open Science Collaboration, “Estimating the Reproducibility of Psychological Science,” Science 349, no. 6251 (August 28, 2015): aac4716, https://doi.org/10.1126/science.aac4716; Makel and Plucker, Toward a More Perfect Psychology. 44. Garret Christensen, Jeremy Freese, and Edward Miguel, Transparent and Reproducible Social Science Research (Oakland: University of California Press, 2019), https://www.ucpress.edu/book/9780520296954/ transparent-and-reproducible-social-science-research. 199 200 Section 2.2.3 45. Dorothy Bishop et al., Reproducibility and Reliability of Biomedical Research, symposium report (London: Academy of Medical Sciences, October 2015), https://acmedsci.ac.uk/policy/policy-projects/ reproducibility-and-reliability-of-biomedical-research. 46. National Science Foundation, “Scientists Seeking NSF Funding Will Soon Be Required to Submit Data Management Plans,” news release 10-077, May 10, 2010, https://www.nsf.gov/news/news_summ. jsp?cntn_id=116928. 47. Steeves, “Reproducibility Librarianship”; Cynthia R. H. Vitale, “Is Research Reproducibility the New Data Management for Libraries?” Bulletin of the Association for Information Science and Technology 42, no. 
3 (2016): 38–41, https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/bul2.2016.1720420313. 48. Melissa L. Rethlefsen, Mellanye J. Lackey, and Shirley Zhao, “Building Capacity to Encourage Research Reproducibility and #MakeResearchTrue,” Journal of the Medical Library Association 106, no. 1 (January 12, 2018): 113–19, https://doi.org/10.5195/jmla.2018.273. 49. Franklin Sayre and Amy Riegelman, “Replicable Services for Reproducible Research: A Model for Academic Libraries,” College and Research Libraries 80, no. 2 (March 2019): 260. 50. Open Syllabus Explorer, home page, https://opensyllabus.org/. See also Bruce Herbert, Sarah Potvin, and Tina Budzise-Weaver, “Best Practices for the Use of Scholarly Impact Metrics,” working paper, February 10, 2016, https://oaktrust.library.tamu.edu/handle/1969.1/156054. 51. Steven Braun, “Supporting Research Impact Metrics in Academic Libraries: A Case Study,” portal: Libraries and the Academy 17, no. 1 (January 2017): 111–27, https://doi.org/10.1353/pla.2017.0007; Rebecca B. French and Jody Condit Fagan, “The Visibility of Authority Records, Researcher Identifiers, Academic Social Networking Profiles, and Related Faculty Publications in Search Engine Results,” Journal of Web Librarianship 13, no. 2 (2019): 156–97, https://doi.org/10.1080/19322909.2019.1591324; Björn Brembs, Katherine Button, and Marcus Munafò, “Deep Impact: Unintended Consequences of Journal Rank,” Frontiers in Human Neuroscience 7 (2013), https://doi.org/10.3389/fnhum.2013.00291; Amy M. Suiter and Heather Lea Moulaison, “Supporting Scholars: An Analysis of Academic Library Websites’ Documentation on Metrics and Impact,” Journal of Academic Librarianship 41, no. 6 (November 2015): 814–20, https://doi. org/10.1016/j.acalib.2015.09.004. 52. Sayre and Riegelman, “Replicable Services.” 53. For an interesting theoretical examination of repetition and translation in literary theory and their relationship to the concept of reproducibility, see Ladina Bezzola Lambert, “Repetition with a Difference: Reproducibility in Literature Studies,” in Reproducibility: Principles, Problems, Practices, and Prospects, ed. Harald Atmanspacher and Sabine Maasen (Hoboken, NJ: John Wiley & Sons, 2016), 491–509, https://doi. org/10.1002/9781118865064.ch23. 54. Herman Aguinis and Angelo M. Solarino, “Transparency and Replicability in Qualitative Research: The Case of Interviews with Elite Informants,” Strategic Management Journal 40, no. 8 (August 2019): 1291–1315, https://doi.org/10.1002/smj.3015; Leonelli, “Rethinking Reproducibility”; Rik Peels and Lex Bouter, “Humanities Need a Replication Drive Too,” Nature 558, no. 7710 (June 21, 2018): 372–372, https:// doi.org/10.1038/d41586-018-05454-w; Bart Penders, J. Britt Holbrook, and Sarah de Rijcke, “Rinse and Repeat: Understanding the Value of Replication across Different Ways of Knowing,” Publications 7, no. 3 (September 2019): 52, https://doi.org/10.3390/publications7030052; Michael G. Pratt, Sarah Kaplan, and Richard Whittington, “The Tumult over Transparency: Decoupling Transparency from Replication in Establishing Trustworthy Qualitative Research,” Administrative Science Quarterly 65, no. 1 (March 1, 2020): 1–19, https://doi.org/10.1177/0001839219887663; Sarah de Rijcke and Bart Penders, “Resist Calls for Replicability in the Humanities,” Nature 560, no. 7716 (August 1, 2018): 29, https://doi.org/10.1038/ d41586-018-05845-z. 55. Christine L. Borgman, Big Data, Little Data, No Data (Cambridge, MA: MIT Press, 2015), https://mitpress. 
mit.edu/9780262529914/big-data-little-data-no-data/. 56. Michael S. Lewis-Beck, et al, The SAGE Encyclopedia of Social Science Research Methods (2004) https:// doi.org/10.4135/9781412950589 57. See citations in Tamarinde L. Haven et al., “Preregistering Qualitative Research: A Delphi Study,” SocArXiv, last edited November 9, 2020, https://doi.org/10.31235/osf.io/pz9jr. 58. David Thomas Mellor et al., “Templates of OSF Registration Forms,” OSF, October 31, 2016, https://osf.io/ zab38/. The qualitative preregistration form is based on the work of Haven et al., “Preregistering Qualitative Research.” 59. Sebastian Karcher and Nicholas Weber, “Annotation for Transparent Inquiry: Transparent Data and Analysis for Qualitative Research,” IASSIST Quarterly 43, no. 2 (2019): 1–9, https://doi.org/10.29173/iq959; Aguinis and Solarino, “Transparency and Replicability.” 60. Bode, “Equivalence of ‘Close’ and ‘Distant’ Reading”; James O’Sullivan, “The Humanities Have a ‘Reproducibility’ Problem,” Talking Humanities (blog), July 9, 2019, https://talkinghumanities.blogs.sas. ac.uk/2019/07/09/the-humanities-have-a-reproducibility-problem/. Supporting Reproducible Research 61. Paul Fehrmann, “Reproducibility of Computer Searches in Systematic Reviews: Checklist Items Used to Assess Computer Search Reports” (presentation, Librarians Building Momentum for Reproducibility, online conference, January 28, 2023), YouTube video, 6:01, posted February 3, 2020, https://www.youtube. com/watch?v=HqYv7IhQ4GU. 62. Heidi Tebbe and Danica Madison Lewis, “What’s Sauce for the Goose Is Sauce for the Gander: Reproducible Practice in Library Work,” OSF, February 3, 2020, https://osf.io/3xuj2/. 63. Bollen et al., Social, Behavioral, and Economic Sciences Perspectives, 3. 64. Lorena A. Barba, “Terminologies for Reproducible Research,” ArXiv:1802.03311 [Cs], February 9, 2018, http://arxiv.org/abs/1802.03311; Steven N. Goodman, Danielle Fanelli, and John P. A. Ioannidis, “What Does Research Reproducibility Mean?” Science Translational Medicine 8, no. 341 (June 1, 2016): 341ps12, https://doi.org/10.1126/scitranslmed.aaf5027; Stodden, “Resolving Irreproducibility.” 65. Stodden, “Resolving Irreproducibility.” 66. Goodman, Fanelli, and Ioannidis, “What Does Research Reproducibility Mean?” 67. Bollen, et al., Social, Behavioral, and Economic Sciences Perspectives, 4. 68. E. Miguel et al., “Promoting Transparency in Social Science Research,” Science 343, no. 6166 (2014): 30–31, https://doi.org/10.1126/science.1245317. 69. Association for Computing Machinery, “Artifact Review and Badging—Version 1.0 (Not Current),” last updated August 24, 2020, https://www.acm.org/publications/policies/artifact-review-badging. 70. National Institutes of Health, “Research Misconduct—Definitions,” November 29, 2018, https://grants.nih. gov/policy/research_integrity/definitions.htm. 71. Ulrich Schimmack, “Questionable Research Practices: Definition, Detection, and Recommendations for Better Practices,” Replicability-Index (blog), January 24, 2015, https://replicationindex.com/2015/01/24/ qrps/. 72. Megan L. Head et al., “The Extent and Consequences of P-Hacking in Science,” PLOS Biology 13, no. 3 (March 13, 2015), https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106. 73. Norbert L. Kerr, “HARKing: Hypothesizing after the Results Are Known,” Personality and Social Psychology Review, 2, no. 3 (1998): 196-217, https://doi.org/10.1207/s15327957pspr0203_4. 74. 
Center for Open Science, “Preregistration,” https://www.cos.io/initiatives/prereg; Brian A. Nosek et al. “The Preregistration Revolution.” Proceedings of the National Academy of Sciences 115, no. 11 (March 13, 2018): 2600–2606, https://doi.org/10.1073/pnas.1708274114; Association for Psychological Science, “Registered Replication Reports,” https://www.psychologicalscience.org/publications/replication. 75. Center for Open Science, “Registered Reports,” https://www.cos.io/initiatives/registered-reports. 76. Nüst, D., Sochat, V., Marwick, B., Eglen, S. J., Head, T., Hirst, T., & Evans, B. D. (2020). Ten simple rules for writing Dockerfiles for reproducible data science. PLOS Computational Biology, 16(11), e1008316. https:// doi.org/10.1371/journal.pcbi.1008316 77. Scott Hogg, “Software Containers: Used More Frequently Than Most Realize,” Network World, May 26, 2014, https://www.networkworld.com/article/2226996/software-containers--used-more-frequently-thanmost-realize.html. 78. Sylabs.io. “Singularity,” accessed August 13, 2020, https://sylabs.io/; Docker, home page, accessed August 13, 2020, https://www.docker.com/. 79. Docker, home page. 80. Sylabs.io, “Singularity.” 81. Margaret Rouse, “What Does Integrated Development Environment Mean?” Techopedia, January 11, 2017, http://www.techopedia.com/definition/26860/integrated-development-environment-ide. 82. Jupyter, home page, https://jupyter.org/; Posit, “RStudio IDE,” https://posit.co/downloads/. 83. Whole Tale, home page, 2019, https://wholetale.org/. 84. Whole Tale, home page. 85. Apache Spark, home page, https://spark.apache.org/; JupyterLab Doumentation, https://jupyterlab.readthedocs.io/en/stable/. 86. Dataverse, home page, https://dataverse.org/. 87. Chard et al., “Application of BagIt.” 88. Binder, home page, https://mybinder.org/; Jupyter home page, https://jupyter.org/. 89. repo2Docker, GitHub, https://github.com/jupyterhub/repo2docker; Project Jupyter et al., “Binder 2.0— Reproducible, Interactive, Sharable Environments for Science at Scale,” Proceedings of the 17th Python in Science Conference, July 15, 2018, 113–20, https://doi.org/10.25080/Majora-4af1f417-011. 90. Project Jupyter et al., “Binder 2.0.” 91. REANA, home page, accessed August 13, 2020, https://reanahub.io/. 92. REANA, home page. 93. Peter Amstutz et al., “Common Workflow Language, v1.0,” Figshare, July 8, 2016, https://doi.org/10.6084/ M9.FIGSHARE.3115156.V2. 94. ReproZip, home page, https://www.reprozip.org/. 201 202 Section 2.2.3 95. Fernando Chirigati et al., “ReproZip: Computational Reproducibility with Ease,” in Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16 (New York: Association for Computing Machinery, 2016), 2085–88, https://doi.org/10.1145/2882903.2899401. 96. Chirigati et al., “ReproZip.” 97. ReproZip Web, “ReproZip Web’s Documentation, https://reprozip-web.readthedocs.io/en/latest/; Reprozip-Jupyter 1.2, home page, https://pypi.org/project/reprozip-jupyter/; VIDA-NYU, “ReproZip Jupyter Extension,” example videos, YouTube video, 0:51, https://www.youtube.com/watch?v=Y8YmGVYHhS8&list=PLjgZ3v4gFxpWb277AEyjsVerB6nViGTVL; ReproZip, “Using Reprounzip,” https://docs.reprozip.org/en/1.0.x/unpacking.html; ReproServer, home page, https://server.reprozip.org/. 98. Vicky Steeves, Rémi Rampin, and Fernando Chirigati, “Reproducibility, Preservation, and Access to Research with ReproZip and ReproServer,” IASSIST Quarterly 44, no. 1–2 (June 29, 2020): 1–11, https:// doi.org/10.29173/iq969. 99. 
Justin Kitzes, Daniel Turek, and Fatima Deniz, eds., The Practice of Reproducible Research(Oakland, CA: University of California Press, 2018). 100. Logan Ward, “Predicting the Properties of Inorganic Materials with Machine Learning,” Whole Tale dashboard, https://dashboard.wholetale.org/run/59fc7f9d60221d000163c37b?token=aJgFSGxFQUSUgPV6vkerjk7pfowAAOFyhW0NKiE3rx8k9LSbxzugxp8U3XRrB1ev. 101. L. Michelle Lewis et al., “Replication Study: Transcriptional Amplification in Tumor Cells with Elevated c-Myc,” eLife 7 (2018): e30274, https://doi.org/10.7554/eLife.30274. 102. Stencila, home page, https://stenci.la/. 103. Nicholas Wolf, “National School System and the Irish Language Heaney Lecture 2015,” April 12, 2016, https://doi.org/10.17605/OSF.IO/PGK8V. 104. Nicholas Wolf, “The National-School System and the Irish Language in the Nineteenth Century,” in Schools and Schooling, 1650–2000, ed. James Kelly and Susan Hegarty (Dublin: Four Courts Press, 2017), 208, https://www.fourcourtspress.ie/books/2017/schools-and-schooling/. 105. Nicholas Wolf, “National School System and the Irish Language Heaney Lecture 2015,” OSF, https://osf. io/wfvqr. 106. Rob De Almeida, “Analyzing Whale Tracks,” EarthPy, September 23, 2013, http://earthpy.org/analyzingwhale-tracks.html. 107. Diego Rodriguez et al., Reanahub/Reana-Demo-Bsm-Search, Python (2018; repr., REANA, 2020), https://github.com/reanahub/reana-demo-bsm-search. 108. https://github.com/reanahub/reana-demo-bsm-search 109. Rodriguez et al., Reanahub/Reana-Demo-Bsm-Search. Bibliography Aguinis, Herman, and Angelo M. Solarino. “Transparency and Replicability in Qualitative Research: The Case of Interviews with Elite Informants.” Strategic Management Journal 40, no. 8 (August 2019): 1291–1315. https://doi.org/10.1002/smj.3015. Almugbel, Reem, Ling-Hong Hung, Jiaming Hu, Abeer Almutairy, Nicole Ortogero, Yashaswi Tamta, and Ka Yee Yeung. “Reproducible Bioconductor Workflows Using Browser-Based Interactive Notebooks and Containers.” Journal of the American Medical Informatics Association 25, no. 1 (January 2018): 4–12. https://doi.org/10.1093/jamia/ocx120. American Medical Association. AMA Manual of Style: A Guide for Authors and Editors, 11th ed. New York: Oxford University Press, 2020. American Psychological Association. “Journal Article Reporting Standards (JARS).” APA Style. Accessed November 24, 2020. https://apastyle.apa.org/jars. Amstutz, Peter, Michael R. Crusoe, Nebojša Tijanić, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, et al. “Common Workflow Language, v1.0.” Figshare, July 8, 2016. https://doi.org/10.6084/ M9.FIGSHARE.3115156.V2. Apache Spark. Home page. https://spark.apache.org/. Archive-It. Home page. https://archive-it.org/. Association for Computing Machinery. “Artifact Review and Badging—Version 1.0 (Not Current).” Last updated August 24, 2020. https://www.acm.org/publications/policies/artifact-review-badging. Association for Psychological Science. “Registered Replication Reports.” https://www.psychologicalscience.org/ publications/replication. Barba, Lorena A. “Terminologies for Reproducible Research.” ArXiv:1802.03311 [Cs], February 9, 2018. http:// arxiv.org/abs/1802.03311. Binder. Home page. https://mybinder.org/. Supporting Reproducible Research Bishop, Dorothy, Doreen Cantrell, Peter Johnson, Shitij Kapur, Malcom Macleod, Caroline Savage, Jim Smith, et al. Reproducibility and Reliability of Biomedical Research: Improving Research Practice. Symposium report. London: Academy of Medical Sciences, October 2015. 
https://acmedsci.ac.uk/policy/ policy-projects/reproducibility-and-reliability-of-biomedical-research. Bode, Katherine. “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for DataRich Literary History.” Modern Language Quarterly 78, no. 1 (March 1, 2017): 77–106. https://doi. org/10.1215/00267929-3699787. Boettiger, Carl. “An Introduction to Docker for Reproducible Research.” ACM SIGOPS Operating Systems Review 49, no. 1 (January 2015): 71–79. https://doi.org/10.1145/2723872.2723882. Bollen, Kenneth, John T. Cacioppo, Robert M. Kaplan, Jon A. Krosnick, and James L. Olds. Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science: Report of the Subcommittee on Replicability in Science, Advisory Committee to the National Science Foundation Directorate for Social, Behavioral, and Economic Sciences. Alexandria, VA: National Science Foundation, May 2015. https://www.nsf.gov/sbe/ AC_Materials/SBE_Robust_and_Reliable_Research_Report.pdf. Borgman, Christine L. Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, MA: MIT Press, 2015. https://mitpress.mit.edu/9780262529914/big-data-little-data-no-data/. Boss, Katherine, Vicky Steeves, Fernando Chirigati, Rémi Rampin, and Brian Hoffman. “Saving Data Journalism: Using ReproZip-Web to Capture Dynamic Websites for Future Reuse.” Presentation, Librarians Building Momentum for Reproducibility, virtual conference, January 28, 2020. Slides: https://osf.io/nr9d8/. YouTube video, 13:16, https://youtu.be/xLdFaDL2VWc. Braun, Steven. “Supporting Research Impact Metrics in Academic Libraries: A Case Study.” portal: Libraries and the Academy 17, no. 1 (January 2017): 111–27. https://doi.org/10.1353/pla.2017.0007. Brembs, Björn, Katherine Button, and Marcus Munafò. “Deep Impact: Unintended Consequences of Journal Rank.” Frontiers in Human Neuroscience 7 (2013). https://doi.org/10.3389/fnhum.2013.00291. Campbell-Jensen, Allison. “Award-Winning Changemaker.” Continuum (blog), University of Minnesota Libraries, September 30, 2020. https://www.continuum.umn.edu/2020/09/award-winning-changemaker/. Castagné, Michel. “Consider the Source: The Value of Source Code to Digital Preservation Strategies.” School of Information Student Research Journal (San José State University) 2, no. 2 (January 2013): article 5. https:// doi.org/10.31979/2575-2499.020205. Center for Open Science. “Preregistration.” https://www.cos.io/initiatives/prereg. ———. “Registered Reports.” https://www.cos.io/initiatives/registered-reports. Chard, Kyle, Niall Gaffney, Matthew B. Jones, Kacper Kowalik, Bertram Ludascher, Timothy McPhillips, Jarek Nabrzyski, et al. “Application of BagIt-Serialized Research Object Bundles for Packaging and Re-execution of Computational Analyses.” In 2019 15th International Conference on EScience, 514–21. San Diego: IEEE, 2019. https://doi.org/10.1109/eScience.2019.00068. Chassanoff, Alexandra, and Micah Altman. “Curation as ‘Interoperability with the Future’: Preserving Scholarly Research Software in Academic Libraries.” Journal of the Association for Information Science and Technology 71, no. 3 (2020): 325–37. https://doi.org/10.1002/asi.24244. Chirigati, Fernando, Rémi Rampin, Dennis Shasha, and Juliana Freire. “ReproZip: Computational Reproducibility with Ease.” In Proceedings of the 2016 International Conference on Management of Data, 2085–2088. SIGMOD ’16. New York: Association for Computing Machinery, 2016. https://doi. org/10.1145/2882903.2899401. 
Christensen, Garret, Jeremy Freese, and Edward Miguel. Transparent and Reproducible Social Science Research: How to Do Open Science. Oakland: University of California Press, 2019. https://www.ucpress.edu/ book/9780520296954/transparent-and-reproducible-social-science-research. Cochrane, Euan, Rechert, Klaus, Anderson, Seth, Meyerson, Jessica, and Ethan Gates. (2019) ‘Towards a Universal Virtual Interactor (UVI) for Digital Objects.’ Proceedings of the 16th International Conference on Digital Preservation iPRES 2019. Available at https://ipres2019.org/static/pdf/iPres2019_paper_128.pdf. DOI:10.17605/OSF.IO/AZEWJ. Dataverse Project. Home page. https://dataverse.org/. De Almeida, Rob. “Analyzing Whale Tracks.” EarthPy, September 23, 2013. http://earthpy.org/analyzing-whaletracks.html. de Rijcke, Sarah, and Bart Penders. “Resist Calls for Replicability in the Humanities.” Nature 560, no. 7716 (August 1, 2018): 29. https://doi.org/10.1038/d41586-018-05845-z. Digital Preservation Coalition. Insert Coin to Continue: A Briefing Day on Software Preservation, London, May 7, 2019. https://www.dpconline.org/events/past-events/software-preservation. Docker. Home page. Accessed August 13, 2020. https://www.docker.com/. Donoho, David L. “An Invitation to Reproducible Computational Research.” Biostatistics 11, no. 3 (July 2010): 385–88. https://doi.org/10.1093/biostatistics/kxq028. 203 204 Section 2.2.3 Duckles, Beth M., and Vicky Steeves. “Qualitative Research Using Open Tools.” Presentation, CSV,conf,v4, [Portland, Oregon], May 2019. Slides: https://doi.org/10.5281/zenodo.2673016. YouTube video, 17:51: https://youtu.be/DwCunW19wcQ. Educopia Institute. “Scaling Emulation as a Service Infrastructure (EaaSI) (subcontract). 2017–2020. https:// educopia.org/emulation-as-a-service-eaasi/. Evans, Julia. How Containers Work. Wizard Zines. 2021. https://wizardzines.com/zines/containers/. Fehrmann, Paul. “Reproducibility of Computer Searches in Systematic Reviews: Checklist Items Used to Assess Computer Search Reports.” Presentation, Librarians Building Momentum for Reproducibility, online conference, January 28, 2023. YouTube video, 6:01, posted February 3, 2020. https://www.youtube.com/ watch?v=HqYv7IhQ4GU. French, Rebecca B., and Jody Condit Fagan. “The Visibility of Authority Records, Researcher Identifiers, Academic Social Networking Profiles, and Related Faculty Publications in Search Engine Results.” Journal of Web Librarianship 13, no. 2 (2019): 156–97. https://doi.org/10.1080/19322909.2019.1591324. Goodman, Steven N., Danielle Fanelli, and John P. A. Ioannidis. “What Does Research Reproducibility Mean?” Science Translational Medicine 8, no. 341 (June 1, 2016): 341ps12. https://doi.org/10.1126/scitranslmed. aaf5027. Gronenschild, Ed H. B. M., Petra Habets, Heidi I. L. Jacobs, Ron Mengelers, Nico Rozendaal, Jim van Os, and Machteld Marcelis. “The Effects of FreeSurfer Version, Workstation Type, and Macintosh Operating System Version on Anatomical Volume and Cortical Thickness Measurements.” PLOS ONE 7, no. 6 (2012)): e38234. https://doi.org/10.1371/journal.pone.0038234. Haven, Tamarinde L., Timothy M. Errington, Kristian Gleditsch, Leonie van Grootel, Alan M. Jacobs, Florian Kern, Rafael Piñeiro, et al. “Preregistering Qualitative Research: A Delphi Study.” SocArXiv. Last edited November 9, 2020. https://doi.org/10.31235/osf.io/pz9jr. Head, Megan L., Luke Holman, Rob Lanfear, Andrew T. Kahn, and Michael D. Jennions. “The Extent and Consequences of P-Hacking in Science.” PLOS Biology 13, no. 
3 (March 13, 2015). https://journals.plos.org/ plosbiology/article?id=10.1371/journal.pbio.1002106. Herbert, Bruce, Sarah Potvin, and Tina Budzise-Weaver. “Best Practices for the Use of Scholarly Impact Metrics.” Working paper, February 10, 2016. https://oaktrust.library.tamu.edu/handle/1969.1/156054. Hettne, Kristina, Ricarda Proppert, Linda Nab, L. Paloma Rojas-Saunero, and Daniela Gawehns. “ReprohackNL 2019: How Libraries Can Promote Research Reproducibility through Community Engagement.” IASSIST Quarterly 44, no. 1–2 (2020): 1–10. https://doi.org/10.29173/iq977. Hogg, Scott. “Software Containers: Used More Frequently Than Most Realize.” Network World, May 26, 2014. https://www.networkworld.com/article/2226996/software-containers--used-more-frequently-than-mostrealize.html. Hong, Neil Chue, Steve Crouch, Simon Hettrick, Tim Parkinson, and Matt Shreeve. Sustainability and Preservation Framework. Edinburgh, UK: Software Sustainability Institute, December 7, 2010. https://www. software.ac.uk/sustainability-and-preservation-framework. Jupyter. Home page. https://jupyter.org/. JupyterLab Documentation. https://jupyterlab.readthedocs.io/en/stable/. Jupyter Project, Matthias Bussonnier, Jessica Forde, Jeremy Freeman, Brian Granger, Tim Head, Chris Holdgraf, et al. “Binder 2.0—Reproducible, Interactive, Sharable Environments for Science at Scale.” Proceedings of the 17th Python in Science Conference, July 15, 2018, 113–20. https://doi.org/10.25080/ Majora-4af1f417-011. Karcher, Sebastian, and Nicholas Weber. “Annotation for Transparent Inquiry: Transparent Data and Analysis for Qualitative Research.” IASSIST Quarterly 43, no. 2 (2019): 1–9. https://doi.org/10.29173/iq959. Kerr, Norbert L. “HARKing: Hypothesizing after the Results Are Known.” Personality and Social Psychology Review 2, no. 3 (1998): 196–217. https://doi.org/10.1207/s15327957pspr0203_4. Kim, Julia, and Don Mennerich. “Jeremy Blake’s Time-Based Paintings: A Case Study.” Electronic Media Review 4 (2015–2016). https://resources.culturalheritage.org/emg-review/volume-4-2015-2016/kim/. Kitzes, Justin, Daniel Turek, and Fatima Deniz, eds. The Basic Reproducible Workflow Template: The Practice of Reproducible Research. Oakland: University of California Press, 2018. http://www.practicereproducibleresearch.org/ Koffel, Jonathan B., and Melissa L. Rethlefsen. “Reproducibility of Search Strategies Is Poor in Systematic Reviews Published in High-Impact Pediatrics, Cardiology and Surgery Journals: A Cross-sectional Study.” Edited by Brett D Thombs. PLOS ONE 11, no. 9 (September 26, 2016): e0163309. https://doi.org/10.1371/ journal.pone.0163309. Kunze, J., J. Littman, E. Madden, J. Scancella, and C. Adams. “The BagIt File Packaging Format (V1.0),” RFC Editor, October 2018. https://tools.ietf.org/html/rfc8493. Lambert, Ladina Bezzola. “Repetition with a Difference: Reproducibility in Literature Studies.” In Reproducibility: Principles, Problems, Practices, and Prospects, edited by Harald Atmanspacher and Sabine Maasen, 491–509. Hoboken, NJ: John Wiley & Sons, 2016. https://doi.org/10.1002/9781118865064.ch23. Supporting Reproducible Research Leonelli, Sabina. “Rethinking Reproducibility as a Criterion for Research Quality.” In Including a Symposium on Mary Morgan: Curiosity, Imagination, and Surprise, Research in the History of Economic Thought and Methodology, vol. 36B, edited by Luca Fiorito, Scott Scheall, and Carlos Eduardo Suprinyak, 129–46. Bingley, UK: Emerald Publishing, 2018. https://doi.org/10.1108/S0743-41542018000036B009. Lewis, L. 
Michelle, Meredith C. Edwards, Zachary R. Meyers, C. Conover Talbot Jr., Haiping Hao, David Blum, and Reproducibility Project: Cancer Biology. “Replication Study: Transcriptional Amplification in Tumor Cells with Elevated c-Myc.” eLife 7 (2018): e30274. https://doi.org/10.7554/eLife.30274. Michael S. Lewis-Beck, et al, The SAGE Encyclopedia of Social Science Research Methods (2004) https://doi. org/10.4135/9781412950589 Lourenço, Raoni, Juliana Freire, and Dennis Shasha. “BugDoc: Algorithms to Debug Computational Processes.” In SIGMOD ’20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 463–78. New York: Association for Computing Machinery, 2020. https://doi.org/10.1145/3318464.3389763. Makel, Matthew C., and Jonathan A. Plucker. Toward a More Perfect Psychology: Improving Trust, Accuracy, and Transparency in Research. Washington, DC: American Psychological Association, 2017. https://doi. org/10.1037/0000033-000. Mellor, David Thomas, Alexander C. DeHaven, Nicole Pfeiffer, Olivia Lowery, and Mark Call. “Templates of OSF Registration Forms.” OSF, October 31, 2016. https://osf.io/zab38/. Miguel, E., C. Camerer, K. Casey, J. Cohen, K. M. Esterling, A. Gerber, R. Glennerster, et al. “Promoting Transparency in Social Science Research.” Science 343, no. 6166 (2014): 30–31. https://doi.org/10.1126/ science.1245317. Moher, David, Alessandro Liberati, Jennifer Tetzlaff, Douglas G. Altman, and PRISMA Group. “Preferred Reporting Items for Systematic Reviews and Meta-analyses: The PRISMA Statement.” PLOS Medicine 6, no. 7 (2009): e1000097. https://doi.org/10.1371/journal.pmed.1000097. Moretti, Franco. Graphs, Maps, Trees: Abstract Models for a Literary Theory. London: Verso, 2005. National Academies of Sciences, Engineering, and Medicine. Open Science by Design: Realizing a Vision for 21st Century Research. Washington, DC: National Academies Press, 2018. https://doi.org/10.17226/25116. National Academies of Sciences, Engineering, and Medicine. Reproducibility and Replicability in Science. Washington, DC: National Academies Press, 2019. https://doi.org/10.17226/25303. National Institutes of Health. “Research Misconduct—Definitions.” November 29, 2018. https://grants.nih.gov/ policy/research_integrity/definitions.htm. National Science Foundation. “Scientists Seeking NSF Funding Will Soon Be Required to Submit Data Management Plans.” News release 10-077, May 10, 2010. https://www.nsf.gov/news/news_summ. jsp?cntn_id=116928. Nosek, Brian A., Charles R. Ebersole, Alexander C. DeHaven, and David T. Mellor. “The Preregistration Revolution.” Proceedings of the National Academy of Sciences 115, no. 11 (March 13, 2018): 2600–06. https://doi. org/10.1073/pnas.1708274114. Nosek, Brian A., Jeffrey R. Spies, and Matt Motyl. “Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth over Publishability.” Perspectives on Psychological Science 7, no. 6 (November 2012): 615–31. https://doi.org/10.1177/1745691612459058. Nüst, D., Sochat, V., Marwick, B., Eglen, S. J., Head, T., Hirst, T., & Evans, B. D. (2020). Ten simple rules for writing Dockerfiles for reproducible data science. PLOS Computational Biology, 16(11), e1008316. https:// doi.org/10.1371/journal.pcbi.1008316 Nüst, Daniel, Carl Boettiger, and Ben Marwick. “How to Read a Research Compendium.” ArXiv:1806.09525 [Cs], June 11, 2018. https://doi.org/10.48550/arXiv.1806.09525. Nüst, Daniel, and Matthias Hinz. 
“Containerit: Generating Dockerfiles for Reproducible Research with R.” Journal of Open Source Software 4, no. 40 (August 21, 2019): 1603. https://doi.org/10.21105/joss.01603. O’Brien, Bridget C., Ilene B. Harris, Thomas J. Beckman, Darcy A. Reed, and David A. Cook. “Standards for Reporting Qualitative Research: A Synthesis of Recommendations.” Academic Medicine 89, no. 9 (September 2014): 1245–51. https://doi.org/10.1097/ACM.0000000000000388. Odell, Jere D., Heather L. Coates, and Kristi L. Palmer. “Rewarding Open Access Scholarship in Promotion and Tenure: Driving Institutional Change.” College and Research Libraries News 77, no. 7 (2016): 322–25. https://doi.org/10.7912/C2R60B. OpenRefine. Home page. https://openrefine.org/. Open Science Collaboration. “Estimating the Reproducibility of Psychological Science.” Science 349, no. 6251 (August 28, 2015): aac4716. https://doi.org/10.1126/science.aac4716. Open Syllabus Explorer. Home page. https://opensyllabus.org/. O’Sullivan, James. “The Humanities Have a ‘Reproducibility’ Problem.” Talking Humanities (blog), July 9, 2019. https://talkinghumanities.blogs.sas.ac.uk/2019/07/09/the-humanities-have-a-reproducibility-problem/. Pawlik, Mateusz, Thomas Hütter, Daniel Kocher, Willi Mann, and Nikolaus Augsten. “A Link Is Not Enough— Reproducibility of Data.” Datenbank-Spektrum 19, no. 2 (2019): 107–15. https://doi.org/10.1007/ s13222-019-00317-8. 205 206 Section 2.2.3 Peels, Rik, and Lex Bouter. “Humanities Need a Replication Drive Too.” Nature 558, no. 7710 (June 21, 2018): 372–372. https://doi.org/10.1038/d41586-018-05454-w. Penders, Bart, J. Britt Holbrook, and Sarah de Rijcke. “Rinse and Repeat: Understanding the Value of Replication across Different Ways of Knowing.” Publications 7, no. 3 (September 2019): 52. https://doi. org/10.3390/publications7030052. Posit. “RStudio IDE.” https://posit.co/downloads/. Pratt, Michael G., Sarah Kaplan, and Richard Whittington. “The Tumult over Transparency: Decoupling Transparency from Replication in Establishing Trustworthy Qualitative Research.” Administrative Science Quarterly 65, no. 1 (March 1, 2020): 1–19. https://doi.org/10.1177/0001839219887663. Qcoder. GitHub. https://github.com/ropenscilabs/qcoder. REANA. Home page. Accessed August 13, 2020. https://reanahub.io/. Rechert, Klaus, Isgandar Valizada, Dirk von Suchodoletz, and Johann Latocha. “bwFLA—A Functional Approach to Digital Preservation.” PIK—Praxis der Informationsverarbeitung und Kommunikation 35, no. 4 (November 2012): 259–67. https://doi.org/10.1515/pik-2012-0044. Repo2docker. GitHub. https://github.com/jupyterhub/repo2docker. ReproServer. Home page. https://server.reprozip.org/. ReproZip. Home page. https://www.reprozip.org/. ———. “Using Reprounzip.” https://docs.reprozip.org/en/1.0.x/unpacking.html; https://server.reprozip.org/. ReproZip-Jupyter 1.2. Home page. https://pypi.org/project/reprozip-jupyter/. ReproZip Web. “ReproZip Web’s Documentation.” https://reprozip-web.readthedocs.io/en/latest/. Research Compedium. Home page. https://research-compendium.science/. Rethlefsen, Melissa L., Ann M. Farrell, Leah C. Osterhaus Trzasko, and Tara J. Brigham. “Librarian Co-authors Correlated with Higher Quality Reported Search Strategies in General Internal Medicine Systematic Reviews.” Journal of Clinical Epidemiology 68, no. 6 (June 2015): 617–26. https://doi.org/10.1016/j. jclinepi.2014.11.025. Rethlefsen, Melissa L., Shona Kirtley, Siw Waffenschmidt, Ana Patricia Ayala, David Moher, Matthew J. Page, Jonathan B. Koffel, and PRISMA-S Group. 
Rethlefsen, Melissa L., Mellanye J. Lackey, and Shirley Zhao. “Building Capacity to Encourage Research Reproducibility and #MakeResearchTrue.” Journal of the Medical Library Association 106, no. 1 (January 12, 2018): 113–19. https://doi.org/10.5195/jmla.2018.273.
Riegelman, Amy. “A Primer on Preregistration (& Why I Think It Should Be a Submission Track in LIS Journals).” Presentation, Librarians Building Momentum for Reproducibility, virtual conference, January 28, 2020. https://osf.io/w4dfh/.
Rodriguez, Diego, Lukas Heinrich, Rokas Maciulaitis, Tibor Simko, and Jan Okraska. reanahub/reana-demo-bsm-search. Python. 2018. Reprint, REANA, 2020. https://github.com/reanahub/reana-demo-bsm-search.
Rouse, Margaret. “What Does Integrated Development Environment Mean?” Techopedia, January 11, 2017. http://www.techopedia.com/definition/26860/integrated-development-environment-ide.
Sandve, Geir Kjetil, Anton Nekrutenko, James Taylor, and Eivind Hovig. “Ten Simple Rules for Reproducible Computational Research.” PLOS Computational Biology 9, no. 10 (October 24, 2013): e1003285. https://doi.org/10.1371/journal.pcbi.1003285.
Saving Data Journalism. Home page. https://savingjournalism.reprozip.org/.
Sayre, Franklin, and Amy Riegelman. “Replicable Services for Reproducible Research: A Model for Academic Libraries.” College and Research Libraries 80, no. 2 (March 2019): 260–72.
———. “Reproducibility Bibliography.” Accessed September 30, 2020. https://reproducibility.dash.umn.edu/.
Sayre, Franklin, Amy Riegelman, Shirley Zhao, Vicky Steeves, and Tisha Mentnech. Librarians Building Momentum for Reproducibility, virtual conference, January 28, 2020. https://vickysteeves.gitlab.io/librarians-reproducibility/.
Schimmack, Ulrich. “Questionable Research Practices: Definition, Detection, and Recommendations for Better Practices.” Replicability-Index (blog), January 24, 2015. https://replicationindex.com/2015/01/24/qrps/.
Steeves, Vicky. “Reproducibility Librarianship.” Collaborative Librarianship 9, no. 2 (2017): article 4. https://digitalcommons.du.edu/collaborativelibrarianship/vol9/iss2/4.
Steeves, Vicky, Rémi Rampin, and Fernando Chirigati. “Reproducibility, Preservation, and Access to Research with ReproZip and ReproServer.” IASSIST Quarterly 44, no. 1–2 (June 29, 2020): 1–11. https://doi.org/10.29173/iq969.
Stencila. Home page. https://stenci.la/.
Stodden, Victoria. “Resolving Irreproducibility in Empirical and Computational Research.” Institute of Mathematical Statistics, November 17, 2013. https://imstat.org/2013/11/17/resolving-irreproducibility-in-empirical-and-computational-research/.
Stodden, Victoria, David Bailey, Bill Rider, Jonathan Borwein, William Stein, and Randall LeVeque, comps. and eds. “Setting the Default to Reproducible.” Developed by participants in ICERM workshop Reproducibility in Computational and Experimental Mathematics, Providence, RI, December 14, 2012. https://stodden.net/icerm_report.pdf.
Suiter, Amy M., and Heather Lea Moulaison. “Supporting Scholars: An Analysis of Academic Library Websites’ Documentation on Metrics and Impact.” Journal of Academic Librarianship 41, no. 6 (November 2015): 814–20. https://doi.org/10.1016/j.acalib.2015.09.004.
Sylabs.io. “Singularity.” Accessed August 13, 2020. https://sylabs.io/.
Taguette. Home page. https://www.taguette.org/.
Tebbe, Heidi, and Danica Madison Lewis. “What’s Sauce for the Goose Is Sauce for the Gander: Reproducible Practice in Library Work.” OSF, February 3, 2020. https://osf.io/3xuj2/.
Trisovic, Ana, Philip Durbin, Tania Schlatter, Gustavo Durand, Sonia Barbosa, Danny Brooke, and Mercè Crosas. “Advancing Computational Reproducibility in the Dataverse Data Repository Platform.” In P-RECS ’20: Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems, 15–20. New York: ACM, 2020. https://doi.org/10.1145/3391800.3398173.
VIDA-NYU. “ReproZip Jupyter Extension.” Example videos. YouTube video, 0:51. https://www.youtube.com/watch?v=Y8YmGVYHhS8&list=PLjgZ3v4gFxpWb277AEyjsVerB6nViGTVL.
Vitale, Cynthia R. H. “Is Research Reproducibility the New Data Management for Libraries?” Bulletin of the Association for Information Science and Technology 42, no. 3 (2016): 38–41. https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/bul2.2016.1720420313.
Ward, Logan. “Predicting the Properties of Inorganic Materials with Machine Learning.” Whole Tale dashboard. https://dashboard.wholetale.org/run/59fc7f9d60221d000163c37b?token=aJgFSGxFQUSUgPV6vkerjk7pfowAAOFyhW0NKiE3rx8k9LSbxzugxp8U3XRrB1ev.
Webrecorder. Home page. https://webrecorder.net/.
Whole Tale. Home page. 2019. https://wholetale.org/.
Wikipedia. S.v. “JSON.” Last updated March 22, 2023. https://en.wikipedia.org/wiki/JSON.
Wolf, Nicholas. “National School System and the Irish Language Heaney Lecture 2015.” April 12, 2016. https://doi.org/10.17605/OSF.IO/PGK8V.
———. “National School System and the Irish Language Heaney Lecture 2015.” OSF. https://osf.io/wfvqr.
———. “The National-School System and the Irish Language in the Nineteenth Century.” In Schools and Schooling, 1650–2000, edited by James Kelly and Susan Hegarty, 208. Dublin: Four Courts Press, 2017. https://www.fourcourtspress.ie/books/2017/schools-and-schooling/.
Yenni, Glenda M., Erica M. Christensen, Ellen K. Bledsoe, Sarah R. Supp, Renata M. Diaz, Ethan P. White, and S. K. Morgan Ernest. “Developing a Modern Data Workflow for Evolving Data.” Preprint. bioRxiv, July 24, 2018. https://doi.org/10.1101/344804.