An analysis of use and performance data aggregated from 35 institutional repositories

Kenning Arlitsch, Library, Montana State University, Bozeman, Montana, USA
Jonathan Wheeler, University Libraries, University of New Mexico, Albuquerque, New Mexico, USA
Minh Thi Ngoc Pham, School of Information Science and Learning Technologies, University of Missouri, Columbia, Missouri, USA, and
Nikolaus Nova Parulian, School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA

Received 4 August 2020; Revised 10 September 2020; Accepted 14 September 2020

Abstract
Purpose – This study demonstrates that aggregated data from the Repository Analytics and Metrics Portal (RAMP) have significant potential to analyze visibility and use of institutional repositories (IR) as well as potential factors affecting their use, including repository size, platform, content, device and global location. The RAMP dataset is unique and public.
Design/methodology/approach – The webometrics methodology was followed to aggregate and analyze use and performance data from 35 institutional repositories in seven countries that were registered with the RAMP for a five-month period in 2019. The RAMP aggregates Google Search Console (GSC) data to show IR items that surfaced in search results from all Google properties.
Findings – The analyses demonstrate large performance variances across IR as well as low overall use. The findings also show that device use affects search behavior, that different content types such as electronic thesis and dissertation (ETD) may affect use and that searches originating in the Global South show much higher use of mobile devices than in the Global North.
Research limitations/implications – The RAMP relies on GSC as its sole data source, resulting in somewhat conservative overall numbers. However, the data are also expected to be as robot free as can be hoped.
Originality/value – This may be the first analysis of aggregate use and performance data derived from a global set of IR, using an openly published dataset. RAMP data offer significant research potential with regard to quantifying and characterizing variances in the discoverability and use of IR content.
Peer review – The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-08-2020-0328

Keywords Webometrics, Digital repositories, Institutional repositories, IR, Academic publishing, Web analytics
Paper type Research paper

© Kenning Arlitsch, Jonathan Wheeler, Minh Thi Ngoc Pham and Nikolaus Nova Parulian. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and noncommercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at: http://creativecommons.org/licences/by/4.0/legalcode

The authors are grateful to the managers of the 35 repositories described in this study, who recognized the value of participating in RAMP. This research was conducted as part of a grant generously funded by the Institute of Museum and Library Services (IMLS Log Number: LG-72-18-0179).

Online Information Review, Vol. 45 No. 2, 2021, pp. 316-335. Emerald Publishing Limited, 1468-4527. DOI 10.1108/OIR-08-2020-0328
Introduction
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories (IR); it produces a unique dataset of standardized metrics across time [1]. Developed by Montana State University, the University of New Mexico, OCLC Research and the Association of Research Libraries (OBrien et al., 2017), RAMP currently aggregates data from more than 55 repositories around the world, and new repositories continue to be added. RAMP tracks items from registered repositories that have surfaced in search results across all Google properties, including data that show whether users clicked through to the IR and downloaded the file. This article demonstrates basic use and performance data for 35 repositories in seven countries that were registered with the RAMP from January 1–May 31, 2019.

The RAMP dataset is composed of Google Search Console (GSC) data aggregated from registered repositories. It is the first openly published dataset of use and performance data aggregated from a cross-platform set of IR in multiple countries, using a common set of metrics. GSC provides accurate non-HTML download counts executed directly from all Google search engine results pages (SERP). Metrics include impressions (the number of times an item appears in the SERP); position (the location of the item in the SERP); clicks; click-through ratios; date; device and country (Frost, 2019; Google, Inc., 2020). RAMP data are collected from GSC in two separate sets: page-click and country-device data. The page-click data include the Uniform Resource Locator (URL) of every item that appeared in the SERP, which opens significant possibilities for additional research if the metadata of those items were mined.

The basic analyses of RAMP data performed for this paper demonstrate large performance variances across IR as well as low overall use. The data confirm that device use affects search behavior, that different content types such as electronic theses and dissertations (ETD) may affect use and that searches originating in the Global South show much higher use of mobile devices than in the Global North. In addition to providing a baseline for IR performance metrics, the data offer significant research potential. RAMP data can help scholars understand the disciplinary scope and breadth of the content contained in the IR.

Research questions
The research questions are designed to elicit a data-driven story about IR that has not previously been available due to a lack of a uniform analytics model applied across a disparate set of repositories.
(1) What do RAMP data show about the size and the use of IR?
(2) How do devices used to access IR content affect search behavior?

Literature review
Calls for standardized reporting for IR have been evident almost as long as they have existed (Fralinger and Bull, 2013; Harnad and McGovern, 2009; Organ, 2006). However, nearly all such calls end with some variation of the lament that "no clear standard has yet emerged for repository reporting and assessment" (McDonald and Thomas, 2008).
The siloed and distributed nature of the IR landscape has exacerbated this situation, as research libraries collectively commit significant resources toward running their own IR (Arlitsch and Grant, 2018) and make "long-term commitment[s] for safeguarding, preserving and making accessible the intellectual content of an institution" (Lagzian et al., 2015), despite having little idea of how much they are really being used and with no way to compare against other institutions. Hosted solutions like BePress's Digital Commons platform have the capacity to collect aggregate data, but due to the commercial orientation of this platform these data are not openly available to the IR community.

Bruns and Inefuku assert that metrics must be contextualized to be useful and that download statistics are only part of assessing IR performance, but when discussing the collection of download counts from the big three IR platforms (DSpace, EPrints and Digital Commons), they suggest using platform-generated statistics and do not question the accuracy of those numbers (Bruns and Inefuku, 2015). Some research has indicated that platform-generated statistics may be inflated by as much as 85% because they rely on log file analysis, and nonhuman traffic is very difficult to filter (Greene, 2016). Other researchers also correctly warn of the limitations associated with placing too much faith in statistics to determine quality, use and performance of repositories. They point out that it is difficult to distinguish different kinds of traffic, that items may have multiple URLs and that different kinds of collections act differently with search engines, among other limitations (Perrin et al., 2017). The importance of standardized usage statistics "across repositories and repository platforms" is highlighted in the Next Generation Repositories Report (Rodrigues et al., 2017).

The IRUS-UK project is similar in intent to RAMP and has been operating since 2012 (Needham and Stone, 2012). IRUS-UK collects data from approximately 180 repositories in the United Kingdom (Jisc, 2019), and a pilot project known as IRUS-USA was launched to test the IRUS model with 11 repositories in the United States (Kim, 2018). An assessment of the IRUS-USA pilot project concluded that the IRUS model was favorably received and that participants desired more documentation, more functionality and granularity in the reports, and more visual statistics (Thompson et al., 2019).

A principal difference between IRUS and RAMP is the IRUS Tracker Protocol that must be installed at each repository. It "gathers basic raw data for each download," which are then processed to create COUNTER-compliant data and to remove "robot activity or other unusual activity" (MacIntyre and Jones, 2016). IRUS-UK attempts the difficult process of filtering robot activity from log files and then performs various analyses on the cleaned data. Filtering is defined in an IRUS-UK position statement: "COUNTER provides a list of robots, whose usage should be removed as a bare minimum. The list is used as part of the audit process and is not intended to be a comprehensive list. The need for more sophisticated rules and processes is well understood." The document also states that IRUS-UK has "added further filters to remove more user agents identified as robots and applied a simple threshold for 'overactive' IP addresses" (IRUS-UK Team, 2013).
The RAMP requires no local installation, as it simply utilizes the data that Google passes to it through the GSC Application Programming Interface (API) (Google, Inc., 2020). RAMP also relies on Google to filter robot activity. Google's success as an advertising platform depends on its ability to guarantee to customers that clicks are human generated. The data that are passed to RAMP through GSC are therefore as robot free as can be hoped, and the relatively conservative download numbers demonstrated by RAMP, compared to platform-generated statistics, support this theory (OBrien et al., 2017). The RAMP misses non-Google referrals to repositories, but prior work by the research team indicates that a vast majority of traffic to IR is driven by Google properties and that the traffic referred by other search engines, social media sites, email, etc. is comparatively small. Therefore, while the sheer numbers of referrals and downloads tracked by RAMP may be conservative, we believe that the trade-off for nearly robot-free data is worthwhile.

Methodology
The findings reported below represent a webometric analysis of RAMP data. The term "webometrics" was coined to describe research into network-based information using quantitative measures (Almind and Ingwersen, 1997) and includes the application of mathematical and statistical methods to the study of quantitative aspects of information sources on the World Wide Web. Webometrics regards web pages as "information entities," with hyperlinks across pages acting as citations and citation networks (Almind and Ingwersen, 1997). Webometrics includes four main areas of research: (1) web page content analysis; (2) web link structure analysis; (3) web usage analysis, which includes analysis of users' search and browsing behavior and (4) web technology analysis, which includes search engine performance (Björneborn and Ingwersen, 2004). The current study focuses on two areas of webometric analysis: web usage and web technology. Data analytic methods including data aggregation, data reshaping, visualization and descriptive statistics are used to examine the performance of RAMP participants, including the use of IR items, their discoverability in search engines and the technologies used to search and view IR content.

Data creation and analysis methods
The published data include monthly CSV files of page-click and country-device data for the period of January 1–May 31, 2019 (Wheeler et al., 2020b). Python and R scripts used for the analysis are available on GitHub (Wheeler et al., 2020a). More detailed versions of the Tableau visualizations shown in this paper may be viewed at Tableau Public (Parulian, 2020).

The number of repositories currently registered with RAMP exceeds 55, but data from only 35 repositories were analyzed for this research (Table 1). The reason for this is consistency. Most of the other repositories were registered more recently and had not accumulated data for the same five-month period (January 1, 2019–May 31, 2019). A few had also experienced configuration errors that were usually the result of updates to the repository platform. The authors have a high degree of confidence in the accuracy and consistency of the data from the 35 repositories for the period under study. The dataset created for this study is available for download from the Dryad data repository, along with documentation (Wheeler et al., 2020b).
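RAMP gathers these data through the GSC API noted above. As an illustration of the kind of request involved, the following minimal Python sketch queries the GSC Search Analytics endpoint for the two dimension sets RAMP collects (page-level and country/device data). The site URL, credential file and row limit shown here are hypothetical placeholders; this is not the RAMP harvester itself.

```python
# Minimal sketch of a Google Search Console (Search Analytics) query for one repository.
# The site URL and service-account key file below are hypothetical placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "gsc-service-account.json", scopes=SCOPES
)
gsc = build("webmasters", "v3", credentials=creds)

SITE = "https://scholarworks.example.edu/"  # a GSC property verified for the IR


def query(dimensions):
    """Request clicks, impressions, CTR and position aggregated over the given dimensions."""
    body = {
        "startDate": "2019-01-01",
        "endDate": "2019-05-31",
        "dimensions": dimensions,
        "rowLimit": 25000,  # GSC returns at most 25,000 rows per request
    }
    return gsc.searchanalytics().query(siteUrl=SITE, body=body).execute().get("rows", [])


page_click_rows = query(["page", "date"])                     # granular, per-URL data
country_device_rows = query(["country", "device", "date"])    # no URLs, per country/device

# Each row carries "keys" (the dimension values) plus clicks, impressions, ctr and position.
print(len(page_click_rows), len(country_device_rows))
```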
In order to calculate summary statistics relative to overall IR content, additional data were gathered for the page-click analyses described below (Arlitsch et al., 2019). These data include the total number of items hosted by each repository, the corresponding IR platform, the country in which the IR is located and the count of electronic theses and dissertations (ETDs) in each IR.

RAMP data are harvested daily via the GSC API (Wheeler et al., 2020b). Two datasets are downloaded per IR: a page-click dataset that includes granular data per URL, and a country-device dataset that is less granular and does not include URLs. The lack of URLs within the country-device data means that these data cannot be combined or cross-referenced with page-click data. Another implication for the results is that statistics derived from country-device data represent the aggregate SERP performance of all pages within an IR, including HTML pages and citable content pages (definition below). It is not possible within country-device datasets to disaggregate data about citable content from other activity.

The analyses are therefore divided into two sections. The first section demonstrates a baseline analysis of repository use and citable content downloads (CCD) using page-click data. Data aggregation scripts and resulting summary statistics per analyzed IR are provided on GitHub (Wheeler et al., 2020a).

Table 1. List of 35 IR by country, type, name, platform and number of items
Country             Type        Repository name                             Platform          #Items
The USA             University  Deep Blue (U Michigan)                      DSpace            124,436
                                Digital Commons@U Nebraska Lincoln          Digital Commons   105,065
                                Digital Repository (UNM)                    Digital Commons   93,564
                                Caltech Authors                             EPrints 3         81,000
                                CaltechTHESIS                               EPrints 3         9,814
                                ScholarWorks (U Montana)                    Digital Commons   75,715
                                VTechWorks                                  DSpace            72,275
                                ScholarWorks (U Texas)                      DSpace            60,359
                                RUcore (Rutgers)                            Fedora            43,008
                                K-REX (Kansas State University)             DSpace            37,084
                                UKnowledge (U Kentucky)                     Digital Commons   34,196
                                Digital Scholarship at UNLV                 Digital Commons   23,896
                                D-Scholarship@Pitt                          EPrints 3         21,358
                                DRUM (U Maryland)                           DSpace            21,246
                                IUPUI ScholarWorks                          DSpace            15,000
                                Montana State ScholarWorks                  DSpace            14,381
                                Swarthmore Works                            Digital Commons   11,095
                                Digital Repository Service (Northeastern)   Fedora/Samvera    5,446
                                NKU Digital Repository                      DSpace            2,420
                                Scholarly Works @ SHSU                      DSpace            2,095
                    Consortium  Mountain Scholar                            DSpace            71,708
                                ShareOK                                     DSpace            64,972
                                Maryland SOAR                               DSpace            9,359
                                TriCollege Libraries IR                     DSpace            8,728
Australia           University  Research Online (U Wollongong)              Digital Commons   69,396
The United Kingdom  University  Strathprints                                EPrints 3         51,386
                                PEARL (U Plymouth)                          DSpace            9,903
Canada              University  MacSphere (McMaster)                        DSpace            17,979
                                UWSpace (U Waterloo)                        DSpace            13,021
                    Consortium  VIURRSpace                                  DSpace            10,366
Sweden              University  Epsilon Archive for Student Projects        EPrints 3         11,895
                                Epsilon Open Archive                        EPrints 3         9,307
New Zealand         University  Massey Research Online                      DSpace            12,207
South Africa        University  Western Cape Research Repository            DSpace            5,143
                                Western Cape ETD Repository                 DSpace            3,640

The second section demonstrates an analysis of trends evident in the country-device data. Specifically, we examine the number of visits from each country and characterize some limited effects devices seem to have on user search behaviors. In both cases, it is important to remember that the datasets represent 35 repositories on four different platforms, and that the numbers shown are not necessarily indicative of characteristics of those platforms.
Although the majority of RAMP IR host a range of general-purpose academic content, including self-archived copies of scholarly articles, datasets and ETD, some of the analyzed repositories focus more narrowly on hosting specific kinds of content. Some types of content may drive use more than other types. Consequently, the authors reiterate that the results presented below represent a baseline analysis of a unique, open dataset. The authors recommend further study with a larger sample of IR and a longer date range in order to better generalize results across the IR ecosystem at large.

Data definitions
Combining the descriptive data reported by Arlitsch et al. (2019) with IR search engine performance data from RAMP allows further analysis of IR characteristics – size, platform and location – in terms of the actual access and use of IR content. The page-click analysis and discussion rely on the following definitions:
(1) Citable content URLs: Page-click data harvested from the GSC API include search engine performance statistics for individual URLs. Many of these URLs point to ancillary IR pages such as the IR homepage or the HTML pages that contain item-level metadata. RAMP maintains these data but primarily reports click activity on content files (PDF, CSV, etc.), so it is necessary to differentiate URLs that point to content files from those that point to HTML pages. URLs that point to content files are referred to here as "citable content URLs."
(2) Citable content downloads (CCD): These are RAMP's primary metric and are defined as clicks on citable content URLs.
(3) Item: An "item" is any single asset published within an IR, including item-level metadata and all content files or bitstreams associated with the asset. Items are commonly represented by HTML pages. For example, everything found at https://scholarworks.montana.edu/xmlui/handle/1/9939 is considered part of a single "item" (Obrien et al., 2016), which includes the published metadata and the seven associated bitstreams or content files. By contrast, each of the seven content file URLs is considered a single citable content URL.
(4) Use ratio: For each IR, the use ratio is the count of unique items with CCD divided by the total count of items hosted by the IR. Since a single item within an IR may contain many content files, this calculation requires inferring the HTML URL of the corresponding IR item page for any citable content URL with a positive click value in the RAMP page-click dataset. The inferred HTML URLs are further processed to deduplicate items that occur in the data with both secure (HTTPS) and non-secure (HTTP) connection protocols. The final count of deduplicated URLs is the numerator of this ratio. Using the example above, if each of the seven content files accessible from (Obrien et al., 2016) were clicked on during the period of study, all of that activity would be aggregated as one item use under the single, corresponding item HTML URL for the sake of calculating the use ratio. Python code and documentation for inferring item pages and calculating use ratio are available at (Wheeler et al., 2020a); a simplified sketch of the calculation appears after these definitions.

More information about citable content URLs and CCDs, including how data harvested from the GSC API are processed to identify citable content URLs, is available in the published dataset documentation (Wheeler et al., 2020b). Further limitations of the dataset are noted in the Discussion section of this paper.
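As a concrete illustration of the use ratio defined above, the following Python sketch computes the ratio for a single repository from a hypothetical page-click extract. It assumes DSpace-style bitstream URLs and a simple two-column CSV; the production scripts cited above (Wheeler et al., 2020a) are platform specific and more involved.

```python
# Sketch of the use ratio calculation for a single (hypothetical) DSpace repository.
# Assumes a CSV with columns "url" (citable content URL) and "clicks" (summed CCD).
import re

import pandas as pd


def infer_item_url(content_url: str) -> str:
    """Map a DSpace bitstream URL to its parent item page and normalize the protocol."""
    url = re.sub(r"^http://", "https://", content_url)  # deduplicate HTTP/HTTPS variants
    # e.g. .../xmlui/bitstream/handle/1/9939/thesis.pdf -> .../xmlui/handle/1/9939
    return re.sub(r"/bitstream/handle/([^/]+/[^/]+)/.*$", r"/handle/\1", url)


page_clicks = pd.read_csv("ramp_page_clicks_example.csv")    # hypothetical extract
clicked = page_clicks.loc[page_clicks["clicks"] > 0, "url"]  # citable content URLs with CCD
unique_items_used = clicked.map(infer_item_url).nunique()    # numerator: items with any CCD

total_items = 14_381  # total item count for the IR, gathered separately from RAMP
use_ratio = unique_items_used / total_items
print(f"use ratio: {use_ratio:.2f}")
```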
Results of the page-click data analysis
RQ1. What do RAMP data show about the size and use of IR?

The number of items hosted by individual repositories ranges from 2,095 to 124,436 (Figure 1); the average number of items in each repository is 34,928 (M = 34,928) (Table 2), and the IR operate on four platforms: Digital Commons, DSpace, EPrints and Fedora (Table 3). The four platforms host 7, 20, 6 and 2 RAMP-registered repositories, respectively. DSpace is the platform used to host the most repositories in the dataset (20) and consequently contains the most items (576,322). Digital Commons is the platform with the second most items (412,927) in seven repositories. There are six repositories totaling 184,760 items using the EPrints platform. Fedora has the fewest items, with 48,454 items in two repositories. On average, each DSpace repository has 28,816 items. The EPrints platform ranks second for the average number of items with 30,793. Digital Commons repositories contain an average of 58,990 items, almost double the average number of items in DSpace repositories [2].

Figure 1. Number of items in 35 repositories registered with RAMP

Table 2. Overall statistics of items in repositories registered with the RAMP
Variable      N    Min     MDN      M        Max
Repository    35   2,095   17,979   34,928   124,436
Note(s): N = repositories, M = mean and MDN = median

Table 3. Number of items by platform
Platform          N    Min      MDN      M        Max       Total
DSpace            20   2,095    13,701   28,816   124,436   576,322
Digital Commons   7    11,095   69,396   58,990   105,065   412,927
EPrints           6    9,307    16,627   30,793   81,000    184,760
Fedora            2    5,446    24,227   24,227   43,008    48,454
Note(s): N = repositories per platform, M = mean and MDN = median

The repositories are hosted in seven countries: Australia, Canada, New Zealand, South Africa, Sweden, the United Kingdom and the USA (Table 4). Most of the 35 repositories represented in this dataset are based in the USA (N = 24) and contain a total of 1,008,220 items. Canada (M = 13,789) has three participating repositories containing a total of 41,366 items. The United Kingdom, Sweden and South Africa each have two participating repositories, with totals of 61,289 items, 21,202 items and 8,783 items, respectively. Australia and New Zealand each have one participating repository, with 69,396 items and 12,207 items, respectively.

Table 4. Number of repository items per country
Country              N    Min      MDN      M        Max       Total
The USA              24   2,095    29,046   42,009   124,436   1,008,220
Australia            1                                          69,396*
The United Kingdom   2    9,903    30,645   30,645   51,386    61,289
Canada               3    10,366   13,021   13,789   17,979    41,366
Sweden               2    9,307    10,601   10,601   11,895    21,202
New Zealand          1                                          12,207*
South Africa         2    3,640    4,392    4,392    5,143     8,783
Note(s): N = institutional repositories, M = mean and MDN = median. *Indicates the subset of RAMP data used for the analysis; there is only one participating IR in those countries, so we present only the total number of items for those countries.
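Descriptive statistics of this kind can be recomputed from a per-repository summary table. The short pandas sketch below, which assumes hypothetical file and column names, groups repositories by platform and reports the same statistics as Table 3; it is illustrative only and not the published analysis scripts.

```python
# Sketch: per-platform item statistics in the style of Table 3.
# Assumes a summary CSV with one row per IR and hypothetical columns "platform" and "num_items".
import pandas as pd

ir_summary = pd.read_csv("ir_summary_example.csv")

by_platform = (
    ir_summary.groupby("platform")["num_items"]
    .agg(N="count", Min="min", MDN="median", M="mean", Max="max", Total="sum")
    .sort_values("Total", ascending=False)
)
print(by_platform.round(0))
```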
Of particular interest is the use ratio, which denotes how much IR content is accessed compared to how much is available. Use falls into three categories (Figure 2): IR with high numbers of items but low use ratio (11 repositories); IR with low numbers of items and low use ratio (17 repositories); and IR with low numbers of items but high use ratio (seven repositories). The first category (high item/low use ratio) has a total of 869,876 items from 11 repositories (M = 79,080), and the mean use ratio is 0.22. The second category (low item/low use ratio) has a total of 277,589 items from 17 repositories (M = 16,329) and a mean use ratio of 0.25. The last category (low item/high use ratio) has a total of 74,998 items from seven repositories (M = 10,714) and a mean use ratio of 0.63. In the first two categories, during the five months from January to May 2019, on average, 25% or fewer of the items in the corresponding IR were accessed. On the other hand, in the last category, more than 60% of the items in the IR were accessed [3]. Complete statistics about the number of items and use ratios per category can be found in Tables 5 and 6.

Figure 2. Use ratio by category

In the first category, high item/low use (M = 79,080), the average number of items in each repository is 79,080. In the second category, low item/low use (M = 16,329), each repository has on average 16,329 items registered. The repositories in the low item/high use category (M = 10,714) have an average of 10,714 items. The total numbers of items in the first, second and third categories are 869,876, 277,589 and 74,998, respectively. The common pattern of use which can be seen from these statistics is that the average percentage of use is out of proportion with the total number of items in the IR.

Further examination of the use ratio in each category supports the finding above that use of the items in the IR is disproportionate to the total number of items within the IR. The first category has the most items on average and the lowest use ratio: in the high item/low use category (M = 0.22, SD = 0.11), on average only 22% of items within these repositories were downloaded during the five-month period. Conversely, the low item/high use category has the fewest items on average but the highest use ratio: the average percentage of use per item for this category (M = 0.63, SD = 0.14) is 63%.

The use ratio of each repository is visualized in Figure 3 and the use ratio by platform is shown in Table 7. The 35 repositories are reordered by use ratio in Table 8. Use ratio appears to be positively affected by the number of ETDs in a given repository (Figure 4): the repositories that contain more ETD as a portion of their total items tend to have higher use ratios than repositories that contain fewer or no ETD.
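The apparent relationship between ETD concentration and use ratio can be examined directly from a per-repository summary table. A minimal sketch, assuming hypothetical column names ("etd_count", "num_items", "use_ratio") rather than the authors' published files:

```python
# Sketch: checking the apparent relationship between ETD share and use ratio (cf. Figure 4).
import pandas as pd

ir_summary = pd.read_csv("ir_summary_example.csv")
ir_summary["etd_percent"] = ir_summary["etd_count"] / ir_summary["num_items"]

# Spearman rank correlation is a reasonable first look given the small sample (n = 35).
print(ir_summary["etd_percent"].corr(ir_summary["use_ratio"], method="spearman"))
```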
Table 5. Number of items by use ratio category
Category                    N    Min      MDN      M        Max       Total
High items/low downloads    11   51,386   72,275   79,080   124,436   869,876
Low items/low downloads     17   2,095    11,095   16,329   43,008    277,589
Low items/high downloads    7    5,143    11,895   10,714   17,979    74,998
Note(s): N = repositories, M = mean and MDN = median

Table 6. Use ratio by category
Category                    N    Min    MDN    M      SD     Max
High items/low downloads    11   0.08   0.18   0.22   0.11   0.45
Low items/low downloads     17   0.06   0.27   0.25   0.09   0.40
Low items/high downloads    7    0.51   0.59   0.63   0.14   0.90
Note(s): N = repositories, M = mean, MDN = median and SD = standard deviation

Figure 3. Use ratio of each repository

Table 7. Use ratio by platform
Platform          N    Min    MDN    M      SD     Max
EPrints           6    0.16   0.32   0.40   0.28   0.90
Fedora            2    0.19   0.38   0.38   0.28   0.58
DSpace            20   0.16   0.27   0.32   0.16   0.69
Digital Commons   7    0.06   0.22   0.22   0.15   0.45
Note(s): N = repositories, M = mean, MDN = median and SD = standard deviation

Table 8. Repositories sorted by use ratio
Repository name                              #Items in repository   #Unique CCD items   Use ratio
Epsilon Archive for Student Projects         11,895                 10,753              0.90
Massey Research Online                       12,207                 8,445               0.69
Western Cape ETD Repository                  5,143                  3,244               0.63
UWSpace (U Waterloo)                         13,021                 7,657               0.59
Digital Repository Service (Northeastern)    5,446                  3,173               0.58
MacSphere (McMaster)                         17,979                 9,122               0.51
Epsilon Open Archive                         9,307                  4,747               0.51
Digital Commons @U Nebraska Lincoln          105,065                47,418              0.45
DRUM (U Maryland)                            21,246                 8,437               0.40
CaltechTHESIS                                9,814                  3,673               0.37
IUPUI ScholarWorks                           15,000                 5,379               0.36
Montana State ScholarWorks                   14,381                 4,405               0.31
UKnowledge (U Kentucky)                      34,196                 10,628              0.31
Research Online (U Wollongong)               69,396                 21,343              0.31
Scholarly Works @ SHSU                       2,095                  586                 0.28
D-Scholarship@Pitt                           21,358                 6,011               0.28
Texas ScholarWorks                           60,359                 16,255              0.27
Western Cape Research Repository             3,640                  972                 0.27
Maryland SOAR                                9,359                  2,543               0.27
Deep Blue (U Michigan)                       124,436                31,773              0.26
PEARL (U Plymouth)                           9,903                  2,449               0.25
VTechWorks                                   72,275                 17,610              0.24
Digital Scholarship at UNLV                  23,896                 5,339               0.22
RUcore (Rutgers)                             43,008                 8,264               0.19
Mountain Scholar                             71,708                 12,611              0.18
ShareOK                                      64,972                 11,829              0.18
Strathprints                                 51,386                 9,451               0.18
TriCollege Libraries IR                      8,728                  1,474               0.17
K-REX                                        37,084                 6,433               0.17
NKU Digital Repository                       2,420                  408                 0.17
Caltech Authors                              81,000                 13,183              0.16
VIURRSpace                                   10,366                 1,670               0.16
Digital Repository (UNM)                     93,564                 8,839               0.09
ScholarWorks (U Montana)                     75,715                 5,836               0.08
Swarthmore Works                             11,095                 687                 0.06

Figure 4. Use ratio compared to percent of ETD in repositories
Results of the country-device data analysis
In addition to the page-click data analyzed in the previous section, the RAMP also harvests daily search engine performance data describing the devices used to conduct searches on Google properties and the countries from which the searches originated. These data are aggregated in a combination of country and device, and they do not include URLs. Since it is not possible to filter clicks on citable content URLs from clicks on HTML URLs, the following analysis describes trends in search engine performance per IR rather than per item.

RQ2. How do devices used to access IR content affect user behavior?

Table 9 shows that most of the clicks on IR content (70.3%) came from desktop users. Tablets were used much less frequently than desktop and mobile devices. Figure 5 and Table 10 show how device use varies depending on the SERP position of IR pages. The number of clicks originating from desktops, mobile devices and tablets in the five-month period was 7,438,457, 2,878,834 and 263,205, respectively. The "position" refers to the location in the SERP, where each page contains ten results; a lower position indicates a better placement in the SERP. When desktop interfaces were used to conduct Google searches, each item in the IR was, on average, downloaded 3.13 times, and the median position of the IR content in the SERP was 60.2. The average download of an item when mobile phones were used to search the repositories was 2.04, and the median position of the IR content in the SERP was 18.7. The average number of downloads of an item when tablets were used was 0.48, and the median position of the IR content in the SERP was 11.9.

Table 9. Breakdown of clicks by device
Device    Count of occurrences   Total clicks   % of total clicks
Desktop   2,372,958              7,438,457      70.30
Mobile    1,410,171              2,878,834      27.21
Tablet    550,301                263,205        2.49

Table 10. Positions vs clicks
Device    N           Variable   Min   MDN    M       SD      Max
Desktop   2,372,958   Clicks     0     0      3.13    34.59   3,258
                      Position   1     60.2   89.4    74      982
Mobile    1,410,171   Clicks     0     0      2.04    28.51   22,740
                      Position   1     18.7   45.3    58.2    771
Tablet    550,301     Clicks     0     0      0.48    3.95    211
                      Position   1     11.9   45.26   67.2    685
Note(s): N = device use frequency, M = mean, SD = standard deviation and MDN = median

Figure 5. Position of items in Google search results vs clicks

Analysis of access to IR content by device, country and Global North/South origin was performed by merging RAMP country-device data with a manually tabulated dataset of country names, three-letter ISO 3166 codes (International Organization for Standardization, 2020; Wikipedia Contributors, 2020) and Global North/South designations per country (Meta Contributors, 2020). The RAMP dataset contains search engine result data from "unknown regions," and since there is no way to tell whether users conducting the corresponding searches came from the Global North or South, these data were dropped from the following analyses. The total number of dropped rows and click sums is documented in (Wheeler et al., 2020a).
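A minimal sketch of that merge and of the per-device aggregation follows, assuming the published country-device CSVs and a hypothetical two-column lookup of ISO 3166 codes and Global North/South labels; the column names shown are assumptions, not the dataset's documented schema.

```python
# Sketch: join country-device rows to a Global North/South lookup and summarize by device.
# Column names for both files ("country", "device", "clicks", "position", "country_code",
# "region") are hypothetical.
import pandas as pd

country_device = pd.read_csv("ramp_country_device_example.csv")
north_south = pd.read_csv("country_region_lookup.csv")  # ISO 3166 alpha-3 code -> North/South

merged = country_device.merge(
    north_south, left_on="country", right_on="country_code", how="left"
)
merged = merged.dropna(subset=["region"])  # drop rows from "unknown regions"

# Share of clicks by device within each region (cf. Table 13).
clicks_by_region_device = (
    merged.groupby(["region", "device"])["clicks"].sum()
    .groupby(level="region")
    .transform(lambda s: s / s.sum())
    .round(3)
)
print(clicks_by_region_device)

# Mean clicks and median SERP position per device (cf. Table 10).
print(
    merged.groupby("device").agg(
        mean_clicks=("clicks", "mean"), median_position=("position", "median")
    )
)
```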
As indicated by Figure 6, users in the Global North generally accessed IR more frequently than those in the Global South. Within the five-month period, 57% of clicks (5,993,841) in the data came from users in the Global North and 43% of clicks (4,562,586) came from users in the Global South (Table 11). In total, three of the top five countries with the most clicks are in the Global North: the USA (2,763,548), the United Kingdom (590,516) and Canada (421,358). The Global South has two countries in the top five: India (950,106) and the Philippines (820,415) (Table 12).

Figure 6. World map showing device use in the Global North and Global South

Table 11. Clicks from the Global North and the Global South
Location       Clicks      Percent
Global North   5,993,841   57%
Global South   4,562,586   43%

Table 12. Top five countries that generated IR traffic to RAMP-registered repositories
#   Country              Location       Clicks
1   The USA              Global North   2,763,548
2   India                Global South   950,106
3   The Philippines      Global South   820,415
4   The United Kingdom   Global North   590,516
5   Canada               Global North   421,358

Table 9 showed that desktop operating systems were consistently the dominant devices used to download and read documents (70.3%), followed by mobile phones (27.2%). Tablets accounted for only a small portion of use (2.5%). Table 13 shows that users in the Global North accounted for 79% of the clicks from desktops, 18% from mobile phones and 3% from tablets, while users in the Global South accounted for 59% of the clicks from desktops, 39.4% of the clicks from mobile phones and 1.6% from tablets.

Table 13. Device use between users in the Global North and the Global South
Location       Desktop             Mobile              Tablet
Global North   4,731,597 (78.9%)   1,070,754 (17.9%)   191,490 (3.2%)
Global South   2,692,760 (59%)     1,798,312 (39.4%)   71,514 (1.6%)

Discussion
Limitations
A limitation of RAMP data is the reliance of the service on GSC as the sole data source. While the current literature demonstrates that the vast majority of IR traffic comes from Google properties (Macgregor, 2019), content is sometimes shared through social media applications like Twitter and Facebook, as well as academic professional networks like ResearchGate and Academia.edu and networked research services like Mendeley. In addition, links to IR content may be embedded in public relations pages, blogs and news stories, course management software, etc. RAMP can only capture clicks on IR content links embedded in these services and pages if they are indexed and exposed via Google SERP. RAMP data may therefore be considered somewhat conservative, but the authors contend that a full accounting of all clicks on IR content is far less important than a substantial dataset with assurance that nearly all clicks are human generated. As discussed in the literature review section, Google's highly successful pay-per-click advertising model (Alphabet, Inc., 2015) depends on its ability to filter out robot traffic, so there is high confidence that the data provided through RAMP represent human traffic.

A limitation of the page-click analysis is the difference between the collection dates of IR item counts and the separate count of ETD for each IR. ETD counts were collected during the first ten days of June 2019, while the RAMP dataset reaches only to the end of May 2019, so it is likely that some ETD were added to one or more RAMP IR in the interim.
The difference is very small and does not significantly affect the results of the analyses.

A second limitation of the page-click and use ratio analysis is that the scale of the dataset required the development of an automated method for deriving the ratio's numerator. For each repository, this number is the count of unique items that contain content files with positive click values in RAMP. "Items" are here understood as the HTML pages that include the metadata and content for individual IR objects and are therefore the parent pages of the citable content URLs tracked by RAMP. HTML URLs of the parent pages have to be inferred or reconstructed using information contained within the content file URLs. The process for doing so is platform specific, and in the case of EPrints and Fedora it is also limited to identifying the parent pages of PDF files rather than all content types. Variation among platforms may result in a case where the use ratio is not an equivalent method of comparison between IR that use different platforms. Even so, the authors are confident that the method is accurate, based on data integrity checks built into the process as described in the available documentation (Wheeler et al., 2020a).

Size of IR
There is a large variance in the number of items in the 35 IR that comprised our dataset. The smallest repository contains just over 2,000 items and the largest nearly 125,000; the median was almost 18,000 items. The size of the IR did not always align with the size of the institution. For example, Caltech Authors is the fourth largest IR in the dataset with 81,000 items, but Caltech itself has only 300 professorial faculty, 600 research scholars and approximately 2,250 undergraduate and graduate students (California Institute of Technology, 2019). Caltech is therefore a very high performing institution when measured by research publications, and one where most of the publications appear to be deposited in the IR. Conversely, some larger institutions fall near the lower end of the range in terms of the number of items in their repositories. The fact that some small institutions have large repositories while some large institutions have small repositories may be attributed to several factors, including the research and publishing culture on campus, the success of the library in attracting participation in the IR and even the platform itself. For example, the Digital Commons platform from BePress can also be used as a journal publishing platform, which can result in large numbers of articles published in locally managed journals.

Use ratio
The use and performance of the IR also vary significantly and can be categorized into three groups. We calculated use ratio as the number of unique items containing content file URLs with positive click values in RAMP data divided by the total number of items in the repository. Using this calculation, we have shown that some IR have high numbers of items but comparatively low use ratio (<0.3); others have low numbers of items and low use ratio (<0.4); and yet other IR have low numbers of items but a use ratio that is relatively high (>0.7) compared to other institutions in the set. This shows that larger IR do not necessarily experience higher rates of use than smaller IR and that other factors may be at play.
Factors affecting the ratio of use could include whether the repository has been successfully harvested and indexed by Google and Google Scholar, the position of the items in the SERP, the attractiveness of the item title and rich snippet that appear in the SERP, the intellectual content of the items and even current trends in research.

In general, it must be noted that nearly all of the IR in this dataset suffer from low use as indicated by downloads, although, admittedly, our use ratio is a rather blunt instrument of measure. For example, a gross simplification would be to say that 31% of the 14,381 items in the Montana State University ScholarWorks repository were accessed at least once. While this is true as a general assessment of the repository's usage, it is likely that a more granular analysis would demonstrate that the majority of access and use of IR content is driven by a small number of very popular items that are downloaded many times. Conversely, there are probably many items that are seldom accessed or never downloaded at all. Quantifying detailed usage is possible by examining individual URIs in the RAMP dataset. The RAMP team has completed a preliminary analysis of the data along these lines, but the discussion of these initial findings is beyond the scope of this paper [4]. For now, satisfaction must lie in the realization that the same use ratio calculation was applied to every repository.

The different platforms also showed some difference in use ratio. Digital Commons repositories had the lowest overall use ratio of the four platforms in this dataset. Whether this is due to the capacity of Digital Commons to facilitate the archiving of materials that are not research publications or ETD is a question that cannot be answered with our current dataset.

The effect of ETD on use ratio
The Epsilon Archive for Student Projects, which primarily houses ETD, showed the highest use ratio in our dataset. Indeed, when one draws back the lens to determine whether there is any common characteristic that positively affects use ratios, the concentration of ETD in repositories emerges as a considerable factor. In addition to the Epsilon Archive for Student Projects, Massey Research Online, the Western Cape ETD Repository, UWSpace and CaltechTHESIS are among the repositories whose content consists almost entirely of ETD and whose use ratios are the highest in our dataset, ranging from 0.37 to 0.90. For comparison, Caltech Authors, despite holding 81,000 items, showed one of the lowest use ratios at only 0.16.

Why do repositories with high concentrations of ETD seem to experience more use than repositories that consist mainly of faculty publications or other kinds of items? Is it because theses and dissertations are less likely to be published anywhere else, except perhaps in the fee-based ProQuest® Dissertations and Theses Global database? Is it that theses and dissertations represent new and original research that might be of interest to researchers, corporations or even governments? Here again lies another avenue for future research that can be facilitated by RAMP's open dataset.

Discoverability
Multiple factors can affect the discoverability and use of IR items. RAMP data are confirming years of speculation that many IR suffer from low use, and one significant causal factor is likely to be low indexing ratios in search engines.
Search engine optimization (SEO) tends to be inconsistently practiced or even nonexistent in libraries, and this can make it difficult for search engines to uniformly harvest and index IR. If items in the repository do not appear in the various Google search indexes, then there is no possibility of them being surfaced in the Google SERP and no possibility of RAMP data showing downloads. As mentioned in the discussion on use ratio, another potentially significant factor in discoverability is the position where the item appears in the SERP. Items that appear within the first few pages of the SERP logically have a much higher chance of being downloaded from a repository than items that appear further down the list. Position may be affected by SEO practices, including metadata that can help the search engine determine how relevant the item is to the user's query.

Device use
RAMP data highlight how devices may affect the behavior of users who search and access IR content. Despite the ubiquity of mobile devices, our dataset shows that most users still employ desktop operating systems (including laptop computers) and that they tend to delve further into the SERP and download more items from IR than mobile or tablet users. It is possible that the user experience with mobile devices may be degraded enough in both search engine and IR interfaces to result in these lower numbers. Or perhaps researching and writing are simply more integrated and convenient on a desktop. For example, a researcher working in a desktop environment may search for articles in a browser while simultaneously using a reference manager and a word processing application in other windows. These applications may even work in concert through plug-ins, which provide a level of convenience and sophistication that is simply not available on mobile or tablet devices.

Theories aside, the fact remains that many people have only mobile devices to satisfy their research needs. In fact, nearly five billion people worldwide were estimated to own mobile devices in 2019, and "over half of these connections are smartphones" (Taylor and Silver, 2019). By one measure, worldwide distribution of mobile devices is approximately 52%, while desktops are 45% and tablets are 3%, with the tilt toward mobile devices being more pronounced in the Global South (StatCounter GlobalStats, 2019). The data in this study show that mobile device users in the Global South access IR content at a much higher rate (39.4%) than mobile device users in the Global North (17.9%). When paired with the search behavior data of desktop users, it is reasonable to infer that users in the Global South are at a disadvantage because mobile devices are more limiting for in-depth research than desktops.

Conclusion
We wish to emphasize that the conclusions drawn from this study should be considered with caution. This work still represents only 35 repositories from a worldwide landscape that may exceed 4,500 (University of Southampton, 2019). More participation in the RAMP will help to grow the dataset available for analysis.

The RAMP dataset holds much potential for further research. Some examples include the following:
(1) Analyze the scholarly record in the IR to better understand what is available and what users seek.
(2) Identify and correct SEO or other problems for repositories that experience low use.
(3) Support or dispute the theory that developing countries benefit from open access to research publications in the IR.
(4) Analyze metadata from articles with high SERP positions to reveal characteristics that foster discoverability and use.
(5) Test the theory that IR are actually supplying preprint content for citations of articles behind paywalls.

The purpose of this research was to demonstrate the kind of analysis that is possible with the openly published RAMP dataset and to encourage its use for further research. Future research holds the promise of a more nuanced picture of the information-seeking behaviors of users.

Notes
1. RAMP website – https://rampanalytics.org
2. The higher average number of items may be due to the fact that many customers take advantage of Digital Commons' journal publishing features.
3. Use ratio as defined here is a general indicator of how much IR content is accessed via SERP. Additional descriptive statistics available from (Wheeler et al., 2020a, b) demonstrate that in most cases overall use is driven by a smaller percentage of highly accessed items.
4. The RAMP_summary_stats__20200907.csv file available from this study's GitHub repository (Wheeler et al., 2020a) contains summary statistics demonstrating that a majority of IR use is driven by a few highly accessed items.

In the interest of transparency, data sharing and reproducibility, the author(s) of this article have made the data underlying their research openly available. It can be accessed by following the link here: https://rampanalytics.org

References
Almind, T.C. and Ingwersen, P. (1997), "Informetric analyses on the world wide web: methodological approaches to webometrics", Journal of Documentation, Vol. 53 No. 4, pp. 404-426.
Alphabet, Inc. (2015), Consolidated Revenues, Form 10-K, United States Securities and Exchange Commission, Washington, District of Columbia, available at: https://www.sec.gov/Archives/edgar/data/1288776/000165204416000012/goog10-k2015.htm#s2A481E6E5C511C2C8AAECA5160BB1908 (accessed 28 October 2016).
Arlitsch, K. and Grant, C. (2018), "Why so many repositories? Examining the limitations and possibilities of the institutional repositories landscape", Journal of Library Administration, Vol. 58 No. 3, pp. 264-281.
Arlitsch, K., Askey, D. and Wheeler, J. (2019), "Analyzing aggregate IR use data from RAMP", PowerPoint presented at Open Repositories 2019, Hamburg, Germany, 11 June, doi: 10.5281/zenodo.3243348 (accessed 9 February 2020).
Björneborn, L. and Ingwersen, P. (2004), "Toward a basic framework for webometrics", Journal of the American Society for Information Science and Technology, Vol. 55 No. 14, pp. 1216-1227.
Bruns, T. and Inefuku, H.W. (2015), "Purposeful metrics: matching institutional repository metrics to purpose and audience", in Callicott, B.B., Scherer, D. and Wesolek, A. (Eds), Making Institutional Repositories Work, Purdue University Press, West Lafayette, pp. 213-234.
California Institute of Technology (2019), Caltech at a Glance, Caltech, 8 August, available at: https://www.caltech.edu/about/at-a-glance (accessed 6 October 2019).
Fralinger, L. and Bull, J. (2013), "Measuring the international usage of US institutional repositories", OCLC Systems and Services: International Digital Library Perspectives, Vol. 29 No. 3, pp. 134-150.
Frost, A. (2019), "The ultimate guide to Google search console in 2019", HubSpot, 5 September, available at: https://blog.hubspot.com/marketing/google-search-console (accessed 16 November 2019).
Google, Inc. (2020), "Search console APIs", Google Developers, available at: https://developers.google.com/webmaster-tools/search-console-api-original/ (accessed 7 January 2020).
Greene, J. (2016), "Web robot detection in scholarly open access institutional repositories", Library Hi Tech, Vol. 34 No. 3, pp. 500-520.
Harnad, S. and McGovern, N. (2009), "Topic 4: institutional repository success is dependent upon mandates", Bulletin of the American Society for Information Science and Technology, Vol. 35 No. 4, pp. 27-31.
International Organization for Standardization (2020), ISO 3166 Country Codes, ISO, available at: https://www.iso.org/iso-3166-country-codes.html (accessed 12 April 2020).
IRUS-UK Team (2013), IRUS-UK Position Statement on the Treatment of Robots and Unusual Usage, November, available at: https://irus.jisc.ac.uk/documents/IRUS-UK_position_statement_robots_and_unusual_usage_v1_0_Nov_2013.pdf (accessed 25 August 2019).
Jisc (2019), Welcome to IRUS-UK, IRUS-UK, available at: https://irus.jisc.ac.uk (accessed 11 November 2019).
Kim, K. (2018), "DLF-Jisc Pilot Project webinar", 23 March, available at: https://www.diglib.org/recording-available-for-irus-usa-webinar/ (accessed 25 August 2019).
Lagzian, F., Abrizah, A. and Wee, M.-C. (2015), "Measuring the gap between perceived importance and actual performance of institutional repositories", Library and Information Science Research, Vol. 37 No. 2, pp. 147-155.
Macgregor, G. (2019), "Improving the discoverability and web impact of open repositories: techniques and evaluation", Code4Lib Journal, No. 43, available at: https://journal.code4lib.org/articles/14180 (accessed 25 August 2019).
MacIntyre, R. and Jones, H. (2016), "IRUS-UK: improving understanding of the value and impact of institutional repositories", The Serials Librarian, Vol. 70 Nos 1-4, pp. 100-105.
McDonald, R.H. and Thomas, C. (2008), "The case for standardized reporting and assessment requirements for institutional repositories", Journal of Electronic Resources Librarianship, Vol. 20 No. 2, pp. 101-109.
Meta Contributors (2020), List of Countries by Regional Classification, Wikimedia Meta-Wiki, 1 April, available at: https://meta.wikimedia.org/w/index.php?title=List_of_countries_by_regional_classification&oldid=19943813 (accessed 12 April 2020).
Needham, P. and Stone, G. (2012), "IRUS-UK: making scholarly statistics count in UK repositories", Insights: The UKSG Journal, Vol. 25 No. 3, pp. 262-266.
Obrien, P., Arlitsch, K., Sterman, L., Mixter, J., Wheeler, J. and Borda, S. (2016), "Undercounting file downloads from institutional repositories", Journal of Library Administration, Vol. 56 No. 7, pp. 854-874.
OBrien, P., Arlitsch, K., Mixter, J., Wheeler, J. and Sterman, L.B. (2017), "RAMP – the Repository Analytics and Metrics Portal: a prototype web service that accurately counts item downloads from institutional repositories", Library Hi Tech, Vol. 35 No. 1, pp. 144-158.
Organ, M. (2006), "Download statistics – what do they tell us?: the example of Research Online, the open access institutional repository at the University of Wollongong, Australia", D-Lib Magazine, Vol. 12 No. 11, doi: 10.1045/november2006-organ.
Parulian, N. (2020), RAMP Data Viz, Tableau Public, 9 February, available at: https://tinyurl.com/rsxjdv8 (accessed 9 February 2020).
Perrin, J.M., Yang, L., Barba, S. and Winkler, H. (2017), "All that glitters isn't gold: the complexities of use statistics as an assessment tool for digital libraries", The Electronic Library, Vol. 35 No. 1, pp. 185-197.
Rodrigues, E., Bollini, A., Cabezas, A., Castelli, D., Carr, L., Chan, L., Humphrey, C., Johnson, R., Knoth, P., Manghi, P., Matizirofa, L., Perakakis, P., Schirrwagen, J., Selematsela, D., Shearer, K., Walk, P., Wilcox, D. and Yamaji, K. (2017), "Next generation repositories: behaviours and technical recommendations of the COAR Next Generation Repositories Working Group", Zenodo, doi: 10.5281/ZENODO.1215014.
StatCounter GlobalStats (2019), Desktop vs Mobile vs Tablet Market Share Worldwide, Oct 2018–Oct 2019, StatCounter GlobalStats, October, available at: https://gs.statcounter.com/platform-market-share/desktop-mobile-tablet/worldwide (accessed 29 November 2019).
Taylor, K. and Silver, L. (2019), Smartphone Ownership Is Growing Rapidly Around the World, but Not Always Equally, Pew Research Center, Washington, District of Columbia, available at: https://www.pewresearch.org/global/2019/02/05/smartphone-ownership-is-growing-rapidly-around-the-world-but-not-always-equally/ (accessed 29 November 2019).
Thompson, S., Lambert, J., Macintyre, R., Chaplin, D., Jones, H., Wong, L., Perrin, J., Rubinow, S., Kim, K., Nowviskie, B., Needham, P., Williford, C. and Graham, W. (2019), "Bringing IRUS to the USA: international collaborations to standardize and assess repository usage statistics", Proceedings of the 2018 Library Assessment Conference: Building Effective, Sustainable, Practical Assessment, 5–7 December 2018, Houston, TX, Association of Research Libraries, pp. 564-577.
University of Southampton (2019), Registry of Open Access Repositories, available at: http://roar.eprints.org (accessed 6 October 2019).
Wheeler, J., Arlitsch, K., Parulian, N. and Pham, M. (2020a), "RAMP analyses scripts, R", available at: https://github.com/imls-measuring-up/ramp-analyses-scripts (accessed 9 May 2020).
Wheeler, J., Arlitsch, K., Pham, M. and Parulian, N. (2020b), RAMP Data Subset, January 1 through May 31, 2019, University of New Mexico, 14 January, doi: 10.5061/dryad.fbg79cnr0.
Wikipedia Contributors (2020), ISO 3166-1 Alpha-3, Wikipedia: The Free Encyclopedia, 14 March, available at: https://en.wikipedia.org/w/index.php?title=ISO_3166-1_alpha-3&oldid=945527106 (accessed 12 April 2020).

About the authors
Kenning Arlitsch has been dean of the library at Montana State University since 2012. He has held positions as an instruction librarian, in digital library development, IT services and administration. His funded research has focused on SEO as well as measuring the impact and use of digital repositories. Kenning holds an MLIS from the University of Wisconsin-Milwaukee and a Ph.D. in library and information science from Humboldt University in Berlin, Germany. His dissertation on Semantic Web Identity examined how well research libraries and other academic organizations are understood by search engines. Kenning Arlitsch is the corresponding author and can be contacted at: kenning.arlitsch@montana.edu
Jonathan Wheeler is a Data Curation Librarian within the University of New Mexico's College of University Libraries and Learning Sciences. Jon's role in the Libraries' Data Services initiatives includes the development of research data ingest, packaging and archiving workflows.
His research interests include workflow development in support of quality control and streamlined data storage, dissemination, archiving and preservation. Jon holds an M.S. in library science from the University of Illinois at Urbana-Champaign.
Minh Thi Ngoc Pham is currently a Ph.D. candidate at the School of Information Science and Learning Technologies at the University of Missouri, Columbia. She holds a master's degree in Globalization and Education Change from Lehigh University, Pennsylvania. Minh's research interests include game-based learning with Virtual Reality (VR) and Augmented Reality (AR) tools and geographic information systems (GIS) for research and decision-making. She was a fellow in Drexel University's LIS Education and Data Science (LEADS) program in 2019.
Nikolaus Nova Parulian is a Ph.D. student in the School of Information Sciences at the University of Illinois at Urbana-Champaign. He also holds a Master of Science in Information Management (MSIM) from the University of Illinois. His research interests include topics related to machine learning, text mining, data quality management and network analysis. Nikolaus was a fellow in Drexel University's LIS Education and Data Science (LEADS) program in 2019.

For instructions on how to order reprints of this article, please visit our website: www.emeraldgrouppublishing.com/licensing/reprints.htm
Or contact us for further details: permissions@emeraldinsight.com