Scholarship & Research

Permanent URI for this community: https://scholarworks.montana.edu/handle/1/1

Search Results

Now showing 1 - 6 of 6
  • Item
    RAMP - The Repository Analytics and Metrics Portal: A prototype Web service that accurately counts item downloads from institutional repositories
    (2016-11) OBrien, Patrick; Arlitsch, Kenning; Mixter, Jeff; Wheeler, Jonathan; Sterman, Leila B.
    Purpose – The purpose of this paper is to present data that begin to detail the deficiencies of log file analytics reporting methods that are commonly built into institutional repository (IR) platforms. The authors propose a new method for collecting and reporting IR item download metrics. This paper introduces a web service prototype that captures activity that current analytics methods are likely to either miss or over-report.
    Design/methodology/approach – Data were extracted from DSpace Solr logs of an IR and cross-referenced with Google Analytics and Google Search Console data to directly compare the Citable Content Downloads recorded by each method.
    Findings – This study provides evidence that log file analytics data appear to grossly over-report due to traffic from robots that are difficult to identify and screen. The study also introduces a proof-of-concept prototype that makes the research method easily accessible to IR managers who seek accurate counts of Citable Content Downloads.
    Research limitations/implications – The method described in this paper does not account for direct access to Citable Content Downloads that originates outside Google Search properties.
    Originality/value – This paper proposes that IR managers adopt a new reporting framework that classifies IR page views and download activity into three categories that communicate metrics about user activity related to the research process. It also proposes that IR managers rely on a hybrid of existing Google services to improve reporting of Citable Content Downloads, and it offers a prototype web service where IR managers can test results for their repositories.
  • Item
    Undercounting File Downloads from Institutional Repositories
    (Emerald, 2016-10) OBrien, Patrick; Arlitsch, Kenning; Sterman, Leila B.; Mixter, Jeff; Wheeler, Jonathan; Borda, Susan
    A primary impact metric for institutional repositories (IR) is the number of file downloads, which are commonly measured through third-party web analytics software. Google Analytics, a free service used by most academic libraries, relies on HTML page tagging to log visitor activity on Google’s servers. However, web aggregators such as Google Scholar link directly to high-value content (usually PDF files), bypassing the HTML page and failing to register these direct access events. This paper presents evidence from a study of four institutions demonstrating that the majority of IR activity is not counted by page-tagging web analytics software, and it proposes a practical solution for significantly improving the relevancy and accuracy of IR performance metrics reported through Google Analytics.
  • Item
    Data set supporting study on Undercounting File Downloads from Institutional Repositories [dataset]
    (Montana State University ScholarWorks, 2016-07) OBrien, Patrick; Arlitsch, Kenning; Sterman, Leila B.; Mixter, Jeff; Wheeler, Jonathan; Borda, Susan
    This dataset supports the study published as “Undercounting File Downloads from Institutional Repositories”. The following items are included:
    1. gaEvent.zip = PDF exports of Google Analytics Events reports for each IR.
    2. gaItemSummaryPageViews.zip = PDF exports of Google Analytics Item Summary Page Views reports. Also included is a text file containing the regular expressions used to generate each report’s Advanced Filter.
    3. gaSourceSessions.zip = PDF exports of Google Analytics Referral reports, used to determine the percentage of referral traffic from Google Scholar. Note: this item does not include Utah, due to issues with the structure of Utah’s IR and the configuration of its Google Analytics.
    4. irDataUnderCount.tsv.zip = TSV file of the complete Google Search Console data set, containing 57,087 unique URLs in 413,786 records.
    5. irDataUnderCountCiteContentDownloards.tsv.zip = TSV file of the Google Search Console records for Citable Content Downloads that were not counted in Google Analytics.
  • Item
    Measuring Up: Assessing Accuracy of Reported Use and Impact of Digital Repositories
    (2014-02) Arlitsch, Kenning; OBrien, Patrick; Kyrillidou, Martha; Clark, Jason A.; Young, Scott W. H.; Mixter, Jeff; Chao, Zoe; Freels-Stendel, Brian; Stewart, Cameron
    We propose a research and outreach partnership that will address two issues related to more accurate assessment of digital collections and institutional repositories (IR):
    1. Improve the accuracy and privacy of web analytics reporting on digital library use.
    2. Recommend an assessment framework and web metrics that will help evaluate digital library performance, eventually enabling impact studies of IR on author citation rates and university rankings.
    Libraries routinely collect and report website and digital collection use statistics as part of their assessment and evaluation efforts. The numbers they collect are reported to the libraries’ own institutions, professional organizations, and/or funding agencies. Initial research by the proposed research team suggests the statistics in these reports can be grossly inaccurate, leading to a variance in numbers across the profession that makes it difficult to draw conclusions, build business cases, or engender trust. The inaccuracy runs in both directions: underreporting is as much a problem as overreporting. The team is also concerned with the privacy issues inherent in the use of web analytics software and will recommend best practices to ensure that user privacy is protected as much as possible while libraries gather data about use of digital repositories.
    Institutional repositories have been in development for well over a decade, and many have accumulated significant mass. The business case for IR is built in part on the number of downloads of publications sustained by any individual IR. Yet preliminary evidence demonstrates that PDF and other non-HTML file downloads in IR are often not counted, because search engines like Google Scholar bypass the web analytics code that is supposed to record the download transaction. It has been theorized that Open Access IR can help increase author citation rates, which in turn may affect university rankings. However, no comprehensive studies currently exist to prove or disprove this theory. This may be because such a study could take years to produce results, given the publication citation lifecycle, and because few libraries have an assessment model in place that will help them gather data over the long term. We plan to recommend an assessment framework that will help libraries collect data and understand root causes of unexplained errors in their web metrics. The recommendations will provide a foundation for reporting metrics relevant to outcomes-based analysis and performance evaluation of digital collections and IR.
  • Item
    Final Performance Report Narrative: Getting Found
    (2014-11) Arlitsch, Kenning; OBrien, Patrick; Godby, Jean; Mixter, Jeff; Clark, Jason A.; Young, Scott W. H.; Smith, Devon; Rossmann, Doralyn; Sterman, Leila B.; Tate, Angela; Hansen, Mary Anne
    The research we proposed to IMLS in 2011 was prompted by a realization that the digital library at the University of Utah was suffering from low visitation and use. We knew that we had a problem with low visibility on the Web because search engines such as Google were not harvesting and indexing our digitized objects, but we had only a limited understanding of the reasons. We had also done enough quantitative surveys of other digital libraries to know that many libraries were suffering from this problem. IMLS funding helped us understand why library digital repositories weren’t being harvested and indexed. Thanks to IMLS funding of considerable research, and to the application of better practices, we were able to dramatically improve the indexing ratios of Utah’s digital objects in Google, and consequently the number of visitors to the digital collections increased. In presentations and publications we shared the practices that led to our accomplishments at Utah. The first year of the grant focused on what the research team has come to call “traditional search engine optimization,” and most of this work was carried out at the University of Utah. The final two years of the grant were conducted at Montana State University after the PI was appointed dean of the library there. These latter two years moved more toward “Semantic Web optimization,” which includes areas of research in semantic identity, data modeling, analytics, and social media optimization.
  • Item
    Describing theses and dissertations using Schema.org
    (Dublin Core Metadata Initiative, 2014-10) Mixter, Jeff; OBrien, Patrick; Arlitsch, Kenning
    This report discusses the development of an extension vocabulary for describing theses and dissertations, using Schema.org as a foundation. Instance data from the Montana State University ScholarWorks institutional repository were used to help drive and test the creation of the extension vocabulary. Once the vocabulary was developed, we used it to convert the entire ScholarWorks data sample into RDF. We then serialized a set of three RDF descriptions as RDFa and posted them online to gather statistics from Google Webmaster Tools. The study successfully demonstrated how a data model consisting primarily of Schema.org terms, supplemented with a list of granular, domain-specific terms, can be used to describe theses and dissertations in detail.
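The RAMP paper, the undercounting study, and its dataset share one idea: Google Search Console records clicks that lead directly to files such as PDFs, which page-tagging analytics never sees. A minimal Python sketch of that tally follows. It is an illustration under stated assumptions, not the authors' code or the RAMP service: it assumes a tab-separated export with hypothetical `url` and `clicks` columns, and it treats URLs containing `/bitstream/` (the path DSpace typically uses for files) or ending in `.pdf` as file downloads. Like the method described in the RAMP abstract, it sees only traffic that originates in Google Search.

```python
import csv
import io
import re

# Hypothetical rule for "file" URLs: DSpace bitstream paths or a .pdf suffix
# (optionally followed by a query string). Adjust for a real repository.
FILE_URL = re.compile(r"/bitstream/|\.pdf(\?|$)", re.IGNORECASE)


def citable_content_downloads(tsv_text: str) -> int:
    """Sum Search Console clicks on file URLs that bypass HTML page tagging.

    Assumes a tab-separated export with hypothetical 'url' and 'clicks' columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    total = 0
    for row in reader:
        # Only direct file hits count; HTML item pages are already tagged.
        if FILE_URL.search(row["url"]):
            total += int(row["clicks"])
    return total


if __name__ == "__main__":
    sample = (
        "url\tclicks\n"
        "https://repo.example.edu/handle/1/42\t7\n"
        "https://repo.example.edu/bitstream/handle/1/42/paper.pdf\t15\n"
        "https://repo.example.edu/files/report.PDF\t3\n"
    )
    print(citable_content_downloads(sample))  # 18
```

The HTML item page in the sample is excluded because Google Analytics can already count it; only the two direct file hits are added to the total.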
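The Schema.org report can be illustrated with a short Python sketch that emits a description of a single thesis. This sketch rests on several assumptions and does not reproduce the report's extension vocabulary. It uses only current core Schema.org terms (`Thesis`, `inSupportOf`, `sourceOrganization`), serializes as JSON-LD in place of the RDFa the report used, and every value in the sample record is hypothetical.

```python
import json


def thesis_jsonld(title, author, year, degree, institution, url):
    """Build a JSON-LD description of a thesis from core Schema.org terms.

    The report's granular extension terms are not modeled here.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Thesis",
        "@id": url,
        "name": title,
        "author": {"@type": "Person", "name": author},
        "datePublished": str(year),
        # Degree the thesis was submitted in support of.
        "inSupportOf": degree,
        "sourceOrganization": {"@type": "CollegeOrUniversity", "name": institution},
    }


if __name__ == "__main__":
    # Hypothetical record; not taken from ScholarWorks.
    doc = thesis_jsonld(
        "An Example Thesis Title",
        "Doe, Jane",
        2013,
        "Master of Science",
        "Example University",
        "https://repository.example.edu/handle/1/0000",
    )
    print(json.dumps(doc, indent=2))
```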
Copyright (c) 2002-2022, LYRASIS. All rights reserved.