Scholarship & Research
Permanent URI for this community: https://scholarworks.montana.edu/handle/1/1
Search Results
14 results
Item: No Such Thing as a Free Lunch: Google Analytics and User Privacy (2016)
Young, Scott W. H.; O'Brien, Patrick; Benedict, Karl

Patron privacy is sometimes the price we pay for free services. This trade-off is built into Google Analytics, the free web-tracking tool. This session shares research into the analytics implementations of DLF institutions, discusses the relevant privacy issues, and offers recommendations for enhancing users' web-traffic privacy through configuration and education.

Item: Data-Driven Improvement to Institutional Repository Discoverability and Use (Institute of Museum and Library Services, 2018-09)
Arlitsch, Kenning; Kahanda, Indika; O'Brien, Patrick; Shanks, Justin D.; Wheeler, Jonathan

The Montana State University (MSU) Library, in partnership with the MSU School of Computing, the University of New Mexico Library, and DuraSpace, seeks a $49,998 Planning Grant from the Institute of Museum and Library Services, through its National Leadership Grant program under the National Digital Platform project category, to develop a sustainability plan for the Repository Analytics & Metrics Portal that will keep its dataset open and available to all researchers. The proposal also includes developing a preliminary institutional repository (IR) reporting model; a search engine optimization (SEO) audit and remediation plan for IR; and exploring whether machine learning can improve the quality of IR content metadata. The project team expects the work conducted in this planning grant to make the case for advanced research projects that will be high-impact and worthy of funding.

Item: Protecting privacy on the web: A study of HTTPS and Google Analytics (Emerald, 2018-09)
O'Brien, Patrick; Young, Scott W. H.; Arlitsch, Kenning; Benedict, Karl

The purpose of this paper is to examine the extent to which HTTPS encryption and Google Analytics services have been implemented on academic library websites, and to discuss the privacy implications of free services that introduce web tracking of users. The home pages of 279 academic libraries were analyzed for the presence of HTTPS, Google Analytics services, and privacy-protection features. Results indicate that HTTPS implementation on library websites is not widespread, and many libraries continue to offer non-secured connections without an automatically enforced redirect to a secure connection. Furthermore, a large majority of library websites included in the study have implemented Google Analytics and/or Google Tag Manager, yet very few connect to Google securely via HTTPS or have implemented Google Analytics IP anonymization. Librarians are encouraged to raise awareness of this issue and to take concerted, coherent action across five interrelated areas: implementing secure web protocols (HTTPS), user education, privacy policies, informed consent, and risk/benefit analyses. A sketch of the kind of home-page audit the study describes appears below.
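As a rough illustration of the audit method these two privacy studies describe, the following Python sketch checks whether a library home page enforces an HTTP-to-HTTPS redirect and whether its markup references Google Analytics, Google Tag Manager, or an IP-anonymization setting. This is a minimal sketch, not the authors' instrument: the example hostname is hypothetical, and the string matching is a simple heuristic.

```python
# Minimal sketch of a home-page privacy audit in the spirit of the study.
# Assumptions: the `requests` library is installed; the hostname below is a
# hypothetical example, not one of the 279 libraries in the paper.
import requests

def audit_home_page(host: str) -> dict:
    """Check HTTPS enforcement and Google Analytics usage for one site."""
    # Does plain HTTP get redirected to HTTPS automatically?
    resp = requests.get(f"http://{host}/", timeout=10, allow_redirects=True)
    https_enforced = resp.url.startswith("https://")

    html = resp.text
    return {
        "host": host,
        "https_enforced": https_enforced,
        # Common GA / Tag Manager markers; a heuristic, not exhaustive.
        "google_analytics": "google-analytics.com" in html or "gtag(" in html,
        "tag_manager": "googletagmanager.com" in html,
        # anonymizeIp / anonymize_ip suggests IP anonymization was configured.
        "ip_anonymization": "anonymizeIp" in html or "anonymize_ip" in html,
    }

if __name__ == "__main__":
    print(audit_home_page("library.example.edu"))  # hypothetical host
```

Run against a list of home-page hosts, a loop over this function yields exactly the kind of presence/absence table the study reports.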
Item: RAMP - The Repository Analytics and Metrics Portal: A prototype Web service that accurately counts item downloads from institutional repositories (2016-11)
O'Brien, Patrick; Arlitsch, Kenning; Mixter, Jeff; Wheeler, Jonathan; Sterman, Leila B.

Purpose – The purpose of this paper is to present data that begin to detail the deficiencies of the log file analytics reporting methods commonly built into institutional repository (IR) platforms. The authors propose a new method for collecting and reporting IR item download metrics, and introduce a web service prototype that captures activity that current analytics methods are likely to either miss or over-report.
Design/methodology/approach – Data were extracted from the DSpace Solr logs of an IR and were cross-referenced with Google Analytics and Google Search Console data to directly compare the Citable Content Downloads recorded by each method.
Findings – This study provides evidence that log file analytics data appear to grossly over-report due to traffic from robots that are difficult to identify and screen. The study also introduces a proof-of-concept prototype that makes the research method easily accessible to IR managers who seek accurate counts of Citable Content Downloads. A simplified illustration of the robot-screening problem appears after this entry.
Research limitations/implications – The method described in this paper does not account for direct access to Citable Content Downloads that originates outside Google Search properties.
Originality/value – This paper proposes that IR managers adopt a new reporting framework that classifies IR page view and download activity into three categories that communicate metrics about user activity related to the research process. It also proposes that IR managers rely on a hybrid of existing Google services to improve reporting of Citable Content Downloads, and offers a prototype web service where IR managers can test results for their repositories.
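To make the over-reporting finding concrete, here is a small Python sketch that tallies PDF download events from parsed log records twice: once raw, and once after discarding hits whose user-agent strings match a naive robot pattern. It is an illustration only; the record layout is invented for this example (not RAMP's or DSpace's schema), and real robot traffic, which is the paper's point, often carries no such obvious signature.

```python
# Illustration of why raw log counts over-report downloads: robot traffic.
# The record layout below is invented for this sketch, not RAMP's schema.
import re

ROBOT_UA = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

records = [
    {"path": "/bitstream/1/42/thesis.pdf", "ua": "Mozilla/5.0 (Windows NT 10.0)"},
    {"path": "/bitstream/1/42/thesis.pdf", "ua": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    {"path": "/bitstream/1/77/article.pdf", "ua": "FancyCrawler/0.9"},
    {"path": "/bitstream/1/77/article.pdf", "ua": "Mozilla/5.0 (Macintosh)"},
    {"path": "/bitstream/1/77/article.pdf", "ua": "StealthAgent/1.0"},  # robot with no telltale UA
]

raw = sum(1 for r in records if r["path"].endswith(".pdf"))
screened = sum(
    1 for r in records
    if r["path"].endswith(".pdf") and not ROBOT_UA.search(r["ua"])
)

print(f"raw download count:    {raw}")       # 5
print(f"after naive screening: {screened}")  # 3 -- the stealth robot still counts
```

The gap between the two totals, and the robot that slips past the filter, are the two failure modes the paper documents.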
Item: Undercounting File Downloads from Institutional Repositories (Emerald, 2016-10)
O'Brien, Patrick; Arlitsch, Kenning; Sterman, Leila B.; Mixter, Jeff; Wheeler, Jonathan; Borda, Susan

A primary impact metric for institutional repositories (IR) is the number of file downloads, which are commonly measured through third-party web analytics software. Google Analytics, a free service used by most academic libraries, relies on HTML page tagging to log visitor activity on Google's servers. However, web aggregators such as Google Scholar link directly to high-value content (usually PDF files), bypassing the HTML page and failing to register these direct-access events. This paper presents evidence from a study of four institutions demonstrating that the majority of IR activity is not counted by page-tagging web analytics software, and proposes a practical solution for significantly improving the reporting relevancy and accuracy of IR performance metrics using Google Analytics. One way such bypassed downloads can be registered is sketched below.
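The core problem is that when a search engine links straight to a PDF, no tagged HTML page ever loads, so the tracking JavaScript never fires. One workaround available in that era, shown here only as a hedged sketch and not necessarily the authors' implementation, is to have the repository server itself report the download to Universal Analytics via the Measurement Protocol. The tracking ID below is a placeholder.

```python
# Sketch: registering a direct PDF download server-side with the (legacy)
# Universal Analytics Measurement Protocol, so no page tagging is needed.
# Not necessarily the authors' solution; the tracking ID is a placeholder.
import uuid
import requests

GA_ENDPOINT = "https://www.google-analytics.com/collect"
TRACKING_ID = "UA-XXXXXXXX-1"  # placeholder property ID

def report_download(file_path: str) -> None:
    """Send a download event for a file served directly (no HTML page)."""
    payload = {
        "v": "1",                  # Measurement Protocol version
        "tid": TRACKING_ID,        # GA property the hit is credited to
        "cid": str(uuid.uuid4()),  # anonymous client ID
        "t": "event",              # hit type
        "ec": "Downloads",         # event category
        "ea": "Citable Content Download",  # event action
        "el": file_path,           # event label: which file was fetched
    }
    requests.post(GA_ENDPOINT, data=payload, timeout=5)

# Called from the file-serving code path, e.g.:
# report_download("/bitstream/1/42/thesis.pdf")
```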
Item: Data set supporting study on Undercounting File Downloads from Institutional Repositories [dataset] (Montana State University ScholarWorks, 2016-07)
O'Brien, Patrick; Arlitsch, Kenning; Sterman, Leila B.; Mixter, Jeff; Wheeler, Jonathan; Borda, Susan

This dataset supports the study published as "Undercounting File Downloads from Institutional Repositories". The following items are included:
1. gaEvent.zip = PDF exports of Google Analytics Events reports for each IR.
2. gaItemSummaryPageViews.zip = PDF exports of Google Analytics Item Summary Page Views reports. Also included is a text file containing the regular expressions used to generate each report's Advanced Filter.
3. gaSourceSessions.zip = PDF exports of Google Analytics Referral reports used to determine the percentage of referral traffic from Google Scholar. Note: does not include Utah, due to issues with the structure of Utah's IR and the configuration of its Google Analytics.
4. irDataUnderCount.tsv.zip = TSV file of the complete Google Search Console data set, containing 57,087 unique URLs in 413,786 records.
5. irDataUnderCountCiteContentDownloards.tsv.zip = TSV file of the Google Search Console records containing the Citable Content Download records that were not counted in Google Analytics.

Item: Introducing the "Getting Found" Web Analytics Cookbook for Monitoring Search Engine Optimization of Digital Repositories (ISAST, 2015-12)
Arlitsch, Kenning; O'Brien, Patrick

A new toolkit that helps libraries establish baseline measurements and continuous monitoring of the search engine optimization (SEO) performance of their digital repositories is one of the products of research funded by the Institute of Museum and Library Services. The "Getting Found" Cookbook includes everything necessary for implementing a Google Analytics dashboard that continuously monitors SEO performance metrics relevant to digital repositories. While the Cookbook was created for use with Google Analytics, the principles and practices it describes can be applied to any page-tagging analytics software.

Item: Measuring Up: Assessing Accuracy of Reported Use and Impact of Digital Repositories (2014-02)
Arlitsch, Kenning; O'Brien, Patrick; Kyrillidou, Martha; Clark, Jason A.; Young, Scott W. H.; Mixter, Jeff; Chao, Zoe; Freels-Stendel, Brian; Stewart, Cameron

We propose a research and outreach partnership that will address two issues related to more accurate assessment of digital collections and institutional repositories (IR):
1. Improve the accuracy and privacy of web analytics reporting on digital library use.
2. Recommend an assessment framework and web metrics that will help evaluate digital library performance, to eventually enable impact studies of IR on author citation rates and university rankings.
Libraries routinely collect and report website and digital collection use statistics as part of their assessment and evaluation efforts. The numbers they collect are reported to the libraries' own institutions, professional organizations, and/or funding agencies. Initial research by the proposed research team suggests the statistics in these reports can be grossly inaccurate, leading to a variance in numbers across the profession that makes it difficult to draw conclusions, build business cases, or engender trust. The inaccuracy runs in both directions, with under-reporting as much a problem as over-reporting. The team is also concerned with the privacy issues inherent in the use of web analytics software, and will recommend best practices to ensure that user privacy is protected as much as possible while libraries gather data about the use of digital repositories. Institutional repositories have been in development for well over a decade, and many have accumulated significant mass. The business case for IR is built in part on the number of downloads of publications sustained by any individual IR. Yet preliminary evidence demonstrates that PDF and other non-HTML file downloads in IR are often not counted, because search engines like Google Scholar bypass the web analytics code that is supposed to record the download transaction. It has been theorized that open access IR can help increase author citation rates, which in turn may affect university rankings. However, no comprehensive studies currently exist to prove or disprove this theory. This may be because such a study could take years to produce results, owing to the publication citation lifecycle, and because few libraries have an assessment model in place that would help them gather data over the long term. We plan to recommend an assessment framework that will help libraries collect data and understand the root causes of unexplained errors in their web metrics. The recommendations will provide a foundation for reporting metrics relevant to outcomes-based analysis and performance evaluation of digital collections and IR.

Item: Final Performance Report Narrative: Getting Found (2014-11)
Arlitsch, Kenning; O'Brien, Patrick; Godby, Jean; Mixter, Jeff; Clark, Jason A.; Young, Scott W. H.; Smith, Devon; Rossmann, Doralyn; Sterman, Leila B.; Tate, Angela; Hansen, Mary Anne

The research we proposed to IMLS in 2011 was prompted by a realization that the digital library at the University of Utah was suffering from low visitation and use. We knew that we had a problem with low visibility on the web because search engines such as Google were not harvesting and indexing our digitized objects, but we had only a limited understanding of the reasons. We had also done enough quantitative surveys of other digital libraries to know that many libraries were suffering from this problem. IMLS funding helped us understand why library digital repositories weren't being harvested and indexed. Thanks to IMLS funding of considerable research and the application of better practices, we were able to dramatically improve the indexing ratios of Utah's digital objects in Google, and consequently the number of visitors to the digital collections increased. In presentations and publications we shared the practices that led to our accomplishments at Utah. The first year of the grant focused on what the research team has come to call "traditional search engine optimization," and most of this work was carried out at the University of Utah. The final two years of the grant were conducted at Montana State University after the PI was appointed dean of the library there. These latter two years moved more toward "Semantic Web optimization," which includes areas of research in semantic identity, data modeling, analytics, and social media optimization.
Item: Getting Found: Search Engine Optimization for Digital Repositories (2011-02)
Arlitsch, Kenning; O'Brien, Patrick

Libraries and archives have been building digital repositories for over a decade and, viewed in total, have amassed collections of considerable size. The use of the scholarly and lay content in these databases is predicated on visibility in Internet search engines, but initial surveys conducted by the University of Utah across numerous libraries and archives have revealed a disturbing reality: the number of digital objects successfully harvested and indexed by search engines from our digital repositories is abysmally low. The reasons for the poor showing in Internet search engines are complex, and are both technical and administrative. Web servers may be configured incorrectly or may respond too slowly. Repository software may be designed or configured in a way that is difficult for crawlers to navigate. Metadata are often not unique or structured as recognizable taxonomies, and in some cases search engines prefer other schemas. Search engine policies change, and some commonly accepted standards in the library community are not supported by some search engines. Google Scholar, for instance, has recently recommended against Dublin Core as a metadata schema in institutional repositories, in favor of publishing-industry schemas, a recommendation that comes as a shock to most librarians who learn of it.
The problem lies less with the search engines than with the content that search engines have to work with. This proposal will result in improvements to the way that content is presented, so that search engines can parse, organize, and serve more relevant results to researchers and other users. The search engine market is fluid and intensely competitive. While Google retains the majority of direct search engine traffic, Bing is gaining ground quickly, and social media engines are changing the face of search itself, putting more emphasis on content that is popular and frequently refreshed. These changes will further affect the visibility of the content in our digital repositories, and must be investigated. With our formal partner, OCLC, Inc., and with help from our informal partners, the Digital Library Federation and the Mountain West Digital Library, we plan to expand our research and then develop and publish a toolkit that will help libraries and archives make their database content more accessible and useful to search engines. The toolkit will include recommendations for web server administrators, repository software developers, and repository managers. It will include reporting tools that will help measure and monitor effectiveness in achieving visibility in search engines, metrics that in turn will be useful to administrators in demonstrating the value proposition of their repositories. The sea of information available on the Internet is constantly growing, and library and archival content risks invisibility. We believe search engine optimization for digital repositories is a real and crucial issue that must be addressed, not only to improve our return on investment but also to help us remain relevant in the age of electronic publishing.
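The indexing ratio that runs through the Getting Found work (indexed objects as a share of total repository objects) is simple to express, even though gathering its inputs reliably is the hard part. Below is a minimal Python sketch under that definition; the counts are hypothetical placeholders for numbers a repository manager would pull from a sitemap and from a search engine reporting tool such as Google Search Console.

```python
# Sketch of the "indexing ratio" used to monitor repository visibility:
# the share of repository objects a search engine has actually indexed.
# The input counts are hypothetical; in practice they would come from the
# repository's sitemap and from a tool such as Google Search Console.

def indexing_ratio(indexed_count: int, total_count: int) -> float:
    """Fraction of repository objects indexed by a search engine."""
    if total_count == 0:
        raise ValueError("repository reports zero objects")
    return indexed_count / total_count

total_objects = 12_000   # objects listed in the repository sitemap (hypothetical)
indexed_objects = 1_450  # URLs reported as indexed (hypothetical)

ratio = indexing_ratio(indexed_objects, total_objects)
print(f"indexing ratio: {ratio:.1%}")  # ~12.1% -- the 'abysmally low' scenario
```

Tracked over time, this single number is the baseline measurement the toolkit proposes for demonstrating whether SEO remediation is actually working.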