Scholarly Work - Library
Permanent URI for this collection: https://scholarworks.montana.edu/handle/1/320
Item: An analysis of use and performance data aggregated from 35 institutional repositories (2020-11)
Arlitsch, Kenning; Wheeler, Jonathan; Pham, Minh Thi Ngoc; Parulian, Nikolaus Nova
Purpose: This study demonstrates that aggregated data from the Repository Analytics and Metrics Portal (RAMP) have significant potential to analyze visibility and use of institutional repositories (IR) as well as potential factors affecting their use, including repository size, platform, content, device and global location. The RAMP dataset is unique and public.
Design/methodology/approach: The webometrics methodology was followed to aggregate and analyze use and performance data from 35 institutional repositories in seven countries that were registered with the RAMP for a five-month period in 2019. The RAMP aggregates Google Search Console (GSC) data to show IR items that surfaced in search results from all Google properties.
Findings: The analyses demonstrate large performance variances across IR as well as low overall use. The findings also show that device use affects search behavior, that different content types such as electronic thesis and dissertation (ETD) may affect use and that searches originating in the Global South show much higher use of mobile devices than in the Global North.
Research limitations/implications: The RAMP relies on GSC as its sole data source, resulting in somewhat conservative overall numbers. However, the data are also expected to be as robot free as can be hoped.
Originality/value: This may be the first analysis of aggregate use and performance data derived from a global set of IR, using an openly published dataset.
RAMP data offer significant research potential with regard to quantifying and characterizing variances in the discoverability and use of IR content.

Item: Being Irrelevant: How Library Data Interchange Standards Have Kept Us Off the Internet (Routledge, 2014-10)
Arlitsch, Kenning
Conversations about the future of libraries invariably raise questions about “relevance.” One way to define relevance is to evaluate how well library “products” integrate into the popular information ecosystem, i.e. the Internet. It is in this ecosystem that libraries have struggled. To use library products our customers must deliberately move into another information ecosystem built by libraries and library vendors, when they should be able to discover and have seamless access in the ecosystem where they already conduct their business. Libraries force customers to use technological tools to which they are not accustomed, which in turn spawns an instruction mini industry.

Item: Committing to research: librarians and grantsmanship (Routledge, 2014-01)
Arlitsch, Kenning
The process of grant writing remains a mystery for many and the library literature remains surprisingly limited on the subject. Writing that first grant can be daunting, particularly if making a solo attempt. In this column I hope to make it a little less so by describing what I know about the process and the players. The description that follows is based on my experience applying for federal funding from academic libraries at research universities, but the principles are transferrable to other kinds of libraries.

Item: Data set supporting study on Undercounting File Downloads from Institutional Repositories [dataset] (Montana State University ScholarWorks, 2016-07)
OBrien, Patrick; Arlitsch, Kenning; Sterman, Leila B.; Mixter, Jeff; Wheeler, Jonathan; Borda, Susan
This dataset supports the study published as “Undercounting File Downloads from IR”. The following items are included:
1. gaEvent.zip = PDF exports of Google Analytics Events reports for each IR.
2. gaItemSummaryPageViews.zip = PDF exports of Google Analytics Item Summary Page Views reports. Also included is a text file containing the Regular Expressions used to generate each report’s Advanced Filter.
3. gaSourceSessions.zip = PDF exports of Google Analytics Referral reports to determine the percentage of referral traffic from Google Scholar. Note: does not include Utah due to issues with the structure of Utah’s IR and configuration of their Google Analytics.
4. irDataUnderCount.tsv.zip = TSV file of the complete Google Search Console data set containing the 57,087 unique URLs in 413,786 records.
5. irDataUnderCountCiteContentDownloards.tsv.zip = TSV of the Google Search Console records containing the Citable Content Download records that were not counted in Google Analytics.

Item: Data-Driven Improvement to Institutional Repository Discoverability and Use (Institute of Museum and Library Services, 2018-09)
Arlitsch, Kenning; Kahanda, Indika; OBrien, Patrick; Shanks, Justin D.; Wheeler, Jonathan
The Montana State University (MSU) Library, in partnership with the MSU School of Computing, the University of New Mexico Library and DuraSpace, seeks a $49,998 Planning Grant from the Institute of Museum and Library Services through its National Leadership Grant program under its National Digital Platform project category to develop a sustainability plan for the Repositories Analytics & Metrics Portal that will keep its dataset open and available to all researchers.
The proposal also includes developing a preliminary institutional repositories (IR) reporting model; a search engine optimization (SEO) audit and remediation plan for IR; and exploring whether machine learning can improve the quality of IR content metadata. The project team expects work conducted in this planning grant to make the case for advanced research projects that will be high-impact and worthy of funding.

Item: Data-Intensive Science and Campus IT (EDUCAUSE, 2015-09)
Sheehan, Jerry; Arlitsch, Kenning; Mannheimer, Sara; Knobel, Cory; Llovet, Pol
Montana State University developed the Research Data Census to engage local research communities in dialogue about their data: size, sharing resources and behaviors, and interest in services. The census confirmed the need for a tight coupling of IT infrastructure to data and curation services in order to make those resources useful to the research community.

Item: Dataset supporting the dissertation “Semantic Web Identity of Academic Organizations: Search engine entity recognition and the sources that influence Knowledge Graph Cards in search results” (Montana State University ScholarWorks, 2016-11)
Arlitsch, Kenning
This dataset supports the dissertation “Semantic Web Identity in Academic Organizations: Search engine entity recognition and the sources that influence Knowledge Graph Cards in search results,” for a Ph.D. granted by the Institut für Bibliotheks- und Informationswissenschaft (IBI), Humboldt Universität zu Berlin. This dataset contains more than 1400 screen capture files of search results conducted in Google, Google My Business, Google+, Wikipedia, DBpedia, and Wikidata. The subjects of the searches were the 125 member organizations of the Association of Research Libraries (ARL). Searches were also conducted for the eleven colleges of Montana State University and for three libraries that served as case studies.
Screenshots were captured in 2015 and 2016 to support the dissertation “Semantic Web Identity for Academic Organizations.” The dataset also includes the spreadsheet file (CSV format) that was used to record results of the searches, as well as the source files with statistical analysis equations used in R.

Item: Demonstrating library value at network scale: leveraging the Semantic Web with new knowledge work (Routledge, 2014-08)
Arlitsch, Kenning; OBrien, Patrick; Clark, Jason A.; Young, Scott W. H.; Rossmann, Doralyn
Librarians may enjoy new roles as trusted facilitators who can develop effective and replicable optimization services by delivering measurable value based on metrics that matter to each organization’s leadership. The Montana State University (MSU) Library is engaged in Semantic Web research on several fronts, which we will describe in this article. Our concept of “new knowledge work” encompasses the discoverability, accessibility, and usability of content and services in the Semantic Web. In this article, we survey the following new services that libraries can offer their users and campus partners to aid discovery and understanding of resources at the network scale:
1. Establishing semantic identity for content and entities.
2. Structuring metadata for machine ingest and leveraging external search mechanisms.
3. Centralizing management of faculty activity data for efficient population of Institutional Repository (IR) and other reporting outlets.
4. Developing programmatic social media strategies to connect communities and content.
5. Advancing the role of the library as publisher to include the creation of open extensible book software.

Item: Describing theses and dissertations using Schema.org (Dublin Core Metadata Initiative, 2014-10)
Mixter, Jeff; OBrien, Patrick; Arlitsch, Kenning
This report discusses the development of an extension vocabulary for describing theses and dissertations, using Schema.org as a foundation.
Instance data from the Montana State University ScholarWorks institutional repository was used to help drive and test the creation of the extension vocabulary. Once the vocabulary was developed, we used it to convert the entire ScholarWorks data sample into RDF. We then serialized a set of three RDF descriptions as RDFa and posted them online to gather statistics from Google Webmaster Tools. The study successfully demonstrated how a data model consisting of primarily Schema.org terms and supplemented with a list of granular/domain specific terms can be used to describe theses and dissertations in detail.

Item: Digitizing the Ivan Doig Archive at Montana State University: a rise to the challenge illustrates creative tension (Taylor & Francis, 2017-01)
Arlitsch, Kenning; Hawks, Melanie; McKelvey, Hannah; Gollehon, Michelle; Zauha, Janelle
This article contextualizes the leadership concept of creative tension by describing the acquisition, processing and digitization of the Ivan Doig Archive at the Montana State University Library. The project is framed as an illustration of strategies that can generate and sustain momentum toward achieving ambitious goals while building staff confidence. Perspectives from library staff and faculty who worked on the project are included alongside the view of the dean and an external organizational development manager.

Item: The Espresso Book Machine: A change agent for libraries (Emerald, 2011)
Arlitsch, Kenning
Library users can derive immediate benefit from a machine that prints books for them in only a few minutes. The EBM's impact on collection development in libraries may change a decades‐old model of speculative buying to one of buying on demand.
The EBM can also help libraries bring high‐quality facsimiles of their unique special collections books to the public, and perhaps even generate a revenue stream that might offset costs.

Item: Final Performance Report Narrative: Getting Found (2014-11)
Arlitsch, Kenning; OBrien, Patrick; Godby, Jean; Mixter, Jeff; Clark, Jason A.; Young, Scott W. H.; Smith, Devon; Rossmann, Doralyn; Sterman, Leila B.; Tate, Angela; Hansen, Mary Anne
The research we proposed to IMLS in 2011 was prompted by a realization that the digital library at the University of Utah was suffering from low visitation and use. We knew that we had a problem with low visibility on the Web because search engines such as Google were not harvesting and indexing our digitized objects, but we had only a limited understanding of the reasons. We had also done enough quantitative surveys of other digital libraries to know that many libraries were suffering from this problem. IMLS funding helped us understand the reasons why library digital repositories weren’t being harvested and indexed. Thanks to IMLS funding of considerable research and application of better practices we were able to dramatically improve the indexing ratios of Utah’s digital objects in Google, and consequently the numbers of visitors to the digital collections increased. In presentations and publications we shared the practices that led to our accomplishments at Utah. The first year of the grant focused on what the research team has come to call “traditional search engine optimization,” and most of this work was carried out at the University of Utah. The final two years of the grant were conducted at Montana State University after the PI was appointed as dean of the library there.
These latter two years moved more toward “Semantic Web optimization,” which includes areas of research in semantic identity, data modeling, analytics and social media optimization.

Item: From Acquisitions to Access: The Changing Nature of Library Budgeting (Taylor & Francis, 2015-07)
Rossmann, Doralyn; Arlitsch, Kenning
The cost of building library collections continues to increase, forcing librarians to think differently about their budget models. Increasing costs of IT infrastructure needed to connect to information resources also add to budget concerns. The idea of changing the emphasis of collections budgets to one of broader access is not new, but formally acknowledging the need to support local technology infrastructure and other means of access may offer a new way of promoting the collections budget to university administrators. We propose a budget model that acknowledges these broader requirements and includes concepts of surfacing and discovery, provision, creation, and acquisition.

Item: Getting Found: Search Engine Optimization for Digital Repositories (2011-02)
Arlitsch, Kenning; OBrien, Patrick
Libraries and archives have been building digital repositories for over a decade, and, viewed in total, have amassed collections of considerable size. The use of the scholarly and lay content in these databases is predicated on visibility in Internet search engines, but initial surveys conducted by the University of Utah across numerous libraries and archives have revealed a disturbing reality: the number of digital objects successfully harvested and indexed by search engines from our digital repositories is abysmally low. The reasons for the poor showings in Internet search engines are complex, and are both technical and administrative. Web servers may be configured incorrectly, and may lack sufficient speed performance. Repository software may be designed or configured in a way that is difficult for crawlers to navigate.
Metadata are often not unique or structured as recognizable taxonomies, and in some cases search engines prefer other schemas. Search engine policies change, and some commonly accepted standards in the library community are not being supported by some search engines. Google Scholar, for instance, has recently recommended against Dublin Core as a metadata schema in institutional repositories in favor of publishing industry schemas, a recommendation that comes as a shock to most librarians who learn of it. The problem lies less with search engines than with the content that search engines have to work with. This proposal will result in improvements to the way the content is presented so that search engines can parse, organize, and serve more relevant results to researchers and other users. The search engine market is fluid and intensely competitive. While Google retains the majority of direct search engine traffic, Bing is making progress quickly, and social media engines are changing the face of search itself, putting more emphasis on content that is popular and frequently refreshed. These changes will further affect the visibility of the content in our digital repositories, and must be investigated. With our formal partner, OCLC, Inc., and with help from informal partners the Digital Library Federation and the Mountain West Digital Library, we plan to expand our research, and then develop and publish a toolkit that will help libraries and archives make their database content more accessible and useful to search engines. The toolkit will include recommendations to web server administrators, repository software developers, and to repository managers. It will include reporting tools that will help measure and monitor effectiveness in achieving visibility in search engines, metrics that in turn will be useful to administrators in demonstrating the value proposition of their repositories. 
The sea of information available on the Internet is constantly growing, and library and archival content risks invisibility. We believe search engine optimization for digital repositories is a real and crucial issue that must be addressed, not only to improve our return on investment, but also to help us remain relevant in the age of electronic publishing.

Item: Heeding the signals: applying Web best practices when Google recommends (Routledge, 2014-11)
Askey, Dale; Arlitsch, Kenning
Google is the single largest driver of traffic to library websites and digital repositories, and librarians would do well to listen when the search giant reveals information about its practices or makes recommendations. Recently, Google announced that it would begin to favor websites that use the secure hypertext transfer protocol (HTTPS) in its search results rankings. HTTPS encrypts data transmission and one of Google’s stated reasons for this change is to help make the Web safer and minimize data theft. Similar announcements by Google have sometimes been ignored by librarians, to the peril of the visibility and use of library products and services on the Web.

Item: Introducing the "Getting Found" Web Analytics Cookbook for Monitoring Search Engine Optimization of Digital Repositories (ISAST, 2015-12)
Arlitsch, Kenning; OBrien, Patrick
A new toolkit that helps libraries establish baseline measurements and continuous monitoring of the search engine optimization performance of their digital repositories is one of the products of research funded by the Institute of Museum and Library Services. The “Getting Found” Cookbook includes everything necessary for implementing a Google Analytics dashboard that continuously monitors SEO performance metrics relevant to digital repositories.
While the Cookbook has been created for use with Google Analytics, the principles and practices described can be applied to any page tagging analytics software.

Item: Invisible institutional repositories: addressing the low indexing ratios of IRs in Google Scholar (Emerald Group Publishing Ltd, 2012-03)
Arlitsch, Kenning; OBrien, Patrick
Google Scholar has difficulty indexing the contents of institutional repositories, and the authors hypothesize the reason is that most repositories use Dublin Core, which cannot express bibliographic citation information adequately for academic papers. Google Scholar makes specific recommendations for repositories, including the use of publishing industry metadata schemas over Dublin Core. This paper aims to test a theory that transforming metadata schemas in institutional repositories will lead to increased indexing by Google Scholar.

Item: Making Sense of Researcher Services (2016-03-25)
Shanks, Justin D.; Arlitsch, Kenning
Researcher services have proliferated in recent years and numerous free or fee-based sites now promise increased visibility and impact for authors or contributors of publications and other research products. Not all services have the same goals, however, and it can be difficult to know with which services researchers should engage. In this article we establish three categories (author/researcher identification, academic/professional networking, and reference/citation management) and examine nineteen services that fit into those categories.

Item: Managing Search Engine Optimization: An introduction for library administrators (Routledge, 2013)
Arlitsch, Kenning; OBrien, Patrick; Rossmann, Brian
This article is aimed at giving library administrators a high-level perspective of SEO so that they may be equipped to ask the right questions of their technical staff, software vendors and content suppliers. It stresses the importance of aligning SEO with institutional priorities and integrating it into the strategic plan.
SEO is most effective when it is an organizational priority and when it is understood and driven by administrative teams.

Item: Measuring Up: Assessing Accuracy of Reported Use and Impact of Digital Repositories (2014-02)
Arlitsch, Kenning; OBrien, Patrick; Kyrillidou, Martha; Clark, Jason A.; Young, Scott W. H.; Mixter, Jeff; Chao, Zoe; Freels-Stendel, Brian; Stewart, Cameron
We propose a research and outreach partnership that will address two issues related to more accurate assessment of digital collections and institutional repositories (IR):
1. Improve the accuracy and privacy of web analytics reporting on digital library use.
2. Recommend an assessment framework and web metrics that will help evaluate digital library performance to eventually enable impact studies of IR on author citation rates and university rankings.
Libraries routinely collect and report website and digital collection use statistics as part of their assessment and evaluation efforts. The numbers they collect are reported to the libraries’ own institutions, professional organizations, and/or funding agencies. Initial research by the proposed research team suggests the statistics in these reports can be grossly inaccurate, leading to a variance in numbers across the profession that makes it difficult to draw conclusions, build business cases, or engender trust. The inaccuracy runs in both directions, with under reporting numbers as much a problem as over reporting. The team is also concerned with the privacy issues inherent in the use of web analytics software and will recommend best practices to assure that user privacy is protected as much as possible while libraries gather data about use of digital repositories. Institutional Repositories have been in development for well over a decade, and many have accumulated significant mass. The business case for institutional repositories (IR) is built in part on the number of downloads of publications sustained by any individual IR.
Yet, preliminary evidence demonstrates that PDF and other non-HTML file downloads in IR are often not counted because search engines like Google Scholar bypass the web analytics code that is supposed to record the download transaction. It has been theorized that Open Access IR can help increase author citation rates, which in turn may affect university rankings. However, no comprehensive studies currently exist to prove or disprove this theory. This may be due to the fact that such a study could take years to produce results due to the publication citation lifecycle and because few libraries have an assessment model in place that will help them to gather data over the long term. We plan to recommend an assessment framework that will help libraries collect data and understand root causes of unexplained errors in their web metrics. The recommendations will provide a foundation for reporting metrics relevant to outcomes based analysis and performance evaluation of digital collections and IR.