A Characterization of Search Engine Results

dc.contributor.advisor: John Pannell [en]
dc.contributor.author: McShane, Elizabeth; Pannell (Faculty Mentor), John [en]
dc.date.accessioned: 2023-08-17T17:12:27Z
dc.date.available: 2023-08-17T17:12:27Z
dc.date.issued: 2022 [en]
dc.description: Copyright Elizabeth McShane 2023 [en_US]
dc.description.abstract: Background: According to a Pew Internet Survey, 91% of online adults use some form of web search. While search engine optimization studies are commonly employed by companies to gauge their visibility in search results, few studies have characterized results from the user’s perspective. We wanted to explore the impact that search engine choice may have on search results by characterizing the top results from several search engines.

Aim: Previous research has relied on manual review of search results. Instead of taking this approach, we began developing and testing a set of tools to gather, analyze, and characterize search engine results automatically.

Approach: Selenium will be used to run searches and record the top ten organic results. The URLs of the search results will be stripped down to their domains in a Python-based program, then categorized using a URL lookup API. Finally, the results will be analyzed using a Python-based program.

Results and Conclusions: To date, we have succeeded in gathering search results from Bing, Google, and DuckDuckGo for 50 random search terms and have stripped the URLs down to their domains. We have also identified a service that provides website categorization using the IAB taxonomy. The development we have done so far has allowed us to identify the following targets for future development.

Data Gathering: Some search engines, such as Google, proved difficult to scrape, and some irregular results, such as null values, were returned. We would like to explore other methods of web scraping in addition to Selenium and develop several methods that may be able to overcome the unique scraping challenges that come with different search engines. In addition, we want to expand the set of search engines scraped to include other, lesser-known search engines. Due to time constraints, the categorization API has not been fully integrated into the program, so automated API integration is another target for future development. We would also like to identify any additional data, such as advertisements, that we could gather while scraping search results.

Data Handling and Storage: In conjunction with the automated API integration, we would like to develop code that removes already-categorized URLs before handing them off to the API for categorization. Additionally, we want to develop error handling for any unusual search results that may pass through the data collection phase. As a final feature, we would like to develop an algorithm that performs basic analysis of the search results. [en_US]
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/18078 [en]
dc.language.iso: en_US [en_US]
dc.language.iso: en [en]
dc.publisher: Montana State University Billings [en_US]
dc.publisher: Montana State University - Billings [en]
dc.rights: Copyright Elizabeth McShane 2023 [en_US]
dc.rights.holder: Copyright 2023 by Elizabeth McShane [en]
dc.subject: search engine [en_US]
dc.subject: search engine results [en_US]
dc.subject.lcsh: Automated method [en]
dc.subject.lcsh: Search engine results [en]
dc.title: A Characterization of Search Engine Results [en]
dc.type: Poster [en]
mus.citation.conference: Research, Creativity & Community Involvement Conference [en_US]
mus.citation.extentfirstpage: 1 [en_US]
mus.relation.college: College of Letters & Science [en_US]
mus.relation.department: Mathematical Sciences. [en_US]
thesis.degree.genre: Poster [en]
thesis.format.extentfirstpage: 1 [en]
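
The abstract describes three data-handling steps: stripping result URLs down to their domains, removing already-categorized entries before calling the categorization API, and a basic analysis of the results. The poster's actual code is not part of this record, so the following is only a minimal sketch of how those steps might look in Python using the standard library. The function names (to_domain, uncategorized, domain_counts) and the input layout (a dict mapping engine name to a list of result URLs) are assumptions made for illustration. Note that this keeps the full host name rather than the registrable domain; reducing "docs.example.org" to "example.org" would need a public suffix list, which is not attempted here.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches a leading URL scheme such as "https://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def to_domain(url):
    """Return the lowercased host of a URL without a leading "www.".

    Returns None for null, empty, or unparseable values, which the
    abstract reports can come back from scraping.
    """
    if not url:
        return None
    # urlsplit only recognizes the host when a scheme or "//" is present.
    if not _SCHEME.match(url) and not url.startswith("//"):
        url = "//" + url
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return None
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def uncategorized(domains, known):
    """Return unique domains not already in `known`, in first-seen order.

    `known` is any container of previously categorized domains
    (hypothetical cache; a dict of domain -> category also works).
    """
    seen = set(known)
    pending = []
    for domain in domains:
        if domain is not None and domain not in seen:
            seen.add(domain)
            pending.append(domain)
    return pending


def domain_counts(results):
    """Map each engine name to a Counter of the domains in its results."""
    return {
        engine: Counter(d for d in map(to_domain, urls) if d)
        for engine, urls in results.items()
    }
```

Only the list returned by uncategorized would need to be sent to a categorization service, which keeps repeated domains across engines and search terms from being looked up more than once.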

Files

Original bundle

Name: McShane_poster.pdf
Size: 955.2 KB
Format: Adobe Portable Document Format
Description: McShane-characterization-poster
