A Characterization of Search Engine Results
McShane, Elizabeth ; Pannell (Faculty Mentor), John
Background: According to a Pew Internet Survey, 91% of online adults use some form of web search. While companies commonly employ search engine optimization studies to gauge their visibility in search results, few studies have characterized results from the user's perspective. We wanted to explore the impact that search engine choice may have on search results by characterizing the top results from several search engines.

Aim: Previous research has relied on manual review of search results. Instead of taking this approach, we began developing and testing a set of tools to gather, analyze, and characterize search engine results automatically.

Approach: Selenium will be used to run searches and record the top ten organic results. The URLs of the search results will be stripped down to their domains in a Python-based program, then categorized using a URL lookup API. Finally, the results will be analyzed using a Python-based program.

Results and Conclusions: To date, we have gathered search results from Bing, Google, and DuckDuckGo for 50 random search terms and stripped the URLs down to their domains. We have also identified a service that provides website categorization using the IAB taxonomy. The development done so far has allowed us to identify the following targets for future development.

Data Gathering: Some search engines, such as Google, proved difficult to scrape, and some irregular results, such as null values, were returned. We would like to explore other methods of web scraping in addition to Selenium and develop several methods that may be able to overcome the unique scraping challenges that come with different search engines. In addition, we want to expand the set of search engines scraped to include other, lesser-known search engines. Due to time constraints, the categorization API has not been fully integrated into the program; automated API integration is therefore another target for future development. We would also like to identify any additional data, such as advertisements, that we could gather while scraping search results.

Data Handling and Storage: In conjunction with the automated API integration, we would like to develop code that removes already-categorized URLs before handing them off to the API for categorization. Additionally, we want to develop error handling for any unusual search results that may pass through the data collection phase. As a final feature, we would like to develop an algorithm that performs basic analysis of the search results.
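As an illustration only, the domain-stripping step and the proposed removal of already-categorized URLs could be sketched in Python roughly as follows. This is not the project's actual code: the function names `extract_domain` and `uncategorized_domains`, the dictionary of known categories, and the choice to drop a leading "www." are all assumptions made for the example. It uses only the standard library's `urllib.parse`.

```python
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lowercased host of a URL, without port or a leading 'www.'.

    Hypothetical helper; the 'www.' stripping is an assumption of this sketch.
    """
    if "://" not in url:
        # Prefix '//' so urlsplit treats a bare host as the network location.
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def uncategorized_domains(urls, known_categories):
    """Return unique domains not yet present in known_categories, in order.

    known_categories is an assumed mapping of domain -> category label,
    standing in for whatever storage the project ends up using.
    """
    pending = []
    seen = set()
    for url in urls:
        if not url:
            # Skip null or empty results returned by the scraper.
            continue
        domain = extract_domain(url)
        if domain and domain not in known_categories and domain not in seen:
            seen.add(domain)
            pending.append(domain)
    return pending
```

Under these assumptions, only the domains returned by `uncategorized_domains` would need to be sent to the categorization API, which avoids repeat lookups for domains that appear under several search terms or engines.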
Copyright Elizabeth McShane 2023