This readme.txt file was generated on 2016-11-19 by Kenning Arlitsch

---------------------------------------------------
GENERAL INFORMATION
---------------------------------------------------

Title of Dataset: Data set supporting Ph.D. dissertation "Semantic Web Identity in Academic Organizations: Search engine entity recognition and the sources that influence Knowledge Graph Cards in search results"

Principal Investigator Contact Information
Name: Kenning Arlitsch
Institution: Montana State University
Address: P.O. Box 173320, MSU Library, Bozeman, MT 59717, USA
Email: kenning.arlitsch@montana.edu

Degree-granting institution: Institut für Bibliotheks- und Informationswissenschaft (IBI), Humboldt Universität zu Berlin
Address: Dorotheenstraße 26, Berlin, Germany

Date of data collection (single date, range, or approximate date): 2015-2016

Geographic location of data collection: Bozeman, MT 59717, USA

Date files were created: 2016

Are there multiple versions of the dataset? No

Information about funding sources that supported the collection of the data: None

File Information:

Filename: Arlitsch-dissertation-dataset-metadata_2016-11-19.docx
Short description: Metadata required for submission of the dataset to Montana State University ScholarWorks data repository

Filename: SWI-survey_2016-10-16.csv
Short description: Main spreadsheet containing recorded observations for 125 Association of Research Libraries (ARL) members. 125 primary names and 94 alternate names were searched for evidence of Knowledge Graph Cards (KC) in Google search results, and for evidence of records or articles in Google My Business, Google+, Wikipedia, DBpedia, and Wikidata.

Filename: SWI-survey-subset_2016-10-16.csv
Short description: This smaller spreadsheet was used to run statistical analysis in R for the parent institution of each of the 125 ARL member libraries, rather than for the primary and alternate names of the libraries.

Filename: SWI-analysis-final_2016-11-17.R
Short description: R source file with equations and commands used to analyze the "SWI-survey" spreadsheet file.

Filename: SWI-subset-analysis-final_2016-11-17.R
Short description: R source file with equations and commands used to analyze the "SWI-survey-subset" spreadsheet file.

Filename: SWI-DBpedia-screenshots.zip
Short description: Zipped archive containing 84 screen capture files in PNG format from DBpedia.

Filename: SWI-G+-screenshots.zip
Short description: Zipped archive containing 288 screen capture files in PNG format from Google+.

Filename: SWI-GMB-screenshots.zip
Short description: Zipped archive containing 179 screen capture files in PNG format from Google My Business.

Filename: SWI-Google-search-screenshots.zip
Short description: Zipped archive containing 245 screen capture files in PNG format from Google search results.

Filename: SWI-Wikidata-screenshots.zip
Short description: Zipped archive containing 230 screen capture files in PNG format from Wikidata.

Filename: SWI-Wikipedia-screenshots.zip
Short description: Zipped archive containing 223 screen capture files in PNG format from Wikipedia.

Filename: SWI-MSU-Colleges-screenshots.zip
Short description: Zipped archive containing 72 screen capture files in PNG format from Google searches of eleven Montana State University colleges.

Filename: SWI-casestudy-CNI-screenshots.zip
Short description: Zipped archive containing 34 screen capture files in PNG format collected during case study development for the Coalition for Networked Information (CNI).

Filename: SWI-casestudy-McMaster-screenshots.zip
Short description: Zipped archive containing 21 screen capture files in PNG format and browser exports in PDF format, which were collected during case study development for McMaster University Libraries.

Filename: SWI-casestudy-MSU-library-screenshots.zip
Short description: Zipped archive containing 28 screen capture files in PNG format and browser exports in PDF format, which were collected during case study development for Montana State University Library.

If data set includes multiple files related to one another, include relationship here: Screenshot files support the data recorded in the spreadsheet files. R source files contain statistical analysis commands and equations that were used to analyze the spreadsheet data.

---------------------------------------------------
METHODOLOGICAL INFORMATION
---------------------------------------------------

Description of methods used for collection/generation of data: The Action Research methodology guided this research. Data collection methods included screen captures of the results of searches conducted in Google, Google My Business, Google+, Wikipedia, DBpedia, and Wikidata. Results of searches were also recorded in two spreadsheets. The Chrome web browser was used in Incognito mode for most searches. The Safari web browser was used for Google+ searches.

Methods for processing the data: The R statistical software was used to analyze the data. Two R source files are included in this package.

Instrument-specific information needed to interpret the data: None

Standards and calibration information, if appropriate: None

Environmental/experimental conditions: None

Describe any quality-assurance procedures performed on the data: Data integrity checks were conducted with R to find and correct spreadsheet errors. Errors were checked against screen capture files and spreadsheet notations were adjusted accordingly.
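
Illustration of an integrity check: the R commands actually used for the quality-assurance step are in the R source files and are not reproduced here. The following is only a minimal sketch, written in Python rather than R, of one kind of check that fits the description above: flagging cells whose value is not one of the codes defined for that column in this readme. The function name, the sample rows, and the choice to treat blank cells as errors are assumptions made for illustration.

```python
import csv
import io

# Allowed codes per column, taken from the column definitions in this readme.
# Only a subset of columns is listed; extend as needed.
ALLOWED = {
    "Primary": {"0", "1"},
    "KC": {"0", "1"},
    "GMB": {"0", "1"},
    "Gplus": {"0", "1", "2"},
    "Wikipedia": {"0", "1"},
    "WikipediaInfobox": {"0", "1", "2"},
    "DBpedia": {"0", "1"},
    "Wikidata": {"0", "1"},
}


def find_invalid_cells(rows, allowed=ALLOWED):
    """Return (file_line, column, value) for every cell outside its allowed codes.

    Assumption: blank or missing cells are reported as invalid. The real
    spreadsheets may use blanks or NA deliberately; adjust if so.
    """
    problems = []
    # File line 1 is the header, so the first data row is line 2.
    for line_number, row in enumerate(rows, start=2):
        for column, codes in allowed.items():
            value = (row.get(column) or "").strip()
            if value not in codes:
                problems.append((line_number, column, value))
    return problems


# Made-up rows for demonstration only; these are not values from the dataset.
SAMPLE = """ARL Library Name,Primary,KC,GMB,Gplus,Wikipedia,WikipediaInfobox,DBpedia,Wikidata
Example Library A,1,1,0,2,1,2,1,1
Example Library B,0,3,0,1,1,1,0,1
"""

problems = find_invalid_cells(csv.DictReader(io.StringIO(SAMPLE)))
for line_number, column, value in problems:
    print(f"line {line_number}: column {column} has unexpected value {value!r}")
```

Any cell reported this way would then be compared against the corresponding screen capture file before the spreadsheet is corrected.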

Codes or symbols used to note or characterize low quality/questionable outliers that people should be aware of:
Code/symbol: None
Definition: None

People involved with sample collection, processing, analysis and/or submission: None

---------------------------------------------------
DATA-SPECIFIC INFORMATION
---------------------------------------------------

The following information applies to the two spreadsheet files included with this dataset.

Column headings for tabular data: PrimORAltKC
Full name: Primary or Alternate Knowledge Graph Card
Definition: Google Knowledge Graph Card appeared in search results for primary or alternate names of ARL libraries.
Units of measurement: Binary. Yes=1, No=0

Column headings for tabular data: ParentInstitution
Full name: Parent Institution
Definition: Name of the university or parent institution to which the ARL library belongs.

Column headings for tabular data: ARL Library Name
Full name: Association of Research Libraries Library Name
Definition: The primary and alternate name (where an alternate name exists) of the ARL member library. The primary name is derived from the ARL membership directory (http://www.arl.org/membership/list-of-arl-members) and is the official name submitted by the library organizations.

Column headings for tabular data: Primary
Full name: Primary
Definition: Column indicates in binary format (1,0) which of the names in the ARL Library Name column is defined as the primary (official) name of the library organization, as listed in the ARL membership directory (http://www.arl.org/membership/list-of-arl-members). A value of 1 indicates that the row contains the primary name; a value of 0 indicates the row does not contain the primary name.

Column headings for tabular data: KC
Full name: Knowledge Graph Card
Definition: Column indicates whether a Google Knowledge Graph card appeared in the search results for the name of the library being searched. 0 indicates no KC was found; 1 indicates a KC was found.

Column headings for tabular data: GMB
Full name: Google My Business
Definition: Knowledge base searched to determine whether a business had been claimed and verified for the primary or alternate name of the ARL library. 0 indicates no claimed and verified record could be found; 1 indicates a claimed and verified record was found.

Column headings for tabular data: Gplus
Full name: Google+ or Google Plus.
Definition: Name of the knowledge base that was searched to determine whether a verified or unverified profile existed for the primary or alternate name of the library organization. In this column, 0 indicates no profile was found; 1 indicates an unverified profile was found; 2 indicates a verified profile was found.

Column headings for tabular data: Wikipedia
Full name: Wikipedia – the Free Encyclopedia
Definition: Name of the knowledge base that was searched to determine whether an article had been published for the primary or alternate name of the library organization. 0 indicates no article was found; 1 indicates an article was found.

Column headings for tabular data: WikipediaInfobox
Full name: Wikipedia Infobox
Definition: This column recorded whether a Wikipedia article existed for the primary or alternate name of the library organization being searched, and whether the article (if found) included an infobox. 0 indicates no article was found; 1 indicates an article without infobox was found; 2 indicates an article with infobox was found.

Column headings for tabular data: DBpedia
Full name: DBpedia
Definition: Knowledge base that was searched to determine whether a structured data record had been generated from Wikipedia for the primary or alternate name of the library organization. This search was conducted on the dataset last made available by DBpedia in the spring of 2015. 0 indicates no record was found; 1 indicates a record was found.

Column headings for tabular data: Wikidata
Full name: Wikidata
Definition: Knowledge base that was searched to determine whether a structured data record existed for the primary or alternate name of the library organization. Records that contained fewer than two populated fields were not considered viable records. 0 indicates no record was found; 1 indicates a record was found.

Column headings for tabular data: AccurateKC
Full name: Accurate Knowledge Graph Card
Definition: This column indicates whether the KC that displayed for the primary or alternate name of the library was accurate for the library organization being searched. 0 indicates the KC was inaccurate; 1 indicates it was accurate.

Column headings for tabular data: AccurateKCInst
Full name: Accurate Knowledge Graph Card for the Institution
Definition: The Google Knowledge Graph Card that appeared in search results was accurate for the parent institution of the library organization being searched. 0 indicates the KC was inaccurate; 1 indicates it was accurate.

Column headings for tabular data: SameAs
Full name: Same As
Definition: When Google Knowledge Graph Cards appeared for both primary and alternate library names being searched, it was the same card that appeared, indicating that Google has a semantic understanding of the relationship of the two names to the same organization. 0 indicates that a different KC appeared for primary and alternate names; 1 indicates the same KC appeared whether the primary or alternate names were searched.

Column headings for tabular data: Logo
Full name: Logo or Map
Definition: This column captured whether a logo appeared in the KC as an information variable. 0 indicates no logo appeared; 1 indicates a logo appeared.

Column headings for tabular data: Img
Full name: Image or Photograph
Definition: This column captured whether an image or photograph appeared in the KC as an information variable. 0 indicates no image appeared; 1 indicates an image appeared.

Column headings for tabular data: Type
Full name: Type of organization
Definition: This column captured whether the type of organization was indicated in the KC as an information variable. 0 indicates no organization type was indicated; 1 indicates an organization type was indicated.

Column headings for tabular data: Appearance
Full name: Appearance grouping
Definition: This column categorized the information variables Logo, Img, and Type as a single group. The value for each row in the Appearance column was calculated as the product of the three variables. If any of the variables was 0, then the entire Appearance group for that name was also recorded as 0. This grouping was created because it was observed that these three variables almost always appeared together, i.e. if one appeared then it was rare for the other two not to appear.

Column headings for tabular data: Address
Full name: Physical address of the organization
Definition: This column captured whether an address for the library organization appeared in the KC. 0 indicates no address appeared; 1 indicates an address appeared.

Column headings for tabular data: Phone
Full name: Telephone number
Definition: This column captured whether a telephone number for the library organization appeared in the KC. 0 indicates no phone number appeared; 1 indicates a phone number appeared.

Column headings for tabular data: Directions
Full name: Clickable button for directions to the physical address, provided by Google Maps.
Definition: This column captured whether a clickable button appeared in the KC that linked to directions to the library organization in Google Maps. 0 indicates no button appeared; 1 indicates a button appeared.

Column headings for tabular data: Website
Full name: Clickable button for the website
Definition: This column captured whether a clickable button appeared in the KC that linked to the library organization's website. 0 indicates no button appeared; 1 indicates a button appeared.

Column headings for tabular data: Contact
Full name: Contact grouping
Definition: This column categorized the prior four information variables (Address, Phone, Directions, Website) as a single group. The value for each row in the Contact column was calculated as the product of the four variables. If any of the variables was 0, then the entire Contact group for that name was also recorded as 0. This grouping was created because it was observed that these four variables almost always appeared together, i.e. if one appeared then it was rare for the other three not to appear.

Column headings for tabular data: Hours
Full name: Operating hours of the library organization
Definition: This column captured whether the operating hours appeared in the KC for the library organization. 0 indicates no hours appeared; 1 indicates hours appeared. While this information was collected, it was excluded from the statistical analysis because the appearance of hours on the KC was too variable and thus did not seem to fit with the Contact group.

Column headings for tabular data: Description
Full name: Textual description field on the KC
Definition: This column captured whether a brief textual description about the library organization appeared on the KC. 0 indicates no description appeared; 1 indicates a description appeared. This information variable became a group of one because Google explicitly identifies Wikipedia as its source.

Column headings for tabular data: Comment
Full name: Comment
Definition: This column captured free text notes and observations made during data collection.

---------------------------------------------------
SHARING/ACCESS INFORMATION
---------------------------------------------------

Licenses/restrictions placed on the data: CC BY 4.0 https://creativecommons.org/licenses/by/4.0/

This data set is published from the United States.
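
Note on the grouped columns: the Appearance and Contact columns defined under DATA-SPECIFIC INFORMATION are each described as the product of a set of binary columns, so a single 0 in any member column makes the whole group 0. The following is a minimal sketch of that rule in Python. It is not the code used for the dissertation (that analysis was done in R); the function names and sample rows are assumptions made for illustration, and it assumes every member cell holds 0 or 1 with no blanks.

```python
import csv
import io

# Member columns of each group, as named in the column definitions.
APPEARANCE_COLUMNS = ["Logo", "Img", "Type"]
CONTACT_COLUMNS = ["Address", "Phone", "Directions", "Website"]


def group_product(row, columns):
    """Multiply the 0/1 values of the given columns; any 0 yields 0."""
    product = 1
    for name in columns:
        product *= int(row[name])
    return product


def add_groupings(rows):
    """Return copies of the rows with Appearance and Contact values added."""
    result = []
    for row in rows:
        new_row = dict(row)
        new_row["Appearance"] = group_product(row, APPEARANCE_COLUMNS)
        new_row["Contact"] = group_product(row, CONTACT_COLUMNS)
        result.append(new_row)
    return result


# Made-up rows for demonstration only; these are not values from the dataset.
SAMPLE = """Logo,Img,Type,Address,Phone,Directions,Website
1,1,1,1,1,1,1
1,1,1,1,1,1,0
0,0,0,0,0,0,0
"""

grouped = add_groupings(csv.DictReader(io.StringIO(SAMPLE)))
for row in grouped:
    print(row["Appearance"], row["Contact"])
```

In the second sample row every Appearance variable is 1 but Website is 0, so Appearance is 1 while Contact is 0.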

---------------------------------------------------
CREDITS
---------------------------------------------------

Based on a template by University of Minnesota Libraries: http://lib.umn.edu/datamanagement