Final Performance Report Narrative: Getting Found

Authors: Kenning Arlitsch, Patrick OBrien, Jean Godby, Jeff Mixter, Jason A. Clark, Scott W.H. Young, Devon Smith, Doralyn Rossmann, Leila Sterman, Angela Tate, & Mary Anne Hansen

This is the final report narrative from November 30, 2014. Made available through Montana State University's ScholarWorks, scholarworks.montana.edu.

Final Performance Report Narrative
"Getting Found: Search Engine Optimization for Digital Repositories"
IMLS Award Number: LG-07-11-0345-11
November 1, 2011 – October 31, 2014

PI: Kenning Arlitsch, Dean of the Library, Montana State University
Co-PI: Patrick OBrien, Semantic Web Research Director, Montana State University

Key Personnel:
Jean Godby, Senior Research Scientist, OCLC Research
Jason A. Clark, Head of Library Informatics & Computing, Montana State University
Scott W.H. Young, Digital Initiatives Librarian, Montana State University
Jeff Mixter, Research Support Specialist, OCLC Research

Other contributors:
Devon Smith, Consulting Software Engineer, OCLC Research
Doralyn Rossmann, Head of Collection Development, Montana State University
Leila Sterman, Scholarly Communication Librarian, Montana State University
Angela Tate, Program Coordinator, Montana State University
Mary Anne Hansen, Research Commons Librarian, Montana State University

Project Goals and Objectives
Goals and objectives from the proposal aligned along three general tracks:
1. Expand search engine optimization (SEO) research.
2. Make recommendations for SEO and publish these as a toolkit. The toolkit will include metadata transformation mechanisms and tools for monitoring and reporting.
3. Disseminate findings and provide training to the community.

Summary of Accomplishments
• Achieved measurable improvements to the visibility of digital repositories
• Created an SEO Toolkit to measure and monitor digital repository performance
• Developed a data model for institutional repositories using Schema.org
• Developed a citation matching process to produce structured metadata for IR
• Tested Social Media Optimization techniques
• Conducted research resulting in new directions and future deliverables
• Communicated achievements in numerous publications and presentations

THE FOLLOWING PAGES DESCRIBE ACCOMPLISHMENTS FOR ALL THREE YEARS OF THE GRANT.

Introduction
The research we proposed to IMLS in 2011 was prompted by the realization that the digital library at the University of Utah suffered from low visitation and use. We knew we had a problem with low visibility on the Web, because search engines such as Google were not harvesting and indexing our digitized objects, but we had only a limited understanding of the reasons. We had also done enough quantitative surveys of other digital libraries to know that many libraries suffered from the same problem.

IMLS funding helped us understand the reasons why library digital repositories weren't being harvested and indexed.
Thanks to IMLS funding, considerable research and the application of better practices allowed us to dramatically improve the indexing ratios of Utah's digital objects in Google, and the number of visitors to the digital collections consequently increased. In presentations and publications we shared the practices that led to our accomplishments at Utah.

The first year of the grant focused on what the research team has come to call "traditional search engine optimization," and most of this work was carried out at the University of Utah. The final two years of the grant were conducted at Montana State University after the PI was appointed dean of the library there. These latter two years moved toward "Semantic Web optimization," which includes research in semantic identity, data modeling, analytics, and social media optimization.

Deliverables

Deliverable #1: SEO Toolkit (Getting Found Web Analytics Toolkit)
We developed a toolkit to help libraries establish baseline measurements of the SEO performance of their digital repositories. The toolkit includes everything necessary to implement a Google Analytics dashboard that continuously monitors SEO performance metrics relevant to digital repositories. The toolkit will soon be made available in a Council on Library and Information Resources (CLIR) report and on its Vimeo channel.

Baseline Measurements
The toolkit offers a process to inventory and coordinate hardware systems, online services such as institutional Facebook accounts, and web properties such as various digital repositories or collections. It helps identify stakeholders as well as the various roles people play and the skills required to implement and maintain a library SEO effort.

Dashboard
Based on Google Analytics, the dashboard allows library staff and administrators to monitor common baseline performance metrics relevant to library repositories. It further enables administrators to ask pertinent questions when changes are observed. For instance, a sudden drop in the number of items indexed by Google in a given collection could prompt an administrator to question the manager or staff person responsible for that collection.
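To illustrate the kind of baseline metric the dashboard monitors, the sketch below pulls organic-search sessions by referring source from the Google Analytics v3 Core Reporting API, the same web service the toolkit recipes configure. This is a minimal sketch rather than part of the toolkit itself; the view (profile) ID and OAuth token are hypothetical placeholders you would supply from your own master Google account.

```python
# Minimal sketch: query the Google Analytics v3 Core Reporting API for
# organic-search sessions by source over the last 30 days.
import requests

GA_PROFILE_ID = "ga:12345678"   # hypothetical Google Analytics view (profile) ID
ACCESS_TOKEN = "ya29.example"   # hypothetical OAuth 2.0 token for the master account

params = {
    "ids": GA_PROFILE_ID,
    "start-date": "30daysAgo",
    "end-date": "today",
    "metrics": "ga:sessions",
    "dimensions": "ga:source",
    "segment": "gaid::-5",      # built-in "Non-paid Search Traffic" segment
    "access_token": ACCESS_TOKEN,
}
resp = requests.get("https://www.googleapis.com/analytics/v3/data/ga", params=params)
resp.raise_for_status()

# Each row is [source, sessions]; these are the numbers a dashboard widget charts.
for source, sessions in resp.json().get("rows", []):
    print(f"{source}: {sessions} sessions")
```

A toolkit recipe would wire numbers like these into a dashboard widget and alert on sudden changes rather than print them.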
Toolkit Abstract
Implemented and managed correctly, web analytics is an invaluable tool for helping library administrators and staff make informed decisions about internal resource allocations and demonstrate the value of library services to external stakeholders.

Our goal for this "Cookbook" is to provide a series of basic recipes for integrating the analytics necessary to consistently evaluate and monitor both general and specific library objectives as a single library institution. (Google's web services are a complex set of systems designed to collect and report information that improves marketing insight and business profitability. Most Google features and capabilities require time, resources, and a level of technical marketing sophistication that most libraries do not have, so we have used the "cookbook" metaphor to aid communication with typical library staff.) Each recipe includes a list of "ingredients," such as emails, forms, spreadsheets, dashboards, and other information necessary to implement baseline SEO metrics relevant to library administrators and collection managers. Each recipe also includes short "How To" videos that cover the following areas:

1. How to use the Getting Found Web Analytics Cookbook
2. How to start an SEO program
3. How to create a master Google account for the library institution
   a. How to improve Google account security
4. How to activate, configure, and maintain
   a. the Google Analytics v3 web service
   b. the Google Webmaster Tools web service

These first recipes are critical and form the foundation for establishing the organization as a single semantic entity in the Linked Open Data cloud, with baseline analytics relevant to a diverse library customer base. Google Analytics can be set up and maintained by one or more individuals, depending on the organization's goals and resource constraints. To improve flexibility and accommodate more libraries, we have defined library SEO roles that most library organizations considering web analytics will already have incorporated, formally or informally, into one or more individuals' job responsibilities.

We recommend the "kitchen" have a single individual accountable as the Analytics Program Lead. This person should know the customers who want web analytics information, and he or she should have the necessary backing from a library administrator sponsor to set up and run a web analytics kitchen. This program lead will interpret and modify the recipes provided in the Getting Found (GF) "Web Analytics Cookbook" to develop processes that configure the kitchen and build a basic analytics menu that works locally. The GF "Cookbook" includes recipes for producing solutions that we believe should be found on every library analytics menu.

Deliverable #2: IR data model
In 2014 we developed a model for institutional repositories (IR) that focuses on the various types of materials typically found in IR as well as the various types of entities related to those materials. We used a popular vocabulary (Schema.org) to model high-level entities and relationships in developing the Resource Description Framework (RDF) ontology. Using existing Schema.org vocabulary helps ensure interoperability of the model both within the library science domain and, more importantly, on the Web.

The model was developed with instance data from the Montana State University ScholarWorks IR as reference. Using the sample data, we developed a rich vocabulary that allows materials, people, universities, colleges, and departments to be identified and connected in semantically meaningful ways.
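To make the modeling concrete, the fragment below sketches how a thesis description of this kind can be embedded as Schema.org terms in RDFa on an IR landing page. It is illustrative only, not the production ScholarWorks markup; the title, author, and values are invented.

```html
<!-- Illustrative RDFa sketch of a thesis record using Schema.org terms. -->
<div vocab="http://schema.org/" typeof="CreativeWork">
  <h1 property="name">An Example Thesis Title</h1>
  <span property="author" typeof="Person">
    <span property="name">Jane Q. Graduate</span>
  </span>
  <meta property="datePublished" content="2014-05-01" />
  <span property="about">snow hydrology</span>
  <span property="sourceOrganization" typeof="CollegeOrUniversity">
    <span property="name">Montana State University</span>
  </span>
</div>
```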
The model is currently being tested against the ScholarWorks dataset. The RDF data generated by the model will then be used to conduct sample SPARQL queries that demonstrate how RDF data can be used to extract unique and important analytics. These analytics can help university administrators highlight academic strengths as well as identify areas where future research and collaboration can be most beneficial.

The need to make IR more visible is a direct reflection of, and reaction to, the changing ways in which people search for and interact with library resources. Using Schema.org as the primary vocabulary helped us align the Montana ScholarWorks IR metadata with popular, web-focused vocabulary standards and also provided us with a data model that is understood and consumed by the major search engines. We successfully mapped the theses and dissertations metadata into Schema.org and, when needed, supplemented existing Dublin Core fields with terms we created as part of an extension vocabulary for Schema.org. The extension terms followed the same standards and practices as those in Schema.org, and every attempt was made to position extension terms as sub-classes or sub-properties of existing Schema.org terms. After we finished developing the extension vocabulary we successfully demonstrated how to:

1. Integrate the vocabulary into the existing metadata
2. Use OpenRefine to clean up and add semantic richness to the data

The modeling and its implementation using existing metadata served as a proof of concept that this type of work is possible and that the output is consumed by commercial search engines. Google has crawled and indexed the pages, and it recognizes the structured data encoded in RDFa.

To test the RDFa markup, the sample pages were run against both the Google Rich Snippets Tool and the W3C RDFa Validator. This testing demonstrated that Schema.org can be used to describe institutional repository content with the precision necessary for a subject matter expert to find what they seek, while providing a layer of abstraction that allows commercial search engines to help a novice discover new information. This evidence supports the efforts already conducted by OCLC to use Schema.org markup to describe materials accessible from WorldCat.org. The use of Schema.org not only provides a gateway to search engine consumption and indexing but also provides a way for libraries to easily share their data on the Web with groups and organizations outside of the library domain. Conversely, adopting the Schema.org vocabulary will make it much easier for libraries to use and leverage data published outside of their domain of practice. Using the Schema.org vocabulary will help library organizations increase the amount of structured data they share with search engines. Additionally, it will allow organizations to connect isolated data silos to the rapidly expanding Semantic Web, improving the discovery, access, and value of their intellectual output.
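The sample SPARQL queries mentioned above might resemble the sketch below, which counts the works attributed to each department in order to surface areas of strength. It is a sketch against the RDF the model generates; the exact classes and properties of the final ontology may differ.

```sparql
PREFIX schema: <http://schema.org/>

# Count the works attributed to each department (terms are illustrative).
SELECT ?dept (COUNT(?work) AS ?works)
WHERE {
  ?work a schema:CreativeWork ;
        schema:sourceOrganization ?dept .
}
GROUP BY ?dept
ORDER BY DESC(?works)
```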
Deliverable #3: Citation Parsing and Matching Process
The problem faced by many IR is that their rich metadata is locked in single-string citations. This causes problems when there is a need to provide data aggregators such as Google Scholar with metadata that is parsed into specific fields, such as author, article title, date, volume, issue number, etc. To make this metadata more accessible and consumable by aggregators, a method is needed for converting string citations into parsed metadata fields.

In our 2012 journal article "Invisible Institutional Repositories: Addressing the Low Indexing Ratios of IRs in Google Scholar" we demonstrated that IR are not being harvested and indexed well by search engines, and in particular not by academic search engines like Google Scholar. We showed in that paper that one of the main reasons for this problem is that the metadata schema used by many IR is Dublin Core (DC), and because DC does not have a field for each part of a citation, Google Scholar guidelines state that Dublin Core should be used "only as a last resort." Scholarly search engines (e.g., Google Scholar, Microsoft Academic Search) must be able to parse a citation and deliver it in any style users want, and since citations in IR metadata tend to be entered as text blocks in the dc:source or dc:citation field, search engines can't read or comprehend the textual strings.

We demonstrated through a pilot study that using a metadata schema recommended by Google Scholar (Highwire Press, PRISM, ePrints, or BePress) results in greatly improved harvesting and indexing. The problem now is that IR administrators can't easily convert their Dublin Core metadata to one of the above-mentioned schemas, and there is currently no automated method for doing that conversion, which in turn means that there will likely be no large-scale improvement. To help remedy this problem we have been working on automated citation parsing processes using open source software. To date we have developed a prototype methodology that incorporates citation matching and metadata extraction from OCLC MARC records. This methodology could reduce the citation conversion problem for most IR to a manageable number of citations that must be converted manually.
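For reference, the fielded form that Google Scholar's guidelines prefer looks like the Highwire Press meta tags below, here filled in with the citation of our own Library Hi Tech article. A converted IR record would carry tags like these on its landing page (the PDF URL shown is hypothetical).

```html
<!-- Highwire Press tags: the fielded citation metadata Google Scholar prefers. -->
<meta name="citation_title" content="Invisible institutional repositories: Addressing the low indexing ratios of IRs in Google Scholar">
<meta name="citation_author" content="Arlitsch, Kenning">
<meta name="citation_author" content="O'Brien, Patrick S.">
<meta name="citation_journal_title" content="Library Hi Tech">
<meta name="citation_volume" content="30">
<meta name="citation_issue" content="1">
<meta name="citation_firstpage" content="60">
<meta name="citation_lastpage" content="81">
<meta name="citation_publication_date" content="2012">
<meta name="citation_pdf_url" content="http://ir.example.edu/bitstream/1234/paper.pdf">
```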
Methodologies

Citation Parsing
The initial idea for converting string citations into individual metadata fields was to use a citation parsing service. A variety of open source services are available that use entity recognition algorithms to parse citations and convert them into fielded metadata. For the purposes of this study we evaluated three different citation parsing services. Each service was tested, and the best was chosen for a production-style test of converting 750 citations compiled from Montana State University faculty resumes. The citation service was evaluated first on how many citations were parsed; a second phase of evaluation then tested the completeness and accuracy of the parsing.

Citation Matching
After the citation parsing service was tested, a second method for citation parsing was developed and evaluated. This method involved matching the citation to a MARC21 record and then extracting metadata from the record to produce fielded metadata. The algorithm matched strings within the citation to strings in the MARC21 record. A combination of OCLC databases was used to compile a comprehensive set of records to match against. The set included traditional bibliographic records, article records, and records from institutional repositories harvested through the OCLC Digital Collections Gateway. Once a string citation was matched to a specific OCLC record, the MARC21 record was used to create a fielded metadata record for the item. The services were tested using a set of 100 citations from the University of Utah. Two metrics were used to evaluate the citation matching methodology: the first was how accurately the matching algorithm worked; the second round of evaluation looked at the quality of the extracted metadata after a match was determined.
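A toy sketch of the matching step appears below: it scores a citation string against candidate record titles and keeps the best match above a threshold, from which the fielded metadata is then taken. This is not the production OCLC algorithm, which compares multiple strings against full MARC21 records; the candidate records here are simplified, hypothetical stand-ins.

```python
# Toy sketch of citation matching: fuzzy-match a citation string against
# candidate record titles and emit the matched record's fielded metadata.
from difflib import SequenceMatcher

def best_match(citation, candidates, threshold=0.5):
    """Return the candidate record whose title best matches the citation."""
    def score(record):
        return SequenceMatcher(None, citation.lower(),
                               record["title"].lower()).ratio()
    best = max(candidates, key=score)
    return best if score(best) >= threshold else None

# Hypothetical candidates already reduced from MARC records to simple dicts.
records = [
    {"title": "Invisible institutional repositories: addressing the low "
              "indexing ratios of IRs in Google Scholar",
     "journal": "Library Hi Tech", "volume": "30", "issue": "1", "pages": "60-81"},
    {"title": "Managing search engine optimization: an introduction for "
              "library administrators",
     "journal": "Journal of Library Administration", "volume": "53",
     "issue": "2-3", "pages": "177-188"},
]

citation = ("Arlitsch, K., & O'Brien, P. Invisible institutional repositories: "
            "addressing the low indexing ratios of IRs in Google Scholar.")
match = best_match(citation, records)
if match:
    print(match)  # the matched record supplies the fielded metadata
```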
Results
The initial results of the citation parsing services were encouraging, but upon further review we found numerous errors that required manual correction. Of the 750 citations run against the FreeCite service, 457 were positively identified as having been parsed. The remaining 293 were identified as not parsed, and consequently their fielded metadata was only a best guess. Although this initial statistic seemed to indicate that ~61% of the citations were accurately parsed, we conducted a more detailed analysis of the fielded output in order to make an accurate judgment of the service. Two random samples of 100 citations were selected for meticulous manual review. With the first set, we asked an IR cataloger from Montana State University to manually parse the citations into fielded metadata. For the second set of 100, we generated fielded metadata with the parsing service and asked the same cataloger to go through and clean up/correct the parsed data. The manual parsing took 13 hours, while cleaning up the algorithmically parsed data took 20 hours. This suggests that it is more time intensive to clean up metadata generated by the parsing service than it is to simply create the metadata by hand.

The citation matching and extraction process was successful and demonstrated how fielded metadata can be assembled from citations without relying on a citation parsing algorithm. We expect that improvements to the algorithms will help increase both matching accuracy and metadata extraction. The graph below shows the frequency of extracting metadata for the title, journal, author, volume, date, issue, and page fields.

[Graph: frequency of extracted metadata by field]

The difficulty of parsing the 773 $g (Host Item Entry, Related Parts) field caused the frequency of volume, date, issue, and pages to be slightly lower than that of the other fields. We believe that refining the extraction algorithm will help improve the recall of the data found in the 773 $g. Expanding the extraction from one matched record to all of the records found in the matched Work cluster should also help improve the recall of fielded metadata. We further hypothesize that extracting over multiple records will improve the quality and accuracy of the fielded metadata.

In addition to evaluating the total number of fields extracted, we also wanted to understand how representative the extracted metadata was of the original citation. This statistic can be used to better understand how much structured data was mined from the matched record. To understand the coverage of extracted data, the resulting fielded metadata was compared to the original citation and a coverage percentage was calculated. The table below illustrates the results.

[Table: coverage of extracted metadata as a percentage of the original citation]

Although very few records had 90%-100% coverage, it was very promising that nearly half of the test records had more than 60% coverage. We think that making changes to the extraction code and including multiple records in the extraction process will help increase the coverage.

Conclusion
The process of converting a string citation into fielded metadata is a challenge that many institutional repositories face. Without fielded metadata, institutional repositories cannot make their metadata visible to search engines, whether traditional search engines such as Google or academic search engines such as Google Scholar. This study evaluated two methods for converting string citations into parsed metadata. Traditional citation parsing worked reasonably well, but the time required to clean up the parsed data exceeded the time it would take to manually create the fielded metadata from a citation. This suggests that using open source parsing with manual review is not a scalable option for most institutional repositories. The second method was to match the citation against an OCLC MARC record and then extract fielded metadata from specific MARC tags. This method resulted in well-structured metadata. To better compare the two methods, more comprehensive testing of the citation matching process will need to be conducted. After the algorithms for both citation matching and metadata extraction are improved, a second round of testing can be conducted and the results compared more directly to those of the citation parsing method.

This process may soon be offered as a service for libraries that wish to structure their IR metadata for versatility and improved machine readability. Harvesting and indexing by Google Scholar is an immediate, confirmed result of such structure.
Deliverable #4: SEO Improvements at University of Utah and MSU
On October 27, 2011, Google Scholar had indexed less than 1% (422 items) of the University of Utah's 8,000+ scholarly papers housed in its open access IR, known as USpace. As of November 26, 2014, that indexing ratio had increased to approximately 48% (4,960 items) of the scholarly papers in USpace, thanks to changes we implemented while working with the repository managers. Google Scholar has also indexed ~2,160 additional digital collection items as scholarly works, for example digital items from the Neuro-Ophthalmology Virtual Education Library (NOVEL) hosted by the University of Utah.

The result is that ~7,120 full text "scholarly papers" in PDF format are visible, accessible, and free to the public. Without the SEO effort these papers would be more difficult to find or would require a fee to access.

At Montana State University Library we implemented the recommendations from our paper "Invisible institutional repositories: Addressing the low indexing ratios of IRs in Google Scholar" and have achieved nearly 100% indexing by Google Scholar (~2,240 of 2,249 items).

• Other improvements at University of Utah
  o Increased the Google index ratio across all digital collections from an average of 12% to 80%
  o Increased the Google index ratio of USpace (Utah's institutional repository) from 13% to 98%
  o Increased referrals from Google domains by 500%
  o Increased visitors to digital collections by 130%
  o Implemented scalable tools and repeatable processes
    § Developed search engine friendly content sitemaps (see the example following this list)
    § Institutionalized issue monitoring with Google Webmaster Tools
    § Optimized server configurations for search engines
    § Transformed IR metadata and reloaded it
    § Implemented Google Analytics to capture visitation traffic across all Utah library domains (the previous approach was siloed)
    § Influenced vendor product development
      • OCLC's CONTENTdm digital asset management software
      • Ex Libris' Primo discovery layer and Rosetta software
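The content sitemaps referenced in the list above follow the standard sitemap protocol and are submitted through Google Webmaster Tools. A minimal example, with a hypothetical item URL, looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per repository item; the location below is hypothetical. -->
  <url>
    <loc>http://content.lib.example.edu/collection/item/1234</loc>
    <lastmod>2012-01-15</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
```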
Deliverable #5: Influencing standards used by search engines
Our early research on IR indexing by Google and Google Scholar helped identify the need to extend Schema.org for bibliographic citations. On September 13, 2011, the W3C established the Schema Bib Extend group "for extending Schema.org schemas for the improved representation of bibliographic information markup and sharing" (http://www.w3.org/community/schemabibex/). The W3C Schema Bib Extend group was led by OCLC staff, with whom we worked closely to help establish requirements. We also shared our efforts from Deliverable #2 (the IR data model) to help influence solutions for extending Schema.org for bibliographic citations. Schema.org officially adopted the W3C Schema Bib Extend proposal for improving bibliographic citations and released Schema.org version 1.9 on August 8, 2014 (http://schema.org/docs/releases.html#v1.9).

Ongoing Research
While most of the deliverables described above developed from our research goals, we have uncovered additional areas of research that are incomplete but may lead to additional deliverables that other libraries will find useful.

Web Analytics
We have examined a sample of visitation and use statistics that libraries report for their websites, digital collections, and particularly their institutional repositories. We have evidence suggesting that libraries are both over-counting and under-counting visits to, and downloads from, their IR due to inappropriate configuration of web analytics software.

For example, by analyzing a five-day period of log file and web analytics data from the USpace IR at the University of Utah, we determined that Utah's Google Analytics did not count at least 125 unique Google Scholar users who downloaded at least 200 scholarly papers in PDF format. Given what we know about the cyclical usage of library digital assets, our best estimate is that USpace is failing to record between 8,000 and 11,000 PDF downloads annually, with the caveat that our estimates are based upon a very small data sample. These undercounted PDF downloads are only from Google Scholar visitors; other file types and visitors referred from other search engines must also be considered. We also verified that none of the major web analytics services would capture this high-value metric, i.e., direct downloads of open access IR PDF files by Google Scholar visitors.

Over-counting visits can be a problem, too, if libraries are using inappropriately configured log file analysis methods and are not filtering out known crawlers, spiders, and scrapers, or the countless unknown visits that are clearly non-human behavior (e.g., requesting three or more web pages at the same time).
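As a rough sketch of one such filter, the fragment below drops requests whose user agent self-identifies as a crawler before visits are counted. It assumes Apache combined-format logs with a hypothetical file name, and it is only a first pass: much non-human traffic does not identify itself and needs IP-based and behavioral filtering as well.

```python
# Rough sketch: exclude self-identified crawlers from a log-based visit count.
import re

BOT_PATTERN = re.compile(r"googlebot|bingbot|slurp|spider|crawler|scraper", re.I)

def human_hits(log_lines):
    """Yield combined-format log lines whose user agent does not look like a bot."""
    for line in log_lines:
        # The user agent is the last quoted field in Apache combined log format.
        agent = line.rsplit('"', 2)[-2] if line.count('"') >= 2 else ""
        if not BOT_PATTERN.search(agent):
            yield line

with open("access.log") as f:   # hypothetical log file
    print(sum(1 for _ in human_hits(f)), "non-bot requests")
```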
This phenomenon can lead to grossly inaccurate reporting to institutional administrators, funding organizations like IMLS, and governance or association bodies. We think we understand some of the reasons why this problem occurs, and we believe significant additional research needs to be conducted in this area: analyzing additional data sets from other libraries, developing training programs, developing standardized configurations for the implementation of web analytics software, and publishing and presenting on this topic. With partners at OCLC Research, the Association of Research Libraries, and the University of New Mexico, we submitted a grant proposal to IMLS in late January 2014 and were awarded a new three-year National Leadership Grant to investigate this phenomenon more fully. The new grant officially kicked off on December 1, 2014.

Semantic Identity
In late 2012 a Google search for "Montana State University Library" produced a surprising result in Google's Knowledge Card, the display that now commonly appears to the right of search results to provide immediate information about organizations and people. Instead of displaying the flagship MSU in Bozeman, MT, Google's Knowledge Card in 2012 displayed a branch campus in Billings, MT. Further research revealed that the Bozeman library "property" had not been claimed in Google+, there was no article describing the library in Wikipedia, and the entry in Freebase was incomplete. All three of these situations have been remedied, and the library now appears in its proper place in Google's Knowledge Card, but this discovery opened yet another avenue for our SEO research.

Librarians have been late to embrace Wikipedia, and in fact have often actively discouraged engagement with what has become the world's largest encyclopedia. But Wikipedia is much more than an encyclopedia of information for humans. Wikipedia establishes the legitimacy of entities and concepts for search engines, and a lack of presence in the online encyclopedia often means an organization simply doesn't exist for search engines. While we have not yet conducted a systematic survey of academic libraries, extensive spot-checking reveals that most libraries are either not represented at all in Google's Knowledge Card, or the entry is not nearly as robust as it could be. This is an area where we believe significant research must be conducted, not only for better representation of library organizations, but also because many library concepts and services are poorly represented on the Semantic Web. This limits search engine comprehension of libraries and results in fewer user referrals.

Impact of Structured Data Practices in Discovery and Use of Digital Collections
Montana State University Library has started to implement HTML5 semantic tagging in our digital collections. Specifically, we are looking at how structured data practices (e.g., RDFa markup applying Schema.org vocabularies and linking to DBpedia topics) create new understandings of digital collection content for software agents and machines. Three sample collections with the new markup are linked below.

http://arc.lib.montana.edu/schultz-0010/
http://arc.lib.montana.edu/msu-photos/
http://arc.lib.montana.edu/book/home-cooking-history-409/

A current thread of this research builds on the search engine optimization (SEO) work at Montana State University (MSU) Library and compares a control digital collection that has not been optimized (http://arc.lib.montana.edu/brook-0771/) against a digital collection built with semantic topics and machine-actionable markup (http://arc.lib.montana.edu/schultz-0010/). To this end, we are also redesigning our optimized digital library application around three types of web pages as defined by Schema.org: about pages, collection pages, and item pages.
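The markup pattern behind these optimized collections can be sketched as follows: a Schema.org page type expressed in RDFa, with a DBpedia resource attached as the page's topic so that software agents can resolve what the collection is about. The names and URIs below are illustrative, not copied from the live pages.

```html
<!-- Illustrative RDFa sketch of a collection page with a DBpedia topic link. -->
<body vocab="http://schema.org/" typeof="CollectionPage">
  <h1 property="name">Example Photograph Collection</h1>
  <!-- Tie the whole collection to a linked data topic for machine consumption. -->
  <link property="about" href="http://dbpedia.org/resource/Montana_State_University" />
  <div property="hasPart" typeof="Photograph">
    <span property="name">Example item title</span>
    <a property="url" href="http://arc.lib.montana.edu/example-item/">Item page</a>
  </div>
</body>
```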
Our community has an understanding of how to implement structured data; this research looks more closely at the question of why we should (or shouldn't) do it. We are starting to understand what can be gained by applying these structured data practices:

1. Allowing for machine-readable interpretations of our collections
2. Learning how to apply a web-scale classification system (Schema.org) and linked data topics to our collections
3. Gaining the ability to expose our collections data as an API or web service based on the structured data in the pages
4. Discerning the impact that structured data has in creating sessions and page views of our collections, by monitoring specific metrics within Google Analytics

In our preliminary findings, we are seeing spikes of engagement on the collection that has been optimized with structured data practices when compared with the previous year's data and with our "un-optimized" control collection. See the figure below.

[Figure: engagement comparison between the optimized and un-optimized collections]

As this research continues, further testing and longitudinal analysis of the return on investment (ROI) of this RDFa and semantic HTML5 markup will be needed. The goal will be to derive best practices for SEO related to the markup and semantic tagging of digital collections.

Social Media Optimization Best Practices
The IMLS grant project included a component on Social Media Optimization (SMO), directed by the MSU Library Social Media Group (SMG). SMO is a set of practices and principles that aims to increase the shareability of web content through online social networks, with the overall goal of raising awareness and usage of services and products. The practice of SMO is built on a foundation of social-focused metadata, guided by the principle of enabling user-friendly sharing capabilities. Libraries can optimize web content for Twitter and Facebook, for instance, through the use of Twitter Cards and Facebook Open Graph tagging. Both offer the opportunity to provide descriptive information about web content, which is then included in the display of the Tweet or Facebook post. Twitter offers options to share images, audio, and video in the Twitter stream through Card tagging. As demonstrated in the figures below, the information presented when a page has Twitter Cards is much more robust and eye-catching and, consequently, more likely to be shared. The Twitter Card data surfaces a preview of the image, a title, an author, and a description, as opposed to only the Tweet text and a link to the resource.

Figure 1: Tweet of page without Twitter Cards

Figure 2: Tweet of page with Twitter Cards

Likewise, Facebook Open Graph tags pull images, descriptions, and titles of resources into a Facebook post. With more information immediately visible in these Facebook and Twitter posts, users are more likely to interact with the posted information and share it with their friends and followers.
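Representative Twitter Card and Open Graph tags are shown below; all values are illustrative. Both sets of tags live in the page's head and are read by the platforms when the URL is shared.

```html
<!-- Twitter Card tags (values illustrative) -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Example Digital Collection">
<meta name="twitter:description" content="Photographs and letters from the example collection.">
<meta name="twitter:image" content="http://www.lib.example.edu/images/preview.jpg">

<!-- Facebook Open Graph tags (values illustrative) -->
<meta property="og:title" content="Example Digital Collection">
<meta property="og:description" content="Photographs and letters from the example collection.">
<meta property="og:image" content="http://www.lib.example.edu/images/preview.jpg">
<meta property="og:url" content="http://www.lib.example.edu/collection/">
```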
In addition to creating metadata that can be machine-harvested by major social media services, libraries can also offer user-facing social media share buttons on web pages to encourage and enable sharing across major platforms.

Figure 3: Social media share buttons

A comparison of the MSU Library's site before Twitter Cards and Facebook Open Graph tags were added (January 2013-October 2013) with the same period one year later, after SMO was applied (January 2014-October 2014), showed a 550 percent jump in Facebook traffic to our optimized pages and an 84 percent jump in Twitter traffic to those pages.

Social Media Campaigns
To understand more about shareability and cross-channel hashtagging, we developed several social media campaigns, explained below (a link-tagging example follows the list).

1. #LibraryChamp – The MSU Library SMG created a photobooth event during a major campus-wide festival, Catapalooza, just prior to the start of fall semester. The purpose was to draw students into this campaign while integrating the MSU Library's presence, promoting awareness of the library, and exploring targeted cross-channel hashtagging. Students were encouraged to have their photo taken with the MSU mascot, Champ. These photos were subsequently posted via the library's Twitter and Facebook accounts using the hashtag #LibraryChamp. According to Facebook's internal analytics tool, the reach of our #LibraryChamp posts ranged from 42 to 631. Posts that included photos performed better than those without. The #LibraryChamp campaign was our most successful, both in terms of post and tweet engagement and in our overall approach to the campaign.

2. Video Campaign – SMG developed a video to better understand content sharing across various social media platforms. The video told the story of the library user experience within and outside the confines of the library building. With a run time of 1 minute 30 seconds and a simple narrative, we attempted to create an engaging clip that is as shareable as it is informative. We used link tagging within Google Analytics to track views of the video and discovered that email is a highly effective avenue through which to share library content. We anticipated greater traction from Twitter and Facebook, but the numbers indicate that 56% of our audience was directed to our video from links shared via email. Email is not always counted as a social media platform, but we've learned that on our campus email serves as a way to connect and share information across departments, colleges, and organizations. We also invested $5 in a promoted post to boost our video's presence on Facebook. This resulted in only 25 sessions, whereas our non-promoted Facebook post resulted in 107 sessions. While the promoted Facebook post reached more users than the regular post did, this did not translate into click-throughs and views.

3. On September 24, the library celebrated "Hug Your Library Day." We took a photo of library lovers "hugging" the library building and posted it across social media platforms. This yearly, single-post event generated heavy engagement through shares. Those in the photo tagged themselves and shared it on their personal social media accounts, generating numerous likes. This particular event demonstrated the impact of posting photos of people: they opted in to sharing the library's content with those in their personal social networks. With five shares and more than 1,500 people reached organically, this was by far our most popular Facebook post. We saw similar reach with our corresponding Twitter post, which reached more than 2,500 people (called "impressions" in Twitter Analytics) with 15 retweets, 9 favorites, and 3 replies, making it our most popular Tweet to date.
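The link tagging mentioned in the video campaign (item 2 above) works by appending Google Analytics campaign parameters to each shared URL so that traffic from every channel is reported separately. A hypothetical tagged link for the Facebook post might look like this:

```
http://www.lib.montana.edu/video/?utm_source=facebook&utm_medium=social&utm_campaign=library-video-2014
```

Google Analytics then attributes any session beginning at this URL to the facebook/social channel, which is how views from different channels, such as the promoted and non-promoted Facebook posts above, can be distinguished.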
Social Media Analytics Tools
A number of tools are available for analyzing social media activity. In selecting analytics tools, it is useful to consider what insights the library hopes to gain through their use. For example, some products may suggest that you follow certain accounts on Twitter because they are highly influential in the number of people they reach when posting. The same product might suggest that you unfollow a person because they have few followers and thus less influence. There is no one-size-fits-all product for libraries, as the information sought will vary by institution, but developing a clear idea of what the library hopes to learn through analytics tools is a useful exercise prior to investigating options.

Two of the largest social media platforms used by libraries, Twitter and Facebook, have their own internal analytics, which are useful complements to third-party products. Twitter has an Application Programming Interface (API) that allows for querying and downloading of data for local analysis. In July 2014, Twitter released a new set of analysis tools (analytics.twitter.com) that conveys up-to-the-minute information. Beyond the typical reporting of retweets, favorites, and replies, Twitter Analytics offers helpful information such as how many people viewed a given Tweet (i.e., impressions), how many engagements it received (e.g., click-throughs on links, views of posted photographs), and breakdowns of this data by the hour. Facebook also has an analytics component called Insights, which is built into Facebook Pages (pages of entities such as libraries, rather than personal pages of individuals). Insights shows how many people viewed a post, how many people liked and shared it, and how many post links were clicked. Both Twitter and Facebook offer longitudinal views so that activity can be compared over time.

Third-party analytics tools offer additional perspectives on social media activity and the user community beyond native social media analytics tools.
For this study, the tools reviewed included SocialBro, ManageFlitter, BirdSong, and Commun.it. These tools were selected based on a literature and open web review of highly ranked tools in the general social media community. Each of these tools includes basic analytics, information about the accounts following and being followed, and engagement. Some commercial products offer a free account and then more features at an additional cost. Commun.it (www.commun.it) is recommended for its simplicity of use and its suggestions of accounts to follow based on your existing activity. SocialBro (www.socialbro.com) is also recommended for its detailed analytics. Both products have a modest tiered subscription cost structure. To accompany insights from these products, we also recommend using Google Analytics, the free standard web analytics software currently used by many libraries. It includes social channel integration for viewing a range of social-related web traffic, including social referrals and social-initiated user movements through a website.

Communication Plan
Our proposal included dissemination of the findings of our research through publications, presentations, and webinar training sessions. We have made significant contributions in each of these areas, and future publications are planned as our ongoing research is completed.

Publications
• Arlitsch, Kenning, Patrick OBrien, Jeff Mixter, Jason Clark and Leila Sterman. "Methods for Making IR Content Discoverable" (tentative chapter title in the forthcoming book Making Institutional Repositories Work), Purdue University Press, 2015.
• Mixter, Jeff, Patrick OBrien and Kenning Arlitsch. "Describing Theses and Dissertations Using Schema.org," Proceedings of the International Conference on Dublin Core and Metadata Applications 2014, Dublin Core Metadata Initiative: 138-146. http://dcevents.dublincore.org/public/dc-docs/2014-Master.pdf
• Arlitsch, Kenning, Patrick OBrien, Jason A. Clark, Scott W.H. Young and Doralyn Rossmann. "Demonstrating Library Value at Network Scale: Leveraging the Semantic Web with New Knowledge Work," Journal of Library Administration 54, no. 5 (2014): 413-425. DOI:10.1080/01930826.2014.946778
• Arlitsch, Kenning, Patrick OBrien and Brian Rossmann. "Managing Search Engine Optimization: An Introduction for Library Administrators," Journal of Library Administration 53, no. 2-3 (November 2013): 177-188. DOI:10.1080/01930826.2013.853499
• Young, Scott, Jason Clark, Patrick OBrien and Kenning Arlitsch. "Metadata First: Using Structured Data Markup and the Google Custom Search API to Outsource Your Digital Collections Search Index," Community Spotlight blog, Digital Library Federation, September 5, 2013. http://www.diglib.org/archives/5027/
• Arlitsch, Kenning and Patrick OBrien. "Our Relationship with Internet Search Engines," CLIR Issues no. 92, March/April 2013. http://www.clir.org/pubs/issues/issues92
• Arlitsch, Kenning and Patrick OBrien.
Improving the Visibility and Use of Digital Repositories through SEO: A LITA Guide. ALA TechSource, 2013. ISBN-13: 978-1-55570-906-8. http://www.alastore.ala.org/detail.aspx?ID=4256
• Arlitsch, Kenning and Patrick OBrien. "The Importance of Being Found," Informed Librarian Guest Forum, November 2012. http://www.informedlibrarian.com
• Arlitsch, Kenning, and Patrick S. O'Brien. "Invisible institutional repositories: Addressing the low indexing ratios of IRs in Google Scholar," Library Hi Tech 30, no. 1 (2012): 60-81.

Presentations and Training
• Rossmann, Doralyn and Scott W.H. Young. "Share and Share Alike: Applying Social Media Optimization (SMO) to Enhance Web Content and Connect with Users," LITA Forum 2014, Albuquerque, NM, November 8, 2014.
• Arlitsch, Kenning. "Access and Discovery" (panelist), New Media Consortium Virtual Symposium on the Future of Libraries, November 12, 2014.
• Clark, Jason A. "RDFa Markup, Schema.org, and DBpedia Topics: A Closer Look at the Holy Trinity of Structured Data and their Impact on the Findability of Digital Collections," Digital Library Forum 2014, October 28, 2014.
• Arlitsch, Kenning. "Does Google Know Us?" Webinar: Wikipedia and Libraries: Increasing Your Library's Visibility, OCLC Research Insight Series, October 21, 2014. http://www.oclc.org/research/events/2014/10-21.html
• Mixter, Jeff. "Describing Theses and Dissertations Using Schema.org," International Conference on Dublin Core and Metadata Applications 2014, Austin, TX, October 10, 2014.
• Clark, Jason A., Patrick OBrien, Scott W.H. Young and Kenning Arlitsch. "Search Engine Optimization (SEO) for Libraries" [workshop/course], July 17-23, 2014. http://www.ala.org/lita/search-engine-optimization-seo-libraries-workshopcourse
• Arlitsch, Kenning. "Wikipedia and Libraries: Increasing Your Library's Visibility" (with Cindy Aiden, Merrilee Proffitt, Jake Orlowitz, et al.), ALA Annual 2014, Las Vegas, NV, June 28, 2014.
• Arlitsch, Kenning, Patrick OBrien, Martha Kyrillidou and Ricky Erway. "Accuracy in Web Analytics Reporting on Digital Libraries," CNI Membership Meeting, Washington, D.C., December 9, 2013. http://www.cni.org/topics/assessment/f13-arlitsch-accuracy/
• Arlitsch, Kenning. "Search Engine Optimization: Why It Matters to Library Leaders," ILEAD USA, Utah State Library, Salt Lake City, UT, October 23, 2013. http://tinyurl.com/lmx8yxz
• Clark, Jason and Scott Young. "Metadata First: Using Structured Data Markup and the Google Custom Search API to Outsource Your Digital Collections Search Index," Digital Library Federation Forum, Austin, TX, November 4, 2013.
• Arlitsch, Kenning and Patrick OBrien. "Google Scholar and Institutional Repositories: Improving IR Discoverability," ACRL E-learning Webinar, June 6, 2012.
• Arlitsch, Kenning and Patrick OBrien.
"SEO for Digital Repositories."
  o Utah Library Association Annual Conference, Salt Lake City, UT, April 27, 2012
  o CNI Spring Membership Meeting, Baltimore, MD, April 2, 2012
  o OCLC TAI CHI Webinar, March 16, 2012. http://www.youtube.com/watch?v=190D6QCk2ok
  o CONTENTdm Users Group, American Library Association Midwinter Conference, Dallas, TX, January 23, 2012
  o Western Archival Network (IMLS planning grant) meeting, University of New Mexico, Albuquerque, NM, January 12, 2012
  o CNI Spring Forum, San Diego, CA, April 5, 2011
• Arlitsch, Kenning and Patrick OBrien. "Improving Institutional Repository Visibility in Google and Google Scholar," Digital Library Federation Forum, Baltimore, MD, October 31, 2011.