Column Editor: Donald T. Hawkins (Freelance Conference Blogger and Editor)
Column Editor’s Note: Because of space limitations, an abridged version of my report on this conference was published in the print issue of volume 27 #6 on page 67. A PDF of the print article can be downloaded here.
The National Information Standards Organization (NISO, http://www.niso.org) held a Forum on the future of discovery services on October 5-6 at Johns Hopkins University’s beautiful suburban Mount Washington Conference Center in Baltimore, MD. There were about 100 attendees at this Forum as well as a number who attended via a live stream.
NISO’s Discovery to Delivery (D2D) Committee had commissioned a white paper by Marshall Breeding, an independent library consultant, on the future of discovery services, which formed the basis for the Forum.
NISO White Paper
Breeding opened the Forum with a summary of his white paper, “The Future of Library Resource Discovery” (available at http://www.niso.org/apps/group_public/download.php/14487/future_library_resource_discovery.pdf), which has proven to be very popular, with over 35,000 downloads since it was published in February 2015.
Discovery has come a long way since the publication of simple lists of volumes held in a library. Online catalogs appeared around 30 years ago, and some still exist. Web-based index discovery became available in 2009. Now,
- Libraries make large investments in content; discovery helps them leverage those investments (which continue to increase as prices rise).
- Discovery services have increasingly become part of the infrastructure of most academic libraries, although some libraries have opted out and rely on Google Scholar and other general search engines.
- Discovery services have a central index of a broad range of content, and a delivery system that provides relevant results to users in several different ways.
- The interfaces have improved over time, but they are not perfect.
- Libraries still have problems with access to materials not included in the central indexes.
- Coverage of material from abstracting and indexing (A&I) services is still an issue. It is important to understand how discovery services incorporate these resources.
The state of the art continues to advance:
- Non-textual material is beginning to appear in discovery systems.
- Relevancy is improving as a result of more sophisticated search and retrieval technology, but improved and more transparent relevancy rankings are still needed.
- Socially-powered discovery (i.e., incorporating usage data in the search engines) is starting to appear.
- Scholarly communications are shifting rapidly towards open access (OA) content. Access to such content by users not affiliated with a university can be difficult. So far, no OA discovery indexes exist—and they may never exist because of the large resources necessary to create and maintain an index of billions of items from millions of providers.
- Gaps still remain in indexed content, especially for non-English language materials. Better support for multi-lingual search is still needed.
- Special collections and archives are valuable to libraries and need to be exposed in broad-based discovery systems.
- Linked data is a major trend, but many sources cannot be treated with linked data because they are proprietary.
- Interoperability of discovery services with learning management systems is needed.
Trends beyond index-based discovery will grow. Better search and retrieval technologies will appear outside of the library world and will have an impact; discovery vendors need to think about how to incorporate these advances to improve access to all forms of content. Most users typically do not start their research with a library’s website or discovery service, so discovery must become part of the general information infrastructure.
Breeding closed on an optimistic note, saying that discovery services will remain one of the essential components in libraries, and investments made in them will provide an entrance into future phases of the information infrastructure. We are now at a critical point; the current systems may be only an interim step in discovery. It is important for the stakeholders to engage cooperatively in defining the future. Breeding recommended that the next development phase of discovery include improving participation from the A&I providers, improving data exchange mechanisms through an increase in the quality of the metadata, and enhancing interoperability with resource management systems. Opportunities for discovery are directly dependent on the future of scholarly publishing and communication.
Vendor Panel Discussion
Breeding’s keynote was followed by a panel discussion by representatives of the four major discovery services. Scott Bernier, Sr. Vice President of Marketing at EBSCO, wondered how we can optimize the value of our resources. Discovery is about a library’s entire collection and all of its users and is at the center of everything. It is critical that a user’s first experience with a system be successful. EBSCO’s goal is to surface the right content to the right user at the right time using precision, relevancy ranking, and indexing technologies; its system design principles include extensive and reliable coverage, democratic delivery and access regardless of the source of the resources, and designing an experience that makes research easier and seamless. When the right item is found, it must be delivered to the user with the library’s goals in mind.
Steve Guttman, Senior Director of Project Management, ProQuest, said that design principles for its discovery product, Summon, include:
- Democratic discovery: guiding the user to the best products regardless of their source,
- Transparency: understanding why results were obtained, and
- Fairness: allowing each piece of content to have an equal chance of being found in a search.
The same item can have different metadata if it comes from different sources, and the record with the richest metadata is the one seen most frequently by discovery service users. ProQuest enriches the metadata from each provider using a “match-and-merge” technology, creating a merged record from duplicates and combining the metadata. The merged record is the one that is indexed; it points to all of the original duplicate records, ensuring content neutrality. ProQuest is committed to the Open Discovery Initiative (ODI) to ensure collaboration with all content providers, democratic discovery with fair and unbiased indexing, and full transparency and detailed disclosure.
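The match-and-merge idea described above can be illustrated with a minimal sketch. This is not ProQuest's implementation; the record classes, the use of a single `match_key` (such as a DOI) to detect duplicates, and the "first non-empty value wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    """One provider's record for an item (hypothetical structure)."""
    provider: str
    match_key: str   # assumed duplicate-detection key, e.g. a DOI
    metadata: dict


@dataclass
class MergedRecord:
    """The single record that would be indexed for a group of duplicates."""
    match_key: str
    metadata: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)  # pointers back to the originals


def match_and_merge(records):
    """Group records by match key and combine their metadata."""
    merged = {}
    for rec in records:
        target = merged.setdefault(rec.match_key, MergedRecord(rec.match_key))
        target.sources.append(rec.provider)
        for name, value in rec.metadata.items():
            # Assumed rule: keep the first non-empty value seen for each field.
            if value and not target.metadata.get(name):
                target.metadata[name] = value
    return list(merged.values())


records = [
    SourceRecord("provider_a", "10.1000/example",
                 {"title": "Sample Article", "abstract": ""}),
    SourceRecord("provider_b", "10.1000/example",
                 {"title": "Sample article", "abstract": "A short abstract.",
                  "subjects": ["discovery"]}),
]
merged = match_and_merge(records)
```

In this sketch the merged record carries the union of the metadata fields and lists both providers in `sources`, which mirrors the point that the indexed record points back to every original duplicate.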
Mike Showalter, Executive Director, End-User Services, OCLC, said that with 347 million records, OCLC represents the collected holdings of everyone. Its WorldCat discovery service contains over 1.9 billion electronic, digital, and physical items from all major publishers.
Ido Peleg, Vice President, Solutions and Marketing, ExLibris, said that today’s systems are mobile, personalized, and explorative, and responsive design is necessary. Typically, libraries define explorative systems as those in which item titles are known. We need to understand users and how they use content, which can be derived from analytic data. Linked data will allow us to provide resources from all sources. Peleg cited the example of Lego as a modern company that interacts with its customers; on its ideas website (https://ideas.lego.com/), people can suggest new sets they would like to see created.
Following their presentations, the panelists were asked to discuss three questions:
- How is your organization narrowing the gap between content participation and those not participating?
  - Services are created differently and leverage data in different ways. We need to understand the different goals of those not participating. ODI has brought the awareness of discovery to the public and content providers.
  - It takes a lot of work to build indexes; we need to decide who we want to work with and which content is most important to get into the database.
  - We must move down the long tail. Many small publishers have never heard of discovery systems.
  - The vendors do not always know what is right; there are hundreds of options available in the systems.
  - Partnerships are critical. Building discovery systems is a very ambitious undertaking, and it is not surprising that there are only a few organizations with the resources to be able to do it.
- Does your organization have a use for linked data, and how will you use it in a discovery system?
  - We should be asking about how to bring improvements into the search process, and the answer might or might not involve linked data.
  - We cannot expect each library to undertake the task of creating the linked data. The concept of “deploy once and everyone use” is important.
  - There is a great demand in the commercial world to know about library holdings. Everything focuses on solving the end user’s problem. In some places, linked data is very helpful.
- Are you making discovery your primary product and are your products available in smaller packages?
  - OCLC focuses on a modular approach to retrieving specific content. It has 24 APIs and tries to cooperate with users as much as possible to make it easy for readers.
  - All of ProQuest’s content is now exposed through Google Scholar, so it can be accessed by students whether they access it through the library’s website or not.
  - The main thing is whether we solve the user’s need. We must build products with an eye towards flexibility and let the user take control of the products they have.
  - We need to provide a clear interface for all types of users.
“A Billion Lessons Learned”
Karen McKeown, Director, Product Discovery at Gale Cengage Learning, titled her talk “A Billion Lessons Learned on Ways to Make Discovery Better”. She noted that Gale was one of the first users of library discovery services. Students value the library; in a recent survey, nearly 90% of them said that the library provided value to them and 75% said they wished they had taken more advantage of the library and its resources. But 70% of them said that they do not ask campus librarians for help with their assignments. To address this problem, the “MindTap” app (http://www.cengage.com/mindtap/), which combines library resources with tools to make courses more engaging, was developed. MindTap has been very popular; one system reached 90% usage in a few weeks. Gale’s content is also being integrated into the Google Classroom education app (https://classroom.google.com) so that resources retrieved through a discovery system can be saved directly to Google Drive.
McKeown said that the lessons learned are described by the “4 Cs”:
- Content: Reaching full coverage of all databases is not easy.
- Coverage varies across partners.
- Communication must be open and visible; partnership lists should be available on systems’ websites.
- Collaboration and continuous improvement are important.
The top issues in discovery systems are linking and metadata. Direct linking improves the user experience, so Gale developed a short “Gale Direct Link” to facilitate it, which has been very popular. McKeown also suggested that discovery must be approached from a broad perspective: the right place (library, classroom, open web) and the right time. In 2015, Gale is issuing KBART title lists to replace all those it previously provided, which will improve the user experience. (KBART, “Knowledge Bases and Related Tools”, is a NISO initiative of specifications for title lists.)
Gregg Gordon, President, Social Science Research Network (SSRN), discussed serendipitous discovery, a topic on which he has written in ATG (Vol. 22, Issue 4, page 18, Fall 2010). Serendipitous discovery facilitates finding information that the searcher did not previously know existed. SSRN levels the playing field by providing a platform for authors around the world to publish their work, even if it has not been peer reviewed. Submissions have come from 153 countries. SSRN subscribers receive alerts to new content and web access to its entire database.
Because vendors have a lot of content and expose it in different ways, researchers in countries such as India are using the platform to share information and are finding it outside of traditional access paths. Authors can revise and update their articles based on feedback from readers; on some days, SSRN receives more revisions than new submissions. Scarcity of information is no longer an issue; we are flooded with it.
Citations are an excellent metric for finding what is important; many people look at them first to decide whether they want to read an article or not. Relationship methodologies allow readers to see how articles link together. One can find classics, papers by experts, what’s hot, or just browse through and discover items by serendipity. Gordon suggested that perhaps scholarly research has a bad user interface; we may be too focused on searching and discovering rather than on the data.
A Publisher’s Long-Term Commitment to Improving Discovery Services
IEEE, the world’s largest technical membership organization (it has over 415,000 members in 160 countries), publishes 174 journals, over 1,400 titles of conference proceedings, over 800 product and technology standards, over 300 educational courses, and 3 e-book collections. Julie Zhu, Discovery Services Relations Manager, said that IEEE was among the first publishers to become ODI compliant: it sends its records to all four discovery service providers.
A publisher’s tasks are to generate metadata and full-text feeds of its content and send them to repositories, send Digital Object Identifiers (DOIs) to CrossRef, generate title lists, and send the data to vendors’ knowledge bases. The workflow is very complex (see the flow diagram below) and cannot be done by one person; Zhu needed a 9-person group to handle it.
IEEE fully cooperates with the discovery service providers through a program of site visits and participation in many industry activities (conferences, etc.). As a result of these efforts, many gaps in IEEE’s content were identified and filled, over 6,000 missing DOIs were added, 31 KBART title lists were compiled, and many ISBNs and ISSNs were corrected. After these improvements were completed, the complete database was re-delivered to the providers. A program of library visits was recently launched to identify issues, train librarians, and conduct user surveys.
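Identifier cleanup of the kind mentioned above lends itself to simple automated checks. As a minimal sketch (not IEEE's actual workflow, which the talk did not detail), the function below validates an ISSN using the standard modulus-11 check digit: the first seven digits are weighted 8 down to 2, and a check value of 10 is written as “X”. The function name and the sample values are illustrative only.

```python
def issn_is_valid(issn: str) -> bool:
    """Return True if the string is a well-formed ISSN with a correct check digit."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8:
        return False
    body, check_char = chars[:7], chars[7]
    if any(c not in "0123456789" for c in body):
        return False
    if check_char not in "0123456789X":
        return False
    # Weighted sum of the first seven digits, weights 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(body, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return check_char == expected


title_list_issns = ["0378-5955", "2434-561X", "0378-5954", "1234"]
invalid = [i for i in title_list_issns if not issn_is_valid(i)]
```

Run against a title list, a check like this flags entries whose check digit does not match or whose format is wrong, which can then be reviewed by hand before the list is re-delivered.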
IEEE’s future plans include:
- Deepening relationships with discovery service providers by providing coverage analysis tools, tracking usage, and ranking results,
- Improving metadata and content delivery, and
- Deepening relationships with libraries by conducting more workshops and webinars and undertaking collaborative research projects.
Where Do We Go From Here? Assessing the Value and Impact of Discovery Systems
Michael Levine-Clark, Professor, University of Denver Libraries, and Jason Price, Director of Licensing Operations, Statewide California Electronic Library Consortium (SCELC), said that libraries’ goals differ widely and include:
- Improving the user experience and providing a Google-like experience,
- Providing one-stop shopping for many resources, primarily articles and books, in all disciplines,
- Replacing the OPAC,
- Reducing the number of individual A&I databases to which they subscribe, and
- Increasing the number of users starting their research with the library’s resources.
Levine-Clark noted that referrals to a publisher come from discovery services, resolvers, database searches, and OPACs, which raises several questions:
- Are those that lead to full text use more important than others?
- Are users finding and accessing more relevant content?
- Now that libraries have installed discovery services, are users who had previously given up on the library motivated to return?
- Is Google Scholar a viable alternative to a library discovery system?
Narrower and more analytic questions might be:
- To what extent do a library’s e-resource management and linking configurations limit discovery system effectiveness?
- Does the configuration of a discovery system affect the user experience? (One difficulty in answering this question is that libraries often reconfigure their services gradually based on usage, so it is difficult to link a specific configuration to its impact on usage.)
- Is there a future of search and content for libraries?
- Is OA a threat to libraries?
- Should discovery be ceded to Google Scholar?
Future of Resource Discovery from a UK Perspective
The second day of the Forum opened with a keynote presentation by Neil Grindley, Head of Resource Discovery at Jisc (formerly the Joint Information Systems Committee, JISC), who discussed the future of resource discovery from a UK perspective. He began by noting that a huge amount of work is involved in compiling the indexes of a discovery service, and discovery ends up being more about data than resource discovery.
Jisc provides the network backbone for about half of UK universities and colleges and works with networks and technology, digital resources, advice and engagement, and research and development. It receives about 80% of its funding from the government and research councils and the rest from universities and colleges. Because of Jisc’s coordination activities, UK libraries tend to be more collaborative and willing to share data than US libraries. Consortia are more numerous, more embedded, and more mature in the US than the UK, and more US libraries are using and supporting open source software. Some US libraries are far ahead of those in the UK in terms of implementing discovery systems because they have more resources.
Here are some of the issues that Grindley sees with a “one-stop shop”:
- How much can we make available in one place?
- How do we convert information into knowledge? Does it reflect the user journey?
- Can users get to the appropriate item if they access the discovery service by different routes? We need to find out more about how tools suit searcher behavior, get more quantitative information so that the experience can be more personalized, and make content discoverable where people are actually searching.
- The overriding concern is data quality.
In an academic library staff survey of the use of IT systems conducted in 2015, users said they use general library search systems, Wikipedia, popular websites such as Flickr and YouTube, and an A-Z list of databases and resources. They do not use union catalogs, A&I databases, full-text databases, or reading lists.
Trends and research in scholarly discovery behavior:
- Should libraries play a role in discovery? They tend to overestimate the extent to which users understand the library concept, tools, and even basic bibliographic formats and relationships.
- Online activity is pervasive across all age groups and categories of users.
- While some are looking for ways to make library services more effective, others are challenging the idea that libraries should play a role in discovery.
- More could be done to ensure seamless access across services.
- There is a developing focus on understanding what library and alternative discovery tools each do well.
Major areas of concern to UK academic libraries include print and collection management, collaboration to reduce duplication, data quality, metadata, and persistent identifiers. Libraries should refocus on collections management.
Jisc is not in the business of competing with discovery services. Libraries want to demonstrate their value and want their holdings to be broadly discoverable. Openly licensed metadata will enable discoverability across the information ecosystem. New emerging trends for discovery include:
- Specialized apps for discovery,
- Streaming services similar to music discovery systems,
- Increasing demand for access via mobile devices,
- A hidden economy of user-curated scholarly discovery,
- Rapidly changing online trends of social media usage, and
- Next generation expectations for search.
The Who, What, When, Where, and Why of Library Discovery
Wearing his jester’s hat, Peter Murray, Library Technologist and blogger at The Disruptive Library Jester (http://dltj.org/), asked what a discovery layer might look like five years from now and showed a video clip (https://www.youtube.com/watch?v=KkOCeAtKHIc) of a recent ad for Amazon’s new Echo system (http://www.amazon.com/Amazon-SK705DI-Echo/dp/B00X4WHP5E), a voice-activated command and information system, which is one form of discovery. Murray asked the audience to consider how present-day discovery services are different from Echo.
Who is our most challenging person to support? Do they know how to navigate the web? Operate a mouse? Understand user interface clues? Do they have a speech, mobility, or visual impairment? Can they even form the question they are asking? The people we want to serve with our discovery layers have a wide range of skills and knowledge. Is there any way for us to get that context?
The “what” should be rooted in the tradition of the reference interview: find the answer or provide instruction on how to find the answer. Reference interviews guide the user through a maze of information. Do our discovery layers lead the user to the answer or are they just mimicking the single search box? (In the Amazon Echo video, most of the information provided came from simple factual questions with simple answers.)
Do we envision black cylinders in an office, on the reference desk, or in a dorm room, like the Echo? Discovery services do not need to be on a watch. They must be responsive to mobile interfaces, but users do not normally conduct in-depth research on mobile devices. Can we integrate the layers into the labs, performance spaces, etc. where the user could have a question to which they are seeking the answer? That is more important than the mobility of a watch.
When do undergraduates do their research? Contextual clues available to the discovery layer could include time of day, time of year, or day of week, so that it could ask whether the user is just looking for the three best articles or doing an in-depth study. These are signals; Google uses over 200 signals when a user does a search so that it can tailor the results to their needs.
“Why” is a special signal and requires special handling. It has significant privacy implications; for example, we do not like to be followed by ads after asking a question. Libraries must respect user privacy. What can we infer from the questions users have asked over the past month? The “why” signal distinguishes discovery services from Amazon Echo, Siri, and other personal assistants.
Maybe some of the ideas discussed at the Forum will make a real difference in the discovery layers and related services used by our patrons. Here are some comments that Murray found significant:
- You should not have to educate your user but if you could get better results after five minutes of training one of them, what would you do?
- Embedded librarians should not be thinking about competing with Mendeley, Google, etc. We should be working with those services for the benefit of our users.
- We should spend effort on realizing where users are when they want more information. How useful are discovery services for our students?
- Links for searching Wikipedia or Google are on many websites. Why don’t we have one for searching the library’s resources? Users should not need to go to a library and set up access to the discovery service before using it. (For example, the link to Wikipedia from within the Digital Public Library of America website works very well.)
- Think hard about what young people are doing when they’re on Instagram, etc.
- Where do electronic resources turn up in the electronic health record?
- Can we construct a “privacy when desired” feature or have a “do not track me” button for some searches? Privacy is important, but users expect libraries to use their personal data in processing their searches.
- Walking through the stacks is great serendipitous browsing, but we must not forget that there are always books not in the open stacks.
- How do we learn what users want while retaining serendipity? What is the balance between serendipity and finding the answer that the user wants? Do we risk alienating users if the system allows for serendipity but then gives them things they don’t want? We need to broaden our idea of what serendipity means and expand beyond the idea of libraries as holders of monographs, serials, and other materials.
- Librarians have mixed needs in discovery. Quality discovery user interfaces do not always result in increased usage. How do we measure the value of our systems? Is rising usage good or bad? How do we answer the question “Did the user find what they needed?”
A conscious effort was made by the Forum organizers to stimulate group discussions of the presentations, so two “roundtable discussion” periods were included on the program. Here is a brief summary of the topics discussed, taken from the reports presented by each group.
- Standardized vocabularies. As we put more content together in a single environment, users and producers face more vocabularies that have to work well together. There are no standards for doing this.
- Direct transfer of bibliographic data to discovery systems. Currently, discovery service vendors must produce XML data for discovery systems and MARC records for OPACs: a duplication of effort. We should get rid of the OPAC and use the discovery service as the source for all information. The traditional library will become an asset management system, and the discovery system will become the way the users find all types of content.
- Interoperability of non-textual data, specifically sound and images. Many universities have collections and archives of images, music scores, sounds, etc. that have little textual data. Metadata for these items, which may be incomplete and hence will not work well as a finding aid, must be cleaned up before we can think about making it available in discovery systems. No single library will have the capacity to do the cleanup work, but collaboration may provide a way for them to do it.
- OA discovery. We need to find a way to easily figure out what an OA platform has in it and how open it actually is. Could NISO develop a means to judge quality in institutional repositories?
- Coexistence of linked open data and vendor business models. Linked open data cannot coexist with vendor business models. Publishers can make money even if they are giving their data away at no cost by means of ads, partially linked open data (only for things not central to the business model), or a “gold” model to pay for the linked open data (as Getty does, for example). Publishers want to monetize their product, and if we are working with truly open data, we cannot avoid this fact.
- Discovery of unique library data in multiple formats and making it permanent. Items without metadata may appear at the bottom of the search results, which may be overcome by making systems interoperable and deciding how to deal with archival materials. Today’s discovery services are biased towards textual content. Anything a publisher produces that is not an article or a book is very difficult to get into a discovery system.
- Metadata quality. Transparency is the theme. Many content providers are not following standards. Is there a better way to create the metadata?
- Personalization. If everybody personalizes items, they will get trapped in their own little world and miss connections between disciplines. How can we build in randomness and serendipity? Why can’t discovery systems use student data like the registrar’s office does?
- Improving relevancy ranking in discovery systems. There is much diverse content available, and good relevant results are important to users. All systems have a baseline ranking algorithm based on clicks on results, downloads of articles, and user ratings. Other ranking criteria could include campus borrowing history, citations and links to related content, “best bets” or human-generated “trigger words”, and expert systems that prompt users for information about themselves by asking qualifying questions, then narrow the results based on their responses. With all of these approaches, users must be able to opt in or out, and the system must be transparent about how it uses their information to change the results. Metadata quality is very important.
- Library relevance in a world of Google Scholar and Mendeley. What are we doing, for what end, and why are we doing it? Librarians will become ethnographers of our users, and we need a deeper understanding of them. We should demystify library use. Google Scholar might be a model, but only two or three of Google’s employees work on Scholar—it is a passion project and could disappear tomorrow! NISO should conduct a symposium on this.
- Non-textual discovery using a text-based interface. What are the ways to find these materials with today’s interfaces? What is the metadata for non-text materials? Is the metadata for them being adjusted to the discovery system? We must discuss alternative ways of searching, know the users, develop unique identifiers, and establish interoperability between the current discovery systems and other types of systems.
- Understanding identifiers across systems. When publishers issue data, they also supply identifiers. How do we track metadata across ecosystems? How do we identify versions? NISO could develop a versioning standard for DOIs and ISBNs. Who is responsible for ensuring metadata quality? What is a minimal set of metadata? Are facts about an object copyrightable? Some A&I producers work on the premise that basic metadata cannot be copyrighted. The Digital Public Library of America has required that all metadata be governed by a CC0 license.
- Discovery system interoperability with internet systems. Interoperability allows systems to interact with each other. Is it practical to interoperate many systems? Will that work practically in libraries? The more systems we try to interoperate, the harder the task. Synchronizing knowledge bases is very hard because there is no standardization of names of resources, and major vendors’ knowledge bases are each structured differently. Is the knowledge base a commodity? Knowledge bases are for knowledge management, not access management.
- User interfaces. We want vendors to tell us how they do their relevancy ranking, but that probably will not happen because it is proprietary. NISO should create a “sandbox” or forum so that people could test different interfaces. Vendors could put their ideas into the sandbox and let users play with them so that libraries can give feedback to the vendors on what is most useful.
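The relevancy-ranking topic in the roundtable list above (a baseline built on clicks, downloads, and ratings, with optional personal signals that the user can opt out of and inspect) can be sketched in a few lines. This is purely illustrative and does not describe any vendor's algorithm; the `Result` fields, the weights, and the use of a borrowing-history flag as the personal signal are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """A search result with hypothetical ranking signals."""
    title: str
    text_score: float                    # match between query and record
    clicks: int = 0
    downloads: int = 0
    rating: float = 0.0                  # average user rating, 0 to 5
    matches_borrowing_history: bool = False

# Illustrative weights only; real systems tune these very differently.
WEIGHTS = {"text": 1.0, "clicks": 0.01, "downloads": 0.02,
           "rating": 0.1, "personal": 0.5}


def explain_score(result, personalize):
    """Return (total, parts) so the contribution of each signal is visible."""
    parts = {
        "text": WEIGHTS["text"] * result.text_score,
        "clicks": WEIGHTS["clicks"] * result.clicks,
        "downloads": WEIGHTS["downloads"] * result.downloads,
        "rating": WEIGHTS["rating"] * result.rating,
    }
    # The personal signal is applied only when the user has opted in.
    if personalize and result.matches_borrowing_history:
        parts["personal"] = WEIGHTS["personal"]
    return sum(parts.values()), parts


def rank(results, personalize=False):
    """Order results by total score, highest first."""
    return sorted(results,
                  key=lambda r: explain_score(r, personalize)[0],
                  reverse=True)


results = [
    Result("Popular survey", text_score=1.0, clicks=10, downloads=5, rating=4.0),
    Result("Niche monograph", text_score=1.2, matches_borrowing_history=True),
]
baseline = [r.title for r in rank(results)]
personalized = [r.title for r in rank(results, personalize=True)]
```

Returning the per-signal breakdown alongside the total is one simple way to support the transparency the group asked for: the interface can show why a result moved when personalization is switched on or off.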
Slides from the Forum presentations are available at http://www.niso.org/news/events/2015/October_discovery/agenda_discovery_forum/#agenda.
Donald T. Hawkins is an information industry freelance writer based in Pennsylvania. In addition to blogging and writing about conferences for Against the Grain, he blogs the Computers in Libraries and Internet Librarian conferences for Information Today, Inc. (ITI) and maintains the Conference Calendar on the ITI website (http://www.infotoday.com/calendar.asp). He is the Editor of Personal Archiving: Preserving Our Digital Heritage (Information Today, 2013). He holds a Ph.D. degree from the University of California, Berkeley and has worked in the online information industry for over 40 years.