Information Discovery and the Future of Abstracting and Indexing Services: An NFAIS Workshop

Aug 6, 2013

Out and About: Reports of Information Industry Meetings
By Donald T. Hawkins

A common misperception in today’s era of widely available information of every type is that there is no longer any need for abstracting and indexing (A&I) services.  This NFAIS workshop in Philadelphia on March 15, sequel to a previous workshop on information discovery (see ATG, January 2013, p.66), made it very clear that A&I services still have a valuable role to play.  It began with a review of history and then explored these services’ role from a librarian’s view, looked at two emerging players, examined how existing A&I service providers are adapting to the current information environment, and concluded with some predictions for the future.

History, Mission, and Current Status of A&I Services

Bonnie Lawlor, Executive Director of NFAIS, traced the role of A&I services throughout history.  We are acutely aware that information overload is a significant issue today, but it is not new. Overload began as soon as printing presses started producing publications; in fact, the first journal (Journal des Scavans), published in 1665, contained abstracts of articles.  Formal abstract journals began in 1820 with the Pharmacopeia of the United States, and their number has grown exponentially since then.  According to an article published in Learned Publishing,[1] an estimated 1.5 million articles were published in 2009.  A&I services are an answer to this explosion of information.

During the 1950s, a huge increase in the number of journals significantly impacted A&I services.  Chemical Abstracts fell 3 years behind, for example.  The only way to cope was for the services to embrace computer technology, and they became very early adopters of it, which gave them an early competitive advantage in the digital age. Indirect benefits of computerization included production efficiencies, increased currency of content, and opportunities to develop new products. As a result of the move to electronic processing, business models changed:  print sales declined, and licensing of databases became more common. Electronic products are intangible, so product loyalty was hard to maintain.  Staff had to be retrained in new skills such as customer service, and markets became global.  And digital content requires an ongoing investment in system upgrades.

Price increases have resulted in the emergence of alternative publishing models, leading to today’s emphasis on open access publishing and the mindset that information should be freely available, which has been encouraged by funding agencies’ mandates that research results must be deposited in freely accessible repositories.

Searching behavior has dramatically changed, and now a search engine is the first choice of many users who rely on themselves to select results.  Many of today’s students therefore equate research with using Google.  Users now want convenience, linking, interactive search systems, easily discoverable supplemental material, and analytic tools, all packaged in a pleasurable and reasonably priced (or free) search experience. Mobile phones and social media have had similar far-reaching impacts on A&I services; delivery of information to hand-held devices has now become the norm.

In the face of these major changes, will A&I services survive?  As long ago as 2003, F.W. Lancaster asked this question and concluded, “the viability of a vast network as an information resource must depend upon the imposition of quality filters similar to those in a print-on-paper world.”[2]  A&I services will be those filters and will continue to have a strong role.  The ultimate decision lies with users; whatever serves their needs will go forward.

Role of A&I Services in Information Discovery: The Librarian’s Perspective

Lawlor’s opening presentation was followed by a session in which three librarians discussed how their users search for information and the role that A&I services play.  (Librarians are major users of A&I services.) Andrew Asher from Bucknell University conducted a study of 86 students who used a discovery service.  Students have become used to Google’s single search box, and they generally do only simple searches with one or two keywords.  If they do not find what they are looking for after a cursory evaluation of the search results, they tend to start another search with different terms rather than refining the one they began with.  Most students do not understand how searching works and tend to use poorer quality terms, assuming that if something was not found, it does not exist.

Students become very loyal to a tool that works for them, even if it does not contain the most appropriate databases for their search.  They like discovery services and trust them to retrieve the best results. They believe that the first 5 to 10 results are the best (or only) ones the library has to offer.  Developers and librarians must therefore be careful in setting up the defaults for discovery tools because of their effect on search results.  Detailed results of this survey will be published in a forthcoming article in the July 2013 issue of College & Research Libraries.

According to Chris Strauber from Tufts University, full text is ultimately the point of the search process, but how one gets there is also important.  A&I services can be superior resources because their indexes are compiled by people who understand the subject area, and the metadata is in the language of the discipline.  Discovery services are good for exploratory questions, but they have limited metadata. Indexes and abstracts add human expertise to what computers can do.  The browsing function is just as important as search, particularly for the humanities; lots of questions can be answered simply by browsing.

The final presentation in this session was a report on a white paper produced for Sage Publications and presented at the ALA Midwinter conference in 2012.[3]  It discussed best practices for access and discovery of content in libraries, as well as problems that libraries, publishers, and vendors need to solve.  Cross-sector collaboration is necessary in the discovery of scholarly content, and collaborative groups such as the NISO Open Discovery Initiative (ODI) are being formed to develop standards and best practices for pre-indexed library discovery services.  Linked open data has become very prominent, and the open metadata concept is growing in popularity.

Librarians and publishers can add value for learning by integrating their expertise into user workflows.  For example, Purdue University has created a “data services librarian” to help with grant writing and meeting funders’ requirements. But libraries and publishers have not provided a unified user experience because of different fulfillment options, metadata models, etc.  Legacy databases must satisfy users’ needs for content on a variety of devices.  A&I services cannot rest on their laurels and continue to depend on growth in their markets.  They must invent new ways of explaining their value proposition and of participating in the semantic web.

Emerging Players in Information Discovery

Representatives of two new players in the discovery services area, Molecular Connections and Mendeley, described their products.  Molecular Connections aggregates content from web sources into a coherent database.  It is the largest A&I company in India and operates in three major areas: mining, representation, and creation of content, particularly in biological and pharmaceutical areas.  Its product, MC-Outlink (see http://www.molecularconnections.com/publishing/en/home/publishing-offerings/mc-outlink) obtains relevant information that is dispersed across web sources, current news, videos, etc., in addition to published scientific content and creates a report in a standard format.  Jignesh Bhate, CEO, estimates that to gather all the relevant information related to a single drug would require 2 to 3 hours or more; thus, MC-Outlink can be a major time saver for researchers.

Jan Reichelt, Co-Founder and President of Mendeley, described how his service extracts data and full text information from a wide variety of sources, annotates it, and aggregates the information in the cloud, thus creating a social layer between people and their research interests. The most relevant content is then pushed to the researcher.  For groups of researchers, relevant articles can be sent to members of the group who can then discuss them in a manner similar to Facebook’s news feeds.  Information is anonymously aggregated and combined into users’ social environments without breaking their privacy.

What would happen if there were no boundaries around social sharing, so that information from the internet was made visible to others?  This might create additional revenue sources for information providers.  Value is created through sharing and embedding, enabling the community to connect.  Some services, such as kleenk.com and openSNP, have begun developing products using Mendeley’s data.

The A&I Services’ Perspective of the New Information Landscape

In a presentation entitled “A&I Services: Enhanced Relevance through Aggregation and Discovery”, Craig Emerson, VP, Publishing, ProQuest, said that much of the A&I business model has not changed, but it goes through times of disruption.  Indexing and tagging must be very good to deal with these times, which are caused by a rapid increase in the number of publications, open access content, personal repositories, and article-by-article publishing.  The A&I services must also compete in the face of published articles saying that Google Scholar is good enough for literature reviews.  Comprehensiveness is important; a recent article noted that the cost of missing an article could be up to 76 days and $10,000.

Because A&I services are the starting point for much academic research and remain the first choice of many researchers, they are holding their own and are still being heavily used.  Editorial relevancy adds significant value.  The changes in content such as new fields (companies, people, pictures, materials, document types, etc.) and the need for deep indexing of figures, tables, and datasets are challenging but necessary to add search precision.  Summarization is attractive to many users who do not have time to read complete articles and simply scan through them.  Video curation is also becoming more significant.

Libraries are widely recognized as a superior source of quality content, but there is a general lack of awareness of such resources. Discovery services turn the complexity of a library site with lists of databases into a Google-type approach with a single search box.  Discovery services and A&I services are serving two separate needs: A&I services provide precision discipline-specific searching for expert researchers, and discovery services provide quick access to full text.  Both approaches are necessary; however, convenience will always trump content. ProQuest’s Summon service has increased overall use of library resources significantly, and 60% of student users said that it “improved their ability to do research.”  It also had a major impact on usage of A&I databases.

Lynn Willis, Content Development Manager, American Psychological Association (APA), concurred with Emerson and described how similar steps have been taken to improve APA’s PsycINFO database.  New fields, such as Access URLs, DOIs, author email addresses, and cited references, were added.  To cope with the explosion of content, machine-aided indexing technology has reduced indexing time by about 50%, and new databases for grey literature, psychological tests, questionnaires, and computer programs were developed.

Roger Schenk, Content Planning Manager, Chemical Abstracts (CA), noted that CA is now almost exclusively delivered electronically and is rapidly moving into mobile delivery.  It currently covers 63 patent authorities and over 10,000 journals.  Much of the current literature growth comes from patents, and China is a major driver.  Challenges include currency, timeliness, budgetary stresses of customers, and competition from free government databases and search engines like Google.  A&I services must balance their challenges with those of their customers.

Content innovation and technology have significantly simplified scientific literature searching and have provided a new area of opportunity: evaluation.  So CA has added more context and relevance to every patent and journal article abstract.  Recently, graphical abstracts (chemical structures and illustrations) were added, as well as links to the full text.

Ryan Bernier, Director, Subject Indexes, EBSCO Publishing, emphasized that EBSCO’s business is subject indexes, and their full text products mostly began as A&I databases.  Subject indexes are a necessity, and the quality of searches depends directly on the quality of the indexes.

EBSCO is looking at several ways to index and abstract non-textual content because the use of additional materials such as datasets and images is growing.  The Associated Press is working with EBSCO to add images, graphics, and sound bites to the EBSCO Discovery Service (EDS). Open access journals are also being aggressively added, and more records are linking directly to freely available full text.

EBSCO does not contribute to any discovery service, which is standard practice for many subject index providers. It has its own discovery service, but that is only a small part of EBSCO’s business.  Customers of both EBSCO’s search service and EDS can opt for platform blending, making both services work seamlessly together. The main focus is to ensure that subject indexes thrive.

Information Discovery and Future Players

Carl Grant, Associate Dean, Knowledge Services, University of Oklahoma, concluded the workshop with an excellent presentation on the future of information discovery and the roles that A&I services might play in the future.  He listed some of the top trends in the information industry today:

  • Mobile apps will increase.
  • More and more data will be stored in the cloud.
  • Private enterprise app stores will appear and will exert more control over the data.  How will we interface with them?
  • The Internet of Things will emerge, with sensors in mobile phones. Not many apps are available yet in this area.
  • Single warehouses of big data will be abandoned in favor of silos of big data.
  • In-memory computing will enable real-time analysis and transactions.

Discovery interfaces have a large market share; only 11 ARL institutions are not using them.  Libraries have lost a lot of ground, and as the number of librarians decreases, the need for discovery will grow.  Delivery has become the core business of libraries, not discovery.  Our territory is being lost even as we think we are defending it.  An excellent analysis of discovery in the world of libraries by Lorcan Dempsey appeared in Educause Review and should be read by all librarians.[4]

Usage of mobile devices as access devices has now passed that of PCs; in fact, one author predicts that many of today’s young people will never own a PC because a tablet will be all they need. Is all of your content available in the cloud?  Do not underestimate the importance of the unbundling of education and the appearance of MOOCs (massive open online courses) as another point where your data must be available.  Content must be deliverable through a variety of platforms—HTML5, APIs, and web services.  Users are now constantly connected to the web; messages have become shorter; and so have attention spans.  Learning styles are 29% visual, 34% auditory, and 37% tactile, which has implications for content delivery.  Unfortunately, we tend to ignore all but the visual.

Change will continue, so what is the role of A&I services?  We must think bigger!  Some ideas:

  • Create learning courses out of abstracts so people can take a quick course from them.
  • Think about multiple languages.
  • For your content to be discoverable, you must provide support for APIs and web services so it can be widely integrated in numerous delivery platforms.
  • Index and abstract far more than just printed works.
  • Be sure to index open access journals—there is a huge move toward them.
  • Realize the amount of content created by individuals—it will account for nearly 70% of the digital universe in the near future according to IDC.
  • Build user profiles so you can deliver services based on their needs.
  • Video is growing rapidly.
  • Add datasets into current practices.
  • Give students a pathway to deep content by indexing deeper into the web.
  • Think mobile.  Base delivery of information on sensors in mobile phones. Don’t try to dumb down the interface and just squeeze everything on to a smaller screen.
  • If you are not directly facing the user, make sure that your APIs can do that.  Indexing is the key for filtering.
  • Find your unique value contribution—the days of the average are over.[5]
  • How can all these enhancements be done without hiring all the necessary staff?  Some libraries have enlisted users to help in content creation:  creating tags, for example.
  • China is coming at us like a freight train!  They are starting to build digital libraries from the ground up.

 

Donald T. Hawkins is an information industry freelance writer based in Pennsylvania. He blogs the Computers in Libraries and Internet Librarian conferences for Information Today, Inc. (ITI) and maintains the Conference Calendar on the ITI website (http://www.infotoday.com/calendar.asp). He holds a Ph.D. degree from the University of California, Berkeley and has worked in the online information industry for over 40 years.



[1] Jinha, Arif E., Learned Publishing 23(3): 258-263 (July 2010) (available at http://alpsp.publisher.ingentaconnect.com/content/alpsp/lp/2010/00000023/00000003/art00008)

[2] Lancaster, F.W., “Does Indexing and Abstracting Have a Future?” Anales de Documentación, No 6, 137-144 (2003).

[3] “Improving the Discoverability of Scholarly Content in the Twenty-First Century,” http://www.sagepub.com/repository/binaries/librarian/discoverabilitywhitepaper

[4] Dempsey, Lorcan, “Thirteen Ways of Looking at Libraries, Discovery, and the Catalog: Scale, Workflow, Attention,” EDUCAUSE Review Online, January/February 2013.  Available at http://www.educause.edu/ero/article/thirteen-ways-looking-libraries-discovery-and-catalog-scale-workflow-attention

[5] Friedman, Thomas L., “Average is Over,” The New York Times, Editorial, January 25, 2012.  Available at http://www.nytimes.com/2012/01/25/opinion/friedman-average-is-over.html?_r=0, and “Average is Over, Part II,” August 7, 2012, available at http://www.nytimes.com/2012/08/08/opinion/friedman-average-is-over-part-ii-.html.
