Earlier this year, researchers from the University of Southampton unveiled a new 5D optical data storage system “capable of surviving for billions of years, with each disc holding 360TB/disc data capacity.” In “opening a new era of eternal data archiving,” the new nanostructured glass system is described as “a very stable and safe form of portable memory” that would be of value to “national archives, museum and libraries, to preserve their information and records.” A capacity of 360 terabytes is equivalent to approximately the full text and contents of 5 billion books. The discs are expected to last for a “virtually unlimited lifetime” at room temperature, or for 13.8 billion years at 190 degrees Celsius. At this rate, it may soon be possible (if not probable) that all information will be easily available …and for eternity.
In a recent New York Times article, George Washington University Law Professor Jeffrey Rosen noted that “the web means the end of forgetting,” leaving us with the problem of finding out “how best to live our lives in a world where the internet records everything and forgets nothing—where every online photo, status update, Twitter post and blog entry by and about us can be stored forever.”
“For some technology enthusiasts,” Rosen continued, “the web was supposed to be the second flowering of the open frontier, and the ability to segment our identities with an endless supply of pseudonyms, avatars and categories of friendship was supposed to let people present different sides of their personalities in different contexts. What seemed within our grasp was a power that only Proteus possessed: namely, perfect control over our shifting identities.” However, that is not what has happened.
Forgetting and Forgiving
Today people are finding that even innocent Facebook pictures or Twitter comments can make a job search more challenging. Erroneous information can follow people throughout their lives. The concern is so great that in Europe various “Right to Be Forgotten” laws have given individuals a process for having such information erased from search results.
In his book Delete: The Virtue of Forgetting in the Digital Age (Princeton University Press, 2010) Viktor Mayer-Schönberger asks his readers “should everyone who self-discloses information lose control over that information forever, and have no say about whether and when the internet forgets this information? Do we want a future that is forever unforgiving because it is unforgetting?”
“But the demise of forgetting has consequences much wider and more troubling than a frontal onslaught on how humans have constructed and maintained their reputation over time. If all our past activities, transgressions or not, are always present, how can we disentangle ourselves from them in our thinking and decision-making? Might perfect remembering make us as unforgiving to ourselves as to others?” Mayer-Schönberger asks. “We know and assume that search engines know a great deal of the information that is available through web pages on the global Internet. Over the years, such easy-to-use yet powerful searches have successfully uncovered information treasures around the globe for billions of users.”
Operating Behind a Curtain of Proprietary Rights
It’s worthwhile to pause and consider the most recent statistics on web searching. As of October 2015, Google alone handled a monthly average of more than 100 billion searches, which works out to roughly 2.3 million searches per minute (about 38,000 per second). As of May 2016, 60 trillion web pages were indexed through this single web index alone. Google’s search index data is estimated at 100 million gigabytes. No previous reference indexing system in human history offers a meaningful comparison.
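A quick check of the arithmetic behind these figures, offered as a back-of-the-envelope sketch: spreading 100 billion monthly searches across an average month gives roughly 2.3 million searches per minute, or about 38,000 per second. The average month length used here is an assumption for illustration, not a figure from Google.

```python
# Back-of-the-envelope conversion of a monthly search total into rates.
# SEARCHES_PER_MONTH comes from the figure cited in the article;
# DAYS_PER_MONTH is an assumption (average Gregorian month length).
SEARCHES_PER_MONTH = 100e9
DAYS_PER_MONTH = 365.25 / 12


def rate_per(seconds_in_unit: float) -> float:
    """Return the average number of searches in a window of the given length."""
    seconds_per_month = DAYS_PER_MONTH * 24 * 3600
    return SEARCHES_PER_MONTH * seconds_in_unit / seconds_per_month


per_minute = rate_per(60)
per_second = rate_per(1)

print(f"{per_minute:,.0f} searches per minute")
print(f"{per_second:,.0f} searches per second")
```

The per-minute result lands near 2.3 million, which suggests that the commonly quoted 2.3 million figure describes searches per minute.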
In a recent article in US News & World Report entitled The New Censorship, the news magazine reported that “Google, Inc., isn’t just the world’s biggest purveyor of information; it is also the world’s biggest censor. The company maintains at least nine different blacklists that impact our lives, generally without input or authority from any outside advisory group, industry association or government agency. Google is not the only company suppressing content on the internet. Reddit has frequently been accused of banning postings on specific topics, and a recent report suggests that Facebook has been deleting conservative news stories from its newsfeed, a practice that might have a significant effect on public opinion—even on voting. Google, though, is currently the biggest bully on the block.” With these companies refusing to allow even researchers access to their algorithms and other key structural information on their search engines, libraries (which face severe financial problems today) need to be even more vigilant about the reliance on Google and Google Scholar as alternatives to traditional scholarly databases.
Mayer-Schönberger cautions that behind the curtain of the search page, “search engines remember much more than just what is posted on web pages… By keeping the massive amount of search terms—about 30 billion search queries reach Google every month—neatly organized, Google is able to link them to demographics. … Details we have long forgotten, discarded from our mind as irrelevant, but which nevertheless shed light on our past: perhaps that we once searched for an employment attorney when we considered legal action against a former employer, researched a mental health issue, looked for a steamy novel, or booked ourselves into a secluded motel room to meet a date while still in another relationship. Each of these information bits we have put out of our mind, but chances are Google hasn’t. Quite literally, Google knows more about us than we can remember ourselves.”
A Role for Information Professionals
The information age has opened up a can of worms, even for library professionals. Most major institutions now have copyright librarians, scholarly communication departments, and licensing departments to deal with the legal aspects of information acquisition and use. Just as libraries are starting to offer Open Access publishing services and open repositories to their campuses, perhaps it is time for information professionals to become more involved in helping users understand the complexities of online storage in the age of cloud computing. That includes the difficulty we all have in making sense of the rights and privileges sections of user agreements for the major online services that store and maintain our digital files, and in knowing who else may have access to those files.
Since this is all in the hands of private companies, the rights of individuals, as well as the rights and privileges of these companies, remain an area that has yet to be defined by legislation, by case law, or by professional standards. Information professionals’ commitments to the freedom to read and the right to information could provide a much-needed source of expertise and advice in this area, as well as a platform to work for needed legal oversight in the interests of individual users.
Preferences & Prejudices
Research studies are identifying serious issues in terms of how search engine algorithms are using search preferences in developing their systems. In a recent New York Times article, Kate Crawford noted “the very real problems with artificial intelligence today, which may already be exacerbating inequality in the workplace, at home and in our legal and judicial systems. Sexism, racism and other forms of discrimination are being built into the machine-learning algorithms that underlie the technology behind many ‘intelligent’ systems that shape how we are categorized and advertised to.” And recent articles have certainly found significant examples of this.
A study by ProPublica found that “risk scores” obtained on 7,000 people in Broward County, Florida, between 2013 and 2014, were, in fact, “remarkably unreliable” in forecasting actual future violent crime, and, more seriously, were clearly biased against African Americans. The formula, the report concluded, “was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.”
At the March 2016 South by Southwest Interactive conference, one session covered algorithmic searching and bias: “While the internet has been heralded as the great equalizer, we are just beginning to learn exactly how discriminatory software can be. Every week seems to unearth a new example of computers betraying us, from ‘smart’ cameras that cannot see dark skin, to Google showing women ads for lower paying jobs than men.” The session looked at “how big data is being used against minorities, and the social biases we are teaching our learning algorithms.” Algorithmic searching rests on the ability of computers to ‘learn’ about tastes and interests by incorporating user feedback into their future searching strategies. However, we are learning that this user feedback can also reflect, create, or harden societal biases.
A 2013 study by Harvard University’s Latanya Sweeney found that when someone searched in Google for a name normally associated with a person of African-American descent, an ad for a company that finds criminal records was more likely to turn up. This led Sweeney to conclude that we should be “raising questions as to whether Google’s advertising technology exposes racial bias in society and how ad and search technology can develop to assure racial fairness.” It’s possible that the original search algorithms, from Google and other search engines, may have been created to give equal treatment to all requests, but over time, the biases of the people who did the search may have been factored into the ‘learning’ that the algorithm is programmed to do to better refine the results, leading to information that is not only erroneous, but deeply biased and potentially discriminatory.
As Sweeney concludes: “navigating the terrain requires further information about the inner workings of Google AdSense. Google understands that an advertiser may not know which ad copy will work best, so an advertiser may give multiple templates for the same search string and the ‘Google algorithm’ learns over time which ad text gets the most clicks from viewers of the ad. It does this by assigning weights (or probabilities) based on the click history of each ad copy. At first all possible ad copies are weighted the same, they are all equally likely to produce a click. Over time, as people tend to click one version of ad text over others, the weights change, so the ad text getting the most clicks eventually displays more frequently. This approach aligns the financial interests of Google, as the ad deliverer, with the advertiser.” The reliability of the information and the interests of the searcher are compromised at best.
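The weighting scheme Sweeney describes can be sketched roughly as follows. This is only an illustration of the general idea (equal starting weights, display probability proportional to accumulated clicks), not Google’s actual AdSense algorithm, whose details are not public. The class name, the ad copy labels, and the click rates are all hypothetical.

```python
import random


class AdRotator:
    """Toy model: each ad copy is shown in proportion to its click-based weight."""

    def __init__(self, ad_copies, seed=None):
        # At first all ad copies are weighted the same.
        self.weights = {copy: 1.0 for copy in ad_copies}
        self._rng = random.Random(seed)

    def choose(self):
        """Pick one ad copy at random, with probability proportional to weight."""
        copies = list(self.weights)
        weights = [self.weights[c] for c in copies]
        return self._rng.choices(copies, weights=weights, k=1)[0]

    def record_click(self, copy):
        """A click raises the weight of the copy that was shown."""
        self.weights[copy] += 1.0

    def display_probability(self, copy):
        return self.weights[copy] / sum(self.weights.values())


def simulate(click_rates, impressions=20_000, seed=0):
    """Show ads repeatedly; viewers click each copy at a hypothetical fixed rate."""
    rotator = AdRotator(click_rates, seed=seed)
    viewers = random.Random(seed + 1)
    for _ in range(impressions):
        shown = rotator.choose()
        if viewers.random() < click_rates[shown]:
            rotator.record_click(shown)
    return rotator


# Hypothetical click rates: viewers click copy B three times as often as copy A.
rotator = simulate({"ad copy A": 0.05, "ad copy B": 0.15})
for copy in rotator.weights:
    print(copy, round(rotator.display_probability(copy), 3))
```

Even in this toy version, the feedback loop is visible: whichever text viewers click more often is shown more often, which in turn gives it more chances to be clicked. Nothing in the loop asks why viewers prefer one text over another, so any bias in their clicking is absorbed into what gets displayed.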
Google’s business and dominance have been built on algorithms that sort through websites to provide users with a fast and reasonably satisfying set of results for their searches. The business model is supported by the company’s monitoring of users’ online behavior, developed to learn about their preferences and interests. This information is used by Google’s business partners to place ads where they are most likely to draw in customers. These ads, placed along the sides or top of search results, have been incredibly successful and profitable for the company. However, users have long wondered about the treasure trove of information on their online activities that is gathered this way.
Google Opening User Controls with My Activity?
Over time, due to pressure from users or the official actions of courts, Google has had to factor privacy and the interests of its users into its policies and database development. For many years Google has allowed its users to limit how much data is accumulated about them and how many customized ads they are shown. In 2015, Google opened a new My Activity hub as a way for users to set their privacy and security controls, and as a way to reduce criticism of the company’s intent and business practices. Google states that the data it collects “helps make Google services more useful for you. Sign in to review and manage your activity, including things you’ve searched for, websites you’ve visited, and videos you’ve watched.”
An article on zdnet.com reports that Google will now be opening up anyone’s “searchable history of almost everything you do online, including Netflix programs you’ve watched, sites you’ve visited, things and places you’ve searched for, as well as activity on each of its products. Users can drill down into certain items to reveal details like search terms, the time a site was visited or search was made, and for example, the browser and device it was done on. Users also have an option to delete items as well.”
“If the user used Chrome to, say, watch a Netflix program,” zdnet.com’s report continues, “Google explains the activity would be saved to Google Account because the Web & App Activity setting was on while using Chrome. Likewise, if video was viewed on YouTube, it explains it was saved to Google Account because the YouTube Watch History setting was on.” My Activity will reportedly include activity on links to Ads, use of Android, Books, Chrome searches, Developers pages, as well as your activity using Finance, Help, Image Search, Maps, News, Now, Play, Search, Shopping, Video Search, and YouTube.
A Google support page provides a sign-in option for viewing Google’s records of your personal searching and browsing activity on computers, smartphones, or other devices. You are also offered options to opt out of certain types of data collection.
Forgetting & Remembering in the Age of the Internet
A 2015 study by University of Birmingham psychologist Maria Wimber and colleagues (Alink A, Charest I, Kriegeskorte N, Anderson MC (2015). Retrieval induces adaptive forgetting of competing memories. Nature Neuroscience 18, 582-589. DOI:10.1038/nn.3973) examined “how remembering adaptively changes our memories. Retrieving a memory has been shown to have two sides. On the positive side, memories become more stable and permanent each time we reactivate them. On the other hand, remembering can also induce forgetting of related memories. This forgetting is in fact a highly adaptive capacity of human memory: Our brains appear to function on a ‘use it or lose it’ basis, retaining the information that is frequently reactivated, and discarding irrelevant, competing information.” With the huge amount of information that confronts us every day, is it possible for us to rationally rely on our memories, our inherent mental abilities, to store and retrieve the key information, experiences, and knowledge needed to interact with our world and our work, and to provide context for them?
Rosen concludes his NYT piece with this observation: “Our character, ultimately, can’t be judged by strangers on the basis of our Facebook or Google profiles; it can be judged by only those who know us and have time to evaluate our strengths and weaknesses, face to face and in context, with insight and understanding. In the meantime, as all of us stumble over the challenges of living in a world without forgetting, we need to learn new forms of empathy, new ways of defining ourselves without reference to what others say about us and new ways of forgiving one another for the digital trails that will follow us forever.” And forever is a very long time indeed.
In a detailed literature review published in Neuroscientist last year, researchers noted that “over the past two decades, a substantial body of work has unraveled important impacts of the internet environment on our cognitive behaviors and structures. In terms of information processing, we are shifting toward a shallow mode of learning characterized by quick scanning, reduced contemplation and memory consolidation…another factor contributing to the shift toward shallow learning is the ease of online information retrieval that reduced the need for deep processing to commit information to memory. Relying on technology as an external memory source can result in reduced learning efforts as information can be easily retrieved later.”
Or, Is Technology Changing Our Minds for the Better?
In Smarter Than You Think: How Technology is Changing Our Minds for the Better (Penguin Press, 2013), Clive Thompson rhetorically asks his readers “is the internet ruining our ability to remember facts?” Thompson notes that “our brains have always been terrible at remembering details. We’re good at retaining the gist of the information we encounter. But the niggly, specific facts? Not so much.” He believes that the internet is not ruining our memory; rather, we are turning to it just as we used to rely on friends or other surrogates for details we are not able to retain. However, Thompson notes that search engine companies are “for-profit firms that guard their algorithms like crown jewels. And this makes them different from previous forms of transactive machine memory. A public library—or your notebook or sheaf of papers—keeps no intentional secrets about its mechanisms. A search engine keeps many. We need to develop literacy in these tools the way we teach kids how to spell and write; we need to be skeptical about search firms’ claims of being ‘impartial’ referees of information.”
University of Rhode Island biologist Frank Heppner recently posted a paper, “Scientific Literature Searching is a Disaster Today,” on Tomorrow’s Professor blog. His concern was for the fate of older scientific articles, those that form the basis for many of the key philosophical and research methodologies today. “As the old references disappear, or become difficult to find, we will lose the basis upon which many lines of investigation are based, and the opportunity to make spectacular errors will increase.”
“To be fair,” Heppner continues, “I get the impression that if you are working in a very, very narrow field of research, and there is a common consensus about vocabulary, the data base literature search system is wonderful, and much faster than wandering through the library stacks. But if you’re just following a hunch, or think that maybe an area that seems to be peripheral to yours might really be useful, the probability of not making a serendipitous hit, or being drowned in irrelevant specialty papers seems high.” Heppner also questions the ‘black box’ nature of search engines and the rush to accept technological options today. “Younger colleagues I have shared drafts of this paper with suggest that this issue is part of a larger sea change in the way research is performed, documented, and evaluated, with selection of sites for publication decided on the basis of increasingly sophisticated metrics, many of which seem to be manipulable by ‘gaming the system’ one way or another (or being gamed by it, depending on circumstances).”
In her new book Magic & Loss (Simon & Schuster, 2016), Virginia Heffernan presents her view that what we see when doing a web search isn’t necessarily reality, but what algorithms want us to see. With the internet, things changed dramatically: “While there was still achievement and pleasure in the old media, it was clear too that the dogs had barked; the great caravan that bring the knowledge and ideas that shore up man enterprises had moved on.”
Rather than being an alien invasion, or something “outside human civilization,” Heffernan explains, “it is a new and formidable iteration of that civilization…the internet responds, often with great sensitivity, to critical methodologies. Sense can be made of it. Logic can be divined in it. Politics can be derived from it. Pleasure can be taken in it. Beauty can be found in it. Pain too—and loss. Agony and ecstasy is what I mean: the internet may not be reality, but it’s very real art.”
Heffernan encourages her readers to “risk the pain and scrap our old aesthetics and consider a new aesthetics and associated morality. A new brand of intellectual courage must be brought to envisioning this new symbolic order.”
Taste in the Age of Like
In You May Also Like: Taste in an Age of Endless Choice (Knopf, 2016), Tom Vanderbilt looks at the issue of choice in a world that endlessly offers options and ideas. “We are faced with an ever-increasing amount of things to figure out whether we like or dislike, and yet at the same time there are fewer overarching rules and standards to go by in helping one decide. Online, we swim in the streams of other people’s opinions.” As part of his research, he interviewed some of the computer scientists who develop the algorithms that generate ‘results’ from the data collected on consumer activity on the internet. He notes that the purpose of these algorithms is not to generate a list of recommendations based on what we already have or know but to recommend other websites or products based on what others (users or companies) are suggesting or using. Vanderbilt also discusses user and customer reviews, which have become more important as the bulk of potential results grows, yet which raise their own issue of unverified opinions, comments, and rants.
In an excellent opinion piece in the New Yorker, Louis Menand describes “What it is like to like: Art and taste in the age of the internet.” In this essay, he suggests that “what makes digitalization different from earlier changes in media, and the reason it is not wrong to call it a revolution, is that a single technology is promising to absorb a huge number of existing technologies, from paper, vinyl, and celluloid to clocks, maps, newspapers, radios, cameras, telephones, lecture halls. If it can be coded, it will end up on the Web or in an app….The internet won’t replace everything, of course, and one day something will replace the internet. By then, we will all be used to it—the analog world will have gone the way of the typewriter and the milkman—and we, or our children, will miss it when it’s gone.”
As we work and live in this new environment, can we better understand the movement from prehistory to the internet, and the challenges we face as a culture confronting “the grand challenge facing our world” today: an abundance of information in an age of increasing scarcity of human attention?
In Part 2 of this story, we talk with cultural historian Abby Smith Rumsey, whose recent book, When We Are No More: How Digital Memory is Shaping Our Future, demonstrates that “data storage is not memory; why forgetting is the first step towards remembering; and above all, why memory is about the future, not the past.”
Nancy K. Herther is Librarian for American Studies, Anthropology and Sociology at the University of Minnesota, Twin Cities campus.