(Part 1 of a two-part article.)
Collecting data has become so commonplace that virtually anyone can do it, but managing, interpreting, and making sense of that data is the critical component. Business schools across the country, and around the world, are developing MBA and doctoral-level programs in Business Analytics, a field that combines traditional statistics and predictive modeling with management information systems and marketing.
Detailed data about each of us now floats around the web, captured and ‘owned’ by companies that use it for commercial purposes, hopefully without major security breaches. This should concern everyone, and it is becoming impossible to avoid as more and more services move to a web-only existence.
In 2012, the New York Times ran a story outlining how Target was able to discern through “Big Data” analytics that a teenager was pregnant before even her parents knew. How did the company find this out? Today we are all being tracked: in our homes as we watch Netflix or play online games; as we walk down the streets of many cities; as our smartphones track our physical movements, telephone use patterns, and connections; as we walk around stores, stopping to look at merchandise; and as we use online systems to search for information, shop for products, or communicate via Twitter or Facebook. CIO magazine recently estimated that “one-fifth of organizations store more than 1 petabyte of data.” A petabyte (one million gigabytes) holds as much information as 20 million four-drawer file cabinets, nearly 60,000 movies, or more than 13 years of HDTV content.
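To get a feel for the petabyte comparison, the figures can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 gigabyte = 10^9 bytes, 1 petabyte = 10^15 bytes) and a 365.25-day year; the per-movie size and the video bitrate it prints are simply what the quoted figures imply, not numbers taken from the CIO estimate.

```python
# Rough sanity check of the petabyte comparison (decimal units assumed).
PETABYTE_BYTES = 10**15
GIGABYTE_BYTES = 10**9

gigabytes_per_petabyte = PETABYTE_BYTES // GIGABYTE_BYTES

# Figures quoted in the article.
movies = 60_000
hdtv_years = 13

# Implied average size of one movie, in gigabytes.
gb_per_movie = gigabytes_per_petabyte / movies

# Implied average HDTV bitrate, in megabits per second.
hdtv_seconds = hdtv_years * 365.25 * 24 * 3600
hdtv_mbit_per_s = PETABYTE_BYTES * 8 / hdtv_seconds / 1e6

print(f"Gigabytes in a petabyte: {gigabytes_per_petabyte:,}")
print(f"Implied size per movie: {gb_per_movie:.1f} GB")
print(f"Implied HDTV bitrate: {hdtv_mbit_per_s:.1f} Mbit/s")
```

Under these assumptions the comparison works out to roughly 17 GB per movie and a video stream of a little under 20 megabits per second, which is in the neighborhood of broadcast-quality HDTV.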
And this isn’t new. Face recognition and hand geometry were both in use at the 1996 Atlanta Olympic Games, where hand geometry systems controlled physical access to the Olympic Village. This year, Intel announced a new app that will use facial recognition instead of passwords to log you into websites. Your fingerprints can also be used. Smartwatches connect to the cloud, offering fitness tracking, a variety of apps, and calendaring and notification systems, all tied to cloud accounts that gather and store this data. Cameras track our daily movements, including the police body cameras increasingly being adopted across the country in response to incidents like the one in Ferguson. Ebook companies collect data on what we read, how long we read, and how far into our books we get, and they can capture our notations and comments. Today there is little about any one of us that is not being captured, linked, stored, and used.
Keepers of the Data
Today these incredible amounts of data are being manipulated and studied with ever-faster computers and optimization methods. This makes it possible to apply advanced analytical methods to problems that were intractable 10 or 15 years ago, transforming even minute pieces of data into more detailed information for better decision-making, but also into highly detailed private stores of information on every aspect of our lives.
Babson College’s Thomas Davenport is Director of Research at the International Institute for Analytics, a Senior Advisor to Deloitte Analytics, and author of the critical text Keeping Up with the Quants: Your Guide to Understanding + Using Analytics (Harvard Business Review Press, 2013). “Big data and analytics based on it promise to change virtually every industry and business function over the next decade,” he believes. “The potential of big data is enabled by ubiquitous computing and data-gathering devices; sensors and microprocessors will soon be everywhere. Virtually every mechanical or electronic device can leave a trail that describes its performance, location or state. These devices, and the people who use them, communicate through the Internet, which leads to another vast data source.”
Analytics: The Present & The Future
Analytics is a deeply interdisciplinary field that draws on mathematics, statistics, management information systems, computer science, psychology, and operations research. Technology now allows much more precise assessment of huge stocks of data, which may yield insights, new modes of behavior, and methods of analysis never before imagined. In 2011 the McKinsey Global Institute published a major study entitled Big Data: The Next Frontier for Innovation, Competition, and Productivity, which noted that “companies churn out a burgeoning volume of transactional data, capturing trillions of bytes of information about their customers, suppliers, and operations. Millions of networked sensors are being embedded in the physical world in devices such as mobile phones, smart energy meters, automobiles, and industrial machines that sense, create, and communicate data in the age of the Internet of Things.” Among the figures the report cites:
- For $600, you can now buy a computer hard drive that can hold all of the world’s music
- More than 30 billion pieces of information are being shared on Facebook each month
- In 2010, more than 5 billion cell phones were in use worldwide
- In 15 of the 17 business sectors in the U.S., companies hold more stored data on average than the estimated 235 terabytes in the collections of the Library of Congress
Business Professor Bart Baesens, in his Analytics in a Big Data World: The Essential Guide to Data Science and its Applications (Wiley, 2014), expands on the ubiquity of information and data today: “Data are everywhere. IBM projects that every day we generate 2.5 quintillion bytes of data. In relative terms, this means 90% of the data in the world has been created in the last two years.”
Datafication is a reality today, as explained by Viktor Mayer-Schönberger and Kenneth Cukier in their 2013 Houghton Mifflin Harcourt book Big Data. “Because of smartphones and inexpensive computing technology, datafication of the most essential acts of living has never been easier… Getting the data is becoming easier and less intrusive than ever. In 2009 Apple was granted a patent for collecting data on blood oxygenation, heart rate, and body temperature through its audio earbuds. Like Google, a gaggle of social media networks such as Facebook, Twitter, LinkedIn, Foursquare, and others sit on an enormous treasure chest of datafied information that, once analyzed, will shed light on social dynamics at all levels, from the individual to society at large.”
At the same time, new technologies are emerging to organize and make sense of this avalanche of data. We can now identify patterns and regularities in data of all sorts that allow us to advance scholarship, improve the human condition, and create commercial and social value. The rise of big data has the potential to deepen our understanding of phenomena ranging from physical and biological systems to human social and economic behavior. And an avalanche it is. AT&T recently opened its Big Data Center of Excellence in Plano, Texas. The company is using this facility to process its business information, which it describes as 10 million columns of structured data for 62,000 services. That’s big data!
Piyanka Jain and Puneet Sharma, in their book Behind Every Good Decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight, stress that “analytics is not rocket science, [that] not all analytics problems need to be megacomplex projects with complex models built and read by a data scientist.” Their methodology encourages the use of intuition (à la Sherlock Holmes) along with data to produce insights that lead to informed solutions, impelling “positive action.”
B-Schools Quickly Adopt Big Data Analytics
Today Business Analytics, which combines these stores of data with advanced statistical and quantitative analysis, predictive modeling methods, and optimization, has transformed the business enterprise as well as management education. IBM, for example, has created a highly rated consulting area called IBM Business Analytics & Strategy, for which “IBM has invested $24 billion to build its capabilities in Big Data and Analytics through R&D and more than 30 acquisitions. Today, more than 15,000 analytics consultants, 6,000 industry solution business partners, and 400 IBM mathematicians are helping clients use big data to transform their organizations.”
In the past five years, formal programs in Business Analytics have been established at 120 universities, according to a listing prepared by the Master’s in Data Science website. The field itself can be traced to the industrial efficiency models of Frederick Winslow Taylor in the late 19th century and to Henry Ford’s work developing automobile production lines. Today, with powerful computers and so much information on consumers, markets, and preferences readily available, the science seems to be reaching maturity. The overwhelming majority of graduate programs are offered through business or engineering schools across North America and Europe. However, many of the courses themselves could have broader application, as in this program from Brandeis University:
Core required courses
RMGT 110 | Organizational Leadership and Decision Making
RSAN 101 | Foundations of Data Science and Analytics
RSAN 110 | Business Intelligence, Analytics and Decision Making
RSAN 120 | Statistics and Data Analytics
RSAN 130 | Strategic Analytics and Visualization for Big Data
RSAN 140 | Social, Web and Marketing Analytics
RSAN 150 | Data Governance, Security, Quality and Ethics
RCOM 102 | Professional Communications
RPJM 101 | Foundations of Project Management
RSEG 171 | Data Warehousing and Data Mining
Go Daddy’s Michelle Ufford, who blogs at sqlfool, is a respected and well-known thought leader in the data industry. She sees not only the potential of Big Data for business, but also some of the key issues companies need to address: “There are many reasons why a company may choose not to share their applications of data science with the public. Consider, for example, a hypothetical bookseller who undertakes a project to improve their customer experience. This bookseller analyzes a customer’s book purchase patterns to make suggestions about new authors or recently published books that the customer may be interested in, based on past purchases. Although the algorithms used may generally work well, an analysis identifies some unexpected behaviors, such as a sci-fi and fantasy enthusiast who unexpectedly purchases non-fiction history books in June and December.”
“The company’s original algorithms would conclude the customer has an emerging interest in history,” she continues, “and would proceed to email the customer about history books in the months following the purchase. However, correlating the purchase behavior with the customer’s social network reveals that a close friend of the customer has a birthday in June, and the friend’s profile indicates a strong interest in history and documentaries. The company announces an update to their algorithm that predicts an increased interest in alternate genres near birthdays and holidays. The new algorithm predicts the customer’s interest in an appropriate gift for their close friend and emails the customer about popular history books in June and December, providing the customer with timely and relevant information to improve their gift-giving decision. This is definitely an improved experience for the customer and one which does not put the customer’s privacy at risk. However, if publicly shared, customers may feel uncomfortable to learn that the algorithm is using their social data, and the seller’s competitors would quickly follow suit to implement a similar algorithm. Thusly, the company could potentially negatively impact their customer’s sentiment—the very opposite of their objective—while concurrently losing their competitive advantage.”
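The gift-buying logic in Ufford's hypothetical can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bookseller's actual algorithm: the `Purchase` and `Friend` records, the `gift_months` function, and the choice of December as the only holiday month are all invented here for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    month: int  # 1 = January ... 12 = December
    genre: str


@dataclass(frozen=True)
class Friend:
    birthday_month: int
    interests: frozenset


# Assumption for this sketch: December is the only holiday gift month.
HOLIDAY_MONTHS = {12}


def gift_months(purchases, friends):
    """Map each off-genre purchase that looks like a gift to its months.

    A purchase counts as a likely gift when it falls outside the customer's
    most frequently bought genre, some friend is interested in that genre,
    and the month is either that friend's birthday month or a holiday month.
    """
    if not purchases:
        return {}
    main_genre = Counter(p.genre for p in purchases).most_common(1)[0][0]
    found = {}
    for p in purchases:
        if p.genre == main_genre:
            continue
        interested = [f for f in friends if p.genre in f.interests]
        if not interested:
            continue
        near_birthday = any(f.birthday_month == p.month for f in interested)
        if near_birthday or p.month in HOLIDAY_MONTHS:
            found.setdefault(p.genre, set()).add(p.month)
    return {genre: sorted(months) for genre, months in found.items()}


# The customer from the example: mostly sci-fi, history in June and December.
purchases = [Purchase(m, "sci-fi") for m in (1, 2, 3, 4, 8, 9, 10)]
purchases += [Purchase(6, "history"), Purchase(12, "history")]
friends = [Friend(birthday_month=6, interests=frozenset({"history", "documentaries"}))]

print(gift_months(purchases, friends))  # {'history': [6, 12]}
```

A seller using something like this would email history suggestions only around June and December, rather than treating the purchases as an emerging year-round interest. Note that the sketch depends on the same social data that, as Ufford points out, customers may be uncomfortable to learn is being used.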
We Can All Benefit From Big Data—As Well As Respect Its Power
In a 2014 law review article, Chris Jay Hoofnagle notes that “free has become the default price of internet services. But focusing on the price rather than the cost of free services has led consumers into a position of vulnerability. By virtue of paying for services with personal information, users may not even qualify as consumers under consumer protection laws.” Clearly, education has a key role to play in this new area of scholarship and practice. Facing challenges from hacking, governmental actions, and other sources, the field needs to develop not only models and metrics but also clear best practices and standards, both to ensure the future of the profession and to give meaning to all the Big Data being generated.
Brandeis economist Benjamin Shiller urges a degree of caution. “I do not believe there are ethical standards widely employed in industry. This article about a flashlight app provides strong anecdotal evidence that small companies (like app producers) will collect all sorts of data for the sole purpose of selling it. They then typically sell it to data aggregators like Acxiom, who sell and employ the data for all sorts of purposes. As long as they disclose their use of it (even if buried deeply in terms of service), it is perfectly legal. Since so many firms are doing this, it is nearly impossible to steer completely clear.”
“But you do see self-monitoring from large established companies when there are concerns over how privacy scandals might hurt their core businesses if consumers revolt,” Shiller continues. “Google and Facebook, for example, work very hard to avoid intruding on their consumers’ privacy, and to be transparent. Google does use all sorts of data for personalizing advertisements. But this is presumably a win-win. Better targeted ads are worth more to companies and may be less annoying to consumers. Google, to my knowledge, has not used their data to the detriment of consumers. For example, they have not used their data to charge some consumers higher than normal prices, or to suggest some consumers who should be denied insurance coverage because they had searched for ‘cancer.’ They also allow app producers to collect all sorts of data, but do force them to transparently list data collected by reporting ‘permissions’ categories.”
“I am not sure if there is research showing consumers punishing firms who intentionally use their data for certain purposes,” Shiller concludes. “President Obama’s proposed Consumer Privacy Bill of Rights would establish a better framework for self-monitoring by (1) creating rules for clear and transparent disclosure of exactly how their data would be used, and (2) providing a clearer/stronger framework of enforcement for violators. While some companies have been targeted by the FTC for using data in a way which was not disclosed, Obama’s proposal would suggest that there are some holes in the law.”
Bentley University Professor of Mathematical Sciences Dominique Haughton agrees that this is an area for more attention from Big Data practitioners. “While these ethical issues are attracting more attention (as they should), I don’t think the profession really has a handle on the matter of ‘data ethics.’ To some extent we do—statements of ethics by the various statistical societies for example—but the jury is still out on what constitutes ethical use of data. Where to draw the line is likely to depend on whom you ask.”
Librarians have worked with user data since the profession’s beginnings, and they too have much to learn about the ever-increasing amounts of data now being generated about their users. In the second part of this series, we turn our attention to how Big Data and its tools are being applied in the academy and in the public arena.
Nancy K. Herther is Librarian for American Studies, Anthropology, Asian American Studies & Sociology at the University of Minnesota, Twin Cities campus.