Sharing Research Data—New figshare For Institutions
By Paula J. Hane
While studying for his PhD in stem cell biology at Imperial College, London, Mark Hahnel was frustrated and longed for a way to organize and publish all his scientific research, including all supporting materials, such as raw data, figures, videos, etc. So, in his “spare” time, he taught himself to code and built a concept site to handle his own research data management. Then, he talked to friends and they were interested in his new “figshare” solution. Hosting the tool himself had definite limits so he looked to interested parties for support. figshare first launched in January 2011 and relaunched in January 2012, following support (announced September 2011) from Digital Science, a division of Macmillan Publishers. According to Hahnel, figshare made 200,000 files publicly available during its first year of operation and now has about 1 million research objects available. It has just introduced an institutional version.
The figshare platform enables researchers to make their figures, datasets, images, and videos publicly available, providing the researcher with a “citable, searchable, and discoverable endpoint,” all for free. The company says this is important not only for the supplementary data accompanying one’s experiment, but also for negative results. figshare allows users to upload any file format so that figures, datasets, and media can be disseminated in a way that the current scholarly publishing model does not allow.
figshare offers unlimited storage space for data that is made publicly available on the site, and 1GB of free storage space for users looking for a secure, private area to store their research. All data is persistently stored online under the most liberal Creative Commons license, waiving copyright where possible. (All figures, media, posters, papers, and multiple file uploads, known as filesets, are published under a CC-BY license. All datasets are published under CC0.) This allows scientists to access and share the information from anywhere in the world with minimal friction. All research made publicly available at figshare is allocated a DataCite DOI at the point of publication.
Anyone can browse the public figshare data, using a simple interface. You can sort by category, file type (figure, media, dataset, poster, paper, presentations, fileset), and by most recent, most shared, or most viewed.
Asked about discoverability, Hahnel says, “We spend a lot of time and effort on SEO. The discoverability of figshare content was quantified in January when the DOI provider DataCite started measuring the most accessed of their DOIs each month. Even though DataCite mines DOIs for hundreds of institutions, publishers, and repositories, it was found that figshare had 8 of the top 10 in January, 7 in February, and 9 in March. At which point they stopped measuring. While we do have collaborations with systems like Symplectic and all of the altmetric solutions, we don’t yet work with Mendeley, even though we are good friends with the folks there as they’re just down the road.”
figshare is also on Facebook, Google+, Twitter, and Vimeo. An open API lets anyone build programs that make use of figshare’s functionality and content.
figshare is hosted in the cloud using Amazon Web Services to ensure the highest level of security and stability for research data. Amazon S3 stores multiple redundant copies of information so users don’t have to worry about ever losing the master copy. figshare uses Amazon servers in the U.S., the U.K., and other locations, such as Australia. This allows for backups across multiple sites and lets clients select the location where their data is stored. Hahnel also says that using Amazon brings economies of scale with a lower cost per unit of data stored. And, best of all, “figshare is an out of the box solution so there is only ever conversations with the figshare team, no technical knowledge is needed.”
In addition, figshare and the CLOCKSS archive have partnered to preserve figshare’s publicly available content in CLOCKSS’s geographically and geopolitically distributed network of redundant archive nodes, located at 12 major research libraries around the world.
In early 2013, figshare announced a partnership with Public Library of Science (PLOS), an open access publisher. figshare hosts the supplemental data for all seven PLOS journals. In May 2013, it launched portals for two publishers, PLOS and F1000 Research. Hahnel says a portal will be launched soon for Nature and several other deals are in process. It works with publishers to aid visualization, interactivity, and discoverability of research outputs.
In early September 2013, Hahnel announced that figshare was now ready to expand beyond serving individual researchers and publishers and help institutions as well. figshare for Institutions is a simple, user-friendly and cost-effective solution for academic and higher education establishments to both securely host their academic research outputs and make them publicly available. figshare allows academic institutions to publish, share and get credit for their research data, hosting videos, datasets, posters, figures and theses in a cost-effective way.
Included are the following features:
- Large amounts of secure, private storage plus unlimited public space
- Simple, institution-wide management and monitoring of all research outputs for institution staff with subject categorization per department
- Access-controlled team sharing and collaborative spaces with the ability to add notes and comments to files
- An institutional dashboard with detailed metrics on the impact of publicly available data
- All research outputs can be made citable, visualizable, embeddable, and trackable with one click
- The ability to push research to any internal repository
- Institution-wide compliance with open data requirements of funding bodies
- Dedicated support team
Hahnel explains, “Academics can struggle to organize their research outputs, as I once did. figshare for Institutions integrates into their existing workflow so that their data management requirements are complied with subconsciously. The institutions also benefit by seeing the full reputational impact of all of the research they generate, a huge step up from the silo-ed system that exists within many research organizations at the moment.”
Hahnel says the funder mandates that require institutions to provide self-archiving are the “stick.” But, figshare “is concentrating on the carrots.” Hahnel is passionate about open science and the potential it has to revolutionize the research community.
Since the spring, figshare has been piloting its institutional service at Imperial College in London and it also has development partners in the U.S. and Australia.
I asked Hahnel about a role for librarians. He said, “With regards to librarians, we love them at figshare. These are the people who are innovators within the institutions, who have experience in working with repositories. Ideally, we would like to work with the librarians who could be the champions of open research and research data management in the institutions. They can provide us with the feedback and the new ideas that have already helped us get to the stage we are at.” (Check out the attached image of its last swag giveaway at a conference.)
Pricing for institutions consists of an annual license based on the size of institutions and storage needs. Hahnel says the cost is less than it would be for institutions to deal directly with Amazon. For those institutions already hosting their own institutional repositories for documents, figshare would like to provide storage and management for all other research output.
If research data is contributed by multiple people or a group, it can be privately accessible to those in the group before they are ready to make it available publicly. Hahnel says, “This functionality is provided by the private collaborative spaces, whereby only collaborators selected by the private space owner have access.”
Researchers might find that their research is already on figshare. To make figshare a useful tool for researchers, the company seeded the database with research objects taken from open access publications. Researchers can claim these articles by clicking the ‘claim this’ button. All claimed articles and their associated metrics will be added to their profile.
Hahnel’s own PhD thesis is currently under a 2-year embargo. He notes in a blog post:
Because of this, at figshare we set about giving users their own private repository to store their research objects. These objects can be uploaded in seconds and all objects are initially held in the private space, from where they can be made publicly available when the user decides. All research is easily tagged and categorizable, so that researchers can filter through their many files to find the one they were looking for in no time at all. Another feature of the institutional offering is unlimited collaborative spaces for users with unlimited collaborators. This folder structure makes sure that the academics are in complete control over who they share their research with.
figshare is a “portfolio company” of Digital Science, a technology company launched in December 2010 to serve the needs of scientific research. Operated by Macmillan Science & Education (headquartered in London with offices around the world), it offers a range of scientific technology and content solutions, from intelligent knowledge discovery tools to software applications for the laboratory and decision support systems for managers. Other products for the research community from Digital Science include Labguru, ReadCube (from Labtiva), Altmetric, and SureChem.
figshare is an independent body that receives support from Digital Science. The announcement noted that: “Digital Science’s relationship with figshare represents the first of its kind in the company’s history: a community based, open science project that will retain its autonomy whilst receiving support from the division.”
Hahnel stresses that Digital Science provides advice, “but, we have complete autonomy. The buck stops with me.” And, even in a worst case scenario, “the data will persist under the insurance policy of CLOCKSS. I’m so happy we went the commercial route—it’s now a sustainable model.”
Neuroscience researcher Erin McKiernan, who blogs about neuroscience, open science, and open access publishing and is an advisor to figshare, blogged that so far her “experiences have only been positive.” She notes that she’s published a variety of research outputs, has studied the reader interest, and found “that sometimes the work other people find most useful isn’t necessarily what ends up as part of a polished article.” She has found, however, some lack of receptiveness among academics. “I have offered to give talks to several academic groups about figshare and to help them set up departmental or personal profiles. The response is usually positive, but lacking in commitment: ‘Sure, maybe next month.’ Let me clarify that it is not their apathy about figshare per se that disappoints me. I don’t want to force academics to use any particular open platform. I just want them to use something to share their work…And I know how well figshare has worked for me so far. It has helped me to be a more open scientist.”
Another blogger, Jill Walker Rettberg, a professor of digital culture at the University of Bergen, wonders about trusting a private company to keep data safe and archived. “[H]ow long will they be around? How independent are they? What is their plan for monetizing this? This is exactly the sort of system universities should be working together to provide, for example through GÉANT, the European network that among other things provides Eduroam, secure roaming wifi for students and employees at hundreds of universities worldwide.”
Peter Murray-Rust, a chemistry researcher at the University of Cambridge, is familiar with Hahnel and what figshare is trying to achieve in the research data management world. He likes the figshare model and wishes the company well. He says figshare is trying to do now what “should have been done 10 years ago by libraries. Universities don’t put much effort into this area—they tend to buy what they are given.” His main concern is that figshare is now part of a commercial enterprise. “I’m not anti-commercial but I don’t trust any commercial organization not to become a monopoly. I like what [figshare] has done but I don’t necessarily trust Macmillan to look after my best interests in the future. I suspect they’ll look after Macmillan’s interests.”
figshare is one of a number of organizations addressing the challenge of managing research data. Thomson Reuters has convened a Forum of industry experts, of which Hahnel is a member, to discuss issues and potential solutions for the scholarly challenges ahead. The Forum published its first output in a recent whitepaper titled “Unlocking the Value of Research Data,” in which Forum experts discuss the complexity of the issue and offer recommendations for the future. The Forum recommends creating a consortium of publishers and associations to develop standards that will guide new modes of communicating and sharing data. To get an idea of the scope of the issue, the press release noted that, “The volume of scholarly and scientific research data available is projected to grow by a factor of 44 over the decade from 2010 to 2020, from 0.8 zettabytes (ZB) to more than 35 ZB (1 ZB = 1 trillion gigabytes).”
A Publishing Innovation
figshare was one of five impressive finalists for the ALPSP Award for Publishing Innovation 2013. The winner, announced Sept. 12, 2013, was PeerJ, the open access publisher of the journal PeerJ and PeerJ PrePrints. The award recognizes “a truly innovative approach to any aspect of publication. Applications are judged on their originality and innovative qualities, together with their utility, benefit to their community, and long term prospects.”
Paula J. Hane is a freelance writer and editor covering the library and information industries. She was formerly Information Today, Inc.’s news bureau chief and editor of NewsBreaks.