This annual meeting brings together technical and industry experts; LC IT and subject matter experts; government specialists with an interest in preservation; decision-makers from a wide range of organizations with digital preservation requirements; and recognized authorities, industry leaders, and practitioners of digital preservation.
I was thrilled to receive an invitation to this event, since one of the things keeping me up at night is how to manage and preserve a monthly average of 2.5 TB of video, not to mention 20+ GB of 3D scans that are in limbo until we can provide a storage solution. It seems that other librarians have the same experience; as one member of the LC IT team stated, ‘after eight years this group is still discussing ideal strategies for digital storage architectures.’
The conference was well organized by the talented Jane Mandelbaum and her team from LC IT Services and ran like clockwork, bringing together digital preservation specialists and industry solutions experts. There were 30 presentations over the two days, with a mix of lightning talks, extended presentations, and technology forecasting. Most sessions can be accessed here: http://www.digitalpreservation.gov/meetings/storage14.html
There was a core group of experts who have participated for a number of years: Scott Rife, Trevor Owens, Jane Mandelbaum, and Carl Watts (LC); Henry Newman (Instrumental Inc.); David Anderson (Seagate); Sage Weil (Ceph/Red Hat); Robert Fontana (IBM); Ethan Miller (UCSC); Ken Wood (Hitachi); David Rosenthal (Stanford LOCKSS); and Cliff Lynch (CNI).
The first morning featured LC infrastructure overviews from Watts and Rife, full of interesting information. Did you know, for example, that the AV Conservation Center now stores over 12 TB per month and has data embedded in 30 streams? Henry Newman, CEO of Instrumental Inc., discussed the “State of the Industry,” followed by Anderson of Seagate and Fontana of IBM. Newman wondered whether the needs of the preservation world cost too much to serve, given that it is such a small market. David Anderson described how Seagate is developing flash-based storage systems.
Community presentations included the Museum of Modern Art and the National Archives. Ben Fino-Radin of MoMA estimates that his film collections will grow to 950 TB over the next five years and stressed the need for cheaper storage. Ethan Miller of UC Santa Cruz presented an economic model for flash media, stating that flash can be competitive with disk storage because it can be made to ‘live’ longer with minimal cost. Miller emphasized that the more successful model is to plan storage costs over 90 years. During the lively discussion that followed, Cliff Lynch (CNI) cautioned that cost modelling needs careful precision, especially in funding Research Data Management. Some institutions are ‘pre-paying’ for storage costs, which may make it difficult to make long-term estimates.
Kestutis Patiejunas gave an interesting overview of the architecture of the “exabytes of data in Facebook’s cold storage”. Here is a picture of the Robotic Control Unit, with each array consisting of 12 Blu-ray burner/readers. Facebook’s IT team controls the commands and data to read and write from the cold storage service.
While Patiejunas seemed a quiet, thoughtful person, it turns out that he is a super-inventor with over 22 patents for Amazon and Microsoft.
The early afternoon session on developments in object storage triggered much discussion. It followed presentations by Anderson (Seagate) on object interfaces in the drives eliminating storage servers; Newman (Instrumental), Sage Weil (Ceph) on digital preservation with open source; and Chris MacGown of Piston Cloud comparing Ceph and Swift (for multi-realm storage replication). These two projects now encompass 2000 developers and 200 companies.
Sessions about testing and migrating at scale were presented by the National Center for Supercomputing Applications, Lawrence Berkeley National Laboratory, and the National Library of New Zealand. These were followed by Miller (UCSC) with an alternative to fixed key pre-indexing. Michele Kimpton, CEO of DuraSpace, gave an overview of their scalable cloud services and their new partnership with Amazon Glacier as a secondary cloud storage solution. She also examined some of the continuing business challenges concerning DuraCloud: they still have trouble moving large data sets (2+ GB) over networks, and there is no transparency on data handling policies.
Kara Van Malssen of AVPreserve demonstrated a new online tool the company had created to evaluate the “cost of inaction” and so help determine the return on investment in storage. This and other useful media preservation and storage tools can be found at www.avpreserve.com. Steve Elliot of Amazon Web Services ended the day with a presentation about public sector storage trends. He described AWS S3 as a cloud storage system designed to make web-scale computing easier for developers, while Glacier provides low-cost archival storage.
At the start of the second day, Eric Breitung described a new non-destructive method used in LC’s media preservation work for testing degraded magnetic tape. This process uses an IR spectrometer to verify the chemometric output and thus identify degraded portions of the tape without harming playback equipment. Greg Pine of Cuneiform and Ken Wood of Hitachi, with Doug Hansen of Millenniata, demonstrated data preservation systems.
David Rosenthal of Stanford LOCKSS made the case that current storage architectures fail to meet the needs of long-term archives and challenged the industry to build new capabilities (this can be read at his blog: http://blog.dshr.org/2014/09/a-challenge-to-storage-industry.html). He predicts that the cost of storage per unit will rise, so the portion of lost content will also rise due to limited digital preservation budgets. He discussed how current storage costs are distributed in LOCKSS: half the cost is ingest, one-third preservation storage, and one-sixth access.
He also offered sample cost projections (http://blog.dshr.org/2014/03/the-half-empty-archive.html) and provided a good explanation of flash media over disk for running queries to access archives.
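For readers who want to apply the LOCKSS cost split reported above (half ingest, one-third preservation storage, one-sixth access) to their own planning, here is a minimal Python sketch. The function name `split_budget` and the 60,000 figure are hypothetical, chosen only for illustration; they are not from Rosenthal's talk, and the split itself may not hold for other archives.

```python
from fractions import Fraction

# Cost shares as reported in the talk: half ingest, one-third
# preservation storage, one-sixth access. Exact fractions avoid
# rounding drift when checking that the shares sum to 1.
LOCKSS_COST_SHARES = {
    "ingest": Fraction(1, 2),
    "preservation_storage": Fraction(1, 3),
    "access": Fraction(1, 6),
}


def split_budget(total, shares=LOCKSS_COST_SHARES):
    """Divide a total budget across cost categories (hypothetical helper)."""
    if sum(shares.values()) != 1:
        raise ValueError("cost shares must sum to 1")
    return {name: float(total * share) for name, share in shares.items()}


# Illustrative budget only; 60,000 is an assumed figure, not from the talk.
for category, amount in split_budget(60_000).items():
    print(f"{category}: {amount:,.0f}")
```

Swapping in a different `shares` dictionary makes it easy to see how sensitive a plan is to the assumption that ingest dominates the budget.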
The program ended with a panel of the vendors and Carl Watts of LC IT offering trends and predictions for 2015-18. Among their predictions: the Linear Tape File System (LTFS) will fade, even though many institutions, including LC, are currently installing it; end-to-end checksums will not happen quickly; hardware costs will rise; and Intel’s omni-scale data center will be prevalent by 2020 (http://www.intel.com/content/www/us/en/architecture-and-technology/microarchitecture/latest-microarchitecture.html).
This was an excellent conference, and I encourage anyone with an interest in long-term digital storage to contact the Library of Congress to attend future meetings.
There are some related conferences and programs readers may be interested in exploring for more information on digital storage and preservation: the International Digital Curation Conference, the National Digital Stewardship Alliance (NDSA), and the National Digital Information Infrastructure and Preservation Program (NDIIPP).
Note: Corrie will be facilitating three sessions at the upcoming Charleston Conference this year: The Library as Publisher: The SELF-e Pilot Project with BiblioBoard Technology, NC Live, and LJ; Preservation of Audio-Visual Collections and Modern Storage Media with Dr. Fanella France, Chief of AV Preservation, Library of Congress; and the Lively Lunch session Authority Control in the Virtual Library, which will include opinions from UNCC and OCLC.
And speaking of the Charleston Conference, please feel free to email us at firstname.lastname@example.org if you are interested in a future Charleston Conference workshop or pre-conference on designing storage architectures for digital collections.
Tom Gilson