The Optical Archive System (OAS) from Hitachi LG Data Storage fits in a server rack and can hold 10 units, called libraries. Each library
unit contains 100 TB of data storage on 500 long-term optical discs. More
information is available in Rosetta
Users Group 2015: New Sources and Storage Options For Rosetta (slides 13–16)
or this YouTube video.
Connecting the OAS and Rosetta Systems:
Once the optical archive was installed in our Library, it
was connected to our Rosetta system, which was very easy to do and took
only a couple of minutes. In the Rosetta administrative module I created a new File storage
group with the OAS path and the storage capacity. The IE and Metadata storage
groups were left as they were, directed to our library server, because the files in those groups are much smaller
and accessed more often than the files in the File storage group. I then added a new storage rule so
Rosetta could determine whether to write the files to our library server, to
our Amazon storage account, or to the OAS.
Write functionality:
When the data is written to the optical discs, a fixity check
is done to ensure that the written file is 100% accurate. Once the file is written to
the optical disc, the data is permanent. Even if the system were to go away, the data
discs are permanent and could still be read on any Blu-ray device. I ingested a couple hundred GB into Rosetta,
which were then written out to the OAS discs. (Overall I added over 4 TB of data.) We never encountered any
difficulties with writing data to the OAS. We did try to disrupt or corrupt the
writing process to see if we could get it to fail or to write bad data, but
even our systems engineer with root access was unable to affect the data in any
way.
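The fixity check described above amounts to comparing a checksum of the source file with a checksum of the written copy. Here is a minimal sketch of that idea, assuming SHA-256 and ordinary file paths. The function names are hypothetical, and this is not necessarily how the OAS or Rosetta performs the check internally.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(source_path, written_path, algorithm="sha256"):
    """Return True when the written copy has the same checksum as the source."""
    return file_checksum(source_path, algorithm) == file_checksum(written_path, algorithm)
```

Calling `fixity_ok(original, copy_on_disc)` after a write returns `False` if even a single byte differs between the two files.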
Normally our test Rosetta system is configured for only a small
number of files, so there is limited processing space, about 45 GB. (Our
live production Rosetta system has 2 TB of processing space.) Because of the limited
processing space on the test server, I could not run an unrestricted ingest without
filling up the disk space. So I ingested a limited number of items at a time
and cleared the processing space before ingesting more. The chart below shows the ingest amounts for
two of the afternoons when the ingest processes were run; each took about 5 hours. (An unrestricted ingest would likely result in at least four
times as many items per day.)
IEs   | Files  | GBs
8,019 | 50,856 | 352.53
6,639 | 48,736 | 346.42
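The batching approach described above (ingest a limited set, clear the processing space, repeat) can be sketched as a simple greedy grouping. This is only an illustration: `plan_batches` is a hypothetical helper, not part of Rosetta, and it assumes that each batch must fit entirely within the roughly 45 GB of processing space.

```python
def plan_batches(item_sizes_gb, limit_gb=45.0):
    """Greedily group item sizes into batches whose totals stay within limit_gb."""
    batches, current, used = [], [], 0.0
    for size in item_sizes_gb:
        if size > limit_gb:
            raise ValueError(f"item of {size} GB exceeds the {limit_gb} GB limit")
        if used + size > limit_gb:
            # The next item would overflow the processing space: close this
            # batch (the space would be cleared here) and start a new one.
            batches.append(current)
            current, used = [], 0.0
        current.append(size)
        used += size
    if current:
        batches.append(current)
    return batches


print(plan_batches([20, 20, 10, 30, 5]))
```

Under the same assumption, an afternoon that ingested 352.53 GB would have needed the processing space cleared across at least 8 batches, since 7 × 45 GB = 315 GB falls short of that total.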
Read functionality:
Because this is an optical device, I did not know if Rosetta would be able to read the discs. The
OAS has to locate the correct disc and load the
disc in a drive to retrieve the data (there are 12 read/write drives for each
library). The retrieval process can take up to 90 seconds. Our Rosetta system is used
as a dark archive, so the retrieval time was not a problem. The question was
whether Rosetta would wait while the file was being retrieved or
would time out. From the first request, the OAS read functionality worked flawlessly. Rosetta
coped well with the retrieval time while the disc was fetched and the file
read. Once the disc was in the drive, access to any other files on the disc
was about as fast as if they were on spinning disk.
Here is a chart of access times for one of the groups that I checked:
Title                                   | Files in item | Item size (MB) | Access time (min:sec)
List of titles of genealogical articles | 9             | 169            | 1:16
Jackson collection image                | 2             | 16             | 1:20
Jackson collection image                | 2             | 8              | 0:23
John O. Bird children                   | 1             | 6              | 1:25
Cardston Alberta Temple                 | 1             | 5              | 0:18
Piano                                   | 1             | 11             | 1:34
F Edwards                               | 1             | 2              | 0:14
E O Haymond                             | 1             | 4              | 0:09
Taj Mahal,                              | 2             | 14             | 0:27
Taj Mahal,                              | 2             | 14             | 0:09
Millie Gallup                           | 1             | 5              | 0:14
History of the Lemen family             | 9             | 528            | 0:17
The Boynton family.                     | 9             | 405            | 0:11
Register and almanac                    | 9             | 537            | 0:25
The crawfishes of the state             | 9             | 214            | 0:14
Tank                                    | 1             | 6              | 0:20
Parley D. Thomas                        | 1             | 4              | 0:19
Blake family : a genealogical history   | 9             | 146            | 0:10
From the access time column it is obvious when a new disc is
retrieved, as the time is over 60 seconds. Once the disc has been loaded,
the access time for subsequent files is much lower. These access times are for the master files, which can be
quite large.
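The same reading of the table can be done programmatically. The sketch below copies the access times from the chart and uses the 60-second threshold mentioned above to separate requests that needed a disc load from those served from an already loaded disc. The variable and function names are illustrative only.

```python
def to_seconds(min_sec):
    """Convert a 'min:sec' string such as '1:16' to a number of seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + int(seconds)


# Access times copied from the chart, in row order.
access_times = [
    "1:16", "1:20", "0:23", "1:25", "0:18", "1:34", "0:14", "0:09", "0:27",
    "0:09", "0:14", "0:17", "0:11", "0:25", "0:14", "0:20", "0:19", "0:10",
]

DISC_LOAD_THRESHOLD_S = 60  # threshold taken from the text above

seconds = [to_seconds(t) for t in access_times]
disc_loads = [s for s in seconds if s > DISC_LOAD_THRESHOLD_S]
already_loaded = [s for s in seconds if s <= DISC_LOAD_THRESHOLD_S]

print(f"requests needing a disc load: {len(disc_loads)} of {len(seconds)}")
print(f"mean time with disc load:    {sum(disc_loads) / len(disc_loads):.1f} s")
print(f"mean time on a loaded disc:  {sum(already_loaded) / len(already_loaded):.1f} s")
```

For this group, 4 of the 18 requests crossed the threshold.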
The setup process, writing and reading all went extremely
well. The next step was to run an automated fixity check on the OAS files from
within Rosetta.
(Updated to clarify and answer questions.)