Showing posts with label JPEG 2000.

Monday, August 31, 2015

Beyond TIFF and JPEG2000: PDF/A as an OAIS Submission Information Package Container

Beyond TIFF and JPEG2000: PDF/A as an OAIS Submission Information Package Container. Yan Han. Library Hi Tech. 2015.
PDF/A can be used as a file format, but it can also be used as an OAIS Submission Information Package (SIP) container. The PDF/A open standards can "simplify digitization process, reduce digitization cost, improve production substantially and build more confidence for preservation and access." PDF/A can also be used as an Archival Information Package (AIP) container.

The three main goals of PDF/A are to:
  • provide a way to present the appearance of documents independent of the tools and systems used
  • provide a framework for recording the context and history of electronic documents in the metadata
  • define a framework for representing the logical structure of electronic documents within conforming files

A typical SIP may consist of a directory containing the following information:
  • Content: 
    • Preservation master files (such as TIFF image files).
    • Access files (such as PDF, JPEG, or JPEG 2000 files).
    • Other content (such as OCR data).
  • Preservation description: 
    • Preservation metadata in the TIFF header
    • Other structural and technical metadata
    • Checksum files.
  • Packaging information: 
    • Directory and file naming, structural metadata.
  • Descriptive information: 
    • Descriptive metadata saved in digital management system, catalog, or textual/XML files.
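The checksum files listed under the preservation description are typically a manifest of digests for every file in the package. The sketch below is one minimal way to produce and re-check such a manifest. It assumes SHA-256 and a BagIt-style `manifest-sha256.txt` layout (digest, two spaces, relative path); neither choice comes from the article, and the function and folder names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(sip_dir: Path, manifest_name: str = "manifest-sha256.txt") -> Path:
    """Write '<digest>  <relative/path>' for every file in the SIP directory."""
    manifest = sip_dir / manifest_name
    lines = []
    for path in sorted(p for p in sip_dir.rglob("*") if p.is_file()):
        if path == manifest:
            continue  # never checksum the manifest itself
        rel = path.relative_to(sip_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {rel}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest

def verify_manifest(sip_dir: Path, manifest_name: str = "manifest-sha256.txt") -> List[str]:
    """Return the relative paths that are missing or no longer match their digest."""
    failures = []
    for line in (sip_dir / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line:
            continue  # tolerate blank lines
        expected, rel = line.split("  ", 1)
        target = sip_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

For example, after copying hypothetical `master/` and `access/` folders into `sip_0001/`, call `write_manifest(Path("sip_0001"))`, and run `verify_manifest` again before ingest; an empty list means every file still matches its recorded checksum.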
"The key requirement of PDF/A is that it is self-described and self-contained so that it can be reproduced exactly the same way with different software in various platforms." A PDF/A file includes all the information needed to display its content (text, images, fonts, and color profiles).
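Part of being self-described is that a PDF/A file declares which part and conformance level of the standard it claims in its XMP metadata, through the `pdfaid:part` and `pdfaid:conformance` properties. A minimal sketch that reads that declaration from the raw bytes, assuming the XMP packet is stored uncompressed (PDF/A-1 requires the document metadata stream to be unfiltered). It only reports the claim; it does not check that the file actually conforms, and the function name is hypothetical.

```python
import re
from typing import Optional

# PDF/A identification properties may be written as XMP attributes
# (pdfaid:part="1") or as elements (<pdfaid:part>1</pdfaid:part>).
_PART = re.compile(rb'pdfaid:part\s*=\s*["\'](\d+)["\']|<pdfaid:part>\s*(\d+)\s*</pdfaid:part>')
_CONF = re.compile(rb'pdfaid:conformance\s*=\s*["\']([A-Za-z])["\']'
                   rb'|<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>')

def _first_group(match: "re.Match[bytes]") -> str:
    """Return whichever alternative of the pattern actually matched."""
    return next(g for g in match.groups() if g is not None).decode("ascii")

def declared_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return e.g. 'PDF/A-1b' if the file declares PDF/A, else None.

    This reads the self-declaration only; it is NOT a conformance check.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conf = _CONF.search(pdf_bytes)
    suffix = _first_group(conf).lower() if conf else ""
    return f"PDF/A-{_first_group(part)}{suffix}"
```

A declared level is a useful first filter when sorting incoming files, but a file can claim PDF/A without meeting it, so a real validator is still needed.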

Master file formats should be non-proprietary, open and documented international standards that are commonly used. The files should be unencrypted, and should be uncompressed or else use lossless compression. The author of the article recommends using PDF/A as the preferred file format for text and image files, and possibly using it as an OAIS SIP container. The author argues that PDF/A is a better file format than the currently preferred TIFF or JPEG2000 formats.
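For TIFF masters, the "uncompressed or lossless" rule can be checked mechanically, because the compression scheme is recorded in tag 259 (Compression) of each image file directory. A minimal sketch for classic (non-BigTIFF) files that inspects only the first image; the lossless/lossy grouping is a simplification (JPEG inside TIFF is treated as lossy), and the function names are hypothetical.

```python
import struct
from typing import Optional

# Common values of TIFF tag 259 (Compression).
COMPRESSION_NAMES = {
    1: "none", 2: "CCITT RLE", 3: "CCITT Group 3", 4: "CCITT Group 4",
    5: "LZW", 6: "old-style JPEG", 7: "JPEG", 8: "Deflate (Adobe)",
    32773: "PackBits", 32946: "Deflate",
}
LOSSY = {6, 7}  # simplification: JPEG in TIFF is almost always lossy

def tiff_compression(data: bytes) -> Optional[int]:
    """Return the Compression value of the first IFD, or None if the tag is absent."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF file")
    (ifd_offset,) = struct.unpack(order + "I", data[4:8])
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        tag, typ, _n, value = struct.unpack(order + "HHI4s", data[start:start + 12])
        if tag == 259:
            # SHORT (type 3) values are left-justified in the 4-byte value field.
            fmt = "H" if typ == 3 else "I"
            return struct.unpack(order + fmt, value[:struct.calcsize(fmt)])[0]
    return None

def meets_master_rule(data: bytes) -> bool:
    """True if the first image is uncompressed or uses a known lossless scheme."""
    code = tiff_compression(data)
    if code is None:
        code = 1  # TIFF's default when the tag is missing is no compression
    return code in COMPRESSION_NAMES and code not in LOSSY
```

Unknown compression codes are rejected rather than assumed lossless, which errs on the side of caution for an archive.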

There are several issues with PDF/A naming and implementation. The most critical need is reliable open source software for producing and validating PDF/A files.

Thursday, March 26, 2015

Release of jpylyzer 1.14.1

Release of jpylyzer 1.14.1 version. National Library of the Netherlands / Open Preservation Foundation. 25 March 2015.
Release of a new version of jpylyzer. The tool validates that a JP2 image really conforms to the format’s specifications. It is also a feature (technical characteristics) extractor for JP2 images. Changes include improved XML output and recursive scanning of directory trees.
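A JP2 file is a sequence of boxes, each starting with a 4-byte length and a 4-byte type. The sketch below checks only a few of the structural rules that a validator such as jpylyzer tests: the signature box comes first, the file type box second with `jp2 ` in its compatibility list, and the JP2 header and codestream boxes are present. It is an illustration under those assumptions, not a substitute for jpylyzer, which also checks box contents and the codestream itself; the function names are hypothetical.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# The 12-byte JPEG 2000 signature box that must open every JP2 file.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"

def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (box_type, payload) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:    # 64-bit extended length follows the type field
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:  # box runs to the end of the enclosing data
            length = end - pos
        if length < header or pos + length > end:
            raise ValueError(f"bad length for box {box_type!r} at offset {pos}")
        yield box_type, data[pos + header:pos + length]
        pos += length

def jp2_structure_problems(data: bytes) -> List[str]:
    """Check a handful of top-level JP2 rules; an empty list means none failed."""
    if not data.startswith(JP2_SIGNATURE):
        return ["missing JPEG 2000 signature box"]
    problems = []
    boxes = list(iter_boxes(data))
    types = [box_type for box_type, _ in boxes]
    if len(types) < 2 or types[1] != b"ftyp":
        problems.append("file type box must follow the signature box")
    else:
        ftyp = boxes[1][1]
        # brand (4 bytes) and minor version (4 bytes) precede the compatibility list
        compat = [ftyp[i:i + 4] for i in range(8, len(ftyp), 4)]
        if b"jp2 " not in compat:
            problems.append("'jp2 ' missing from the compatibility list")
    if b"jp2h" not in types:
        problems.append("missing JP2 header box")
    if b"jp2c" not in types:
        problems.append("missing contiguous codestream box")
    return problems
```

Usage would be along the lines of `jp2_structure_problems(Path("page001.jp2").read_bytes())` for a hypothetical file; combining it with `os.walk` gives a rough equivalent of scanning a directory tree.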


Thursday, May 17, 2012

A prototype JP2 validator and properties extractor.

A prototype JP2 validator and properties extractor. Johan van der Knijff. The Open Planets Foundation. 14 December 2011, updated March 2012.
The site includes a discussion of a full-fledged JP2 validator tool, as well as how it validates the object. A prototype is now ready in the form of the jpylyzer tool, which is both a validator and a properties extractor.
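On the properties-extraction side, several basic technical characteristics of a JP2 image sit in the Image Header box (`ihdr`) inside the JP2 header box (`jp2h`). A minimal sketch that pulls them out; it carries its own small box reader so it runs on its own, and the dictionary keys are illustrative names, not jpylyzer's output fields.

```python
import struct
from typing import Iterator, Tuple

def _boxes(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (box_type, payload) for each box, handling extended and zero lengths."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:    # 64-bit extended length
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:  # box runs to the end of the data
            length = len(data) - pos
        if length < header:
            raise ValueError(f"bad length for box {box_type!r}")
        yield box_type, data[pos + header:pos + length]
        pos += length

def jp2_image_header(data: bytes) -> dict:
    """Return basic image properties from the ihdr box inside the jp2h box."""
    for box_type, payload in _boxes(data):
        if box_type != b"jp2h":
            continue
        for sub_type, sub in _boxes(payload):
            if sub_type == b"ihdr":
                height, width, nc, bpc, c, unk_c, ipr = struct.unpack(">IIHBBBB", sub[:14])
                varies = bpc == 0xFF  # per-component depths are then in a bpcc box
                return {
                    "height": height,
                    "width": width,
                    "components": nc,
                    "bits_per_component": None if varies else (bpc & 0x7F) + 1,
                    "signed": None if varies else bool(bpc & 0x80),
                    "compression_type": c,  # 7 denotes JPEG 2000 in JP2
                    "colourspace_unknown": bool(unk_c),
                    "has_ipr_box": bool(ipr),
                }
    raise ValueError("no ihdr box found inside a jp2h box")
```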

Thursday, September 08, 2011

Paper on JPEG 2000 for preservation

Paper on JPEG 2000 for preservation. Johan van der Knijff. National Library of the Netherlands (KB). Johan's blog. Open Planets Foundation. June 2011. This paper, published in D-Lib Magazine, looks at the suitability of the JP2 format for long-term digital preservation. The author identifies issues that will be addressed in an amendment to the standard. The paper also provides some practical recommendations that may help mitigate the risks for existing collections.

Monday, October 26, 2009

Digital Preservation Matters - 23 October 2009

Sidekick Data Restoration Has Started, Microsoft Says. Barry Levine. NewsFactor. October 20, 2009.

Danger, a Microsoft subsidiary using ‘cloud computing’, experienced a system problem that erased all the contacts, calendar entries, to-do lists, and photos of people using the Sidekick smartphone. Much of the data may eventually be recovered, but effective data backup and protection measures were not being followed. It shows the importance of using reliable vendors and having data backups. [This is the first major loss of ‘cloud data’ that I know of.]

---

Millennial disc guarantees data preservation. Logan Bradford. Daily Universe. September 15, 2009.

Barry Lunt, a BYU information technology professor, will launch a product with the company Millenniata: a disc, similar to a CD or DVD, that is designed to last up to 1,000 years. Through his seven years working with computer data at IBM, he learned that data on CDs and DVDs can decay and be lost within just a few years because optical discs degrade, for example when they are exposed to sunlight and humidity. [We have been testing these discs and writers.]

---

Wellcome Library to use JPEG2000 image format. Library blog. September 18, 2009

The Wellcome Library in London has been using TIFF images as its archival storage format. But, anticipating adding over 30 million images, the Library wanted to find a way to store the digital content efficiently while still maintaining the high levels of quality and the open standards required for long-term preservation. To do this it has chosen to use the JPEG2000 format in its digitization program. The difficulty is that the JPEG2000 format has multiple versions. To learn which version is best for long-term storage and access, the Library commissioned a study by King's College: JPEG 2000 as a Preservation and Access Format for the Wellcome Trust Digital Library. Robert Buckley, Simon Tanner.

Based on the study, the Library will adopt "visually lossless" lossy compression to gain at least 75% storage savings in comparison to a TIFF version. “The recommended compression parameters will produce an image with no visible difference in image quality, but the compression is irreversible - i.e. the original bit stream will not be possible to reconstruct. As the Library will be digitising physical items that can (if necessary) be re-digitised, it was considered an acceptable compromise.” Some materials may be candidates for JPEG2000 lossless compression. The study also recommends that “JPEG 2000 be used with multiple resolution levels.”
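The 75% storage saving and the use of multiple resolution levels both come down to simple arithmetic. A sketch, assuming a hypothetical 5000 × 4000 pixel, 24-bit RGB page and five wavelet decomposition levels (neither figure comes from the study): a 75% saving means the JPEG 2000 file may be at most a quarter of the uncompressed size, a 4:1 ratio, and each resolution level halves the image dimensions. The function names are hypothetical.

```python
from typing import List, Tuple

def uncompressed_bytes(width: int, height: int, channels: int = 3, bits: int = 8) -> int:
    """Size of the raw pixel data, ignoring TIFF headers and metadata."""
    return width * height * channels * bits // 8

def max_compressed_bytes(raw_bytes: int, savings: float) -> float:
    """Largest compressed size that still achieves the requested fractional saving."""
    return raw_bytes * (1 - savings)

def required_ratio(savings: float) -> float:
    """Compression ratio (raw : compressed) implied by a fractional saving."""
    return 1 / (1 - savings)

def resolution_levels(width: int, height: int, decompositions: int) -> List[Tuple[int, int]]:
    """Image size at each JPEG 2000 resolution level, full size first.

    N decomposition levels give N + 1 resolution levels; each level halves the
    dimensions, rounding up (assuming the image starts at the origin).
    """
    return [(-(-width // 2 ** r), -(-height // 2 ** r)) for r in range(decompositions + 1)]
```

For the hypothetical page, the raw pixels take 60,000,000 bytes, so the compressed file must stay under 15,000,000 bytes, and the smallest of the six resolution levels is 157 × 125 pixels, small enough for a thumbnail. The sketch says nothing about the "visually lossless" encoder settings themselves, which come from the study.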

---

The Swedish Research Council requires free access to research results. Press release. October 8, 2009.

Researchers granted funds by the Research Council should publish their scientific research in publications that are available according to Open Access guidelines within a maximum period of six months. "We consider that publication of research which has been paid for out of public funds should be made freely accessible to all." The Open-Access rules apply so far only to scientifically assessed texts in journals and conference reports, and not to monographs and chapters of books.

---

Sound archive of the British Library goes online, free of charge. Mark Brown. The Guardian News. 3 September 2009.

The British Library has made its archive of world and traditional music freely available on the internet. The Archival Sound Recordings archive contains about 28,000 recordings, estimated at 2,000 hours of sound. These recordings are from around the world and the oldest are from wax cylinders made in 1898. The Library wants to change the perception that “things are given to libraries and then are never seen again – we want these recordings to be accessible.”

---

Keeping Research Data Safe2: Data Survey added to project website. Neil Beagrie. Blog. 26 Sep 2009.

Information about the project and a link to the website. The project, which is identifying long-lived datasets for the purpose of cost analysis, will be ending soon. It refers to the previous project. Its activity model includes developing an archive’s selection policy, as well as staff training and development. One area of concern was that OAIS terminology could be a barrier to understanding for some user groups.

Tuesday, April 07, 2009

Digital Preservation Matters - 03 April 2009

Nevada Statewide Digital Initiative. Website. Updated 3 April 2009.

The purpose of the Nevada Statewide Digital Initiative is to: “Increase access to the collections held by Nevada's cultural heritage institutions through digital access to materials by residents of Nevada and scholars and researchers interested in Nevada's culture and history.” The series of activities to build statewide collaboration include:

  1. creating a collection policy;
  2. creating a website that links existing projects;
  3. adopting statewide best practice and standards;
  4. creating local partnerships that would build up to statewide partnerships;
  5. developing a digital pilot project to curate and manage their digital materials.

Millenniata continues to make progress with its patent-pending Millennial Disc and Millennial Writer. Press Release. February 2, 2009.

This press release has information about a new optical disc that has been developed. It is designed to be a permanent archiving product that has no degradable components and “safely stores data for 1,000 years”. The technology makes a permanent change to the disc. It is referred to as Write Once Read Forever™ and can be read in a standard DVD drive. [Check back for test results.]


Systemwide organization of information resources: a multiscalar environment. Lorcan Dempsey. Higher Education in a global economy: the implications for technology and JISC. 23 March 2009. [pdf presentation]

Interesting presentation that looks at libraries and their environment. Compares core components of companies and libraries. Examines a grid of Uniqueness and Stewardship, from Freely accessible web resources in the low-low quadrant to Special collections in the high-high quadrant, and shows where preservation appears. Moving from the institution to the multiscalar level.


Digital Project Staff Survey of JPEG 2000 Implementation in Libraries. David Lowe, Michael J. Bennett. University of Connecticut. March 20, 2009. [xls]

Preliminary findings of a survey about JPEG 2000 that set out to understand the community's perception of the format. JPEG 2000 is the product of efforts to create an open standard. Concerns about implementing JPEG 2000 include limited software tools, lack of functionality, and uncertainty of need. Some survey results of interest:

  • 59.5% said they use the format
  • 19.7% use for new archival collections
  • 16.3% use for converting TIFF collections
  • 53.5% use for online access

Other questions discuss the tools used and include comments about them.


Rocks Don't Need to Be Backed Up. Henry Newman. Enterprise Storage Forum. March 27, 2009.

General article about the need for digital preservation. “The first thing we need is a standardized framework for file metadata, backup and archival information.” “The integrity of modern data is not guaranteed except at high cost.” “We have no real framework to change and transcribe formats.”

[This is more about transferring information between computer systems rather than archival metadata. It shows the lack of interaction between digital preservation worlds. Some of the comments about the article are interesting.]


Goodbye, Encarta. A cautionary tale for newspapers? John Yemma. The Christian Science Monitor. March 31, 2009.

An article about how Wikipedia replaced the Encarta digital encyclopedia and what that points to. What Encarta did not do was embrace the power of the internet, which includes almost instant updating. The “lesson is that general knowledge … can’t withstand an effort that was developed specifically for the Internet and that harnesses gifted amateurs.” There is power in open-source knowledge. Organizations can take their values with them, but they can’t take the old model, nor the old work habits. “The Web is its own universe with its own rules.”


INSIGHT into issues of Permanent Access to the Records of Science in Europe. PARSE.Insight. March 27, 2009. [pdf]

This document gives an overview and details of the technical and non-technical components that would be needed for science data infrastructures. The infrastructure components are aimed at bridging the gaps between areas of functionality that were developed for particular projects and are separated by either discipline or time. These components should play a unifying role in science data. They are being developed within a Europe-wide infrastructure, but there should also be advantages if they are used more widely. The group has defined four main roles: funding, research, publishing, and storage/preservation.

Science Data Infrastructure: those things, technical, organizational and financial, that are usable across communities to help in the preservation, re-use and open access of digital holdings.

Preservation: meant in the OAIS sense of maintaining the usability and understandability of a digital object.

Representation Information: the OAIS term for everything that is needed in order to understand a digital object.

The report discusses some major threats. Those who responded marked these as “Important” or “Very Important”:

  1. Users unable to understand or use the data, e.g. the semantics, format, etc.
  2. Inability to maintain the hardware, software or environment, which may make the information inaccessible
  3. No chain of evidence, leading to uncertain provenance or authenticity
  4. Access and use restrictions may not be respected in the future
  5. Inability to identify the data location
  6. The current data custodian may cease to exist
  7. Those responsible for looking after the digital holdings may let us down

Any component must be able to be handed over to another organization, and the Persistent Identifiers must transfer and resolve correctly. In general, it is not possible to state that an object is authentic; one can only provide evidence, such as technical details that show the provenance of the object, or rely on a social decision of trust.