Error detection of JPEG files with JHOVE and Bad Peggy – so who’s the real Sherlock Holmes here? Yvonne Tunnat. Yvonne Tunnat's Blog. 29 Nov 2016.
Post describing an examination of the findings of two validation tools: JHOVE (version 1.14.6) and Bad Peggy (version 2.0), which scans image files for damage using the Java Image IO library. The goal of the test was to compare the findings of these tools and to learn what to expect from them in digital curation work. The test set contained 3,070 images, including images from Google's publicly available Imagetestsuite; 1,007 of these files had problems.
The JHOVE JPEG module can detect 13 different error conditions; Bad Peggy can distinguish at least 30. The results for each tool are given in tables in the post. The problem images either could not be opened and displayed, or showed missing parts, mixed-up parts or colour problems. The conclusion is that Bad Peggy detected all of the visually corrupt images, while the JHOVE JPEG module missed 7 of 18 corrupt images.
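Bad Peggy's approach of scanning images with the Java Image IO library can be illustrated with a small decode check. This is a minimal sketch, not Bad Peggy's actual code: it assumes that a full decode through `ImageIO` is a reasonable proxy for "can this image be opened", and it treats both decoder exceptions and decoder warnings (such as those raised for truncated data) as problems. The class `DecodeCheck` and its `problems` method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical decode check in the spirit of Bad Peggy; not Bad Peggy's code.
final class DecodeCheck {

    // Fully decodes the first image in `data` and returns the problems seen;
    // an empty list means Image IO decoded it without errors or warnings.
    static List<String> problems(byte[] data) {
        List<String> problems = new ArrayList<>();
        try (ImageInputStream in = new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                problems.add("no Image IO reader recognises this data");
                return problems;
            }
            ImageReader reader = readers.next();
            try {
                // Collect decoder warnings (e.g. for truncated data) instead of ignoring them.
                reader.addIIOReadWarningListener((source, warning) -> problems.add("warning: " + warning));
                reader.setInput(in);
                reader.read(0); // decode the whole first image, not just its header
            } finally {
                reader.dispose();
            }
        } catch (IOException | RuntimeException e) {
            problems.add("error: " + e);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            List<String> found = problems(Files.readAllBytes(Paths.get(arg)));
            System.out.println(arg + ": " + (found.isEmpty() ? "OK" : String.join("; ", found)));
        }
    }
}
```

After compiling with `javac`, running `java DecodeCheck a.jpg b.jpg` prints each file with `OK` or the problems found. Unlike JHOVE, a check like this only tells you whether the Java decoder could read the file, not whether the file conforms to the JPEG specification.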
This blog contains information related to digital preservation, long term access, digital archiving, digital curation, institutional repositories, and digital or electronic records management. These are my notes on what I have read or been working on. I enjoyed learning about digital preservation, but I have since retired and am no longer updating the blog.
Showing posts with label JHOVE.
Saturday, December 10, 2016
Saturday, August 29, 2015
Update on JHOVE
Update on JHOVE. Gary McGath. Mad File Format Science blog. August 27, 2015.
Open Preservation Foundation has accepted the stewardship of JHOVE, and Carl Wilson has made impressive progress. Changes include reorganizing the code and making installation more straightforward.
Thursday, June 18, 2015
File identification tools, part 5: JHOVE
File identification tools, part 5: JHOVE. Gary McGath. File Formats Blog. June 11, 2015.
JHOVE is a tool that identifies and validates AIFF, GIF, HTML, JPEG, JPEG2000, PDF, TIFF, WAV, XML, ASCII, and UTF-8 files. Files it does not recognize are reported as “Bytestream.”
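One common way to identify a format is to check a file's signature, the fixed "magic" bytes near its start. The sketch below is a deliberate simplification, not JHOVE's algorithm (JHOVE's modules go on to parse the whole file): it guesses a format from well-known signatures and falls back to "Bytestream" when nothing matches, mirroring JHOVE's label for unrecognized files. Text-based formats (HTML, XML, ASCII, UTF-8) have no fixed signature and are left out; the class name `SignatureSniffer` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified signature check; JHOVE's modules do far more than this.
final class SignatureSniffer {

    static String identify(byte[] b) {
        if (startsWith(b, 0, 0xFF, 0xD8, 0xFF)) return "JPEG";
        // JP2 signature box: length 12, type "jP  ", contents CR LF 0x87 LF
        if (startsWith(b, 0, 0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, 0x87, 0x0A)) return "JPEG2000";
        if (ascii(b, 0, "%PDF-")) return "PDF";
        if (ascii(b, 0, "GIF87a") || ascii(b, 0, "GIF89a")) return "GIF";
        if (ascii(b, 0, "II*\0") || ascii(b, 0, "MM\0*")) return "TIFF";
        if (ascii(b, 0, "RIFF") && ascii(b, 8, "WAVE")) return "WAV";
        if (ascii(b, 0, "FORM") && (ascii(b, 8, "AIFF") || ascii(b, 8, "AIFC"))) return "AIFF";
        return "Bytestream"; // nothing matched
    }

    // True if b holds the given byte values starting at offset.
    private static boolean startsWith(byte[] b, int offset, int... sig) {
        if (b.length < offset + sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((b[offset + i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }

    // True if b holds the given single-byte text starting at offset.
    private static boolean ascii(byte[] b, int offset, String text) {
        byte[] sig = text.getBytes(StandardCharsets.ISO_8859_1);
        return b.length >= offset + sig.length
                && Arrays.equals(Arrays.copyOfRange(b, offset, offset + sig.length), sig);
    }
}
```

A matching signature is only a first guess: a file can start with the right bytes and still be badly damaged, which is why validation is a separate step.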
Key concepts in JHOVE are “well-formed” and “valid.” A file that is “well-formed but not valid” has errors, but not ones that should prevent rendering. JHOVE focuses on the semantics of a file rather than its content. It reports only whether a file fully conforms to a profile; it won’t tell you why a file fell short.
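The three outcomes described above can be turned into a simple triage step. This sketch assumes JHOVE's text output contains a line such as `Status: Well-Formed and valid`, with `Well-Formed, but not valid` and `Not well-formed` as the other outcomes; check the exact wording against the JHOVE version you run. The enum `JhoveStatus` and its mapping are hypothetical.

```java
import java.util.Locale;

// Hypothetical triage of JHOVE's status line; check the wording your JHOVE version emits.
enum JhoveStatus {
    WELL_FORMED_AND_VALID,  // conforms to the format's rules
    WELL_FORMED_NOT_VALID,  // has errors, but not ones expected to prevent rendering
    NOT_WELL_FORMED,        // structurally broken; rendering may fail
    UNKNOWN;                // any status text this sketch does not recognise

    // Accepts either a full line ("Status: Well-Formed and valid") or just the value.
    static JhoveStatus fromStatusLine(String line) {
        String value = line.trim();
        if (value.toLowerCase(Locale.ROOT).startsWith("status:")) {
            value = value.substring("status:".length()).trim();
        }
        value = value.toLowerCase(Locale.ROOT);
        if (value.startsWith("well-formed and valid")) return WELL_FORMED_AND_VALID;
        if (value.startsWith("well-formed, but not valid")) return WELL_FORMED_NOT_VALID;
        if (value.startsWith("not well-formed")) return NOT_WELL_FORMED;
        return UNKNOWN;
    }

    // Anything short of full validity is flagged for a curator to examine.
    boolean needsReview() {
        return this != WELL_FORMED_AND_VALID;
    }
}
```

Mapping unrecognised text to `UNKNOWN` rather than to a pass keeps the triage conservative: since JHOVE does not say why a file fell short, anything the sketch cannot classify is sent for review.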
Download JHOVE from the Open Preservation Foundation's GitHub repository (not from SourceForge). Documentation is on the OPF website. A developer's guide is also available: JHOVE Tips for Developers.
JHOVE should not be confused with JHOVE2, which does similar things but has a different code base.
Wednesday, March 25, 2015
JHOVE Evaluation & Stabilisation Plan
JHOVE Evaluation & Stabilisation Plan. Open Preservation Foundation. March 2015.
JHOVE is an extensible software framework for performing format identification, validation, and characterization of digital objects. In February 2015 the JHOVE format validation tool was transferred to the stewardship of the Open Preservation Foundation (OPF). The OPF's initial review of JHOVE is complete, and its Evaluation & Stabilisation Plan is now available on the OPF website.
The main objective of our work to date has been to establish a firm foundation for future changes based on agile software development best practises. A further technical evaluation will be published in April that will also outline options for possible future development and maintenance tasks.
Tuesday, February 03, 2015
Open Preservation Foundation to provide sustainable home for JHOVE
Open Preservation Foundation to provide sustainable home for JHOVE. Becky. Open Preservation Foundation Blog. 3 Feb 2015.
The Open Preservation Foundation is taking stewardship of the JHOVE preservation tool and providing it with a sustainable home. The tool will become part of the OPF software portfolio and follow the OPF's Software Maturity Model. Portico is contributing code improvements that it has made to the tool. Other tools in the portfolio include:
- Jpylyzer: JP2 image validator and properties extractor
- FIDO: command-line tool to identify the file formats of digital objects
- Matchbox: duplicate image detection tool
- xcorrSound: four tools to improve digital audio recordings
Monday, January 19, 2015
Ensuring long-term access: PDF validation with JHOVE?
Ensuring long-term access: PDF validation with JHOVE? Yvonne Friese. ZBW - Leibniz Information Centre for Economics. PDF Association. December 17, 2014.
JHOVE is an open source tool for identifying, characterizing and validating twelve common formats, such as PDF, TIFF, JPEG, AIFF and WAVE. Pages within a PDF file are usually stored as a page tree, which allows a given page to be reached as quickly as possible. Common advice for long-term archiving is to prefer the PDF/A format; however, this advice no longer matches the day-to-day reality of the many workflows that use JHOVE for validation tests. Because of the differences between PDF and PDF/A, validation errors can arise. JHOVE’s PDF module can in principle validate PDF/A files, but the feature does not work well: it does not analyze the content of the data streams, so it cannot validate PDF/A compliance in line with the ISO standards. JHOVE is therefore not suited to PDF/A validation, but there are currently no alternatives to it for validating standard PDFs.
JHOVE can still be useful, provided users understand its error reports and know how to resolve them. Despite these problems, JHOVE remains an excellent option for initial guidance.
[In our own institution, we have found JHOVE useful for identifying PDF files with potential problems. Each problem from each source needs to be examined to decide whether it poses a preservation risk.]
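The PDF page tree mentioned in this summary is what makes fast page access possible: in the PDF specification, each intermediate `/Pages` node records in `/Count` how many pages lie beneath it, so a reader can skip whole subtrees instead of visiting every page. The following is a minimal in-memory sketch of that lookup, not a PDF parser; the class `PageTree` and its nested `Leaf` and `Branch` types are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of a PDF page tree; it does not read PDF files.
final class PageTree {

    interface Node {
        int count(); // number of pages (leaves) at or below this node
    }

    static final class Leaf implements Node {
        final String label;
        Leaf(String label) { this.label = label; }
        public int count() { return 1; }
    }

    static final class Branch implements Node {
        final List<Node> kids;
        final int count; // plays the role of the /Count entry of a /Pages node
        Branch(Node... kids) {
            this.kids = Arrays.asList(kids);
            int total = 0;
            for (Node kid : kids) total += kid.count();
            this.count = total;
        }
        public int count() { return count; }
    }

    // Finds the page at zero-based index n, skipping subtrees by their counts.
    static Leaf find(Branch root, int n) {
        if (n < 0 || n >= root.count()) throw new IndexOutOfBoundsException("no page " + n);
        Node node = root;
        while (node instanceof Branch) {
            for (Node kid : ((Branch) node).kids) {
                if (n < kid.count()) { node = kid; break; } // target is inside this kid
                n -= kid.count(); // skip every page in this kid without visiting it
            }
        }
        return (Leaf) node;
    }
}
```

Because the lookup descends one level at a time and only scans the kids of each visited node, its cost grows with the depth and fan-out of the tree rather than with the total number of pages.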