Showing posts with label e-books. Show all posts

Tuesday, September 15, 2015

EPUB file validator and guidelines

Epubcheck 4.0.0 Available for Download. EPUBZone website. September 8, 2015.
The latest version of Epubcheck is now available on GitHub. This open source tool validates EPUB documents and makes sure they conform to the latest specifications. It is also used to provide validation information at the online IDPF validator website and the iBooks Store.

The iBooks site also has help pages for resolving the errors, along with tips about EPUB namespaces and adding alt text to images. Other EPUB guidelines can be found at: EPUB 3 Accessibility Guidelines.
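As a rough illustration of the kind of rule a validator enforces, the sketch below checks only one EPUB container requirement: the first entry in the ZIP must be an uncompressed file named mimetype containing application/epub+zip. This is not Epubcheck and does not replace it. The function names and the in-memory sample archives are assumptions made for illustration.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def has_valid_epub_mimetype(data: bytes) -> bool:
    """True if the first ZIP entry is an uncompressed 'mimetype' file
    holding the EPUB media type."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            entries = zf.infolist()  # entries in archive order
            if not entries:
                return False
            first = entries[0]
            return (
                first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and zf.read(first) == EPUB_MIMETYPE
            )
    except zipfile.BadZipFile:
        return False


def build_sample(mimetype_first: bool) -> bytes:
    """Build a tiny archive in memory (hypothetical sample, not a full EPUB)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        if mimetype_first:
            zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>",
                    compress_type=zipfile.ZIP_DEFLATED)
        if not mimetype_first:
            zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()
```

A file that passes this check can still fail full validation; the check only catches a packaging mistake that is easy to make when zipping an EPUB by hand.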

Friday, February 13, 2015

Crystal clear digital preservation: a management issue

Crystal clear digital preservation: a management issue. Barbara Sierman. Digital Preservation Seeds. February 1, 2015.
The book Digital Preservation for Libraries, Archives and Museums by Edward Corrado and Heather Lea Moulaison does a great job of explaining digital preservation. "In crystal clear language, without beating about the bush and based on extensive up to date (until 2014) literature, digital preservation is explained and almost every aspect of it is touched upon." It also explains what digital preservation is not (backup, etc.). The point of the book is expressed by the statement:
“ensuring ongoing access to digital content over time requires careful reflection and planning. In terms of technology, digital preservation is possible today. It might be difficult and require extensive, institution-wide planning, but digital preservation is an achievable goal given the proper resources. In short, digital preservation is in many ways primarily a management issue”.
It uses the Digital Preservation Triad to symbolize the interrelated activities of
  • Management-related activities,
  • Technological activities and
  • Content-centred activities.
The book, which is also available as an eBook, has a practical approach and emphasizes “that digital preservation is important to the overall mission of the organization”, and not just an experimental project.




Saturday, February 07, 2015

Digital Tools and Apps

Digital Tools and Apps. Chris Erickson. Presentation for ULA. 2014. [PDF]
This is a presentation I created for ULA to briefly outline a few tools that I find helpful. There are many useful tools, and more are being created all the time. Here are a few that I use.
  • Copy & Transfer Tools: WinSCP; Teracopy;
  • Rename Tools: Bulk Rename Utility
  • Integrity & Fixity Tools: MD5Summer; MD5sums 1.2; Quick Hash; Hash Tool
  • File Editing Tools: Babelpad; Notepad++; XML Notepad; 
    • ExifTool; BWF MetaEdit; BWAV Reader;
  • File Format Tools: DROID; 
  • File Conversion:  Calibre; Adobe Portfolio;
  • Others: A whole list of other tools that I use or suggest you look at.
    •  PDF/A tools
    • Email tools
 Please let me know what tools you find helpful.
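For readers who want to see what the integrity and fixity tools in the list do under the hood, here is a minimal sketch using Python's standard hashlib module. It is not how any of the named tools is implemented; the function names and the default algorithm are my own assumptions.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_hex, algorithm="sha256"):
    """Compare a freshly computed checksum with a previously recorded one."""
    return file_checksum(path, algorithm) == expected_hex.strip().lower()
```

Passing algorithm="md5" gives a value comparable to what an MD5 tool reports. A fixity check is simply this comparison repeated on a schedule against the checksum recorded when the file was first received.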

Friday, July 11, 2014

Preserving eBooks

Preserving eBooks. Amy Kirchhoff and Sheila Morrissey. DPC Technology Watch Report 14-01. 01 June 2014.
There is some question as to whether one can even speak of ‘selling’ and, correspondingly, ‘owning’ eBooks. The right to permanent possession, including perpetual access and preservation rights, is the exception rather than the norm in eBook licensing. Libraries and publishers are still experimenting with how to purchase or license eBooks and then how to lend them to patrons.

There is concern about the possibility of modification, retraction or withdrawal of an eBook. This happened in 2009, when Amazon deleted some eBook editions that customers had already purchased. Memory institutions need to be able to ensure the stability of eBook content in their collections and maintain control over any withdrawal or de-accessioning of that content.

Preservation of eBooks is not free. It is expensive to identify content for preservation, gather it, perform initial actions on it, and then preserve that content for the long term. Some approaches that exist are:
  • Collective model, such as HathiTrust.
  • Subscription service, such as Portico.
  • Government support, such as the national libraries of the United Kingdom, France, the Netherlands, etc.
Some of the general formats used for eBook Publication include:
  • HTML
  • PDF
  • MOBI
  • EPUB
  • OEB (Open eBook Publication Structure) superseded by EPUB
  • Microsoft LIT
  • DAISY
  • Text Encoding Initiative

Recommended Actions for Libraries and other institutions:
  • Specify who has responsibility for preserving eBook content 
  • Co-ordinate with other institutions, to eliminate preservation gaps and avoid duplicating efforts
  • When acquiring or licensing eBook content, ensure the acquisition includes preservation rights, and prohibits DRM technologies in the preservation copy acquired from the vendor;
  • Consider and understand what preservation rights are provided when eBooks are licensed and exactly how long-term access will be ensured by the publisher;
  • Articulate preservation policies for the handling of embedded objects, including articulation of legal rights to the content, and workflow requirements to ascertain preservation risks for that embedded content;
  • Encourage publishers to participate in preservation institutions to ensure the long-term viability of their eBook content; and
  • Invest in maturing existing characterization tools, and extending the toolset. Establish whether there is a preservation requirement somehow to maintain the hardware,



Saturday, June 15, 2013

EPUB for archival preservation: an update

EPUB for archival preservation: an update. Johan van der Knijff's blog on Open Planets.
In 2012 the KB released a report on the suitability of the EPUB format for archival preservation. A substantial number of EPUB-related developments have happened since then, and as a result some of the report's findings and conclusions have become outdated, particularly the observations on EPUB 3 and the support of EPUB by characterisation tools. This blog post provides an update to those findings:
  • Use of EPUB in scholarly publishing
  • Adoption and use of EPUB 3
  • EPUB 3 reader support
  • Support of EPUB by characterisation tools
The use of EPUB is increasing, and a number of publishers are using EPUB 2. Also, a number of organisations representing the publishing industry support EPUB 3, though the actual use of EPUB 3 is still limited. The 2012 report concluded that EPUB was not optimally supported by characterisation tools. This situation has improved quite a lot since that time. EPUB is now included in PRONOM and DROID. Overall, EPUB's credentials as a preservation format appear to have improved quite a bit over the last year.


Saturday, May 11, 2013

Tor Books says cutting DRM out of its e-books hasn’t hurt business.

Tor Books says cutting DRM out of its e-books hasn’t hurt business. Megan Geuss. Ars Technica. May 4, 2013.

Tor Books announced last April that it would only retail e-books in DRM-free formats because its customers are “a technically sophisticated bunch, and DRM is a constant annoyance to them. It prevents them from using legitimately-purchased e-books in perfectly legal ways, like moving them from one kind of e-reader to another.”

This week, Julie Crisp, editorial director at Tor UK, wrote that the publisher has seen “no discernible increase in piracy on any of our titles, despite them being DRM-free for nearly a year.”

Tor's 2012 decision was largely applauded by its customers and authors. The authors agreed to a scheme which would allow their readers greater freedom with their novels.


Tuesday, August 16, 2011

Will Kindles kill libraries?

Will Kindles kill libraries? Eugenia Williamson. The Phoenix. July 27, 2011.
[There are many sources discussing this major change for publishing and libraries.  This post really looks at the preservation aspect.]

"Preserving materials for future generations is a big part of why libraries exist in the first place. According to the American Library Association, preservation upholds the First Amendment by contributing to the free flow of information."

But a library can't preserve a book it doesn't own, and many digital works are now being licensed rather than purchased. One company, OverDrive, is a middleman negotiating between the libraries and publishers.  It is unknown if there are long term rights for these materials, and if so, what the rights are, and how this fits into a preservation model.

As reported in Library Journal, the Kansas state library system began using OverDrive's services in 2006, and last year the company proposed a new contract that would raise administrative fees 700 percent by 2015.

Kansas has announced its intent to petition for the right to terminate its contract. Kansas believes that it owns the e-books it licensed and has the right to transfer them to a new service provider. If the library cannot do this, it will have spent $568,000 for books it can no longer access, which is more than if it had purchased print copies that it would own.

Friday, August 12, 2011

New Statistics Model for Book Industry Shows Trade Ebook Sales Grew Over 1,000 Percent

New Statistics Model for Book Industry Shows Trade Ebook Sales Grew Over 1,000 Percent. Library Journal. Michael Kelley. August 9, 2011.
A new annual survey of the total U.S. book publishing industry shows growing revenue and exponential eBook sales.

The industry sold 2.57 billion books in all formats in 2010, a 4.1 percent increase over 2008.  Publishers' net sales revenue grew to $27.94 billion in 2010, a 5.6 percent increase over 2008. Net revenue from trade books grew 5.8 percent since 2008, to $13.94 billion.

Within the trade segment, eBooks, again excluding the robust growth that has occurred in 2011, grew from 0.6 percent of the total trade market share in 2008 to 6.4 percent in 2010, which translates to a 1,274.1 percent increase in publisher net sales revenue year-over-year, with total net revenue for 2010 at $878 million. In the same three years, 114 million ebooks were sold, a 1,039.6 percent increase. In adult fiction, ebooks represent 13.6 percent of the net revenue market share.

Online sales became an increasingly important distribution channel. Net sales revenue for content distributed online was $2.82 billion in 2010, a three-year overall growth of 55.2 percent. Net unit sales by publishers to online channels grew 68.6 percent, to 276 million in 2010.

For 2010, overall bricks-and-mortar trade retail remained the largest distribution channel in the United States (40.8 percent). In contrast to the eBook numbers, total net sales revenue of trade hardcovers in 2010 was $5.26 billion, an increase of only 0.9 percent over the three years, and its share of the market declined from 39.6 percent in 2008 to 37.7 percent in 2010. Softcover revenue was up 1.2 percent to $5.27 billion, with a similar decline in market share, and mass-market paperback net sales revenue was down 13.8 percent to $1.28 billion.


Friday, April 16, 2010

Digital Preservation Matters - April 16, 2010

State Of America's Libraries Report 2010. American Library Association. April 11, 2010.

Interesting report about libraries. As the recession continues, Americans turn to libraries in ever larger numbers for access to resources for employment, continuing education, and government services. The local library has become a lifeline of resources, training and workshops. Even in the age of Google, academic libraries are being used more than ever. During a typical week in fiscal 2008, academic libraries in the United States had more than 20.3 million visits, answered more than 1.1 million reference questions, and made more than 498,000 presentations to groups attended by more than 8.9 million students and faculty, increases over the previous years. Over 43% of libraries provide access to locally produced digitized collections.

---

A National Conversation on the Economic Sustainability of Digital Information. Blue Ribbon Task Force on Sustainable Digital Preservation and Access. April 1, 2010. [Silverlight video.]

This page has the agenda and video presentations from A National Conversation on the Economic Sustainability of Digital Information, a recent meeting hosted by the Blue Ribbon Task Force on Sustainable Digital Preservation and Access.

BRTF's Featured Agenda and Presentations:

  • Research Data, Daniel E. Atkins, Wayne Clough,
  • Scholarly Discourse, Derek Law, Brian Schottlaender,
  • Economics of Collectively-Created Content, George Oates, Timo Hannay
  • Commercially-owned Cultural Content, Chris Lacinak, Jon Landau
  • Economics of Digital Information, William G. Bowen, Hal R. Varian, Dan Rubinfeld
  • Summary by Clifford Lynch.

---

How Tweet It Is!: Library Acquires Entire Twitter Archive. Matt Raymond. Blog. Library of Congress. April 14, 2010.

The Library of Congress is digitally archiving every public tweet made since Twitter started in 2006. "Expect to see an emphasis on the scholarly and research implications of the acquisition." Amazing to think what we can "learn about ourselves and the world around us from this wealth of data. And I'm certain we'll learn things that none of us now can even possibly conceive." The Library of Congress has been archiving information from the web since 2000. It now has more than 167 terabytes of web-based information, including legal blogs and political websites.

---

Library of Congress: We're archiving every tweet ever made. Nate Anderson. Ars Technica. April 16, 2010.

Comments about the Library of Congress archiving tweets:

  • "There's been a turn toward historicism in academic circles over the last few decades, a turn that emphasizes not just official histories and novels but the diaries of women who never wrote for publication, or the oral histories of soldiers from the Civil War, or the letters written by a sawmill owner. The idea is to better understand the context of a time and place, to understand the way that all kinds of people thought and lived, and to get away from an older scholarship that privileged the productions of (usually) elite males."
  • "Digital technologies pose a problem for the Library and other archival institutions, though. By making data so easy to generate and then record, they push archives to think hard about their missions and adapt to new technical challenges."

---

Aligning Investments with the Digital Evolution: Results of 2009 Faculty Survey Released. Roger C. Schonfeld, Ross Housewright. Ithaka. April 07, 2010. [37p. PDF]

An excellent report for academic libraries especially, Faculty Survey 2009: Strategic Insights for Librarians, Publishers, and Societies, that looks at faculty attitudes towards the academic library, information resources, and the scholarly communications system. A few quotes from the report:

  • Faculty most often turn to network-level services, including both general purpose search engines and services targeted specifically to academia.
  • Of all disciplines, scientists remain the least likely to utilize library-specific starting points;
  • Network-level services are increasingly important for discovery, not only of monographs and journals but archival resources and other primary source collections.
  • The library must evolve to meet these changing needs.
  • 90% of faculty members view the library's buyer role as very important; 71% and 59% now view the archive and gateway roles as very important, respectively. Archiving is the 2nd highest role.
  • Despite the reported declines in importance of all the library's roles other than as a buyer, the 2009 study saw a slight rise in perceived dependence on the library
  • The declining visibility and importance of traditional roles for the library and the librarian may lead to faculty primarily perceiving the library as a budget line, rather than as an active intellectual partner.
  • Faculty members most strongly support and appreciate the library's infrastructural roles, in which it acquires and maintains collections of materials on their behalf.
  • Faculty members' sense of the significance of long-term preservation of electronic journals has steadily increased over time
  • Effective and sustainable models for the preservation of electronic journals must be developed
  • Scholars, regardless of field, indicate a general preference that digital materials be preserved.
  • Less than 30% of faculty members have deposited any scholarly material into a repository; nearly 50% have not deposited but hope to do so in the future
  • Faculty attitudes and practices are at the strategic core. Greater engagement with and support of trailblazing faculty disciplines may help develop the roles and services to serve faculty needs into the future. The institutions that serve faculty must also anticipate them, both to ensure that the 21st century information needs of faculty are met and to secure their own relevance for the future.

Monday, February 22, 2010

Digital Preservation Matters - February 22, 2010

Appraisal Actions and Decisions. Chris Prom. Practical E-Records. February 15, 2010.

Most development work on digital repositories focuses on the requirements of the OAIS reference model. But OAIS doesn’t say how records should be selected for deposit. While each archive has a different focus, selecting records for inclusion in an archive is heavily debated. The appraisal process requires careful and intelligent decision making by a person. When appraising electronic records, several tools are needed to:

  • examine, identify, compare, delete, rename, and reorganize records
  • manage information concerning records surveys/assessments.
  • manage submission agreements
  • ensure that appraisal actions are documented.

A set of tools is needed to examine, characterize, delete and possibly, reorder records quickly. This would make it easier to decide if the records are within the scope of the archives policy, then take appropriate actions concerning them.

---

E-Library Economics. Steve Kolowich. Inside Higher Ed. February 10, 2010.

Two studies from the Council on Library and Information Resources examine the implications of libraries changing to digital collections. Libraries seem to be headed in the direction of primarily digital infrastructures but the journey is slow going. Digital standards, such as those for eBooks, are still changing. “While they enjoy the searchability of electronic documents and databases, academics still prefer holding a book in their hands to read it.” The studies point to an average of $4.26 per book per year to keep the book on the shelf. The cost for digital is much less; the digital media repository HathiTrust stores five million copies at between $0.15 and $0.40 per volume, per year. Books in high-density storage facilities cost only $0.86 per year to keep in usable condition. “The administrators who provide library budgets may be reluctant to fund new facilities to house print collections and may question large expenditures to support both print and electronic formats. Library directors must consider not only the immediate expectations of faculty, but also the long-term goals for the library.”
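To put the per-volume figures side by side, here is a small worked calculation. The per-volume rates are the ones quoted above; the collection size of 100,000 volumes is a hypothetical number chosen only to make the comparison concrete.

```python
from decimal import Decimal

# Per-volume, per-year storage costs quoted in the summary above.
COST_PER_VOLUME_YEAR = {
    "open_stacks": Decimal("4.26"),
    "high_density": Decimal("0.86"),
    "digital_low": Decimal("0.15"),
    "digital_high": Decimal("0.40"),
}


def annual_costs(volumes):
    """Yearly storage cost in dollars for each option, for a given volume count."""
    return {name: rate * volumes for name, rate in COST_PER_VOLUME_YEAR.items()}


if __name__ == "__main__":
    # 100,000 volumes is an assumed collection size, not a figure from the studies.
    for option, cost in annual_costs(100_000).items():
        print(f"{option}: ${cost:,.2f} per year")
```

For that assumed collection, the yearly cost works out to $426,000 on open shelves, $86,000 in high-density storage, and $15,000 to $40,000 in digital form.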

---

Studies Cite Argument for, Resistance to Increased Digital Library Collections. Library Journal. February 11, 2010.

A reaction to the E-Library Economics article. The keys to success are to communicate with and educate the students and faculty about why the changes are important; to emphasize the preservation of resources, security, and the benefits; and to make the electronic resources available without barriers. One concern: the “move to electronic collections requires certainty about access to digital collections and their persistence.” Also, removing books would not change the fixed costs of the building. The report authors also acknowledge “that the business model for ebooks remains unsettled and that print plays an important role for resources that don't yet work so well in digital format.”

---

Using DROID for Appraisal. Chris Prom. Practical E-Records. February 17, 2010.

DROID is a tool to help archivists identify file formats. But it may be valuable in the appraisal process to help an archivist understand the components of a records series. By running DROID and analyzing the reports, it is possible to identify particular file formats outside of the proposed collection scope, especially useful if they are deep in a directory structure. Specific examples and processes used are outlined.
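The general idea of surveying a directory tree and flagging files outside the collection scope can be sketched as follows. This is only a stand-in: DROID identifies formats from file signatures, whereas this sketch looks at file extensions alone, which is far less reliable. The function names and the notion of an "in scope" extension set are my own assumptions, not part of DROID or the article.

```python
import os
from collections import Counter


def survey_extensions(root):
    """Count files under root by lowercase extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts


def out_of_scope(root, in_scope_exts):
    """List paths (relative to root) whose extension is not in the agreed scope."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in in_scope_exts:
                full_path = os.path.join(dirpath, name)
                flagged.append(os.path.relpath(full_path, root))
    return sorted(flagged)
```

Because the walk covers every subdirectory, a stray file buried several levels down shows up in the flagged list just as readily as one at the top level.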

---

Film Institute launches first digital archive in Wales. BBC News. 9 February 2010.

The British Film Institute has launched its first "digital jukebox" in Wales, allowing people to access its archive. The Mediatheque is already available in England. The system allows people to watch films and TV programs, currently 1,500 titles, from the national archive free of charge; 85% of the titles had not been released on DVD or online.

---

Innovation: We can't look after our data – what can? Tom Simonite. New Scientist. 11 February 2010.

Anyone worried about the fragility of digital data and civilization’s chances to survive would do well to look to their own data stores first. “Most of us today are blithely heading for our own personal data disasters” because of benign neglect. Data is often lost more from disorganization than from a technological catastrophe, though that happens too. Two possible approaches are mentioned: the Self Archiving Legacy Toolkit (SALT); and the Pergamum project. We are in need of tools to help with diverse, disorganized digital archives which are becoming the norm.

---

Court Finds E-Mails Stored on Old Archiving System Reasonably Accessible; Costs Exaggerated. Kroll Ontrack. Recent ESI Court Decisions. February 2010.

A court case where the defendant argued that e-mails archived on the company's "cumbersome" old system were not reasonably accessible. “The court found that the plaintiff should not be disadvantaged since the defendant, a "sophisticated" company, chose not to migrate the e-mails to the now-functional archival system and thus determined that the e-mails were reasonably accessible.”


Tuesday, February 16, 2010

Digital Preservation Matters - February 16, 2010

Library of Congress Digital Preservation Newsletter. Library of Congress. February 2010.
Items noted in this issue:
The International Internet Preservation Consortium has released a web archives registry. The registry provides an overview of member web archiving efforts as well as access. It currently includes 21 archives from around the world.

ALA is launching its first Preservation Week on May 9 - 15, 2010.

---

Results of Digital Preservation Costs Survey now available. Neil Beagrie. 03 February 2010.
The Keeping Research Data Safe 2 survey of digital preservation cost information is now available. The project was to identify institutions with cost information for preservation of digital research data and to conduct a survey of them. The collections will then be the basis of further study. The Summary Analysis of Data Survey Responses can be downloaded as a Word file, as well as each survey response. Survey questions included:
  • Principal data file formats included
  • Size of collection
  • Identification of which types of costs they were tracking

---

The Online Guide to Open Access Journals Publishing. Directory of Open Access Journals. February 5, 2010.
The online guide is now available and updated. It provides practical information and tools for those producing independent Open Access journals. The guide sections discuss: Planning, Setup, Launch, Publish, and Manage. It refers to several other guides, and provides an input ability for others to add their experiences.

---

British Library to offer free ebook downloads. Richard Brooks. Times Online. February 7, 2010.
Over 65,000 19th-century works of fiction from the British Library will be available for free downloads this spring. The library, in partnership with Microsoft, began digitizing items several years ago. They will be available online for free, but printed copies will also be available from Amazon. The online and printed versions will look like the rare 19th-century editions. “Altogether, 35%-40% of the library’s 19th-century printed books — now all digitised — are inaccessible in other public libraries and are difficult to find in second-hand or internet bookshops.” They hope to extend this effort to books out of copyright dating from the early 20th century.

---

Doc or Docx? Which Office Format to Use. Minda Zetlin. Inc. Feb 15, 2010.
The new Office formats have caused user irritation in trying to read the documents. Microsoft has free conversion programs but many refuse to use them. The newer file formats have an x at the end of the file extension, meaning they are based on Extensible Markup Language or XML. With .docx, .pptx, and .xlsx, Microsoft made a fundamental change in how the files are created. The files also use file compression to reduce the file size and, it is hoped, to reduce the possibility of the full document becoming corrupted. Some suggest keeping the .doc format as the default. Some “save every document in three formats: .doc, .docx, and .pdf.”

---

The National Geospatial Digital Archive: A Collaborative Project to Archive Geospatial Data. Tracey Erwin; Julie Sweetkind-Singer. GIS and Science. February 8, 2010.
This is a collaborative project to collect, preserve, and provide long-term access to at-risk geospatial data. “The project partners created preservation environments at both universities, created and populated a format registry, collected more than ten terabytes of geospatial data and imagery, wrote collection development policies governing acquisitions, and created legal documents designed to manage the content and the relationship between the two nodes.” The article was published in the Journal of Map And Geography Libraries. The difference with geospatial data is that it may reside in complex, multi-file objects, and that it “can remain dynamic indefinitely due to the lifetime of the generating program and the need to be periodically reprocessed.” One of the preservation strategies is to attempt to create multiple copies, with varying capabilities. Preserving context is difficult because the data is voluminous. “It is now understood that access is inextricably linked to preservation.” “The results of the NGDA experience are multifaceted. In practical terms, the successful ingestion of data into working repositories is the most significant outcome.”

---


Digital doomsday: the end of knowledge. Tom Simonite, Michael Le Page. 02 February 2010.
Recent doomsday article about the loss of digital data. “The current strategy for preserving important data is to store several copies in different places, sometimes in different digital formats. This can protect against localised disasters such as hurricanes or earthquakes, but it will not work in the long run.” “There really is no digital standard that could be counted on in the very long term….”

Tuesday, February 09, 2010

Digital Preservation Matters - February 8, 2010


Online Recordkeeping: It's All in a Name. Mimi Dionne. Internet Evolution. February 2, 2010.


The born-digital record lifecycle has five stages, in chronological order: creation; distribution and use; storage and maintenance; retention; and disposition or archival preservation. All five stages are important. One of the best practices for born-digital records is uniform file naming protocols, including location, to encourage strong content management. These should align with the records retention policies. Organizations are better off if they select the information they need to retain and destroy what they don’t need. “The benefits of implementing a records program that includes regular records destruction have far-reaching influence not only on compliance issues and maintenance of a company’s IT environment but also the health of its budget.”
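A uniform naming protocol is easiest to enforce when it can be checked automatically. The sketch below validates file names against an invented convention of location_series_YYYYMMDD_title.ext. The pattern, the field names, and the sample file names are all hypothetical; the article does not prescribe a specific scheme.

```python
import re

# Hypothetical convention: <location>_<series>_<YYYYMMDD>_<title>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<location>[a-z]{2,8})_"
    r"(?P<series>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<title>[a-z0-9-]+)"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_record_name(filename):
    """Return the name's fields as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None
```

A name such as hq_fin_20100202_budget-summary.pdf parses into its fields, while something like "Budget Summary FINAL.pdf" is rejected. Once the series and date are machine-readable, they can be matched against the retention schedule to decide what is due for destruction.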


---


SPIE to Preserve E-Books in Portico. Press Release. Portico. 2 February 2010.

Portico has agreed with SPIE (the international society for optics and photonics) to preserve its collection of e-books, currently 93 items. SPIE already participates with Portico to preserve its e-journals. Portico now holds over 34,000 e-books and over 10,000 e-journals. SPIE has also announced the launch of its digital library, which includes 120 SPIE Press titles from the Field Guides, Monographs, and Tutorial Texts series.


---


Long-Term Preservation Of Web Archives – Experimenting With Emulation And Migration Methodologies. Andrew Stawowczyk Long. IIPC. December 2009. [54 p. PDF]

The decision to emulate or migrate is largely based on personal beliefs, rather than on any particular evidence. We do not know which of these is more useful in the long term. All objects change over time, so ensuring long-term, useful access to collections requires that we first define the most important aspects of an object that need to be preserved. The “Preservation Intent” may be useful for this: it is what the institution intends to preserve for any given digital object, and for how long. Also needed are the creator’s intent, the contextual information and the technical information.

Two possible approaches for institutions may be:

  1. preserve digital objects over the next twenty years;
  2. find means of preserving objects for longer.

Or an approach may include both: preserve items for 20 years while the search for longer preservation mechanisms continues. “Significant properties” means the properties of a digital object that are essential to the representation of the intended meaning of that object.

The author does not recommend either emulation or migration as a perfect solution to the problem at this time. The findings and recommendations also include:

  1. There are no tools suitable for long-term preservation of very large web archives
  2. All preservation actions need to be based on a clearly defined “Preservation Intent”
  3. Migration and emulation offer some time extension for short-term access to digital objects.
  4. Emulation seems to present higher risks as a long-term preservation methodology.

It is not possible to preserve it all. Priorities need to be established for practical, long-term preservation solutions. The best hope for adequate long-term preservation lies in continuous and systematic work, researching various preservation methodologies, and improving our understanding of the future use of web archives.

---

Is NAND flash about to hit a dead end? Lucas Mearian. Computerworld. February 4, 2010.

IM Flash Technologies has said that shrinking the technology much further may not be possible because of problems with bit errors and reliability. The number of electrons that can be stored in the memory cell decreases with each generation of flash memory, making it more difficult for the cells to reliably retain data.

---

CNRI Digital Object Repository™. Corporation for National Research Initiatives. 19 January 2010.

CNRI has developed a new version of its Digital Object Repository Software. It is open source, flexible, scalable, and secure, and it has a suite that provides a common interface for accessing all types of digital objects. Redundancy is supported by a mirroring system with software to ensure that replicated objects are kept in sync.