Australian electronic books to be preserved at the National Library in Canberra under new laws. Clarissa Thorp. ABC. 3 July 2015.
Starting in January of next year, digital materials including e-books, blogs, prominent websites, and important social media messages will be collected as a snapshot of Australian life. Under existing copyright laws, the National Library of Australia is able to collect all books produced by local publishers through the legal deposit system. Now, with new legislation adopted by the Federal Parliament, the Library will be able to preserve published items from the internet that could otherwise disappear from view in future. "This legislation puts us in a position where we are able to ask publishers to deposit electronic material with the National Library in a comprehensive way." "So we will be able to open that up and collect the whole of the Australian domain, for websites for example it means we are able to collect e-books that are only published in digital form." This new legislation will expand the Library's digital preservation program and ensure that future collections reflect Australian society as a whole.
This blog contains information related to digital preservation, long term access, digital archiving, digital curation, institutional repositories, and digital or electronic records management. These are my notes on what I have read or been working on. I enjoyed learning about Digital Preservation but have since retired and I am no longer updating the blog.
Showing posts with label electronic resources. Show all posts
Friday, July 03, 2015
Tuesday, February 17, 2015
AHRQ Public Access to Federally Funded Research
AHRQ Public Access to Federally Funded Research. Francis D. Chesley. Agency for Healthcare Research and Quality. February, 2015.
The Agency for Healthcare Research and Quality has established a policy for public access to scientific publications and scientific data in digital format resulting from funding through the agency. Preservation is one of the Public Access Policy's primary objectives.
The Public Access Policy includes the following objectives:
- Ensure that the public can access the final published digital documents.
- Facilitate easy public search, analysis of, and access to these publications.
- Ensure that attribution to authors, journals, and original publishers is maintained.
- Ensure that publications and metadata are stored in an archival solution.
- Ensure that all researchers receiving grants develop data management plans describing how they will provide for long-term preservation of and access to scientific data in digital format.
- A plan for protecting confidentiality and personal privacy.
- A description of how scientific data in digital format will be shared.
- It must include a plan for long-term preservation of and access to the data.
Digital scientific data is defined as "the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens."
Tuesday, December 16, 2014
A picture is worth a thousand (coherent) words: building a natural description of images
A picture is worth a thousand (coherent) words: building a natural description of images. Google Research Blog. November 17, 2014.
Google has developed a machine-learning system that can automatically produce captions to accurately describe images the first time it sees them. It can describe a complex scene which requires a deeper representation of what’s going on in the scene, capturing how the various objects relate to one another and translating it all into natural-sounding language. The full paper "Show and Tell: A Neural Image Caption Generator" is here.
Monday, November 24, 2014
Are libraries sustainable in a world of free, networked, digital information?
Are libraries sustainable in a world of free, networked, digital information? Lluís Anglada. El profesional de la información. 7 November 2014. [PDF]
Interesting article looking at libraries through the stages of modernization, automation and digitization, and at a formula for evaluating the importance of libraries to society. The article concludes that "if the current generation of librarians does not introduce radical changes in the role of libraries, their future is seriously threatened."
The proposed formula is that sustainability equals value divided by cost, where value is use minus dysfunctions, modified by perceptions of the library:
"S= (U - D + 2P) / C".
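To see how the formula behaves, here is a minimal sketch in Python. It assumes that use, dysfunctions, perception and cost can each be expressed as a single number on some common scale; the article summary above does not say how they are measured, so the function name, the parameter names and the sample values are all hypothetical and serve only to illustrate the arithmetic.

```python
def sustainability(use, dysfunctions, perception, cost):
    """Compute S = (U - D + 2P) / C from the formula quoted above.

    All four inputs are assumed to be plain numbers on a common,
    made-up scale; the source does not define units.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (use - dysfunctions + 2 * perception) / cost


# Hypothetical scores: same use, dysfunctions and cost, different perception.
baseline = sustainability(use=100, dysfunctions=20, perception=10, cost=50)
improved = sustainability(use=100, dysfunctions=20, perception=20, cost=50)

print(f"baseline S = {baseline:.2f}")
print(f"improved S = {improved:.2f}")
```

Because perception is weighted twice in the numerator, raising it by 10 points in this made-up example moves S as much as raising use by 20 points would, which fits the article's emphasis on changing how libraries are perceived.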
Libraries are changing because of technology and needs, but there is a danger that people will perceive them as unable to provide the information that users demand. If this continues, those funding the libraries will provide less support. The perceptions must change in order for libraries to be sustainable.
Some thoughts from the article:
- Libraries are changing from being a space to store, locate and use books to places where people interact and socialize. This should transform the perception that citizens have of their libraries, seeing them as places to ‘change lives by giving people the tools they need to succeed’.
- Libraries depend on public funding, and their future depends on the perception or mental image of libraries held by administrators and policy makers who allocate budgets
- Libraries used to show statistical data on resources; they must now show their value to those who support them financially
- The emergence of new roles for libraries does not mean that all library services have evolved over time. In the new environment, some traditional strengths of libraries are weakening.
- Library catalogues and automated systems were innovative in the ’80s, but have been stuck in outmoded practices. Users have adapted quickly to the ‘googlization’ of information and do not understand why they should have to look in different places to get a unique solution to an information need.
- Two key elements for future library sustainability: perception and adaptation to a new paradigm
- The perception of libraries is increasingly attached to the printed book: from 69% of Americans in 2005 to 75% in 2010.
- Libraries may end up being seen as useful only to preserve the past (i.e. the printed book), and consequently of little use to handle digital information.
- The library has been steadily declining in importance in university budgets.
- People sustain libraries because of a positive perception and a feeling that the libraries are important. We believe that society still needs the functions performed by libraries and librarians, but the feeling alone does not make them immediately sustainable.
- We must soon establish a new stereotype of ‘library’ in people’s minds, one that is not based on the physicality of the buildings or books, but focuses on the role of support and assistance in the difficult process of using information and transforming it into knowledge.
- The creation of perceptions of a library and librarian that are associated with assistance regarding information is a contribution that has not yet been made.
- This is the challenge and responsibility for young librarians: to create a new perception of our profession. We must establish a new stereotype of ‘library’ in people’s minds, one that is not based on the physicality of the buildings and books, but on the role of support and assistance in the difficult process of using information and transforming it into knowledge.
Sunday, November 02, 2014
ARMA 2014: The Convergence of Records Management and Digital Preservation
ARMA 2014: The Convergence of Records Management and Digital Preservation. Howard Loos, Chris Erickson. October 2014. [PDF]
Presentation on records management and digital preservation given at the ARMA 2014 conference.
Notes:
- Records Management mission: To assist departments in fulfilling their responsibility to identify and manage records and information in accordance with legal, regulatory, and operational requirements
- RIM Life Cycle to DP Life Cycle
- Challenges and successful approaches
- Storing records permanently with M-Discs
- Introduction to Digital Preservation: challenges, format sustainability, media obsolescence, metadata, organizational challenges
- Life of digital media
- Best practices and processes
- OAIS model
- Rosetta Digital Preservation System
- Library of Congress Digital Preservation Outreach & Education (DPOE) Network
Thursday, October 30, 2014
NTT Data Digitizes Vatican Library Manuscripts for Online
NTT Data Digitizes Vatican Library Manuscripts for Online. Jun Hongo. Blog: Wall Street Journal, Japan Realtime. Oct 28, 2014.
NTT Data signed an initial agreement with the Vatican to work on about 3,000 of the approximately 80,000 manuscripts owned by the library, which was established in 1475. Some of the items it holds date back to the 2nd century. There are about 50 staff members working on the project, at a cost of about $21.3 million. The finished manuscripts are available online.
Wednesday, October 15, 2014
Functional Access To Electronic Media Collections using Emulation-as-a-Service.
Functional Access To Electronic Media Collections using Emulation-as-a-Service. Thomas Bähr, et al. German National Library of Science and Technology. October 14, 2014.
Poster that looks at the emulation of CD collections.
Examines:
- User Layer: the ingest workflow, data evaluation and prioritization, license evaluation, and creation of images.
- Workflow layer: retrieving the image, evaluating rendering, technical metadata, object rendering.
- Technical layer: EaaS environments, local resources, resource allocation
This won the Best Poster Award at iPRES 2014.
Saturday, August 10, 2013
Game Walkthroughs As A Metaphor for Web Preservation
Game Walkthroughs As A Metaphor for Web Preservation. Michael Nelson. Web Science and Digital Libraries Research Group. May 25, 2013.
Some things can't really be preserved digitally, such as computer games, even though it would be possible to create emulators. So for some, the best way to experience the game is through walkthroughs on YouTube.
"I think game walkthroughs can provide us with an interesting metaphor for web archiving, not simply walkthroughs of web instead of game sessions (though that is possible), but in the sense of capturing a series of snapshots of dynamic services and archiving them. Given "enough" snapshots, we might be able to reconstruct the output of a black box"
Google Maps is another site that has preservation issues.
"There are a number of issues to be researched to make this easy enough for people to do (many of which our group is investigating), but the popularity of game walkthroughs and their preservation side-effects suggests to me that the web archiving community should be informed by them."
Saturday, June 15, 2013
EPUB for archival preservation: an update
EPUB for archival preservation: an update. Johan van der Knijff's blog on Open Planets. 23 May 2013.
In 2012 the KB released a report on the suitability of the EPUB format for archival preservation. A substantial number of EPUB-related developments have happened since then, and as a result some of the report's findings and conclusions have become outdated, particularly the observations on EPUB 3 and the support of EPUB by characterisation tools. This blog post provides an update to those findings:
- Use of EPUB in scholarly publishing
- Adoption and use of EPUB 3
- EPUB 3 reader support
- Support of EPUB by characterisation tools
Sunday, May 12, 2013
ZENODO. Research. Shared.
ZENODO. Research. Shared. Website. May 12, 2013.
ZENODO is a new open digital repository service that enables researchers, scientists, projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of existing institutional or subject-based repositories. The repository was created by OpenAIRE and CERN, and is supported by the European Commission. It promotes peer-reviewed, openly accessible research; all items have a DOI, so they are citable. All formats are allowed. There is a size limit of 1GB per file. Data files are versioned, but records are not. Files may be deposited under closed, open, embargoed or restricted access.
It is named after Zenodotus, the first librarian of the Ancient Library of Alexandria and father of the first recorded use of metadata, a landmark in library history. ZENODO is provided free of charge for educational and informational use.
Monday, May 06, 2013
Digital Preservation Best Practices and Guidelines.
Digital Preservation Best Practices and Guidelines. North Carolina Department of Cultural Resources. Website. April 23, 2013.
This site contains many useful tools and resources for digital preservation:
- Naming conventions for files
- How to manage digital files
- Examples of many policies and guidelines
- Electronic records management policies/guidelines by state
- Best practices
- File formats
- Email policies
- Preservation metadata
Saturday, May 04, 2013
Reel to Real: Sound at the Pitt Rivers Museum.
Reel to Real: Sound at the Pitt Rivers Museum. Pitt Rivers Museum website. April 2013.
Reel to Real is the archival sound project at the Pitt Rivers Museum, University of Oxford. [One of my favorite museums.] It describes methods used to digitize wax cylinders, reel to reel tapes, audio cassettes and other formats in the Museum's ethnographic sound archive. Also included are examples of the archival sounds from the collections. This has been done to connect the sounds with "wider collections and to engage diverse audiences".
Saturday, April 06, 2013
Viewshare: Interfaces to our heritage
Viewshare: Interfaces to our heritage. Library of Congress. Website. April 2013.
Viewshare, an open source instance of Recollection, is a free platform for generating and customizing views (interactive maps, timelines, facets, tag clouds) that allow users to experience your digital collections.
Functionality:
- Ingest collections from spreadsheets or MODS records.
- Generate distinct interactive visual interfaces to digital collections, including maps and timelines, and sophisticated faceted navigation.
- Copy and paste to embed the interface in any webpage, providing users with new ways to explore content.
Wednesday, April 03, 2013
The Digital-Surrogate Seal of Approval: a Consumer-oriented Standard
The Digital-Surrogate Seal of Approval: a Consumer-oriented Standard. James A. Jacobs, James R. Jacobs. D-Lib Magazine. March/April 2013.
"Digital-Surrogate Seal of Approval" (DSSOA) is a proposed way to describe the accuracy and completeness of digital objects that were created from printed books and other non-digital originals. It indicates that the original has been digitized completely and with 100% accuracy. This seal of approval may be applied to a digitized version of an analog original when it accurately replicates the original. To do this, two criteria must be met and verified:
- Completeness. All pages of the original are fully and completely reproduced.
- Accuracy. The original layout and appearance are preserved. All text is legible and there is no visual degradation when compared to the original.
Tuesday, April 02, 2013
UPDATE 1-Capitol wins digital records lawsuit vs ReDigi start-up.
UPDATE 1-Capitol wins digital records lawsuit vs ReDigi start-up. Jonathan Stempel and Alistair Barr. Reuters. Apr 1, 2013.
U.S. District Judge Richard Sullivan ruled in the case Capitol Records LLC v. ReDigi Inc, U.S. District Court, Southern District of New York, No. 12-00095.
- ReDigi was not authorized to allow listeners to use its platform to buy and sell "used" digital music tracks originally bought from Apple Inc's iTunes website.
- This will profoundly affect any digital re-sale marketplace [and digital preservation] by limiting what can be sold as "used" or by forcing sellers to obtain copyright holders' approval before transacting business.
- ReDigi's service "infringes Capitol's reproduction rights under any description of the technology"
- The service "does not deserve protection under the theory of fair use."
- "It is beside the point that the original phonorecord no longer exists. It matters only that a new phonorecord has been created."
- "Because it is therefore impossible for the user to sell her 'particular' phonorecord on ReDigi, the first sale statute cannot provide a defense."
Think you own your downloads? Court deals blow to 'used' digital goods market.
Judge rules digital music cannot be sold 'second hand'
Reselling Digital Goods Is Copyright Infringement, Judge Rules
Sunday, October 07, 2012
Opinion: Why the Bruce Willis Apple iTunes story matters
Opinion: Why the Bruce Willis Apple iTunes story matters. Jonny Evans. Computerworld. September 04, 2012.
Article discussing a possible lawsuit against Apple to regain the right to bequeath an iTunes music collection. "The music industry isn’t really about music: it’s about formats and distribution. First there was ... sheet music, then 78rpm records, then 45rpm vinyl, Super8, cassette, CD -- and now digital. The only difference between each evolving format is that the industry willfully ignored digital until it was too late for it to completely control music acquired in those formats.
That’s why label bosses (who like to pay artists a mere 10-13 percent of the profits of music releases) stress artist “rights” while insisting on ever more draconian monitoring of the online world in order to ensure distribution of tracks they acquire rights to is controlled."
"If the music industry were about music then every track ever licensed by labels would be made available via all digital services."
The only reason for a label refusing to allow music to be left to friends or family is to "ensure that when the next evolution of music distribution takes place it can ensure we all invest in the same music in a different format." The industry wants to keep an even bigger slice of the overall income while making users pay regular recurring subscription fees to access music that is never actually owned, or in other words, making a system of lifetime rentals. While there is not yet a lawsuit, it does draw attention to a consumer right that’s been quietly obliterated in the digital age: the chance to actually own the collection of digital music.
Friday, August 31, 2012
Digital documents debut on historically endangered list.
Digital documents debut on historically endangered list. MaineBiz. August 30, 2012.
For the first time, Maine's most endangered historical assets now include digital records, according to an annual survey by the nonprofit Maine Preservation. This was done to emphasize to others the importance of preserving these documents. Electronic documents are in particular danger because of the ease with which they can be lost or destroyed. "We've not figured out how to manage this material. There's such a proliferation that nobody thinks about this material."
Friday, May 18, 2012
The CLIF Project: The Repository as Part of a Content Lifecycle
The CLIF Project: The Repository as Part of a Content Lifecycle. Richard Green, Chris Awre, Simon Waddington. Ariadne. 9 March 2012.
This was a joint project that did an extensive literature review and worked with digital content creators to understand how to deal with the interaction of the authoring, collaboration and delivery of materials. At the heart of meeting institutional requirements for managing digital content is the need to understand the different operations through which content goes, from planning and creation through to disposal or preservation. Repositories must be integrated with the other systems that support other parts of this lifecycle to prevent them becoming yet another information silo within the institution.
The CLIF software has been designed to try and allow the maximum flexibility in how and when users can transfer material from one system to another, integrating the tools in such a way that they seem to be natural extensions of the basic systems. This open source software is available for others to investigate and use.
The repository’s archival capability is regarded as one of its strongest assets, and the role of the repository within a University will be regarded very much in terms of what it can offer that other campus systems cannot. It should not try to compete on all levels. There is a need to clarify better at an institutional level what functionality is offered by different content management systems, in order to better understand how different stages of the digital content lifecycle can be best enabled.
Wednesday, May 16, 2012
Implementing DOIs for Research Data.
Implementing DOIs for Research Data. Natasha Simons. D-Lib Magazine. May/June 2012.
As research becomes more collaborative and global, it is also becoming more difficult to manage the large amounts of research data generated daily. The Digital Object Identifier (DOI) system is one way to create persistent identifiers for research data collections and datasets. "Data that is richly described, organised, integrated and connected allows the data to be more easily discovered by other researchers." Identifying such resources allows research data collections and datasets to be open and discoverable to others, but there are questions that need to be answered, such as the type of material that should get a persistent ID, the granularity, whether the landing page or the resource should get the ID, who creates and maintains the IDs, and for how long. These questions, common to other institutions, should encourage discussion and collaboration.
Sunday, April 29, 2012
The secrets of Digitalkoot: Lessons learned crowdsourcing data entry to 50,000 people (for free).
The secrets of Digitalkoot: Lessons learned crowdsourcing data entry to 50,000 people (for free). Tommaso De Benetti. Microtask. June 16, 2011.
The National Library of Finland launched a project called Digitalkoot, which was a test of crowdsourcing with 50,000 volunteers. The aim was to digitize the National Library’s archives and make them searchable over the internet. The volunteers input data that Optical Character Recognition (OCR) software struggles with (for example, documents that are handwritten or printed in old fonts). Digitalkoot relies on machines, humans and a gaming twist.