"At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:
Basic Principles of Working with Power Tools
- Create a Sandbox Environment: Have backups. It is OK to break things
- Think Algorithmically: Break a big problem down into smaller steps
- Choosing a Tool: The best tool is the one that works for your problem and skill set
- Document: Successes, failures, procedures
- mistakes are fine as long as you know how to recognize and undo them!
- view mistakes as an opportunity
- mistakes can teach you as much about your data as about your tool
- share your mistakes so others may benefit
- realize that everybody makes them
- Know the applicable standards
- Know your data
- Know what you want
- Normalize your data before you start a big project
- The problem is intellectual, not technical
- Use the tools available to you
- Don’t do what a machine can do for you
- Think about one-off operations vs. tools you might re-use or re-purpose
- Think about learning tools in terms of raising the level of staff skill
- XPath
- Regex
- XQuery
- XQuery Update
- XSLT
- batch
- Linux command line
- Python
- AutoIt
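To make the "normalize your data" and "don't do what a machine can do for you" advice concrete, here is a minimal sketch using two of the tools listed above, Python and regex. It is not taken from the slides: the function name `normalize_date_range` and the sample values are hypothetical, and the cleanup rules (collapse whitespace, unify dash characters, tighten year ranges) are just one plausible set of choices for messy date strings.

```python
import re


def normalize_date_range(value: str) -> str:
    """Normalize whitespace and dash variants in a year or year-range string.

    Hypothetical helper for illustration; adapt the rules to your own data.
    """
    # Collapse runs of whitespace and trim the ends.
    cleaned = re.sub(r"\s+", " ", value).strip()
    # Replace en and em dashes with a plain hyphen.
    cleaned = re.sub(r"[\u2013\u2014]", "-", cleaned)
    # Remove spaces around a hyphen that sits between two four-digit years.
    cleaned = re.sub(r"(\d{4})\s*-\s*(\d{4})", r"\1-\2", cleaned)
    return cleaned


# Hypothetical sample values standing in for a column of messy dates.
samples = ["1950 - 1960", " 1950\u20131960 ", "circa  1950", "1950-1960"]
normalized = [normalize_date_range(s) for s in samples]

# Print before/after pairs so the changes can be reviewed, not just applied.
for before, after in zip(samples, normalized):
    print(repr(before), "->", repr(after))
```

In keeping with the sandbox principle, a script like this is best run against a copy of the data, with the before/after pairs reviewed before anything is written back.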