Metadata Tracking System
Chapin Hall is considering the feasibility of creating an open-source, full-lifecycle metadata tracking system, tentatively named Metadash (see http://bit.ly/1gfFdOB). Government, non-profit and academic institutions could freely download the Metadash code base and install it on their own networks to track information about their data holdings. The metadata life cycle begins when a request is received or a need arises for data; at this early stage, the request or need might still be imprecise.
Metadata tracking follows a predictable path:
- A data request or need is refined until it is possible to search for available “candidate” data sets to fulfill the need.
- Available data sets are evaluated for quality, difficulty of processing, cost, sustainability and good fit.
- Data sets that meet applicable criteria are acquired, evaluated in more detail and documented.
- Extract-transform-load (ETL) processes begin, with cleaning, geocoding, aggregation and other transformations coded and documented.
- Finished data tables are documented at the database, table and column levels through human-crafted narrative and automated digital processes. The resulting metadata can be viewed by partners or the general public (if appropriate).
- The desired data elements are made available via a data warehouse, open data portal, APIs or other dissemination methods.
- The metadata that has been tracked in this system can be linked to other metadata documentation by means of meta tags, semantic web technologies or other linkages.
Our goal is to make it easy to track details about both the data and the process. This project will build on the success of the City of Chicago’s Metalicious data dictionary platform, which is maintained here at Chapin Hall.
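
As a rough illustration only, the lifecycle path listed above could be modeled as an ordered set of stages that a tracked data set moves through one step at a time. The sketch below is not Metadash code: the names `Stage`, `DatasetRecord` and `advance` are hypothetical, and the seven stage labels are simply our shorthand for the seven steps in the list.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical labels for the lifecycle steps, in list order."""
    REQUESTED = 1   # request or need received and refined
    EVALUATED = 2   # candidate data sets assessed for quality, cost, fit
    ACQUIRED = 3    # data set obtained, examined in detail, documented
    ETL = 4         # cleaning, geocoding, aggregation, other transforms
    DOCUMENTED = 5  # database-, table- and column-level metadata written
    PUBLISHED = 6   # available via warehouse, portal, API or other means
    LINKED = 7      # connected to other metadata documentation


@dataclass
class DatasetRecord:
    """One tracked data set plus a running log of what happened to it."""
    name: str
    stage: Stage = Stage.REQUESTED
    notes: list = field(default_factory=list)

    def advance(self, note: str) -> Stage:
        """Move to the next stage and log a note about the step taken."""
        if self.stage is Stage.LINKED:
            raise ValueError("record is already at the final stage")
        self.stage = Stage(self.stage + 1)
        self.notes.append((self.stage.name, note))
        return self.stage


# Usage: track a made-up data set through its first two transitions.
record = DatasetRecord("example_dataset")
record.advance("two candidate sources compared")
record.advance("agreement signed and files received")
print(record.stage.name, len(record.notes))
```

Keeping the note log alongside the stage reflects the stated goal of tracking details about the process as well as the data; a real system would presumably store far richer information at each step.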