For each:
What was your original source?
What shape is that data? (schema + constraints)
How often is it updated?
Does it have an explicit maintainer?
How did you work with that data?
What shape is the data you extracted from it?
Did you refine or clean it in some way?
If so, what protocol did you use? (optional: index it via protocols.io)
On what date did you retrieve it, and on what date was the source last updated?
What derivative datasets did you produce?
Add a line for each, with name, permalink, doc, dependencies, last update, maintainer
What is the update tempo: one-time, time series, patches only?
How is each dataset being used?
Further derived/remixed datasets, visualizations
References, other backlinks
Example:
We have a Topos-Policy dataset that we update periodically (link to doc describing time-series plan), derived from the best sources we can find. Currently those are the Kaiser Family Foundation (URL) and the NYT (URL). As of (date), we visited the Kaiser Family Foundation (URL, source date, retrieved date) and mapped what we found in Tables A, B, C (names, section links) to our local-policy-schema (permalink to versioned data schema). We ran it through an OpenRefine script (permalink, version), producing Topos-Policy-KFF-4-8.csv (permalink). We did the same for data from this NYT article (URL, source date), hand-cleaning it with our in-house team (URL, named process) to produce Topos-Policy-NYT-4-8.csv (permalink). We combined these into a one-time policy dataset, Topos-Policy-4-8.csv (permalink, DOI, maintainer), which is used in the following visualizations (name+URL).
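If you prefer to keep these answers in machine-readable form, the derivative-dataset lines from the example could be recorded roughly as follows. This is only a sketch: the DerivedDataset class, its field names, and the summary_line helper are hypothetical, the update tempo of the two intermediate files is assumed, and the parenthesized values are the same placeholders used in the example, to be replaced with your own links and dates.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json

# Update tempos named in the questionnaire.
TEMPOS = {"one-time", "time series", "patches only"}


@dataclass
class DerivedDataset:
    """One line of the 'derivative datasets' answer (hypothetical field names)."""
    name: str
    permalink: str
    doc: str
    last_update: str
    maintainer: str
    update_tempo: str
    dependencies: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject tempos other than the three the questionnaire offers.
        if self.update_tempo not in TEMPOS:
            raise ValueError(f"unknown update tempo: {self.update_tempo!r}")


def summary_line(d: DerivedDataset) -> str:
    """Render: name, permalink, doc, dependencies, last update, maintainer."""
    deps = "; ".join(d.dependencies) or "none"
    return ", ".join([d.name, d.permalink, d.doc, deps, d.last_update, d.maintainer])


# Placeholders stay as placeholders; the tempo of the two intermediate
# files is an assumption, only the combined file is stated to be one-time.
kff = DerivedDataset("Topos-Policy-KFF-4-8.csv", "(permalink)", "(doc)",
                     "(date)", "(maintainer)", "one-time")
nyt = DerivedDataset("Topos-Policy-NYT-4-8.csv", "(permalink)", "(doc)",
                     "(date)", "(maintainer)", "one-time")
combined = DerivedDataset("Topos-Policy-4-8.csv", "(permalink)", "(doc)",
                          "(date)", "(maintainer)", "one-time",
                          dependencies=[kff.name, nyt.name])

for d in (kff, nyt, combined):
    print(summary_line(d))

# The same record as JSON, e.g. for pasting into a shared sheet or repo.
print(json.dumps(asdict(combined), indent=2))
```

Listing dependencies by dataset name keeps each line readable on its own while still letting a reader trace the combined file back to its two inputs.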
Send us a link to your sheet (or add it here)