
Data sharing checklist

Published on May 19, 2020

1. Make a spreadsheet of your sources and outputs

For each:

  • What was your original source?

    • What shape is that data? (schema + constraints)

    • How often is it updated?

    • Does it have an explicit maintainer?

  • How did you work with that data?

    • What shape is the data you extracted from it?

    • Did you refine or clean it in some way?

    • If so, what protocol did you use? (Optionally, index it on protocols.io.)

    • On what date did you retrieve it, and when was the source last updated?

  • What derivative datasets did you produce?

    • Add a line for each, with: name, permalink, documentation, dependencies

      • last update, maintainer

    • What is the update tempo: one-time, time series, patches only?

  • How is each dataset being used?

    • Further derived/remixed datasets, visualizations

    • References, other backlinks
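The fields above can be captured in a simple machine-readable sheet. Below is a minimal sketch using Python's standard `csv` module; the column names are one possible distillation of the checklist (not a prescribed schema), and the example row uses placeholder values rather than real project data.

```python
import csv
import io

# Hypothetical column set distilled from the checklist above;
# adjust to match the sources and outputs your project tracks.
COLUMNS = [
    "source", "source_url", "schema", "update_frequency", "maintainer",
    "retrieved_date", "source_last_updated", "cleaning_protocol",
    "derived_dataset", "permalink", "dependencies", "used_by",
]

def make_provenance_sheet(rows):
    """Serialize a list of dicts (one per source/output) to CSV text.

    Missing fields are left blank so partially filled rows are fine.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

# Example row with placeholder values, not real data.
sheet = make_provenance_sheet([
    {
        "source": "Example source",
        "update_frequency": "weekly",
        "retrieved_date": "2020-04-08",
        "derived_dataset": "example-derived.csv",
    }
])
```

One row per source or derived dataset keeps the sheet flat and easy to diff; datasets with multiple dependencies can list them comma-separated in the `dependencies` cell or get one row per dependency.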

2. Write a narrative overview

Example:

We have a Topos-Policy dataset that we update periodically (link to doc describing time-series plan) derived from the best sources we can find. Currently that is the Kaiser Family Foundation (URL) and the NYT (URL).

As of date, we visited the Kaiser Family Foundation (URL, source date, retrieved date) and mapped what we found in Tables A, B, C (names, section links) to our local-policy-schema (permalink to versioned data schema). We ran it through an OpenRefine script (permalink, version), producing Topos-Policy-KFF-4-8.csv (permalink). We did the same for data from this NYT article (URL, source date), hand-cleaning it through our in-house team (URL, named process) to produce Topos-Policy-NYT-4-8.csv (permalink).

We combined these into a one-time policy dataset, Topos-Policy-4-8.csv (permalink, DOI, maintainer), which is used in the following visualizations (name + URL).
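The "combine" step in the narrative above can be sketched in a few lines of standard-library Python. This assumes both cleaned files share the same local-policy-schema columns; the filenames mirror the narrative, and the row data here is placeholder, not real policy data.

```python
import csv
import io

def combine(sources):
    """Merge rows from several (name, csv_text) pairs into one list,
    tagging each row with the file it came from so provenance survives
    the merge."""
    combined = []
    for name, text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            row["provenance"] = name
            combined.append(row)
    return combined

# Placeholder inputs standing in for the two cleaned files.
kff = "state,policy\nNY,stay-at-home\n"
nyt = "state,policy\nCA,stay-at-home\n"

rows = combine([("Topos-Policy-KFF-4-8.csv", kff),
                ("Topos-Policy-NYT-4-8.csv", nyt)])
# rows now holds both sources' records, each carrying a provenance tag.
```

Carrying a per-row provenance column through the merged dataset makes it possible to trace any value in Topos-Policy-4-8.csv back to the source file it came from.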

3. Send us a link to your sheet (or add it here)
