“Seeing With Fresh Eyes” by Edward Tufte

  • Autocorrelation, serial correlation, bang-bang duplicates, and pseudo-replication all describe data points that follow from one another without being new, independent measurements
  • There is a difference between having questions to be answered by a database and poking around a database looking for interesting answers to questions
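  • Early screening boosts “survival time” in useless ways: the clock starts at an earlier diagnosis, so measured survival gets longer even when the patient lives no longer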
  • Adjust measures to avoid errors rather than modeling them away
  • “Guide to Bad Data” by Chris Groskopf
    • Values missing
    • Zeros replace missing values
    • Data missing that you know should be there
    • Rows or values duplicated
    • Totals differ from aggregates
    • Suspicious values present
    • Spreadsheet has 65,536 rows or 255 columns
    • Margin of error too large or unknown
    • Benford’s Law fails
    • Too good to be true
  • Fix bad names immediately
  • Survivor bias: “Most medieval castles were made of wood”
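
Several of the checks in the “Guide to Bad Data” list above can be automated. What follows is a minimal Python sketch, not anything taken from Tufte or Groskopf: the function names, the use of `None` to mark a missing value, and the tolerance are all assumptions made for illustration. The row and column limits are the ones named in the list.

```python
import math
from collections import Counter


def missing_and_zero_counts(values):
    """Return (missing, zeros). Zeros are counted separately because they
    sometimes stand in for values that were never recorded.
    Assumption: a missing value is represented as None."""
    missing = sum(1 for v in values if v is None)
    zeros = sum(1 for v in values if v is not None and v == 0)
    return missing, zeros


def duplicated_rows(rows):
    """Return each distinct row that occurs more than once, in first-seen order."""
    counts = Counter(tuple(row) for row in rows)
    return [row for row, n in counts.items() if n > 1]


def total_matches_parts(parts, reported_total, abs_tol=1e-9):
    """True when the reported total equals the sum of its parts (within tolerance)."""
    return math.isclose(sum(parts), reported_total, abs_tol=abs_tol)


def hits_spreadsheet_limit(n_rows, n_cols, row_limit=65536, col_limit=255):
    """True when the table is exactly as large as one of the limits in the list,
    which suggests the export may have been cut off."""
    return n_rows == row_limit or n_cols == col_limit


def leading_digit(x):
    """First significant digit of a nonzero number, read off scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def benford_deviation(values):
    """Largest gap between the observed share of each leading digit (1-9)
    and the share Benford's Law predicts, log10(1 + 1/d).
    Zeros and None are skipped because they have no leading digit."""
    digits = [leading_digit(v) for v in values if v]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return max(
        abs(counts[d] / len(digits) - math.log10(1 + 1 / d))
        for d in range(1, 10)
    )
```

None of these functions decides whether data is bad; each one only raises a flag worth a closer look. How large a Benford deviation counts as a failure is a judgment call, and the law is only expected to hold for values that span several orders of magnitude.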
