Open Data’s Green, Yellow and Red Buckets

We think of government data as falling into green, yellow, and red “buckets.” We start by encouraging governments to publish all of their green bucket data. Green bucket data is obviously public and can be published with few, if any, redactions. Data in the yellow bucket might be publishable, but raises potential questions. The red bucket holds data whose release would clearly endanger privacy, public safety and security, trade and contract secrets, or law enforcement investigations.

Most fights over data release are over yellow bucket data, and they often hinge on how privacy or safety is defined and on a rough feel for what the public is OK with. There are yellow bucket fights over just about everything you could imagine, including government agencies refusing to publish digital maps of a waterfront that show sewer discharge outlets or subway exits.

Probably the most complex area of the yellow bucket is the inconsistent treatment of personally identifiable information. Under New York law, court records are public, as are property ownership records and professional licenses for doctors, lawyers and others. Yet state and city legislators have proposed bills that would ban the release of personally identifiable information in any open data. Below is a rough spectrum originally created by open data and search guru Jim Hendler, of RPI and semantic search fame. It is a few years old and needs to be updated, but it is a good place to start the conversation.

[Image: Jim Hendler’s rough spectrum of open data]