Grading the Progress of NYC’s Open Data Law

July 18, 2016

Last Friday, July 15th, right on schedule as required by the NYC Open Data Law, New York City published Open Data For All 2016, its annual progress report on open data. The short, four-page report rightly touted some big successes: publishing new data sets of great interest to the public and automating data sets so they update without human intervention. But the report did not mention, let alone explain, how the City is dealing with persistent problems with data quality, or how mayoral agencies will meet the new mandates imposed by seven new amendments to the Open Data Law.

The NYC Transparency Working Group and Reinvent Albany are huge boosters of the City’s Open Data initiative, but as advocates, our role is to keep pushing the City to obey the letter and spirit of the law, and that can entail some criticism. Overall, we are very happy to see that the de Blasio administration has made a real commitment to improving open data, and that the mayor signed the seven open data bills passed by City Council in the Fall of 2015 and early 2016 – he did not have to do that, and we do not take his action for granted. We also appreciate that NYC DoITT and the Mayor’s Office published the open data progress report on time.

NYC Open Data Law Implementation 2016

Highlights

An A plus grade for NYC DoITT’s ongoing work to program agency data sets to automatically update. DoITT set up an impressive 100 auto-updating data sets in the last twelve months, for a total of 200 automatically updating data sets out of roughly 1,500 data sets total. This is excellent work and greatly improves the value and reliability of the published data. Data sets that do not automatically update on the open data portal have to be manually re-uploaded whenever a new version is published; this is expensive, often delayed, and creates opportunities for big errors, because data sets are updated in all kinds of different ways, including cutting and pasting from spreadsheets.

An A grade for publishing major new data sets, including the City Budget, City Record Online, and Seven Major Felonies, among the 150 new data sets published. The City also published a huge data set of Taxi and Limousine Commission (TLC) taxi trips. The TLC was getting about 75 FOIL requests a year for this data set, which is so big that requestors were required to give the TLC a blank hard drive that could hold the roughly 50 DVDs’ worth of trip data. Open data is supposed to help government save on the cost and hassle of fulfilling FOIL requests, so we are particularly glad to see this happen.

We are also encouraged to see that NYC is explicitly connecting the Freedom of Information Law (FOIL) and open data. For the first time, the Progress Report includes the figures that Local Law 7 of 2016 requires each agency to publish:

The number of FOIL responses that included the release of data.
The number of FOIL responses that included a data set not yet on the open data portal.
The number of FOIL responses that resulted in data sets being voluntarily published (i.e. created and uploaded) on the open data portal.

Thirty agencies out of roughly eighty complied with this new law, which is a good start, but by next July, all agencies should be complying.

Lesser Lights

Though it was a very big year for automation and data publishing, the Progress Report includes a few gaping holes and does not say anything about the data quality issues that have dominated City Council hearings and NYC’s public discussion about open data for the last year or more. How are the Mayor’s Office of Data Analytics and DoITT going to get agencies to fix serious data errors? Will the City create a workable process for the public to report data quality problems and get them fixed? We do not know. How is the City going to meet the fast-approaching new reporting mandates created by the seven amendments to the Open Data Law? Again, we do not know.

What we do know is that every type of open data stakeholder, from community-based organizations to academia to watchdog groups to journalists and businesses, has vociferously complained for years about data errors, the lack of explanatory metadata, and poorly structured data, and nothing has changed. Many open data users have told us that the data they use is so riddled with obvious errors that it causes them to wonder if any of the data is correct. This serious crisis of confidence in the quality of agency data needs to be addressed, or the great dream of open data as a fundamentally new form of open government is going to fail. Given the commitment of the mayor and City Council to making open data work, we are optimistic that the hard work needed to fix this problem will be done. But it is still disappointing that the 2016 progress report says nothing about the issue that is most on the mind of the open data community.