The Willie Sutton Guide to “High Value Data.”


It should be fairly straightforward to figure out what government data sets the public is really interested in. The public is telling government with every website visit, search, and download; with every online transaction, phone call, and text to 311; and with every FOIL request. Unfortunately, government often isn’t listening. (How many New York State and City agencies have good web analytics? How many have any web analytics?)
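As a rough illustration of what "listening" could look like, here is a minimal sketch that tallies demand signals per data set across channels. It assumes an agency has already collected its web, 311, and FOIL activity into simple (data set, channel) records. The function name, the channel labels, and the sample records are all hypothetical and not drawn from any real agency system.

```python
from collections import Counter

def rank_by_demand(events):
    """Count how often each data set shows up across all channels.

    `events` is an iterable of (dataset, channel) pairs. This format is an
    assumption made for illustration, not a real agency log schema.
    """
    totals = Counter()
    for dataset, _channel in events:
        totals[dataset] += 1
    # most_common() returns (dataset, count) pairs, highest count first.
    return totals.most_common()

# Hypothetical sample records; names and channels are placeholders.
sample_events = [
    ("dataset_a", "download"),
    ("dataset_b", "search"),
    ("dataset_a", "foil_request"),
    ("dataset_a", "311_call"),
    ("dataset_b", "download"),
    ("dataset_c", "search"),
]

ranking = rank_by_demand(sample_events)
for dataset, count in ranking:
    print(f"{dataset}: {count} requests")
```

A real tally would probably weight channels differently (a FOIL request takes far more effort than a page view), but even a raw count like this shows where public interest is concentrated.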

But there’s another way of identifying high value data that isn’t usually discussed: just take a look at what data your government is charging a fee for. Bingo: if people are willing to pay for data, it’s by definition “high value data.” New York City’s Open Data initiative scored a big success when it got the Department of City Planning to publish PLUTO and tax-lot data, and when the Department of Finance did the same with ACRIS property records. Previously, the data was available either for a fee or via online forms that allowed one search at a time. Now, both data sets can be downloaded in a machine-readable format. Neither the city nor the state keeps a central directory of all the data sets it charges a fee for, but odds are good that both still charge for some.

But what about the foregone revenue? Shouldn’t taxpayers get a return from all of that expensive data? Yes! Of course we should. But there is a very strong case to be made that this data generates a far higher overall return for taxpayers when it can be used freely by a large universe of developers, data scientists, and researchers than when it is used by a small number who pay a fee that doesn’t generate much revenue. PLUTO, for instance, grossed roughly $250k/year for the Department of City Planning when it was for sale. Millions of dollars of taxpayer money went into creating the PLUTO database and keeping it updated, and there is no realistic way to recoup that cost. So the debate is about how to get the biggest bang for the buck from heavily subsidized data, not how to recover a small portion of its cost. We believe it is a false economy to nickel-and-dime the public with fees to use data our taxes already paid for.

Back to the matter at hand: Willie Sutton was a real bank robber and Greenpoint, Brooklyn native who probably never said that he “robbed banks because that’s where the money is.” But whether or not Willie said it, you can take that advice, follow the money, and figure out where the “high value data” is locked up.