How NYC’s Open Data Law Defines “Public Data”
New York City’s Open Data Law (Local Law 11 of 2012) requires each city agency to publish all of its Public Data Sets by December 31, 2018. Not all agency data is “public data,” so the definition matters: if an agency classifies data as non-public, it does not have to publish that data. Last year, the city’s Open Data Team seemed to raise questions about what exactly public data is in its audit report on the Departments of Sanitation, Correction, and Housing Preservation and Development (see bottom of page 2). We think the definitions of Data and Public Data Set are clear and hope city agencies take another look at what the law says.
Definition of Data
The Open Data Law defines Data in §23-501(b) as:
final versions of statistical or factual information (1) in alphanumeric form reflected in a list, table, graph, chart or other non-narrative form, that can be digitally transmitted or processed; and (2) regularly created or maintained by or on behalf of and owned by an agency that records a measurement, transaction, or determination related to the mission of an agency.
However, Data does not include “information provided to an agency by other governmental entities.” For example, the City Department of Transportation does not have to publish data it gets from the State DOT or the Department of City Planning – though it could.
Additionally, Data does not include “image files, such as designs, drawings, maps, photos, or scanned copies of original documents.” This is largely redundant, given the ‘alphanumeric form’ requirement above.
Data does include “statistical or factual information about such image files and shall include geographic information system data.” Taken with the previous paragraphs, this means that while charts, graphs, and maps themselves are not publishable Data, the tabular information underlying them is Data.
Definition of Public Data Set
According to §23-501(g), a Public Data Set is:
A comprehensive collection of interrelated data that is available for inspection by the public in accordance with any provision of law and is maintained on a computer system by, or on behalf of, an agency.
The Open Data Law relies heavily on the New York State Freedom of Information Law (FOIL) to determine what is public (i.e., a Public Data Set) and what is private. The first sentence of the definition of Public Data Set makes two references to FOIL. The phrase “available for inspection by the public” closely tracks §87(2) of FOIL: “Each agency shall, in accordance with its published rules, make available for public inspection and copying all records…” Likewise, the phrase “maintained on a computer system by, or on behalf of, an agency” is a reference to the FOIL §86(4) definition of Record as “any information kept, held, filed, produced or reproduced by, with or for an agency…”
Thus, it is convenient to say that tabular data subject to FOIL is a Public Data Set for purposes of the Open Data Law. However, there are seven subsequent exceptions to the definition of Public Data Set:
1. any portion of such data set to which an agency may deny access pursuant to the public officers law or any other provision of a federal or state law, rule or regulation or local law;
The Public Officers Law contains FOIL, so the Open Data Law here explicitly incorporates all fifteen exceptions within FOIL. If data is not subject to public disclosure under FOIL, it is not public data and not publishable under the Open Data Law.
2. any data set that contains a significant amount of data to which an agency may deny access pursuant to the public officers law or any other provision of a federal or state law, rule or regulation or local law and where removing such data would impose undue financial or administrative burden;
Whereas FOIL requires an agency to redact the sensitive portions of public records and grant access to the remainder, the Open Data Law here excludes data sets with a “significant” amount of redactable data from the definition of Public Data Set, provided that removing that data would impose an undue financial or administrative burden. (The definition of “significant” is ambiguous and has not been established through experience.)
3. data that reflects the internal deliberative process of an agency or agencies, including but not limited to negotiating positions, future procurements, or pending or reasonably anticipated legal or administrative proceedings;
This exception to the open data law is redundant in light of FOIL §87(2)(g), the “interagency” or “deliberative materials” exception. Here, it enumerates a handful of specific cases which are all squarely within the §87(2)(g) exception and would not be subject to disclosure under FOIL.
4. data stored on an agency-owned personal computing device, or data stored on a portion of a network that has been exclusively assigned to a single agency employee or a single agency owned or controlled computing device;
A version of a data set which only exists on one employee’s computer or corresponding network share is not a Public Data Set. This recognizes that certain agencies like DOT have terrible data hygiene and there are several non-canonical versions of any given data set floating around at once. While Employee A’s slightly tweaked copy of a data set may be FOILable, it is not technically a Public Data Set; this saves agencies the administrative burden of hunting for and publishing idiosyncratic copies of data sets.
5. materials subject to copyright, patent, trademark, confidentiality agreements or trade secret protection;
This exception is poorly written. Data sets cannot be patented or trademarked. Tabular data is copyrightable, but the routine administrative collection and compilation of government records does not create a copyright interest in the resulting data set. Trade secrets are explicitly not Public Data Sets, per FOIL’s §87(2)(d) “trade secrets” exception. The carveout for data subject to confidentiality agreements is extremely broad, but probably necessary, as Data can include data sets maintained by private parties on behalf of an agency. As of this writing, Reinvent Albany has no reason to believe the confidentiality agreements exception is being abused.
6. proprietary applications, computer code, software, operating systems or similar materials;
Presumably, this exception is meant to relieve agencies of the responsibility for decompiling existing software to access and publish the data set within. Such a data set would otherwise be accessible under FOIL.
7. employment records, internal employee-related directories or lists, and facilities data, information technology, internal service-desk and other data related to internal agency administration.
This exception is excessively broad and could be used as an excuse not to publish data. Many data sets could be said to relate to “internal agency administration,” but agencies do not appear to be abusing this carveout yet. In addition, the necessity of this language is questionable, as FOIL’s §87(2)(g) exception for intra-agency materials is already incorporated into the definition of Public Data Set.
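As a rough illustration only, the base definition and the whole-data-set exceptions discussed above can be read as a checklist. The Python sketch below is not legal advice or an official test. Every field and function name in it is hypothetical, and each flag stands for a legal judgment an agency would have to make itself. Exception 1 is left out because it removes portions of a data set rather than the data set as a whole.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSetFacts:
    """Hypothetical answers an agency might record about one data set."""
    available_under_foil: bool               # "available for inspection by the public"
    on_agency_computer_system: bool          # "maintained on a computer system by, or on behalf of, an agency"
    significant_deniable_data: bool = False  # exception 2, first condition
    removal_unduly_burdensome: bool = False  # exception 2, second condition
    deliberative: bool = False               # exception 3
    single_user_storage: bool = False        # exception 4
    ip_or_confidential: bool = False         # exception 5
    proprietary_software: bool = False       # exception 6
    internal_administration: bool = False    # exception 7


def exclusion_reasons(facts: DataSetFacts) -> List[str]:
    """Return every reason the data set falls outside the definition."""
    reasons = []
    if not facts.available_under_foil:
        reasons.append("not available for public inspection")
    if not facts.on_agency_computer_system:
        reasons.append("not maintained on an agency computer system")
    # Exception 2 needs both conditions: significant deniable data AND undue burden.
    if facts.significant_deniable_data and facts.removal_unduly_burdensome:
        reasons.append("exception 2")
    if facts.deliberative:
        reasons.append("exception 3")
    if facts.single_user_storage:
        reasons.append("exception 4")
    if facts.ip_or_confidential:
        reasons.append("exception 5")
    if facts.proprietary_software:
        reasons.append("exception 6")
    if facts.internal_administration:
        reasons.append("exception 7")
    return reasons


def is_public_data_set(facts: DataSetFacts) -> bool:
    """A data set is public in this sketch only if no exclusion applies."""
    return not exclusion_reasons(facts)
```

Returning the list of reasons, rather than a bare yes or no, mirrors how the law is argued in practice: an agency withholding a data set should be able to name the specific exception it relies on.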