Big Data Man Bites Big Data Dog


In a rare turn of events, a data miner is mapping the relationships among corporations instead of engaging in the data vivisection that individual Americans are subjected to every time they go online. Enigma, a new member of the Data Transparency Coalition, is a search engine devoted to making sense of the jumbled public data on big business. New York City-based Enigma appears to be succeeding in creating a clear picture out of the muddle of public data that results from archaic practices: government agencies each assign their own identification numbers to businesses, contracts, and payments. (In New York State, the Comptroller, the Attorney General, and the Department of State use different ID numbers for businesses, and subsidiaries can be very difficult to keep track of.)

Further complicating things, U.S. federal and state regulatory bodies also use their own identification numbers. For example, just within the banking industry, the Federal Reserve, the FDIC, and the Comptroller of the Currency each maintain their own unique identifiers. And the IRS maintains Employer Identification Numbers, separate from New York State’s own Certificate of Authority numbers, which are separate from its Sales Tax ID Numbers, and so on. Simply identifying a corporation is a feat in itself.

Once businesses are finally identified, it’s still difficult to determine what they’re up to, because a significant amount of public reporting is in non-machine-readable formats. Enigma’s founder says they have to “piece this puzzle together out of currently available bits and fragments… we have to operate in creative ways to bring these disparate data sets together to produce new insights.”

From a recent New York Times profile of Enigma:

Enigma scrapes a wide variety of federal and state level government websites to glean such fragments. Enigma also petitions for and buys additional information from agencies and commercial vendors. Once all those pieces of data are on its platform, Enigma applies its own algorithm to pull them together and link them to the same entity.
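To make the linking problem concrete, here is a minimal sketch of one simple way records carrying different agency identifiers could be grouped under a single entity. It is not Enigma's algorithm, which is not public; the record layout, the made-up ID values, the suffix list, and the function names normalize_name and link_records are all assumptions for illustration. It matches records on a normalized company name, which is far cruder than what a production system would need.

```python
import re
from collections import defaultdict

# Assumed list of legal suffixes to ignore when comparing names (illustrative only).
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "co", "company", "llc", "ltd", "limited"}


def normalize_name(name: str) -> str:
    """Lowercase the name, turn punctuation into spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    core = [t for t in tokens if t not in LEGAL_SUFFIXES]
    # Fall back to the raw tokens if the name was nothing but suffixes.
    return " ".join(core or tokens)


def link_records(records):
    """Group (source, agency_id, name) records by normalized name.

    Returns {normalized_name: {source: agency_id}}. For simplicity, a later
    record from the same source overwrites an earlier one.
    """
    entities = defaultdict(dict)
    for source, agency_id, name in records:
        entities[normalize_name(name)][source] = agency_id
    return dict(entities)


# Hypothetical records: the same company as three agencies might list it.
records = [
    ("IRS", "EIN-0001", "Acme Widgets, Inc."),
    ("FDIC", "CERT-42", "ACME WIDGETS INC"),
    ("NY Department of State", "DOS-777", "Acme Widgets Incorporated"),
    ("IRS", "EIN-0002", "Globex Corp."),
]

linked = link_records(records)
for entity, ids in linked.items():
    print(entity, ids)
```

Exact matching on a cleaned-up name breaks down quickly with misspellings, renamed companies, and subsidiaries, which is part of why the absence of a shared identifier is so costly.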

Not only are there no common identifiers, there are no common standards for reporting data. The effort now spent collecting and standardizing this data would be much better spent making sense of it. The recently passed DATA Act marks a seismic shift in the quality of data available to startups like Enigma and watchdogs like OpenSecrets or MapLight.