Congressional Data Mining: Coming Soon?

How a little-noticed provision in a House spending bill could revolutionize access to congressional information.

Photo by flickr user <a href="http://www.flickr.com/photos/ianalexandermartin/1846815435/" target="new">I am I.A.M.</a> used under a Creative Commons license.


By slipping a simple, three-sentence provision into the gargantuan spending bill passed by the House of Representatives last week, a congressman from Silicon Valley is trying to nudge Congress into the 21st century. Rep. Mike Honda (D-Calif.) placed a measure in the bill directing Congress and its affiliated organs—including the Library of Congress and the Government Printing Office—to make their data available to the public in raw form. That would enable members of the public and watchdog groups to craft websites and databases showcasing government data that are more user-friendly than the government’s own.

If the Senate passes the bill with the provision intact, citizens seeking information about Congress’ activities—such as bill names and numbers, amendments, votes, and committee reports—won’t have to rely on government websites, which often filter information, are incomplete, or are difficult to use. Instead, the underlying data will be available to anyone who wants to build a superior site or tool to sift through it. “The language is groundbreaking in that it supports providing unfiltered legislative information to the public,” says Honda’s online communications director, Rob Pierson. “Instead of silo-ing the information, and only allowing access through a limited web form, access to the raw data will make it easier for people to learn what their government is doing.”

Successful, privately created websites that provide the public with information about Congress’ actions already exist. OpenCongress.org, GovTrack.us, Legistorm.com, and MAPLight.org all make legislative data available to the public in ways that are easier to navigate than Congress’ primary web portal, a system called Thomas. Those sites currently get their data through techies who “scrape” Thomas and other government websites, which means they use bots to process the HTML and gather what is valuable. The process is labor-intensive and imprecise. “It’s difficult to keep the data up to date, in some cases impossible, and occasionally there are errors in the data,” says Josh Tauberer, the 26-year-old who runs GovTrack.us and does much of the “scraping” that others rely on. “This could all be fixed by a bulk data download.”
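For readers curious about what separates scraping from a bulk download, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the HTML fragment, the class names, the JSON layout, and the bill numbers and titles are invented placeholders, not the actual markup of Thomas or the format of any real congressional feed. The scraper has to know exactly how the page is laid out, so a redesign can silently break it. The bulk loader reads the fields directly.

```python
import json
from html.parser import HTMLParser

# Hypothetical page fragment. The tags and class names are invented for
# this sketch and do not reproduce any real government website.
SAMPLE_HTML = """
<table>
  <tr><td class="bill">H.R. 0001</td><td class="title">Example Act</td></tr>
  <tr><td class="bill">H.R. 0002</td><td class="title">Another Example Act</td></tr>
</table>
"""

# Hypothetical bulk-data record set carrying the same information.
SAMPLE_BULK = """
[
  {"bill": "H.R. 0001", "title": "Example Act"},
  {"bill": "H.R. 0002", "title": "Another Example Act"}
]
"""


class BillScraper(HTMLParser):
    """Collects (bill, title) pairs from table cells with known class names."""

    def __init__(self):
        super().__init__()
        self.bills = []      # finished (bill, title) pairs
        self._field = None   # which cell we are inside, if any
        self._row = {}       # cells gathered for the current table row

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            css_class = dict(attrs).get("class")
            if css_class in ("bill", "title"):
                self._field = css_class

    def handle_data(self, data):
        # Only keep text that appears inside a cell we recognize.
        if self._field:
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row:
            self.bills.append((self._row.get("bill"), self._row.get("title")))
            self._row = {}


def scrape_bills(html):
    """Scraping: recover the data by walking the page's presentation markup."""
    parser = BillScraper()
    parser.feed(html)
    return parser.bills


def load_bulk(text):
    """Bulk data: the fields are already labeled, so no layout guessing."""
    return [(record["bill"], record["title"]) for record in json.loads(text)]


if __name__ == "__main__":
    print(scrape_bills(SAMPLE_HTML))
    print(load_bulk(SAMPLE_BULK))
```

Both functions yield the same list of pairs here, but only the scraper depends on the page's layout staying exactly as it is.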

Tauberer expects that the availability of additional and easier-to-use congressional data will spur innovation. “You can expect to see other sites spring up doing new and interesting things with the information.” He anticipates charts, graphs, and maps that represent congressional goings-on visually—“ways of visualizing the congressional process that we couldn’t yet imagine.” Honda, with his Silicon Valley roots, expects that developers and coders will quickly outpace the government’s efforts to date. “We hope that we can learn from the wisdom of crowds,” says Pierson.

Some government agencies already provide massive amounts of raw data. The Census Bureau publishes huge amounts of information in raw form, allowing academics, statisticians, and think tank scholars to comb through it in any way they please. The Federal Election Commission publishes unedited data on campaign contributions, giving rise to sites like OpenSecrets.org, which allows the public to see who is donating to whom, and allows journalists and watchdogs to investigate the influence of money in politics.

“In our Web 2.0 world, we can empower the public by providing them with raw data that they can remix and reuse in new and innovative ways,” Honda told Mother Jones in a statement. (Disclosure: In the summer of 2002, I briefly worked as an intern in Honda’s district office.) Honda’s provision, however, pertains only to legislative data. Federal departments like the Environmental Protection Agency, the Food and Drug Administration, and the Department of Energy have reams of data that political scientists, economists, and researchers of all stripes would love to get their hands on. Many who work at the intersection of technology, politics, and transparency believe that the key player in broadening Honda’s effort to include the executive branch will be Vivek Kundra, the former Chief Technology Officer of the District of Columbia who was named Obama’s Chief Information Officer on Thursday. According to the National Journal’s Tech Daily Dose, Kundra “told reporters Thursday he will launch data.gov, a Web site intended to ‘democratize data’ by giving the public raw feeds of information from a range of agencies.”

John Wonderlich, the policy director at the Sunlight Foundation, which has created or funded several tools that make government data easier to analyze, is holding out hope that the president’s Open Government Directive, which is due at the end of May, will further address the issue of data availability. He applauds Honda for putting Congress, at least, on the right track. “Without Honda’s attention to this issue, congressional level attention to bulk data access would be unlikely,” he says. “We’re happy to see this first step.”

