Meet the People Behind the Wayback Machine, One of Our Favorite Things About the Internet

The Internet Archive is home to more than 15 million gigabytes of free digital information—and it’s just getting started.

Brewster Kahle is quick to point out that we are not standing inside a former Scientology church. Visitors to this looming white building in San Francisco’s Inner Richmond District often mix up its past life: it was a meeting place for Christian Scientists, not Scientologists. It is now a different kind of house of worship, known as the Internet Archive, where free digital access to all knowledge is the canon.

“The average life of a web page is about 100 days before it’s either changed or deleted,” says Kahle. “Even if it’s supported by big companies: Google Video came down, Yahoo Video came down, Apple went and wiped out all the pages in Mobile Me.” Capturing this transient web was Kahle’s original mission for the Internet Archive when he founded it in 1996. Nearly two decades later, the 53-year-old compares his organization to a “Library of Alexandria, version two.”

That may be an understatement. In addition to hosting the Wayback Machine, an ever-growing collection of more than 400 billion copies of web pages, the Internet Archive has expanded to offer millions of free digitized books, TV shows, movies, songs, documents, and software titles. Want to see what MotherJones.com looked like in 1996? The Wayback Machine has it. Are you a Deadhead in search of rare recordings? There are more than 9,000 to choose from. Remember when federal websites were closed for business during the government shutdown? They were still available thanks to the Internet Archive.

Walking through the Internet Archive’s physical headquarters, which has occupied this former church since 2009, is a surreal experience. Built in 1923, the grand worship hall on the second floor remains intact, with wooden pews lining the floor and a podium sitting atop a stage. But stacks of humming, blinking server racks now rest against the walls. And then there are the figurines: dozens of half-size human models that fill the outer rows of pews and immortalize Archive employees and volunteers through the years. Kahle’s mini-mannequin stands in the front row. Next to him is Aaron Swartz, the “Internet folk hero” who was a volunteer and contractor from 2007 to 2009. Swartz committed suicide in 2013 following a federal indictment for downloading millions of articles from the digital library JSTOR through the Massachusetts Institute of Technology’s network. Kahle remains disappointed with how prosecutors, MIT, and JSTOR handled the Swartz case. “Shame on them,” he says. “I think it’s a symbol of the old world and the old approach that must be overturned. There are some organizations that are still built around this idea of restricting, restricting, restricting, and that’s not going to fly.”

While Kahle is against restricting access to knowledge, he adamantly supports internet users’ right to privacy. In 2007, the FBI sent the Internet Archive a secret National Security Letter seeking information about one of its patrons. With the help of the Electronic Frontier Foundation, Kahle challenged the request and won. “That a library has to sue the US government is not terribly appropriate,” he says. But the Internet Archive’s relationship with the feds is not entirely prickly. It also provides web crawling and book scanning services for the Library of Congress. Kahle says the Patent and Trademark Office has used the Wayback Machine to research whether ideas are novel.

A collection like the Internet Archive’s is extremely valuable. Kahle estimates it has about 15 petabytes of information (a petabyte is approximately one million gigabytes of data). That’s a lot less than Facebook’s estimated 300 petabytes, but there’s a big difference: “The Internet Archive is a nonprofit, and nope, there’s no buying it,” says Kahle. Kahle has sold other companies in the past. The Internet Archive was started with funding from the 1995 sale of his search system WAIS, which AOL purchased for $15 million. His online tracking service Alexa was sold to Amazon for $250 million in 1999. The Internet Archive’s current budget is around $12 million.

One of the Internet Archive’s fastest-growing collections is its TV News Archive. Around the clock, seven days a week, HD feeds from more than 65 news channels, both foreign and domestic, are recorded to the Internet Archive’s servers. The US feeds are fully searchable the following day. Roger Macdonald, who runs the Internet Archive’s entire Television Archive, preaches treating all media as data. He says many TV and cable networks are “scared about experimenting” with closed captioning data that could make their content searchable by a global audience. By making its videos text-searchable, “our service has vaulted over the confines of the linear video storytelling,” he says. For example, when Harvard and MIT researchers studied how the media covered the Trayvon Martin shooting, they turned to the TV News Archive, using its closed captioning data to help map the story’s evolution.

In 2013, the Internet Archive received an unusual message from Michael Metelits. Metelits’s mother, Marion Stokes, who had recently passed away, had recorded more than 35 years of TV news in Philadelphia and Boston with her VHS and Betamax machines. Metelits was left with approximately 40,000 well-organized tapes, but he had nowhere to put them. So he emailed the Archive. “I thought there might be a typo in his email,” Macdonald recalls. “I couldn’t imagine an individual doing that.”

The donated collection turned out to be a goldmine. The TV News Archive began recording in 2000; Stokes had them beat by more than 20 years. And not only were her tapes in good condition, they also recorded closed captioning data, providing vital metadata. Digitizing and logging the massive trove, now stored in Richmond, California, is a challenge, to say the least. Macdonald says they’ve “only just scratched the surface of imagining what’s there.”

Sean Fagan, logistics specialist for the Internet Archive, with the Marion Stokes collection: 35 years of TV news recorded on VHS and Betamax tapes. (Photo: Brett Brownell)

Looming above the Richmond storage facility where the Stokes collection resides is another element of Kahle’s ongoing mission. It’s an antenna broadcasting free internet, one of two free wi-fi access points the Archive provides to San Francisco Bay Area residents. (A third free wi-fi setup is in North Carolina.) Kahle says cities “haven’t been doing their part” to provide faster access to the web and that communication infrastructure is “just as much the lifeblood as water or transportation to a city.”

Adding to its long list of projects, the Internet Archive is also taking a swing at the housing market. Kahle wants to apply the tech industry concept of “open sourcing” to disrupt (if you will) the Bay Area’s affordable housing crisis, which has been fueled in part by the booming tech industry. The Internet Archive has set up a separate nonprofit to purchase an 11-unit apartment building six blocks from its San Francisco headquarters, which it hopes will offer “debt free” housing to nonprofit employees. Macdonald says the first Internet Archive employee will move in later this year. Eventually, Kahle’s dream is “to transition 5 percent of all housing into a new housing class that would be dedicated to supporting the nonprofit sector.”

Even as he sets more ambitious goals, Kahle worries that the end of net neutrality could spell the end of the open web he’s fought to preserve. “If we lose net neutrality,” he says, “or if we let monopolization happen, whether it’s Comcast and AT&T in the United States, or other players in other countries, we will lose the magic that we’ve had for the last 20 or 30 years with this internet.” He urges other technologists to get involved. “We can’t just wait on government to do something. They’ll be bashed around by the commercial players that have all to gain from monopolization.”

Thinking about the current state of the internet, Kahle says, “I wake up sometimes really depressed, and sometimes really optimistic.” But, he adds, “As they said in other struggles, you should know which side you’re on, and at least the Internet Archive knows which side it’s on.”

WE'LL BE BLUNT

It is astonishingly hard keeping a newsroom afloat these days, and we need to raise $253,000 in online donations quickly, by October 7.

The short of it: Last year, we had to cut $1 million from our budget so we could have any chance of breaking even by the time our fiscal year ended in June. And despite a huge rally from so many of you leading up to the deadline, we still came up a bit short on the whole. We can’t let that happen again. We have no wiggle room to begin with, and now we have a hole to dig out of.

Readers also told us to just give it to you straight when we need to ask for your support, and seeing how matter-of-factly explaining our inner workings, our challenges and finances, can bring more of you in has been a real silver lining. So our online membership lead, Brian, lays it all out for you in his personal, insider account (that literally puts his skin in the game!) of how urgent things are right now.

The upshot: Being able to rally $253,000 in donations over these next few weeks is vitally important simply because it is the number that keeps us right on track, helping make sure we don't end up with a bigger gap than can be filled again, helping us avoid any significant (and knowable) cash-flow crunches for now. We used to be more nonchalant about coming up short this time of year, thinking we can make it by the time June rolls around. Not anymore.

Because the in-depth journalism on underreported beats and unique perspectives on the daily news you turn to Mother Jones for is only possible because readers fund us. Corporations and powerful people with deep pockets will never sustain the type of journalism we exist to do. The only investors who won’t let independent, investigative journalism down are the people who actually care about its future—you.

And we need readers to show up for us big time—again.

Getting just 10 percent of the people who care enough about our work to be reading this blurb to part with a few bucks would be utterly transformative for us, and that's very much what we need to keep charging hard in this financially uncertain, high-stakes year.

If you can right now, please support the journalism you get from Mother Jones with a donation at whatever amount works for you. And please do it now, before you move on to whatever you're about to do next and think maybe you'll get to it later, because every gift matters and we really need to see a strong response if we're going to raise the $253,000 we need in less than three weeks.
