Facebook Threatens To Sue Researcher For Crawling Their Site
The fundamental nature of the web is that public pages are open for indexing and analysis (that’s how search engines and other basic tools of the web work), but Pete Warden recently discovered that Facebook doesn’t agree.
Pete is the guy who analyzed Facebook data by building his own web crawler, and then published some results of his analysis of social connections. I was one of the many folks who linked to his post, How To Split Up The US, and to the map it included.

In the post, Pete offered to share his dataset with others, which may be what prompted a phone call from Facebook’s chief counsel, demanding that he destroy the data. Or else.
Pete Warden, How I Got Sued By Facebook
On Sunday around 25,000 people read the article, via YCombinator and Reddit. After that a whole bunch of mainstream news sites picked it up, and over 150,000 people visited it on Monday. On Tuesday I was hanging out with my friends at Gnip trying to make sense of it all when my cell phone rang. It was Facebook’s attorney.
He was with the head of their security team, who I knew slightly because I’d reported several security holes to Facebook over the years. The attorney said that they were just about to sue me into oblivion, but in light of my previous good relationship with their security team, they’d give me one chance to stop the process. They asked and received a verbal assurance from me that I wouldn’t publish the data, and sent me on a letter to sign confirming that. Their contention was robots.txt had no legal force and they could sue anyone for accessing their site even if they scrupulously obeyed the instructions it contained. The only legal way to access any web site with a crawler was to obtain prior written permission.
Obviously this isn’t the way the web has worked for the last 16 years since robots.txt was introduced, but my lawyer advised me that it had never been tested in court, and the legal costs alone of being a test case would bankrupt me. With that in mind, I spent the next few weeks negotiating a final agreement with their attorney. They were quite accommodating on the details, such as allowing my blog post to remain up, and initially I was hopeful that they were interested in a supervised release of the data set with privacy safeguards. Unfortunately it became clear towards the end that they wanted the whole set destroyed. That meant I had to persuade the other startups I’d shared samples with to remove their copies, but finally in mid-March I was able to sign the final agreement.
I’m just glad that the whole process is over. I’m bummed that Facebook are taking a legal position that would cripple the web if it was adopted (how many people would Google need to hire to write letters to every single website they crawled?), and a bit frustrated that people don’t understand that the data I was planning to release is already in the hands of lots of commercial marketing firms, but mostly I’m just looking forward to leaving the massive distraction of a legal threat behind and getting on with building my startup.
So: Zuckerberg believes that we are moving toward a world where people will share more and more information about themselves, freely, and Facebook is merely on a mission to make that happen. People just don’t understand that Facebook isn’t evil and has no ulterior motives, or so Zuckerberg says. That holds, at least, until someone else wants to gather that ‘open’ information, which users publish publicly on their Facebook pages. Then Facebook wants it totally locked up, or else it will sue you.

