Thursday, April 15, 2010

The Library of Congress is going to "archive the collected works of Twitter."

We're going to record "the second-by-second history of ordinary people."

That NYT article makes this important point (which it attributes to unnamed academics):

"For hundreds of years, . . . the historical record has tended to be somewhat elitist because of its selectivity. In books, magazines and newspapers, . . . it is the prominent and the infamous who are written about most frequently."

I've been critical of Twitter. But I only wish we had this kind of access to random people's fleeting thoughts from 50, 100, 200 years ago.

On the other hand, most tweets surely aren't very interesting. It's hard to imagine that more than 10% are worth going back and reading even a year later. But the Library of Congress is apparently archiving all tweets. This seems bizarrely undiscriminating for such an august institution.

Now, sifting through the 55 million tweets per day (!) would be an enormous task, so I'm not surprised they don't bother to do that. But couldn't they come up with an algorithm that would automate the process of distilling the whole archive into something with more overall value? For instance, they could have a rule that they only archive tweets that have garnered a certain number of "favorites."
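To make the "favorites" idea concrete, here's a rough sketch in Python of what such a rule might look like. Everything in it is my own assumption for illustration: the `Tweet` record, its `favorites` field, the `select_for_archive` function, and the threshold of 5 are all made up, and none of this reflects how Twitter or the Library of Congress actually handles the data.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Tweet:
    # Hypothetical record: the field names are assumptions, not Twitter's real schema.
    text: str
    favorites: int


def select_for_archive(tweets: Iterable[Tweet], min_favorites: int = 5) -> Iterator[Tweet]:
    """Yield only the tweets that meet the (arbitrary) favorites threshold."""
    for tweet in tweets:
        if tweet.favorites >= min_favorites:
            yield tweet


# Made-up sample data, just to show the rule in action.
sample = [
    Tweet("just had lunch", 0),
    Tweet("eyewitness report from the scene", 12),
    Tweet("a good joke", 5),
]
kept = [t.text for t in select_for_archive(sample, min_favorites=5)]
```

Because the function takes any iterable and yields matches one at a time, it could in principle run over a stream of millions of tweets without holding them all in memory. The hard part is picking the threshold, and a rule like this would presumably tilt the archive back toward the popular and prominent.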

Lastly, this is disturbing:
"[T]he vast majority of Twitter messages that would be archived are publicly published on the Web."

"The vast majority"? You mean some of the tweets stored in the Library of Congress will be ones that were never publicly published (that is, tweets from "locked" accounts)? How is that not a flagrant violation of privacy?