FriendFeed Active Users Crawler

Following my previous post, my FriendFeed crawler is ready. Well, at least version 0.1 of it. It actually didn’t take too long to develop, and it was a nice exercise.

In any case, I have sent it to crawl the FriendFeed main feed at regular intervals, and I should be getting some initial results soon. I will share them, of course, but first we need to understand the methodology and limitations of my analysis.

How does the crawler work?

The crawler starts with the current public feed. For each entry it extracts (discovers) the poster as well as the usernames of people who liked or commented on that entry. For each discovered user, the crawler reads his or her subscription list and keeps going from there.

So, generally speaking, the process is: 1) read the feed, 2) discover users, 3) extend discovery through subscriptions, 4) repeat.
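For the curious, here is a rough sketch of that loop in Python. It is an illustration only, not my actual crawler code, and the endpoint paths and JSON field names (entries, likes, comments, subscriptions, nickname) are my reading of the API docs, so treat them as assumptions.

    import json
    import urllib.request
    from collections import deque

    API = "http://friendfeed.com/api"

    def fetch(path):
        # Fetch and decode one JSON document from the FriendFeed API.
        with urllib.request.urlopen(API + path) as resp:
            return json.load(resp)

    def crawl():
        discovered = set()   # usernames we have already seen
        edges = set()        # (subscriber, subscribed_to) pairs
        queue = deque()

        # 1) read the public feed, 2) discover the poster plus anyone who liked or commented
        feed = fetch("/feed/public?format=json")
        for entry in feed.get("entries", []):
            names = [entry["user"]["nickname"]]
            names += [like["user"]["nickname"] for like in entry.get("likes", [])]
            names += [c["user"]["nickname"] for c in entry.get("comments", [])]
            for name in names:
                if name not in discovered:
                    discovered.add(name)
                    queue.append(name)

        # 3) extend discovery through each user's subscription list, 4) repeat
        while queue:
            name = queue.popleft()
            profile = fetch("/user/%s/profile?format=json" % name)
            for sub in profile.get("subscriptions", []):
                edges.add((name, sub["nickname"]))
                if sub["nickname"] not in discovered:
                    discovered.add(sub["nickname"])
                    queue.append(sub["nickname"])

        return edges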

What kind of data is collected?

The crawler generates a long list of pairs where each pair represents a single subscription, a relation between a subscriber and the user he is subscribed to. For example, the relation “Robert Scoble -> Michael Arrington” means Robert Scoble is subscribed to Michael Arrington’s feed. Given enough data, I should be able to tell you who else is subscribed to Arrington, at least among the relatively active users.
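In other words, the output is just an edge list, and “who else is subscribed to Arrington” becomes a simple reverse lookup. A toy illustration in Python, with a few made-up pairs:

    # Each pair is (subscriber, subscribed_to); the rows below are made up for the example.
    edges = {
        ("Robert Scoble", "Michael Arrington"),
        ("Some Active User", "Michael Arrington"),
        ("Robert Scoble", "Some Active User"),
    }

    def subscribers_of(user, edges):
        # Reverse lookup: every subscriber whose pair points at the given user.
        return sorted(sub for (sub, target) in edges if target == user)

    print(subscribers_of("Michael Arrington", edges))
    # ['Robert Scoble', 'Some Active User']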

What are the limitations?

  1. Only active users are covered: if you open a new FF account, have no other user subscribed to you and do not publish/like/comment on anything, then there’s no way my crawler can discover you. In my view this is a good limitation, since it narrows the analysis down to the interesting users and excludes the inactive ones.
  2. Only public users are covered: if your feed is private I cannot read your subscription list or do anything of value for that matter.
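In practice the second limitation just means the crawler has to swallow whatever error it gets back for a private profile and move on. A small sketch; the assumption (mine, not from the docs) is that a private feed simply comes back as an HTTP error:

    import json
    import urllib.error
    import urllib.request

    def try_fetch_profile(name):
        # Private (or otherwise unreadable) users come back as an HTTP error;
        # the crawler just skips them instead of crashing.
        url = "http://friendfeed.com/api/user/%s/profile?format=json" % name
        try:
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
        except urllib.error.HTTPError:
            return None   # nothing we can learn about this user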

Any interesting numbers to share?

The crawler is running. I will test it, run it once or twice, and then share the results when I decide I’ve reached critical mass.

Exploring the FriendFeed API

Lately I’ve been playing with the FriendFeed API. It is generally a well-written and responsive API. There are a few bugs and limitations (I even found a little bug in the C# library).

The 2 most annoying limitations for me are:

  1. You cannot retrieve entries by a date range; you can only specify a start position and how many entries you want to retrieve. So if you’re trying to scan backwards you need to jump through hoops to decide when to stop (see the sketch after this list).
  2. The “page 11” limitation – try browsing FriendFeed and navigate through previous pages. Page 9, page 10, page 11, page 12 … What’s that? Clicking anything past page 11 just gives you the same results over and over again. With a limitation of 30 entries per API request this means you can get a maximum of 300 entries.
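For what it’s worth, here is the kind of hoop-jumping the first limitation forces on you, again sketched in Python: page backwards with start/num and stop once entries fall outside the date window you care about. The “updated” field name and its timestamp format are my assumptions, and the second limitation shows up as the loop simply running out of fresh entries somewhere around the 300 mark.

    import json
    import urllib.request
    from datetime import datetime, timedelta

    FEED = "http://friendfeed.com/api/feed/public?format=json&num=30&start=%d"

    def entries_since(cutoff):
        # Walk the feed backwards in pages of 30 until entries are older than the
        # cutoff, or until the service stops returning anything new (the "page 11" wall).
        collected, start, seen = [], 0, set()
        while True:
            with urllib.request.urlopen(FEED % start) as resp:
                entries = json.load(resp).get("entries", [])
            fresh = [e for e in entries if e["id"] not in seen]
            if not fresh:
                break                 # empty or repeated page: we have hit the wall
            for e in fresh:
                seen.add(e["id"])
                # "updated" is assumed to look like 2008-08-20T16:40:22Z
                when = datetime.strptime(e["updated"], "%Y-%m-%dT%H:%M:%SZ")
                if when < cutoff:
                    return collected  # older than the window we care about: stop
                collected.append(e)
            start += 30
        return collected

    # Example: everything from roughly the last day that the API is willing to give us.
    # recent = entries_since(datetime.utcnow() - timedelta(days=1))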

No other major rants at this point.