Following my previous post, my FriendFeed crawler is ready. Well, at least a version 0.1 of it. Actually it didn’t take too long to develop and it was a nice exercise.
In any case I have sent it to crawl the FriendFeed main feed at regular intervals and I should be getting some initial results soon. I will share them of course but first of all we need to understand the methodology and limitations of my analysis.
How does the crawler work?
The crawler starts with the current public feed. For each entry it extracts (discovers) the poster as well as usernames of people who liked or commented on that entry. For each user discovered the crawler reads his or her subscriptions list and keeps on going from there.
So generally speaking the process is: 1) read the feed 2) discover users 3) extend discovery through subscriptions 4) repeat.
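To make the loop concrete, here is a minimal sketch of a single crawl pass. It is not the actual crawler code: `read_feed` and `read_subscriptions` are hypothetical callables standing in for whatever fetches the public feed and a user's subscription list, and the entry layout (`poster`, `likes`, `comments`) is an assumption made for illustration.

```python
from collections import deque


def users_in_entry(entry):
    """Return the poster plus everyone who liked or commented.

    The dict keys used here are an assumed layout, not FriendFeed's real format.
    """
    users = {entry["poster"]}
    users.update(entry.get("likes", []))
    users.update(entry.get("comments", []))
    return users


def crawl(read_feed, read_subscriptions, max_users=1000):
    """Run one crawl pass and return a list of (subscriber, subscribed_to) pairs."""
    seen = set()     # every user discovered so far
    queue = deque()  # discovered users whose subscriptions are not read yet

    def discover(user):
        if user not in seen:
            seen.add(user)
            queue.append(user)

    # Steps 1 and 2: read the feed and discover users from each entry.
    for entry in read_feed():
        for user in sorted(users_in_entry(entry)):
            discover(user)

    # Step 3: extend discovery through subscriptions, breadth first.
    pairs = []
    processed = 0
    while queue and processed < max_users:
        subscriber = queue.popleft()
        processed += 1
        for target in read_subscriptions(subscriber):
            pairs.append((subscriber, target))
            discover(target)
    return pairs
```

Step 4, the repeat, would simply mean calling `crawl` again at the next interval and merging the new pairs with the old ones. The `max_users` cap is my own addition to keep a single pass bounded.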
What kind of data is collected?
The crawler generates a long list of pairs where each pair represents a single subscription: a relation between a subscriber and the user he or she is subscribed to. For example, the relation “Robert Scoble -> Michael Arrington” means Robert Scoble is subscribed to Michael Arrington’s feed. Given enough data, I should be able to tell you who else is subscribed to Arrington, at least among the relatively active users.
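As a rough illustration of how the pair list answers the "who is subscribed to X" question, the sketch below inverts it into a lookup table. The function name `subscribers_index` and the placeholder users are mine, invented for the example; only the Scoble/Arrington pair comes from the text above, and none of this is real crawl output.

```python
from collections import defaultdict


def subscribers_index(pairs):
    """Map each user to the set of users subscribed to him or her."""
    index = defaultdict(set)
    for subscriber, subscribed_to in pairs:
        index[subscribed_to].add(subscriber)
    return index


# Illustrative pairs only; "user_a" and "user_b" are placeholders.
pairs = [
    ("Robert Scoble", "Michael Arrington"),
    ("user_a", "Michael Arrington"),
    ("user_a", "user_b"),
]

index = subscribers_index(pairs)
print(sorted(index["Michael Arrington"]))
```

Keeping the raw data as plain pairs and building indexes on demand means the same list can later be flipped the other way, for example to count how many feeds each user follows.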
What are the limitations?
- Only active users are covered: if you open a new FF account, nobody is subscribed to you, and you do not publish, like, or comment on anything, then there’s no way my crawler can discover you. In my view this is a good limitation, since it narrows the analysis down to the interesting users and excludes the inactive ones.
- Only public users are covered: if your feed is private I cannot read your subscription list or do anything of value for that matter.
Any interesting numbers to share?
The crawler is running. I will test it, run it once or twice, and then share the results when I decide I’ve reached critical mass.