
Sampling Twitter to determine client breakdown

April 14th, 2010

The arrival of promoted tweets on Twitter got me thinking about how people post their tweets, as Twitter's first dip of the toe into monetization will only affect users of http://search.twitter.com.

So, how many people actually use the Twitter.com website to post their tweets? What are the most popular clients?

Twitter provides a useful sample streaming method in their API and Fenn Bailey has kindly written a PHP class called Phirehose to manipulate the sampled stream. With these tools, I was able to quickly modify the sample PHP script to grab a snapshot of the Twitter public timeline, extracting the source (e.g. web, Tweetdeck, Ubertwitter) of each tweet posted.

The stream delivered about 25 tweets per second, so I let it run for 15 minutes and gathered just over 20,000 records to analyse. Of course, this sort of sampling should be extended over a 24-hour period to average out variations across timezones, so treat this as a rough first attempt rather than an authoritative statistical experiment. As it was performed at around 12.30pm UK time, it largely excluded American users. What was I thinking? I suppose I had better repeat the experiment in 12 hours or so. Below are the top 20 clients; you can look at the full list here.

Update: I repeated the experiment later in the afternoon and posted a more accurate list of the top twenty Twitter clients.

So about 32% of the sampled tweets were posted via the Twitter website. I don’t know about you, but I thought popular clients like TweetDeck would have made it out of single figures in percentage terms.

RANK  SOURCE                    %
1     web                       31.8%
2     UberTwitter               9.6%
3     TweetDeck                 6.4%
4     API                       4.8%
5     twitterfeed               4.0%
6     Echofon                   3.9%
7     movatwitter               3.0%
8     Keitai Web                2.7%
9     Twitter for BlackBerry®   2.1%
10    Tween                     1.9%
11    TwitBird iPhone           1.8%
12    Snaptu                    1.5%
13    txt                       1.3%
14    mobile web                1.3%
15    Tweetie                   1.3%
16    mobile web                1.2%
17    Mobile Tweete             1.1%
18    Twitterrific              1.1%
19    HootSuite                 0.9%
20    dabr                      0.7%

The PHP script I used to extract the data was as follows:


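<?php
require_once('../lib/Phirehose.php');

/**
 * Consumes the sampled stream and prints the source of each status.
 */
class SampleConsumer extends Phirehose
{
    /**
     * Called by Phirehose for each raw status received from the stream.
     *
     * @param string $status JSON-encoded status
     */
    public function enqueueStatus($status)
    {
        // Decode to an associative array; skip anything that is not a tweet
        // carrying a source field (e.g. delete notices).
        $data = json_decode($status, true);
        if (is_array($data) && isset($data['source'])) {
            // One source per line, so the output is easy to tally afterwards
            print $data['source'] . "\n";
        }
    }
}

// Start streaming
$sc = new SampleConsumer('my_username', 'my_password', Phirehose::METHOD_SAMPLE);
$sc->consume();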

This script was run from the command line over an SSH session on my hosting server, in the background and with its output redirected to a file:

php5 my_sample.php > my_sample_data.txt &
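
The post does not show how the captured file was turned into the percentages in the table, so the following is only a rough sketch of one way it could be done, not the script actually used. It assumes my_sample_data.txt holds one source value per line, and that sources other than "web" may arrive wrapped in an HTML link (hence the strip_tags call). The function names tallySources and formatTop are made up for this illustration.

```php
<?php
/**
 * Count how often each client name appears in the captured lines.
 * Assumption: a line is either a bare name ("web") or an HTML link
 * such as <a href="...">TweetDeck</a>, so tags are stripped first.
 *
 * @param iterable $lines one captured source value per element
 * @return array client name => count, most frequent first
 */
function tallySources(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        $name = trim(html_entity_decode(strip_tags($line), ENT_QUOTES, 'UTF-8'));
        if ($name === '') {
            continue; // ignore blank lines
        }
        $counts[$name] = ($counts[$name] ?? 0) + 1;
    }
    arsort($counts); // sort by count, descending, keeping the names as keys
    return $counts;
}

/**
 * Turn the tallied counts into "rank source percentage" lines.
 *
 * @param array $counts output of tallySources()
 * @param int   $limit  how many clients to list
 * @return array formatted lines
 */
function formatTop(array $counts, int $limit = 20): array
{
    $total = array_sum($counts);
    $rows = [];
    $rank = 0;
    foreach (array_slice($counts, 0, $limit, true) as $name => $count) {
        $rank++;
        $rows[] = sprintf('%d %s %.1f%%', $rank, $name, 100 * $count / $total);
    }
    return $rows;
}

// Command-line usage: php tally.php my_sample_data.txt
if (PHP_SAPI === 'cli' && isset($argv[1]) && is_readable($argv[1])) {
    $lines = file($argv[1], FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    echo implode("\n", formatTop(tallySources($lines))), "\n";
}
```

Saved as, say, tally.php (a hypothetical file name), it would be run as "php5 tally.php my_sample_data.txt" once the capture has been stopped. Percentages are taken over all captured lines, not just the top 20, so the listed figures will not add up to 100%.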
