The web’s real-time scalability problem (or Twitter’s 600 tweet per second problem)

Twitter’s geolocation guru Raffi Krikorian recently gave an interview to O’Reilly ahead of next month’s Where 2.0 conference in San Jose. The interview is obviously heavy on how Twitter is working with geo, but the last question, or more specifically the last answer, struck a chord: it goes straight to the problem of handling the massive amounts of real-time data that a lot of web players face today.

James Turner: What do you see as the technical side of geolocation, in terms of what’s going to be the new interesting technologies coming along, and how they’re going to be used?

Raffi Krikorian: From Twitter’s standpoint, it’s how do you accept all of this real-time data, index and analyze it and spread it throughout our system in almost real-time. People have traditionally built a bunch of GIS-like systems on top of PostgreSQL or on top of MySQL, and that’s fine, but it doesn’t scale after a while. After you throw a couple million or a couple hundred million entries at it, the amount it takes for one of those databases to process that, to insert it, all I have to do is select against it, and you can understand it’s untenable for real-time operation. And by real-time, I mean sub-second operation.

So the stuff that we’re doing is more geared towards how can you accept tweets that are coming in at what you can imagine to be an incredibly fast rate. Tweets are coming in, figure out their location, attach appropriate metadata data to it. Store it in our database. Span it out to anyone who wants to look at it. Run research and analytics on it and index it in their search index, and do this all within a couple of seconds on the way through the system. I think there’s a lot of interesting stuff being done out there on how things are being stored, how things are being indexed. But I think our personal contribution will be how do you do it at that kind of speed?

Of course, so far Twitter has been doing an admirable job (Fail Whales excluded) of providing uptime on a service with (forget a couple million) 9 billion rows in its statuses table — a table that’s growing by 50 million rows per day.
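The 600-tweets-per-second figure in the headline falls out of the 50 million rows per day quoted above. Here is a minimal Python sketch of that arithmetic. It assumes tweets arrive evenly around the clock (they don’t, so real peaks will sit well above the average) and that daily growth stays flat; the function names are mine, for illustration only.

```python
# Figures quoted in the post. The even-spread and flat-growth
# assumptions are simplifications, not anything Twitter has stated.
ROWS_PER_DAY = 50_000_000
TOTAL_ROWS = 9_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(rows_per_day: int) -> float:
    """Average inserts per second if rows arrived evenly all day."""
    return rows_per_day / SECONDS_PER_DAY


def days_to_match_current_size(total_rows: int, rows_per_day: int) -> float:
    """Days of growth needed to add as many rows as the table holds now."""
    return total_rows / rows_per_day


if __name__ == "__main__":
    rate = average_rate_per_second(ROWS_PER_DAY)
    print(f"Average rate: about {rate:.0f} tweets per second")
    days = days_to_match_current_size(TOTAL_ROWS, ROWS_PER_DAY)
    print(f"Days to add another {TOTAL_ROWS:,} rows: {days:.0f}")
```

The average comes out a little under 600 per second, and at that pace the statuses table would double in roughly six months.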

Those 600 tweets per second are certainly what turned Twitter away from a SQL cluster to using rival Facebook’s Cassandra system. On Tuesday, Twitter’s Ryan King explained why Twitter is turning to Cassandra.

We have a system in place based on shared mysql + memcache but its quickly becoming prohibitively costly (in terms of manpower) to operate. We need a system that can grow in a more automated fashion and be highly available.

My day job involves a lot of media monitoring with such products as Radian6. Speaking last week with one of R6’s competitors, I was told that no one in the media monitoring space can really do real-time monitoring; there’s just too much data. I think that’s overstating the challenge a little bit, but it will surely be some time before a company can say it’s “real-time” without employing an army of engineers.

Counting sheep

For anyone who has ever had insomnia, here’s a study reported in today’s Times.

What they found was that subjects took slightly longer to fall asleep on nights they were instructed to distract themselves by counting sheep or were given no instructions at all. But when they were told to imagine a relaxing scene — a beach, for example — they fell asleep an average of 20 minutes sooner than they did on other nights. Counting sheep, the scientists suggested, may simply be too boring to do for very long, while images of a soothing shoreline or tranquil stream are engrossing enough to concentrate on.

Silverman shocks TED

TechCrunch is currently gloating over Chris Anderson of TED’s disappointment with Sarah Silverman’s TED Talk. I love watching the videos of ideas that come out of TED, but I’m also a fan of Silverman’s shock humor. Anderson’s reaction is completely unreasonable, and it reminds me of Stephen Colbert’s 2006 White House Correspondents’ Dinner speech. Silverman, like Colbert, has a certain brand of comedy, and if you don’t know what you’re getting into when you sign on to one of them, that’s your own problem and it shows a real disconnect.

(This isn’t the first time I’ve embedded this video.)

Wing review: Bulldog Northeast and Ginger Hop: Happy Hour Battle

Those who know me or happen to be my friend on Foursquare sure know my favorite hang around town is The Bulldog Northeast. A new favorite since opening last autumn is Ginger Hop.

Despite one being gastro-American and the other Asian-fusion, the two places have some things in common: They’re two blocks apart, part of what I’ve dubbed the Beer Triangle (Mac’s Industrial Sports Bar is the third point of the triangle), have the same 3 p.m. to 6 p.m. happy hour and they both have wings on the menu. Until today, I’d never tried the wings at the Bulldog or Ginger Hop, but what else is one to do on a President’s Day afternoon but eat wings?

Ginger Hop Chicken Wings

Ginger Hop’s wings are listed on the happy hour menu as Hop Wings for just $5, alongside $3 tap beers (I had a Bell’s Two Hearted). I’ll cut the suspense: Ginger Hop wins this battle by a long shot. The wings are just slightly spicy and a little bit sticky-sweet; they came covered with scallions and served with blue cheese and celery, and were some of the best wings I’ve had in weeks.

The Bulldog’s wings aren’t on the happy hour menu and go for $8. I got another Two Hearted for $4. Now there aren’t many things on the Bulldog’s menu that I don’t like, but these buffalo wings were way below acceptable. They came in the Bulldog’s signature checkered paper basket; the chicken tasted less than fresh and wasn’t so much coated in a nice thick sauce as swimming in a pool of buffalo juice.

(An aside: try the Bulldog’s short rib sandwich, which is a special this week, but I’m told has a good chance of making it on to the regular menu; absolutely delicious.)

If you’re hungry for wings in Northeast Minneapolis late in the afternoon, do your wallet and your stomach a favor and head over to Ginger Hop.

Which Asus netbook should I buy?

I’m in the market for a netbook; I’ve pretty much got my choice narrowed down to two quite different Asus models: the 1005PE or 1201N.

For all of last year, a netbook was pretty much defined by the following features:

Intel Atom N270 or N280 processor
10.1-inch display, 1024×600 resolution
1 GB RAM
160 GB hard drive
An inability to play streaming HD Flash video
A webcam
Windows XP
6-10 hours of battery life

The new year brought a couple small changes:

Intel Atom N450 processor
250 GB hard drive
Windows 7 Starter

Here’s how the 1005PE and 1201N stack up:

Feature | Asus 1005PE | Asus 1201N
Processor | N450 1.66 GHz | N330 1.6 GHz dual core
Memory | 1 GB | 2 GB
Storage | 250 GB | 250 GB
Advertised Battery Life (Real-Life Test Time) | 14 hours (11 hours) | 5 hours (4 hours)
Operating System | Windows 7 Starter | Windows 7 Home Premium
Display | 10.1 inches, 1024×600 | 12.1 inches, 1366×768
Can Stream HD Flash Video | No | Yes
HDMI | No | Yes
Price | $390 | $484

If you have any experience with either model I’d love to hear it. For me, the main apps for a netbook will be Google Reader, Gmail and Windows Live Writer. An app like Reader definitely benefits from the extra 168 horizontal lines of vertical resolution of the 1201N, and the ability to stream HD Flash video and output it through HDMI is certainly a nice-to-have; but I’m not sure either of those makes up for the less-than-half-as-good battery life, the not-as-compact package or the nearly $100 premium.
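For anyone who wants to check my math on the trade-offs, here is a small Python sketch built only from the numbers in the comparison above. The `Netbook` class and the helper names are made up for this post, and the battery figures are the real-life test times, not the advertised ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Netbook:
    """Specs copied from the comparison in this post."""
    name: str
    width_px: int
    height_px: int
    tested_battery_hours: float
    price_usd: int


ASUS_1005PE = Netbook("Asus 1005PE", 1024, 600, 11, 390)
ASUS_1201N = Netbook("Asus 1201N", 1366, 768, 4, 484)


def extra_vertical_pixels(a: Netbook, b: Netbook) -> int:
    """How many more rows of pixels a has than b."""
    return a.height_px - b.height_px


def battery_ratio(a: Netbook, b: Netbook) -> float:
    """Tested battery life of a as a fraction of b's."""
    return a.tested_battery_hours / b.tested_battery_hours


def price_premium(a: Netbook, b: Netbook) -> int:
    """Extra dollars a costs over b."""
    return a.price_usd - b.price_usd


if __name__ == "__main__":
    print("Extra vertical pixels:", extra_vertical_pixels(ASUS_1201N, ASUS_1005PE))
    print(f"Battery ratio: {battery_ratio(ASUS_1201N, ASUS_1005PE):.2f}")
    print("Price premium (USD):", price_premium(ASUS_1201N, ASUS_1005PE))
```

The 1201N gains 168 rows of pixels and costs $94 more, while its tested battery life is a bit over a third of the 1005PE’s.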

What do you think?



(Google Reader in Chrome at 1024×600 resolution)

Facebook vs. Twitter vs. Buzz

Facebook vs. Twitter vs. Google Buzz is far from a perfect comparison, but won’t you help me fill in this chart?

Service | Biggest Strength | Biggest Weakness
Facebook | Fully integrated. Photos, videos, status updates, groups, profiles in one (fairly) easy-to-navigate interface. | ?
Twitter | Crazy simple. | ?
Google Buzz | ? | Unruly. Still doesn’t know my social circle after years of Google trying to figure it out.

We can has 1 Gbps Internet in Minneapolis?

Earlier on Google Buzz I asked if cities can make pitches to Google to get in on the 1 Gbps fiber-to-the-home connections the company plans on bringing to as many as 500,000 people. I assume no one saw that because it was on Google Buzz, but I digress.

Turns out cities can apply here.

Stacey Higginbotham suggested Austin, Texas get in on the action.

I think Austin needs to let Minneapolis have some fun for once.

Cities have until March 26 to respond. So how do we make this happen? Who do we call? Who gets City Hall interested in this?

Amazon cloud gets cheaper

Amazon Web Services is the only business I can think of that regularly e-mails me to tell me they’ve lowered their prices. Kudos.

---------- Forwarded message ----------
From: Amazon Web Services <no-reply-aws@amazon.com>
Date: Tue, Feb 2, 2010 at 2:20 AM
Subject: AWS Lowers Outbound Data Transfer Pricing
To: “doughamlin@gmail.com” <doughamlin@gmail.com>

Dear AWS Customer,

As you know, we are constantly working to drive our costs down and become more operationally efficient. We then pass on those cost savings to our customers in the form of lower prices. Today, we are pleased to announce that we are lowering AWS pricing for outbound data transfer by $0.02 across all of our services, in all usage tiers, and in all Regions. These changes are effective February 1, 2010.

The new outbound data transfer pricing will be:

  • First 10 TB per Month: $0.15 per GB
  • Next 40 TB per Month: $0.11 per GB
  • Next 100 TB per Month: $0.09 per GB
  • Over 150 TB per Month: $0.08 per GB

Amazon CloudFront, the easy-to-use content delivery service, continues to have its own outbound data transfer pricing schedule in order to offer the lowest possible rates for each edge location. Effective February 1, Amazon CloudFront will also reduce its outbound data transfer prices by $0.02 per GB across all edge locations and for each usage tier.

Please see the pricing section for any of the AWS infrastructure services on the AWS website for more information. Thank you, as always, for your support.

Sincerely,

The Amazon Web Services Team

We hope you enjoyed receiving this message. If you wish to remove yourself from receiving future product announcements or the AWS Newsletter, please update your communication preferences.

Amazon Web Services LLC is a subsidiary of Amazon.com, Inc. Amazon.com is a registered trademark of Amazon.com, Inc. This message produced and distributed by Amazon Web Services, LLC, 1200 12th Ave South, Seattle, WA 98144.
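To get a feel for what the tiers in that email mean on an actual bill, here is a rough Python sketch of a marginal-tier calculation. It rests on a few assumptions of mine: that 1 TB is billed as 1,024 GB, that each tier’s rate applies only to the traffic falling inside that tier, and that there is no free allowance. The function name is hypothetical and this is not an official AWS calculator.

```python
# Assumption: 1 TB is billed as 1,024 GB.
GB_PER_TB = 1024

# (tier size in GB, price per GB), in the order listed in the email.
# A size of None means "everything above the previous tiers".
TIERS = [
    (10 * GB_PER_TB, 0.15),
    (40 * GB_PER_TB, 0.11),
    (100 * GB_PER_TB, 0.09),
    (None, 0.08),
]


def monthly_transfer_cost(gb_out: float) -> float:
    """Estimate one month's outbound transfer cost in USD."""
    remaining = gb_out
    cost = 0.0
    for size, rate in TIERS:
        if remaining <= 0:
            break
        # Bill only the slice of traffic that falls inside this tier.
        billed = remaining if size is None else min(remaining, size)
        cost += billed * rate
        remaining -= billed
    return cost


if __name__ == "__main__":
    for tb in (1, 10, 20, 200):
        cost = monthly_transfer_cost(tb * GB_PER_TB)
        print(f"{tb} TB out: ${cost:,.2f}")
```

Under those assumptions, only the traffic beyond the first 10 TB gets the cheaper rates, so a small site pays $0.15 per GB on everything it sends.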

iPad reading link dump

Since you didn’t ask, here’s just a handful of things I’ve read about the iPad the past few days. Be sure to read the top two. 

In the New World, computers are task-centric. We are reading email, browsing the web, playing a game, but not all at once. Applications are sandboxed, then moats dug around the sandboxes, and then barbed wire placed around the moats. As a direct result, New World computers do not need virus scanners, their batteries last longer, and they rarely crash, but their users have lost a degree of freedom. New World computers have unprecedented ease of use, and benefit from decades of research into human-computer interaction. They are immediately understandable, fast, stable, and laser-focused on the 80% of the famous 80/20 rule.

Is the New World better than the Old World? Nothing’s ever simply black or white.

Old World vs. New World computing 

Used to be you could argue that Flash, whatever its merits, delivered content to the entire audience you cared about. That’s no longer true, and Adobe’s Flash penetration is shrinking with each iPhone OS device Apple sells.

What’s Hulu going to do? Sit there and wait? Whine about the blue boxes? Or do the practical thing and write software that delivers video to iPhone OS? The answer is obvious. Hulu doesn’t care about what’s good for Adobe. They care about what’s good for Hulu. Hulu isn’t a Flash site, it’s a video site. Developers go where the users are.

Who Can Do Something About Those Blue Boxes?

The Killer App: iPad Board Games

Why My Mom’s Next Computer Is Going To Be An iPad

HTML5 is Great for Mobile, Developers Say

Web developers can rule the iPad

Why Bigger Is Better: The iPad And The Arc of Computing

How many icons on that iPad dock?

What the iPad Tells Us About Mobile Broadband Pricing

The iPad Will Make Apple’s Acquisition Of Quattro Wireless Look Even Smarter

5 Things The iPhone Could Learn From The iPad

Various and Assorted Thoughts and Observations Regarding the Just-Announced iPad

A new class of content for a new class of device