Applidium — Cracking Siri

On October 14, 2011, Apple released the new iPhone 4S. One of its major new features was Siri, a personal assistant application. Siri uses natural language processing to interact with the user.

Interestingly, Apple explained that Siri works by sending data to a remote server (that’s probably why Siri only works over 3G or WiFi). As soon as we could get our hands on the new iPhone 4S, we decided to have a sneak peek at how it really works.

Today, we managed to crack open Siri’s protocol. As a result, we are able to use Siri’s recognition engine from any device. Yes, that means anyone could now write an Android app that uses the real Siri! Or use Siri on an iPad! And we’re going to share this know-how with you.

Demo

The best demo is probably Siri’s speech-to-text feature. We made a simple recording of us saying “autonomous demo of Siri”, and got a perfect result!

Sample_Siri_speech_to_text.zip (70.78 KB)

This sound sample never went through any iPhone, but nonetheless we got Siri to analyze it for us.

Understanding the protocol – A brief technical history

At Applidium we’re used to building mobile applications. The best way to chat with a remote server is HTTP, as it’s the protocol that is most likely to work in any situation.

The easiest way to sniff HTTP traffic is to set up a proxy server, configure your iPhone to use it, and look at what goes through the proxy. Surprisingly, when we did, we didn’t capture any traffic when using Siri. So we resorted to using tcpdump on a network gateway, and we realised Siri’s traffic was TCP, on port 443, to a server at 17.174.4.4.

Going to https://17.174.4.4/ on a desktop machine we noticed that this server was presenting a certificate for guzzoni.apple.com. So it seemed like Siri was communicating with a server named guzzoni.apple.com over HTTPS.

As you know, the “S” in HTTPS stands for “secure”: all traffic between a client and an HTTPS server is encrypted, so we couldn’t read it using a sniffer. In that case, the simplest solution is to fake an HTTPS server, use a fake DNS server, and see what the incoming requests are. Unfortunately, the people behind Siri did things right: they check that guzzoni’s certificate is valid, so you cannot fake it. Well… they did check that it was valid, but the thing is, you can add your own “root certificate”, which lets you mark any certificate you want as valid.

So basically all we had to do was to set up a custom SSL certification authority, add it to our iPhone 4S, and use it to sign our very own certificate for a fake “guzzoni.apple.com”. And it worked: Siri was sending commands to our own HTTPS server! Seems like someone at Apple missed something!

That’s when we realised how opaque Siri’s protocol is. Let’s have a look at a Siri HTTP request. The request’s body is binary (we’ll get into that later), and here are the headers:

            ACE /ace HTTP/1.0
            Host: guzzoni.apple.com
            User-Agent: Assistant(iPhone/iPhone4,1; iPhone OS/5.0/9A334) Ace/1.0
            Content-Length: 2000000000
            X-Ace-Host: 4620a9aa-88f4-4ac1-a49d-e2012910921
            

A few interesting things:

  • The request is using a custom “ACE” method, instead of a more usual GET.
  • The URL requested is “/ace”.
  • The Content-Length is nearly 2GB, which obviously does not conform to the HTTP standard.
  • X-Ace-Host is some form of GUID. After trying with several iPhone 4Ses, it seems to be tied to the actual device (pretty much like a UDID).
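
To make these observations concrete, here is a minimal Python sketch that splits the request head shown above into its parts. It is only an illustration, not part of Applidium's tooling; the function name `parse_request_head` is hypothetical, and the sample text is copied from the headers listed earlier.

```python
def parse_request_head(raw: str):
    """Split a raw HTTP-style request head into (method, path, version, headers)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    # The first line is the request line: METHOD PATH VERSION
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header names are case-insensitive
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


SAMPLE = """\
ACE /ace HTTP/1.0
Host: guzzoni.apple.com
User-Agent: Assistant(iPhone/iPhone4,1; iPhone OS/5.0/9A334) Ace/1.0
Content-Length: 2000000000
X-Ace-Host: 4620a9aa-88f4-4ac1-a49d-e2012910921
"""

method, path, version, headers = parse_request_head(SAMPLE)
print(method, path, version)
# The advertised length, expressed in GiB
print(int(headers["content-length"]) / 2**30)
```

A stock HTTP proxy would most likely reject the unknown “ACE” method, which is one reason a hand-rolled parser like this can be handy when inspecting such traffic.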

Now let’s move on to the body. The body is some raw binary content. When we first looked at it with a hex editor, we noticed it started with 0xAACCEE. Oh, seems like a header! Unfortunately, we couldn’t understand anything of what came after that.

That’s when we took some time to think. As people who are used to designing mobile applications, we know there’s one thing which is very important when talking over a network: compression. Bandwidth is often limited, so it’s usually a very good idea to compress your data. And what is the most ubiquitous compression library around? zlib (http://zlib.net/). It’s a very solid library, really efficient and powerful (makes sense, it’s half French!). So we tried to pipe that binary data through zlib. But nothing came out: we were missing a zlib header. That’s when we thought “hmm, so there’s already this AACCEE header in the request body. Maybe there’s some more?”. We developers like to keep things packed. 3 bytes is not a good length for a header. 4 would be. So we tried unzipping after the 4th byte. And it worked!
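
The trick described here (skip a four-byte header, then inflate the rest) can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the authors' code: the name `unwrap_body` is hypothetical, the value of the fourth header byte is not given in the text (a placeholder `0x00` is used), and the test data is synthetic.

```python
import zlib

ACE_MAGIC = bytes.fromhex("aaccee")  # the three bytes observed at the start of the body
HEADER_LEN = 4                       # magic plus one extra byte, as described in the text


def unwrap_body(body: bytes) -> bytes:
    """Check the 0xAACCEE marker, skip the 4-byte header, inflate the remainder."""
    if not body.startswith(ACE_MAGIC):
        raise ValueError("body does not start with 0xAACCEE")
    return zlib.decompress(body[HEADER_LEN:])


# Synthetic round trip: the fourth byte (0x00) is a placeholder, not a known value.
payload = b"bplist00 pretend plist bytes"
fake_body = ACE_MAGIC + b"\x00" + zlib.compress(payload)
print(unwrap_body(fake_body) == payload)
```

For a long-lived connection where data arrives in pieces, `zlib.decompressobj()` is probably a better fit than a one-shot `zlib.decompress()` call.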

Now when we unzipped the content, we got some new binary data. Not very understandable either, but some parts were text. Among them, one caught our attention: bplist00. Hurray, it seems like the data is some binary plist. After fiddling a little bit with that binary stream, we figured out it was made out of chunks:

  • Chunks starting with 0x020000xxxx are “plist” packets, xxxx being the size of the binary plist data that follows the header.
  • Chunks starting with 0x030000xxxx are “ping” packets, sent by the iPhone to Siri’s servers to keep the connection alive. Here xxxx is the ping sequence number.
  • Chunks starting with 0x040000xxxx are “pong” packets, sent by Siri’s server as a reply to ping packets. Unsurprisingly, xxxx is the pong sequence number.
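
The chunk layout above can be walked with a small Python loop. This is a sketch under explicit assumptions, not the authors' parser: it reads each header as one type byte followed by a four-byte big-endian integer (which matches the 0x0N0000xxxx pattern as long as the two high bytes are zero), and it assumes ping and pong chunks carry no payload. The name `split_chunks` and the sample stream are made up for the example.

```python
import struct

PLIST, PING, PONG = 0x02, 0x03, 0x04
KIND_NAMES = {PLIST: "plist", PING: "ping", PONG: "pong"}
HEADER = struct.Struct(">BI")  # assumed: 1 type byte + 4-byte big-endian value


def split_chunks(stream: bytes):
    """Yield (kind, value, payload) for each chunk of a decompressed stream."""
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < HEADER.size:
            raise ValueError("truncated chunk header")
        kind, value = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        if kind not in KIND_NAMES:
            raise ValueError(f"unknown chunk type 0x{kind:02x}")
        payload = b""
        if kind == PLIST:
            # For plist chunks the value is the payload size
            payload = stream[offset:offset + value]
            if len(payload) < value:
                raise ValueError("truncated plist payload")
            offset += value
        # For ping/pong chunks the value is the sequence number
        yield KIND_NAMES[kind], value, payload


# Synthetic stream: ping #1, a 3-byte "plist" payload, pong #1
sample_stream = (
    bytes.fromhex("0300000001")
    + bytes.fromhex("0200000003") + b"abc"
    + bytes.fromhex("0400000001")
)
for chunk in split_chunks(sample_stream):
    print(chunk)
```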

And deciphering the content of binary plists is very easy: you can do it on Mac OS X with the “plutil” command-line tool, or in Ruby with the CFPropertyList gem on any platform.
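
Beyond the two tools mentioned here, Python's standard `plistlib` module can also read binary plists on any platform. The sketch below is only an illustration: the helper name `decode_plist` and the dictionary keys in the sample are hypothetical and do not reflect Siri's actual message schema.

```python
import plistlib


def decode_plist(data: bytes):
    """Decode a binary property list after checking its bplist00 signature."""
    if not data.startswith(b"bplist00"):
        raise ValueError("not a binary plist")
    return plistlib.loads(data)


# Build a made-up binary plist so the example is self-contained.
sample = plistlib.dumps(
    {"class": "ExampleCommand", "properties": {"text": "hello"}},
    fmt=plistlib.FMT_BINARY,
)
decoded = decode_plist(sample)
print(decoded["class"], decoded["properties"]["text"])
```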

What we learned

We learned a few interesting things about how the iPhone 4S talks to Apple’s servers:

The audio data

The iPhone 4S really sends raw audio data. It’s compressed using the Speex audio codec, which makes sense as it’s a codec specifically tailored for VoIP.

Signature

The iPhone 4S sends identifiers everywhere. So if you want to use Siri on another device, you still need the identifier of at least one iPhone 4S. Of course we’re not publishing ours, but it’s very easy to retrieve one using the tools we’ve written. Of course Apple could blacklist an identifier, but as long as you’re keeping it for personal use, that should be all right!

The actual content

The protocol is actually very, very chatty. Your iPhone sends tons of things to Apple’s servers. And those servers reply with an incredible amount of information. For example, when you’re using speech-to-text, Apple’s servers even reply with a confidence score and a timestamp for each word.

What’s next?

Here’s a collection of tools we wrote to help us understand the protocol. They’re written mostly in Ruby (because that’s a wonderfully simple language), some parts are in C and some in Objective-C. Those aren’t really finished, but should be more than sufficient for anyone technically inclined to write a Siri-enabled application.

Let’s see what fun applications you guys get to build with it! And let’s see how long it’ll take Apple to change their security scheme! Follow us on Twitter for updates on that subject: we’re @applidium (http://twitter.com/applidium).

seriously - Objective-C HTTP library

The Objective-C HTTP library that Apple should have created, seriously.

Seriously
---------
The iPhone needs a better way to make HTTP requests, specifically calls to
REST web services. Seriously mixes Blocks with NSURLConnection &
NSOperationQueue to do just that. It also will automatically parse the JSON
response into a dictionary if the response headers are set correctly.

Install
-------
Just drag the files from the "src" directory into your project. You can also try
using the included "Seriously.framework" file

Parse JSON Example
------------------
    NSString *url = @"http://api.twitter.com/1/users/show.json?screen_name=probablycorey";

    [Seriously get:url handler:^(id body, NSHTTPURLResponse *response, NSError *error) {
        if (error) {
            NSLog(@"Got error %@", error);
        }
        else {
            NSLog(@"Look, JSON gets parsed into a dictionary");
            NSLog(@"%@", [body objectForKey:@"profile_background_image_url"]);
        }
    }];

Simple Queue Example
--------------------
    NSArray *urls = [NSArray arrayWithObjects:
                     @"http://farm5.static.flickr.com/4138/4744205956_1f08ae40e3_o.jpg",
                     @"http://farm5.static.flickr.com/4123/4744238252_d11d0df5a3_b.jpg",
                     @"http://farm5.static.flickr.com/4097/4743596319_50cce97d80_o.jpg",
                     @"http://farm5.static.flickr.com/4099/4743581287_7c50529b36_o.jpg",
                     @"http://farm5.static.flickr.com/4123/4743587437_78f0906e8a_o.jpg",
                     @"http://farm5.static.flickr.com/4136/4743562971_d5f5c6d5b1_o.jpg",
                     @"http://farm5.static.flickr.com/4073/4744205142_be44e64ab7_o.jpg",
                     nil];

    // By default the operation queue will only run 3 requests at a time
    for (NSString *url in urls) {
        NSOperation *o = [Seriously request:url options:nil handler:^(id body,
        NSHTTPURLResponse *response, NSError *error) {
            NSLog(@"got %lu (%@)", (unsigned long)[urls indexOfObject:url], url);
        }];
    }

Why Are You Using Blocks?
-------------------------
Welcome to the future dude!

TODO
----
- Document
- Add XML parsing
- Add more options for NSOperationQueue management

Making setup satisfying

When you launch Highrise for iPhone for the first time, it will download all your contacts and tasks to your phone. It doesn’t replace your iPhone address book – it just pulls the contacts down into the Highrise database so everything is local and fast.

One of the downsides to the initial download is that it can take some time depending on your connection and the number of contacts you have. Waiting for anything sucks, but what sucks more is being bored while waiting.

So we decided to give you something to do while the initial download is in progress. You can play tic-tac-toe while you wait. Just tap the button and the screen flips to a tic-tac-toe board. The download progress bar remains at the bottom so you see where you are while you tap away your time trying to beat the computer.

iPhone web development

My notes from the Nocember 8, 2009 Webtuesday by Adrian Kosmaczewski.

ZSync

ZSync is a new Mac/iPhone library that uses my BLIP P2P networking protocol:

“ZSync is an open source syncing library designed to allow easy syncing of data between an iPhone/iPod Touch and the OS X Desktop.
ZSync utilizes the BLIP library and Apple’s Sync Services to allow easy and seamless syncing of data.”

It’s still in early development though, with a first public release expected in January:

“Right now the code is in a private GitHub repository while the initial framework and protocols are fleshed out. This is expected to go public in January of 2010. Until then we are keeping the development team very small so that we can flesh out the design without a lot of overhead.”

This looks like it’ll be super useful for iPhone apps that want to integrate with their Mac siblings, especially since their design won’t require you to have the Mac app running while you sync.