The French app developer Applidium has revealed the protocol behind Siri, the voice recognition feature of the iPhone 4S. The company has released code with which the Siri servers can be accessed from any device.
The company says that with the released tools and code, an arbitrary audio stream can be sent to Apple's servers, which then convert it to text. In theory, this would allow applications other than Siri on the iPhone 4S to make use of the speech-to-text algorithms on Apple's servers.
Although Applidium's tools make it possible to send audio streams to the Siri servers from any device, an iPhone 4S, or at least the unique identifier of such a device, is still required. Each audio stream sent to Apple must carry a specific header. Part of this header is the field 'X-Ace-Host', whose value is unique to each iPhone 4S.
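As a rough illustration, a minimal Python sketch of what such a request could look like follows. Only the X-Ace-Host header is named in the article; the hostname, request line and other headers are assumptions for illustration, and the identifier shown is a placeholder that would have to be taken from a real iPhone 4S.

import http.client

SIRI_HOST = "guzzoni.apple.com"   # assumed Siri endpoint, not confirmed by the article
DEVICE_ID = "REPLACE-WITH-REAL-IPHONE-4S-IDENTIFIER"  # placeholder for the device-unique value

conn = http.client.HTTPSConnection(SIRI_HOST)
conn.putrequest("ACE", "/ace")                    # hypothetical request line
conn.putheader("X-Ace-Host", DEVICE_ID)           # the device-specific header Apple requires
conn.putheader("Content-Length", "2000000000")    # oversized length to keep the stream open (assumption)
conn.endheaders()
# Speex-compressed audio frames would then be written to the open connection.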
The French company has explained in a blog post how the protocol works. By eavesdropping on TCP traffic from an iPhone 4S, it determined to which server Siri sends the audio streams for analysis. The company then set up a local server and spoofed the DNS records so that Siri communicated with it instead of with the official Apple servers. To do this, the company had to create its own SSL certificate for the local server and make the iPhone 4S believe it was legitimate.
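A minimal sketch of such an interception setup, assuming the Siri hostname has already been redirected to the local machine via the spoofed DNS records and that server.crt and server.key form the self-made certificate the iPhone 4S was made to trust (file names and port are placeholders):

import socket
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.crt", keyfile="server.key")  # self-made certificate

with socket.create_server(("0.0.0.0", 443)) as listener:
    with context.wrap_socket(listener, server_side=True) as tls_listener:
        conn, addr = tls_listener.accept()   # the iPhone connects, believing this is Apple
        data = conn.recv(65536)              # captured Siri traffic, ready to be analysed
        print(f"received {len(data)} bytes from {addr[0]}")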
Once the server was operational, the company could begin decoding the data packets that Siri sent. The software compresses the voice commands with the Speex audio codec before sending them. Apple's servers return a remarkably comprehensive answer containing not only the textual version of the audio stream, but also a score that indicates how accurate the conversion is and the timestamps at which the individual words were spoken.
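To make the shape of that answer concrete, here is a purely hypothetical illustration; the field names are invented for readability and do not reflect Apple's actual wire format, which Applidium documents in its blog post.

# Invented structure mirroring what the article describes: recognised text,
# a confidence score and a timestamp per spoken word.
recognition_result = {
    "text": "set a timer for ten minutes",
    "confidence": 0.94,
    "words": [
        {"word": "set", "start_ms": 120},
        {"word": "a", "start_ms": 410},
        {"word": "timer", "start_ms": 520},
    ],
}

for entry in recognition_result["words"]:
    print(f"{entry['start_ms']:>6} ms  {entry['word']}")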