UPDATE 12/22/15: IBM recently released a new iOS SDK for Watson that makes integration with Watson services even easier. You can read more about it here.
I recently gave a presentation at IBM Insight on Cognitive Computing in mobile apps. I showed two apps: one that uses Watson natural language processing to perform search queries, and another that uses the Watson translation and text to speech services to take text in one language, translate it to another language, and then play back the spoken audio in the translated language. It’s this second app that I want to highlight today.
In fact, it gets much cooler than that. I had an idea: “What if we hook up an OCR (optical character recognition) engine to the translation services?” That way, you can take a picture of something, extract the text, and translate it. It turns out, it’s not that hard, and I was able to put together this sample app in just under two days. Check out the video below to see it in action.
To be clear, I ended up using a version of the open source Tesseract OCR engine targeting iOS. This is not based on any of the work IBM research is doing with OCR or natural scene OCR, and should not be confused with any IBM OCR work. This is basic OCR and works best with dark text on a light background.
The Tesseract engine lets you pass in an image, then handles the OCR operations and returns a collection of the words it was able to extract from that image. Once you have the text, you can do whatever you want with it.
So, here’s where Watson Developer Cloud Services come into play. First, I used the Watson Language Translation Service to perform the translation. When using this service, I make a request to my Node.js app running on IBM Bluemix (IBM’s cloud platform). The Node.js app acts as a facade and delegates to the Watson service for the actual translation.
You can check out a sample on the web here:
On the mobile client, you just make a request to your service and do something with the response. The example below uses the IMFResourceRequest API to make a request to the server (this can be done in either Objective-C or Swift). IMFResourceRequest is the MobileFirst wrapper for networking requests; it enables the MobileFirst/Mobile Client Access service to capture operational analytics for every request made by the app.
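[objc]// Query parameters for the /translate endpoint on the Node.js facade
NSDictionary *params = @{
    @"text": text,
    @"source": @"en",
    @"target": language
};
IMFResourceRequest *imfRequest =
    [IMFResourceRequest requestWithPath:@"https://translator.mybluemix.net/translate"
                                 method:@"GET"
                             parameters:params];
[imfRequest sendWithCompletionHandler:^(IMFResponse *response, NSError *error) {
    // Bail out on request errors before touching the response
    if (error != nil) {
        NSLog(@"Translation request failed: %@", error);
        return;
    }
    NSDictionary *json = response.responseJson;
    NSArray *translations = [json objectForKey:@"translations"];
    // Guard against an empty result; objectAtIndex:0 throws on an empty array
    if (translations.count == 0) {
        return;
    }
    NSDictionary *translationObj = [translations objectAtIndex:0];
    self.lastTranslation = [translationObj objectForKey:@"translation"];
    // now do something with the result, like update the UI
}];[/objc]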
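The handler above reads `translations[0].translation` from the JSON response. The same lookup can be sketched defensively in JavaScript, which is handy when exercising the facade from a script or a web page. This is only an illustration: `extractTranslation` is a hypothetical helper, not part of the sample app, and the response shape is assumed to be the one the Objective-C code reads.

```javascript
// Hypothetical helper: pulls the first translated string out of a response
// shaped like { translations: [ { translation: "..." } ] }.
// Returns null when the response does not have that shape.
function extractTranslation(json) {
  if (!json || !Array.isArray(json.translations) || json.translations.length === 0) {
    return null;
  }
  var first = json.translations[0];
  return (first && typeof first.translation === 'string') ? first.translation : null;
}

// A well-formed response yields the translated text
console.log(extractTranslation({ translations: [{ translation: 'Hola' }] }));
// Anything else (for example an error payload) yields null
console.log(extractTranslation({ error: 'Bad request' }));
```

Returning `null` instead of throwing lets the caller decide whether to show an error or simply leave the UI unchanged.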
The Node.js server simply takes the request and brokers it to the Watson Translation service (using the Watson Node.js SDK):
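[js]app.get('/translate', function(req, res) {
  // Forward the query parameters (text, source, target) to the Watson service
  language_translation.translate(req.query, function(err, translation) {
    if (err) {
      console.log(err);
      // Report the failure with an error status instead of a 200
      res.status(500).send(err);
    } else {
      console.log(translation);
      res.send(translation);
    }
  });
});[/js]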
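The route above hands `req.query` to the SDK as-is. If you would rather pass along only the fields the client is expected to send, a small filter could sit in front of the Watson call. The sketch below is an assumption-laden illustration: `pickTranslateParams` is a hypothetical helper that is not in the sample app, and the default source language of `en` is assumed simply because that is what the iOS client sends.

```javascript
// Hypothetical helper (not part of the sample app): copies only the expected
// string fields from the query so nothing else reaches the Watson SDK call.
function pickTranslateParams(query) {
  query = query || {};
  // Express can parse repeated keys into arrays, so accept plain strings only
  var text = typeof query.text === 'string' ? query.text.trim() : '';
  var target = typeof query.target === 'string' ? query.target.trim() : '';
  var source = typeof query.source === 'string' ? query.source.trim() : '';
  if (!text || !target) {
    return null; // the caller can answer with HTTP 400
  }
  // Assumption: default to English, the source language the iOS client sends
  return { text: text, source: source || 'en', target: target };
}

// Unknown keys such as "debug" are dropped
console.log(pickTranslateParams({ text: 'Hello', target: 'es', debug: '1' }));
// Missing text means there is nothing to translate
console.log(pickTranslateParams({ target: 'es' }));
```

Inside the route, a `null` result could be answered with `res.status(400)` before `language_translation.translate` is ever called.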
Once you receive the result from the server, you can update the UI, make a request to the text to speech service, or do pretty much anything else.
To generate audio using the Watson Text To Speech service, you can either use the Watson Speech SDK, or you can use the Node.js facade again to broker requests to the Watson Text To Speech service. In this sample I used the Node.js facade to generate FLAC audio, which I played in the native iOS app using the open source Origami Engine library, which supports the FLAC audio format.
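A request to the facade's synthesize endpoint is just a URL with a handful of query parameters. Here is a minimal sketch of assembling one in JavaScript, assuming the parameter names (`text`, `voice`, `accept`, `download`) that appear in the iOS client's request URL later in this post. `buildSynthesizeUrl` is a hypothetical helper, not part of the sample app.

```javascript
// Hypothetical helper: builds a /synthesize request URL for the Node.js facade.
// Every value is percent-encoded so spaces and slashes survive the trip.
function buildSynthesizeUrl(baseUrl, text, voice) {
  var params = {
    text: text,
    voice: voice,
    accept: 'audio/flac', // the format the iOS sample plays
    download: '1'         // makes the facade add a content-disposition header
  };
  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return baseUrl + '/synthesize?' + query;
}

console.log(buildSynthesizeUrl(
  'https://translator.mybluemix.net',
  'Hola y bienvenido!',
  'es-US_SofiaVoice'
));
```

Encoding each value separately keeps a phrase containing `&` or `=` from being split into extra parameters.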
You can preview audio generated using the Watson Text To Speech service using the embedded audio below. Note: In this sample I’m using the OGG file format; it will only work in browsers that support OGG.
English: Hello and welcome! Please share this article with your friends!
Spanish:
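Hola y bienvenido! Comparta este artículo con sus amigos!
On the Node.js server, the /synthesize route passes the request to the Text To Speech service and pipes the audio back to the client:
[js]app.get('/synthesize', function(req, res) {
  // synthesize() returns a stream of audio from the Text To Speech service
  var transcript = textToSpeech.synthesize(req.query);
  transcript.on('response', function(response) {
    // Ask the client to save the audio as a file when ?download is set
    if (req.query.download) {
      response.headers['content-disposition'] = 'attachment; filename=transcript.flac';
    }
  });
  transcript.on('error', function(error) {
    console.log('Synthesize error: ', error);
  });
  // Stream the audio straight through to the HTTP response
  transcript.pipe(res);
});[/js]
On the native iOS client, I download the audio file and play it using the Origami Engine player. This could also be done with the Watson iOS SDK (much easier), but I wrote this sample before the SDK was available.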
[objc]// format the URL (phrase and voice fill the text and voice query parameters)
NSString *urlString = [NSString stringWithFormat:@"https://translator.mybluemix.net/synthesize?text=%@&voice=%@&accept=audio/flac&download=1", phrase, voice];
NSString *webStringURL = [urlString stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSURL *flacURL = [NSURL URLWithString:webStringURL];

// download the contents of the audio file
// (dataWithContentsOfURL: is synchronous, so call this off the main thread)
NSData *audioData = [NSData dataWithContentsOfURL:flacURL];
NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"transcript.flac"];
[audioData writeToFile:filePath atomically:YES];

// pass the file URL to the Origami player and play the audio
NSURL *fileUrl = [NSURL fileURLWithPath:filePath];
[self.orgmPlayer playUrl:fileUrl];[/objc]
Cognitive computing is all about augmenting the user's experience and enabling users to perform their duties more efficiently and effectively. The Watson language services enable any app to better facilitate communication and broaden the reach of content across diverse user bases. You should definitely check them out to see how Watson services can benefit you.
MobileFirst
So, I mentioned that this app uses IBM MobileFirst offerings on Bluemix. In particular I am using the Mobile Client Access service to collect logs and operational analytics from the app. This lets you capture logs and usage metrics for apps that are live “out in the wild”, providing insight into what people are using, how they’re using it, and the health of the system at any point in time.
Be sure to check out the MobileFirst on Bluemix and MobileFirst Platform offerings for more detail.
Source
You can access the sample iOS client and Node.js code at https://github.com/triceam/Watson-Translator. Setup instructions are available in the readme document. I intend to update this app with some more translation use cases in the future, so be sure to check back!