Connected Second-Screen App Experiences with PhoneGap & Audio Watermarks

In January 2012, I started the year with a post on multi-screen applications developed with PhoneGap. In that post, I described an approach for creating mobile applications that extend the app experience onto a second screen – where the mobile device drives the content on that second screen… essentially giving the mobile device an “external monitor” capability.

Mobile App Drives External Screen

Now, I’m going to turn things around… I’ve been experimenting with a few ideas for connected secondary-experience applications, and I figured this would be a great way to come full circle and end 2012. I see the secondary app experience as having huge potential in our connected, media-centric world. The secondary app experience is different in that the application is your “second screen”, a companion to something else that you are doing. For example, the secondary screen might be a mobile application that augments the experience of watching television. Or it might be a mobile application that augments the experience of playing a video game, similar in concept to Xbox SmartGlass, though not tied to a particular platform. The key element is that the mobile application not only augments the television-based content, but is also updated in real time as you watch the program or play the game.

External Screen Drives Mobile App

In this post I’ll show a proof-of-concept second screen experience where the content of a mobile PhoneGap application is being driven by an external source (a video) using audio watermarks. In this case, the mobile application is the “second screen”, and your TV is the primary screen. I’d also like to emphasize that this is just a proof of concept – the methods and code in this example are not yet suitable for a production-quality use case for reasons I’ll describe below, but are a great starting point for further exploration.

The Concept

Let’s start with the core concept: a synchronized experience between a content source (TV or other) and a mobile application. Since we are talking about TV or media-based content, you can’t rely on a client-server architecture to synchronize the media source and the mobile app. That just isn’t possible with legacy TV hardware, where there is no way to digitally synchronize the content with an external device. However, TVs are great at producing sound, and it is entirely possible to use sound-based cues to invoke actions within a mobile application.

Now let’s focus on audio watermarks: audio watermarks are markers embedded within an audio signal. They may sit outside the range of human hearing, or be hidden within it. In general, humans can hear frequencies between 20Hz and 20kHz, with that range decreasing with age. Even when we can’t hear the markers, mobile devices are able to detect them. When these markers are “heard” by your device, they can invoke an action within your application.

Next, let’s take a look at my proof-of-concept application: a mobile application themed with content from the HBO series Game of Thrones, synchronized with the opening sequence of the show. As castles and cities appear in the video, the content within the mobile application is updated to show details about each location.

Proof of Concept

In the video below, you can see the proof of concept in action. It shows the synchronization between the video and the PhoneGap-based application, with a brief description from yours truly.

Note: I have no association with Game of Thrones, HBO, George R.R. Martin, or the book series “A Song of Ice and Fire“. I just thought this made a compelling example. I enjoyed both the book series and the show and recommend them. Full credit for the video and mobile content goes to their original sources.

The Application Implementation

There are several ways to implement audio watermarks. The most basic is to embed a single tone in the audio stream and check for the presence of that tone. The first thing I explored was how to identify the dominant frequency of a sound. A quick Google search yielded an answer in the first result. That post not only describes how to detect the dominant sound frequency on iOS, but also has a downloadable project on GitHub that you can use to get started. No exaggeration, I had the project up and running within minutes. It operates kind of like a guitar tuner… the application detects the dominant frequency of a sound and displays it in the UI.
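Conceptually, the detection boils down to picking the strongest bin in an FFT of the incoming audio. Here is a minimal JavaScript sketch of that idea (illustrative only – the iOS sample does the detection in native code and may differ in detail): given an array of FFT magnitudes for one audio frame, find the loudest bin and convert its index to Hz.

    // Minimal sketch of dominant-frequency detection.
    // `spectrum` is an array of FFT magnitudes for a single audio frame.
    function dominantFrequency(spectrum, sampleRate, fftSize) {
        var maxMagnitude = -Infinity;
        var maxIndex = 0;
        for (var i = 0; i < spectrum.length; i++) {
            if (spectrum[i] > maxMagnitude) {
                maxMagnitude = spectrum[i];
                maxIndex = i;
            }
        }
        // Each bin spans (sampleRate / fftSize) Hz, so index -> frequency is a multiply.
        return maxIndex * (sampleRate / fftSize);
    }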

Much later, I also discovered this sample project from Apple, which demonstrates how to detect frequencies in an audio stream (used for its frequency waveform visualization). It will be useful for maturing the concepts shown here.

Creating Watermarks

Once I had the sample native iOS project up and running, I started exploring inaudible audio “tones” and testing what the devices could accurately detect. I initially used tones above 20kHz, so that humans would not be able to hear the watermark. Tones in the 20-22kHz range worked great on the iPhone, but I quickly realized that the iPad microphone was incapable of detecting them, so I dropped down to the 18-20kHz range, which the iPad was able to pick up without any problems. Most adults won’t be able to hear these frequencies, but small children may, and they may drive your pets crazy.

The first thing I did was create “pure” audio tones at specific frequencies using Adobe Audition. In Audition, create a new waveform, then go to the “Effects” menu and select “Generate Tones”. From here, you can create audio tones at any frequency. Just specify your frequency and the tone duration, and hit “OK”. I used 3-second tones to make sure that the tone was long enough to be perceived by the device.

Generate Tones in Adobe Audition

I did this for tones in the range of 18-22kHz and saved each in a separate WAV file, some of which you can find in the GitHub sample. These files were used for testing, and were embedded in the final video.
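If you would rather script the test tones than click through Audition, a small Node.js sketch like the one below can generate the same kind of pure-tone WAV file. This was not part of my workflow – it is just an illustration, writing simple 16-bit PCM and leaving headroom so the tone isn’t clipped.

    // generateTone.js - write a mono 16-bit PCM WAV containing a pure sine tone.
    // Usage: node generateTone.js 19000 3 tone19k.wav   (frequency Hz, duration s, output)
    const fs = require('fs');

    const [freq = 19000, seconds = 3, outFile = 'tone.wav'] = process.argv.slice(2);
    const sampleRate = 44100;
    const numSamples = Math.floor(sampleRate * seconds);
    const dataSize = numSamples * 2;              // 16-bit mono = 2 bytes per sample

    const buffer = Buffer.alloc(44 + dataSize);
    // RIFF/WAVE header
    buffer.write('RIFF', 0);
    buffer.writeUInt32LE(36 + dataSize, 4);
    buffer.write('WAVE', 8);
    buffer.write('fmt ', 12);
    buffer.writeUInt32LE(16, 16);                 // fmt chunk size
    buffer.writeUInt16LE(1, 20);                  // PCM
    buffer.writeUInt16LE(1, 22);                  // mono
    buffer.writeUInt32LE(sampleRate, 24);
    buffer.writeUInt32LE(sampleRate * 2, 28);     // byte rate
    buffer.writeUInt16LE(2, 32);                  // block align
    buffer.writeUInt16LE(16, 34);                 // bits per sample
    buffer.write('data', 36);
    buffer.writeUInt32LE(dataSize, 40);

    // Sine samples at half full-scale to leave headroom (helps avoid clipping later).
    for (let i = 0; i < numSamples; i++) {
        const sample = Math.sin(2 * Math.PI * freq * (i / sampleRate)) * 0.5 * 32767;
        buffer.writeInt16LE(Math.round(sample), 44 + i * 2);
    }

    fs.writeFileSync(outFile, buffer);
    console.log(`Wrote ${seconds}s ${freq}Hz tone to ${outFile}`);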

To embed the audio watermarks in the video, I fired up Adobe Premiere and started adding the inaudible tones at specific points in time within the video.

Audio Tones in Adobe Premiere

By playing specific tones at specific times, you can map events within your application to those tones, which lets you synchronize in-app content with the video content.

Let me reiterate… this is only a proof-of-concept implementation. These watermarks worked great locally, but wouldn’t work as-is in a real-world solution. I also ran into a few major issues when embedding the watermarks – see the “Lessons Learned” section below for details.

The PhoneGap Implementation

The next logical step was to take the native code example and turn it into a PhoneGap native plugin so that it could be used within a PhoneGap application. I stripped out the native user interface and exposed an API that allows the PhoneGap/JavaScript content to register listeners for specific frequencies. When one of the registered frequencies is detected as the dominant frequency of a sound, the native plugin invokes the JavaScript callback function mapped to that frequency. Using this approach, a unique JavaScript function can be assigned to each frequency.
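To give a feel for the JavaScript side, here is a sketch of what registering frequency listeners might look like. The method names (AudioWatermark.startListening, addFrequencyListener) and the showLocation helper are illustrative, not necessarily the names used in the GitHub project.

    // Hypothetical JavaScript API - method names are illustrative, not the plugin's actual API.
    // Each watermark frequency (Hz) maps to its own callback.
    document.addEventListener('deviceready', function () {
        AudioWatermark.startListening();

        AudioWatermark.addFrequencyListener(18000, function () {
            showLocation(18000);   // e.g. Winterfell
        });
        AudioWatermark.addFrequencyListener(19000, function () {
            showLocation(19000);   // e.g. King's Landing
        });
    }, false);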

The final step was to build a user interface using HTML, CSS, & JavaScript that could respond to the audio watermarks. This was the easy part. First, I created a basic project that showed the reception and handling of specific audio frequencies. Next, I created the actual application content themed around the Game of Thrones video.
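The content side can then be as simple as a lookup table keyed by frequency, something along these lines (the element ids, frequencies, and copy here are placeholders for illustration, not the actual project markup):

    // Illustrative content map - frequencies, ids, and copy are placeholders.
    var locations = {
        18000: { name: 'Winterfell', detail: 'Seat of House Stark in the North.' },
        19000: { name: "King's Landing", detail: 'Capital of the Seven Kingdoms.' }
    };

    function showLocation(frequency) {
        var location = locations[frequency];
        if (!location) { return; }
        document.getElementById('location-title').textContent = location.name;
        document.getElementById('location-detail').textContent = location.detail;
    }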

Audio Watermark Enabled Applications

The Final Product

You can view the completed project, the sample tones, and the video containing the embedded tones on GitHub.

Lessons Learned

This was a really fun experiment, and I definitely learned a lot while doing it. Below are just a few of my findings:

Dominant Frequency

Dominant-frequency watermarks are not the way to go in a real-world solution, for many reasons. The main reason is that the watermark has to be the loudest, most prominent frequency in the captured audio spectrum. If there is a lot of other audio content, such as music, sound effects, or dialogue, then the watermark has to be louder than all of that content, otherwise it will not be detectable. This alone is problematic. If you are normalizing or compressing your audio stream, it can cause even more problems. A multi-frequency watermark that is within the audible range, but goes unnoticed, would be a more reliable solution.

High-Frequency Watermarks

High-frequency watermarks are also problematic. High-pitched frequencies may be beyond the capabilities of the hardware: speakers may have problems reproducing them, and microphones may have problems detecting them, as I discovered with the iPad. High frequencies can also cause issues when encoding your media. Many compression formats/codecs will remove frequencies that are beyond human hearing, thus removing your watermarks. Without those watermarks, there can be no synchronization of content.

Time-Duration or Sequential Tones

The current implementation only detects a dominant frequency, without requiring a minimum duration. If that frequency is encountered, it triggers the listening JavaScript function regardless of how long the sound actually played. All of my experimental tones lasted 3 seconds, to ensure each played long enough to be detected. However, I noticed that some of my frequency listeners would be triggered if I slid my mouse across the desk. The movement was very brief and I could not hear it, but it apparently generated a frequency that the application could detect, and that misfired some of the listeners. Requiring a minimum duration for the watermark frequency would prevent this kind of erroneous trigger, as in the sketch below. You could also prevent misfires by requiring a specific series of tones in sequence to trigger the action.
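As a sketch of the duration idea: assuming (hypothetically) that the native side reported a dominant-frequency reading on a fixed interval, the JavaScript layer could require a run of consecutive matching readings before firing. The triggerWatermark callback and the timing values here are illustrative, not part of the current plugin.

    // Sketch of a minimum-duration guard. Assumes a dominant-frequency
    // reading arrives roughly every 100ms (illustrative assumption).
    var REQUIRED_CONSECUTIVE = 10;   // 10 readings x 100ms = 1 second minimum
    var TOLERANCE_HZ = 50;           // allow small drift in the detected frequency
    var lastFreq = null;
    var consecutive = 0;

    function onFrequencySample(freq) {
        if (lastFreq !== null && Math.abs(freq - lastFreq) <= TOLERANCE_HZ) {
            consecutive++;
        } else {
            consecutive = 1;
        }
        lastFreq = freq;

        if (consecutive === REQUIRED_CONSECUTIVE) {
            triggerWatermark(freq);  // hypothetical callback; fires once per sustained tone
        }
    }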

Media Production and Encoding

If you are using audio frequencies near the upper range of human hearing, you have to be careful when you encode your media content. If the “inaudible” waveforms are over-amplified and clipped, they can produce an extremely unpleasant high-frequency noise that you can hear. I strongly recommend that you avoid this – I learned it from experience.

Additionally, if you are using high-frequency tones, be careful when transcoding between 16-bit and 32-bit formats or between sample rates. Either conversion can cause the inaudible tones to become audible, with very unpleasant artifacts. I found that I had the best results when the sequence settings in Premiere, the export format, and the source waveform all used the exact same bit depth (32-bit) and sample rate (44.1kHz).

Findings

From this exploration, some reading, and a lot of trial and error, I think a better approach would be a multi-frequency watermark held for a minimum duration. Rather than requiring one specific frequency to dominate the audio sample, the application would detect elevated levels of specific frequencies over a minimum period of time. This way the watermark frequencies don’t have to overpower the other frequencies, and they can sit within the normal range of human hearing without being noticed. It also gives you significantly more watermarks to work with, by using combinations of frequencies. And since the watermark tones would be within the normal range of human hearing, you could better rely on common hardware to accurately reproduce and detect them.
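As a rough sketch of what that detection could look like (illustrative only, not implemented in this project), the check below asks whether two marker frequencies are both elevated above the average energy of the spectrum, rather than requiring either one to be the loudest bin. Combined with the duration guard sketched earlier, this is the outline of a more robust watermark.

    // Illustrative multi-frequency check - not part of the proof of concept.
    // `spectrum` is an array of FFT magnitudes; binWidth = sampleRate / fftSize.
    function hasWatermark(spectrum, binWidth, freqA, freqB, threshold) {
        var avg = spectrum.reduce(function (sum, m) { return sum + m; }, 0) / spectrum.length;
        var binA = Math.round(freqA / binWidth);
        var binB = Math.round(freqB / binWidth);
        // Both marker bins must stand out against the average energy,
        // but neither has to be the loudest thing in the mix.
        return spectrum[binA] > avg * threshold && spectrum[binB] > avg * threshold;
    }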

Conclusion

The main conclusion: not only is it really cool to control your mobile app from an audio source, it can also be incredibly powerful for connected experiences. There are already TV shows and apps out in the real world employing the audio-watermarking technique to achieve a synchronized multi-screen experience. My guess is that you will start to see more of these experiences in the not-so-distant future. This is an inexpensive, low-fi solution that has the potential to work extremely well, and it has applications far beyond synchronizing app content with a TV show.

Here are just a few ideas where this could be applied:

  • TV & Movies: connected app and media experiences
  • Video games: connected “companion” applications to augment the gaming experience
  • Targeted advertising: Imagine you are using an app while in a retail store and you receive advertisements just by being in the store. The watermarks could be embedded within the music playing in the store.
  • Product placement: Imagine that you are watching a movie, and your favorite actor is drinking your favorite soda… you look down at your device, and you see an advertisement for that same brand of soda.
  • Museums: Imagine you have a mobile app for your favorite museum. While in the museum, there is an audio track describing the exhibits, or just playing background music. When you approach an exhibit, your app shows you details about that exhibit, all triggered by the sound being played within the museum.

The applications of audio watermarking are only limited by our imaginations. This is a low-cost solution that could enable connected experiences pretty much everywhere that you go. The goal of this experiment was to see if these techniques are possible within PhoneGap apps, and yes, they are.

While PhoneGap is a multi-platform solution, you may have noticed that this proof of concept is iOS only. I’m planning on developing this idea further on iOS, and if successful, I’ll consider porting it to other platforms.

Enjoy!


iPad designed by Jason Schmitt from The Noun Project
Television designed by Andy Fuchs from The Noun Project
