Wireless Week: Mobile Video Calling: A Way Forward

View the article: Wireless Week, April 12, 2011

by Shamim Naqvi, founder, CTO and chairman of Aylus Networks.

With Skype recently reporting that, on average, 40 percent of Skype-to-Skype calls include video and Cisco projecting that video will be 66 percent of the world's mobile traffic by 2014, it is clear that consumption of mobile video is driving both the opportunities and the problems in today's networks.

It is an inescapable fact of life today that narrowband is not enough, plain and simple. The genie has come out of the bottle and it cannot be put back in. The drivers for this change started several years ago, and most are accelerating:

First, mobile devices continue to become "smarter" with every new device that hits the market.

Second, multimedia in general and mobile video in particular have come to play an increasingly fundamental role in business and social communications. Voice and text-based communications are evolving into "rich media" communications.

Third, service providers are beginning to see that they can and should play a crucial and central role in the emerging arena of rich media communications, especially if these communications are to serve a countrywide or worldwide subscriber base.

Voice calls are available on a worldwide basis, still work as flawlessly as any current technology and remain a vital component of any enterprise or person's daily actions. Nobody on the planet needs to be taught how to make a voice call. In contrast, a video call today still requires the launching of a mobile application on a mobile device or computer, knowledge of how to make and sustain the video call, understanding of how to reach (i.e., address) the other party (or parties) and pre-knowledge of whether the other parties are available for this service, either at the moment of the call or at all.

Video calling coverage has been spotty, with many attempts resulting in failure because the receiving party is unavailable, does not have the right device or application, is in a bad coverage area, or is just not in a situation that allows a video call. If and when the lottery of making a video call succeeds, there is a complete lack of supplementary services. Answering a video call often means that any other incoming voice calls are lost, and that voice mail, call forwarding, and other voice calling features (capabilities most of us take completely for granted) are unavailable.

Monetizing Mobile Video

Setting up video calling as a competing service to voice calling is bound to fail in the long run. The voice call infrastructure has decades of experience, features and reliability at its core. The most viable option for monetizing video calls would be to enable video to be integrated into a voice call — to harmoniously "integrate" the voice and video calling infrastructures so that features and services of voice and video calling work seamlessly across both domains. In other words, the best possible option is for mobile video to become just one of several options or modalities that can be employed within a communication session between parties.

It is likely that mobile video services will first be adopted by enterprise customers, then by social communities and, lastly, by the general public, loosely paralleling the monetization of email from enterprise networks to the general public. Supporting video calling on mobile services requires more than large amounts of bandwidth; it also requires good latency and congestion control mechanisms, among other networking services. The availability of such facilities in a mobile networking environment requires not only newer radio technologies (e.g., the forthcoming LTE technology) but also general policy-based quality-of-service (QoS) mechanisms, a technology feature that will be expensive and, initially anyway, not broadly available. High-value, high-ARPU enterprise customers will likely be able to afford and adopt the technology before it is affordable to the general public. In addition, where the general public still has some privacy concerns related to video calling, enterprises and social networks do not exhibit such reticence.

Essential Technology for Rich Communication Sessions

When a subscriber initiates a session, what happens is essentially invisible to the subscriber, but it is of paramount importance to the service provider. The subscriber expects the session to be established, but the service provider does not know at that point what "modality" will start the proceedings. Is the subscriber going to use voice, text, or video? Session initiation is a signal to the service provider to be prepared for any and all of these modalities: voice, text, video, etc. It is the task of the session manager to acquire, line up, utilize and release the resources that may be needed during the lifetime of the session, regardless of what type of communication takes place during that session.
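The resource lifecycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Modality` and `Session` names, the lazy acquisition policy and the in-memory log are all hypothetical and do not come from any real switch or standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Modality(Enum):
    VOICE = "voice"
    TEXT = "text"
    VIDEO = "video"


@dataclass
class Session:
    """One communication session; the modality is not known at setup time."""
    caller: str
    callee: str
    active: set = field(default_factory=set)  # modalities currently holding resources
    log: list = field(default_factory=list)   # (action, modality) history

    def use(self, modality: Modality) -> None:
        # Acquire resources lazily, the first time a modality is used.
        if modality not in self.active:
            self.active.add(modality)
            self.log.append(("acquire", modality))
        self.log.append(("utilize", modality))

    def close(self) -> None:
        # Release everything still held, whatever mix of modalities was used.
        for modality in sorted(self.active, key=lambda m: m.value):
            self.log.append(("release", modality))
        self.active.clear()


session = Session("alice", "bob")
session.use(Modality.VOICE)   # the call starts as voice
session.use(Modality.VIDEO)   # video is added mid-session
session.use(Modality.VOICE)   # voice resources are reused, not re-acquired
session.close()
```

The point of the sketch is that the session, not the modality, is the unit being managed: voice and video come and go inside one session, and everything is released when that session ends.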

The essential piece of technology for enabling this is a robust session manager, or in other words a switch, which serves the same general purpose as the switches used in voice telephony. This switch needs to support traditional call flows, e.g., two-party calls, single-party calls, multi-party calls, network-initiated third-party calls, etc. (Note that a communication session wherein a subscriber requests a video from a web content server such as YouTube falls under one of these call flows.) It needs to support supplementary features and interactions with other telephony services such as voice mail and call forwarding. It must provide or utilize the media processing services that may be needed for a variety of devices, which may be using incompatible media formats, codecs, etc. And it needs to do all this in as reliable and stable a fashion as the voice switches of today.

The above is not to be interpreted as a call to arms for a massive engineering feat involving the design of such a switch. Such an endeavor is entirely possible, though the magnitude of the effort must not be underestimated. Introduction of a new technology is hard enough; replacing an existing and working technology is doubly so. As an example, VoIP has had a decade to mature as a technology and it is still a fraction of the voice communications infrastructure and market.

Instead, the preferred approach is one that harmoniously lives with and seamlessly uses the existing and current voice infrastructure—an ancillary switch (an 'adjunct', to use an old-fashioned term) that supplements existing voice switches. Moreover, the adjunct should be deployable in today's PSTN/IP networks, IMS networks and LTE/EPC environments.

Integration? Or Service Control...

Proposing an adjunct to a Mobile Switching Center (MSC) pre-supposes an integration effort that "ties" the two switches together so that they may cooperate in handling a rich media communication session.

Or does it? One of the hardest technical things to do in software is to undertake an integration effort between a legacy code base (developed over decades) and a set of new features that are themselves complicated. An additional complication is the existence of two radio networks, the circuit-switched radio network and the IP radio network. A simple voice call only requires a circuit-switched connection, whereas a video call will be best handled by a VoIP+Video connection on the IP network (synchronizing a circuit-switched voice connection with video on the IP network would be a difficult engineering problem). Furthermore, certain radio modulation technologies do not support multiple simultaneous radio access bearers.

Thus, the service control function needs to decide which radio network to use, and when, during an ongoing session. To make matters worse, dozens of supplementary services (voicemail, call forwarding, etc.) have been defined over decades for voice telephony, services people have come to depend on. These services may and will impinge on an ongoing video call, necessitating a changeover from an IP network to a circuit-switched network to handle the feature. This means that the service control function not only needs to make a choice of the radio network, it may have to re-make the choice several times during an ongoing session.
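A minimal sketch of that repeated decision, in Python, might look like the following. The event names, the `SUPPLEMENTARY` set and the rule that every supplementary feature forces a return to the circuit-switched network are simplifying assumptions made for illustration, not behavior defined by any standard.

```python
from enum import Enum


class Network(Enum):
    CIRCUIT_SWITCHED = "cs"
    IP = "ip"


# Assumption: these voice features are handled on the circuit-switched side.
SUPPLEMENTARY = {"voicemail", "call_forwarding", "call_waiting"}


def choose_network(event: str, current: Network) -> Network:
    """Pick the radio network for the next phase of the session."""
    if event == "video_on":
        return Network.IP                 # VoIP+Video runs on the IP network
    if event == "video_off" or event in SUPPLEMENTARY:
        return Network.CIRCUIT_SWITCHED   # plain voice and legacy features
    return current                        # unknown events change nothing


def run_session(events):
    """Return the network chosen at the start and after each event."""
    network = Network.CIRCUIT_SWITCHED    # a session starts as a simple voice call
    history = [network]
    for event in events:
        network = choose_network(event, network)
        history.append(network)
    return history


history = run_session(["video_on", "call_forwarding", "video_on", "video_off"])
```

In this run the choice changes four times within a single session, which is the re-making of the decision that the service control function has to be prepared for.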

It is therefore evident that a very complicated control and bearer integration is required between the MSC and the 'adjunct', leading to a very tight and custom integration between the two switches. And as mentioned above, this would not be an easy software development effort, whether technically, politically or commercially.

What is more feasible, and easier, is an approach that requires virtually no integration between the two switches (MSC and 'adjunct'), other than a standard SCF (Service Control Function) interface and a 3PCC (3rd-Party Call Control) interface on both the MSC and the 'adjunct'. Both of these interfaces are defined by current standards and are available on all switches manufactured today. These two interfaces can be driven by special logic incorporated into a handset client that initiates, dictates and controls the message flow across the two interfaces. In essence, this innovation allows two sessions to be maintained as one virtual session, utilizing only one radio access network at a given time, with seamless back-and-forth session migration across the two switches. The subscriber is never made aware of any of these migrations, and may use any of the regular telephony services and features at any time in mid-call.
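The idea of one virtual session layered over two underlying sessions can be modeled roughly as follows. This Python sketch is hypothetical throughout: the `Leg` and `VirtualSession` classes and the `migrate` method are illustrative names, and the real SCF and 3PCC message flows are not modeled at all.

```python
class Leg:
    """One of the two underlying sessions (hypothetical model)."""

    def __init__(self, switch: str, network: str):
        self.switch = switch
        self.network = network
        self.active = False


class VirtualSession:
    """Presents two legs to the subscriber as a single session."""

    def __init__(self):
        self.legs = {
            "msc": Leg("MSC", "circuit-switched"),
            "adjunct": Leg("adjunct", "IP"),
        }
        self.legs["msc"].active = True  # start as a plain voice call
        self.migrations = 0

    def active_leg(self) -> Leg:
        active = [leg for leg in self.legs.values() if leg.active]
        # Invariant from the text: only one radio access network at a time.
        assert len(active) == 1
        return active[0]

    def migrate(self, target: str) -> None:
        # Client-side logic: move to the target leg; no-op if already there.
        if self.legs[target].active:
            return
        for name, leg in self.legs.items():
            leg.active = (name == target)
        self.migrations += 1


call = VirtualSession()
call.migrate("adjunct")  # the subscriber adds video
call.migrate("msc")      # a voice feature such as call forwarding is needed
call.migrate("msc")      # already on the MSC leg, so nothing happens
```

The caller of `VirtualSession` never sees which leg is active, which mirrors the claim that the subscriber is unaware of the migrations.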

In Summary

Current video calling services suffer from two immediate problems. First, most of them are completely separate from voice services, setting up an entirely unnecessary and ultimately fatal clash with voice telephony. Second, as totally separate services, they will need decades of maturity to achieve the reliability, stability and richness enjoyed by voice communications today.

A better way forward, and a more lucrative proposition altogether, is to integrate voice and video into a single uniform and seamless set of services, whether they come directly from the telephony carriers or from OTT service providers using APIs to gain access to the service control functions that enable high-quality rich media services. Some of these services could initially target enterprises, where QoS could be offered in a more affordable setting. Other services will build on early adoption by social networks, undoubtedly followed by the general public as soon as the services are shown to be as secure, straightforward and reliable as the mobile services people use today.