October 4, 2016
It was philosopher Abraham Kaplan who, in 1964, introduced the behavioral theory that when the only tool you have is a hammer, you tend to treat every problem like it’s a nail. He called it the “law of the instrument,” and since then it has entered mainstream consciousness as a popular warning about the dangers of singular, dogmatic approaches to innovation.
RIFT.io was founded because we saw the law of the instrument being applied to network virtualization for telcos and mobile operators. That is, we saw that the network virtualization solutions being offered were fundamentally based on IT-oriented models that were more relevant for enterprise and Web 2.0 apps. Our thinking was validated at Mobile World Congress 2015, where Telefonica presented a slide that summarized the flaws in blindly applying existing cloud computing solutions to the NFV problem space:
These two key differences highlighted by Telefonica are difficult for standard cloud computing platforms to address. From our experience in carrier environments, we know what is required to make the transition to a telco cloud:
Data plane workloads – Workload placement must be tailored to data plane functions. To support high data plane workloads, you need to scale I/O elastically and also distribute it among a potentially large number of distributed VMs. The best way to do this is to leverage the networking infrastructure and to use techniques such as Equal Cost Multi Path (ECMP) and Port Groups. Lastly, the NFV environment must support SLA monitoring and recovery.
Network requires shape – Because networks have complex shapes and functions (as opposed to host applications such as web servers and databases), it is imperative to have an end-to-end view of the network. This “shape” (which includes the role of each network function) must also be made visible to the orchestrator.
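To make the ECMP point above a little more concrete, here is a minimal sketch of hash-based flow distribution across a set of data plane VMs. The `Flow` type, the `ecmp_next_hop` function, and the VM names are illustrative assumptions, not part of any product or router API, and real network devices typically use simpler hardware hash functions than SHA-256.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Hypothetical 5-tuple identifying a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP


def ecmp_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Pick one of several equal-cost next hops by hashing the flow 5-tuple.

    Hashing keeps every packet of a flow on the same VM while spreading
    different flows across all available VMs.
    """
    if not next_hops:
        raise ValueError("no next hops available")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Illustrative usage with made-up VM names.
vms = ["vm-a", "vm-b", "vm-c", "vm-d"]
flow = Flow("10.0.0.1", "10.0.0.2", 12345, 443, 6)
chosen = ecmp_next_hop(flow, vms)
```

One caveat with this simple modulo scheme: adding or removing a VM changes the mapping for many existing flows, which matters for stateful network functions.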
Digging a little deeper, let’s look at how Web applications and network functions differ across a range of characteristics and requirements:
Web applications have a fairly high tolerance for outages since sessions can be restarted or retried. Generally speaking, they are stateless applications, or state can be kept within associated databases (e.g. shopping carts, backups). Transaction times are on the order of multiple seconds, and typical recovery time can also be a few seconds. On the other hand, access services such as voice, video, mobile, and real time communication systems have ZERO tolerance for outages. These are stateful applications, and outages and disconnections are tracked and often regulated. Plus, recovery time needs to be nearly instantaneous – in the milliseconds.
Web applications are supported by general-purpose compute and storage. An array of Web applications can be supported by a farm of generic compute and storage resources. Network functions, due to their unique performance and resiliency requirements, rely on general-purpose compute that also includes specialized attributes (specialized NICs, CPUs, FPGAs, memory, storage, etc.). While a Web application workload can be placed virtually anywhere in the data center, individual network functions depend on unique hardware attributes (e.g. encryption assist, DPDK, etc.) to perform best at optimal cost while maintaining SLAs.
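As a rough illustration of what hardware-aware placement means in practice, the sketch below filters hosts by the capabilities a network function requires. The `Host` and `VnfRequirements` types, the capability strings, and the tie-breaking rule (prefer the host with the fewest unused special capabilities) are all assumptions made for this example, not a description of any particular orchestrator.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical compute host with optional specialized capabilities."""
    name: str
    free_vcpus: int
    capabilities: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class VnfRequirements:
    """Hypothetical resource needs of one network function."""
    vcpus: int
    required_capabilities: FrozenSet[str] = frozenset()


def place(vnf: VnfRequirements, hosts: Iterable[Host]) -> Optional[Host]:
    """Return a host that satisfies the VNF's needs, or None if none does."""
    candidates = [
        h for h in hosts
        if h.free_vcpus >= vnf.vcpus
        and vnf.required_capabilities <= h.capabilities
    ]
    if not candidates:
        return None
    # Assumed policy: keep specialized hardware free for workloads that
    # need it by choosing the host with the fewest surplus capabilities.
    return min(
        candidates,
        key=lambda h: (len(h.capabilities - vnf.required_capabilities), h.name),
    )


hosts = [
    Host("generic-1", 16),
    Host("dpdk-1", 8, frozenset({"dpdk"})),
    Host("crypto-1", 8, frozenset({"dpdk", "encryption-assist"})),
]
data_plane_vnf = VnfRequirements(vcpus=4, required_capabilities=frozenset({"dpdk"}))
target = place(data_plane_vnf, hosts)
```

A generic Web workload would pass an empty requirement set and could land on any host with capacity, which is exactly the contrast drawn above.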
For Web applications, configuration is applied on a per node (per VM) basis. In contrast, the configuration of a network function is applied hierarchically. Configuration is applied at each layer as resources are allocated and interconnected. These layers include: datacenter fabric, VM, VNF (per VDU), networking, network service, PNFs, and WAN overlay. In addition, network functions are dynamic entities that often require reconfiguration at multiple levels in response to lifecycle management events (e.g. scale in/out).
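A small sketch can show what multi-level reconfiguration looks like. It assumes, purely for illustration, that the layers are configured in the order listed above and that a scale-out event touches the four layers named in `SCALE_OUT_LAYERS`; neither assumption comes from a standard or from a specific product.

```python
from typing import Iterable, List

# Layers as listed in the text; treating this as the order of
# application is an assumption of this sketch.
CONFIG_LAYERS = (
    "datacenter fabric",
    "VM",
    "VNF (per VDU)",
    "networking",
    "network service",
    "PNFs",
    "WAN overlay",
)

# Hypothetical set of layers affected by a scale-out event.
SCALE_OUT_LAYERS = frozenset(
    {"VM", "VNF (per VDU)", "networking", "network service"}
)


def reconfiguration_plan(affected_layers: Iterable[str]) -> List[str]:
    """Return the affected layers in the order they should be reconfigured."""
    affected = set(affected_layers)
    unknown = affected - set(CONFIG_LAYERS)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    return [layer for layer in CONFIG_LAYERS if layer in affected]


plan = reconfiguration_plan(SCALE_OUT_LAYERS)
```

The point of the sketch is the contrast with per-node configuration: one lifecycle event fans out into an ordered sequence of changes across several layers.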
Both Web applications and network functions need to scale in multiple dimensions (up/down, in/out), but Web applications in an IaaS environment essentially scale themselves – that is, the cloud provider adds more servers based on simple thresholds (user traffic, storage, etc.). This is much simpler when the general compute/storage environment is basically the same for all Web applications. In an NFV environment, the orchestrator scales a network service (with multiple VNFs) based on thresholds mapped to SLAs. Rather than relying on application-level resiliency, the orchestrator must have a global view and take automated actions based on SLAs.
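The difference between per-application thresholds and service-level scaling can be sketched as follows. The `SlaThreshold` type, the metric name, and the decision rule (scale out if any VNF breaches, scale in only if every VNF is comfortably under its limits) are assumptions chosen for illustration, not a specification of how any orchestrator behaves.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlaThreshold:
    """Hypothetical SLA-derived bounds for one metric."""
    metric: str
    scale_out_above: float
    scale_in_below: float


def scaling_decision(
    service_metrics: Dict[str, Dict[str, float]],
    thresholds: List[SlaThreshold],
) -> str:
    """Decide 'scale_out', 'scale_in', or 'hold' for a whole network service.

    service_metrics maps each VNF name to its current metric readings, so
    the decision reflects a global view rather than a single application.
    """
    if not thresholds or not service_metrics:
        return "hold"
    can_scale_in = True
    for threshold in thresholds:
        for vnf_metrics in service_metrics.values():
            value = vnf_metrics.get(threshold.metric)
            if value is None:
                # Missing data: never shrink the service blindly.
                can_scale_in = False
                continue
            if value > threshold.scale_out_above:
                return "scale_out"
            if value >= threshold.scale_in_below:
                can_scale_in = False
    return "scale_in" if can_scale_in else "hold"


thresholds = [SlaThreshold("latency_ms", scale_out_above=20.0, scale_in_below=5.0)]
decision = scaling_decision(
    {"firewall": {"latency_ms": 3.0}, "load-balancer": {"latency_ms": 25.0}},
    thresholds,
)
```

Here a single VNF breaching its latency bound triggers a scale-out of the service, even though the other VNF looks idle when viewed on its own.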
Networking and Routing
For Web applications, networking consists of host applications and HTTP load balancers. The networking technology that supports Web applications consists of IP and the host TCP/IP stack. In contrast, network functions require support for L2 – L7 packet transit functions and virtual IPs and MACs for resiliency. The networking technology that supports network functions includes IP, Tunnels, IPSec, Network Service Headers, etc. Web applications have no routing requirements (this is handled by the network, not the application), whereas network functions and services often depend on direct SDN control of vSwitches, and participate in BGP and MPLS VPNs.
Integration with Carrier Operations
Related to the end-to-end view of the network, NFV orchestration has a unique requirement to integrate with existing legacy systems and network operators’ OSS/BSS. It’s more than cloud orchestration (spinning up VMs, templates, sharing resources across applications, etc.) or just an API. Service assurance and operational efficiency demand that the orchestrator serve a critical, bi-directional role: capturing data on end-to-end service (and service components) for analytics, and also taking automated remedial actions based on threshold and service level violations identified by those same analytics. Furthermore, operational efficiency requires fully automated network service lifecycle management to handle geographically distributed carrier-scale deployments.
In order to provide the proper support for the massive scalability and availability requirements of these SP applications in a telco cloud, what’s required is a whole-toolbox approach that provides a wide set of advanced capabilities including:
- Highly resilient virtual network function (VNF) and network recovery methods that are engineered specifically for the complex service level agreements typical of SP networks
- Enhanced platform awareness (EPA)-based hardware support for intelligent and secure workload placement
- Integration with SDN architectures capable of multi-site deployment and management of workloads
- Full MANO compliance, with support for end-to-end network services, VNFs, and nested VNF forwarding graphs
Many orchestration tools started in the data center as cloud computing/IaaS orchestrators and are adding carrier network features to expand into the next big, growing market – NFV. If only it were that easy. As I’ve pointed out, Web applications and network functions have very different characteristics and requirements. What’s important to keep in mind here, though, is that the comparison between enterprise/web-scale and SP apps is not necessarily an “either/or” proposition. Certainly, there are many aspects of IT-based network virtualization, such as agile and low-cost deployment methods, that can benefit SPs. However, it is important to distinguish the meaningful differences between the two application realms. For example, enterprise and even web-scale applications have been designed to operate well in highly consolidated/centralized data center environments to maximize management and operational simplicity. SP applications, on the other hand, were designed to operate in highly complex and distributed environments comprising both physical network and multi-cloud locations.
As a next-gen network services virtualization platform engineered specifically for SP networks and apps, RIFT.ware delivers on these requirements by automating and simplifying the composition and management of complex network services. RIFT.ware achieves this as a model-driven, ETSI-compliant NFV solution that serves as a common MANO platform across multiple cloud platforms and multi-vendor network functions and services. In essence, RIFT.ware provides everything an SP would require for highly automated, end-to-end virtualized network service design, delivery, and life cycle management of their apps and workloads.
Using the right tools to do a job correctly is not a trivial concern. For SPs and their goal of adopting NFV in critical mass, having the right tools will make all the difference. In that sense, overcoming the “law of the instrument” effect as it applies to deploying and managing SP apps is job No. 1 for RIFT.io and the industry as a whole.