Filling the Gaps in Edge Computing with Distributed SQL Databases

The data that drives businesses and other organizations is increasingly being created outside traditional data centers or the cloud. It can come from smartphones, IoT devices, wearables, point-of-sale systems, and other sources at the edge, often distributed across wide geographic areas. Karthik Ranganathan, CTO, Yugabyte, explains the need for distributed SQL databases and discusses how, without an effective, stateful and secure architecture, operations at the edge can suffer from latency, availability and scalability problems.

Gartner predicts that by 2025, 75% of enterprise-generated data will be created and processed outside conventional boundaries. To keep pace as more data and workloads become decentralized, edge computing is emerging as a model for taking compute and data storage closer to the action to speed up response times and reduce network costs. By taking processing to the edge, rather than relying on traditional means of keeping computing and storage in a data center, organizations can reap a range of benefits, from giving customers a seamless, reliable transaction experience to early detection of cyber incidents.

But those benefits don’t happen on their own. Without a proper approach, the potential benefits of edge computing can become weaknesses.

Too often, the critical data layer that sits between infrastructure and apps gets overlooked. There have been advancements in distributed databases, but the innovations they feature have all come with their own set of tradeoffs and compromises. Ultimately, developers need to embrace a new data architecture that considers the scalability, latency, geo-distribution, productivity, and security needs of modern edge applications.

An emerging class of databases, distributed SQL, is a perfect match for the edge data layer, combining the best features of traditional RDBMSs and NoSQL databases for running transactional applications. No single database reference architecture can work for all applications in an edge computing environment. But distributed SQL databases provide a versatile and powerful data layer that can support the needs of computing in the cloud and edge environments.

The Tiers of Edge Computing

Enterprises can deploy applications and databases across several infrastructure tiers, each with its own set of properties.

- Device edge: Mobile phones, IoT devices, wearables and sensors in buildings or machinery. They require a lightweight database since they typically are low-powered.
- Far edge: Devices that have limited compute and storage, such as machines deployed near mobile base stations, inside shopping malls and retail locations, bank branches or factories. Databases in the far edge primarily operate in private clouds.
- Near edge: Covering the infrastructure between the far edge and the cloud, such as a public cloud region, a provider’s colocation, a tier 2 cloud service or a private datacenter. The near edge, usually in private or hybrid clouds, provides low latency but must contend with network partitions.
- Cloud: Includes multi-cloud and managed DBaaS deployments. Because multi-cloud strategies are common among enterprises, databases should be able to run in any public cloud. Cloud services like AWS, GCP and Microsoft Azure offer virtually unlimited computing and storage. Still, organizations need to avoid the high latency and throughput constraints that can come with getting data to and from the cloud.


Designs for Stateful Edge Applications

Working across these environments, the architecture for stateful edge applications needs to address several key areas.

1. Accounting for the data lifecycle: An organization needs to know where data is produced and consumed, what’s being done with it (such as analysis) and how it will be stored (such as stored locally or stored then forwarded). Local storage, for instance, would reduce latency and increase throughput compared with sending data across broad areas.

Data is typically replicated in one of three patterns. In a hub-and-spoke pattern, data is generated and stored at the edges and aggregated in a central cluster in the cloud. In a configuration pattern, data is stored in the cloud, with read replicas at edge locations. And in an edge-to-edge pattern, data is synchronously or asynchronously replicated or partitioned within a tier.
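
As a rough illustration of the hub-and-spoke pattern, the Python sketch below has each edge site keep its own rows locally and a central hub pull in whatever has not been shipped yet. The names `EdgeSite` and `CloudHub`, the row layout, and the in-memory lists are all hypothetical stand-ins; a real deployment would rely on the database's own replication rather than application code like this.

```python
from collections import defaultdict


class EdgeSite:
    """Hypothetical edge location that stores the rows it generates locally."""

    def __init__(self, name):
        self.name = name
        self.rows = []      # local storage at the edge
        self.shipped = 0    # index of the first row not yet sent to the hub

    def write(self, row):
        # Writes land locally first, so they do not wait on the cloud.
        self.rows.append(row)

    def drain(self):
        """Return the rows not yet shipped and mark them as shipped."""
        batch = self.rows[self.shipped:]
        self.shipped = len(self.rows)
        return batch


class CloudHub:
    """Hypothetical central cluster that aggregates data from every spoke."""

    def __init__(self):
        self.by_site = defaultdict(list)

    def aggregate(self, sites):
        # Pull only the new rows from each edge site.
        for site in sites:
            self.by_site[site.name].extend(site.drain())

    def total_rows(self):
        return sum(len(rows) for rows in self.by_site.values())


store_a = EdgeSite("store-a")
store_b = EdgeSite("store-b")
store_a.write({"sku": 1, "qty": 2})
store_a.write({"sku": 7, "qty": 1})
store_b.write({"sku": 1, "qty": 5})

hub = CloudHub()
hub.aggregate([store_a, store_b])
hub.aggregate([store_a, store_b])  # a second pass ships nothing new
```

The edge sites keep their full local copy after shipping, which matches the "generated and stored at the edges" half of the pattern; only the aggregation step touches the cloud.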

2. Identifying workloads: Different types of workloads tend to run in different locations. Workloads such as streaming data and streaming data with analytics, event data and small data sets with read-only queries typically will run at the edge. Transactional and relational workloads, workloads requiring full-fledged analytics, and those that need long-term data storage usually occur in the cloud.

3. Scaling requirements: The pace at which data is growing, the number of users and devices involved, and the amount of compute power necessary to process the data can significantly impact how information is managed. Edge locations, for example, usually don’t have the compute and storage resources to run deep analytics of large amounts of data. Online transactional processing (OLTP) databases at the edge may need to scale throughput to handle large write volumes from devices.

4. Preparing for failures: Failures will happen because of network partitions or infrastructure outages, especially the node/pod failures that are common at the far edge. Applications and databases should be designed for their appropriate operating modes. The cloud runs in mostly-connected mode (though the impact of a cloud outage can be severe). Near-edge applications should be designed for mostly-connected or semi-connected mode, the latter of which may involve an extended network partition that lasts several hours. Applications at the far edge should be designed for semi-connected or disconnected mode, in which they run independently of any external site.

5. Addressing security vulnerabilities: The distributed nature of edge computing increases the attack surface. It’s important to apply least-privilege practices and zero-trust security throughout. Other essential factors include encryption both in transit and at rest, multi-tenancy support at the database layer with per-tenant encryption, and the regional locality of data to ensure compliance.
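
The semi-connected and disconnected modes described under failure preparation are often handled with a store-and-forward approach: accept writes locally, then forward them once the link returns. The Python sketch below is one minimal way to picture that, not a production design. `StoreAndForwardBuffer` and `FlakyUplink` are hypothetical names, and the in-memory queue is an assumption; a real edge node would need durable storage and a way to deduplicate retried records.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Queue writes locally during a partition; forward them in order afterwards."""

    def __init__(self, send):
        # `send` is any callable that raises ConnectionError while partitioned.
        self._send = send
        self._pending = deque()

    def write(self, record):
        # The local write always succeeds, even with the uplink down.
        self._pending.append(record)
        self.flush()

    def flush(self):
        """Try to forward pending records; return how many were forwarded."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # still partitioned; keep the record for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def backlog(self):
        return len(self._pending)


class FlakyUplink:
    """Hypothetical uplink used only to simulate a network partition."""

    def __init__(self):
        self.online = False
        self.received = []

    def __call__(self, record):
        if not self.online:
            raise ConnectionError("uplink partitioned")
        self.received.append(record)


uplink = FlakyUplink()
buffer = StoreAndForwardBuffer(uplink)
buffer.write("reading-1")   # accepted locally while disconnected
buffer.write("reading-2")
backlog_during_partition = buffer.backlog

uplink.online = True        # the partition heals
forwarded = buffer.flush()
```

Records leave the queue only after a successful send, so a partition in the middle of a flush loses nothing and preserves ordering.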

Why Distributed SQL Is a Preferred Solution

A distributed SQL database can run across different tiers of the cloud and edge, making it a suitable platform for transactional applications in an edge environment. The characteristics of distributed SQL include:

- Continuous availability: Designed for resiliency, a distributed SQL database replicates data across nodes, keeping services available during node, zone, region and data center failures. There are no single points of failure.
- Horizontal scaling: A database cluster can be scaled with zero impact by simply adding nodes, allowing enterprises to scale on demand.
- Flexible geo-distribution: A distributed SQL database offers synchronous and asynchronous replication within a region, and between the core and edge, with built-in geo-partitioning and data pinning capabilities for compliance.
- Advanced RDBMS features: Distributed SQL databases offer standard RDBMS features that allow developers to build data-driven applications.
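
To make geo-partitioning and data pinning more concrete, the Python sketch below routes each row to a region based on a `country` column, so that rows stay in the region their data belongs to. Everything here is an illustrative assumption: the `GeoPartitionedTable` class, the `country` column, and the region names are hypothetical, and a distributed SQL database would perform this placement inside the database rather than in application code.

```python
# Hypothetical mapping from a row's country code to the region it is pinned to.
REGION_BY_COUNTRY = {
    "DE": "eu-central",
    "FR": "eu-central",
    "US": "us-east",
    "IN": "ap-south",
}


class GeoPartitionedTable:
    """Toy table that pins every row to a region chosen by its country column."""

    def __init__(self, region_by_country, default_region=None):
        self._region_by_country = dict(region_by_country)
        self._default = default_region
        self._partitions = {}  # region name -> list of rows stored there

    def region_for(self, row):
        country = row["country"]
        region = self._region_by_country.get(country, self._default)
        if region is None:
            # Refuse to guess a location when compliance may depend on it.
            raise ValueError(f"no region configured for country {country!r}")
        return region

    def insert(self, row):
        region = self.region_for(row)
        self._partitions.setdefault(region, []).append(row)
        return region

    def rows_in(self, region):
        return list(self._partitions.get(region, []))


orders = GeoPartitionedTable(REGION_BY_COUNTRY)
orders.insert({"id": 1, "country": "DE"})
orders.insert({"id": 2, "country": "US"})
orders.insert({"id": 3, "country": "FR"})
```

Rejecting rows with no configured region, instead of falling back silently, is a deliberate choice in this sketch: for compliance-driven pinning, an unplaced row is safer than a misplaced one.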


The SQL Way Ahead

A well-designed distributed SQL database also bolsters security with encryption at rest and in transit, multi-tenancy support at the database layer with per-tenant encryption, and regional locality of data to ensure compliance. Combining high performance with operational simplicity, distributed SQL gives organizations what they need to operate in the cloud and at the edge.

Do you think distributed SQL databases can ensure both security and compliance in the years to come? Share with us on LinkedIn, Twitter, or Facebook. We’d love to hear from you!
