Data Containers Demystified: A Reliable Data Movement Solution

The data management industry has seen a significant rise in interest in data containers. As cloud computing has gained popularity, methods for transporting data together with its processing instructions have been investigated, and data containers have emerged as a viable solution.

Data containers solve the problem of getting software to run reliably when it is moved from one computer system to another. A data container stores and organizes virtual objects (a virtual object is a self-contained entity that consists of both data and the procedures that manipulate the data). There are, however, limitations. Containers are easy to transport, but can only be deployed on servers with compatible operating system “kernels,” which limits the servers that can be used.
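As a minimal sketch of what a “virtual object” looks like in code, the Python class below bundles data with the procedures that manipulate it. The class name TemperatureLog and its fields are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TemperatureLog:
    """Hypothetical virtual object: readings (data) plus methods (procedures)."""
    sensor: str
    readings: list = field(default_factory=list)

    def record(self, celsius: float) -> None:
        """Add one reading to the object's own data."""
        self.readings.append(celsius)

    def mean(self) -> float:
        """Average of the stored readings; raises ValueError if there are none."""
        if not self.readings:
            raise ValueError("no readings recorded")
        return sum(self.readings) / len(self.readings)
```

Because the data and its procedures travel together, code that receives a TemperatureLog does not need to know how the readings are stored in order to use it.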

The earliest version of container technology was chroot, which first appeared in Version 7 Unix in 1979 and was added to BSD in 1982. Chroot provided a dedicated file system view, isolating files within the chroot environment from other computing environments. From chroot, containers evolved into a bundled package containing an application together with all its libraries, dependencies, and the configuration files needed to run it.

Typically, data containers are either associative containers or single-value containers. Single-value containers store each object independently. These objects can be accessed directly, or by using an iterator. An associative container, on the other hand, is more complicated, and uses a map, dictionary, or associative array, which is composed of key-value pairs, with each key appearing once in the container. These keys are used to find objects stored within the container.
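The two kinds of container can be illustrated with Python's built-in list and dict. This is only a sketch; the variable names and sample values are made up for the example.

```python
# Single-value container: each object is stored independently.
readings = [21.5, 22.0, 19.8]

first = readings[0]                 # direct access by position
iterator = iter(readings)           # access through an iterator
visited = [value for value in iterator]

# Associative container: key-value pairs, with each key appearing once.
readings_by_sensor = {"kitchen": 21.5, "garage": 19.8}

# Assigning to an existing key replaces its value instead of adding a duplicate key.
readings_by_sensor["kitchen"] = 22.0

# The key is used to find the object stored in the container.
kitchen = readings_by_sensor["kitchen"]
```

The list is the simpler structure, while the dict pays for its key lookup with the extra bookkeeping needed to keep every key unique.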

The size of a container is determined by the amount of data within it, and a container is typically far smaller than a virtual machine. A container may be only 40 megabytes in size, while a virtual machine can occupy several gigabytes. This means a server that can host only a few virtual machines can host many containers. Additionally, a virtual machine can take several minutes to boot its operating system, while a container will start almost instantly.
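A rough back-of-the-envelope calculation shows why the size difference matters. The 40 MB container size comes from the paragraph above; the 64 GB host capacity and the 4 GB virtual machine size are assumptions picked for illustration, and the function name is hypothetical.

```python
def instances_that_fit(host_capacity_mb: int, instance_size_mb: int) -> int:
    """Count how many whole instances of a given size fit in the host's capacity."""
    return host_capacity_mb // instance_size_mb


HOST_CAPACITY_MB = 64 * 1024   # assumed: 64 GB available on the host
CONTAINER_SIZE_MB = 40         # from the text: a container may be only 40 MB
VM_SIZE_MB = 4 * 1024          # assumed: 4 GB stands in for "several gigabytes"

containers = instances_that_fit(HOST_CAPACITY_MB, CONTAINER_SIZE_MB)  # 1638
vms = instances_that_fit(HOST_CAPACITY_MB, VM_SIZE_MB)                # 16
```

This counts storage footprint only. Real density also depends on the memory and CPU each workload needs, so the ratio should be read as an upper bound rather than a prediction.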

Data Containers vs Virtual Machines

When compared, a virtual machine (VM) and a container are significantly different. VMs are essentially a computer simulation, and have had the effect of removing “application dependency” on the computer hardware (almost any “prepped” computer will run the VM program). This, in turn, allows people to work from a variety of locations, using different systems. However, people are turning to containers as a reliable and efficient alternative to virtual machines. The two technologies offer different strengths and weaknesses.

Technically, the primary differences between containers and VMs are the location of the virtualization layer and how operating system resources get used. Virtual machines typically require a hypervisor to be installed on the bare-metal system hardware. VMs are completely isolated from each other, and, as a consequence, malware and application problems only impact the affected VM. Virtual machines can also be transferred from one system to another without fear of infection.

Containers are generally considered more efficient in coordinating resources than virtual machines. A single system can host many more containers than it can virtual machines. (Because VMs require a guest operating system, they use much more space than containers.) Cloud providers are very excited about using containers, because more containers can be deployed on the same amount of hardware (increasing their profits).

Containers offer another advantage, in that they can be stopped and started much more quickly than virtual machines. Starting a container normally takes less than a second. An individual server can operate multiple containers simultaneously. Data containers are generally described as completely isolated.

Merging VMs and Data Containers

Merging virtual machines and containers combines the performance and speed of containers with the security of VMs. Installing a container within a VM adds another abstraction layer, improving security. Virtual machines and containers can coexist in the same environment, making the two technologies complementary. Joining the two can also expand the tools and tactics used by data center administrators and application architects, providing significant advantages for compatible workloads.

A Container named Docker

Docker is a tool that is designed to benefit both developers and system administrators. It is part of an open source platform, meaning anyone can help in its evolution, and alter its design to match their specific needs. It allows developers to write code without worrying about the kind of system it will run on. Docker containers have thousands of predesigned programs available, simplifying their use. Docker packages the application and libraries into a single container (using the Docker Image), allowing applications to be deployed consistently across many computer systems. Windocks delivers Docker SQL Server containers with security and governance already built in.

CoreOS and rkt

The CoreOS Linux distribution is a minimalist system tailored for running “development containers.” Recently, its developers created rkt, an alternative container runtime that follows the Unix philosophy of simple command-line tools. Also, rkt supports multiple container formats, which can be very useful for certain types of server and system applications. This system is still evolving, but promises to be a highly functional alternative to Docker.

Linux

Linux containers are known for being efficient in terms of CPU utilization, drive space, memory, and hardware virtualization because they save the cost of the OS overhead in each virtual machine. However, most Linux distributions are feature-heavy when all they are meant to do is act as a container host. As a consequence, some minimal Linux distributions have been designed specifically to run containers.

Kubernetes (Container Management)

Google created Kubernetes to orchestrate Docker containers. Kubernetes is an open source system, too, and is designed to automate the deployment, scaling, and management of containerized applications. It is also used to provide container orchestration on the Google Cloud Platform “as a service.” The Google Cloud Platform provides a variety of containers, with some designed specifically for big data.

Apache Mesos (Container Management)

Apache Mesos is used to manage containers, and has the ability to manage a diverse variety of workloads. It is a cluster manager and offers resource sharing and isolation. The program sits between the application layer and the operating system, making it efficient at managing applications in large clustered environments. Apache Mesos supports big data research, microservices, and real-time analytics, and offers elastic scaling. It, too, is an open-source project.

Container Security

Security is a concern for data containers. The container’s host kernel and the hypervisor are both potential access points for hackers when they are being shared. However, in the last few years, significant efforts have been focused on developing security software for containers. Docker (and other containers) now come with a “signing infrastructure,” which lets administrators require a signature on container images. This prevents the use and deployment of untrusted containers.
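The idea behind image signing can be sketched in a few lines of Python. This is not how Docker implements it: real signing infrastructures rely on public-key signatures and a trust store, whereas this sketch swaps in an HMAC with a shared secret so that it runs with the standard library alone. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_image(image_bytes: bytes, key: bytes) -> str:
    """Produce a hex signature over the image contents (HMAC-SHA256 stand-in)."""
    return hmac.new(key, image_bytes, hashlib.sha256).hexdigest()


def is_trusted(image_bytes: bytes, signature: str, key: bytes) -> bool:
    """Accept the image only if its signature matches the contents exactly."""
    expected = sign_image(image_bytes, key)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature)
```

An administrator's deployment step would call is_trusted before running an image and refuse anything that fails the check. Any change to the image bytes after signing produces a different signature, so a modified image is rejected.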

However, a trusted, signed container may not be secure. Corrupted software installed after the signature may provide access for hackers. To counter this, companies are offering container “scanning solutions” which can notify an administrator of any vulnerabilities that might cause problems or provide a hacker with access.

Additionally, new container security software has been developed. Twistlock provides software which profiles a container’s predictable behavior and “approved” processes, including networking activities and storage practices, watching for any malicious behavior or surprises, and flagging them when found.
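The profiling approach described above can be sketched as an allowlist check: record what a container is expected to do, then flag anything outside that profile. This is an illustrative simplification, not Twistlock's implementation; the event format and the function name are assumptions.

```python
def find_unapproved(profile: set, observed: list) -> list:
    """Return observed events that are not part of the approved profile, in order."""
    return [event for event in observed if event not in profile]


# Hypothetical profile: (kind, detail) pairs the container is expected to produce.
approved_profile = {
    ("process", "nginx"),
    ("network", "tcp:443"),
    ("storage", "/var/log/nginx"),
}

# Hypothetical runtime observations for the same container.
observed_events = [
    ("process", "nginx"),
    ("network", "tcp:443"),
    ("process", "nc"),          # not in the profile
    ("network", "tcp:4444"),    # not in the profile
]

flagged = find_unapproved(approved_profile, observed_events)
```

Here flagged holds the two events that fall outside the profile, which is the list an administrator would be alerted about.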

Polyverse has taken a different approach to container security. They take advantage of a container’s ability to start in less than a second to relaunch the container’s applications every few seconds. This minimizes the time available for a hacker to exploit a container’s running application.
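The effect of frequent relaunching can be illustrated with a simplified timing model. This is an assumption-laden sketch, not Polyverse's implementation: it supposes that every relaunch replaces the instance with a clean one and that an attack has to finish within a single instance's lifetime. The function names are hypothetical.

```python
def relaunch_times(duration_s: int, interval_s: int) -> list:
    """Times (in seconds) at which a fresh instance replaces the running one."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return list(range(interval_s, duration_s + 1, interval_s))


def exploit_can_finish(time_needed_s: float, interval_s: float) -> bool:
    """Under this model, an attack succeeds only if it fits inside one interval."""
    return time_needed_s < interval_s


schedule = relaunch_times(duration_s=10, interval_s=3)   # [3, 6, 9]
short_window = exploit_can_finish(time_needed_s=30, interval_s=5)
long_window = exploit_can_finish(time_needed_s=30, interval_s=3600)
```

With a five-second interval, an attack that needs thirty seconds never completes in this model, while the same attack succeeds against an instance that runs for an hour.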

Data Containers and the Cloud

Containers are ideal for use in the cloud, but planning is important. When choosing cloud resources, be sure they are based on the “same framework and container host OS.” Caution should be used with OS or middleware features that are not supported by “all” OS distributions or versions. With a standard base, containerized applications “should” move across all cloud platforms with no execution problems.

Businesses and startups are using the cloud to access big data, analytics, and microservices. Benefits of contracting a Cloud infrastructure include self-service, programmability, automation, and pay-by-use. Generally speaking, Cloud users want an infrastructure guaranteeing the same, or better, performance as the physical servers they normally use.

