A Brief History of Data Containers

Data containers have become crucial for moving data efficiently between public clouds and private platforms. A container is a software package containing everything the software needs to run: libraries, system tools, settings, and an executable program. Containers also offer an additional layer of security, because the packaged software cannot affect the host operating system.

A container is isolated from other containers, though it can communicate with them through well-defined channels. All containers on a system run on a single kernel and, consequently, are much more cost-effective than virtual machines. The primary difference between the two is that containers share the kernel of the host system, while virtual machines do not.

A kernel is a program that acts as the core of a computer’s operating system, with complete control over everything in the system. It mediates and expedites interactions between software and hardware components. In most systems, it is one of the first programs loaded at start-up, just after the boot loader, and it then handles start-up tasks and input/output requests from software by translating them into data-processing instructions for the CPU. The kernel also manages memory and controls peripherals such as keyboards and printers.

Raghu Kishore Vempati, Director for Technology, Research, and Innovation at Altran, said:

“2020 will see some acceleration by organizations for transformation to a microservices-based architecture based on containers, from a service-oriented architecture (SOA). The adoption of Kubernetes as an orchestration platform will hence see a significant rise.”

Early Days of Data Containers

The story of containers begins with the personal project of a Finnish student, Linus Torvalds, who created a new operating system kernel in 1991 and made it free to use in 1992. The resulting “Linux kernel” has grown continuously ever since. Torvalds shared the 2012 Millennium Technology Prize, awarded by Technology Academy Finland, with Shinya Yamanaka, “in recognition of his creation of a new open source operating system for computers leading to the widely used Linux kernel.” (Microsoft, after a major internal battle over competition versus open source technology, or, more concisely, products versus services, began supporting and contributing to the open source Linux kernel in 2009.)

In 2000, FreeBSD (a free and open-source Unix-like operating system) “jails” became available. The jail mechanism allows system administrators to partition a FreeBSD computer system into a number of independent mini-systems – called jails – each sharing the same kernel with minimal overhead. The ability to establish multiple jails provides excellent flexibility in software management. An administrator can separate applications simply by installing different applications within each jail: one jail can hold all installed applications, or the software can be mixed and matched across jails.
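As a sketch of how this works in practice, a jail can be defined declaratively in FreeBSD’s /etc/jail.conf; the jail name, hostname, address, and path below are hypothetical, not from the article:

```
# Hypothetical /etc/jail.conf entry defining one jail named "www"
www {
    path = "/usr/jails/www";            # the jail's root directory
    host.hostname = "www.example.org";  # hostname seen inside the jail
    ip4.addr = 192.0.2.10;              # address assigned to the jail
    exec.start = "/bin/sh /etc/rc";     # run the normal startup scripts
    exec.stop = "/bin/sh /etc/rc.shutdown";
}
```

The administrator would then start the jail with `service jail start www` and open a shell inside it with `jexec www /bin/sh`.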

It should be noted that jails remain popular (they are free). FreeBSD jails can increase the security of a server by creating separation between each jail, the other jails, and the base system. FreeNAS® offers two ways to create a jail: the Jail Wizard, a quick and easy way to create one, and Advanced Jail Creation, in which every possible jail option is configurable. The latter is recommended for more advanced users with specific needs.

Solaris Containers

In 2004, Sun Microsystems released Solaris containers. While Solaris containers aren’t as adaptable or flexible as Linux containers, they are fairly easy to work with and offer some powerful features. These containers combine system resource controls with boundaries called “zones.” Each zone has its own node name, access to physical or virtual network interfaces, and assigned storage. Zones require no minimum amount of dedicated hardware beyond the disk storage used for their configuration: a zone does not need a dedicated CPU, physical network interface, memory, or HBA. Each zone is surrounded by a security boundary that prevents it from observing or interacting with other zones, and individual zones can be configured with separate user lists.
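As an illustrative sketch of the administration workflow (the zone name and path are hypothetical, and the exact steps vary by Solaris release), a zone is defined, installed, and booted with the zonecfg and zoneadm tools:

```shell
# Define a zone named web01 (name and zonepath are illustrative)
zonecfg -z web01 "create; set zonepath=/zones/web01; set autoboot=true"
zoneadm -z web01 install   # populate the zone's file system
zoneadm -z web01 boot      # boot the zone
zlogin web01               # log in to the running zone
```

Each zone booted this way shares the global kernel but behaves like an independent Solaris instance within its security boundary.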

Process Containers

In 2006, Paul Menage and Rohit Seth, working at Google, adapted the cpusets mechanism in the Linux kernel. Their “process containers” moved containerization forward significantly while keeping the changes minimally intrusive, with little impact on the kernel’s complexity, performance, code quality, and future compatibility. In late 2007, the feature was renamed “control groups” (cgroups) in an unsuccessful effort to avoid the confusion caused by the multiple meanings of the word “container.”
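Control groups are still exposed through a filesystem interface in modern Linux kernels. As a minimal sketch (assuming a cgroup v2 system and root privileges; the group name is hypothetical), capping the memory available to a process looks like this:

```shell
# Create a control group and cap its memory at 256 MiB (cgroup v2)
mkdir /sys/fs/cgroup/demo
echo 256M > /sys/fs/cgroup/demo/memory.max
# Move the current shell (and any children it spawns) into the group
echo $$ > /sys/fs/cgroup/demo/cgroup.procs
```

Container runtimes automate exactly this kind of bookkeeping, creating a group per container and writing its resource limits into the corresponding files.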

2013 Was a Big Year

Let Me Contain That For You (LMCTFY), an open-source version of Google’s container stack providing Linux “application containers,” was introduced to the public in 2013. An application’s design can include “container awareness,” allowing it to create and manage its own subcontainers. Google dropped LMCTFY in 2015, when it began donating core LMCTFY concepts to libcontainer, an open source project now hosted on GitHub.

The Docker project began in France as an internal project at dotCloud (now Docker, Inc.), a platform-as-a-service company. In March 2013, Docker was released to the public as open source software, and the popularity of containers exploded. Docker distinguished itself by offering a complete ecosystem for managing containers. Today, Docker’s container platform supports both traditional applications and microservices, running Linux- and Windows-based applications alike, and remains very popular.
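Part of Docker’s appeal is how little it takes to package an application. As a minimal sketch (the base image and script name below are illustrative choices, not from the article), a Dockerfile might look like:

```dockerfile
# Start from an official Python base image (illustrative choice)
FROM python:3.12-slim
WORKDIR /app
# Copy the application code into the image
COPY app.py .
# Command the container runs when it starts
CMD ["python", "app.py"]
```

Building and running the container would then be `docker build -t myapp .` followed by `docker run myapp`; the same image runs identically on any host with a Docker engine.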

Kubernetes

In 2014, Google launched Kubernetes, an open source system drawing on Borg. (Borg is Google’s internal cluster management system, developed in 2003.) The decision to launch was partly based on the philosophy “Everything at Google runs in a container,” which supported the company’s various service offerings and sparked its own internal battle of competition versus open source. Kubernetes is now maintained by the Cloud Native Computing Foundation, and Docker, Microsoft, IBM, and Red Hat are members of the open source Kubernetes community. Organizations and businesses continue to adopt containerized software at an accelerating rate, fueling Kubernetes’ success.

Kubernetes is a container orchestration system. It automates application deployment, scaling, and management; supports a broad range of container tools; and works well with Docker. Its purpose is to provide a “platform for automating deployment, scaling, and operations of application containers across clusters of hosts.” Many public clouds offer managed Kubernetes services or the infrastructure to run it.
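For a sense of what that orchestration looks like in practice, here is a sketch of a Kubernetes Deployment manifest (all names and the image are illustrative, not from the article):

```yaml
# Hypothetical Deployment asking Kubernetes to keep three replicas running
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # desired number of identical containers
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25  # illustrative container image
        ports:
        - containerPort: 80
```

Applied with `kubectl apply -f`, this declares a desired state; Kubernetes then continuously reconciles the cluster toward the three requested replicas, restarting or rescheduling containers across hosts as needed.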

Josh Komoroske, a senior DevOps engineer for StackRox, stated:

“As more and more organizations continue to expand on their usage of containerized software, Kubernetes will increasingly become the de facto deployment and orchestration target moving forward.”

The Container Ecosystem

rkt (pronounced “rocket”) was adopted by the Cloud Native Computing Foundation (CNCF) in 2017, the same year Docker donated the containerd project to the CNCF. rkt is an application container engine designed for cloud-native environments; containerd focuses on the runtime and is described as everything needed to construct a container platform. The container ecosystem has become a community-wide effort committed to supporting open source projects. This has, in turn, led to increased collaboration between projects and a community focused on improving how containers are used.

Security is a significant issue in an open ecosystem that shares container images easily. One development is the emergence of several container registries, which store container images and repositories and scan them for security weaknesses. Docker uses this as a security measure, providing an alternative to public repositories from unverified publishers, which could be a security threat. This kind of screening helps keep distorted or manipulated images out of circulation, improving data quality.

The most popular IaaS providers offer their own container registries, which is especially useful for projects heavily invested in AWS, Azure, or Google Cloud. These registries come with storage, default repository scanning, monitoring, finer-grained access controls, and several other networking tools. Third-party registries, such as Quay and GitLab, are also gaining in popularity. The options for registries are more numerous than for orchestration tools, and the market is wide open. Alternatively, third-party container security services (Twistlock and Aqua Security, for example) provide protection beyond the standard defaults.

KubeEdge

KubeEdge is an open source system, still in its early stages, for orchestrating native containerized applications on internet of things (IoT) devices at the edge. It is based on Kubernetes and provides the fundamental infrastructure support needed for networking and metadata synchronization between the cloud and the edge. KubeEdge is licensed under Apache 2.0 and is free for commercial and personal use. Its goal is to create an open platform that supports edge computing and extends containerized application orchestration services to hosts at the edge.

Vempati of Altran commented:

“As IoT and edge computing continue to gain traction in 2020, there will be an increased focus on hosting Kubernetes on devices and environments with a very low resource – CPU, memory – footprint.”
