A Brief History of Open Source Data Technologies

Openly sharing information has been a part of human culture since the beginning of civilization. Information would be shared with the general community and the practice has had a powerful impact on the development of tools and machinery.

Opposed to this practice is the concept of ownership and control over new ideas, also known as “intellectual property.” Patents and copyrights, for example, are based on the belief that inventors should receive payment when others use or imitate their novel creations.

While the open sharing of new ideas is difficult to abuse, patents can be abused with a reasonable amount of planning. For example, in 1879, George B. Selden, a patent lawyer, applied for a patent claiming ownership of the “idea” of a two-cycle gasoline-powered engine and, for devious monetary reasons, delayed its approval by the patent office until 1895.

While he was delaying the patent’s approval (but not its application date), cars were being designed and built. In 1899, the Electric Vehicle Company purchased exclusive rights to his patent for $15 per car (or $420 per car by today’s inflated standards), and then began successfully suing the manufacturers of gasoline-powered vehicles for patent infringement.

By 1904, 30 automobile manufacturers were paying 1.25 percent of their vehicle sales to the Electric Vehicle Company, with a fifth of the payments going to Selden. This went on until 1911, when Henry Ford, using a four-cylinder, “four-cycle” engine, broke free of the patent-based stranglehold, and Selden’s unethical patent suddenly became worthless. (People who own website names and charge for their use have a similar stranglehold on website owners.)

The Automobile Board of Trade (eventually known as the Motor Vehicle Manufacturers Association) came into being the same year, and developed a cross-licensing agreement shared by all U.S. automobile manufacturers. Each company could develop new technology and file for patents, but the patents were openly shared and there was no exchange of money between the manufacturers, nor any lawsuits. Clearly, auto manufacturers did not want to be caught in a similar patent-based stranglehold.

The Free Software Movement

“Free software” can mean no payment required, but in terms of the Free Software Movement, the term refers to software which users can freely copy, change, improve upon, run, and distribute. The Free Software Movement is more about liberty and freedom of action than about price. Free software comes with very few restrictions, and no “profit seeking intent.” To further these goals, the Free Software Foundation was formed in 1985. The Free Software Foundation’s mission statement is:

“To preserve, protect, and promote the freedom to use, study, copy, modify, and redistribute computer software, and to defend the rights of Free Software users.”

The Open Source Movement

While “free software” can be described as a social movement with an emphasis on freedom, Open Source software would be described as a collective effort to improve and develop software, by using the public as a resource.

In a sense, Open Source relies on people’s “better angels” to develop software and technology. Generally speaking, Open Source describes software whose source code is published and made available to the public, allowing anyone to use, copy, modify, and redistribute it without payment of royalties or fees. This allows Open Source code to evolve organically, by way of community cooperation.

The Open Source Initiative, as an official organization, was created in 1998, and acts as an advocate, an educator, and a steward of Open Source activities. The development of Open Source software (collaborative software development) by multiple independent programmers provides more “original” designs than any single company could ever hope to provide. Some commercial software vendors saw this situation as a threat. In 2001, Jim Allchin, then a Microsoft executive, publicly stated:

“Open Source is an intellectual property destroyer. I can’t imagine something that could be worse than this for the software business and the intellectual-property business.”

Microsoft has since reversed its position on Open Source, and, along with Google, IBM, Oracle, and State Farm, is now establishing an official Open Source presence on the internet. Needless to say, this has created significant confusion on the meaning of a changing capitalist model. (The ownership and unethical use of intellectual property can be considered extreme capitalism, while Open Source initiatives qualify as a form of personal synergy).

UNIX Sharing Code (AT&T)

UNIX played a major role in the evolution of modern computing. In 1969, AT&T Bell Labs began developing a small operating system called UNIX. The goal was a portable, multitasking system for multiple users in a time-sharing configuration. In 1972, UNIX was rewritten in the programming language C, which allowed the system to be moved “from its original hardware,” making it portable.

An antitrust case blocked AT&T from entering the computer business, and required them to license their system’s source code to all requesting it. This resulted in academic institutions and businesses quickly taking advantage of the UNIX program. Programmers at the University of California at Berkeley developed their own evolutionary version of the operating system, titled the Berkeley Software Distribution, which became accessible to the general public. (The full story is much more complicated.)

Mozilla is Born from Netscape Source Code in 1998

Netscape Communications launched Netscape Navigator, the first true commercial web browser, in 1994. At the time, it had no real competition. Microsoft, however, was working on Internet Explorer, and in 1996 came up with a browser capable of competing with Netscape’s. The new competition prompted Netscape to release its source code to the public in 1998, with the intent of imitating UNIX and using the public as a development resource.

Unfortunately, this step stalled development of Netscape’s newest browser platform, giving Microsoft the edge needed for Internet Explorer to become “the most used browser.” Netscape Communications never recovered, and was purchased by AOL. On March 1, 2008, Netscape was officially discontinued, ending support for all Netscape clients, products, and browsers, which left its users stunned and frustrated. However, Netscape’s Open Source release of its source code prompted the creation of the Mozilla Organization.


Linux

Linux provides one of the most obvious examples of Open Source software collaboration in the history of computing. Linux was created by Linus Torvalds in 1991. He was a student at the University of Helsinki, where he had been working with Minix, a Unix-like system, and started designing his own kernel. Torvalds started with hard-drive access and device drivers, laying out a basic design he called Version 0.01. The kernel, which came to be called Linux, was later combined with the Open Sourced GNU system (pronounced g’noo) to produce a totally free operating system.

The Linux source code may be used, modified, and distributed by anyone. The majority of work done on Linux is performed by the Linux community, which includes thousands of programmers from around the world who send suggestions for improvement to the maintainers. Companies have also helped with the development of the Linux kernel, and with developing the “extra” software normally used with it.

Apache Software Foundation (ASF)

The Apache Software Foundation’s mission is to provide software for the public good. It was established in 1999 as a charitable organization. It receives funding from individuals and corporate sponsors and uses an all-volunteer board of directors. The organization oversees more than 350 Open Source projects.

Apache Hadoop

Apache Hadoop grew out of Nutch, a project designed by two people, Doug Cutting and Mike Cafarella. They were designing a search engine system capable of indexing one billion pages, which they later combined with MapReduce. Hadoop’s cost benefits come from using computer clusters built on commodity hardware. Large data sets are broken up and stored on the local disks of the cluster’s machines. Failures are handled in software, rather than by relying on expensive servers. Hadoop became an immediate success because it is:

  • Free
  • Very Scalable: It stores large data sets across hundreds of low-cost servers
  • Flexible: Provides access to new data sources and can access different kinds of data
  • Very Fast: Can efficiently handle terabytes of data within minutes, and petabytes within hours
  • Backed Up: Data is sent to individual nodes, and then replicated to other nodes, as backup
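The MapReduce model mentioned above can be illustrated with a minimal, single-machine sketch in plain Python. It is not Hadoop’s API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample data are hypothetical, and each string in `chunks` merely stands in for a block of data that a real cluster would store on a separate node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks):
    # Each chunk is mapped independently; on a cluster this step runs in parallel.
    mapped = []
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample data: three "blocks" of a larger data set.
chunks = ["open source data", "open data tools", "source code"]
counts = word_count(chunks)
print(counts)
```

Because the map step looks at one chunk at a time and the reduce step looks at one key at a time, both can be spread across many inexpensive machines, which is the property the list above describes as scalability.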

Apache Spark

Apache Spark has gained popularity quite quickly since its release. It is faster and more scalable than Apache Hadoop. Businesses such as Yahoo, Netflix, and eBay have started using it on a massive scale. Spark, combined with Hadoop, has very quickly become one of the largest Open Source communities working with big data. (Spark comes with no file management system of its own, but can manage files using Hadoop’s Distributed File System.)

Apache Storm

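Apache Storm is a free, Open Source real-time computation system. It can process large amounts of big data in, essentially, real time, is scalable, and is easy to operate. Storm’s documented use cases include real-time analytics, online machine learning, continuous computation, distributed RPC, and ETL.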

Apache Hive

Apache Hive is an Open Source system designed to query data using an SQL-based language. It will summarize and analyze data, turning it into useful business insights. Hive is compatible with traditional data integration and data analytics tools.

Many data warehousing applications are compatible with SQL-based querying languages, and Hive supports the portability and transfer of SQL-based data to Hadoop. Though originally developed by Facebook, Hive has been used and developed by companies such as the Financial Industry Regulatory Authority and Netflix.
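As a rough illustration of the kind of SQL summarization described above, the sketch below runs a standard `GROUP BY` query. It uses Python’s built-in `sqlite3` module as a stand-in, not Hive itself, and the `page_views` table, its columns, and its rows are hypothetical. The query text is ordinary SQL of the sort an SQL-based language like Hive’s is meant to accept, but nothing here exercises Hive or Hadoop.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("US", 120), ("DE", 45), ("US", 80), ("DE", 15), ("FR", 30)],
)

# Summarize raw rows into one total per country, largest first.
summary = conn.execute(
    "SELECT country, SUM(views) AS total_views "
    "FROM page_views GROUP BY country ORDER BY total_views DESC"
).fetchall()
conn.close()

print(summary)
```

The appeal of an SQL-based interface is that analysts can write a query like this without writing the underlying distributed jobs by hand.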

Apache Pig

Apache Pig is a platform for analyzing large data sets. It consists of a high-level language designed to express data analysis programs, combined with infrastructure to evaluate those programs. The important property of Apache Pig programs is that their structure allows substantial parallelization. This enables the system to process very large data sets.

Open Source Hardware

Open Source hardware is hardware whose specifications have been published and made accessible to the general public, allowing individuals to copy, modify, and redistribute it without paying royalties or fees. This policy applies to Open Source robotics as well. Open Source hardware is based on community cooperation. The communities are generally made up of hobbyists, hardware and software developers, and some large businesses.
