R: A free software project.
R was announced to the world on August 4th, 1993, when Ross Ihaka sent the following email to the “S-news” mailing list:
About a year ago Robert Gentleman and I considered the problem of obtaining decent statistical software for our undergraduate Macintosh lab. After considering the options, we decided that the most satisfactory alternative was to write our own. We started by writing a small lisp interpreter. Next we expanded its data structures with atomic vector types and altered its evaluation semantics to include lazy evaluation of closure arguments and argument binding by tag as well as order. Finally we added some syntactic sugar to make it look somewhat like S. We call the result “R”.
As Prof. Ihaka recounts in the “R: Past and future” article, R only became free software in June 1995, when its source code was released under the GNU General Public License (GPL) of the Free Software Foundation (FSF). According to the FSF definition, “a program is free software, for you, a particular user, if:
- You have the freedom to run the program as you wish, for any purpose.
- You have the freedom to modify the program to suit your needs. (To make this freedom effective in practice, you must have access to the source code, since making changes in a program without having the source code is exceedingly difficult.)
- You have the freedom to redistribute copies, either gratis or for a fee.
- You have the freedom to distribute modified versions of the program, so that the community can benefit from your improvements.”
Free and non-free extensions (a.k.a. packages)
Under the conditions of the GPL, it is perfectly possible to create a commercial (i.e. non-free) package, as long as you do not modify or include code from another package licensed under the GPL. The R Foundation clarifies its position regarding non-free packages in this statement.
Nevertheless, the use of FOSS (free and open source software) is encouraged on CRAN, and common options for developers include the GPL, MIT, and BSD licenses. Bioconductor and GitHub follow a similar guideline.
Let’s write some code to analyze the different licenses used by the packages submitted to CRAN:
Now we have a data frame with the information on the packages available on CRAN. We included a “License” column where we will store the license information for each package.
I am not particularly proud of that code, since it hides a for loop and is quite slow. However, the “readLines” function is not vectorized, so I need to open one connection (text file) at a time. If somebody knows a better way to do it, keep it to yourself… or, better, share it in the comments :).
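One possible loop-free route, offered as a sketch rather than as the code used for this post, is to rely on the package index that `available.packages()` downloads in a single call, since that index carries a "License" field. The helper name `cran_licenses` and the mirror URL are illustrative assumptions, and the three-row sample data frame is made up so that the snippet runs without network access.

```r
# Sketch: fetch package names and licenses from a CRAN mirror in one call.
# cran_licenses() is a hypothetical helper; calling it needs network access.
cran_licenses <- function(repos = "https://cloud.r-project.org") {
  pkgs <- available.packages(repos = repos)
  data.frame(
    Package = unname(pkgs[, "Package"]),
    License = unname(pkgs[, "License"]),
    stringsAsFactors = FALSE
  )
}

# Made-up sample with the same two columns, for trying things offline.
sample_pkgs <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  License = c("GPL-2", "MIT + file LICENSE", "GPL (>= 2)"),
  stringsAsFactors = FALSE
)
str(sample_pkgs)
```

With a connection available, `pkgs <- cran_licenses()` would return one row per package in the index.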
We can find out how many different licenses are used in CRAN:
Or check the most common ones…
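A minimal sketch of both steps, assuming the licenses are held in a character vector. The vector below is made-up sample data, not the real CRAN column, and the variable names are illustrative.

```r
# Made-up license strings standing in for the real "License" column.
licenses <- c("GPL-2", "GPL-2", "GPL (>= 2)", "MIT + file LICENSE",
              "GPL-2", "LGPL-3", "MIT + file LICENSE")

# Number of distinct license strings.
n_distinct <- length(unique(licenses))
n_distinct

# Frequency table, most common first.
license_counts <- sort(table(licenses), decreasing = TRUE)
head(license_counts, 3)
```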
We can group some licenses into a reduced set. To do so, we consider just the first word of each License:
We make some corrections with acronyms…
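The grouping and the acronym corrections could look roughly like the sketch below. The sample strings and the `acronyms` mapping are assumptions chosen for illustration; a real mapping would need care, for example to keep the GNU Lesser GPL apart from the GPL.

```r
# Made-up license strings standing in for the real "License" column.
licenses <- c("GPL-2", "GPL (>= 2)", "MIT + file LICENSE", "LGPL-3",
              "GNU General Public License", "BSD_3_clause + file LICENSE",
              "CC BY-NC-SA 4.0", "Creative Commons Attribution 4.0",
              "file LICENSE")

# Keep the leading run of letters as the license family ("first word").
family <- sub("^([A-Za-z]+).*$", "\\1", licenses)

# Replace spelled-out names with their acronyms (illustrative mapping only).
acronyms <- c(GNU = "GPL", Creative = "CC")
fix <- family %in% names(acronyms)
family[fix] <- unname(acronyms[family[fix]])

table(family)
```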
And we plot.
The vast majority of packages chose the GPL, the LGPL, or other free software licenses. There are, however, some with restrictions, such as those that use the NC (NonCommercial) option of the Creative Commons license:
Other special cases are packages licensed with “file LICENSE”, where the license is specified in a file (Example), and the “Unlimited” license (Example), which is not actually unlimited but is subject to national laws (where, in most cases, “All rights reserved” is implied). For those packages, it is better to contact the authors before modifying the code or using it for commercial purposes.
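A small sketch of how such packages could be picked out, assuming a data frame with "Package" and "License" columns. The package names and license strings below are invented for the example.

```r
# Made-up packages; only the License strings matter here.
pkgs <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  License = c("GPL-2", "CC BY-NC-SA 4.0", "MIT + file LICENSE", "CC BY-NC 3.0"),
  stringsAsFactors = FALSE
)

# Match "NC" as a separate token, so that strings which merely contain
# those two letters inside a longer word are not flagged.
is_nc <- grepl("\\bNC\\b", pkgs$License, perl = TRUE)
pkgs[is_nc, ]
```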
The other open and commercial flavors of R
Two of the problems that companies can face when using R for business are its lack of commercial support and the fact that it is not ready for use with big data (at least not directly). For this reason, a few firms offer enterprise-oriented modified versions of R, generally under commercial licenses.
Some of these versions are:
- Revolution R Open. This is the Open Source version of Revolution R. It includes the MKL library for faster computation (especially on Windows), the Reproducible R Toolkit (a set of tools to ensure reproducibility), and support.
- Revolution R Plus. Available via an annual subscription, it includes Revolution R Open and some open source packages developed by Revolution (DeployR, ParallelR, and RHadoop), plus a technical support service.
- TIBCO Enterprise Runtime for R (TERR). A TIBCO version of R designed to be integrated with their TIBCO Spotfire analytics software.
- Oracle R Enterprise. This version integrates R with the Oracle database.
- Oracle R Advanced Analytics for Hadoop. Equivalent to R Enterprise, but designed to be integrated with Hadoop.
- Renjin. This is an Open Source implementation of R for the Java Virtual Machine (JVM). Commercial support is available.
- pqR. A “pretty quick version” of R, based on R-2.15.0 and licensed under the GPL.
- FastR. This version is an Open Source implementation of the R language in Java.
- Riposte. An Open Source C++ JIT implementation of R.
- R-Plus. This company claims their product is the “Real R”, but no detailed information can be found on their website.
The fact that R is free software has been crucial to its development and adoption in the data analysis community. When the code is open, anybody can verify that a given algorithm is correctly implemented, contribute improvements, or fix bugs quickly. This spirit of collaboration, promoted by open source and free software licences, has been inherited by R users, who have improved the software by developing packages, so that almost any theoretical development in statistics or data analysis quickly becomes available to the world.
Nevertheless, those who require technical support or seamless integration with big data platforms can find high-standard solutions from reputable companies such as Revolution Analytics or Oracle. The adoption of R in the business world is now a reality and, besides the examples mentioned, many other big IT and analytics firms recommend integrating R with their systems. But we will talk about that integration in a future post…
Thank you for reading, and if you liked the post, please let your best friends know!