The basics, the classics, and the magics about lists
I think lists are the fourth most preferred class of objects for storing data in R, after vectors, data frames, and matrices (take that, arrays!). Nevertheless, they are one of the most flexible ways to manage our data, since in a list we can combine objects of several dimensions and classes in an elegant way.
In this post I will cover three aspects of working with lists: the basics, including creation, subsetting, and common operations; the classics, including the family of “*pply” functions over lists; and finally the magics, about a recent package for working with lists.
Creation. To create a list from a set of objects we use the function list:
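A minimal sketch of list creation. The objects v, m, and df and the element names are hypothetical, chosen only to show that a list can hold different classes and dimensions:

```r
# Hypothetical objects of different classes and dimensions
v  <- 1:5
m  <- matrix(1:6, nrow = 2)
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# list() combines them into a single object; the names are optional
my_list <- list(numbers = v, mat = m, table = df)

str(my_list)     # compact overview of each element
length(my_list)  # number of top-level elements
```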
If we use a list inside a function or populate it in a loop, it is more efficient to preallocate it, i.e. to create an empty list of the desired length. We can do this with the function vector(?!). This is the “phantom” 8.1.53 from the R-inferno:
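A minimal sketch of preallocation, assuming a hypothetical length n and a toy computation inside the loop:

```r
n <- 3  # hypothetical desired length

# vector("list", n) creates a list with n NULL elements
results <- vector("list", n)

# Fill the preallocated slots instead of growing the list at each step
for (i in seq_len(n)) {
  results[[i]] <- i^2
}

results
```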
Single and double brackets to subset. To extract one element of a list we can use single or double brackets: single brackets return the element wrapped in a list, and double brackets return the element itself
Single brackets work in the same way as with vectors
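A short sketch of the difference, using a hypothetical list my_list (the names are illustrative only):

```r
# Hypothetical list with three named elements
my_list <- list(a = 1:3, b = "text", c = matrix(1:4, nrow = 2))

one <- my_list["a"]      # single brackets: a list of length 1
val <- my_list[["a"]]    # double brackets: the integer vector itself

# Single brackets accept the same index types as vectors
sub <- my_list[c(1, 3)]  # positive indices: elements a and c
neg <- my_list[-2]       # negative index: drop element b

class(one)
class(val)
```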
The classics: Apply family and plyr package
To do more advanced subsetting, or to apply a function to all elements of a list, we can make use of the apply family. Specifically, three functions are suitable for lists: lapply, sapply, and mapply.
Let’s first create a list of matrices:
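A possible list to work with, assuming three small 3x3 integer matrices; the name mats and the values are hypothetical, picked so that results are easy to check by hand:

```r
# Hypothetical list of three 3x3 matrices, filled column by column
mats <- list(
  matrix(1:9,   nrow = 3),
  matrix(10:18, nrow = 3),
  matrix(19:27, nrow = 3)
)

mats[[1]]
```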
lapply. Apply a function to each element of a list and get a list.
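For instance, a sketch with a hypothetical list mats of three 3x3 matrices, computing the row sums of each one:

```r
# Hypothetical list of three 3x3 matrices
mats <- list(matrix(1:9, 3), matrix(10:18, 3), matrix(19:27, 3))

# lapply always returns a list, one result per input element
row_sums <- lapply(mats, rowSums)

row_sums[[1]]  # row sums of the first matrix: 12 15 18
```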
sapply. Apply a function to each element of a list and get a vector (when the results can be simplified to one).
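A sketch with the same kind of hypothetical list mats, this time reducing each matrix to a single number:

```r
# Hypothetical list of three 3x3 matrices
mats <- list(matrix(1:9, 3), matrix(10:18, 3), matrix(19:27, 3))

# Each call returns one number, so sapply simplifies the result to a vector
totals <- sapply(mats, sum)

totals  # 45 126 207
```

When the function returns vectors of equal length greater than one (for example dim), sapply simplifies to a matrix instead.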
mapply. Apply a function to the corresponding elements of several lists. It is the multivariate version of sapply, so it returns a vector when possible. To return a list, use the argument SIMPLIFY=FALSE
This last example takes one list of indexes and a list of matrices, then returns the sum of the first row of the first matrix, the sum of the second row of the second matrix, and so on. mapply accepts multiple lists as inputs; just make sure that the function also accepts multiple inputs, in the matching order.
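A sketch assuming a hypothetical list of row indexes idx and a hypothetical list of matrices mats of the same length:

```r
# Hypothetical inputs: one row index per matrix
idx  <- list(1, 2, 3)
mats <- list(matrix(1:9, 3), matrix(10:18, 3), matrix(19:27, 3))

# The function receives one element of each list, in the order given
res <- mapply(function(i, m) sum(m[i, ]), idx, mats)
res  # 12 42 72

# SIMPLIFY = FALSE keeps the result as a list
res_list <- mapply(function(i, m) sum(m[i, ]), idx, mats, SIMPLIFY = FALSE)
```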
And speaking of functions and lists, the subsetting operator [ is also a function and can be used in lapply for more complex subsetting. For example, if we want to select the element [2,3] of each matrix:
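A sketch of the explicit form, written with an anonymous function and assuming a hypothetical list mats of three 3x3 matrices:

```r
# Hypothetical list of three 3x3 matrices
mats <- list(matrix(1:9, 3), matrix(10:18, 3), matrix(19:27, 3))

# Explicit anonymous function selecting row 2, column 3
explicit <- lapply(mats, function(x) x[2, 3])

unlist(explicit)  # 8 17 26
```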
is equivalent to:
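A sketch of the compact form, passing [ itself as the function, under the same assumption of a hypothetical list mats:

```r
# Hypothetical list of three 3x3 matrices
mats <- list(matrix(1:9, 3), matrix(10:18, 3), matrix(19:27, 3))

# "[" is the function; 2 and 3 are passed on as its extra arguments
compact <- lapply(mats, "[", 2, 3)

unlist(compact)  # 8 17 26
```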
This works because lapply passes any arguments written after the function on to that function, and x[2,3] is equivalent to `[`(x, 2, 3)
Finally, the package plyr makes the input and output of the apply family more flexible:
laply returns an array, ldply returns a data frame, llply returns a list, and l_ply returns… nothing. (But it is useful when the function is called for its side effects, such as plots.)
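The four functions above can be sketched as follows, assuming plyr is installed and using a hypothetical list mats; the column name total is also illustrative:

```r
# Hypothetical list of three 3x3 matrices
mats <- list(matrix(1:9, 3), matrix(10:18, 3), matrix(19:27, 3))

# Guarded so the script still runs when plyr is not installed
if (requireNamespace("plyr", quietly = TRUE)) {
  arr <- plyr::laply(mats, sum)   # list in, array (here a vector) out
  dfr <- plyr::ldply(mats, function(m) data.frame(total = sum(m)))  # data frame out
  lst <- plyr::llply(mats, dim)   # list in, list out
  plyr::l_ply(mats, function(m) print(dim(m)))  # called for side effects only
}
```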
The magics: rlist
Recently, Kun Ren developed rlist, a new package with several utilities oriented to lists. In the same way that dplyr is an upgrade of plyr focused on data frames, we can think of rlist as an upgrade for lists, one that can also be used together with pipes such as those provided by magrittr or pipeR (I will write about pipes in another post). Here are just some of the functions included in rlist, where x ~ x[2,1]<0 is rlist's notation for an element x of the list such that x[2,1]<0:
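A small sketch of two rlist functions, list.filter and list.map, assuming rlist is installed. The list mats and its values are hypothetical, chosen so that some matrices have a negative element at [2,1]:

```r
# Hypothetical list of matrices; the second and third have x[2,1] < 0
mats <- list(
  matrix(1:9,  nrow = 3),
  matrix(-4:4, nrow = 3),
  matrix(c(2, -1, 0, 5), nrow = 2)
)

# Guarded so the script still runs when rlist is not installed
if (requireNamespace("rlist", quietly = TRUE)) {
  library(rlist)
  # Keep only the matrices whose element [2,1] is negative
  negatives <- list.filter(mats, x ~ x[2, 1] < 0)
  # Map each matrix to the sum of its elements (returns a list)
  sums <- list.map(mats, x ~ sum(x))
  print(length(negatives))
  print(unlist(sums))
}
```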
In conclusion, lists are another way to manage our data inside R, and sometimes there are not many alternatives (for example, for unstructured data, as in the example on the rlist webpage). Lists provide a flexible way of dealing with data of different classes and dimensions, and knowing the basics, the classics, and the magics can save us time and headaches.