class: center, middle, inverse, title-slide # Tips, trucos, y algunos paquetes para programacion eficiente en R ### Adolfo Alvarez (Collegium Da Vinci, Analyx) ### 2019-11-15 --- class: center, middle # http://adolfoalvarez.cl/r_eficiente/ ## Pero antes, un pequeño avance... --- class: center, middle # ... y las presentaciones --- background-image: url("img/quien_soy.png") background-size: cover --- class: inverse, center, middle # I. The art of reusable code ![](img/reuse.gif) --- # Code that you can reuse is code you don't need to write again! Reusable code is: - Easy to understand - Easy to share - Easy to adapt --- # Easy to understand - Use proper names - Comment your code! - Use a code style > Does making code efficient conflict with wanting to make code easy to understand? Making your code easy to understand often leads to code that is well architected and easy to test. --- # The Fundamental Theorem of Readability > Code should be written to minimize the time it would take for someone else to understand it. > > --- The art of reusable code -- Someone else can be yourself after a while! -- .center[ <img src="img/theart.png" width="40%" /> ] --- class: inverse, center, middle # II. Code style ![](img/style.gif) --- # Naming objects ```r function1 <- function(v){ retval <- 0 for (i in 1:length(v)){ retval = retval + v[i]*v[i] } return(sqrt(retval)) } ``` What does this function do? -- The names function1 and retval doesn’t pack much information. Instead, use a name that describes the variable’s value.! --- # Naming objects ```r euclidean_norm <- function(v){ sum_squares <- 0 for (i in 1:length(v)){ sum_squares = sum_squares + v[i]*v[i] } return(sqrt(sum_squares)) } ``` -- ## This is not about speed yet! --- # Coding styles - A good recommendation is to use a code style. - This will make your code easier to understand and reproduce - And also freeing your mind about many naming decisions, so you can focus on the analysis. > Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. .footnote[ [1] https://style.tidyverse.org/ ] --- # Coding styles Two of the most used R coding styles: - The tidyverse style guide [https://style.tidyverse.org/](https://style.tidyverse.org/) - The Google’s R Style Guide [https://google.github.io/styleguide/Rguide.html](https://google.github.io/styleguide/Rguide.html) Other styles: - Bioconductor [https://bioconductor.org/developers/how-to/coding-style/](https://bioconductor.org/developers/how-to/coding-style/) - Jean Fan style [https://jef.works/R-style-guide/](https://jef.works/R-style-guide/) --- # Coding style - Names - Spacing (e.g. x[, 1] x[,1]) - Indenting (e.g. tabs, 2/4 spaces) - Organization (e.g. loading libraries sequentially load all libraries at the start) - Documentation(e.g. commented code use comments to record important findings and analysis decisions) https://beautifyrstats.netlify.com --- # Some tidyverse recommendations for naming stuff in R - Variable and function names should use only lowercase letters, numbers, and `_`. - Use underscores (`_`) (so called snake case) to separate words within a name. ```r # Good day_one day_1 # Bad DayOne dayone ``` --- # Some tidyverse recommendations for naming stuff in R Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful (this is not easy!). ```r # Good day_one # Bad first_day_of_the_month djm1 ``` --- # Some tidyverse recommendations for naming stuff in R Where possible, avoid re-using names of common functions and variables. This will cause confusion for the readers of your code. ```r # Bad T <- FALSE c <- 10 mean <- function(x) sum(x) ``` --- # Some tidyverse recommendations for naming stuff in R - R has strict rules about what constitutes a valid name. - A syntactic name must consist of letters, digits, `.` and `_` but can’t begin with `_` or a digit. - Additionally, you can’t use any of the reserved words like TRUE, NULL, if, and function (see the complete list in ?Reserved). - A name that doesn’t follow these rules is a non-syntactic name; if you try to use them, you’ll get an error: ```r _abc <- 1 if <- 10 ``` It’s possible to override these rules and use any name, i.e., any sequence of characters, by surrounding it with backticks: ```r `_abc` <- 1 `if` <- 10 ``` --- # Two tricks for changing names - In your code: The "Rename in Scope" function of RStudio ```r function1 <- function(v){ retval <- 0 for (i in 1:length(v)){ retval = retval + v[i]*v[i] } return(sqrt(retval)) } ``` --- # Two tricks for changing names - In your code: The "Rename in Scope" function of RStudio ```r function1 <- function(v){ retval <- 0 for (i in 1:length(v)){ retval = retval + v[i]*v[i] } return(sqrt(retval)) } ``` --- # Two tricks for changing names - In your data: The janitor package clean_names() and make_clean_names() ```r library(janitor) ugly_names <- c("var 1", "var 20%", "var...3", "vár 4") make_clean_names(ugly_names) ``` ``` ## [1] "var_1" "var_20_percent" "var_3" "var_4" ``` --- # A harder example, what does this function do? ```r function1<-function(p1=5,p2=1,p3=2){output<-matrix(ncol=p1,nrow=p1,0) for(row in 1:(p1-1)){output[row,row-1]=p3 output[row,row+1]=p3} output[row-1,row]=p3 output[row+1,row]=p3 diag(output)=p2 return(output) } ``` -- ```r function1(5,4,3) ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 4 3 0 0 0 ## [2,] 3 4 3 0 0 ## [3,] 0 3 4 3 0 ## [4,] 0 0 3 4 3 ## [5,] 0 0 0 3 4 ``` --- # Good names are not enough ```r create_tridiagonal<-function(n=5,d1=1,d2=2){output_matrix<-matrix(ncol=n,nrow=n,0) for(row in 1:(n-1)){output_matrix[row,row-1]=d2 output_matrix[row,row+1]=d2} output_matrix[row-1,row]=d2 output_matrix[row+1,row]=d2 diag(output_matrix)=d1 return(output_matrix) } ``` -- ```r create_tridiagonal(5,4,3) ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 4 3 0 0 0 ## [2,] 3 4 3 0 0 ## [3,] 0 3 4 3 0 ## [4,] 0 0 3 4 3 ## [5,] 0 0 0 3 4 ``` --- #More info about the tidyverse coding style https://style.tidyverse.org/ Two R packages support this style guide: - styler allows you to interactively restyle selected text, files, or entire projects. It includes an RStudio add-in, the easiest way to re-style existing code. - lintr performs automated checks to confirm that you conform to the style guide. --- # Demo of the styler package ```r create_tridiagonal<-function(n=5,d1=1,d2=2){output_matrix<-matrix(ncol=n,nrow=n,0) for(row in 1:(n-1)){output_matrix[row,row-1]=d2 output_matrix[row,row+1]=d2} output_matrix[row-1,row]=d2 output_matrix[row+1,row]=d2 diag(output_matrix)=d1 return(output_matrix) } ``` --- # Demo of the lintr package ```r lintr::lint("lynter_example.R") ``` --- # Ahora vosotros El problema previo fue originalmente propuesto por Juan Jose Gibaja y Carlos Gil Bellosta en su segunda edicion del "Curso de R basico": **Create a function to create a tridiagonal matrix, with input parameters:** - `n` (the dimension of the matrix should be `n x n`) - `d1` (the diagonal of the matrix should be filled with this value) - `d2` (the upper an lower diagonal of the matrix should be filled with this value) So, for example, `create_tridiagonal(3,1,2)` should return: ``` ## [,1] [,2] [,3] ## [1,] 1 2 0 ## [2,] 2 1 2 ## [3,] 0 2 1 ```
15
:
00
--- class: inverse, center, middle # III. Measuring the efficiency of your code ![](img/speed.gif) --- # Measuring efficiency - Readability is a subjective measure, and we already cover it. - Most of the time you will easily know if your code is working or not. - Test your code! (shinytest, testthat...) But today we will focus on: - Memory - Time --- # Memory By default in R you need to limit your analysis to RAM memory, but what to do if you are short of memory? .pull-left[ - Buy more RAM - Ask your company/university/city/country permission to use more powerful servers. (e.g. RES - Red Española de Supercomputación https://www.bsc.es/res-intranet) - Use cloud companies - Use Big data packages (bigmemory, Matrix) ] -- .pull-right[ OR - Optimize your code so some memory intensive calculations are not needed ] --- # How to measure memory in R ```r a <- rnorm(1000000) print(object.size(a), units = "Mb") ``` ``` ## 7.6 Mb ``` ```r library(pryr) ``` ``` ## Registered S3 method overwritten by 'pryr': ## method from ## print.bytes Rcpp ``` ```r object_size(a) ``` ``` ## 8 MB ``` ```r b <- a object_size(b) ``` ``` ## 8 MB ``` What will be the result of `object.size(a,b)` ? --- # How to measure memory in R ```r object_size(a,b) ``` ``` ## 8 MB ``` ```r b[1] <- 0 object_size(a,b) ``` ``` ## 16 MB ``` ```r mem_used() ``` ``` ## 62.5 MB ``` ```r mem_change(x <- 1:1e6) ``` ``` ## 784 B ``` ```r mem_change(rm(x)) ``` ``` ## 384 B ``` --- # Garbage collector Does some of you use the gc() function? -- ![](img/excelent.gif) --- # How fast is your computer? ```r #install.packages("benchmarkme") library("benchmarkme") ``` Things you can check with the benchmarkme package: The package has a few useful functions for extracting system specs: - RAM: get_ram() - CPUs: get_cpu() - BLAS library: get_linear_algebra() - Is byte compiling enabled: get_byte_compiler() - General platform info: get_platform_info() - R version: get_r_version() Try yours! --- # Perform tests and compare with other machines ```r res <- benchmark_std(runs = 3) rank_results(res) plot(res) ``` --- # Measuring time - As usual, there are several ways to measure execution time of your code. - The R base provide with Sys.time() and system.time() ```r start_time <- Sys.time() Sys.sleep(3) end_time <- Sys.time() end_time - start_time ``` ``` ## Time difference of 3.00689 secs ``` ```r system.time({ Sys.sleep(3)}) ``` ``` ## user system elapsed ## 0.001 0.000 3.001 ``` - "User CPU time” gives the CPU time spent by the current process and “system CPU time” gives the CPU time spent by the kernel (the operating system) on behalf of the current process. (Post by William Dunlap) --- # Another alternatives for measuring time The tictoc package may be familar for those coming from Matlab world ```r library(tictoc) tic("sleeping") Sys.sleep(3) toc() ``` ``` ## sleeping: 3.005 sec elapsed ``` --- # Measuring the time: Benchmark packages - The bench package ```r x <- runif(100) (lb <- bench::mark( sqrt(x), x ^ 0.5 )) ``` ``` ## # A tibble: 2 x 6 ## expression min median `itr/sec` mem_alloc `gc/sec` ## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> ## 1 sqrt(x) 546ns 976ns 1051848. 848B 0 ## 2 x^0.5 13µs 18.8µs 56409. 848B 0 ``` --- # Measuring the time: Benchmark packages - The microbenchmark package ```r library(microbenchmark) x <- runif(100) microbenchmark( sqrt(x), x ^ 0.5 ) ``` ``` ## Unit: microseconds ## expr min lq mean median uq max neval ## sqrt(x) 1.025 1.2150 1.44257 1.2830 1.544 4.026 100 ## x^0.5 17.209 17.5015 18.95846 17.7265 18.010 51.235 100 ``` --- # Measuring the time: Benchmark packages - And the rbenchmark package ```r library(rbenchmark) x <- runif(100) benchmark( "method1" = sqrt(x), "method2" = x ^ 0.5 ) ``` ``` ## test replications elapsed relative user.self sys.self user.child ## 1 method1 100 0.002 1.0 0.001 0 0 ## 2 method2 100 0.003 1.5 0.003 0 0 ## sys.child ## 1 0 ## 2 0 ``` --- # Your turn again - Compare your solution for the tridiagonal matrix problem with the one we presented before (file create_tridiagonal.R). - Benchmark them by generating a tridiagonal matrix of dimension 5000x5000 --- class: inverse, center, middle # IV. How to improve? ![](img/turtle.gif) --- # Profiling - In case we want to find out which part of our code is the most time consuming we can use profiling tools. - Again, there are many options available but I recommend you the profvis package - You can profile a piece of code by using ```r profvis({ #The code to be evaluated }) ``` ## Two tips - If the code is too fast, there is nothing to profile, and it will not run! - If you want to profile a single function, save that function in an .R file and profile it with: ```r source("my_function.R") profvis({ myfunction(x) }) ``` --- # What are the two main time consuming parts of our "create_tridiagonal" function? ```r create_tridiagonal <- function(n = 5, d1 = 1, d2 = 2) { output_matrix <- matrix(ncol = n, nrow = n, 0) for (row in 1:(n - 1)) { output_matrix[row, row - 1] <- d2 output_matrix[row, row + 1] <- d2 } output_matrix[row - 1, row] <- d2 output_matrix[row + 1, row] <- d2 diag(output_matrix) <- d1 return(output_matrix) } ``` ```r source("create_tridiagonal.R") profvis::profvis(create_tridiagonal(5000,1,-1)) ``` --- # What are the two main time consuming parts of our "create_tridiagonal" function? ![](img/profile_example.png) --- # Three things to improve - Create the matrix() and assign the diag() in one step - Inside the for() cycle fill the upper and low diagonal in one step ```r create_tridiagonal2 <- function(n = 5, d1 = 1, d2 = 2) { #We create the matrix and assign the diagonal in one step output_matrix <- diag(d1, n) #"row" is a function in base r, so I prefer "i" as counter # We put the d2 value in one step instead of 2 for (i in 1:(n - 1)) { output_matrix[i, c(i - 1, i + 1)] <- d2 } output_matrix[c(i - 1, i + 1), i] <- d2 return(output_matrix) } ``` --- #Now let's compare ```r library(rbenchmark) benchmark(m1 <- create_tridiagonal(5000,1,-1), m2 <- create_tridiagonal2(5000,1,-1), replications = 3) ``` ``` ## test replications elapsed relative ## 1 m1 <- create_tridiagonal(5000, 1, -1) 3 1.476 2.295 ## 2 m2 <- create_tridiagonal2(5000, 1, -1) 3 0.643 1.000 ## user.self sys.self user.child sys.child ## 1 0.686 0.790 0 0 ## 2 0.298 0.344 0 0 ``` --- # Second approach: We create the vector of elements and shape it into a matrix ```r diagonal <- function(n, d1, d2){ matrix(c( c(d1,d2,rep(0, n-2)), rep(c(d2,d1,d2, rep(0,n-2)), n-2), c(d2,d1) ), ncol=n, byrow = T) } ``` --- # How to improve it? ```r source("diagonal.R") profvis::profvis(diagonal(5000,1,-1)) #Not very informative, everything is just one function: matrix() ``` ![](img/profvis2.png) --- #Things to improve: - Reduce the number of calls for rep(). Currently 3 - Use rep.int which is faster ```r diagonal2 <- function(n, d1, d2){ #Notice that we now create only a short version of the vector, #the matrix() function is filling it out #But it will raise a warning! matrix(c(d1, d2, rep.int(0,n-2), d2), nrow = n, ncol = n) } ``` --- #Now let's compare ```r benchmark(m3 <- diagonal(5000,1,-1), m4 <- diagonal2(5000,1,-1), replications = 3) ``` ``` ## Warning in matrix(c(d1, d2, rep.int(0, n - 2), d2), nrow = n, ncol = n): ## data length [5001] is not a sub-multiple or multiple of the number of rows ## [5000] ## Warning in matrix(c(d1, d2, rep.int(0, n - 2), d2), nrow = n, ncol = n): ## data length [5001] is not a sub-multiple or multiple of the number of rows ## [5000] ## Warning in matrix(c(d1, d2, rep.int(0, n - 2), d2), nrow = n, ncol = n): ## data length [5001] is not a sub-multiple or multiple of the number of rows ## [5000] ## Warning in matrix(c(d1, d2, rep.int(0, n - 2), d2), nrow = n, ncol = n): ## data length [5001] is not a sub-multiple or multiple of the number of rows ## [5000] ``` ``` ## test replications elapsed relative user.self ## 1 m3 <- diagonal(5000, 1, -1) 3 3.573 5.985 2.423 ## 2 m4 <- diagonal2(5000, 1, -1) 3 0.597 1.000 0.269 ## sys.self user.child sys.child ## 1 1.147 0 0 ## 2 0.329 0 0 ``` ### Around 6 times faster! --- # Third approach: Sum of three matrices, one for main diagonal, one for lower, one for upper. ```r tridiagonal_matrix <- function(n, d1, d2){ main_diag <- diag(n) * d1 lower_diag <- cbind(diag(n)[,c(2:n)],rep(0,n)) * d2 upper_diag <- rbind(diag(n)[c(2:n),],rep(0,n)) * d2 return(main_diag + lower_diag + upper_diag) } ``` --- # How to improve it? ```r source("tridiagonal_matrix.R") profvis::profvis(tridiagonal_matrix(5000,1,-1)) ``` ![](img/profvis3.png) --- # Things to change - Three calls to diag() - Two calls to rep() ```r tridiagonal_matrix2 <- function(n, d1, d2){ base_diag <- diag(n) main_diag <- base_diag * d1 lower_diag <- cbind(base_diag[,-1],rep.int(0,n)) * d2 upper_diag <- t(lower_diag) return(main_diag + lower_diag + upper_diag) } ``` --- # Let's compare ```r benchmark(m5 <- tridiagonal_matrix(5000,1,-1), m6 <- tridiagonal_matrix2(5000,1,-1), replications = 3) ``` ``` ## test replications elapsed relative ## 1 m5 <- tridiagonal_matrix(5000, 1, -1) 3 9.061 1.48 ## 2 m6 <- tridiagonal_matrix2(5000, 1, -1) 3 6.123 1.00 ## user.self sys.self user.child sys.child ## 1 5.626 3.432 0 0 ## 2 3.368 2.721 0 0 ``` --- # Fourth approach: Rcpp ```r Rcpp::cppFunction('NumericMatrix create_tridiagonal_rcpp(int n, int d1, int d2){ NumericMatrix m(n, n); for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) m(i, j) = 0; for (int i = 0; i < n; i++) m(i, i) = d1; for (int i = 0; i < (n - 1); i++){ m(i, i + 1) = d2; m(i + 1, i) = d2; } return m; }') ``` --- # Things to improve - Too many for cycles ```r Rcpp::cppFunction('NumericMatrix create_tridiagonal_rcpp2(int n, int d1, int d2){ NumericMatrix m(n, n); for (int i = 0; i < (n - 1); i++){ m(i, i) = d1; m(i, i + 1) = d2; m(i + 1, i) = d2; } m(n-1, n-1) = d1; return m; }') ``` --- # Let's compare ```r benchmark(m7 <- create_tridiagonal_rcpp(5000,1,-1), m8 <- create_tridiagonal_rcpp2(5000,1,-1), replications = 3) ``` ``` ## test replications elapsed ## 1 m7 <- create_tridiagonal_rcpp(5000, 1, -1) 3 2.049 ## 2 m8 <- create_tridiagonal_rcpp2(5000, 1, -1) 3 0.539 ## relative user.self sys.self user.child sys.child ## 1 3.801 1.696 0.352 0 0 ## 2 1.000 0.154 0.385 0 0 ``` --- # Fifth approach, use of row() and col() functions ```r tridiagonal_rowcol <- function(n, d1, d2) { x <- diag(x = d1, n, n) x[row(x)-1 == col(x)] <- d2 x[row(x) == col(x)-1] <- d2 return(x) } ``` --- # How to improve it? ```r source("tridiagonal_rowcol.R") profvis::profvis(tridiagonal_rowcol(5000,1,-1)) ``` ![](img/profvis4.png) --- #Things to improve - Assign the d2 diagonal in one step ```r tridiagonal_rowcol2 <- function(n, d1, d2) { x <- diag(d1, n) # x[row(x)-1 == col(x)] <- d2 # x[row(x) == col(x)-1] <- d2 x[abs(row(x) - col(x)) == 1] <- d2 return(x) } ``` --- # Let's compare ```r benchmark(m9 <- tridiagonal_rowcol(5000,1,-1), m10 <- tridiagonal_rowcol2(5000,1,-1), replications = 3) ``` ``` ## test replications elapsed relative ## 2 m10 <- tridiagonal_rowcol2(5000, 1, -1) 3 5.699 1.000 ## 1 m9 <- tridiagonal_rowcol(5000, 1, -1) 3 11.165 1.959 ## user.self sys.self user.child sys.child ## 2 4.278 1.421 0 0 ## 1 7.868 3.289 0 0 ``` --- # Original functions comparison ```r benchmark(create_tridiagonal(5000,1,-1), diagonal(5000,1,-1), tridiagonal_matrix(5000,1,-1), create_tridiagonal_rcpp(5000,1,-1), tridiagonal_rowcol(5000,1,-1), replications = 3 ) ``` ``` ## test replications elapsed relative ## 1 create_tridiagonal(5000, 1, -1) 3 1.168 1.000 ## 4 create_tridiagonal_rcpp(5000, 1, -1) 3 2.065 1.768 ## 2 diagonal(5000, 1, -1) 3 3.401 2.912 ## 3 tridiagonal_matrix(5000, 1, -1) 3 9.493 8.128 ## 5 tridiagonal_rowcol(5000, 1, -1) 3 10.835 9.277 ## user.self sys.self user.child sys.child ## 1 0.428 0.740 0 0 ## 4 1.667 0.396 0 0 ## 2 2.290 1.111 0 0 ## 3 5.998 3.460 0 0 ## 5 7.787 3.019 0 0 ``` --- # Original and new functions comparison ``` ## test replications elapsed relative ## 9 create_tridiagonal_rcpp2(5000, 1, -1) 3 0.557 1.000 ## 6 create_tridiagonal2(5000, 1, -1) 3 0.589 1.057 ## 7 diagonal2(5000, 1, -1) 3 0.669 1.201 ## 1 create_tridiagonal(5000, 1, -1) 3 1.360 2.442 ## 4 create_tridiagonal_rcpp(5000, 1, -1) 3 2.119 3.804 ## 2 diagonal(5000, 1, -1) 3 3.446 6.187 ## 10 tridiagonal_rowcol2(5000, 1, -1) 3 5.597 10.048 ## 8 tridiagonal_matrix2(5000, 1, -1) 3 6.354 11.408 ## 3 tridiagonal_matrix(5000, 1, -1) 3 9.478 17.016 ## 5 tridiagonal_rowcol(5000, 1, -1) 3 11.477 20.605 ## user.self sys.self user.child sys.child ## 9 0.157 0.401 0 0 ## 6 0.205 0.385 0 0 ## 7 0.281 0.388 0 0 ## 1 0.490 0.858 0 0 ## 4 1.790 0.327 0 0 ## 2 2.335 1.111 0 0 ## 10 4.225 1.371 0 0 ## 8 3.511 2.810 0 0 ## 3 5.950 3.495 0 0 ## 5 8.150 3.277 0 0 ``` --- class: inverse, center, middle # V. Iterations ![](img/iterations.gif) --- # Exercise Generate 100 matrices where n = 500, and d1, and d2 are predefined random integers between -10 and 10: ```r d1_vector <- sample(-10:10, 100, replace = T) d2_vector <- sample(-10:10, 100, replace = T) ``` --- # Our function ```r create_tridiagonal2 <- function(n = 5, d1 = 1, d2 = 2) { #We create the matrix and assign the diagonal in one step output_matrix <- diag(d1, n) #"row" is a function in base r, so I prefer "i" as counter # We put the d2 value in one step instead of 2 for (i in 1:(n - 1)) { output_matrix[i, c(i - 1, i + 1)] <- d2 } output_matrix[c(i - 1, i + 1), i] <- d2 return(output_matrix) } ``` Is also available at create_tridiagonal_2.R --- # What alternatives do we have? .pull-left[ ## Serial approach - For cycle - lapply - mapply - purrr::map - purrr::map2 ] .pull-right[ ## Parallel computation - foreach %dopar% - mclapply - furrr ] ## Other ideas? --- # For cycle ```r system.time({ # We start by creating a list for the matrices matrices1 <- vector(mode = "list", length = length(d1_vector)) for (i in 1:length(d1_vector)){ matrices1[[i]] <- create_tridiagonal2(500, d1 = d1_vector[i], d2 = d2_vector[i]) } }) ``` ``` ## user system elapsed ## 0.168 0.036 0.204 ``` --- # lapply To use lapply, first we have to create a new function to make it compatible with a single list: ```r tridiagonal_i <- function(i) { create_tridiagonal2(n = 500, d1 = d1_vector[i], d2 = d2_vector[i]) } system.time({ matrices2 <- lapply(seq_along(d1_vector),tridiagonal_i) }) ``` ``` ## user system elapsed ## 0.135 0.077 0.211 ``` --- # mapply mapply allows us to use the two vectors as parameters for the function directly ```r system.time({ matrices3 <- mapply(create_tridiagonal2, d1 = d1_vector, d2 = d2_vector, n = 500) }) ``` ``` ## user system elapsed ## 0.344 0.203 0.547 ``` --- # purrr::map purrr provides a set of tools to work with functions and vectors. The "map" function allows to pass a function to each element of a vector/list ```r system.time({ matrices4 <- purrr::map(seq_along(d1_vector),tridiagonal_i) }) ``` ``` ## user system elapsed ## 0.180 0.043 0.223 ``` --- # purrr::map2 Similar to mapply, map2 let us to use two different lists/vectors as inputs to the function ```r system.time({ matrices5 <- purrr::map2(d1_vector, d2_vector, create_tridiagonal2, n = 500) }) ``` ``` ## user system elapsed ## 0.146 0.053 0.200 ``` --- # Parallel choices: foreach First we load the package, set up the cluster and the number of cores we will use ```r library(foreach) cl <- parallel::makeForkCluster(8) doParallel::registerDoParallel(cl) system.time({ matrices6 <- foreach(i = 1:length(d1_vector)) %dopar% { create_tridiagonal2(500, d1=d1_vector[i], d2 = d2_vector[i]) } }) ``` ``` ## user system elapsed ## 0.198 0.250 0.664 ``` ```r parallel::stopCluster(cl) ``` --- # mclapply, mcmapply These functions are not available on Windows systems. ```r system.time({ matrices7 <- parallel::mclapply(seq_along(d1_vector),tridiagonal_i, mc.cores = 8) }) ``` ``` ## user system elapsed ## 0.035 0.639 0.720 ``` ```r system.time({ matrices8 <- parallel::mcmapply(create_tridiagonal2, d1 = d1_vector, d2 = d2_vector, n = 500, mc.cores = 8) }) ``` ``` ## user system elapsed ## 0.486 1.716 1.074 ``` --- # furrr package - The furrr package is providing a future api for the purrr package. - The future package let you to evaluate R expressions asynchronously (for instance, by evaluating expressions in parallel) using various resources available to the user. ```r library(purrr) library(furrr) ``` ``` ## Loading required package: future ``` --- #furrr package - We have two common option, multisession or multicore ```r plan(multisession) ``` ```r system.time({ matrices9 <- future_map(seq_along(d1_vector), tridiagonal_i) }) ``` ``` ## user system elapsed ## 0.311 0.111 1.525 ``` ```r system.time({ matrices10 <- future_map2(d1_vector, d2_vector, create_tridiagonal2, n = 500) }) ``` ``` ## user system elapsed ## 0.276 0.144 0.855 ``` --- # furrr package ```r plan(multicore) #Not supported in Windows ``` ```r system.time({ matrices11 <- future_map(seq_along(d1_vector), tridiagonal_i) }) ``` ``` ## user system elapsed ## 0.124 0.028 0.153 ``` ```r system.time({ matrices12 <- future_map2(d1_vector, d2_vector, create_tridiagonal2, n = 500) }) ``` ``` ## user system elapsed ## 0.058 0.040 0.098 ``` --- # Results ``` ## # A tibble: 12 x 5 ## method number replications elapsed relative ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 map 4 1 0.043 1 ## 2 map2 5 1 0.065 1.51 ## 3 lapply 2 1 0.066 1.54 ## 4 for 1 1 0.069 1.60 ## 5 multicore_map2 12 1 0.091 2.12 ## 6 multicore_map 11 1 0.112 2.60 ## 7 mapply 3 1 0.187 4.35 ## 8 mclapply 7 1 0.338 7.86 ## 9 mcmapply 8 1 0.523 12.2 ## 10 foreach 6 1 0.629 14.6 ## 11 multisession_map2 10 1 7.16 166. ## 12 multisession_map 9 1 7.19 167. ``` -- ## Parallel is not always faster! --- class: center # Conclusions ![](img/code.png) --- class: inverse, middle, center # Thank you! ## Keep in touch: ## adalvarez@gmail.com / adolfo.alvarez@analyx.com ## adolfoalvarez.cl ## Twitter: @adolfoalvarez