R-Guru

Introduction to R programming, Interactive Mind Map. Statistical computing and graphics

R Programming Best Practices Checklist


Base R

  1. Good Programming Style for Beginners
  • Keyboard shortcuts in RStudio:
    • Alt + - (minus) inserts the assignment operator <-
    • Ctrl + Shift + M inserts the pipe %>%
    • Ctrl + Enter runs the current line and jumps to the next one, or runs the selected code without moving further.
    • Alt + Enter runs the current line without moving the cursor, which is useful for running one line several times without selecting it.
    • Ctrl + Alt + R runs the whole script.
    • Ctrl + Alt + B runs the script from the beginning to the current line, and Ctrl + Alt + E runs it from the current line to the end.
  • R program names should have '.R' suffix, ex. mycode.R
  • R variable names should not contain a dot '.' and should not reuse the names of R functions, ex. split_1
  • Since variable names are case sensitive, it is best to rename them all to lower case for easier R programming, ex. names(aslx) <- tolower(names(aslx))
  • To help standardize conditions on character variables, apply tolower() or toupper() to convert the values to lower or upper case
  • New R function names should be verbs
  • Use '<-' instead of '=' for assignments, including creating vectors and data frames, ex. x <- 10. The left side of '<-' should be a valid vector or data frame name, and the right side should be a valid R expression.
  • A single variable in a data frame may be assigned by its index position, ex. taadmin[[6]] <- !duplicated(taadmin$PT)
  • It is best to write R comments at the beginning of a line rather than in the middle or at the end. Single-line comments start with #.
  • To comment out multiple lines of R syntax, one technique is to wrap them in an R function that is never called. The alternative within RStudio is to select the block of lines and press Ctrl + Shift + C to insert # in front of each line. The wrapping technique looks like this:
      skip <- function(x) {
        myquery <- ToothGrowth %>% # 1. source df
          select(len, supp, dose) %>% # 2. select variables, - drop
          filter(supp == 'VC') %>% # 3. subset records
          mutate(dose2 = (dose*2)) %>% # 4. derive variables
          arrange(supp, dose) # 5. sort records, desc()
      }
  • For a small number of variables and records, create each variable as a vector and then combine the vectors into a data frame
  • For large data files, it is best to load the data directly into a data frame. In csv files, missing values are read as NA; when reading SAS datasets, missing values may be stored as ., '' or 'NA', so it is best to double check missing values in data frames
  • In general, create vectors and data frames instead of matrices
  • Pass the source data frame explicitly as the first parameter instead of relying on a default data frame, ex. select(dm, studyid, subjid, age)
  • Import only variables and records of interest from data files
  • Subset data frames as soon as possible
  • For each R function, always type both '(' and ')' first and then fill in the parameters, so the closing ')' is not forgotten
  • For R functions that support multiple variables, separate variables by ','
  • Create intermediate data frames instead of performing many tasks in a single statement
  • Create test data frames with R functions before replacing data frames
  • Create and test each R task component (filter, select, mutate, arrange, summarize) before assembling them using the pipe %>%. In general, R task components can be in any order, but best practice is to filter records, select variables, derive variables, and then arrange and summarize at the end. In addition, group_by() should come before summarize() to get by-group summaries instead of overall summaries. Nesting R functions is an alternative to the pipe %>% but is not ideal for more than 2 R task components. When nesting R functions, it is best to indent each R component on a new line.
  • Leverage popular R packages such as tidyverse and dplyr for data management operations such as joins.
  • In joins, variables with the same name in both data frames that are not join keys are renamed to <name>.x and <name>.y.
  • Install and load R packages before calling R functions
  • Prefer commonly used R functions over less familiar functions that perform similar tasks
  • Copy, paste and update a working R example for a task instead of trying to remember the exact R function syntax
  • Search for common R solutions to address common R errors
  • Inside a function, if (missing(df)) stop("parameter df is required.") confirms that the data frame argument was supplied
  • dim(df) checks the number of rows and variables
  • is.na(df) checks which values in the data frame are missing
  • if (any(is.na(df))) print("missing values exist in at least one variable")
  • In a function call, arguments can be specified by position, complete name, or partial name. Never specify by partial name. Pass the leading data arguments by position and name the remaining arguments in full, ex. mean(x, na.rm = TRUE)
  • rnorm(10, 0.2, 0.3)
  • The na.rm argument appears in many R functions. R uses NA to represent Not Available, or missing, values. In summary functions, it is best to set na.rm = TRUE to ignore missing values, since this is not done by default, ex. mean(mydata, na.rm = TRUE).
  • Put spaces around all binary operations, ex. x == y
  • Put a space after commas, including before an empty second parameter, ex. mtcars[1, ]
  • Put a space before left parentheses, except in a function call, ex. if (grade == 5.5)
  • Do not put spaces around code inside parentheses or square brackets, ex. species["tiger", ]
  • Indent code with spaces, not tabs
  • When a new variable is derived from a variable that contains NA, the result is NA for those records unless you add & !is.na(df$col1) to the condition, which assigns them the label for false values. In general, the check for NA values is not needed.
  • When transforming values, recode() works similarly to SAS PROC FORMAT
  • When writing conditions for creating new variables or subsetting records, it is best to use case_when() for greater flexibility in handling multiple expressions and conditions
  • Break long R code into several lines, ex. long_function_name <- function(arg1, arg2, arg3, arg4, long_argument_name1 = TRUE)
  • Comments are useful to explain logic behind R code and placed before or on the same line as R code, ex. # Derive age
  • Do NOT put more than one statement (command) per line. 
  • Do NOT use a semicolon to terminate a command, ex. write x <- 1 and x <- x + 1 on separate lines instead of x <- 1; x <- x + 1
  • Clean up objects and the workspace before submitting a new R program, ex. remove(list = ls()) removes all objects and rm(list = setdiff(ls(), "data")) removes everything except the object named data
  • Understand your data before performing analysis, ex. summary(data)
  • Understand your data by groups before performing analysis, ex. aggregate(y ~ group, data, mean)  
  • For graphs, it is best to transpose data to be long and thin, ex.
      data_long <- as.data.frame( # Convert data frame to long format
        pivot_longer(data = data,
                     cols = c("x1", "x2", "x3", "y")))
      head(data_long)
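The advice above to build and test each task component separately before chaining the components with %>% can be sketched as follows. This is a minimal illustration, assuming the dplyr package is installed. It uses R's built-in ToothGrowth data, and the intermediate names (tg_filtered, tg_selected, tg_derived, tg_summary) are hypothetical.

```r
library(dplyr)

# Step 1: filter records, then check the result before going further
tg_filtered <- filter(ToothGrowth, supp == "VC")
dim(tg_filtered)

# Step 2: select variables
tg_selected <- select(tg_filtered, len, supp, dose)

# Step 3: derive variables
tg_derived <- mutate(tg_selected, dose2 = dose * 2)

# Once each step behaves as expected, assemble the steps with the pipe.
# group_by() comes before summarize() so that the summary has one row
# per dose instead of a single overall row.
tg_summary <- ToothGrowth %>%
  filter(supp == "VC") %>%
  select(len, supp, dose) %>%
  mutate(dose2 = dose * 2) %>%
  group_by(dose) %>%
  summarize(mean_len = mean(len, na.rm = TRUE), n = n()) %>%
  arrange(dose)

tg_summary
```

Keeping the intermediate data frames around while testing makes it easy to inspect each stage with dim() or head() before replacing them with the single piped statement.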
  2. Read and Write Data
    1. Text, Excel, SAS
  3. Create and Select Variables and Observations
    1. Selecting Variables
    2. Selecting Observations
    3. Converting Data Structure
    4. Data Conversion Functions
  4. Combining Variables into Data Frames for Data Management
    1. Transforming Variables
    2. Conditional Transformations
    3. Logical Operators
    4. Conditional Transformation to Assign Missing Values
    5. Multiple Conditional Transformations
    6. Renaming Variables and Observations
    7. Recoding Variables
    8. Keeping and Dropping Variables
    9. By Group Processing
    10. Concatenating Observations
    11. Joining Data
    12. Summarize Data
    13. Reshaping Variables and Observations
    14. Sorting Data
  5. Value Labels and Formats
  6. Variable Labels
  7. Graphics
  8. Analysis
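
The checklist's points on missing values (is.na(), na.rm = TRUE, and NA carrying over into derived variables) can be sketched in base R. This is only an illustration: the data frame df and its columns subjid, age, elderly and elderly2 are hypothetical.

```r
# Hypothetical example data with one missing age
df <- data.frame(
  subjid = c("001", "002", "003"),
  age = c(34, NA, 71)
)

# Check whether any value is missing, as in the checklist
if (any(is.na(df))) print("missing values exist in at least one variable")

# Count missing values per variable
colSums(is.na(df))

# Summary functions return NA unless na.rm = TRUE is set
mean(df$age)
mean(df$age, na.rm = TRUE)

# A derived variable inherits NA from its source variable ...
df$elderly <- ifelse(df$age >= 65, "Y", "N")

# ... unless the condition also tests !is.na(), which maps NA
# to the label for false values
df$elderly2 <- ifelse(!is.na(df$age) & df$age >= 65, "Y", "N")

df
```

Whether NA should stay NA or be mapped to the false label depends on the analysis, so the second form is a choice rather than a default.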

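The checklist's note that non-key variables sharing a name are renamed to <name>.x and <name>.y in joins can be seen in a small sketch. It assumes the dplyr package is installed; the data frames dm and vs and their columns are hypothetical.

```r
library(dplyr)

# Two hypothetical data frames that share the key subjid and also
# share a non-key variable named visit
dm <- data.frame(
  subjid = c("001", "002"),
  visit = c("SCR", "SCR"),
  age = c(34, 58)
)
vs <- data.frame(
  subjid = c("001", "002"),
  visit = c("WK1", "WK1"),
  sysbp = c(120, 135)
)

# Join on subjid only; visit is not a join key, so both copies are
# kept and given the suffixes .x (from dm) and .y (from vs)
joined <- left_join(dm, vs, by = "subjid")

names(joined)
```

If the suffixed names are not wanted, one option is to rename or drop the duplicated variable in one data frame before joining.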
TIDYVERSE Package (Style Guide)

1. INTRODUCTION
  1.1 Topics
  1.2 Preparing Your Computer

2. INTRODUCTION TO THE TIDYVERSE
  2.1 Tidyverse Packages
  2.2 Tibble Creation
  2.3 Tibbles Improve Printing
  2.4 Other Tibble Advantages
  2.5 Tibble Disadvantages
  2.6 Tibble Conversions
  2.7 The dplyr Package’s Verbs
  2.8 dplyr Input & Output

3. CHOOSING VARIABLES AND OBSERVATIONS
  3.1 Using Subscripts
  3.2 Using dplyr Functions
  3.3 Variations on select
  3.4 Dropping Variables
  3.5 Table of Logical Comparisons

4. COMBINING PROGRAMMING STEPS
  4.1 Nesting Only
  4.2 Piping
  4.3 Saving Results for Re-use
  4.4 Piping Details
  4.5 Piping to a Specific Argument
  4.6 Think About Your Steps

5. COPYING & DELETING OBJECTS
  5.1 Copying Objects
  5.2 Copying Variables
  5.3 Removing/Dropping/Deleting Variables
  5.4 Removing/Dropping/Deleting Entire Objects

6. RENAMING DATA SETS, VARIABLES, & ROWS
  6.1 Renaming Objects
  6.2 Renaming Big Objects
  6.3 Renaming Variables with dplyr
  6.4 Renaming All Variables Using “names”
  6.5 Copying Names From Another Data Frame
  6.6 Renaming a Block of Names, Step 1
  6.7 Renaming a Block of Names, Step 2
  6.8 Renaming Thousands of Variables
  6.9 Choosing Best Variable Renaming Method
  6.10 Renaming Rows
  6.11 The tibble Approach to Row Names

7. TRANSFORMING VARIABLES
  7.1 Prepare the Workspace
  7.2 Using Classic Dollar Format
  7.3 An Easier Way: mutate
  7.4 mutate & transmute Details
  7.5 Row-Specific Functions
  7.6 The Base apply Function
  7.7 apply Function Details
  7.8 Many Variables, One Transformation
  7.9 mutate_at Details
  7.10 Table of Transformations

8. CONDITIONAL TRANSFORMATIONS
  8.1 Prepare the Workspace
  8.2 The ifelse Function
  8.3 Recode Using ifelse
  8.4 Recoding Many Variables with ifelse
  8.5 The car::Recode Function
  8.6 Recode Many Variables
  8.7 Integers vs. Double Precision

9. SUMMARIZING VARIABLES
  9.1 Prepare the Workspace
  9.2 The “summarise” Function
  9.3 summarise Details
  9.4 Many R Functions Require Vectors
  9.5 dplyr::summarise_at Function
  9.6 summarise_at Details
  9.7 Built-In Summary Functions
  9.8 dplyr Summary Functions
  9.9 dplyr Summary Combination Functions
  9.10 dplyr Sequence Functions
  9.11 dplyr Rank Functions
  9.12 Comparison of mutate and summarise

10. GROUP-BY CALCULATIONS
  10.1 Prepare the Workspace
  10.2 The group_by Function
  10.3 Printing Grouped Data
  10.4 Review of mutate
  10.5 mutate By Group
  10.6 Summarisation By Group
  10.7 summarise By Group
  10.8 summarise_at By Group
  10.9 Group By Next Level
  10.10 Group By Next Level…Again!
  10.11 Un-Grouping

11. GROUP-BY ANALYSIS WITH OUTPUT MANAGEMENT
  11.1 Prepare the Workspace
  11.2 R’s Built-in Approach
  11.3 Recall How t.test Works
  11.4 broom Package Cleans it Up
  11.5 Simple Analysis with group_by
  11.6 dplyr’s do Function
  11.7 broom’s Functions
  11.8 Model-Level Regression by Group
  11.9 Coefficient-Level Regression by Group
  11.10 Observation-Level Regression By Group
  11.11 Advanced Features

12. SORTING DATA
  12.1 Prepare the Workspace
  12.2 R’s Various Ways to Sort
  12.3 When Sorting is Needed in R
  12.4 Data Not Sorted by Workshop
  12.5 dplyr::arrange Sorts Data Frames
  12.6 desc Does Descending Order
  12.7 Sorting by Two Variables
  12.8 R’s built-in sort Function
  12.9 R’s order Function
  12.10 Using order to Sort Data Frames
  12.11 rev Function Reverses order
  12.12 order by Two Variables
  12.13 How Location Affects Sorting

13. SELECTING FIRST OR LAST OBSERVATION PER GROUP
  13.1 Prepare the Workspace
  13.2 When to Search for These Observations
  13.3 When it’s Not Needed
  13.4 dplyr’s slice Function
  13.5 Finding Min/Max Observation Using Sorting
  13.6 Finding Min/Max Observation Using filter
  13.7 Finding Min/Max Observation Using Ranks
  13.8 dplyr Ranking Functions

14. STACKING DATA SETS
  14.1 Prepare the Workspace
  14.2 Creating a Data Frame to Stack
  14.3 Creating a 2nd Data Frame to Stack
  14.4 Stacking with dplyr::bind_rows
  14.5 R’s Built-in rbind
  14.6 R’s Built-in union

15. FINDING AND REMOVING DUPLICATE OBSERVATIONS
  15.1 Prepare the Workspace
  15.2 Create Some Duplicates
  15.3 Locating Duplicates
  15.4 Generate Duplicate Report
  15.5 Removing Duplicates
  15.6 Checking Subsets of Variables

16. MERGING / JOINING DATA FRAMES
  16.1 Prepare the Workspace
  16.2 Creating a Data Frame to Join
  16.3 Creating a 2nd Data Frame to Join
  16.4 Join by Common Variables
  16.5 Joining by Different Variables
  16.6 Types of Joins

17. RESHAPING DATA FRAMES
  17.1 Prepare the Workspace
  17.2 Transposing Rows and Columns
  17.3 Example Wide Data Structure
  17.4 Advantages of Wide Data
  17.5 The Long Data Structure
  17.6 Advantages of Long Data
  17.7 Reshaping Options in R
  17.8 Gathering Wide to Long
  17.9 Wide to Long Details
  17.10 Spreading Long to Wide
  17.11 Extracting Numeric Values

18. COMPARING OBJECTS
  18.1 Prepare the Workspace
  18.2 Comparing Vectors
  18.3 Comparing Data Frames
  18.4 Mixing Up a Data Frame
  18.5 Three Ways to Compare
  18.6 The compare Package
  18.7 Visual Comparison
  18.8 The compareDF Package

19. CHARACTER STRING MANIPULATIONS
  19.1 Prepare the Workspace
  19.2 The stringr Package
  19.3 Regular Expression References
  19.4 Generating Numeric Variable Names
  19.5 Impact of Trailing Blanks
  19.6 Trimming Blanks
  19.7 Setting Case
  19.8 Splitting at a Column
  19.9 Splitting at a Blank
  19.10 Extracting Vectors
  19.11 Replacing Strings
  19.12 Combining Strings
  19.13 Finding: One Sub-string
  19.14 Finding: Multiple Sub-strings
  19.15 Finding: with Regular Expressions
  19.16 Finding: with Table Lookups
  19.17 The stringi Package

20. DATE & TIME MANIPULATIONS
  20.1 Prepare the Workspace
  20.2 Converting Strings to Dates
  20.3 Subtracting Dates
  20.4 The difftime Function
  20.5 Converting Time Differences to Numeric
  20.6 Measuring Time Until Today
  20.7 Extracting Years, Weeks, Months
  20.8 Extracting Days
  20.9 Choosing Observations by Date
  20.10 Dealing with 2-Digit Years
  20.11 Date-Time References

21. USING SQL WITHIN R
  21.1 Prepare the Workspace
  21.2 The sqldf Package
  21.3 Printing a Data Frame
  21.4 Choosing and Sorting
  21.5 Aggregating by Gender
  21.6 Key Syntax Differences
