R-Guru Resource Hub
Common R FAQs: CRAN, UCLA (Submit your Common R FAQ)
Below are common R technical questions with R solutions to solve real-world tasks.
1. What is the common syntax for libname, filename and reading datasets? See R paper.
sdtm <- "c:/product/study/analysis/data/sdtm" # assign libname to object named sdtm
out <- "c:/product/study/analysis/data/adam" # assign out filename to path
library(haven) # required to read SAS datasets
dm <- read_sas(file.path(sdtm,"dm.sas7bdat")) # read sas file as a data frame
#'read_sas' function from the haven package (part of the tidyverse)
taadmin <- read_sas("H:/rproject/project_y_r2/taadmin.sas7bdat")
3. Is there a website to run example R programs? Yes, see site.
4. What is R? R is a programming language that uses the concept of functions to create objects to be linked together. There are many rules to understand and follow.
5. Why should you learn R? Since R is so vast and can become confusing quickly, I suggest you identify what you plan to use R for, such as data access, data management, data reporting or data analysis. In the pharma industry, R got a jump start with graphs that were easy to create. I suggest you have patience when learning R since it may require more syntax than you are expecting.
6. What are useful methods to learn R? Show and Tell is useful to see and run R syntax on a command by command basis. Taking your own notes helps to retain and build understanding. Cheat sheets are only helpful if you recognize the syntax and its purpose. Since R is very technical, try to focus on your task and master those few sets of commands. How-To step checklists and examples help to remind users how to run R syntax.
These are best practices for learning R: Get familiar and gain confidence, since concepts are similar to other programming languages. Know and understand the starting and ending points and how best to get there. Separate R requirements from R nice-to-know functions. Use just-in-time R training: apply R right after learning it, in 15-30 min sessions, which work best. Focus only on what is required for your role instead of learning all of R. Instead of mapping SAS syntax to R directly, it is better to approach R commands the way you would SAS macros.
7. What is comparable R syntax for Data step KEEP and WHERE statements?
# complete cases and select
taadmin2 <- taadmin %>% filter(complete.cases(taadmin[["DOSECUN"]])) %>% select(INV, PT, DCMDATE, DOSECUN, DOSETL)
data taadmin2 (keep=inv pt dcmdate dosecun dosetl flag); set taadmin (where=(dosecun is not null)); run;
8. How can you get first. and last. records in R?
# order by pt and dcmdate
taadmin <- taadmin[order(taadmin$PT, taadmin$DCMDATE),]
# add flagf and flagl constant variables
taadmin <- cbind(taadmin, flagf=0, flagl=0)
# assign flagf 1 for first.pt (reference the variable by name instead of column index)
taadmin$flagf <- as.integer(!duplicated(taadmin$PT))
# assign flagl 1 for last.pt (the argument name is fromLast, case sensitive)
taadmin$flagl <- as.integer(!duplicated(taadmin$PT, fromLast = TRUE))
proc sort data = taadmin2; by pt dcmdate; run;
data taadmin3; set taadmin2; by pt dcmdate; flagf=0; flagl=0; if first.pt then flagf=1; if last.pt then flagl=1; run;
9. What is an example of a simple R plot? Below syntax will create a cty by hwy plot.
g <- ggplot(data = mpg, aes(x = cty, y = hwy)) + geom_point() # requires library(ggplot2); geom_point() draws the points
10. How can you read csv file in R?
data1 <- read.csv("./data/DiastolicBloodPressure_initial.csv")
11. What are useful R metadata functions to display data frame attributes?
tg <- ToothGrowth # save sample data frame to tg data frame
View(tg) # browse tg
str(tg) # display tg attributes and sample data
attributes(tg) # display tg attributes
head(tg) # display tg sample records
print(tg) # display tg all records
describe(tg) # requires library(Hmisc); yields more granular percentiles and extreme observations
stats <- summary(tg) # create stats object of continuous vars
print(stats) # display tg stats object
freq <- table(tg) # create freq of categorical vars
print(freq) # display tg freq object
12. What is a simple R function?
info <- function(d) {
  writeLines("First display the structure of the data frame.")
  str(d)
  writeLines("Then print the first 6 observations to see the variables and values.")
  print(head(d))
}
# Call the new 'info' function:
info(data1)
%d for day of month, %a for abbreviated weekday name, %A for full weekday name, %m for numeric month (01-12), %b for abbreviated month name, %B for full month name, %y for 2 digit year, %Y for 4 digit year
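The format codes above can be tried with base R's format() and as.Date(). This is a minimal sketch; the date value and the object names (visit_date, iso_text, parsed) are illustrative assumptions. Weekday and month names (%a, %A, %b, %B) depend on the session locale, so only the numeric codes are annotated with results.

```r
visit_date <- as.Date("2022-05-10")          # hypothetical date
format(visit_date, "%d")                     # day of month: "10"
format(visit_date, "%m")                     # numeric month: "05"
format(visit_date, "%y")                     # 2 digit year: "22"
format(visit_date, "%Y")                     # 4 digit year: "2022"
format(visit_date, "%d%b%Y")                 # day, abbreviated month name, year (locale dependent)
iso_text <- format(visit_date, "%Y-%m-%d")   # "2022-05-10"
parsed <- as.Date("10/05/2022", format = "%d/%m/%Y")  # parse day/month/year text into a Date
```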
14. What is the difference between package and library? A package is like a book, a library is like a library; you use library() to check a package out of the library.
15. What is your working directory? R is always pointed at a directory on your computer. Often this will be your home directory, e.g. C:/Users/Sunil/Documents. When you work within an RStudio project, the working directory will be the top level of that project. You can find out which directory by running getwd(). You can use setwd("C:/study/analysis/output") to change the working directory. system.file() displays the system file folder. system.file(package='dplyr') displays the path for the dplyr package.
16. What packages are installed? installed.packages(). You can use the system.file() function to check if a particular package is installed in our current R environment.
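As a hedged sketch of the system.file() check described above: system.file() returns an empty string when a package is not installed, so nzchar() turns the result into TRUE or FALSE. The helper name is_installed and the package name notARealPackage123 are hypothetical.

```r
# system.file() returns "" when the package is not installed
is_installed <- function(pkg) nzchar(system.file(package = pkg))

is_installed("stats")                 # TRUE, stats ships with R
is_installed("notARealPackage123")    # FALSE (hypothetical package name)

# requireNamespace() is a common alternative check that does not attach the package
requireNamespace("stats", quietly = TRUE)
```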
library(SASxport)
library(sas7bdat)
library(dplyr)
dat1 <- read.sas7bdat('dm.sas7bdat') # read SAS dataset
select(dat1, "PROJECT", "SUBJECT", "SITEID", "RACE", "ETHNIC", "SEX", "AGE") # select variables
dm1 <- merge(x=dat2, y=dat3, by="SUBJECT", all.x=TRUE) # merge data frames
write.xport(dm_f2, file=paste(getwd(), "dm.xpt", sep="/"), autogen.formats=FALSE) # create xpt file
library(Hmisc)
adsl <- sasxport.get("adsl.xpt", lowernames=FALSE) # read xpt file
library(haven) # below is another example
# Sample data frame
df <- data.frame(ID = c(1, 2, 3), Age = c(21, 32, 45), Gender = c("Male", "Female", "Male") )
# Add variable labels
attr(df$ID, "label") <- "Subject ID"
attr(df$Age, "label") <- "Age at Visit"
attr(df$Gender, "label") <- "Gender"
# Write XPT file with variable labels
write_xpt(df, "example.xpt")
18. How do you write and run R code? Writing your first R code. Enter code in the code editor window. Click Run to execute the current line and Source to execute all statements.
19. What are examples of applying proc freq in R?
table(cars$Type) # proc freq data=cars; tables type; run;
table(cars$Type, cars$Cylinders) # proc freq data=cars; tables Type*Cylinders; run;
20. What are examples of applying proc means in R?
ASL_mean <- ASL %>% group_by(ARMCD) %>% summarise(avg_age = mean(AGE), avg_bmi = mean(BMI)) %>% ungroup()
# %>% is pipe to connect R functions, see magrittr package
# proc means data=ASL; by ARMCD; var age bmi; run;
21. What are examples of proc sql in R? See R paper.
adsl <- dm %>% select(studyid, subjid, age, sex, height, weight, race, scrfl) %>% mutate(bmi = (weight*703)/height^2 ) %>%
filter(scrfl == "Y") %>% select(-scrfl) %>% arrange(studyid, subjid)
# apply %>% to connect R functions
# select 8 dm variables
# derive bmi variable from weight and height
# subset for scrfl = 'Y'
# sort by studyid and subjid
22. How do you convert excel files to R data frame? library(readxl) # required for read_excel function
l_cancer <- read_excel("C:/data/l_cancer.xlsx")
23. How do you display a list of functions once a package/library is opened? lsf.str("package:dplyr")
24. What is the syntax for keeping and dropping variables? df = subset(mydata, select = -c(x,z) ) # by variable name
df <- mydata[ -c(1,3:4) ] # by column index number
25. How best can you sort data? Note that R sorts missing values (NA) last, whereas SAS sorts them first. One method to resolve this difference is to apply replace_na() to change NA to blanks and then sort the data frame. newdata <- mtcars[order(mtcars$mpg, mtcars$cyl),] # sort by mpg and cyl
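An alternative to recoding NA before sorting, offered here as a sketch rather than the author's method, is the na.last argument of base R's order(). The data frame and object names below are hypothetical.

```r
scores <- data.frame(id = 1:4, score = c(3, NA, 1, 2))       # hypothetical data

r_default <- scores[order(scores$score), ]                   # NA sorted last (R default)
sas_like  <- scores[order(scores$score, na.last = FALSE), ]  # NA sorted first, as in SAS

r_default$id   # 3 4 1 2
sas_like$id    # 2 3 4 1
```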
26. What is the syntax to create R arrays? Array_name = array(data,dim = c(row_size,column_size,matrices), dimnames = list(row_names,column_names,matrices_names))
27. What are useful metadata functions, including checks for missing data? Does the data frame exist? What is the data frame class? Does the data frame contain values?
28. How best to create data frames from vectors?
hospital <- c("New York", "California")
patients <- c(150, 350)
df <- data.frame(hospital, patients)
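The three questions in item 27 can be answered with base R functions. The sketch below uses the built-in ToothGrowth data frame; the choice of functions is one reasonable option, not the only one.

```r
tg <- ToothGrowth            # built-in sample data frame

exists("tg")                 # does the data frame exist? TRUE
class(tg)                    # what is the class? "data.frame"
nrow(tg) > 0                 # does it contain records? TRUE
anyNA(tg)                    # any missing values at all? FALSE for ToothGrowth
colSums(is.na(tg))           # number of missing values per variable
```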
29. How do you name values in vectors? Use names(). For example, inspect the built-in named vector islands: > str(islands)
Named num [1:48] 11506 5500 16988 2968 16 ...
- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ..
30. What is %>%? It is the R pipe operator, which sequences functions by passing the left-hand side of the operator as the first argument of the function on the right-hand side. It requires installing the magrittr package: library(magrittr); iris %>% head()
31. What is R Shiny? R Shiny is a web application framework for R and R Studio's Shiny server which makes shiny applications available over the web. See R Studio blog on R Shiny. See how to create a simple R Shiny app.
32. Can you write macros in R? With R, you can create custom and complex functions that are similar to macros. This gives you flexibility, with multiple methods for calling the R function. See Doing Macros in R blog. See Macros In R blog.
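As a hedged illustration of a macro-like function, the sketch below takes the data frame and the variable names as parameters, much like macro parameters in SAS. The function name mean_by and its arguments are hypothetical.

```r
# hypothetical helper: mean of one numeric variable by one grouping variable
mean_by <- function(data, var, by) {
  aggregate(data[[var]], by = list(group = data[[by]]), FUN = mean)
}

# call it with named arguments, similar to a macro call with keyword parameters
res <- mean_by(ToothGrowth, var = "len", by = "supp")
print(res)   # one row per level of supp, with the mean of len in column x
```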
33. What options are there to export R? Export to CSV, Excel and SAS. See export to RTF. See other examples.
34. What interfaces are there between R and SharePoint? See blog.
35. How exactly is R an object-oriented programming language? By using symbols (<-, $, [], etc.), R language directly access data frames, variables and values. R symbols have built-in meanings that save time in writing code. All R functions are concise and process objects as input and outputs. These features help to make R programs 'cryptic' and more 'syntax heavy'.
36. Are there any symbols for ending R statements similar to ';' in SAS? R does not require a statement terminator; a line break ends a statement, and a semicolon is only needed to separate multiple statements on one line. R executes code line by line in the console window or in batch mode. An example of a batch run is "C:\Program Files\R\R-2.13.0\bin\R.exe" CMD BATCH your_r_script.R.
37. Is it possible to run R programs within SAS? Yes, see SAS Programming for R Users book for examples.
38. How can you create a log file as in SAS? Load the logr library within the SASSY package.
39. What is R not? Easy to Learn, Technically Friendly, Only for Plots, Only for Statistical Modeling, Similar to SAS, Only for Data Science, Case Insensitive, Missing Data Friendly. R has functions and tools for data access, data management, statistical modeling, analysis datasets TLGs and data visualization.
40. Is there an R function similar to SAS's proc compare to compare two data frames? Yes, see comparedf function from the arsenal package. The summary function provides details similar to listall. An alternative method is the identical() function. Another alternative is all.equal(source, qc) - example.
library(arsenal) # required package
print(comparedf(prod, qc, by = "case")) # compare attributes
print(summary(comparedf(prod, qc))) # compare values
41. Does FDA use R? Yes, R is installed at FDA. FDA does not recommend specific software packages. FDA only requires sponsors to assure software used is validated. See Using R: Perspectives of a FDA Statistical RevieweR. [Validation, Presentation]
42. What are the options for saving R data? You can use save() function to save multiple objects as RData object or the saveRDS() function to save one object as RDS object. See examples.
a <- 1
b <- 2
c <- 3
save(a, b, c, file = "stuff.RData")
load("stuff.RData")
saveRDS(a, file = "stuff.RDS")
a <- readRDS("stuff.RDS")
43. How does R handle missing data?
44. What are useful R debugging functions? traceback(), debug(), browser(), trace(), recover()
45. What are common R errors?
46. What are possible R error types? syntax, symbols, program logic, parameters (input / output), data (input / output / missing / invalid), object / vector type incompatible, invalid object / data assignments / data type conversions.
47. What are best practices for debugging R programs? Review data frame metadata, missing/unique/range of data, program logic, display intermediate object values.
48. What are options for selecting variables? You can select by starting and ending variable positions and by variable name. taadmin2 <- select(taadmin, STUDY:SUBSETSN, starts_with("L"))
49. What is a useful R package for project management? The packrat package is useful to manage projects. See Data Flair for Data Science Projects.
50. What is the difference between '<-' and '='? See StackOverflow.com.
51. How can you display all objects created? objects() or ls() displays all object names.
52. How do you navigate within workspace? See blog1 and blog2.
53. What tools exist for creating a configuration management system for R programs? Github with security and restricted access allows organizations to check programs in and out for version control. See How to Check which Package Version is Loaded in R. See how to create a Github account.
54. Is it best to only use validated R packages for pharma companies?
In general, it is best to only use validated R packages. If you do use a user developed R package that is not officially validated by an R organization, then users and organizations can self validate the package using rigorous testing and documentation. R is supported by many Fortune 500 organizations and has over 2 million users. The availability of source code has helped produce over 18,000 R packages to date; 4.1.0 was the latest R version at the time of writing. Graduates are learning R and Python in school so the pool of talent is large. Users can test R packages with known data and results from publications. A higher number of downloads and a higher frequency of updates also help to determine how widely a package is used.
55. Is R backwards compatible? Yes, in general.
56. How can you display all four panels (source, console, environment and files)? See blog 1. See blog 2.
57. What is the syntax for saving data frames? See Hands-On Programming with R examples.
setwd("C:/study/analysis/output") # change from C:/Users/Sunil/Documents
write.csv(dm, "C:/study/analysis/output/mydm.csv", row.names = FALSE) # R function to write csv file
saveRDS(asl2, file = "asl2.RDS") # R function to save R data frame
myasl <- readRDS("asl2.RDS") # R function to read R data frame
58. What is the syntax for displaying all variables and objects created so far?
ls() # display all variables
objects() # display all objects in workspace (data frames, vectors, etc.)
59. What are comparable R functions for SAS's INPUT() and PUT() functions? The as.numeric() and as.character() functions.
mychar <- '10'
mynewnum <- as.numeric(mychar) # convert character to numeric; mynewnum is 10
mynewchar <- as.character(mynewnum) # convert numeric to character; mynewchar is '10'
60. What R function is useful to display variable type? The class() function.
class(mynewchar)
"character"
class(mynewnum)
"numeric"
61. What are R limitations and challenges?
R is a great tool for the validation needed in the clinical trials domain. However, there could be some challenges to more widespread use. For example, Help in R is more technical and less organized than that of other software. The numerous packages in R provide the latest in theory. However, the user needs to be aware of the packages and functions available. Frequent updates of R packages might disturb the stability of the validator and make it difficult to maintain. The regulatory agencies and review officers need to be equipped with the necessary knowledge and the required software. There is a market perception to disregard free software as not being up to the mark. Performance wise, R is slower than other programming languages such as C++. R is inefficient in handling huge datasets and complex looping.
62. Is there a website to test run R code? Yes, see rdrr (Packages). See mycompiler (tidyr, data.table, dplyr, ggplot2, plotrix, reshape2).
63. What is the best method for commenting multiple R lines? Apply Control-Shift-C.
64. Where can you access sample R data? See site.
65. What is the function to display titles in R output? Use writeLines("xxx").
66. What are examples of using the ifelse() function? The base R ifelse() function is similar to dplyr's if_else(). See ifelse(a %% 2 == 0,"even","odd") example.
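The even/odd expression from item 66 can be run as is on a small vector. The vector a and the object name parity are illustrative assumptions.

```r
a <- 1:6                                       # hypothetical input vector
parity <- ifelse(a %% 2 == 0, "even", "odd")   # evaluated element by element
parity   # "odd" "even" "odd" "even" "odd" "even"
```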
67. How can you display all objects created and how to delete them? ls() displays all objects, remove(list=ls()) removes all objects, rm(df) removes df object.
68. How are dates stored in R? As the number of days since January 1, 1970.
69. What methods exist for reading special characters in excel file? - read.csv("test1.csv", encoding = "latin1"), read.csv("test2.csv", encoding = "UTF-8")
70. How best to resolve conflicts with duplicate functions such as filter()? - Apply the package prefix, dplyr::filter() or stats::filter(), ex. filter <- dplyr::filter, filter(mtcars, am & cyl == 8)
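A short base R sketch shows the underlying day count of a Date value. The object name d and the dates are illustrative.

```r
d <- as.Date("1970-01-10")
as.numeric(d)                          # 9, days since 1970-01-01
as.Date(0, origin = "1970-01-01")      # day 0 is "1970-01-01"
as.Date("2022-05-10") - as.Date("2022-05-01")   # date arithmetic works in days
```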
71. Is it possible to have multiple R consoles for multiple programs? - Yes, Project (Upper Right Corner) > Open Project in New Session, Session > New Session
72. When joining data frames with common non-by variables how does R handle these values? R renames left variable to var.x and right variable to var.y.
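The renaming can be seen with base R's merge(), which applies the same .x and .y suffixes by default. The data frames and the variable dose are hypothetical.

```r
left  <- data.frame(id = c(1, 2), dose = c(10, 20))   # hypothetical left data frame
right <- data.frame(id = c(1, 2), dose = c(15, 25))   # same non-by variable name

joined <- merge(left, right, by = "id")
names(joined)   # "id" "dose.x" "dose.y"
```

The suffixes argument of merage-style joins can change these defaults; in merge() it is suffixes = c(".x", ".y").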
73. Does R have the equivalent of data set options in SAS? Yes, R direct variable reference is similar to SAS data set options: keep, drop and where can be applied as row and column references using the syntax df1[row, column].
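A minimal sketch of the df[row, column] syntax, using the built-in ToothGrowth data frame; the object names are illustrative.

```r
# where= and keep= in one step: rows with dose of 2, columns len and supp
subset_tg <- ToothGrowth[ToothGrowth$dose == 2, c("len", "supp")]

# drop=: keep every column except dose
dropped <- ToothGrowth[, !(names(ToothGrowth) %in% "dose")]

nrow(subset_tg)   # 20
names(dropped)    # "len" "supp"
```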
74. How can you identify the top 10 most frequent observations? df_ings_top10 <- df_ings[order(-df_ings$Freq)[1:10],] # Sorting descending and apply index 1st to 10th
75. What is a useful function to combine year, month and days to create *DTC variables? The ymd() function from the lubridate package: ymd( paste(y_obj,m_obj,d_obj, sep="-") ) # results in "2022-05-10"
76. Which organizations have submitted R submission packages? See GitHub for a list of R submission packages.
77. What is the method to convert all variable names to lower case?
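One common base R approach, shown here as a sketch with a hypothetical data frame, is to apply tolower() to names().

```r
df <- data.frame(SUBJID = 1:2, AGE = c(34, 51))   # hypothetical data frame
names(df) <- tolower(names(df))                   # convert all variable names to lower case
names(df)   # "subjid" "age"
```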
78. What is the differences between single and double quotes in R functions? Single and double quotes can be used interchangeably but double quotes are preferred (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes. Backslash is used to start an escape sequence inside character constants.
79. Do dataset options exist in R to track source datasets in left and right joins? Yes, with the [] direct variable reference, a data frame option exists for keeping, dropping or subsetting records.
dataframe3ax <- left_join(dataframe1, dataframe2[, c('team', 'assists')], by='team')
R has no equivalent of the SAS temporary IN= variable to track which source data set a record came from.
# create constant values to track source data frames
df1$dsn <- 'ae'
df2$dsn <- 'dm'
combined <- bind_rows(df1, df2)
80. How are missing values transposed? The by variables in a transpose are all variables other than the name and value variables. Missing values are preserved as NA. When transposing to a longer format with pivot_longer(), you can apply values_drop_na = TRUE to exclude NA records so the final dataset will not include missing records.
81. When reading a SAS data set created from a data frame, is there a method to convert all character and numeric variables from 'NA' and 1 to '' and .? Yes, the SAS code below converts all variables with 'NA' and 1 values.
df$v1[df$v1== ''] <- NA
83. What is the internal process for creating data frames? In general, R does not behave like the SAS data step process except when using row_number(), custom R functions or the mutate() function, since you can create multiple variables in one step. In general, each R function is a separate process. Lists can be created with the list() function. The R loop, however, behaves similarly to a data step since it loops through each record.
thislist <- list("apple", "banana", "cherry") # Create a list of values
for (x in thislist) { # loop through values in list
print(x) # apply R function to all values in the list
}
The example below references the record number in the logic for labeling n_half.
Custom R Functions with logic conditions can process blocks of R functions.
84. In R, length() of a variable refers to the number of records (elements) instead of the variable length as in SAS; use nchar() to get the number of characters in each value. In general, R does not display the length of character variables, only the name and type.
85. Does R have auto-code fill syntax? Yes. You can also use help() with the lm() syntax: help(lm), help("lm"), ?lm, ?"lm". See How-to R tasks.
86. How are integers and numeric values stored and processed? In R, it is best to keep all whole and decimal numbers as numeric instead of integers. Numeric, ex. 1, 2, 3, 1.5, 2.91. Integers require the L suffix, ex. 1L, 2L, 3L.
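The difference between the number of records and the character length can be seen with length() and nchar(). The vector race and its values are hypothetical.

```r
race <- c("ASIAN", "WHITE", "BLACK OR AFRICAN AMERICAN")   # hypothetical character variable

length(race)       # 3, the number of records (elements)
nchar(race)        # 5 5 25, the number of characters in each value
max(nchar(race))   # 25, comparable to the minimum SAS length needed to hold the values
```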
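A short sketch of how R distinguishes the two types; the object names num_val and int_val are illustrative.

```r
num_val <- 3      # numeric (double) by default
int_val <- 3L     # the L suffix creates an integer

class(num_val)                # "numeric"
class(int_val)                # "integer"
num_val == int_val            # TRUE, the values compare as equal
identical(num_val, int_val)   # FALSE, the storage types differ
```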
87. How can you impute missing values when creating ADaMs? Use the nafill() function from the data.table package, for example with type = "locf" for last observation carried forward (the last valid value).
df(all_miss[all_miss>0]) # display vars with all missing values
88. When creating summary tables, is GT better than GTSUMMARY? The GT package is similar to SAS' Proc Template procedure with keywords for each table section. GTSUMMARY offers more benefits than GT since its many defaults make it easier to create tables. Below is an example of creating a demo table by TRT01A.
89. What is the difference between the base R merge() and dplyr join functions? The merge() function and the dplyr join functions, such as left_join(), are similar but have slight differences. The dplyr join functions preserve the original order of rows in the data frames while the merge() function by default sorts the rows based on the column you used to perform the join.
90. How can you join more than two data frames together? The dplyr join functions can only join two data frames together at a time. The code below can be used to join multiple data frames. See more info on reduce().
91. How do you import specific sheets in excel? Use the sheet argument of read_excel().
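As a hedged sketch of the reduce approach, the example below swaps in base R's Reduce() with merge() so that it runs without extra packages; the same pattern applies to purrr::reduce() with a dplyr join function. The data frames and variable names are hypothetical.

```r
dm <- data.frame(subjid = 1:3, age = c(34, 51, 47))     # hypothetical data frames
ex <- data.frame(subjid = 1:3, dose = c(10, 20, 10))
vs <- data.frame(subjid = 1:3, weight = c(70, 82, 65))

# Reduce() joins the first two data frames, then joins that result with the third
all_joined <- Reduce(function(x, y) merge(x, y, by = "subjid"), list(dm, ex, vs))
names(all_joined)   # "subjid" "age" "dose" "weight"
```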
93. How can you autoload R packages? See blog.
94. When comparing numeric values, is there a method to specify a range of values or like values? The near() function from dplyr with the tol argument gives flexibility when subsetting.
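The idea behind a tolerance comparison can be sketched in base R: two values are treated as equal when their absolute difference is below a tolerance. The tolerance value 1e-8 is an arbitrary assumption for illustration.

```r
x <- sqrt(2)^2               # mathematically 2, but stored with floating point rounding

x == 2                       # FALSE, exact comparison fails
abs(x - 2) < 1e-8            # TRUE, comparison within a tolerance
isTRUE(all.equal(x, 2))      # TRUE, base R near-equality check
```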
95. What are factors useful for? When you need to order character values but not by alpha order such as the months, Jan, Feb and Mar. In addition, you can use factors to confirm only valid numbers are stored in variables.
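A minimal sketch of both uses of factors: ordering by a defined level order and flagging values outside the valid list. The vector names are illustrative.

```r
months <- c("Mar", "Jan", "Feb", "Jan")                       # hypothetical character values

month_f <- factor(months, levels = c("Jan", "Feb", "Mar"))    # define the calendar order
sort(month_f)                                                 # Jan Jan Feb Mar

# a value that is not one of the defined levels becomes NA
invalid <- factor("Apr", levels = c("Jan", "Feb", "Mar"))
is.na(invalid)                                                # TRUE
```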
96. What is the difference between grep() and grepl()? grepl() returns TRUE when a pattern exists in a character string and grep() returns a vector of indices of the character strings that contain the pattern.
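The two return types can be compared on a small vector; the adverse event terms below are hypothetical.

```r
terms <- c("HEADACHE", "NAUSEA", "BACK PAIN", "CHEST PAIN")   # hypothetical values

grepl("PAIN", terms)               # FALSE FALSE TRUE TRUE, one logical per element
grep("PAIN", terms)                # 3 4, the indices of the matching elements
grep("PAIN", terms, value = TRUE)  # "BACK PAIN" "CHEST PAIN", the matching values
```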
97. How best to resolve warning message: All formats failed to parse. No formats found? See solution for lubridate package issue.
98. 2017 SAS, R, or Python Flash Survey Results [Blog]
99. How to find the package name in R for a specific function? Install the sos package.
require("sos")
findFn("computeEstimate")
100. Does R have global and local variables? Yes, variables created outside of functions are global and variables created within functions are local.
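A short sketch of the scoping rule; the function name f and the variable names are hypothetical.

```r
x <- 10                      # global variable, created outside any function

f <- function() {
  x <- 99                    # local x, separate from the global x
  local_only <- 1            # exists only while f() runs
  x
}

f()                          # 99, the local value
x                            # 10, the global value is unchanged
exists("local_only")         # FALSE, the local variable is not visible outside f()
```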