Debugging in R
1 Introduction
2 Rationale
- Why debugging?
- We are all stupid at times
- Sometimes, we are clever
- Later, this cleverness makes us feel stupid
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? - Brian Kernighan and P. J. Plauger, The Elements of Programming Style
3 How?
- print/message et al.
- browser
- traceback
- trace
- trace on predicates
- Condition handling more generally
4 Print/REPL debugging
- Easy to start with, and intuitive
- Can be used across many languages
- In some languages, it is the only way :(
- Pretty useless for larger functions
- Makes for horrible, horrible code
5 Stupid Example
stupid_func <- function(data, indices) {
  train <- data[indices,]
  test <- data[indices]
  return(list(train, test))
}
- Spot the problem?
- I didn't (for more time than I'm comfortable admitting)
6 Print debugging
test <- sample(nrow(iris), 75, replace=FALSE)
mytest <- stupid_func(iris, test)
- My `stupid_func` actually works well as an example, because none of the debugging methods that follow are going to handle this kind of bug well
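Because this slide is about print debugging, here is a minimal sketch of what that looks like for `stupid_func`. `stupid_func_verbose` is a hypothetical copy that reports the shape of each piece with message(). It uses the fixed indices 1:3 instead of a random sample so the run is reproducible (with iris, `data[indices]` errors as soon as an index exceeds the 5 columns). The corrected `split_data` is also hypothetical and assumes the intent was a train/test split, i.e. `data[-indices, ]`.

```r
# Hypothetical verbose copy of stupid_func, for illustration only
stupid_func_verbose <- function(data, indices) {
  train <- data[indices, ]
  message("train: ", nrow(train), " rows x ", ncol(train), " cols")
  test <- data[indices]
  message("test:  ", nrow(test), " rows x ", ncol(test), " cols")
  list(train, test)
}

# Fixed indices so the output is reproducible
res <- stupid_func_verbose(iris, 1:3)
# train: 3 rows x 5 cols
# test:  150 rows x 3 cols   <- selected columns, not the remaining rows

# Assumed intent: the test set is every row NOT in `indices`
split_data <- function(data, indices) {
  list(train = data[indices, ], test = data[-indices, ])
}
fixed <- split_data(iris, 1:3)
```

The printed shapes give the game away. `test` has every row but only some of the columns.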
7 Getting Better: Browser
stupid_func <- function(data, indices) {
  browser()
  train <- data[indices,]
  test <- data[indices]
  return(list(train, test))
}
- browser pauses execution of the function at the point where it is called
- There are then a number of things you can do
- Useful commands:
- n to run the next command
- s to step into the next function
- f to finish execution of the current loop/function
- Q to stop (exit debugging)
- <Enter> to repeat the current command
- where to print the current call stack
8 Five browser commands (No. 4 will astound you!)
- n: runs the next line
- Q: exits the browser without finishing the function call
- c: exits the browser and continues evaluating the function
- s: steps into the function called on the current line
- f: finishes execution of the current loop/function
- Seriously though, number 4 means that you can move seamlessly from your code to the code you call, which is amazing (to me, at least ;)
- Additionally, this works better with smaller functions
9 More realistic examples
parse_quote <- function(quote) {
  quotecontents <- sapply(quote, function(x) content(x))
  numrows <- length(quotecontents)
  numcols <- max(sapply(quotecontents, length))
  resmat <- matrix(data=NA, nrow=numrows, ncol=numcols)
  nameextractors <- tolower(names(quotecontents[[1]]))
  for(i in 1:length(nameextractors)) {
    fun <- get_component(component=nameextractors[i])
    part <- fun(x=quotecontents)
    resmat[1:length(part),i] <- part
  }
  resmat
}
- WTF?
- I think I was trying to build a generic extractor for the responses returned from an API, but given the name, I definitely didn't start that way.
- Let's use browser to see what the hell actually happens in this function
- Conveniently, there's one already there :)
10 Laziness to a whole new level: trace
- I could add some browser calls to the above function
- That's going to be really annoying (especially when you run it in a script on a remote machine, it times out, and it takes days to figure out what the hell even happened)
- Of course, I would never do that
- There's a better way: trace
11 Trace
- trace with no arguments besides the function prints a line each time that function is called
stupidfunc <- function(x, y) {res <- x + y}
stupiderfunc <- function(n) {
  res <- vector(length=n)
  for(i in 1:n) {
    res[i] <- stupidfunc(i, 2)
  }
}
trace(stupidfunc)
stupiderfunc(10)
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
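trace also accepts an `exit` expression, which is evaluated in the traced function's own frame when it returns. This minimal sketch reuses the two functions above and records every value `stupidfunc` computes instead of only noting each call. `captured` is a name introduced here for illustration, and `print = FALSE` suppresses the default descriptive line.

```r
stupidfunc <- function(x, y) {res <- x + y}
stupiderfunc <- function(n) {
  res <- vector(length=n)
  for(i in 1:n) {
    res[i] <- stupidfunc(i, 2)
  }
}

captured <- numeric(0)  # hypothetical log of return values

# The exit expression runs inside stupidfunc's frame, so `res` is visible;
# <<- then assigns to `captured` in the enclosing (global) environment
trace(stupidfunc, exit = quote(captured <<- c(captured, res)), print = FALSE)
stupiderfunc(10)
untrace(stupidfunc)

captured
# [1]  3  4  5  6  7  8  9 10 11 12
```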
12 More Trace
- Trace can be used for debugging your own functions
- It can also be used to debug functions from packages
trace("ggplot", tracer=browser, where=asNamespace("ggplot2"))
- This will allow you to step through the entire function
- The trace is removed when the function is redefined (e.g. when the package is reloaded)
- It can also be removed explicitly using untrace
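The "trace on predicates" idea from the overview can be sketched with trace's `at` argument, which chooses the step of the function body to hook into, combined with a tracer that only acts when a condition holds. `scale_by` and `bad_values` are hypothetical names used only for illustration. In an interactive session you would probably use `quote(if (!is.finite(y)) browser())` as the tracer instead of recording the value.

```r
# Hypothetical function, used only for illustration
scale_by <- function(x, k) {
  y <- x / k
  round(y, 2)
}

bad_values <- numeric(0)  # hypothetical log of suspicious values

# as.list(body(scale_by)) is: `{`, y <- x / k, round(y, 2)
# so at = 3 runs the tracer just before round(y, 2), once y exists.
# The tracer only does something when the predicate holds.
trace(scale_by,
      tracer = quote(if (!is.finite(y)) bad_values <<- c(bad_values, y)),
      at = 3, print = FALSE)

results <- sapply(c(2, 0, 4), function(k) scale_by(10, k))
untrace(scale_by)
```

Only the division by zero triggers the tracer, so `bad_values` ends up holding a single `Inf`.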
13 Other Tricks
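- If an error occurs in someone else's code, there is an easy way to inspect it:
options(error=recover)  # on error, list the call stack and offer to browse a frame
options(error=NULL)     # restore the default behaviour afterwards
- When an error occurs, this presents the call stack and lets you browse any of its frames by entering its number (or 0 to exit to the top level)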
- This is really useful for errors which are sporadic, so you can see what data actually causes the error
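recover (and traceback) need someone at the console. When the code runs unattended, one rough substitute is to grab the call stack with sys.calls() from a withCallingHandlers error handler. Calling handlers run before the stack unwinds, so the failing call is still visible. An outer tryCatch then stops the error from propagating further. `check_value` and `run_all` are hypothetical functions used only for illustration.

```r
# Hypothetical functions: check_value() fails for some inputs
check_value <- function(x) {
  if (x > 2) stop("x too large: ", x)
  x
}
run_all <- function(xs) {
  for (x in xs) check_value(x)
}

captured_calls <- NULL
result <- tryCatch(
  withCallingHandlers(
    run_all(c(1, 2, 3)),
    # Runs before the stack unwinds, so the failing call is still on it
    error = function(e) captured_calls <<- sys.calls()
  ),
  # The exiting handler then turns the error into a value
  error = function(e) conditionMessage(e)
)

# One line per frame, innermost calls last
stack_text <- vapply(as.list(captured_calls),
                     function(cl) deparse(cl)[1], character(1))
print(stack_text)
```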
14 Handling errors
- We don't always have the option of dropping into a REPL to debug
- The code could be on a remote server
- The code could be running on someone else's machine
- You may wish to automate a set of reports/decisions in which case you definitely can't handle errors manually
- This is a job for R's condition system
15 Try
- The simplest way to do this is try
## a <- "a"
## 1+a
err <- try(1+a, silent=TRUE)
class(err)
[1] "try-error"
- So now we can record each of the errors (and potentially take some action based on them)
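Here is a minimal sketch of recording errors with try, assuming you want to keep going over a list of inputs. Each failed element comes back as an object of class "try-error", and try keeps the original condition in the "condition" attribute. The inputs are made up for illustration.

```r
inputs <- list(1, "a", 100)

# silent = TRUE stops try() from printing the error message
results <- lapply(inputs, function(x) try(log(x), silent = TRUE))

# Which elements failed?
failed <- vapply(results, inherits, logical(1), what = "try-error")

# Pull the messages out of the stored conditions
error_msgs <- vapply(results[failed],
                     function(r) conditionMessage(attr(r, "condition")),
                     character(1))
failed
# [1] FALSE  TRUE FALSE
```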
16 tryCatch
- tryCatch is a more general form of try
- Using this, we can take different actions based on what happened in the function
conditions <- function(code) {
  tryCatch(code,
    error=function(c) "error",
    warning=function(c) "warning",
    message=function(c) "message"
  )
}
conditions(stop(1+2))
conditions(warning(1+2))
conditions(message(1+2))
[1] "error"
[1] "warning"
[1] "message"
- If the code is successful, the result is returned
- Otherwise, the respective condition function is evaluated
- So, for instance, if we were downloading a batch of webpages, we could log errors (and potentially retry) and warnings, while using messages to report progress
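The "log errors and retry" idea could look something like the sketch below. `make_flaky` is a hypothetical stand-in for a network call: it fails a set number of times and then succeeds. `with_retry` is also hypothetical. A real version would probably pause between attempts (e.g. with Sys.sleep()) and distinguish retryable errors from fatal ones. Both are omitted here.

```r
# Hypothetical flaky operation: fails on its first `failures` calls, then succeeds
make_flaky <- function(failures) {
  attempts <- 0
  function() {
    attempts <<- attempts + 1
    if (attempts <= failures) stop("temporary failure on attempt ", attempts)
    "payload"
  }
}

# Retry f() up to max_tries times, logging each error message
with_retry <- function(f, max_tries = 3) {
  errors <- character(0)
  for (i in seq_len(max_tries)) {
    # Return the condition itself so we can inspect it below
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      message("attempt ", i, " succeeded")
      return(list(value = result, errors = errors))
    }
    errors <- c(errors, conditionMessage(result))
    message("attempt ", i, " failed: ", conditionMessage(result))
  }
  list(value = NULL, errors = errors)
}

res <- with_retry(make_flaky(2))
res$value
# [1] "payload"
```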
17 Better Examples
message_handler <- function(m) message(m)
warning_handler <- function(w) warning(w)
## conditionCall() extracts the call that raised the original error
error_handler <- function(e) simpleError(message="simple error", call=conditionCall(e))
ok_res <- tryCatch(expr=1+1,
                   message=message_handler, warning=warning_handler, error=error_handler)
warning_res <- tryCatch(expr=as.integer(2^32+1),
                        message=message_handler, warning=warning_handler, error=error_handler)
error_res <- tryCatch(expr=1+a,
                      message=message_handler, warning=warning_handler, error=error_handler)
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  NAs introduced by coercion to integer range
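One consequence of the example above is that tryCatch handlers are exiting handlers. Once one runs, the rest of the expression is abandoned, so `warning_res` holds whatever the handler returned instead of the coerced value. The sketch below contrasts this with withCallingHandlers, whose handler runs and then lets the code carry on. The names `caught`, `seen` and `value` are illustrative.

```r
# tryCatch(): the handler's value replaces the result of as.integer()
caught <- tryCatch(as.integer(2^32 + 1), warning = function(w) "handled")

# withCallingHandlers(): the handler runs, then the code carries on
seen <- character(0)
value <- withCallingHandlers(
  as.integer(2^32 + 1),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))  # record the warning
    invokeRestart("muffleWarning")         # and stop it propagating further
  }
)
value
# [1] NA
```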
18 Finally
- tryCatch also has a finally argument: an expression (not a function) that is evaluated just before tryCatch returns, whether or not an error occurred
- This is normally most useful for writing out files and ensuring that connections are closed.
- In general, when something always needs to happen, regardless of any errors, it should be in a finally block.
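tryCatch({
  while(isTRUE(levok)) {
    ## ... (loop body omitted)
  }
}, error=function(e) message("Error: ", conditionMessage(e)),
finally={
  ##because i starts at one
  statelist_done <- statelist[1:(i-1)]
  ## format() keeps spaces and colons out of the file name
  saveRDS(statelist_done,
          file=paste0(paste("statelist", format(Sys.time(), "%Y%m%d_%H%M%S"),
                            args[1], sep="_"), ".rds"))
  ## change_instance() is a helper defined elsewhere (not shown)
  change_instance(first, "stop")
})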
- This code hit an API. The finally block saved all of the data retrieved in each session, and it also ensured that the connection to the API was closed.
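Here is a smaller, self-contained sketch of the same pattern, using a file connection in place of the API. `write_lines_checked` is a hypothetical helper that refuses empty lines. Whether it succeeds or fails, the finally expression closes the connection, and the number of open connections before and after the call is compared to check this.

```r
# Hypothetical helper: write lines to a file, refusing empty lines
write_lines_checked <- function(lines, path) {
  con <- file(path, open = "w")
  tryCatch({
    for (line in lines) {
      if (line == "") stop("empty line encountered")
      writeLines(line, con)
    }
    "ok"
  },
  error = function(e) conditionMessage(e),
  finally = close(con))  # runs on success and on error
}

path <- tempfile(fileext = ".txt")
n_before <- nrow(showConnections())
status <- write_lines_checked(c("first", "", "third"), path)
n_after <- nrow(showConnections())
```

After the failed call, `status` is the error message, the line written before the failure is in the file, and no connection is left open.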
19 Custom Conditions
- You can create custom conditions
- They should inherit from error, warning or message if you want them to behave like (and be caught as) those conditions
- They must contain message and call components
# Shamelessly stolen from Advanced R, Wickham (2014)
condition <- function(subclass, message, call = sys.call(-1), ...) {
  structure(
    class = c(subclass, "condition"),
    list(message = message, call = call, ...)
  )
}
is.condition <- function(x) inherits(x, "condition")
myerr <- condition("error", message="this is my error")
is.condition(myerr)
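To show why custom classes are useful, here is a short sketch that signals one with stop() and catches it by class. `http_error` is a hypothetical subclass with an extra `status` field, made up for illustration. Because it also inherits from "error", a generic error handler still catches it.

```r
# Same constructor as above
condition <- function(subclass, message, call = sys.call(-1), ...) {
  structure(
    class = c(subclass, "condition"),
    list(message = message, call = call, ...)
  )
}

# Hypothetical custom error class carrying an extra `status` field
http_error <- function(status) {
  condition(c("http_error", "error"),
            message = paste("HTTP status", status),
            status = status)
}

# A handler for the specific class can read the extra field...
status <- tryCatch(stop(http_error(404)), http_error = function(e) e$status)

# ...a generic error handler still catches it...
msg <- tryCatch(stop(http_error(500)), error = function(e) conditionMessage(e))

# ...and ordinary errors skip the http_error handler
other <- tryCatch(stop("boom"),
                  http_error = function(e) "http",
                  error = function(e) "other")
```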
20 Example: getting loads of photos for CNN usage
- Because deep learning is so hot right now
- And because I suspect most of the benchmarks are horribly over-fitted
get_one_photo <- function(url, name) {
  download.file(url, destfile=name, mode="wb")
  message(paste("got ", url, " saved to ", name, sep=""))
}
get_some_photos <- function(urls, id, folder) {
  ## seq_along() is safe even when urls is empty
  for (i in seq_along(urls)) {
    nam <- paste0(folder, "/", id, "-", i, ".jpg")
    get_one_photo(urls[i], name=nam)
  }
}
dir.create("photos_sample")
21 Explanation
- We wrap download.file to fetch one URL and save it as a JPEG
- We then call this function repeatedly to get all the photos associated with a given row (the URLs are stored in a list-column)
22 Using messages to record state
log_results <- function(e) {
  if(!exists("num_processed")) {
    num_processed <<- 1
  } else {
    num_processed <<- num_processed + 1
  }
  ## wrap e in list() so each message is stored as a single element
  if(!exists("messagedf")) {
    messagedf <<- list(e)
  } else {
    messagedf <<- c(messagedf, list(e))
  }
}
- We will log each URL processed
- We can also log the number of URLs processed (just because, I guess)
- Note the use of global assignment (<<-), which is normally a bad idea
23 Warnings
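handle_warnings <- function(e) {
  message("Warning: ", conditionMessage(e))
  ## wrap e in list() so each warning is stored as a single element
  if(!exists("warning_vec")) {
    warning_vec <<- list(e)
  } else {
    warning_vec <<- c(warning_vec, list(e))
  }
}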
24 Putting it together
get_all_photos <- function(data) {
  for(j in seq_len(nrow(data))) {
    if(length(data$photos[[j]])==0) {
      next
    } else {
      ## Note: tryCatch() handlers exit the expression, so the first
      ## message or warning also ends get_some_photos() for this row
      tryCatch(expr={get_some_photos(unlist(data$photos[j]),
                                     id=data$listing_id[j],
                                     folder="photos_sample")},
               message=function(e) log_results(e),
               warning=function(e) handle_warnings(e),
               error=function(e) message("Error: ", conditionMessage(e)))
    }
  }
}
25 Recap
- This is, I admit, definitely not best practice
- But if the errors are independent and infrequent, it does have the advantage of working.
- To make it better, we'll need to go a little further into R's condition system
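One possible next step is sketched below under stated assumptions, without claiming it is best practice. withCallingHandlers logs messages and warnings and lets each download continue, while a per-URL tryCatch handles errors, and the log lives in a local list instead of global variables. `fake_download` is a hypothetical stand-in for get_one_photo that never touches the network, and the example.com URLs are made up.

```r
# Hypothetical stand-in for get_one_photo(): no network access
fake_download <- function(url) {
  if (grepl("broken", url)) stop("cannot open URL '", url, "'")
  if (grepl("slow", url)) warning("download of '", url, "' was slow")
  message("got ", url)
  invisible(url)
}

process_urls <- function(urls) {
  logbook <- list(messages = character(0),
                  warnings = character(0),
                  errors = character(0))
  for (u in urls) {
    # tryCatch: an error abandons this URL only, then the loop moves on
    tryCatch(
      # withCallingHandlers: log messages/warnings, then keep going
      withCallingHandlers(
        fake_download(u),
        message = function(m) {
          logbook$messages <<- c(logbook$messages, conditionMessage(m))
          invokeRestart("muffleMessage")
        },
        warning = function(w) {
          logbook$warnings <<- c(logbook$warnings, conditionMessage(w))
          invokeRestart("muffleWarning")
        }
      ),
      error = function(e) logbook$errors <<- c(logbook$errors, conditionMessage(e))
    )
  }
  logbook
}

res <- process_urls(c("http://example.com/a.jpg",
                      "http://example.com/slow.jpg",
                      "http://example.com/broken.jpg"))
```

The "slow" URL still reaches its final message because the warning handler muffles the warning instead of exiting.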
26 Conclusions
- R has a variety of mechanisms for debugging
- browser is quick and easy
- options(error=recover) is useful, but annoying
- trace allows you to debug any function
- When you need to respond to unexpected events, use the condition system
27 References
- Hadley Wickham, Advanced R (this chapter)
- John Chambers, Software for Data Analysis (read all of it)
- Peter Seibel, Beyond Exception Handling: Conditions and Restarts, from Practical Common Lisp (translated to R by Hadley Wickham)