December 20, 2018

Using tryCatch to write robust R code can be a bit confusing, and I found the help file dry to read. There are some resources which explore tryCatch, linked below. Over the years, I have developed a few programming patterns which I’ve repeatedly found useful. Below is a quick introduction to tryCatch, followed by three use cases I rely on regularly.

Syntax

tryCatch has a slightly complex syntax structure. However, once we understand the 4 parts which constitute a complete tryCatch call as shown below, it becomes easy to remember:

  • expr : [Required] R code(s) to be evaluated
  • error : [Optional] What should run if an error occurred while evaluating the code in expr
  • warning : [Optional] What should run if a warning occurred while evaluating the code in expr
  • finally : [Optional] What should run just before quitting the tryCatch call, irrespective of whether expr ran successfully, with an error, or with a warning
tryCatch(
    expr = {
        # Your code...
        # goes here...
        # ...
    },
    error = function(e){ 
        # (Optional)
        # Do this if an error is caught...
    },
    warning = function(w){
        # (Optional)
        # Do this if a warning is caught...
    },
    finally = {
        # (Optional)
        # Do this at the end before quitting the tryCatch structure...
    }
)
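
One detail the template does not show is that tryCatch returns a value: the value of expr if it completes, otherwise the value returned by whichever handler ran. A minimal sketch of this, where safe_log is a made-up helper name used only for illustration:

```r
# Hypothetical helper: returns log(x), or NA if log(x) raises an error or a warning
safe_log <- function(x) {
    tryCatch(
        expr = log(x),
        error = function(e) NA_real_,   # returned if log(x) errors
        warning = function(w) NA_real_  # returned if log(x) warns
    )
}

safe_log(100)   # the natural log of 100
safe_log(-1)    # NA, because log(-1) raises a warning
safe_log("a")   # NA, because log("a") raises an error
```

Assigning the result of tryCatch to a variable is what makes fallback patterns possible.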

Hello World example

This is a toy example showing how a function can use tryCatch to handle execution.

log_calculator <- function(x){
    tryCatch(
        expr = {
            message(log(x))
            message("Successfully executed the log(x) call.")
        },
        error = function(e){
            message('Caught an error!')
            print(e)
        },
        warning = function(w){
            message('Caught an warning!')
            print(w)
        },
        finally = {
            message('All done, quitting.')
        }
    )    
}

If x is a valid number, expr and finally are executed:

log_calculator(10)
## 2.30258509299405
## Successfully executed the log(x) call.
## All done, quitting.

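If x is a negative number, log(x) produces NaN with a warning, so expr is abandoned at that point, and warning and finally are executed. (Zero and NA do not trigger the handler: log(0) returns -Inf and log(NA) returns NA, both without a warning.)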

log_calculator(-10)
## Caught an warning!
## <simpleWarning in log(x): NaNs produced>
## All done, quitting.
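
Note that the "Successfully executed" message never appears in this run: once a tryCatch handler catches a warning, the rest of expr is skipped. If the goal is to record a warning and carry on, base R's withCallingHandlers together with the "muffleWarning" restart is one option. A rough sketch, where log_and_continue is a made-up name and not part of the original example:

```r
# Hypothetical helper: records warnings but lets log(x) finish and return its value
log_and_continue <- function(x) {
    warnings_seen <- character(0)
    value <- withCallingHandlers(
        expr = log(x),
        warning = function(w) {
            # Store the message, then resume evaluation where the warning was raised
            warnings_seen <<- c(warnings_seen, conditionMessage(w))
            invokeRestart("muffleWarning")
        }
    )
    list(value = value, warnings = warnings_seen)
}

res <- log_and_continue(-10)
res$value      # NaN: the computation finished
res$warnings   # the warning message that was recorded
```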

If x is an invalid entry which raises an error, expr is attempted, and error and finally are executed:

log_calculator("log_me")
## Caught an error!
## <simpleError in log(x): non-numeric argument to mathematical function>
## All done, quitting.
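
The handlers above print the whole condition object. When only the message text is needed, say for a log line, base R's conditionMessage() extracts it. A small sketch, with describe_failure being a made-up helper name:

```r
# Hypothetical helper: reports what happened when log(x) was attempted
describe_failure <- function(x) {
    tryCatch(
        expr = {
            log(x)
            "ok"
        },
        error = function(e) paste("error:", conditionMessage(e)),
        warning = function(w) paste("warning:", conditionMessage(w))
    )
}

describe_failure(10)         # "ok"
describe_failure(-10)        # starts with "warning:"
describe_failure("log_me")   # starts with "error:"
```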

More useful examples

Use tryCatch within loops

There are cases at work where I have quite large datasets to pre-process before model building can begin. The sources of these data can be varied, and thus their quality can vary. While each dataset should conform to our data quality standards (datatypes, data dictionaries, other domain-specific constraints), very often this isn’t the case. As a result, common data preprocessing functions might fail on a few datasets. We can use tryCatch within the for loop to catch errors without breaking the loop.

Another toy example: Say, we have a nested dataframe of the mtcars data, nested on the cylinder numbers, and say, we had a few character values in mpg which is our response variable.

# Example nested dataframe
library(dplyr)  # provides %>%, as_tibble(), mutate() and select() used below
(df_nested <- mtcars %>% as_tibble() %>% tidyr::nest(-cyl))
## # A tibble: 3 x 2
##     cyl data              
##   <dbl> <list>            
## 1     6 <tibble [7 × 10]> 
## 2     4 <tibble [11 × 10]>
## 3     8 <tibble [14 × 10]>

df_nested$data[[2]][c(4,8),"mpg"] <- "Missing"

We wish to run a few custom preprocessors, including taking the log of mpg.

convert_gear_to_factors <- function(df){ df %>% mutate(gear = factor(gear, levels = 1:5, labels = paste0("Gear_",1:5))) }
transform_response_to_log <- function(df){ df %>% mutate(log_mpg = log(mpg)) %>% select(-mpg) }

How do we run our preprocessors over all the rows without erroring out?

for (indx in seq_len(nrow(df_nested))) {
    tryCatch(
        expr = {
            df_nested[[indx, "data"]] <-  df_nested[[indx, "data"]] %>% 
                convert_gear_to_factors() %>% 
                transform_response_to_log()
            message("Iteration ", indx, " successful.")
        },
        error = function(e){
            message("* Caught an error on iteration ", indx)
            print(e)
        }
    )
}
## Iteration 1 successful.
## * Caught an error on iteration 2
## <Rcpp::eval_error in mutate_impl(.data, dots): Evaluation error: non-numeric argument to mathematical function.>
## Iteration 3 successful.

We’re able to handle the error on iteration 2, let the user know, and run the remaining iterations.
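
The same pattern can be reduced to base R so that it runs without any packages. The sketch below is only an illustration: the inputs list and the value/error layout of each result are assumptions, not part of the example above. Each iteration stores either the computed value or the error message, so failures can be reviewed after the loop finishes.

```r
# Toy inputs: the second one will make log() fail
inputs <- list(a = 10, b = "Missing", c = 100)
results <- vector(mode = "list", length = length(inputs))
names(results) <- names(inputs)

for (name in names(inputs)) {
    results[[name]] <- tryCatch(
        expr = list(value = log(inputs[[name]]), error = NA_character_),
        error = function(e) list(value = NA_real_, error = conditionMessage(e))
    )
}

# Names of the iterations that failed
failed <- names(results)[!vapply(results, function(r) is.na(r$error), logical(1))]
failed
```

Keeping the error message next to each result makes it easy to report all failures in one place once the loop is done.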

Catch issues early, log progress often

An important component of preparing ‘development’ code to be ‘production’ ready is implementation of good defensive programming and logging practices. I won’t go into details of either here, except to showcase the style of programs I have been writing to prepare code before it goes to our production cluster.

preprocess_data <- function(df, x, b, ...){
    message("-- Within preprocessor")
    df %>% 
        assertive::assert_is_data.frame() %>% 
        assertive::assert_is_non_empty()
    x %>% 
        assertive::assert_is_numeric() %>% 
        assertive::assert_all_are_greater_than(3.14)
    b %>% 
        assertive::assert_is_a_bool()
    
    # Code here...
    # ....
    # ....
    
    return(df)
}
build_model <- function(...){message("-- Building model...")}
eval_model  <- function(...) {message("-- Evaluating model...")}
save_model  <- function(...) {message("-- Saving model...")}

main_executor <- function(df, x, b, ...){
    tryCatch(
        expr = {
            preprocess_data(df, x, b, ...) %>% 
                build_model() %>% 
                eval_model() %>% 
                save_model()
        },
        error = function(e){
            message('** ERR at ', Sys.time(), " **")
            print(e)
            write_to_log_file(e, logger_level = "ERR") #Custom logging function
        },
        warning = function(w){
            message('** WARN at ', Sys.time(), " **")
            print(w)
            write_to_log_file(w, logger_level = "WARN") #Custom logging function
        },
        finally = {
            message("--- Main Executor Complete ---")
        }
    )
}

Each utility function starts by checking its arguments. There are plenty of packages which allow run-time testing; my favorite is assertive, because the code is easy to read and it’s pipe-able. Errors and warnings are handled using tryCatch: they are printed to the console if running in interactive mode, and then written to log files as well. I have written my own custom logging functions, but there are packages like logging and log4r which work perfectly fine.
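
The custom write_to_log_file function is not shown in this post. Purely as an illustration, a stand-in with the same name could look like the sketch below; the log_file argument, the line format and the use of a temporary file are all assumptions.

```r
# Hypothetical stand-in for the custom logger: appends one timestamped line per condition
write_to_log_file <- function(cond, logger_level = "ERR", log_file = "pipeline.log") {
    line <- sprintf(
        "%s [%s] %s",
        format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
        logger_level,
        conditionMessage(cond)
    )
    cat(line, "\n", sep = "", file = log_file, append = TRUE)
    invisible(line)
}

# Usage: log an error raised inside tryCatch to a temporary file
log_file <- tempfile(fileext = ".log")
tryCatch(
    expr = stop("input data frame is empty"),
    error = function(e) write_to_log_file(e, logger_level = "ERR", log_file = log_file)
)
readLines(log_file)
```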

Use tryCatch while model building

tryCatch is invaluable during model building. This is an actual piece of code I wrote for a Kaggle competition as part of my midterm work at school. GitHub link here. The details of what’s going on aren’t important. At a high level, I was fitting stlf models using forecast for each shop, among 60 unique shop-ID numbers. For various reasons, for some shops, an stlf model could not be fit, in which case a default seasonal naive model using snaive was to be used. tryCatch is a perfect way to handle such exceptions, as shown below. I used a similar approach while building models at an “item” level: the number of unique items was in the 1000s, so manually debugging one at a time is impossible. tryCatch allows us to programmatically handle such situations.

stlf_yhats <- vector(mode = 'list', length = length(unique_shops))
for (i in seq_along(unique_shops)) {
    cat('\nProcessing shop', unique_shops[i])
    tr_data <- c6_tr %>% filter(shop_id == unique_shops[i])
    tr_data_ts <-
        dcast(
          formula = yw ~ shop_id,
          data = tr_data,
          fun.aggregate = sum,
          value.var = 'total_sales',
          fill = 0
        )
    tr_data_ts <- ts(tr_data_ts[, -1], frequency = 52)

    ##################
    # <--Look here -->
    fit <- tryCatch(
      expr = {tr_data_ts %>% stlf(lambda = 'auto')},
      error = function(e) { tr_data_ts %>% snaive()}
      )
    ##################
  
    fc <- fit %>% forecast(h = h)
    # Clip negative forecasts to zero
    stlf_yhats[[i]] <- pmax(as.numeric(fc$mean), 0)
}
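
The loop above depends on the competition data and the forecast package, so it cannot be run as is. The fallback idea itself can be tried with base R alone. In the sketch below, stats::stl stands in for stlf and a flat mean stands in for snaive; both substitutions, the describe_series name and the toy series are assumptions made for illustration.

```r
# Hypothetical helper: try an STL decomposition, fall back to a flat mean if it fails
describe_series <- function(y) {
    tryCatch(
        expr = {
            fit <- stl(y, s.window = "periodic")
            list(method = "stl", trend = as.numeric(fit$time.series[, "trend"]))
        },
        error = function(e) {
            # stl() errors on series shorter than two full periods
            list(method = "mean", trend = rep(mean(y), length(y)))
        }
    )
}

long_series  <- ts(sin(1:48) + (1:48) / 10, frequency = 12)  # four full periods
short_series <- ts(c(5, 7, 6, 8), frequency = 12)            # too short for stl()

describe_series(long_series)$method    # "stl"
describe_series(short_series)$method   # "mean"
```

Returning the method name alongside the result makes it easy to count afterwards how many series needed the fallback.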

Hope this is useful to others learning tryCatch. Cheers.
