Advanced Features

One of LINEA’s main advantages is the simplicity with which it can capture non-linear relations. Capturing non-linear relations is fundamental when applying regression as these relationships are more realistic representations of the real world.

This page covers:

  • Non-linear Transformations
  • Non-linear Models
  • Custom Transformations
  • Advanced Testing

We will run a simple model on some fictitious data sourced from Google trends to understand what variables seem to have an impact on the ecommerce variable.

We start by importing linea, some other useful libraries, and some data.

library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # visualization

data_path = 'https://raw.githubusercontent.com/paladinic/data/main/sales_ts.csv'

data = read_xcsv(file = data_path)
data = data %>% 
  get_seasonality(date_col_name = 'week',date_type = 'weekly starting') %>% 
  gt_f(kw = 'bitcoin',date_col = 'week')

data %>%
  datatable(rownames = NULL,
            options = list(scrollX = TRUE))

Non-linear Transformations

linea provides a few default transformations meant to capture non-linear relationships in the data:

  • Decay
  • Diminish
  • Hill
  • Lag
  • Moving Average


Decay

The linea::decay() function applies a decay by adding to each data point a percentage of the previous one. This transformation is meant to capture the impact of an event over time. It only makes sense for time-bound models, where observations are ordered in time.

raw_variable = data$vod_spend
dates = data$week

plot_ly() %>%
  add_lines(y = raw_variable, x = dates, name = 'raw') %>%
  add_lines(y = decay(raw_variable, decay = 0.5),
            x = dates,
            name = 'transformed: decay 50%') %>%
  add_lines(y = decay(raw_variable, decay = 0.75),
            x = dates,
            name = 'transformed: decay 75%') %>%
  add_lines(y = decay(raw_variable, decay = 0.95),
            x = dates,
            name = 'transformed: decay 95%') %>%
  layout(title = 'decay',
         xaxis = list(showgrid = F),
         plot_bgcolor  = "rgba(0, 0, 0, 0)",
         paper_bgcolor = "rgba(0, 0, 0, 0)")
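To make the mechanics concrete, the recursion described above can be sketched in base R. This is a minimal sketch, assuming that the "previous" value is the already-decayed one (a geometric carry-over); decay_sketch and its rate argument are hypothetical names and not linea's implementation.

```r
# Hypothetical helper illustrating a decay (carry-over) transformation.
# Assumption: each point receives `rate` times the previous *transformed* value.
decay_sketch <- function(v, rate) {
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    carry <- if (i == 1) 0 else rate * out[i - 1]
    out[i] <- v[i] + carry
  }
  out
}

# A single burst of 100 fades by half each period: 100, 50, 25, 12.5
decay_sketch(c(100, 0, 0, 0), rate = 0.5)
```

With a rate of 0 the series is returned unchanged, and rates close to 1 spread the effect of each event over many periods.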


Diminish

The linea::diminish() function applies a negative exponential function:

\[ 1 - e^{-v/m} = 1 - \frac{1}{e^{v/m}} \]

where v is the vector to be transformed and m defines the shape of the transformation. Here is a visualization of the transformation.

raw_variable = data$gtrends_bitcoin
dates = data$week[!is.na(raw_variable)]
raw_variable = raw_variable[!is.na(raw_variable)]

plot_ly() %>%
  add_lines(y = raw_variable, x = dates, name = 'raw') %>%
  add_lines(
    y = diminish(raw_variable, m = 0.3, abs = F),
    x = dates,
    name = 'transformed: diminish 30%',
    yaxis = "y2"
  ) %>%
  layout(title = 'diminish',
         yaxis2 = list(overlaying = "y",
                       showgrid = F,
                       side = "right"), 
         xaxis = list(showgrid = F),
         plot_bgcolor  = "rgba(0, 0, 0, 0)",
         paper_bgcolor = "rgba(0, 0, 0, 0)")

This transformation can also be visualized by placing the raw and transformed variable on the horizontal and vertical axis.

# raw values on the horizontal axis, diminished values on the vertical axis
plot_ly() %>% 
  add_lines(
    x = raw_variable,
    y = diminish(raw_variable, .25, F),
    name = 'diminish 25%',
    line = list(shape = "spline")
  ) %>%
  add_lines(
    x = raw_variable,
    y = diminish(raw_variable, .5, F),
    name = 'diminish 50%',
    line = list(shape = "spline")
  ) %>%
  add_lines(
    x = raw_variable,
    y = diminish(raw_variable, .75, F),
    name = 'diminish 75%',
    line = list(shape = "spline")
  ) %>%
  layout(title = 'raw vs. diminished', 
         yaxis = list(title = 'diminished'),
         xaxis = list(showgrid = F,title = 'raw'),
         plot_bgcolor  = "rgba(0, 0, 0, 0)",
         paper_bgcolor = "rgba(0, 0, 0, 0)")
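The negative exponential formula can also be checked by hand in base R. The sketch below is only an illustration of the formula as written on this page, assuming m is used as an absolute scale; diminish_sketch is a hypothetical helper and does not reproduce any other options of linea::diminish().

```r
# Hypothetical helper: the negative exponential written out in base R.
# Assumption: m is used as an absolute scale, exactly as in the formula.
diminish_sketch <- function(v, m) {
  1 - exp(-v / m)
}

v <- c(0, 10, 50, 100, 500)

# Smaller m saturates sooner; larger m stays close to linear for longer.
data.frame(
  v = v,
  m_25 = diminish_sketch(v, m = 25),
  m_100 = diminish_sketch(v, m = 100)
)
```

The output starts at 0 for v = 0 and approaches 1 as v grows, which is what produces the flattening curve in the charts.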


Hill Function

The linea::hill_function() function applies a transformation similar to linea::diminish(), as it also captures diminishing returns. It requires more inputs, though, and can generate an S-curve:

\[ 1 - \frac{k^m}{k^m + v^m} \]

where v is the vector to be transformed, and k and m are the parameters that define the shape of the curve.

plot_ly() %>%
  add_lines(y = raw_variable, x = dates, name = 'raw') %>%
  add_lines(
    y = hill_function(raw_variable, m = 5,k = 50),
    x = dates,
    name = 'transformed: hill_function m = 5,k = 50',
    yaxis = "y2"
  ) %>%
  layout(title = 'hill_function',
         yaxis2 = list(overlaying = "y",
                       showgrid = F,
                       side = "right"), 
         xaxis = list(showgrid = F),
         plot_bgcolor  = "rgba(0, 0, 0, 0)",
         paper_bgcolor = "rgba(0, 0, 0, 0)")

This transformation can also be visualized by placing the raw and transformed variables on the horizontal and vertical axes. The charts below also illustrate the impact of changing the function's parameters, k and m.

# fixed k, varying m
plot_ly() %>% 
  add_lines(
    line = list(shape = "spline"),
    x = raw_variable,
    y = hill_function(raw_variable,m = 1,k = 50),
    name = 'm = 1,k = 50'
  ) %>% 
  add_lines(
    line = list(shape = "spline"),
    x = raw_variable,
    y = hill_function(raw_variable,m = 2,k = 50),
    name = 'm = 2,k = 50'
  ) %>% 
  add_lines(
    line = list(shape = "spline"),
    x = raw_variable,
    y = hill_function(raw_variable,m = 5,k = 50),
    name = 'm = 5,k = 50'
  ) %>% 
  layout(title = 'raw vs. hill_function (m)', 
         yaxis = list(title = 'diminished'),
         xaxis = list(showgrid = F,title = 'raw'),
         plot_bgcolor  = "rgba(0, 0, 0, 0)",
         paper_bgcolor = "rgba(0, 0, 0, 0)")

# fixed m, varying k
plot_ly() %>% 
  add_lines(
    line = list(shape = "spline"),
    x = raw_variable,
    y = hill_function(raw_variable,m = 5,k = 25),
    name = 'm = 5,k = 25'
  ) %>% 
  add_lines(
    line = list(shape = "spline"),
    x = raw_variable,
    y = hill_function(raw_variable,m = 5,k = 50),
    name = 'm = 5,k = 50'
  ) %>% 
  add_lines(
    line = list(shape = "spline"),
    x = raw_variable,
    y = hill_function(raw_variable,m = 5,k = 75),
    name = 'm = 5,k = 75'
  ) %>% 
  layout(title = 'raw vs. hill_function (k)', 
         xaxis = list(showgrid = F),
         plot_bgcolor  = "rgba(0, 0, 0, 0)",
         paper_bgcolor = "rgba(0, 0, 0, 0)")
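The roles of k and m can be read directly off the Hill formula on this page. The following is a minimal base R sketch of that formula, not linea's implementation; hill_sketch is a hypothetical helper name.

```r
# Hypothetical helper: the Hill formula from this page, 1 - k^m / (k^m + v^m).
hill_sketch <- function(v, k, m) {
  1 - k^m / (k^m + v^m)
}

v <- c(0, 25, 50, 75, 100)

# k is the half-saturation point: the output is 0.5 where v equals k.
hill_sketch(50, k = 50, m = 1)
hill_sketch(50, k = 50, m = 5)

# m controls steepness: larger m gives a flatter start and a sharper S-shape.
data.frame(
  v = v,
  m_1 = hill_sketch(v, k = 50, m = 1),
  m_5 = hill_sketch(v, k = 50, m = 5)
)
```

Changing k moves the point at which the curve reaches half of its maximum, while changing m changes how abruptly the curve rises around that point.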


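Lag

The linea::lag() function applies a lag to the data. This transformation is meant to capture relationships that are delayed in time. It only makes sense for time-bound models. The function is called as linea::lag() below to avoid ambiguity with the lag() functions of other loaded packages.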

plot_ly() %>% 
  add_lines(y = raw_variable, x = dates, name = 'raw') %>%
  add_lines(
    y = linea::lag(raw_variable, l = 5),
    x = dates,
    name = 'transformed: lag 5'
  ) %>%
  add_lines(
    y = linea::lag(raw_variable, l = 10),
    x = dates,
    name = 'transformed: lag 10'
  ) %>%
  add_lines(
    y = linea::lag(raw_variable, l = 20),
    x = dates,
    name = 'transformed: lag 20'
  ) %>% 
  layout(plot_bgcolor  = "rgba(0, 0, 0, 0)",
         paper_bgcolor = "rgba(0, 0, 0, 0)",
         title = 'lag',
         xaxis = list(showgrid = F))
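As a rough illustration of what lagging does to a series, here is a minimal base R sketch. It assumes that the first l positions, which have no earlier value to draw from, are filled with NA; the page does not say how linea::lag() pads them, and lag_sketch is a hypothetical helper.

```r
# Hypothetical helper: shift a series l periods later.
# Assumption: the first l positions, which have no earlier value, are filled with NA.
lag_sketch <- function(v, l) {
  n <- length(v)
  if (l <= 0) return(v)
  if (l >= n) return(rep(NA, n))
  c(rep(NA, l), v[seq_len(n - l)])
}

# Each value now appears two periods later: NA NA 1 2 3 4
lag_sketch(1:6, l = 2)
```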

Moving Average

The linea::ma() function applies a moving average to the data. This transformation is meant to capture relationships that are smoothed over time. This function only makes sense on time-bound models.

plot_ly() %>% 
  add_lines(y = raw_variable, x = dates, name = 'raw') %>%
  add_lines(
    y = ma(raw_variable, width = 5),
    x = dates,
    name = 'transformed: ma 5'
  ) %>%
  add_lines(
    y = ma(raw_variable, width = 15),
    x = dates,
    name = 'transformed: ma 15'
  ) %>% 
  add_lines(
    y = ma(raw_variable, width = 25),
    x = dates,
    name = 'transformed: ma 25'
  ) %>% 
  add_lines(
    y = ma(raw_variable, width = 25,align = 'left'),
    x = dates,
    name = 'transformed: ma 25 left'
  ) %>%
  add_lines(
    y = ma(raw_variable, width = 25,align = 'right'),
    x = dates,
    name = 'transformed: ma 25 right'
  ) %>% 
  layout(plot_bgcolor  = "rgba(0, 0, 0, 0)",
         paper_bgcolor = "rgba(0, 0, 0, 0)",
         xaxis = list(showgrid = F))
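The effect of the window width and alignment can be sketched with stats::filter() from base R. This is an illustration under stated assumptions, not linea::ma() itself: ma_sketch is a hypothetical helper, "center" is taken to mean a symmetric window, "right" a trailing window that only uses current and past values, and positions without a full window are left as NA.

```r
# Hypothetical helper: moving average via stats::filter().
# Assumption: 'center' uses a symmetric window and 'right' a trailing window;
# positions without a full window are left as NA.
ma_sketch <- function(v, width, align = c("center", "right")) {
  align <- match.arg(align)
  weights <- rep(1 / width, width)
  sides <- if (align == "center") 2 else 1
  as.numeric(stats::filter(v, weights, method = "convolution", sides = sides))
}

# Centered window of 3: NA 2 3 4 NA
ma_sketch(c(1, 2, 3, 4, 5), width = 3)

# Trailing window of 3: NA NA 2 3 4
ma_sketch(c(1, 2, 3, 4, 5), width = 3, align = "right")
```

A trailing window only looks backwards, which is often the safer choice when the smoothed variable is meant to explain a response observed at the same point in time.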

Non-linear Models

linea can capture non-linear relationships by applying transformations to the raw data, and then generating the regression for the transformed data. This can be accomplished using a model table which specifies each variable’s transformation parameters. The function linea::build_model_table() can be used to generate the blank model table.

ivs =  c('vod_spend','relative_price','promotions','trend')

model_table = build_model_table(ivs =  ivs)

model_table %>%
  datatable(rownames = NULL,
            options = list(scrollX = T,
                           dom = "t"))

The model table can be written to a CSV or Excel file and modified outside of R, or modified using dplyr as shown below. In this example the model run will apply the linea::hill_function() transformation (with parameters 10000 and 1) and the linea::decay() transformation (with a parameter of 0.5) to the vod_spend variable.

model_table = model_table %>%
  mutate(hill = if_else(variable ==  'vod_spend','10000,1',hill)) %>% 
  mutate(decay = if_else(variable ==  'vod_spend','.5',decay))

model_table %>%
  datatable(rownames = NULL,
            options = list(scrollX = T,
                           dom = "t"))

The model table can be used as an input in the linea::run_model() function. The linea::response_curves() function will display the non-linear relationship captured by the model.

dv = 'sales'

model = run_model(data = data,
                  dv = dv,
                  model_table = model_table)

model %>% 
  response_curves(
    x_max = 1e5,
    x_min = 0,
    y_max = 1e5,
    y_min = -1e5,
    interval = 1
  )
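The idea behind a non-linear model of this kind, transforming a variable first and then fitting an ordinary regression on the transformed values, can be sketched without linea. Everything below is a hypothetical illustration on simulated data: hill_sketch, spend and sales are made-up names, and the transformation parameters are assumed to be known in advance.

```r
set.seed(1)

# Hypothetical saturating transformation (same form as the Hill formula on this page).
hill_sketch <- function(v, k, m) 1 - k^m / (k^m + v^m)

# Simulated data: the response depends on spend only through the transformation.
spend <- seq(0, 100, by = 1)
sales <- 10 + 50 * hill_sketch(spend, k = 50, m = 2) + rnorm(length(spend), sd = 0.1)

# Step 1: transform the raw variable. Step 2: fit an ordinary linear regression.
spend_t <- hill_sketch(spend, k = 50, m = 2)
fit <- lm(sales ~ spend_t)
coef(fit)

# A response curve is the fitted coefficient applied to the transformed grid.
grid <- seq(0, 100, by = 10)
response <- coef(fit)[["spend_t"]] * hill_sketch(grid, k = 50, m = 2)
data.frame(spend = grid, response = response)
```

The regression itself stays linear in its coefficients; the curvature comes entirely from the transformation applied beforehand, which is why the choice of parameters matters so much.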

Custom Transformations

The default transformations cover an extensive range of non-linear relationships, but linea allows users to input their own transformations through the trans_df. The trans_df is effectively a table mapping functions, expressed in R, to their name, and order of execution.

trans_df = default_trans_df()

trans_df %>%
  datatable(rownames = NULL,
            options = list(scrollX = T,
                           dom = "t"))

In the example below, the function base::sin(x*a) is added to the default transformations as sin_func. The parameters that can be passed to a transformation need to be expressed as letters, starting from a, then b, c, and so on.

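trans_df = default_trans_df() %>% 
  # append the custom transformation; values match the trans_df printed by run_model() below
  rbind(data.frame(name = 'sin_func', ts = FALSE, func = 'sin(x*a)', order = 5))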

trans_df %>%
  datatable(rownames = NULL,
            options = list(scrollX = T,
                           dom = "t"))
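One way to picture how a function stored as text, such as 'sin(x*a)', can be applied to data is to evaluate the string with x bound to the variable and the letters bound to the parameters. The sketch below shows that idea in base R; it is an assumption about the general mechanism, not a description of linea's internals, and apply_trans_sketch is a hypothetical helper.

```r
# Hypothetical helper: evaluate a transformation written as a string,
# with x bound to the data and a, b, ... bound to the parameters.
apply_trans_sketch <- function(func, x, params) {
  env <- c(list(x = x), as.list(params))
  eval(parse(text = func), envir = env, enclos = baseenv())
}

trend <- 1:5

# Same result as sin(trend * 0.05)
apply_trans_sketch("sin(x*a)", x = trend, params = c(a = 0.05))

# Two parameters are passed as a and b
apply_trans_sketch("x^a + b", x = 2, params = c(a = 2, b = 1))
```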

This trans_df can now be used to generate a model table and run models.

model_table = build_model_table(ivs = ivs,
                                trans_df = trans_df) %>% 
  mutate(sin_func = if_else(variable == 'trend','5e-2',''))

model_table %>% 
  datatable(rownames = NULL)
model = run_model(data = data,
                  dv = dv,
                  model_table = model_table,
                  trans_df = trans_df,
                  verbose = T)
##       name    ts                        func order
## 1     hill FALSE linea::hill_function(x,a,b)     1
## 2    decay  TRUE           linea::decay(x,a)     2
## 3      lag  TRUE             linea::lag(x,a)     3
## 4       ma  TRUE              linea::ma(x,a)     4
## 5 sin_func FALSE                    sin(x*a)     5
model %>% 
  response_curves(
    verbose = T,
    interval = 1,
    x_max = 1500,
    x_min = -1500
  )

Advanced Testing

Similarly to the linea::what_next() function, described in the Additional Features page, linea has functions to run multiple models from specified combinations of variables and transformations:

  • what_trans()
  • what_combo()

Parameter Tuning

To find the right parameters for the non-linear relationship, the function linea::what_trans() can be used to run multiple models with a range of parameters. If parameters are passed for multiple transformations, the function will run models for all combinations. The inputs for this function are:

  • a starting model
  • a variable name from the data
  • a table (i.e. trans_df) specifying the values of the parameters

In this case, the trans_df must contain the parameters to be tested for each transformation, separated by commas:

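trans_df = data.frame(
  name = c('diminish', 'decay', 'lag', 'ma'),
  # function strings follow the pattern of default_trans_df();
  # the diminish entry is assumed by analogy
  func = c(
    'linea::diminish(x,a)',
    'linea::decay(x,a)',
    'linea::lag(x,a)',
    'linea::ma(x,a)'
  ),
  order = 1:4,
  # comma-separated parameter values to test (left empty for lag and ma)
  val = c('0.5,10,100,1000,10000','0,0.5,0.8','','')
)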

trans_df %>% 
  datatable(rownames = NULL)

Once the trans_df is ready, it can be passed to the linea::what_trans() function, to return the table of results of all combinations.

model %>% 
  what_trans(trans_df = trans_df,
             variable ='display_spend') %>% 
  datatable(rownames = NULL)
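To get a feel for how many models such a table implies, the comma-separated values can be expanded into a grid by hand. This is a minimal base R sketch, assuming that empty strings mean the transformation is skipped; it only counts combinations and does not reproduce what linea::what_trans() does internally.

```r
# Hypothetical parameter strings in the same comma-separated style as the val column.
vals <- c(diminish = "0.5,10,100,1000,10000", decay = "0,0.5,0.8", lag = "", ma = "")

# Split each non-empty string into numeric candidates.
candidates <- lapply(vals[vals != ""], function(s) as.numeric(strsplit(s, ",")[[1]]))

# Every combination of the candidate values: 5 x 3 = 15 rows.
grid <- expand.grid(candidates)
nrow(grid)
head(grid)
```

Because the number of rows is the product of the number of values per transformation, adding even a few candidates to each transformation increases the number of models quickly.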

All Combinations

When modelling, testing one variable at a time can be time-consuming and inconclusive. For this reason it is useful to be able to test wider ranges of models that span different variables and transformations.

Using a similar set of transformations as before, here we need to specify the possible parameter values for each function, for each variable.

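trans_df = data.frame(
 name = c('diminish', 'decay', 'hill', 'exp'),
 # func strings reconstructed: decay and hill follow the trans_df printed earlier,
 # while the diminish and exp entries are assumed
 func = c(
   'linea::diminish(x,a)',
   'linea::decay(x,a)',
   'linea::hill_function(x,a,b)',
   '(x^a)'
 ),
 order = 1:4
) %>%
 # hill values reconstructed from the trans_parameters output below
 # (hill_a in 1, 50, 100; hill_b in 1, 5); the exact string format is assumed
 mutate(display_spend = if_else(condition = name == 'hill',
                                '(1,50,100),(1,5)',
                                '')) %>%
 mutate(display_spend = if_else(condition = name == 'decay',
                                '0,.1,.7 ',
                                display_spend)) %>%
 mutate(vod_spend = if_else(condition = name == 'hill',
                            '(1,50,100),(1,5)',
                            '')) %>%
 mutate(vod_spend = if_else(condition = name == 'decay',
                            '0,.1,.7 ',
                            vod_spend))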

trans_df %>% 
  datatable(rownames = NULL)

We can now use this trans_df to test the specified combinations with linea::what_combo(). Because the combinations span transformations, parameters, and variables, the results are stored in a list of data frames.

combinations = what_combo(model = model,trans_df = trans_df)

# elements of the returned list
names(combinations)

## [1] "results"          "trans_parameters" "long_trans_df"    "variables"       
## [5] "model"
combinations$results %>% 
  datatable(rownames = NULL)

# parameter sets tested for each variable
combinations$trans_parameters
## $display_spend
##    decay_a hill_a hill_b      variable
## 1      0.0      1      1 display_spend
## 2      0.1      1      1 display_spend
## 3      0.7      1      1 display_spend
## 4      0.0     50      1 display_spend
## 5      0.1     50      1 display_spend
## 6      0.7     50      1 display_spend
## 7      0.0    100      1 display_spend
## 8      0.1    100      1 display_spend
## 9      0.7    100      1 display_spend
## 10     0.0      1      5 display_spend
## 11     0.1      1      5 display_spend
## 12     0.7      1      5 display_spend
## 13     0.0     50      5 display_spend
## 14     0.1     50      5 display_spend
## 15     0.7     50      5 display_spend
## 16     0.0    100      5 display_spend
## 17     0.1    100      5 display_spend
## 18     0.7    100      5 display_spend
## $vod_spend
##    decay_a hill_a hill_b  variable
## 1      0.0      1      1 vod_spend
## 2      0.1      1      1 vod_spend
## 3      0.7      1      1 vod_spend
## 4      0.0     50      1 vod_spend
## 5      0.1     50      1 vod_spend
## 6      0.7     50      1 vod_spend
## 7      0.0    100      1 vod_spend
## 8      0.1    100      1 vod_spend
## 9      0.7    100      1 vod_spend
## 10     0.0      1      5 vod_spend
## 11     0.1      1      5 vod_spend
## 12     0.7      1      5 vod_spend
## 13     0.0     50      5 vod_spend
## 14     0.1     50      5 vod_spend
## 15     0.7     50      5 vod_spend
## 16     0.0    100      5 vod_spend
## 17     0.1    100      5 vod_spend
## 18     0.7    100      5 vod_spend
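The 18 rows per variable in the output above are what a full grid over the candidate values produces. The sketch below rebuilds that grid with expand.grid() using the values visible in the output; the final count assumes that the parameter sets of the two variables are crossed with each other, which this page suggests but does not state explicitly.

```r
# Candidate values read off the trans_parameters output above.
per_variable <- expand.grid(
  decay_a = c(0, 0.1, 0.7),
  hill_a = c(1, 50, 100),
  hill_b = c(1, 5)
)

# 3 x 3 x 2 = 18 parameter sets per variable
nrow(per_variable)

# Assumption: the sets for display_spend and vod_spend are crossed with each other,
# giving 18^2 = 324 candidate models.
n_variables <- 2
nrow(per_variable)^n_variables
```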

Using the function linea::run_combo_model(), you can run and visualize individual models within the combinations by specifying the row of the results table to use (results_row).

combinations %>% 
  run_combo_model(results_row = 1) %>% 
  response_curves(
    x_min = 0,
    x_max = 1500
  )

Next Steps

  1. The Getting Started page is a good place to start learning how to build linear models with linea.

  2. The Additional Features page illustrates all other functions of the library.