LINEA
One of LINEA’s main advantages is the simplicity with which it can capture non-linear relations. Capturing non-linear relations is fundamental when applying regression as these relationships are more realistic representations of the real world.
This page covers:
linea’s default transformations
We will run a simple model on some fictitious data sourced from Google Trends to understand what variables seem to have an impact on the ecommerce variable.
We start by importing linea, some other useful libraries, and some data.
library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # visualization
data_path = 'https://raw.githubusercontent.com/paladinic/data/main/sales_ts.csv'
data = read_csv(file = data_path)
data = data %>%
get_seasonality(date_col_name = 'week',date_type = 'weekly starting') %>%
gt_f(kw = 'bitcoin',date_col = 'week')
data %>%
datatable(rownames = NULL,
options = list(scrollX = TRUE))
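linea provides a few default transformations meant to capture non-linear relationships in the data. The ones covered on this page are linea::decay(), linea::diminish(), linea::hill_function(), linea::lag(), and linea::ma().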
The linea::decay() function applies a decay by adding to each data point a percentage of the previous one. This transformation is meant to capture the impact of an event over time. This function only makes sense for time-bound models.
raw_variable = data$vod_spend
dates = data$week
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(y = decay(raw_variable, decay = 0.5),
x = dates,
name = 'transformed: decay 50%') %>%
add_lines(y = decay(raw_variable, decay = 0.75),
x = dates,
name = 'transformed: decay 75%') %>%
add_lines(y = decay(raw_variable, decay = 0.95),
x = dates,
name = 'transformed: decay 95%') %>%
layout(title = 'decay',
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
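The decay described above can be sketched in base R. This is a minimal illustration, not linea's implementation: the helper name decay_sketch is hypothetical, and it assumes the carried-over percentage is taken from the previous transformed value (a recursive, adstock-style decay).

```r
# Hypothetical helper, not part of linea: recursive decay sketch.
# Assumption: each point receives `decay` times the previous *transformed* point.
decay_sketch <- function(v, decay = 0.5) {
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    carry <- if (i == 1) 0 else out[i - 1] * decay
    out[i] <- v[i] + carry
  }
  out
}

# A single spike of 100 followed by zeros: the effect halves each period.
spike <- c(100, 0, 0, 0)
decay_sketch(spike, decay = 0.5)
```

With a decay of 0.5, the spike is carried forward as 100, 50, 25, 12.5, which is why higher decay values in the chart above produce longer tails.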
The linea::diminish() function applies a negative exponential function:
\[ 1 - e^{-v/m} \]
or, equivalently:
\[ 1 - \frac{1}{e^{v/m}} \]
where v is the vector to be transformed and m defines the shape of the transformation. Here is a visualization of the transformation.
raw_variable = data$gtrends_bitcoin
dates = data$week[!is.na(raw_variable)]
raw_variable = raw_variable[!is.na(raw_variable)]
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(
y = diminish(raw_variable, m = 0.3, abs = F),
x = dates,
name = 'transformed: diminish 30%',
yaxis = "y2"
) %>%
layout(title = 'diminish',
yaxis2 = list(overlaying = "y",
showgrid = F,
side = "right"),
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
This transformation can also be visualized by placing the raw and transformed variables on the horizontal and vertical axes, respectively.
plot_ly() %>%
add_lines(
x = raw_variable,
y = diminish(raw_variable,.25,F),
name = 'diminish 25%',
line = list(shape = "spline")
) %>%
add_lines(
x = raw_variable,
y = diminish(raw_variable,.5,F),
name = 'diminish 50%',
line = list(shape = "spline")
) %>%
add_lines(
line = list(shape = "spline"),
x = raw_variable,
y = diminish(raw_variable,.75,F),
name = 'diminish 75%'
) %>%
layout(title = 'raw vs. diminished',
yaxis = list(title = 'diminished'),
xaxis = list(showgrid = F,title = 'raw'),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
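The two forms of the negative exponential given earlier can be checked numerically in base R. This is only a sketch of the formula: diminish_sketch is a hypothetical helper, and it ignores the abs argument used in the linea::diminish() calls above, whose behaviour is not described on this page.

```r
# Hypothetical helper, not part of linea: the negative exponential formula.
diminish_sketch <- function(v, m) 1 - exp(-v / m)

v <- c(0, 10, 50, 100, 500)
m <- 50

curve_a <- diminish_sketch(v, m)  # 1 - e^(-v/m)
curve_b <- 1 - 1 / exp(v / m)     # equivalent form

all.equal(curve_a, curve_b)       # both forms agree
diff(curve_a) / diff(v)           # gain per unit of v shrinks as v grows
```

The shrinking slope is the diminishing-returns behaviour: each additional unit of the raw variable adds less to the transformed one.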
The linea::hill_function() function applies a transformation similar to linea::diminish(), as it also captures diminishing returns. It requires more inputs, though, and can generate an S-curve:
\[ 1 - \frac{k^m}{k^m + v^m} \]
where v is the vector to be transformed, and k and m are the function’s parameters.
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(
y = hill_function(raw_variable, m = 5,k = 50),
x = dates,
name = 'transformed: hill_function m = 5,k = 50',
yaxis = "y2"
) %>%
layout(title = 'hill_function',
yaxis2 = list(overlaying = "y",
showgrid = F,
side = "right"),
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
This transformation can also be visualized by placing the raw and transformed variables on the horizontal and vertical axes. The charts below also illustrate the impact of changing the function’s parameters: k and m.
plot_ly() %>%
add_lines(
line = list(shape = "spline"),
x = raw_variable,
y = hill_function(raw_variable,m = 1,k = 50),
name = 'm = 1,k = 50'
) %>%
add_lines(
line = list(shape = "spline"),
x = raw_variable,
y = hill_function(raw_variable,m = 2,k = 50),
name = 'm = 2,k = 50'
) %>%
add_lines(
line = list(shape = "spline"),
x = raw_variable,
y = hill_function(raw_variable,m = 5,k = 50),
name = 'm = 5,k = 50'
) %>%
layout(title = 'raw vs. hill_function (m)',
yaxis = list(title = 'diminished'),
xaxis = list(showgrid = F,title = 'raw'),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
plot_ly() %>%
add_lines(
line = list(shape = "spline"),
x = raw_variable,
y = hill_function(raw_variable,m = 5,k = 25),
name = 'm = 5,k = 25'
) %>%
add_lines(
line = list(shape = "spline"),
x = raw_variable,
y = hill_function(raw_variable,m = 5,k = 50),
name = 'm = 5,k = 50'
) %>%
add_lines(
line = list(shape = "spline"),
x = raw_variable,
y = hill_function(raw_variable,m = 5,k = 75),
name = 'm = 5,k = 75'
) %>%
layout(title = 'raw vs. hill_function (k)',
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
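The effect of k and m can also be seen on a handful of values by writing the hill formula directly in base R. This is a sketch of the formula only; hill_sketch is a hypothetical helper and is not linea's implementation.

```r
# Hypothetical helper, not part of linea: the hill formula written directly.
hill_sketch <- function(v, m, k) 1 - k^m / (k^m + v^m)

v <- c(0, 25, 50, 75, 100)

hill_sketch(v, m = 1, k = 50)   # concave from the start (diminishing returns)
hill_sketch(v, m = 5, k = 50)   # flat near zero, steep around k: an S-curve
hill_sketch(50, m = 5, k = 50)  # when v equals k the result is 0.5
```

In this formula k sets the point where the curve reaches one half, while m controls how sharply the curve bends around that point.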
The linea::lag() function applies a lag to the data. This transformation is meant to capture relationships that are lagged in time. This function only makes sense for time-bound models.
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(
y = linea::lag(raw_variable, l = 5),
x = dates,
name = 'transformed: lag 5'
) %>%
add_lines(
y = linea::lag(raw_variable, l = 10),
x = dates,
name = 'transformed: lag 10'
) %>%
add_lines(
y = linea::lag(raw_variable, l = 20),
x = dates,
name = 'transformed: lag 20'
) %>%
layout(plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)",
title = 'lag',
xaxis = list(showgrid = F))
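The lag can be illustrated on a short vector in base R. This is a sketch under stated assumptions: lag_sketch is a hypothetical helper, and padding the start of the series with NA is an assumption made here, not a statement about how linea::lag() fills those positions.

```r
# Hypothetical helper, not part of linea: shift a vector l periods later.
# Assumption: the first l positions are padded with `fill` (NA by default).
lag_sketch <- function(v, l = 1, fill = NA) {
  n <- length(v)
  if (l <= 0) return(v)
  if (l >= n) return(rep(fill, n))
  c(rep(fill, l), v[seq_len(n - l)])
}

# Each value now appears two periods after it originally occurred.
lag_sketch(1:6, l = 2)
```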
The linea::ma() function applies a moving average to the data. This transformation is meant to capture relationships that are smoothed over time. This function only makes sense for time-bound models.
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(
y = ma(raw_variable, width = 5),
x = dates,
name = 'transformed: ma 5'
) %>%
add_lines(
y = ma(raw_variable, width = 15),
x = dates,
name = 'transformed: ma 15'
) %>%
add_lines(
y = ma(raw_variable, width = 25),
x = dates,
name = 'transformed: ma 25'
) %>%
add_lines(
y = ma(raw_variable, width = 25,align = 'left'),
x = dates,
name = 'transformed: ma 25 left'
) %>%
add_lines(
y = ma(raw_variable, width = 25,align = 'right'),
x = dates,
name = 'transformed: ma 25 right'
) %>%
layout(plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)",
xaxis = list(showgrid = F),
title='ma')
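A moving average and the effect of window alignment can be sketched with stats::filter() from base R. The helper ma_sketch is hypothetical, and how its sides argument corresponds to the align argument of linea::ma() is not asserted here.

```r
# Base R sketch of a moving average (not linea's code).
# sides = 2 centres the window; sides = 1 uses current and past values only.
ma_sketch <- function(v, width = 3, sides = 2) {
  weights <- rep(1 / width, width)
  as.numeric(stats::filter(v, weights, method = "convolution", sides = sides))
}

v <- c(1, 2, 3, 4, 5)
ma_sketch(v, width = 3, sides = 2)  # centred window
ma_sketch(v, width = 3, sides = 1)  # trailing window
```

Positions where the window does not fit inside the series come back as NA, and the trailing window shifts the smoothed series later in time relative to the centred one.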
linea can capture non-linear relationships by applying transformations to the raw data, and then generating the regression for the transformed data. This can be accomplished using a model table which specifies each variable’s transformation parameters. The function linea::build_model_table() can be used to generate the blank model table.
ivs = c('vod_spend','relative_price','promotions','trend')
model_table = build_model_table(ivs = ivs)
model_table %>%
datatable(rownames = NULL,
options = list(scrollX = T,
dom = "t"))
The model table can be written to a CSV or Excel file and modified outside of R, or edited using dplyr as shown below. In this example, the model run will apply the linea::hill_function() transformation (with parameters 10000 and 1) and the linea::decay() transformation (with a parameter of 0.5) to the vod_spend variable.
model_table = model_table %>%
mutate(hill = if_else(variable == 'vod_spend','10000,1',hill)) %>%
mutate(decay = if_else(variable == 'vod_spend','.5',decay))
model_table %>%
datatable(rownames = NULL,
options = list(scrollX = T,
dom = "t"))
The model table can be used as an input in the linea::run_model() function. The linea::response_curves() function will display the non-linear relationship captured by the model.
dv = 'sales'
model = run_model(data = data,
dv = dv,
model_table = model_table)
model %>%
response_curves(
x_max = 1e5,
x_min = 0,
y_max = 1e5,
y_min = -1e5,
interval = 1
)
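The idea of transforming a variable and then fitting a linear regression on the result can be sketched without linea, using only base R. The data below is synthetic and purely illustrative, and the choice of a negative exponential with m = 100 is an assumption made for this example.

```r
set.seed(1)

# Synthetic data (illustrative only): the response saturates as spend grows.
spend <- seq(0, 300, length.out = 200)
sales <- 3 * (1 - exp(-spend / 100)) + rnorm(200, sd = 0.05)

# Step 1: transform the raw variable (negative exponential, m = 100).
spend_t <- 1 - exp(-spend / 100)

# Step 2: fit an ordinary linear regression on the transformed variable.
fit <- lm(sales ~ spend_t)
coef(fit)

# Step 3: express the fitted effect in terms of the raw variable.
raw_grid <- seq(0, 300, by = 50)
response <- coef(fit)[["spend_t"]] * (1 - exp(-raw_grid / 100))
response
```

The model stays linear in its coefficients, while the response plotted against the raw variable is a curve. That curve is the kind of relationship a response-curve chart is meant to show.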
The default transformations cover an extensive range of non-linear relationships, but linea allows users to input their own transformations through the trans_df. The trans_df is effectively a table mapping functions, expressed in R, to their name and order of execution.
trans_df = default_trans_df()
trans_df %>%
datatable(rownames = NULL,
options = list(scrollX = T,
dom = "t"))
In the example below, the function base::sin(x*a) is added to the default transformations as sin_func. The parameters that can be passed to the transformations need to be expressed as letters, starting from a, then b, c, and so on.
trans_df = default_trans_df() %>%
rbind(c('sin_func',FALSE,'sin(x*a)',5))
trans_df %>%
datatable(rownames = NULL,
options = list(scrollX = T,
dom = "t"))
This trans_df can now be used to generate a model table and run models.
model_table = build_model_table(ivs = ivs,
trans_df = trans_df) %>%
mutate(sin_func = if_else(variable == 'trend','5e-2',''))
model_table %>%
datatable(rownames = NULL)
model = run_model(data = data,
dv = dv,
model_table = model_table,
trans_df = trans_df,
verbose = T)
## name ts func order
## 1 hill FALSE linea::hill_function(x,a,b) 1
## 2 decay TRUE linea::decay(x,a) 2
## 3 lag TRUE linea::lag(x,a) 3
## 4 ma TRUE linea::ma(x,a) 4
## 5 sin_func FALSE sin(x*a) 5
model %>%
response_curves(
verbose = T,
interval = 1,
x_max = 1500,
x_min = -1500
)
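To see what a transformation written as a string, such as sin(x*a), amounts to, the string can be evaluated in base R with x and the lettered parameters supplied as variables. How linea evaluates these strings internally is not shown on this page; apply_trans_string is a hypothetical helper illustrating one plausible approach.

```r
# Hypothetical helper, not part of linea: evaluate a transformation string.
# x is the variable to transform; a, b, ... are passed as named arguments.
apply_trans_string <- function(func, x, ...) {
  eval(parse(text = func), envir = list(x = x, ...), enclos = baseenv())
}

trend <- 1:10
apply_trans_string("sin(x*a)", x = trend, a = 5e-2)
apply_trans_string("(x^a)", x = trend, a = 2)
```

Any R expression in terms of x and the lettered parameters can be handled this way, which is what makes a table of function strings a flexible way to describe custom transformations.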
Similarly to the linea::what_next() function, described in the Additional Features page, linea has functions to run multiple models from specified combinations of variables and transformations:
what_trans()
what_combo()
To find the right parameters for the non-linear relationship, the function linea::what_trans() can be used to run multiple models with a range of parameters. If parameters are passed for multiple transformations, the function will run models for all combinations. The inputs for this function are:
the model
the variable to test
a table (trans_df) specifying the values of the parameters
In this case, the trans_df must contain the parameters to be tested for each transformation, separated by commas:
trans_df = data.frame(
name = c('diminish', 'decay', 'lag', 'ma'),
func = c(
'linea::diminish(x,a)',
'linea::decay(x,a)',
'linea::lag(x,a)',
'linea::ma(x,a)'
),
order = 1:4,
val = c('0.5,10,100,1000,10000','0,0.5,0.8','','')
)
trans_df %>%
datatable(rownames = NULL)
Once the trans_df is ready, it can be passed to the linea::what_trans() function to return the table of results for all combinations.
model %>%
what_trans(trans_df = trans_df,
variable ='display_spend') %>%
datatable(rownames = NULL)
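The number of models implied by "all combinations" can be worked out in base R from the comma-separated strings in the val column. This is a sketch of the counting logic only, not of how linea::what_trans() parses its input.

```r
# Parameter strings as written in the val column of the trans_df above.
val <- c(diminish = "0.5,10,100,1000,10000",
         decay    = "0,0.5,0.8",
         lag      = "",
         ma       = "")

# Split each non-empty string into its numeric candidate values.
candidates <- lapply(val[val != ""],
                     function(s) as.numeric(strsplit(s, ",")[[1]]))

# One row per combination: 5 diminish values x 3 decay values.
grid <- expand.grid(candidates)
nrow(grid)
head(grid)
```

Because every diminish value is paired with every decay value, adding candidates to several transformations multiplies the number of models to run.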
When modelling, testing one variable at a time can be time-consuming and inconclusive. For this reason, it is useful to be able to test wider ranges of models that span different variables and transformations.
Using a similar set of transformations as before, here we need to specify the possible parameter values for each function, for each variable.
trans_df = data.frame(
name = c('diminish', 'decay', 'hill', 'exp'),
func = c(
'linea::diminish(x,a)',
'linea::decay(x,a)',
"linea::hill_function(x,a,b)",
'(x^a)'
),
order = 1:4
) %>%
mutate(display_spend = if_else(condition = name == 'hill',
'(1,50,100),(1,5)',
'')) %>%
mutate(display_spend = if_else(condition = name == 'decay',
'0,.1,.7 ',
display_spend)) %>%
mutate(vod_spend = if_else(condition = name == 'hill',
'(1,50,100),(1,5)',
'')) %>%
mutate(vod_spend = if_else(condition = name == 'decay',
'0,.1,.7 ',
vod_spend))
trans_df %>%
datatable(rownames = NULL)
We can now use that to test the specified combinations with linea::what_combo(). Due to the complexity of the combinations across transformations, parameters, and variables, the results are stored in a list of data frames.
combinations = what_combo(model = model,trans_df = trans_df)
names(combinations)
## [1] "results" "trans_parameters" "long_trans_df" "variables"
## [5] "model"
combinations$results %>%
datatable(rownames = NULL)
combinations$trans_parameters
## $display_spend
## decay_a hill_a hill_b variable
## 1 0.0 1 1 display_spend
## 2 0.1 1 1 display_spend
## 3 0.7 1 1 display_spend
## 4 0.0 50 1 display_spend
## 5 0.1 50 1 display_spend
## 6 0.7 50 1 display_spend
## 7 0.0 100 1 display_spend
## 8 0.1 100 1 display_spend
## 9 0.7 100 1 display_spend
## 10 0.0 1 5 display_spend
## 11 0.1 1 5 display_spend
## 12 0.7 1 5 display_spend
## 13 0.0 50 5 display_spend
## 14 0.1 50 5 display_spend
## 15 0.7 50 5 display_spend
## 16 0.0 100 5 display_spend
## 17 0.1 100 5 display_spend
## 18 0.7 100 5 display_spend
##
## $vod_spend
## decay_a hill_a hill_b variable
## 1 0.0 1 1 vod_spend
## 2 0.1 1 1 vod_spend
## 3 0.7 1 1 vod_spend
## 4 0.0 50 1 vod_spend
## 5 0.1 50 1 vod_spend
## 6 0.7 50 1 vod_spend
## 7 0.0 100 1 vod_spend
## 8 0.1 100 1 vod_spend
## 9 0.7 100 1 vod_spend
## 10 0.0 1 5 vod_spend
## 11 0.1 1 5 vod_spend
## 12 0.7 1 5 vod_spend
## 13 0.0 50 5 vod_spend
## 14 0.1 50 5 vod_spend
## 15 0.7 50 5 vod_spend
## 16 0.0 100 5 vod_spend
## 17 0.1 100 5 vod_spend
## 18 0.7 100 5 vod_spend
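The 18 rows per variable shown above are consistent with a full grid over the supplied values: three decay values, three values for the first hill parameter, and two for the second. The base R sketch below rebuilds that grid from the strings in the trans_df. The helper names are hypothetical and this is not linea's parsing code.

```r
# Parameter strings as written in the trans_df above.
decay_val <- "0,.1,.7 "
hill_val  <- "(1,50,100),(1,5)"

# Split a comma-separated string into numbers (ignoring stray whitespace).
to_num <- function(s) as.numeric(strsplit(trimws(s), ",")[[1]])

# Extract each parenthesised group, drop the parentheses, then split it.
groups <- regmatches(hill_val, gregexpr("\\(([^)]*)\\)", hill_val))[[1]]
hill   <- lapply(gsub("[()]", "", groups), to_num)

# Full grid: the first column varies fastest, as in the printed output.
grid <- expand.grid(decay_a = to_num(decay_val),
                    hill_a  = hill[[1]],
                    hill_b  = hill[[2]])
nrow(grid)
grid[4, ]
```

Each parenthesised group holds the candidates for one parameter of a multi-parameter function, so the grid grows with the product of the group sizes.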
Using the function linea::run_combo_model(), you can run and visualize individual models within the combinations, by speci
combinations %>%
run_combo_model(results_row = 1) %>%
response_curves(
x_min = 0,
x_max = 1500
)
The Getting Started page is a good place to start learning how to build linear models with linea.
The Additional Features page illustrates all other functions of the library.