LINEA
LINEA is an R library aimed at simplifying and accelerating the development of linear models to understand the relationship between two or more variables.
Linear models are commonly used in a variety of contexts, including the natural and social sciences and various business applications (e.g. marketing, finance).
This page covers a basic implementation of the linea library to analyse a time series: importing data, running a model, inspecting its decomposition and fit, checking residual diagnostics, and plotting response curves.
We will run a simple model on some fictitious data sourced from Google Trends. The aim of this exercise is to understand which variables seem to have an impact on the ecommerce variable.
We start by importing linea and some other useful libraries. Visit this page for guidance on installation.
library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # interactive tables
The function linea::read_xcsv() can be used to read CSV or Excel files.
data_path = 'https://raw.githubusercontent.com/paladinic/data/main/ecomm_data.csv'
data = read_xcsv(file = data_path)
data %>%
datatable(rownames = NULL,
options = list(scrollX = TRUE))
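As shown above, the data contains several variables, including the ecommerce variable, other numeric variables, and a date-type variable (date). With this data we can start building models to understand which variables have an impact on ecommerce.
The linea::run_model() function can be used to run an OLS regression model. Its main arguments, as used below, are data (the data frame), dv (the name of the dependent variable), ivs (a character vector of independent variable names), and id_var (the name of the variable that identifies each observation, here date).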
The function returns an “lm” object, like the one produced by stats::lm(), which can be inspected with base::summary().
model = run_model(data = data,
dv = 'ecommerce',
ivs = c('covid','christmas'),
id_var = 'date')
summary(model)
##
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t, fixed_ivs_t)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -34222 -5723 -106 4361 64271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56339.61 690.61 81.58 <2e-16 ***
## covid 336.41 19.61 17.16 <2e-16 ***
## christmas 383.15 30.65 12.50 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8947 on 258 degrees of freedom
## Multiple R-squared: 0.6339, Adjusted R-squared: 0.6311
## F-statistic: 223.4 on 2 and 258 DF, p-value: < 2.2e-16
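The Call line in the output above shows that run_model() fits the model with lm(). For comparison, here is a minimal base-R sketch of the same kind of OLS fit, calling stats::lm() directly without linea. It runs on simulated data: the data frame `sim` and the coefficients 300 and 400 used to generate it are hypothetical stand-ins, not the Google Trends data, so its estimates will not match the output above.

```r
# Hypothetical simulated data standing in for ecomm_data.csv (NOT the real data)
set.seed(42)
n <- 100
sim <- data.frame(
  date      = seq(as.Date("2020-01-05"), by = "week", length.out = n),
  covid     = runif(n, 0, 100),
  christmas = runif(n, 0, 100)
)
sim$ecommerce <- 50000 + 300 * sim$covid + 400 * sim$christmas +
  rnorm(n, sd = 5000)

# Ordinary least squares with stats::lm(); the date column only identifies
# observations, so it is left out of the formula
fit <- lm(ecommerce ~ covid + christmas, data = sim)
print(summary(fit))

# Estimated coefficients as a named numeric vector
est <- coef(fit)
print(est)
```

Since run_model() returns an “lm” object, the usual accessors such as coef(), fitted() and residuals() should work on model in the same way.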
Models can be inspected using the linea::decomping() function, which breaks the model down into the data behind the charts below. This function is run automatically under the hood, but it is important to understand what it returns.
decomposition = model %>% decomping()
print(names(decomposition))
## [1] "category_decomp" "variable_decomp" "fitted_values"
The decomposition object is a list of three data frames, which can be visualized directly with the linea::fit_chart() and linea::decomp_chart() functions.
The first two, variable_decomp and category_decomp, capture the contribution of individual variables to the model (categories can be set to group variables together).
decomposition$variable_decomp %>%
datatable(rownames = NULL,
options = list(scrollX = TRUE))
The linea::decomp_chart() function can be used to display a stacked bar chart of the decomposition.
model %>%
decomp_chart()
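The idea behind a variable decomposition can be sketched in base R. In a plain linear model, each term's contribution on a given date is its coefficient multiplied by that variable's value, and the intercept is a constant base. The sketch below assumes this simple additive scheme and uses hypothetical simulated data (`sim`, `contrib` and `long` are illustrative names). It is not linea's decomping() implementation, which may handle categories and other details differently.

```r
# Hypothetical simulated data (NOT ecomm_data.csv)
set.seed(42)
n <- 100
sim <- data.frame(
  date      = seq(as.Date("2020-01-05"), by = "week", length.out = n),
  covid     = runif(n, 0, 100),
  christmas = runif(n, 0, 100)
)
sim$ecommerce <- 50000 + 300 * sim$covid + 400 * sim$christmas +
  rnorm(n, sd = 5000)
fit <- lm(ecommerce ~ covid + christmas, data = sim)
b <- coef(fit)

# Contribution of each term at each date: coefficient x variable value
contrib <- data.frame(
  date      = sim$date,
  base      = b[["(Intercept)"]],         # intercept, constant over time
  covid     = b[["covid"]] * sim$covid,
  christmas = b[["christmas"]] * sim$christmas
)

# Long format (one row per date and term), the shape a stacked bar chart needs
long <- data.frame(date = rep(contrib$date, 3), stack(contrib[, -1]))
names(long) <- c("date", "contribution", "variable")
print(head(long))

# The contributions add back up to the model's fitted values
stopifnot(isTRUE(all.equal(unname(rowSums(contrib[, -1])),
                           unname(fitted(fit)))))
```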
The fitted_values data frame instead contains the dependent variable (actual), the model prediction (model), and the error (residual).
decomposition$fitted_values %>%
datatable(rownames = NULL,
options = list(scrollX = TRUE))
The linea::fit_chart() function can be used to display a line chart of the actual values, the model prediction, and the error.
model %>%
fit_chart()
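A base-R sketch of an equivalent actual / model / residual table, again on hypothetical simulated data rather than the tutorial's dataset. It assumes the usual OLS definitions (model from fitted(), residual as actual minus model); the column names mirror those described above, but the code does not call linea.

```r
# Hypothetical simulated data (NOT ecomm_data.csv)
set.seed(42)
n <- 100
sim <- data.frame(
  date      = seq(as.Date("2020-01-05"), by = "week", length.out = n),
  covid     = runif(n, 0, 100),
  christmas = runif(n, 0, 100)
)
sim$ecommerce <- 50000 + 300 * sim$covid + 400 * sim$christmas +
  rnorm(n, sd = 5000)
fit <- lm(ecommerce ~ covid + christmas, data = sim)

# One row per date: observed value, model prediction and error
fitted_values <- data.frame(
  date     = sim$date,
  actual   = sim$ecommerce,
  model    = unname(fitted(fit)),
  residual = unname(residuals(fit))
)
print(head(fitted_values))
```

By construction actual equals model plus residual on every row, and with an intercept in the model the residuals average to zero.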
The linea::acf_chart() and linea::resid_hist_chart() functions can be used to check your model against the assumptions of linear regression:
Using the linea::acf_chart() function, we can visualize the ACF (autocorrelation function), which helps us detect autocorrelation.
model %>%
acf_chart()
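To show what an ACF check looks for, the base-R sketch below fits a model whose simulated errors are deliberately autocorrelated (an AR(1) process built with stats::arima.sim(); the 0.7 coefficient is an arbitrary assumption) and then inspects the residuals with stats::acf() and a Ljung-Box test from stats::Box.test(). It illustrates the diagnostic in general and is not how linea::acf_chart() is implemented.

```r
# Hypothetical simulated data (NOT ecomm_data.csv)
set.seed(42)
n <- 100
sim <- data.frame(
  date      = seq(as.Date("2020-01-05"), by = "week", length.out = n),
  covid     = runif(n, 0, 100),
  christmas = runif(n, 0, 100)
)
# Errors follow an AR(1) process, so consecutive weeks are correlated
ar_errors <- as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 5000))
sim$ecommerce <- 50000 + 300 * sim$covid + 400 * sim$christmas + ar_errors
fit <- lm(ecommerce ~ covid + christmas, data = sim)
r <- residuals(fit)

# Sample autocorrelations of the residuals for lags 0..10
a <- acf(r, lag.max = 10, plot = FALSE)   # drop plot = FALSE to draw the chart
print(a)

# Approximate 95% band for white noise: +/- 1.96 / sqrt(n)
bound <- qnorm(0.975) / sqrt(length(r))
print(which(abs(a$acf[-1]) > bound))      # lags falling outside the band

# Ljung-Box test: a small p-value suggests the residuals are autocorrelated
lb <- Box.test(r, lag = 10, type = "Ljung-Box")
print(lb)
```

With independent errors most lags would stay inside the band; here the lag-1 autocorrelation should stand out clearly.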
Using the linea::resid_hist_chart() function, we can visualize the distribution of the residuals, which helps us check whether they are approximately normal.
model %>%
resid_hist_chart()
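A base-R sketch of the same residual-normality check on hypothetical simulated data: stats::residuals() extracts the errors from the fitted model, graphics::hist() bins them (with plot = FALSE, so only the counts are computed), and stats::shapiro.test() adds a formal test. The Shapiro-Wilk test is an extra choice made in this sketch, not something this page says linea uses.

```r
# Hypothetical simulated data (NOT ecomm_data.csv)
set.seed(42)
n <- 100
sim <- data.frame(
  covid     = runif(n, 0, 100),
  christmas = runif(n, 0, 100)
)
sim$ecommerce <- 50000 + 300 * sim$covid + 400 * sim$christmas +
  rnorm(n, sd = 5000)
fit <- lm(ecommerce ~ covid + christmas, data = sim)
r <- residuals(fit)

# Histogram bins of the residuals (remove plot = FALSE to draw the chart)
h <- hist(r, breaks = 15, plot = FALSE)
print(data.frame(mid = h$mids, count = h$counts))

# Shapiro-Wilk test: a small p-value is evidence against normal residuals
sw <- shapiro.test(r)
print(sw)
```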
Using the linea::response_curves() function, we can visualize the relationship between the independent variables and the dependent variable.
model %>%
response_curves(interval = 0.1)
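For a purely linear term, the response curve is a straight line through the origin: the contribution of a variable is its coefficient times its value. The base-R sketch below computes that line over a grid of values on hypothetical simulated data. The helper curve_for() and its 10-step grid are illustrative assumptions and are not meant to reproduce what the interval argument of response_curves() does.

```r
# Hypothetical simulated data (NOT ecomm_data.csv)
set.seed(42)
n <- 100
sim <- data.frame(
  covid     = runif(n, 0, 100),
  christmas = runif(n, 0, 100)
)
sim$ecommerce <- 50000 + 300 * sim$covid + 400 * sim$christmas +
  rnorm(n, sd = 5000)
fit <- lm(ecommerce ~ covid + christmas, data = sim)
b <- coef(fit)

# Hypothetical helper: contribution of one variable over a grid from 0 to its max
curve_for <- function(var, steps = 10) {
  x <- seq(0, max(sim[[var]]), length.out = steps + 1)
  data.frame(variable = var, x = x, response = b[[var]] * x)
}
curves <- rbind(curve_for("covid"), curve_for("christmas"))
print(curves)
```

With non-linear transformations of the variables, these curves would no longer be straight lines.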
The Advanced Features page shows how to use the features of linea that allow users to capture non-linear relationships.
The Additional Features page illustrates the remaining functions of the library.