Getting Started


LINEA is an R library aimed at simplifying and accelerating the development of linear models to understand the relationship between two or more variables.

Linear models are commonly used in a variety of contexts, including the natural and social sciences and business applications such as marketing and finance.

This page covers a basic implementation of the linea library to analyse a time series. We’ll cover data ingestion, a first model, visualisation, diagnostic charts, and response curves.

We will run a simple model on some fictitious data sourced from Google Trends. The aim of this exercise is to understand which variables seem to have an impact on the ecommerce variable.

We start by importing linea and some other useful libraries. See the installation page for guidance on installing linea.

library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # visualization

Data Ingestion

The function linea::read_xcsv() can be used to read CSV or Excel files.

data_path = 'https://raw.githubusercontent.com/paladinic/data/main/ecomm_data.csv'
data = read_xcsv(file = data_path)
data %>%
  datatable(rownames = NULL,
            options = list(scrollX = TRUE))
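
For the CSV case, a rough base-R sketch of the same load-and-inspect step is shown below. It writes a small made-up table to a temporary file (so it does not depend on the network), reads it back with read.csv(), and parses the date column. The values and the temporary file are illustrative assumptions, and linea::read_xcsv() may handle column types differently.

```r
# Illustrative only: a tiny made-up table standing in for ecomm_data.csv
toy <- data.frame(
  date      = c("2020-01-05", "2020-01-12", "2020-01-19"),
  ecommerce = c(52000, 54500, 53100),
  covid     = c(0, 1, 3)
)

# Write to a temporary CSV so the example does not depend on the network
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

# Read it back with base R and parse the date column explicitly
toy_read <- read.csv(tmp, stringsAsFactors = FALSE)
toy_read$date <- as.Date(toy_read$date)

# Check column types before modelling
str(toy_read)
```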

First Model

As shown above, the data contains several variables including the ecommerce variable, other numeric variables, and a date-type variable (i.e. date). With this data we can start building models to understand which variables have an impact on ecommerce. The linea::run_model() function can be used to run an OLS regression model. Some of the function’s arguments are:

  • dv: dependent variable name
  • ivs: independent variable names (character vector)
  • data: data with variables (data.frame)

The function will return an “lm” object, like the one returned by the stats::lm() function, which can be inspected with the base::summary() function.

model = run_model(data = data,
                  dv = 'ecommerce',
                  ivs = c('covid','christmas'),
                  id_var = 'date')

summary(model)
## 
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t, fixed_ivs_t)])
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -34222  -5723   -106   4361  64271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 56339.61     690.61   81.58   <2e-16 ***
## covid         336.41      19.61   17.16   <2e-16 ***
## christmas     383.15      30.65   12.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8947 on 258 degrees of freedom
## Multiple R-squared:  0.6339, Adjusted R-squared:  0.6311 
## F-statistic: 223.4 on 2 and 258 DF,  p-value: < 2.2e-16
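
Because the returned object is described as an “lm” object, the usual base-R accessors should apply to it. The sketch below fits a plain stats::lm() model on simulated stand-in data (not the real ecommerce data; the variable names are reused only for readability) and pulls out the coefficients, confidence intervals, and R-squared.

```r
set.seed(1)

# Simulated stand-in data (assumption: not the real ecommerce data)
n <- 100
sim <- data.frame(
  covid     = runif(n, 0, 100),
  christmas = as.numeric(seq_len(n) %% 10 == 0) * 50
)
sim$ecommerce <- 50000 + 300 * sim$covid + 400 * sim$christmas +
  rnorm(n, sd = 5000)

# Plain OLS fit with the same dependent and independent variables
fit <- lm(ecommerce ~ covid + christmas, data = sim)

coef(fit)                   # point estimates
confint(fit, level = 0.95)  # 95% confidence intervals
summary(fit)$r.squared      # share of variance explained
```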

Visualisation

Models can be inspected visually using the linea::decomping() function. This function is run automatically under the hood, but it is important to understand what it does. Some of the function’s arguments are:

  • model: the model (lm) object
  • id_var: the name of the id variable (e.g. “date”, “sample id”, “customer id”; optional)
  • raw_data: the data containing the modelled variables and the id variable (optional)

decomposition = model %>% decomping()

print(names(decomposition))
## [1] "category_decomp" "variable_decomp" "fitted_values"

The decomposition object is a list of 3 data frames. These can be viewed directly using the functions linea::fit_chart() and linea::decomp_chart().


Decomposition

The first two, variable_decomp and category_decomp, capture the role of individual variables in the model (categories can be set to group variables).

decomposition$variable_decomp %>%
  datatable(rownames = NULL,
            options = list(scrollX = TRUE))
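
Conceptually, the decomposition of a linear model attributes to each variable its coefficient multiplied by its value, so that the contributions sum to the prediction. The base-R sketch below illustrates that idea on simulated data. It is an assumption about the general technique, not linea's actual implementation, and all names in it are made up.

```r
set.seed(2)
n <- 50
sim <- data.frame(x1 = rnorm(n), x2 = runif(n))
sim$y <- 10 + 2 * sim$x1 - 3 * sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = sim)

# Design matrix columns: (Intercept), x1, x2
X <- model.matrix(fit)

# Multiply every column by its coefficient to get per-variable contributions
contrib <- sweep(X, 2, coef(fit), `*`)

head(contrib)

# The contributions add up to the fitted values
all.equal(unname(rowSums(contrib)), unname(fitted(fit)))
```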

The linea::decomp_chart() function can be used to display a stacked bar chart of the decomposition.

model %>%
  decomp_chart()


Prediction, Actual, Error

The fitted_values data frame, by contrast, contains the dependent variable (actual), the model prediction (model), and the error (residual).

decomposition$fitted_values %>%
  datatable(rownames = NULL,
            options = list(scrollX = TRUE))
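
The relationship between the three quantities can be checked by hand: the actual value equals the prediction plus the residual. The sketch below builds such a table with base R on simulated data. The column names mirror the description above, but the exact layout of linea's fitted_values is not assumed.

```r
set.seed(3)
n <- 50
sim <- data.frame(x = rnorm(n))
sim$y <- 5 + 1.5 * sim$x + rnorm(n)

fit <- lm(y ~ x, data = sim)

# Column names mirror the description above; linea's actual layout may differ
fit_tbl <- data.frame(
  actual   = sim$y,
  model    = unname(fitted(fit)),
  residual = unname(residuals(fit))
)

head(fit_tbl)

# By construction: actual = model + residual
all.equal(fit_tbl$actual, fit_tbl$model + fit_tbl$residual)
```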

The linea::fit_chart() function can be used to display a line chart of the Prediction, Actual, and Error.

model %>%
  fit_chart()


Diagnostic Charts

The linea::acf_chart() and linea::resid_hist_chart() functions can be used to check a model against some of the assumptions of linear regression. The full list of assumptions is:

  • Linear relationship
  • Normality of residuals
  • No autocorrelation
  • Homoscedasticity
  • No multicollinearity

Using the linea::acf_chart() function we can visualize the autocorrelation function (ACF), which helps us detect autocorrelation.

model %>%
  acf_chart()
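
For a numerical counterpart to the chart, the base-R sketch below computes the residual ACF with stats::acf() and runs a Ljung-Box test with stats::Box.test(). The data is simulated with AR(1) errors so that autocorrelation is present by design; none of it comes from the ecommerce data.

```r
set.seed(4)
n <- 200
x <- rnorm(n)

# Errors with AR(1) structure, so the residuals are autocorrelated by design
e <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
res <- residuals(fit)

# Sample autocorrelations of the residuals (lag 0 is always 1)
res_acf <- acf(res, lag.max = 10, plot = FALSE)
round(res_acf$acf[1:4], 2)

# Approximate 95% bounds under no autocorrelation
bound <- qnorm(0.975) / sqrt(n)

# Ljung-Box test: a small p-value suggests autocorrelated residuals
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
lb$p.value
```

Bars in the ACF chart that fall well outside the approximate bounds, as the lag-1 value does here, are the usual sign that a time-related effect is missing from the model.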

Using the linea::resid_hist_chart() function we can visualize the distribution of the residuals, which helps us assess residual normality.

model %>%
  resid_hist_chart()
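
The histogram can be complemented with a formal check. The sketch below, on simulated data with normal errors, bins the residuals with hist() and applies the Shapiro-Wilk test from base R. It is one possible check among several, not something linea is stated to run.

```r
set.seed(5)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
res <- residuals(fit)

# Histogram counts without drawing; use hist(res) to see the plot
h <- hist(res, breaks = 20, plot = FALSE)
h$counts

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
sw <- shapiro.test(res)
sw$p.value

# Q-Q plot against the normal distribution (uncomment to draw)
# qqnorm(res); qqline(res)
```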


Response Curves

Using the linea::response_curves() function we can visualize the relationship between the independent variables and the dependent variable.

model %>%
  response_curves(interval = 0.1)
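
For a purely linear term, the response curve is a straight line whose slope is the variable's coefficient. The base-R sketch below tabulates such curves on simulated data. The grid step of 0.1 is only an assumed analogue of the interval argument, and the variable names are made up.

```r
set.seed(6)
n <- 100
sim <- data.frame(x1 = runif(n, 0, 100), x2 = runif(n, 0, 10))
sim$y <- 20 + 3 * sim$x1 + 15 * sim$x2 + rnorm(n, sd = 5)

fit <- lm(y ~ x1 + x2, data = sim)

# Grid of input values, stepping by 0.1 (an assumed analogue of `interval`)
grid <- seq(0, 10, by = 0.1)

# Contribution of each variable at every grid value: coefficient * input
curves <- data.frame(
  input = grid,
  x1    = coef(fit)[["x1"]] * grid,
  x2    = coef(fit)[["x2"]] * grid
)

head(curves)
```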


Next Steps

  1. The Advanced Features page shows how to implement the features of linea that allow users to capture non-linear relationships.

  2. The Additional Features page illustrates all other functions of the library.