An open-source solution by Linea Analytics



Linea Analytics provides a measurement platform as well as full-service measurement.

To provide an open-access forum where brands, agencies, publishers, and students can understand and test how the relationship between two or more variables works, we created Linea's open-source (frequentist) OLS library: linea.




This Page

This page covers the basics of how to set up the linea library to analyse a time series. We'll cover the prerequisites, installation, and a quick start: importing data, running models, and generating insights.




Prerequisites

To use this library, an understanding of the following is assumed:




Installation

The library can be installed from GitHub using devtools::install_github('linea-analytics/linea'). It will soon be available on CRAN as well. Once installed, you can check the installed version:

# devtools::install_github('linea-analytics/linea')
print(packageVersion("linea"))
## [1] '0.1.2'



Quick Start

The linea library works well with pipes. Used with dplyr and plotly, it can perform data analysis and visualization with elegant code. Let’s build a quick model to illustrate what linea can do.
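
As a preview, the whole workflow below can be written as a single pipe chain. The sketch below is illustrative only: it uses the ecommerce dataset imported in the next section and assumes that run_model() accepts the data as its first argument (as the data = argument in the calls further down suggests).

# end-to-end sketch: import the data, fit a model, and plot its decomposition
# (assumes library(linea) is loaded and run_model() takes the data as its first argument)
read_xcsv('https://raw.githubusercontent.com/paladinic/data/refs/heads/main/ecomm_data.csv') |>
  run_model(dv = 'ecommerce',
            ivs = c('christmas', 'black.friday', 'trend'),
            id_var = 'date') |>
  decomp_chart()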

Import Data

We start by importing linea, some other useful libraries, and some data.

# libraries
library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # visualization

# fictitious ecommerce data
data_path = 'https://raw.githubusercontent.com/paladinic/data/refs/heads/main/ecomm_data.csv'

# importing flat file
data = read_xcsv(file = data_path)

# adding seasonality variables
data = data |> 
  get_seasonality(date_col_name = 'date',date_type = 'weekly starting')

# visualize data
data |> 
  datatable(rownames = NULL,
            options = list(scrollX = TRUE))
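
The exact columns created by get_seasonality() aren't listed here, but the what_next() output further down shows dummies such as year_2020 and week_num_48. A quick, illustrative way to inspect them with dplyr (the year_ and week_num_ column-name prefixes are an assumption based on that output):

# peek at the seasonality dummies added by get_seasonality()
# (the year_ and week_num_ prefixes are inferred from the what_next() output below)
data |>
  select(date, starts_with('year_'), starts_with('week_num_')) |>
  head()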

Run Models

Now let's build a model to understand what drives changes in the ecommerce variable. We can start by selecting a few initial independent variables (i.e. christmas, black.friday, and trend).

model = run_model(data = data,
                  dv = 'ecommerce',
                  ivs = c('christmas','black.friday',"trend"),
                  id_var = 'date')
## [1] "actual:"
## [1] 261
## [1] "pred:"
## [1] 261
## [1] "resid:"
## [1] 261
## [1] "id_var_values:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "id_var_values_2:"
## [1] 261
## [1] "variable_decomp:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "id_var_values_3:"
## [1] 261
## [1] "variable_decomp:"
## [1] 261
summary(model)
## 
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t, fixed_ivs_t)])
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20462  -4664   -741   2988  54502 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  44108.048    975.172  45.231  < 2e-16 ***
## christmas      294.052     27.219  10.803  < 2e-16 ***
## black.friday   317.203     40.339   7.863 1.03e-13 ***
## trend          130.445      6.308  20.680  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7666 on 257 degrees of freedom
## Multiple R-squared:  0.7323, Adjusted R-squared:  0.7291 
## F-statistic: 234.3 on 3 and 257 DF,  p-value: < 2.2e-16

Our next steps can be guided by functions like what_next(), which tests adding each of the remaining variables in our data to the current model. From the output below, it seems the variables offline_media and covid would improve the model most.

model |> 
  what_next()
## # A tibble: 84 × 6
##    variable      adj_R2 t_stat       coef   vif adj_R2_diff
##    <chr>          <dbl>  <dbl>      <dbl> <dbl>       <dbl>
##  1 offline_media  0.821  11.5        6.50  1.12      0.126 
##  2 year_2020      0.796   9.26   12081.    1.59      0.0921
##  3 covid          0.795   9.12     188.    1.98      0.0901
##  4 year_2019      0.762  -6.09   -7043.    1.07      0.0457
##  5 christmas_eve  0.759  -5.75 -168934.    1.65      0.0412
##  6 week_num_48    0.753   5.09   21389.    1.21      0.0328
##  7 christmas_day  0.750  -4.79 -135781.    1.48      0.0292
##  8 week_num_52    0.748  -4.51  -21135.    1.48      0.0260
##  9 promo          0.740   3.48       5.50  1.07      0.0154
## 10 year_2021      0.738  -3.11   -7264.    1.19      0.0121
## # ℹ 74 more rows
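
Since what_next() returns a regular tibble, standard dplyr verbs can be used to shortlist candidates. As an illustrative sketch, keep only variables with low collinearity and a clearly significant t-statistic, sorted by the gain in adjusted R-squared:

# shortlist candidates: low VIF, |t| above ~2, sorted by adjusted R-squared gain
model |>
  what_next() |>
  filter(vif < 2, abs(t_stat) > 2) |>
  arrange(desc(adj_R2_diff))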

Adding these variables to the model lifts the adjusted R-squared from roughly 73% to 86%.

model = run_model(data = data,
                  dv = 'ecommerce',
                  ivs = c('christmas','black.friday','trend','covid','offline_media'),
                  id_var = 'date')
## [1] "actual:"
## [1] 261
## [1] "pred:"
## [1] 261
## [1] "resid:"
## [1] 261
## [1] "id_var_values:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "id_var_values_2:"
## [1] 261
## [1] "variable_decomp:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "id_var_values_3:"
## [1] 261
## [1] "variable_decomp:"
## [1] 261
summary(model)
## 
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t, fixed_ivs_t)])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21204.4  -3193.5   -874.6   2639.9  20486.7 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.819e+04  7.743e+02  62.228  < 2e-16 ***
## christmas     2.736e+02  1.978e+01  13.831  < 2e-16 ***
## black.friday  2.620e+02  2.969e+01   8.825  < 2e-16 ***
## trend         8.150e+01  6.378e+00  12.778  < 2e-16 ***
## covid         1.482e+02  1.737e+01   8.534 1.28e-15 ***
## offline_media 5.602e+00  5.093e-01  11.000  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5506 on 255 degrees of freedom
## Multiple R-squared:  0.863,  Adjusted R-squared:  0.8603 
## F-statistic: 321.2 on 5 and 255 DF,  p-value: < 2.2e-16

Generate Insights

Now that we have a decent model, we can start extracting insights from it, beginning with the contribution of each independent variable over time.

model |> 
  decomp_chart()
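
Assuming decomp_chart() returns a plotly object (plotly is loaded above for visualization), the chart can be tweaked with standard plotly functions, for example:

# illustrative only: add a title, assuming the decomposition chart is a plotly object
model |>
  decomp_chart() |>
  plotly::layout(title = 'ecommerce - variable decomposition')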

We can also visualize the relationships between our independent and dependent variables using response curves. From this we can see that, for example, when offline_media is 10, ecommerce increases by roughly 56 (in line with the coefficient of ~5.6 above). To capture non-linear relationships (i.e. response curves that aren't straight lines), see the Advanced Features page.

model |> 
  response_curves(x_min = 0)
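
Because this is a linear model, that reading can be checked directly against the offline_media coefficient from the summary above:

# contribution of offline_media at a value of 10, using its fitted coefficient (~5.6)
5.602 * 10
## [1] 56.02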



Next Steps

  1. The Getting Started page is a good place to start learning how to build linear models with linea.

  2. The Advanced Features page shows how to implement the features of linea that allow users to capture non-linear relationships.

  3. The Additional Features page illustrates all other functions of the library.



