Linea-Analytics provides a measurement platform and full-service measurement.
To give brands, agencies, publishers, and students an open-access way to understand and test how the relationships between two or more variables work, we created Linea’s open-source (frequentist) OLS library: LINEA.
This page covers the basic setup of the linea library
and how to use it to analyse a time series.
The library can be installed from GitHub using
devtools::install_github('linea-analytics/linea'). It will
soon be available on CRAN as well. Once installed, you can check the
installation.
# devtools::install_github('linea-analytics/linea')
print(packageVersion("linea"))
## [1] '0.1.2'
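If the package might not be installed yet, the version check above will stop with an error. A minimal sketch of a more defensive check, using only base R, is shown below; the helper name is_installed is hypothetical and not part of linea.

```r
# Hypothetical helper (not part of linea): TRUE if a package can be loaded
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("linea")) {
  # Same check as above, only run when the package is available
  print(packageVersion("linea"))
} else {
  message("linea is not installed; try devtools::install_github('linea-analytics/linea')")
}
```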
The linea library works well with pipes. Used with dplyr
and plotly, it can perform data analysis and visualization with elegant
code. Let’s build a quick model to illustrate what linea
can do.
We start by importing linea, some other useful
libraries, and some data.
# libraries
library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # visualization
# fictitious ecommerce data
data_path = 'https://raw.githubusercontent.com/paladinic/data/refs/heads/main/ecomm_data.csv'
# importing flat file
data = read_xcsv(file = data_path)
# adding seasonality variables
data = data |> 
  get_seasonality(date_col_name = 'date',date_type = 'weekly starting')
# visualize data
data |> 
  datatable(rownames = NULL,
            options = list(scrollX = TRUE))
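To give a feel for the kind of columns a seasonality step adds, here is a small base-R sketch that builds a trend and year dummies from a weekly date column. It is an illustration only: the column naming (trend, year_2019, ...) mirrors names that appear in the outputs on this page, but the logic is an assumption and not the code of get_seasonality().

```r
# Four weekly dates that straddle a year boundary (made-up example dates)
dates <- seq(as.Date("2019-12-16"), by = "week", length.out = 4)

# A simple linear trend: 1, 2, 3, ...
seas <- data.frame(date = dates, trend = seq_along(dates))

# One 0/1 dummy column per calendar year found in the data
yrs <- format(dates, "%Y")
for (y in unique(yrs)) {
  seas[[paste0("year_", y)]] <- as.integer(yrs == y)
}

print(seas)
```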
Now let’s build a model to understand what drives changes in the
ecommerce variable. We can start by selecting a few initial
independent variables
(e.g. christmas, black.friday, trend).
model = run_model(data = data,
                  dv = 'ecommerce',
                  ivs = c('christmas','black.friday',"trend"),
                  id_var = 'date')
## [1] "actual:"
## [1] 261
## [1] "pred:"
## [1] 261
## [1] "resid:"
## [1] 261
## [1] "id_var_values:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "id_var_values_2:"
## [1] 261
## [1] "variable_decomp:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "id_var_values_3:"
## [1] 261
## [1] "variable_decomp:"
## [1] 261
summary(model)
## 
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t, fixed_ivs_t)])
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20462  -4664   -741   2988  54502 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  44108.048    975.172  45.231  < 2e-16 ***
## christmas      294.052     27.219  10.803  < 2e-16 ***
## black.friday   317.203     40.339   7.863 1.03e-13 ***
## trend          130.445      6.308  20.680  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7666 on 257 degrees of freedom
## Multiple R-squared:  0.7323, Adjusted R-squared:  0.7291 
## F-statistic: 234.3 on 3 and 257 DF,  p-value: < 2.2e-16
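The adjusted R-squared in the summary can be reproduced by hand from the multiple R-squared, the number of observations (261: 257 residual degrees of freedom plus 4 estimated coefficients), and the number of independent variables (3). The short sketch below uses the standard formula; the function name adj_r2 is just for illustration.

```r
# Adjusted R-squared from R-squared, n observations and p predictors
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values read from the summary above (R-squared is rounded there,
# so the result matches the printed value only approximately)
adj_r2(r2 = 0.7323, n = 261, p = 3)
```

The result is roughly 0.729, in line with the printed summary.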
Our next steps can be guided by functions like
what_next(), which tests each of the other variables in our
data by adding it to the current model. From the output below,
offline_media and covid are among the variables that would
improve the model most.
model |> 
  what_next()
## # A tibble: 84 × 6
##    variable      adj_R2 t_stat       coef   vif adj_R2_diff
##    <chr>          <dbl>  <dbl>      <dbl> <dbl>       <dbl>
##  1 offline_media  0.821  11.5        6.50  1.12      0.126 
##  2 year_2020      0.796   9.26   12081.    1.59      0.0921
##  3 covid          0.795   9.12     188.    1.98      0.0901
##  4 year_2019      0.762  -6.09   -7043.    1.07      0.0457
##  5 christmas_eve  0.759  -5.75 -168934.    1.65      0.0412
##  6 week_num_48    0.753   5.09   21389.    1.21      0.0328
##  7 christmas_day  0.750  -4.79 -135781.    1.48      0.0292
##  8 week_num_52    0.748  -4.51  -21135.    1.48      0.0260
##  9 promo          0.740   3.48       5.50  1.07      0.0154
## 10 year_2021      0.738  -3.11   -7264.    1.19      0.0121
## # ℹ 74 more rows
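The idea behind this kind of search can be sketched in base R: refit the model once per candidate variable and record how the fit changes. The code below is a rough illustration on simulated data, not linea's implementation; the data, the variable names x1, x2 and noise, and the omission of the vif column are all assumptions made for brevity.

```r
set.seed(1)
n <- 100

# Simulated data: y depends on x1 and x2, but not on noise
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
df$y <- 2 * df$x1 + 3 * df$x2 + rnorm(n)

# Current model and the variables not yet in it
base_ivs   <- "x1"
candidates <- setdiff(names(df), c("y", base_ivs))
base_fit   <- lm(reformulate(base_ivs, response = "y"), data = df)
base_adj   <- summary(base_fit)$adj.r.squared

# Refit once per candidate and collect a few statistics
results <- do.call(rbind, lapply(candidates, function(v) {
  fit <- lm(reformulate(c(base_ivs, v), response = "y"), data = df)
  s   <- summary(fit)
  data.frame(
    variable    = v,
    adj_R2      = s$adj.r.squared,
    t_stat      = s$coefficients[v, "t value"],
    coef        = s$coefficients[v, "Estimate"],
    adj_R2_diff = s$adj.r.squared - base_adj
  )
}))

# Best improvement first
results <- results[order(-results$adj_R2_diff), ]
print(results)
```

Here x2 should rank first, since it genuinely drives y while noise does not.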
Adding covid and offline_media to the model brings the adjusted R-squared to about 86%.
model = run_model(data = data,
                  dv = 'ecommerce',
                  ivs = c('christmas','black.friday','trend','covid','offline_media'),
                  id_var = 'date')
## [1] "actual:"
## [1] 261
## [1] "pred:"
## [1] 261
## [1] "resid:"
## [1] 261
## [1] "id_var_values:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "id_var_values_2:"
## [1] 261
## [1] "variable_decomp:"
## [1] 261
## [1] "pool_var_values:"
## [1] 261
## [1] "id_var_values_3:"
## [1] 261
## [1] "variable_decomp:"
## [1] 261
summary(model)
## 
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t, fixed_ivs_t)])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21204.4  -3193.5   -874.6   2639.9  20486.7 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.819e+04  7.743e+02  62.228  < 2e-16 ***
## christmas     2.736e+02  1.978e+01  13.831  < 2e-16 ***
## black.friday  2.620e+02  2.969e+01   8.825  < 2e-16 ***
## trend         8.150e+01  6.378e+00  12.778  < 2e-16 ***
## covid         1.482e+02  1.737e+01   8.534 1.28e-15 ***
## offline_media 5.602e+00  5.093e-01  11.000  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5506 on 255 degrees of freedom
## Multiple R-squared:  0.863,  Adjusted R-squared:  0.8603 
## F-statistic: 321.2 on 5 and 255 DF,  p-value: < 2.2e-16
Now that we have a decent model we can start extracting insights from it. We can start by looking at the contribution of each independent variable over time.
model |> 
  decomp_chart()
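For a linear model, a decomposition of this kind boils down to multiplying each variable by its coefficient, so that the contributions in every period add up to the fitted value. The base-R sketch below shows this on simulated data; it is not the code behind decomp_chart(), and the data and column names are made up for illustration.

```r
set.seed(42)
n <- 52

# Simulated weekly data with a trend and an occasional promotion
df <- data.frame(trend = 1:n, promo = rbinom(n, 1, 0.2))
df$sales <- 100 + 2 * df$trend + 30 * df$promo + rnorm(n, sd = 5)

fit <- lm(sales ~ trend + promo, data = df)

# Design matrix (including the intercept column) times the coefficients,
# column by column: one contribution per variable per week
X      <- model.matrix(fit)
decomp <- sweep(X, 2, coef(fit), `*`)

# The contributions in each row sum to the model's fitted value
all.equal(unname(rowSums(decomp)), unname(fitted(fit)))

head(decomp)
```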
We can also visualize the relationships between our independent and
dependent variables using response curves. From this we can see that,
for example, when offline_media is 10,
ecommerce increases by ~56 (10 times the coefficient of 5.602). To capture non-linear
relationships (i.e. response curves that aren’t straight lines) see the
Advanced Features page.
model |> 
  response_curves(x_min = 0)
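Because this model is linear, each response curve is a straight line through the origin whose slope is the variable's coefficient. As a small worked example, the sketch below takes the offline_media coefficient from the summary above (5.602) and evaluates the implied effect on ecommerce over a grid of values; the grid itself is an arbitrary choice.

```r
# Coefficient of offline_media, read from the model summary above
coef_offline_media <- 5.602

# Arbitrary grid of offline_media values
offline_media <- seq(0, 50, by = 10)

# Linear response: effect = coefficient * value
ecommerce_effect <- coef_offline_media * offline_media

data.frame(offline_media, ecommerce_effect)
```

At offline_media = 10 the implied effect is 56.02.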
The Getting Started page
is a good place to start learning how to build linear models with
linea.
The Advanced Features
page shows how to implement the features of linea that
allow users to capture non-linear relationships.
The Additional Features page illustrates all other functions of the library.