Linea-Analytics provides a measurement platform and full-service measurement. To give brands, agencies, publishers, and students an open-access way to understand and test how the relationship between two or more variables works, we created Linea's open-source (frequentist) OLS library: linea.
This page covers how to set up the linea library to analyse a time series. The library can be installed from GitHub using devtools::install_github('linea-analytics/linea'); it will soon be available on CRAN as well. Once installed, you can check the installation:
# devtools::install_github('linea-analytics/linea')
print(packageVersion("linea"))
## [1] '0.1.2'
The linea library works well with pipes. Used with dplyr and plotly, it can perform data analysis and visualization with elegant code. Let's build a quick model to illustrate what linea can do.
We start by importing linea, some other useful libraries, and some data.
# libraries
library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # visualization
# fictitious ecommerce data
data_path = 'https://raw.githubusercontent.com/paladinic/data/refs/heads/main/ecomm_data.csv'
# importing flat file
data = read_xcsv(file = data_path)
# adding seasonality and Google trends variables
data = data |>
get_seasonality(date_col_name = 'date',date_type = 'weekly starting')
# visualize data
data |>
datatable(rownames = NULL,
options = list(scrollX = TRUE))
Now let's build a model to understand what drives changes in the ecommerce variable. We can start by selecting a few initial independent variables (i.e. christmas, black.friday, and trend).
model = run_model(data = data,
dv = 'ecommerce',
ivs = c('christmas','black.friday',"trend"),
id_var = 'date')
summary(model)
##
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t, fixed_ivs_t)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -20462 -4664 -741 2988 54502
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44108.048 975.172 45.231 < 2e-16 ***
## christmas 294.052 27.219 10.803 < 2e-16 ***
## black.friday 317.203 40.339 7.863 1.03e-13 ***
## trend 130.445 6.308 20.680 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7666 on 257 degrees of freedom
## Multiple R-squared: 0.7323, Adjusted R-squared: 0.7291
## F-statistic: 234.3 on 3 and 257 DF, p-value: < 2.2e-16
Our next steps can be guided by functions like what_next(), which will test all other variables in our data. From the output below, it seems that the variables offline_media and covid would improve the model most.
model |>
what_next()
## # A tibble: 84 × 6
## variable adj_R2 t_stat coef vif adj_R2_diff
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 offline_media 0.821 11.5 6.50 1.12 0.126
## 2 year_2020 0.796 9.26 12081. 1.59 0.0921
## 3 covid 0.795 9.12 188. 1.98 0.0901
## 4 year_2019 0.762 -6.09 -7043. 1.07 0.0457
## 5 christmas_eve 0.759 -5.75 -168934. 1.65 0.0412
## 6 week_num_48 0.753 5.09 21389. 1.21 0.0328
## 7 christmas_day 0.750 -4.79 -135781. 1.48 0.0292
## 8 week_num_52 0.748 -4.51 -21135. 1.48 0.0260
## 9 promo 0.740 3.48 5.50 1.07 0.0154
## 10 year_2021 0.738 -3.11 -7264. 1.19 0.0121
## # ℹ 74 more rows
Adding these variables to the model brings the adjusted R-squared above 80%.
model = run_model(data = data,
dv = 'ecommerce',
ivs = c('christmas','black.friday','trend','covid','offline_media'),
id_var = 'date')
summary(model)
##
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t, fixed_ivs_t)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -21204.4 -3193.5 -874.6 2639.9 20486.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.819e+04 7.743e+02 62.228 < 2e-16 ***
## christmas 2.736e+02 1.978e+01 13.831 < 2e-16 ***
## black.friday 2.620e+02 2.969e+01 8.825 < 2e-16 ***
## trend 8.150e+01 6.378e+00 12.778 < 2e-16 ***
## covid 1.482e+02 1.737e+01 8.534 1.28e-15 ***
## offline_media 5.602e+00 5.093e-01 11.000 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5506 on 255 degrees of freedom
## Multiple R-squared: 0.863, Adjusted R-squared: 0.8603
## F-statistic: 321.2 on 5 and 255 DF, p-value: < 2.2e-16
Now that we have a decent model, we can extract insights from it, starting with the contribution of each independent variable over time.
model |>
decomp_chart()
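To see what a decomposition like this computes conceptually, here is a minimal sketch using base R's lm() on synthetic data, not linea's internals: in a linear model, each variable's contribution at time t is simply its coefficient times its value at t, and the contributions plus the intercept reconstruct the fitted values.

```r
# Illustrative sketch only: decomposition of a plain lm() fit on made-up data.
set.seed(1)
x1 = rnorm(20)
x2 = rnorm(20)
y  = 5 + 2 * x1 + 3 * x2 + rnorm(20, sd = 0.1)
fit = lm(y ~ x1 + x2)

# each column of contrib is coefficient * variable value, per observation
contrib = sweep(cbind(x1, x2), 2, coef(fit)[c("x1", "x2")], `*`)

# intercept plus row-wise sum of contributions reconstructs the fitted values
recon = coef(fit)[["(Intercept)"]] + rowSums(contrib)
all.equal(recon, unname(fitted(fit)))
## [1] TRUE
```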
We can also visualize the relationships between our independent and dependent variables using response curves. From this we can see that, for example, when offline_media is 10, ecommerce increases by ~55. To capture non-linear relationships (i.e. response curves that aren't straight lines), see the Advanced Features page.
model |>
response_curves(x_min = 0)
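Because this model is linear, the ~55 reading can be checked by hand from the coefficient table above: the response of ecommerce to offline_media at 10 is just the estimated coefficient times 10.

```r
# arithmetic check (coefficient value copied from the summary() output above)
coef_offline_media = 5.602
coef_offline_media * 10
## [1] 56.02
```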
The Getting Started page is a good place to start learning how to build linear models with linea.
The Advanced Features page shows how to implement the features of linea that allow users to capture non-linear relationships.
The Additional Features page illustrates all other functions of the library.