Removes the seasonal component from the time series, performs imputation on the deseasonalized series, and afterwards adds the seasonal component back.

na_seadec(
  x,
  algorithm = "interpolation",
  find_frequency = FALSE,
  maxgap = Inf,
  ...
)

Arguments

x

Numeric Vector (vector) or Time Series (ts) object in which missing values shall be replaced

algorithm

Algorithm to be used after decomposition. Accepts the following input:

  • "interpolation" - Imputation by Interpolation (default choice)

  • "locf" - Imputation by Last Observation Carried Forward

  • "mean" - Imputation by Mean Value

  • "random" - Imputation by Random Sample

  • "kalman" - Imputation by Kalman Smoothing and State Space Models

  • "ma" - Imputation by Weighted Moving Average
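As a quick illustration of the algorithm argument, the following sketch runs na_seadec once per accepted value on the tsAirgap dataset used in the Examples section. The names algos and results are only illustrative and not part of the package.

```r
library(imputeTS)

# All values accepted by the 'algorithm' argument, as listed above.
algos <- c("interpolation", "locf", "mean", "random", "kalman", "ma")

# Run the seasonal-decomposition imputation once per algorithm.
results <- lapply(algos, function(a) na_seadec(tsAirgap, algorithm = a))
names(results) <- algos

# Check whether any NAs remain in each result.
sapply(results, anyNA)
```

Comparing the imputed values across algorithms on a series with known seasonality is one way to choose a sensible setting for your own data.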

find_frequency

If TRUE, the algorithm will try to estimate the frequency of the time series automatically.

maxgap

Maximum number of successive NAs to still perform imputation on. The default setting is to replace all NAs without restrictions. With this option set, runs of consecutive NAs that are longer than 'maxgap' will be left NA. This option mostly makes sense if you want to treat long runs of NAs separately afterwards.
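A minimal sketch of the maxgap behavior, again using tsAirgap. Judging from the imputed values shown in the Examples section, the only gap in that series longer than two observations appears to be the three-month run in 1956 (March to May), so with maxgap = 2 that run should stay NA while the isolated gaps are filled.

```r
library(imputeTS)

# Impute only gaps of at most two consecutive NAs.
res <- na_seadec(tsAirgap, algorithm = "interpolation", maxgap = 2)

# Positions that are still missing after imputation.
time(res)[is.na(res)]

# Number of NAs before and after.
c(before = sum(is.na(tsAirgap)), after = sum(is.na(res)))
```

The remaining NAs can then be handled separately, for example with a different algorithm or by excluding that stretch from later analysis.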

...

Additional parameters that are passed through to the selected algorithm. See na_interpolation, na_locf, na_random, na_mean, na_kalman and na_ma for the available parameter options.
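The pass-through can be sketched as follows. The option argument shown here belongs to na_interpolation and na_locf respectively, not to na_seadec itself. Which extra arguments are meaningful depends on the chosen algorithm, so check the help page of the corresponding function.

```r
library(imputeTS)

# 'option' is forwarded to na_interpolation(): spline instead of linear.
res_spline <- na_seadec(tsAirgap, algorithm = "interpolation", option = "spline")

# 'option' is forwarded to na_locf(): next observation carried backward.
res_nocb <- na_seadec(tsAirgap, algorithm = "locf", option = "nocb")
```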

Value

Vector (vector) or Time Series (ts) object (depending on the input given for parameter x)

Details

The algorithm first performs a Seasonal Decomposition of Time Series by Loess via stl, which decomposes the time series into seasonal, trend and irregular components. The seasonal component is then removed (subtracted) from the original series. As a second step, the selected imputation algorithm (e.g. na_locf, na_ma, ...) is applied to the deseasonalized series, so the algorithm can work without being affected by seasonal patterns. After the NA gaps have been filled, the seasonal component is added back to the deseasonalized series.
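The steps described here can be sketched by hand as shown below. This is an illustration of the idea, not the package's actual source code: the variable names are made up, the temporary fill uses na_interpolation, and the stl settings are the ones stated in the implementation details on this page.

```r
library(imputeTS)

x <- tsAirgap
missing <- is.na(x)

# stl() needs complete data, so fill the gaps temporarily.
x_tmp <- na_interpolation(x)

# Step 1: STL decomposition and extraction of the seasonal component.
fit <- stats::stl(x_tmp, s.window = 11, robust = TRUE)
seasonal <- fit$time.series[, "seasonal"]

# Step 2: remove the seasonality, restore the NAs, then impute.
deseason <- x_tmp - seasonal
deseason[missing] <- NA
filled <- na_interpolation(deseason)

# Step 3: add the seasonal component back.
result <- filled + seasonal
```

Observed values pass through unchanged, because the seasonal component is subtracted and then added again. Only the positions that were NA receive new values.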

Implementation details: A paper about the STL decomposition procedure is listed in the references. Since stl only works with complete data, the NAs are temporarily filled via linear interpolation in order to perform the decomposition. Once the decomposition has been obtained, these temporarily imputed values are replaced with NAs again, so that only the original non-NA observations remain. STL decomposition is run with robust = TRUE and s.window = 11.

Additionally, STL decomposition needs a preset frequency. This can be supplied through the frequency of the input ts object or by setting 'find_frequency = TRUE' in order to find an appropriate frequency for the time series. The find_frequency parameter internally uses findfrequency, which performs a spectral analysis of the time series to identify a suitable frequency. Using find_frequency will replace the previously set frequency of a ts object with the newly found frequency. The default is 'find_frequency = FALSE', which gives a warning if no seasonality is set for the supplied time series object. If no seasonality is set and find_frequency is not set to TRUE, the function proceeds without decomposition and simply applies the selected secondary algorithm to the original time series, which still includes seasonality.
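The two frequency-related cases can be sketched with a plain numeric vector, which carries no frequency information. The data below is synthetic and purely hypothetical. Whether a frequency is actually detected depends on the spectral analysis, so the estimated value is not asserted here.

```r
library(imputeTS)

# Hypothetical example data: a numeric vector with a 12-step cycle plus noise.
set.seed(42)
x <- 10 * sin(2 * pi * (1:144) / 12) + 0.1 * (1:144) + rnorm(144)
x[c(15, 40, 41, 100)] <- NA

# Case 1: let na_seadec() estimate the frequency before decomposing.
res_auto <- na_seadec(x, find_frequency = TRUE)

# Case 2: no frequency available, so the decomposition is skipped and the
# secondary algorithm is applied directly. The call warns about this; the
# warning is suppressed here only to keep the output short.
res_plain <- suppressWarnings(na_seadec(x))
```

In the second case the result should match a direct call to na_interpolation(x), since that is the default secondary algorithm.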

References

R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning (1990). STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6, 3–73.

Author

Steffen Moritz

Examples

# Example 1: Perform seasonal imputation using algorithm = "interpolation"
na_seadec(tsAirgap, algorithm = "interpolation")
#>           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
#> 1949 112.0000 118.0000 132.0000 129.0000 121.3941 135.0000 148.0000 148.0000
#> 1950 115.0000 126.0000 141.0000 135.0000 125.0000 149.0000 170.0000 170.0000
#> 1951 145.0000 150.0000 178.0000 163.0000 172.0000 178.0000 199.0000 199.0000
#> 1952 171.0000 180.0000 193.0000 181.0000 183.0000 218.0000 230.0000 242.0000
#> 1953 196.0000 196.0000 236.0000 235.0000 229.0000 243.0000 264.0000 272.0000
#> 1954 204.0000 188.0000 235.0000 227.0000 234.0000 274.9995 302.0000 293.0000
#> 1955 242.0000 233.0000 267.0000 269.0000 270.0000 315.0000 364.0000 347.0000
#> 1956 284.0000 277.0000 321.8776 321.4178 329.4329 374.0000 413.0000 405.0000
#> 1957 315.0000 301.0000 356.0000 348.0000 355.0000 424.1798 465.0000 467.0000
#> 1958 340.0000 318.0000 357.2362 348.0000 363.0000 435.0000 491.0000 505.0000
#> 1959 360.0000 342.0000 406.0000 396.0000 420.0000 472.0000 548.0000 559.0000
#> 1960 417.0000 391.0000 419.0000 461.0000 477.2415 535.0000 622.0000 606.0000
#>           Sep      Oct      Nov      Dec
#> 1949 131.6204 119.0000 104.0000 118.0000
#> 1950 149.7682 133.0000 113.0095 140.0000
#> 1951 184.0000 162.0000 146.0000 166.0000
#> 1952 209.0000 191.0000 172.0000 194.0000
#> 1953 237.0000 211.0000 180.0000 201.0000
#> 1954 259.0000 229.0000 203.0000 229.0000
#> 1955 312.0000 274.0000 237.0000 278.0000
#> 1956 355.0000 306.0000 271.0000 306.0000
#> 1957 404.0000 347.0000 305.0315 336.0000
#> 1958 404.0000 359.0000 310.0000 337.0000
#> 1959 463.0000 407.0000 362.0000 398.3917
#> 1960 508.0000 461.0000 390.0000 432.0000
# Example 2: Perform seasonal imputation using algorithm = "mean"
na_seadec(tsAirgap, algorithm = "mean")
#>           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
#> 1949 112.0000 118.0000 132.0000 129.0000 278.5342 135.0000 148.0000 148.0000
#> 1950 115.0000 126.0000 141.0000 135.0000 125.0000 149.0000 170.0000 170.0000
#> 1951 145.0000 150.0000 178.0000 163.0000 172.0000 178.0000 199.0000 199.0000
#> 1952 171.0000 180.0000 193.0000 181.0000 183.0000 218.0000 230.0000 242.0000
#> 1953 196.0000 196.0000 236.0000 235.0000 229.0000 243.0000 264.0000 272.0000
#> 1954 204.0000 188.0000 235.0000 227.0000 234.0000 310.9817 302.0000 293.0000
#> 1955 242.0000 233.0000 267.0000 269.0000 270.0000 315.0000 364.0000 347.0000
#> 1956 284.0000 277.0000 282.0424 276.8787 280.1899 374.0000 413.0000 405.0000
#> 1957 315.0000 301.0000 356.0000 348.0000 355.0000 323.7648 465.0000 467.0000
#> 1958 340.0000 318.0000 281.3777 348.0000 363.0000 435.0000 491.0000 505.0000
#> 1959 360.0000 342.0000 406.0000 396.0000 420.0000 472.0000 548.0000 559.0000
#> 1960 417.0000 391.0000 419.0000 461.0000 281.0843 535.0000 622.0000 606.0000
#>           Sep      Oct      Nov      Dec
#> 1949 289.2064 119.0000 104.0000 118.0000
#> 1950 289.6443 133.0000 237.2661 140.0000
#> 1951 184.0000 162.0000 146.0000 166.0000
#> 1952 209.0000 191.0000 172.0000 194.0000
#> 1953 237.0000 211.0000 180.0000 201.0000
#> 1954 259.0000 229.0000 203.0000 229.0000
#> 1955 312.0000 274.0000 237.0000 278.0000
#> 1956 355.0000 306.0000 271.0000 306.0000
#> 1957 404.0000 347.0000 215.2535 336.0000
#> 1958 404.0000 359.0000 310.0000 337.0000
#> 1959 463.0000 407.0000 362.0000 242.2860
#> 1960 508.0000 461.0000 390.0000 432.0000
# Example 3: Same as example 1, just written with pipe operator
tsAirgap %>% na_seadec(algorithm = "interpolation")
#>           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
#> 1949 112.0000 118.0000 132.0000 129.0000 121.3941 135.0000 148.0000 148.0000
#> 1950 115.0000 126.0000 141.0000 135.0000 125.0000 149.0000 170.0000 170.0000
#> 1951 145.0000 150.0000 178.0000 163.0000 172.0000 178.0000 199.0000 199.0000
#> 1952 171.0000 180.0000 193.0000 181.0000 183.0000 218.0000 230.0000 242.0000
#> 1953 196.0000 196.0000 236.0000 235.0000 229.0000 243.0000 264.0000 272.0000
#> 1954 204.0000 188.0000 235.0000 227.0000 234.0000 274.9995 302.0000 293.0000
#> 1955 242.0000 233.0000 267.0000 269.0000 270.0000 315.0000 364.0000 347.0000
#> 1956 284.0000 277.0000 321.8776 321.4178 329.4329 374.0000 413.0000 405.0000
#> 1957 315.0000 301.0000 356.0000 348.0000 355.0000 424.1798 465.0000 467.0000
#> 1958 340.0000 318.0000 357.2362 348.0000 363.0000 435.0000 491.0000 505.0000
#> 1959 360.0000 342.0000 406.0000 396.0000 420.0000 472.0000 548.0000 559.0000
#> 1960 417.0000 391.0000 419.0000 461.0000 477.2415 535.0000 622.0000 606.0000
#>           Sep      Oct      Nov      Dec
#> 1949 131.6204 119.0000 104.0000 118.0000
#> 1950 149.7682 133.0000 113.0095 140.0000
#> 1951 184.0000 162.0000 146.0000 166.0000
#> 1952 209.0000 191.0000 172.0000 194.0000
#> 1953 237.0000 211.0000 180.0000 201.0000
#> 1954 259.0000 229.0000 203.0000 229.0000
#> 1955 312.0000 274.0000 237.0000 278.0000
#> 1956 355.0000 306.0000 271.0000 306.0000
#> 1957 404.0000 347.0000 305.0315 336.0000
#> 1958 404.0000 359.0000 310.0000 337.0000
#> 1959 463.0000 407.0000 362.0000 398.3917
#> 1960 508.0000 461.0000 390.0000 432.0000