Missing value replacement by weighted moving average. Uses a semi-adaptive window size to ensure all NAs are replaced.

na_ma(x, k = 4, weighting = "exponential", maxgap = Inf)

Arguments

x

Numeric Vector (vector) or Time Series (ts) object in which missing values shall be replaced

k

Integer width of the moving average window. The window extends to both sides of the center element; e.g., k = 2 means that 4 observations (2 left, 2 right) are taken into account. If fewer than 2 non-NA values are present in the current window, the window size is automatically increased until at least 2 non-NA values are present.

weighting

Weighting to be used. Accepts the following input:

  • "simple" - Simple Moving Average (SMA)

  • "linear" - Linear Weighted Moving Average (LWMA)

  • "exponential" - Exponential Weighted Moving Average (EWMA) (default choice)

maxgap

Maximum number of successive NAs to still perform imputation on. The default setting is to replace all NAs without restrictions. With this option set, runs of consecutive NAs that are longer than 'maxgap' are left NA. This option mostly makes sense if you want to treat long runs of NA separately afterwards.
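The `maxgap` rule can be illustrated with a minimal base-R sketch that flags which NAs belong to a run longer than `maxgap`; those are the positions that would be left NA. The helper `na_runs_too_long` is hypothetical and not part of imputeTS, and using `rle()` to find the NA runs is just one possible approach, not necessarily what the package does internally.

```r
# Hypothetical helper (not part of imputeTS): TRUE for every NA that belongs
# to a run of consecutive NAs longer than `maxgap`.
na_runs_too_long <- function(x, maxgap = Inf) {
  runs <- rle(is.na(x))                            # alternating runs of NA / non-NA
  too_long <- runs$values & runs$lengths > maxgap  # NA runs exceeding maxgap
  rep(too_long, times = runs$lengths)              # expand to one flag per element
}

x <- c(1, NA, 3, NA, NA, NA, 7, NA, NA, 10)
which(na_runs_too_long(x, maxgap = 2))  # 4 5 6: the run of 3 NAs stays NA
which(na_runs_too_long(x))              # maxgap = Inf: no NA is skipped
```

With `maxgap = 2`, the single NA at position 2 and the run of two NAs at positions 8 and 9 would still be imputed.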

Value

Vector (vector) or Time Series (ts) object (depending on the input given for parameter x)

Details

In this function, missing values are replaced by moving average values. Moving averages are also sometimes referred to as "moving mean", "rolling mean", "rolling average" or "running average".

In this implementation, the mean is taken from an equal number of observations on either side of a central value. For an NA value at position i of a time series, the observations i-2, i-1 and i+1, i+2 (assuming a window size of k = 2) are used to calculate the mean.
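A small sketch of how the symmetric window around position i could be selected, clipped at the start and end of the series. `window_indices` is a hypothetical helper; dropping the center i is an assumption of this sketch (it is the NA being filled, so it would not contribute to the mean anyway).

```r
# Hypothetical helper: positions of the k neighbours on each side of i,
# clipped to 1..n; the center i itself is dropped (it is the NA being filled).
window_indices <- function(i, k, n) {
  idx <- (i - k):(i + k)
  idx[idx >= 1 & idx <= n & idx != i]
}

window_indices(5, k = 2, n = 10)  # 3 4 6 7  (i-2, i-1, i+1, i+2)
window_indices(1, k = 2, n = 10)  # 2 3      (no neighbours left of the start)
```

Near the boundaries the window is simply truncated, so fewer than 2k observations may be available.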

With long NA gaps, all values next to the central value may be NA as well, so the algorithm uses a semi-adaptive window size. Whenever fewer than 2 non-NA values are available in the complete window, the window size is incrementally increased until at least 2 non-NA values are present. In all other cases, the algorithm sticks to the pre-set window size.
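The semi-adaptive rule can be sketched as a loop that widens the window by one position on each side until at least 2 non-NA values fall inside it. `adaptive_window` is a hypothetical helper, not the package's code; the extra stop once the window spans the whole series is our own guard so the loop cannot run forever on a series with fewer than 2 observed values.

```r
# Hypothetical sketch of the semi-adaptive window: widen k by one until at
# least 2 non-NA values lie in the window around position i.
adaptive_window <- function(x, i, k) {
  n <- length(x)
  repeat {
    idx <- (i - k):(i + k)
    idx <- idx[idx >= 1 & idx <= n & idx != i]  # clip to the series, drop center
    observed <- idx[!is.na(x[idx])]             # non-NA neighbours only
    # Stop once 2 values are found; the k >= n check is our own guard for
    # series with fewer than 2 observed values in total.
    if (length(observed) >= 2 || k >= n) break
    k <- k + 1
  }
  list(k = k, indices = observed)
}

x <- c(1, NA, NA, NA, NA, 6, 7)
res <- adaptive_window(x, i = 3, k = 1)
res$k        # 3: the windows for k = 1 and k = 2 hold fewer than 2 observed values
res$indices  # 1 6
```

Once the condition is met, the loop stops immediately, so the widened window is the smallest one that satisfies it.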

There are options for using Simple Moving Average (SMA), Linear Weighted Moving Average (LWMA) and Exponential Weighted Moving Average (EWMA).

SMA: all observations in the window are equally weighted for calculating the mean.

LWMA: weights decrease with distance from the central value, with denominators that grow in arithmetic progression. The observations directly next to a central value i have weight 1/2, the observations one further away (i-2, i+2) have weight 1/3, the next (i-3, i+3) have weight 1/4, and so on.

EWMA: uses weighting factors that decrease exponentially. The observations directly next to a central value i have weight 1/2^1, the observations one further away (i-2, i+2) have weight 1/2^2, the next (i-3, i+3) have weight 1/2^3, and so on.
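The three weighting schemes can be sketched as functions of the distance d = |j - i| between a neighbour j and the central value i, followed by a weighted mean over the observed neighbours. The helper names are hypothetical; dividing by the sum of the weights (so they need not add up to 1) is an assumption of this sketch, and it uses a fixed window without the adaptive widening.

```r
# Hypothetical helper: weight of a neighbour at distance d = |j - i| from the center.
ma_weights <- function(d, weighting = c("simple", "linear", "exponential")) {
  weighting <- match.arg(weighting)
  switch(weighting,
    simple      = rep(1, length(d)),  # SMA: all neighbours weighted equally
    linear      = 1 / (d + 1),        # LWMA: 1/2, 1/3, 1/4, ...
    exponential = 1 / 2^d             # EWMA: 1/2, 1/4, 1/8, ...
  )
}

# Hypothetical helper: weighted mean of the observed neighbours within a fixed
# window of k per side. Dividing by sum(w) normalizes the weights (an assumption).
weighted_fill <- function(x, i, k, weighting = "exponential") {
  idx <- (i - k):(i + k)
  idx <- idx[idx >= 1 & idx <= length(x) & idx != i]  # clip first, then test for NA
  idx <- idx[!is.na(x[idx])]
  w <- ma_weights(abs(idx - i), weighting)
  sum(w * x[idx]) / sum(w)
}

x <- c(10, 20, NA, 40, 60)
weighted_fill(x, i = 3, k = 2, weighting = "simple")       # 32.5
weighted_fill(x, i = 3, k = 2, weighting = "linear")       # 32
weighted_fill(x, i = 3, k = 2, weighting = "exponential")  # 31.66667
```

The estimate moves from 32.5 toward 30 (the mean of the direct neighbours 20 and 40) as the weights fall off faster with distance, which is the intended effect of LWMA and EWMA.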


Author

Steffen Moritz

Examples

# Example 1: Perform imputation with simple moving average
na_ma(tsAirgap, weighting = "simple")
#>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1949 112 118 132 129  NA 135 148 148  NA 119 104 118
#> 1950 115 126 141 135 125 149 170 170  NA 133  NA 140
#> 1951 145 150 178 163 172 178 199 199 184 162 146 166
#> 1952 171 180 193 181 183 218 230 242 209 191 172 194
#> 1953 196 196 236 235 229 243 264 272 237 211 180 201
#> 1954 204 188 235 227 234  NA 302 293 259 229 203 229
#> 1955 242 233 267 269 270 315 364 347 312 274 237 278
#> 1956 284 277  NA  NA  NA 374 413 405 355 306 271 306
#> 1957 315 301 356 348 355  NA 465 467 404 347  NA 336
#> 1958 340 318  NA 348 363 435 491 505 404 359 310 337
#> 1959 360 342 406 396 420 472 548 559 463 407 362  NA
#> 1960 417 391 419 461  NA 535 622 606 508 461 390 432
# Example 2: Perform imputation with exponential weighted moving average
na_ma(tsAirgap)
# Example 3: Perform imputation with exponential weighted moving average, window size 6
na_ma(tsAirgap, k = 6)
# Example 4: Same as example 1, just written with the pipe operator
tsAirgap %>% na_ma(weighting = "simple")