Missing Value Imputation by Mean Value

Replaces missing values with a single summary value computed from the series. The available summaries are the mean, median, mode, harmonic mean, and geometric mean.

na_mean(x, option = "mean", maxgap = Inf)

Arguments

x

Numeric Vector (vector) or Time Series (ts) object in which missing values shall be replaced

option

Algorithm to be used. Accepts the following input:

"mean" - take the mean for imputation (default choice)
"median" - take the median for imputation
"mode" - take the mode for imputation
"harmonic" - take the harmonic mean
"geometric" - take the geometric mean

maxgap

Maximum number of successive NAs on which imputation is still performed. The default (Inf) replaces all NAs without restriction. When this option is set, runs of consecutive NAs longer than 'maxgap' are left as NA. This is mostly useful if you want to treat long runs of NAs separately afterwards.
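The effect of 'maxgap' can be illustrated with a small base R sketch. The helper name mean_fill_maxgap is hypothetical and is not part of the package; it only mimics the behaviour described above (fill with the overall mean, then restore NA runs longer than 'maxgap') and makes no claim about how na_mean is implemented internally.

```r
# Hypothetical helper, NOT part of imputeTS: fills NAs with the overall mean
# but leaves runs of consecutive NAs longer than 'maxgap' untouched.
mean_fill_maxgap <- function(x, maxgap = Inf) {
  filled <- x
  filled[is.na(x)] <- mean(x, na.rm = TRUE)

  # Run-length encode the NA mask to find runs of consecutive NAs
  runs <- rle(is.na(x))

  # Flag every position that belongs to an NA run longer than maxgap
  too_long <- rep(runs$values & runs$lengths > maxgap, runs$lengths)

  # Restore NA for those positions
  filled[too_long] <- NA
  filled
}

x <- c(2, NA, 4, NA, NA, NA, 6)

# The single NA is filled; the run of three NAs exceeds maxgap = 2 and stays NA
mean_fill_maxgap(x, maxgap = 2)
```

With the default maxgap = Inf no run is ever too long, so every NA is replaced.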

Value

Vector (vector) or Time Series (ts) object (dependent on given input at parameter x)

Details

Missing values are replaced by a single overall value. The function calculates the mean, median, mode, harmonic mean, or geometric mean over all non-NA values and replaces every NA with this value. Option 'mode' replaces NAs with the most frequent value in the time series. If two or more values occur equally often, the function imputes the lowest of them. Because of how they are calculated, the geometric and harmonic means are not well defined when the input series contains negative values or zeros.
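The replacement value for each option can be reproduced by hand in base R. The following is a minimal sketch under the definitions given above; the helper name fill_value is hypothetical and the code is not taken from the package source.

```r
# Hypothetical helper, NOT part of imputeTS: computes the value that would be
# used to fill NAs for each 'option', following the description above.
fill_value <- function(x, option = "mean") {
  obs <- x[!is.na(x)]  # use the non-NA observations only
  switch(option,
    mean      = mean(obs),
    median    = median(obs),
    mode      = {
      counts <- table(obs)
      # On ties, pick the lowest of the most frequent values
      min(as.numeric(names(counts)[counts == max(counts)]))
    },
    harmonic  = length(obs) / sum(1 / obs),   # n / sum(1 / x_i)
    geometric = exp(mean(log(obs))),          # exp of the mean of log(x_i)
    stop("unknown option: ", option)
  )
}

x <- c(2, 3, 4, 5, 6, NA, 7, 8)
fill_value(x, "mean")
fill_value(x, "median")

# Tie between 1 and 2: the lower value is chosen
fill_value(c(1, 1, 2, 2, NA), "mode")
```

In this sketch the formulas also show why zeros and negative values are problematic: 1 / 0 is Inf and log(0) is -Inf, while the logarithm of a negative number is NaN.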

In general, using the mean for imputation is mostly a suboptimal choice and should be handled with great caution.

Author

Steffen Moritz

Examples

# Prerequisite: Create Time series with missing values
x <- ts(c(2, 3, 4, 5, 6, NA, 7, 8))

# Example 1: Perform imputation with the overall mean
na_mean(x)
#> Time Series:
#> Start = 1 
#> End = 8 
#> Frequency = 1 
#> [1] 2 3 4 5 6 5 7 8

# Example 2: Perform imputation with the overall median
na_mean(x, option = "median")
#> Time Series:
#> Start = 1 
#> End = 8 
#> Frequency = 1 
#> [1] 2 3 4 5 6 5 7 8

# Example 3: Same as example 1, just written with the pipe operator
x %>% na_mean()
#> Time Series:
#> Start = 1 
#> End = 8 
#> Frequency = 1 
#> [1] 2 3 4 5 6 5 7 8