Replaces each missing value with the most recent present value prior to it (Last Observation Carried Forward - LOCF). Optionally, this can also be done starting from the back of the series (Next Observation Carried Backward - NOCB).
na_locf(x, option = "locf", na_remaining = "rev", maxgap = Inf)
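Argument | Description
---|---
x | Numeric vector (vector) or time series (ts) object containing the missing values to be replaced.
option | Algorithm to be used. Accepts the following input: "locf" (Last Observation Carried Forward) or "nocb" (Next Observation Carried Backward).
na_remaining | Method to be used for remaining NAs. Accepts the following input: "rev" (fill from the opposite direction), "keep" (leave as NA), "rm" (remove) or "mean" (replace with the overall mean).
maxgap | Maximum number of successive NAs on which imputation is still performed. The default replaces all NAs without restriction. With this option set, runs of consecutive NAs longer than 'maxgap' are left NA. This is mostly useful if you want to treat long runs of NAs separately afterwards.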
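The effect of maxgap can be illustrated with a minimal base R sketch. The helpers locf_sketch and locf_maxgap_sketch are hypothetical names introduced only for this illustration; they are not the package's implementation and only mimic the behavior described above for plain numeric vectors.

```r
# Hypothetical helper (not the package's implementation): plain LOCF on a numeric vector.
locf_sketch <- function(x) {
  pos <- cummax(ifelse(is.na(x), 0L, seq_along(x)))  # index of the latest observed value
  pos[pos == 0L] <- NA                               # nothing observed yet
  x[pos]
}

# Hypothetical helper: LOCF that leaves NA runs longer than maxgap untouched.
locf_maxgap_sketch <- function(x, maxgap = Inf) {
  runs <- rle(is.na(x))                              # runs of NA / non-NA values
  too_long <- runs$values & runs$lengths > maxgap    # NA runs exceeding maxgap
  keep_na <- rep(too_long, runs$lengths)             # expand back to one flag per element
  filled <- locf_sketch(x)
  filled[keep_na] <- NA                              # restore the long runs as NA
  filled
}

x <- c(1, NA, 3, NA, NA, NA, 7)
locf_maxgap_sketch(x)              # 1 1 3 3 3 3 7
locf_maxgap_sketch(x, maxgap = 2)  # 1 1 3 NA NA NA 7
```

The single NA at position 2 is filled in both calls, while the run of three NAs is filled only when maxgap allows it.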
Vector (vector) or Time Series (ts) object (dependent on given input at parameter x)
Replaces each missing value with the most recent present value prior to it (Last Observation Carried Forward - LOCF). This can also be done in the reverse direction, starting from the end of the series (then called Next Observation Carried Backward - NOCB).
If one or more successive observations directly at the start of the time series are NA, there is no 'last value' yet that can be carried forward. Thus, no LOCF imputation can be performed for these NAs. As soon as the first non-NA value appears, LOCF can be performed as expected. The same applies to NOCB, but from the opposite direction: NAs at the end of the series have no 'next value' to carry backward.
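The boundary behavior described above can be reproduced with a small base R sketch. The helpers locf_sketch and nocb_sketch are hypothetical and written only for illustration; they are not the package's implementation and work on plain numeric vectors.

```r
# Hypothetical helper (not the package's implementation): pure LOCF.
locf_sketch <- function(x) {
  pos <- cummax(ifelse(is.na(x), 0L, seq_along(x)))  # index of the latest observed value
  pos[pos == 0L] <- NA                               # no earlier observation exists
  x[pos]
}

# Hypothetical helper: NOCB is LOCF applied to the reversed series.
nocb_sketch <- function(x) rev(locf_sketch(rev(x)))

x <- c(NA, 3, NA, 5, NA)
locf_sketch(x)  # NA  3  3  5  5   (leading NA cannot be filled)
nocb_sketch(x)  #  3  3  5  5 NA   (trailing NA cannot be filled)
```

Each direction leaves exactly the NAs on its "starting" side of the series unfilled.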
While this problem arises rarely and affects only a few values at the beginning of the series, it is something to consider. The na_remaining parameter defines what should happen with these values at the start that would remain NA after pure LOCF.
The default setting is na_remaining = "rev", which performs NOCB / LOCF from the other direction to fill these NAs. An NA at the beginning is thus filled with the next non-NA value appearing in the series.
With na_remaining = "keep", NAs at the beginning (that cannot be imputed with pure LOCF) are simply left as NAs.
With na_remaining = "rm", NAs at the beginning of the series are removed completely. The time series is therefore shortened.
Also available is na_remaining = "mean", which uses the overall mean of the time series to replace these remaining NAs. Be aware that the mean is usually not a good imputation choice, even if it only affects the values at the beginning.
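The four na_remaining choices can be compared side by side in a base R sketch. The function fill_remaining_sketch and its helper are hypothetical and are not the package's implementation. The sketch covers only the LOCF direction on plain numeric vectors, and it assumes that "mean" refers to the mean of the observed (non-NA) values.

```r
# Hypothetical helper (not the package's implementation): pure LOCF.
locf_sketch <- function(x) {
  pos <- cummax(ifelse(is.na(x), 0L, seq_along(x)))  # index of the latest observed value
  pos[pos == 0L] <- NA                               # no earlier observation exists
  x[pos]
}

# Hypothetical helper: LOCF followed by one of the na_remaining strategies.
fill_remaining_sketch <- function(x, na_remaining = c("rev", "keep", "rm", "mean")) {
  na_remaining <- match.arg(na_remaining)
  filled <- locf_sketch(x)
  left <- is.na(filled)                     # NAs that pure LOCF could not fill
  if (na_remaining == "rev") {
    nocb <- rev(locf_sketch(rev(x)))        # fill from the opposite direction
    filled[left] <- nocb[left]
  } else if (na_remaining == "rm") {
    filled <- filled[!left]                 # drop them, shortening the series
  } else if (na_remaining == "mean") {
    filled[left] <- mean(x, na.rm = TRUE)   # assumption: mean of observed values
  }
  filled                                    # "keep" returns the LOCF result unchanged
}

x <- c(NA, NA, 2, 4, NA, 6)
fill_remaining_sketch(x, "rev")   # 2 2 2 4 4 6
fill_remaining_sketch(x, "keep")  # NA NA 2 4 4 6
fill_remaining_sketch(x, "rm")    # 2 4 4 6
fill_remaining_sketch(x, "mean")  # 4 4 2 4 4 6
```

Only the two leading NAs differ between the strategies; the interior NA at position 5 is filled by LOCF in every case.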
Steffen Moritz
# Prerequisite: Load the package (na_locf is part of imputeTS)
library(imputeTS)

# Prerequisite: Create time series with missing values
x <- ts(c(NA, 3, 4, 5, 6, NA, 7, 8))

# Example 1: Perform LOCF
na_locf(x)

# Example 2: Perform NOCB
na_locf(x, option = "nocb")

# Example 3: Perform LOCF and remove remaining NAs
na_locf(x, na_remaining = "rm")
#> [1] 3 4 5 6 7 8

# Example 4: Same as example 1, just written with pipe operator
# (requires the %>% pipe to be available, e.g. from magrittr)
x %>% na_locf()