Print summary stats about the distribution of missing values in a univariate time series.
statsNA(x, bins = 4, print_only = TRUE)
x - Numeric Vector (vector) or Time Series (ts) object containing NAs
bins - Split number for bin stats: the number of bins the time series is divided into. For each bin, the number and percentage of missing values is printed. The default is 4, which means stats for the 1st, 2nd, 3rd and 4th quarter of the time series are shown.
print_only - Choose whether the function prints or returns the stats. With print_only = TRUE the function has no return value and only prints the missing value stats. With print_only = FALSE nothing is printed and the function returns a list. Printing gives slightly more information, since the returned list does not include "Stats for Bins" and "Overview NA series".
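To make the bin split concrete, here is a minimal base R sketch that counts NAs per bin. The helper na_per_bin is hypothetical and not part of imputeTS. It assumes bins are consecutive chunks of ceiling(n / bins) values; how statsNA itself handles a length that is not divisible by bins is not stated here.

```r
# Hypothetical helper, not part of imputeTS: count NAs per bin.
# Assumption: bins are consecutive chunks of ceiling(n / bins) values.
na_per_bin <- function(x, bins = 4) {
  n <- length(x)
  size <- ceiling(n / bins)               # values per bin (last bin may be shorter)
  bin_id <- ceiling(seq_len(n) / size)    # bin index for every observation
  data.frame(
    bin = sort(unique(bin_id)),
    values = as.vector(table(bin_id)),
    nas = as.vector(tapply(is.na(x), bin_id, sum))
  )
}

x <- c(1, NA, 3, 4, NA, NA, 7, 8)
res <- na_per_bin(x, bins = 4)
res$percent <- 100 * res$nas / res$values  # share of NAs within each bin
print(res)
```

For this toy vector each bin holds two values, and the NAs fall into bin 1 (one NA) and bin 3 (two NAs).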
A list containing the stats. Beware: the function only returns a value if print_only = FALSE.
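A short sketch of working with the returned list, assuming the imputeTS package is installed. The element names and values are the ones shown for tsAirgap in Example 2 of this page; the variable names stats and na_share are only illustrative.

```r
library(imputeTS)  # provides statsNA() and the tsAirgap example series

# With print_only = FALSE nothing is printed; keep the list for later use.
stats <- statsNA(tsAirgap, print_only = FALSE)

# Element names as shown in the returned list of Example 2.
stats$number_NAs        # 13
stats$number_na_gaps    # 11
stats$longest_na_gap    # 3

# percentage_NAs is stored as a character string such as "9.03%",
# so compute a numeric share from the counts when one is needed.
na_share <- stats$number_NAs / stats$length_series
```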
Prints the following information about the missing values in the time series:
"Length of time series" - Number of observations in the time series (including NAs)
"Number of Missing Values" - Number of missing values in the time series
"Percentage of Missing Values" - Percentage of missing values in the time series
"Number of Gaps" - Number of NA gaps (consisting of one or more consecutive NAs) in the time series
"Average Gap Size" - Average number of consecutive NAs per NA gap in the time series
"Stats for Bins" - Number/percentage of missing values for the split into bins
"Longest NA gap" - Longest series of consecutive missing values (NAs in a row) in the time series
"Most frequent gap size" - Most frequently occurring gap size (number of consecutive NAs) in the time series
"Gap size accounting for most NAs" - The gap size that accounts for the most missing values overall in the time series
"Overview NA series" - Overview of how often each gap size (series of consecutive NAs) occurs. Gap sizes occurring 0 times are skipped
Note also that you can choose whether the function prints this information or returns it as a list (see the description of the parameter "print_only").
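The gap statistics listed above can be approximated in base R with run-length encoding. This is a sketch for illustration, not the implementation used by statsNA: the helper na_gap_stats is hypothetical, it borrows the element names of the returned list shown in Example 2, it assumes the input contains at least one NA, and how statsNA breaks ties is not known here.

```r
# Hypothetical helper, not part of imputeTS: gap stats via run-length encoding.
# Assumes x contains at least one NA.
na_gap_stats <- function(x) {
  runs <- rle(is.na(x))
  gaps <- runs$lengths[runs$values]      # lengths of the NA runs only
  sizes <- table(gaps)                   # how often each gap size occurs
  gap_size <- as.integer(names(sizes))
  list(
    number_NAs = sum(gaps),
    number_na_gaps = length(gaps),
    average_size_na_gaps = mean(gaps),
    longest_na_gap = max(gaps),
    # gap size that occurs most often (first one wins on ties)
    most_frequent_na_gap = gap_size[which.max(sizes)],
    # gap size whose occurrences add up to the most NAs overall
    most_weighty_na_gap = gap_size[which.max(sizes * gap_size)]
  )
}

x <- c(1, NA, 3, NA, NA, NA, 7, NA, 9, 10)
stats <- na_gap_stats(x)
str(stats)
```

In this toy vector there are three gaps of sizes 1, 3 and 1. Size 1 is the most frequent, while the single gap of size 3 accounts for the most NAs.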
# Example 1: Print stats about the missing data in tsNH4
statsNA(tsNH4)
#> [1] "Length of time series:"
#> [1] 4552
#> [1] "-------------------------"
#> [1] "Number of Missing Values:"
#> [1] 883
#> [1] "-------------------------"
#> [1] "Percentage of Missing Values:"
#> [1] "19.4%"
#> [1] "-------------------------"
#> [1] "Number of Gaps:"
#> [1] 155
#> [1] "-------------------------"
#> [1] "Average Gap Size:"
#> [1] 5.696774
#> [1] "-------------------------"
#> [1] "Stats for Bins"
#> [1] " Bin 1 (1138 values from 1 to 1138) : 233 NAs (20.5%)"
#> [1] " Bin 2 (1138 values from 1139 to 2276) : 433 NAs (38%)"
#> [1] " Bin 3 (1138 values from 2277 to 3414) : 135 NAs (11.9%)"
#> [1] " Bin 4 (1138 values from 3415 to 4552) : 82 NAs (7.21%)"
#> [1] "-------------------------"
#> [1] "Longest NA gap (series of consecutive NAs)"
#> [1] "157 in a row"
#> [1] "-------------------------"
#> [1] "Most frequent gap size (series of consecutive NA series)"
#> [1] "1 NA in a row (occurring 68 times)"
#> [1] "-------------------------"
#> [1] "Gap size accounting for most NAs"
#> [1] "157 NA in a row (occurring 1 times, making up for overall 157 NAs)"
#> [1] "-------------------------"
#> [1] "Overview NA series"
#> [1] " 1 NA in a row: 68 times"
#> [1] " 2 NA in a row: 26 times"
#> [1] " 3 NA in a row: 16 times"
#> [1] " 4 NA in a row: 10 times"
#> [1] " 5 NA in a row: 8 times"
#> [1] " 6 NA in a row: 4 times"
#> [1] " 7 NA in a row: 2 times"
#> [1] " 8 NA in a row: 3 times"
#> [1] " 9 NA in a row: 2 times"
#> [1] " 10 NA in a row: 1 times"
#> [1] " 11 NA in a row: 1 times"
#> [1] " 12 NA in a row: 2 times"
#> [1] " 14 NA in a row: 1 times"
#> [1] " 16 NA in a row: 1 times"
#> [1] " 17 NA in a row: 1 times"
#> [1] " 21 NA in a row: 1 times"
#> [1] " 25 NA in a row: 1 times"
#> [1] " 26 NA in a row: 1 times"
#> [1] " 27 NA in a row: 1 times"
#> [1] " 32 NA in a row: 1 times"
#> [1] " 42 NA in a row: 2 times"
#> [1] " 91 NA in a row: 1 times"
#> [1] " 157 NA in a row: 1 times"
# Example 2: Return list with stats about the missing data in tsAirgap
statsNA(tsAirgap, print_only = FALSE)
#> $length_series
#> [1] 144
#>
#> $number_NAs
#> [1] 13
#>
#> $number_na_gaps
#> [1] 11
#>
#> $average_size_na_gaps
#> [1] 1.181818
#>
#> $percentage_NAs
#> [1] "9.03%"
#>
#> $longest_na_gap
#> [1] 3
#>
#> $most_frequent_na_gap
#> [1] 1
#>
#> $most_weighty_na_gap
#> [1] 1
#>
#> $df_distribution_na_gaps
#> [1] 10 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [76] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [126] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
# Example 3: Same as example 1, just written with pipe operator
tsNH4 %>% statsNA()
#> [1] "Length of time series:"
#> [1] 4552
#> [1] "-------------------------"
#> [1] "Number of Missing Values:"
#> [1] 883
#> [1] "-------------------------"
#> [1] "Percentage of Missing Values:"
#> [1] "19.4%"
#> [1] "-------------------------"
#> [1] "Number of Gaps:"
#> [1] 155
#> [1] "-------------------------"
#> [1] "Average Gap Size:"
#> [1] 5.696774
#> [1] "-------------------------"
#> [1] "Stats for Bins"
#> [1] " Bin 1 (1138 values from 1 to 1138) : 233 NAs (20.5%)"
#> [1] " Bin 2 (1138 values from 1139 to 2276) : 433 NAs (38%)"
#> [1] " Bin 3 (1138 values from 2277 to 3414) : 135 NAs (11.9%)"
#> [1] " Bin 4 (1138 values from 3415 to 4552) : 82 NAs (7.21%)"
#> [1] "-------------------------"
#> [1] "Longest NA gap (series of consecutive NAs)"
#> [1] "157 in a row"
#> [1] "-------------------------"
#> [1] "Most frequent gap size (series of consecutive NA series)"
#> [1] "1 NA in a row (occurring 68 times)"
#> [1] "-------------------------"
#> [1] "Gap size accounting for most NAs"
#> [1] "157 NA in a row (occurring 1 times, making up for overall 157 NAs)"
#> [1] "-------------------------"
#> [1] "Overview NA series"
#> [1] " 1 NA in a row: 68 times"
#> [1] " 2 NA in a row: 26 times"
#> [1] " 3 NA in a row: 16 times"
#> [1] " 4 NA in a row: 10 times"
#> [1] " 5 NA in a row: 8 times"
#> [1] " 6 NA in a row: 4 times"
#> [1] " 7 NA in a row: 2 times"
#> [1] " 8 NA in a row: 3 times"
#> [1] " 9 NA in a row: 2 times"
#> [1] " 10 NA in a row: 1 times"
#> [1] " 11 NA in a row: 1 times"
#> [1] " 12 NA in a row: 2 times"
#> [1] " 14 NA in a row: 1 times"
#> [1] " 16 NA in a row: 1 times"
#> [1] " 17 NA in a row: 1 times"
#> [1] " 21 NA in a row: 1 times"
#> [1] " 25 NA in a row: 1 times"
#> [1] " 26 NA in a row: 1 times"
#> [1] " 27 NA in a row: 1 times"
#> [1] " 32 NA in a row: 1 times"
#> [1] " 42 NA in a row: 2 times"
#> [1] " 91 NA in a row: 1 times"
#> [1] " 157 NA in a row: 1 times"