Print summary stats about the distribution of missing values in a univariate time series.
statsNA(x, bins = 4, print_only = TRUE)
Argument | Description |
---|---|
x | Numeric vector or time series (`ts`) object. |
bins | Number of bins the time series is divided into. For each bin, the number and percentage of missing values are printed. The default value of 4 shows stats for the 1st, 2nd, 3rd and 4th quarter of the time series. |
print_only | Whether the function prints or returns the stats. With `print_only = TRUE` (the default), the function has no return value and only prints the missing value stats. With `print_only = FALSE`, nothing is printed and the function returns a list. The printed output contains slightly more information, since the returned list does not include "Stats for Bins" and "Overview NA series". |
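The split into bins can be illustrated with a small base R sketch. The helper `na_per_bin()` is hypothetical and not part of imputeTS, and its rule (position `i` of a series of length `n` goes into bin `ceiling(i * bins / n)`) is an assumption: the package may distribute leftover observations differently when the length is not divisible by `bins`. For a series of 4552 values and 4 bins it produces the same boundaries as Example 1 (1 to 1138, 1139 to 2276, 2277 to 3414, 3415 to 4552).

```r
# Hypothetical helper (not part of imputeTS): number and percentage of NAs per bin.
# Assumption: position i goes into bin ceiling(i * bins / n), which gives `bins`
# contiguous bins whose sizes differ by at most one (when n >= bins).
na_per_bin <- function(x, bins = 4) {
  x <- as.vector(x)                                  # drop ts attributes, keep the values
  n <- length(x)
  bin_id <- ceiling(seq_len(n) * bins / n)           # bin index (1..bins) of each position
  size <- tabulate(bin_id, nbins = bins)             # observations per bin
  nas  <- tabulate(bin_id[is.na(x)], nbins = bins)   # missing values per bin
  data.frame(
    bin     = seq_len(bins),
    from    = cumsum(size) - size + 1,               # first position in the bin
    to      = cumsum(size),                          # last position in the bin
    nas     = nas,
    percent = round(100 * nas / size, 2)
  )
}

# Toy series: two NAs in each half
na_per_bin(c(NA, 1, 2, NA, NA, NA, 3, 4), bins = 2)
```

With imputeTS loaded, `na_per_bin(tsNH4)` can be compared against the "Stats for Bins" part of Example 1.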
A list containing the stats. Note that the function only returns a value if `print_only = FALSE`.
Prints the following information about the missing values in the time series:
- "Length of time series": number of observations in the time series (including NAs)
- "Number of Missing Values": number of missing values in the time series
- "Percentage of Missing Values": percentage of missing values in the time series
- "Number of Gaps": number of NA gaps (runs of one or more consecutive NAs) in the time series
- "Average Gap Size": average number of consecutive NAs per NA gap
- "Stats for Bins": number and percentage of missing values in each bin (see `bins`)
- "Longest NA gap": longest run of consecutive missing values (NAs in a row) in the time series
- "Most frequent gap size": the gap size that occurs most often in the time series
- "Gap size accounting for most NAs": the gap size that accounts for the most missing values overall, i.e. the largest product of gap size and number of occurrences
- "Overview NA series": how often each gap size occurs; gap sizes that occur 0 times are skipped
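The gap statistics listed above can be derived from a run-length encoding of `is.na(x)`. The following is a minimal base R sketch under that assumption, not the imputeTS implementation; `na_gap_stats()` is a hypothetical helper. Its element names are borrowed from the list shown in Example 2 for easy comparison, except `distribution_na_gaps`, which is a hypothetical name. Two further assumptions: ties for the most frequent and the most weighty gap size are resolved towards the smaller size, and the percentage is returned as a plain number rather than a formatted string such as `"9.03%"`.

```r
# Hypothetical helper (not part of imputeTS): gap statistics via run-length encoding.
na_gap_stats <- function(x) {
  x <- as.vector(x)                        # drop ts attributes so rle() accepts the input
  runs <- rle(is.na(x))                    # alternating runs of NA (TRUE) / observed (FALSE)
  gaps <- runs$lengths[runs$values]        # lengths of the NA runs only
  n_na <- sum(is.na(x))
  has_gaps <- length(gaps) > 0
  # counts[k] = number of gaps that are exactly k NAs long
  counts <- tabulate(gaps, nbins = length(x))
  list(
    length_series        = length(x),
    number_NAs           = n_na,
    percentage_NAs       = 100 * n_na / length(x),
    number_na_gaps       = length(gaps),
    average_size_na_gaps = if (has_gaps) mean(gaps) else NA,
    longest_na_gap       = if (has_gaps) max(gaps) else NA,
    # which.max() returns the first maximum, so ties go to the smaller gap size
    most_frequent_na_gap = if (has_gaps) which.max(counts) else NA,
    # NAs contributed by each gap size = size * number of occurrences
    most_weighty_na_gap  = if (has_gaps) which.max(counts * seq_along(counts)) else NA,
    distribution_na_gaps = counts
  )
}

# Gaps of size 1, 2, 1 and 3
str(na_gap_stats(c(1, NA, 2, NA, NA, 3, NA, 4, NA, NA, NA)))
```

Like `df_distribution_na_gaps` in Example 2, which appears to have one entry per possible gap size up to the series length, the `counts` vector here is as long as the series.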
Note that you can choose whether the function returns a list or only prints the information (see the description of parameter `print_only`).
Steffen Moritz
```r
library("imputeTS")

# Example 1: Print stats about the missing data in tsNH4
statsNA(tsNH4)
#> [1] "Length of time series:"
#> [1] 4552
#> [1] "-------------------------"
#> [1] "Number of Missing Values:"
#> [1] 883
#> [1] "-------------------------"
#> [1] "Percentage of Missing Values:"
#> [1] "19.4%"
#> [1] "-------------------------"
#> [1] "Number of Gaps:"
#> [1] 155
#> [1] "-------------------------"
#> [1] "Average Gap Size:"
#> [1] 5.696774
#> [1] "-------------------------"
#> [1] "Stats for Bins"
#> [1] " Bin 1 (1138 values from 1 to 1138) : 233 NAs (20.5%)"
#> [1] " Bin 2 (1138 values from 1139 to 2276) : 433 NAs (38%)"
#> [1] " Bin 3 (1138 values from 2277 to 3414) : 135 NAs (11.9%)"
#> [1] " Bin 4 (1138 values from 3415 to 4552) : 82 NAs (7.21%)"
#> [1] "-------------------------"
#> [1] "Longest NA gap (series of consecutive NAs)"
#> [1] "157 in a row"
#> [1] "-------------------------"
#> [1] "Most frequent gap size (series of consecutive NA series)"
#> [1] "1 NA in a row (occurring 68 times)"
#> [1] "-------------------------"
#> [1] "Gap size accounting for most NAs"
#> [1] "157 NA in a row (occurring 1 times, making up for overall 157 NAs)"
#> [1] "-------------------------"
#> [1] "Overview NA series"
#> [1] " 1 NA in a row: 68 times"
#> [1] " 2 NA in a row: 26 times"
#> [1] " 3 NA in a row: 16 times"
#> [1] " 4 NA in a row: 10 times"
#> [1] " 5 NA in a row: 8 times"
#> [1] " 6 NA in a row: 4 times"
#> [1] " 7 NA in a row: 2 times"
#> [1] " 8 NA in a row: 3 times"
#> [1] " 9 NA in a row: 2 times"
#> [1] " 10 NA in a row: 1 times"
#> [1] " 11 NA in a row: 1 times"
#> [1] " 12 NA in a row: 2 times"
#> [1] " 14 NA in a row: 1 times"
#> [1] " 16 NA in a row: 1 times"
#> [1] " 17 NA in a row: 1 times"
#> [1] " 21 NA in a row: 1 times"
#> [1] " 25 NA in a row: 1 times"
#> [1] " 26 NA in a row: 1 times"
#> [1] " 27 NA in a row: 1 times"
#> [1] " 32 NA in a row: 1 times"
#> [1] " 42 NA in a row: 2 times"
#> [1] " 91 NA in a row: 1 times"
#> [1] " 157 NA in a row: 1 times"

# Example 2: Return list with stats about the missing data in tsAirgap
statsNA(tsAirgap, print_only = FALSE)
#> $length_series
#> [1] 144
#>
#> $number_NAs
#> [1] 13
#>
#> $number_na_gaps
#> [1] 11
#>
#> $average_size_na_gaps
#> [1] 1.181818
#>
#> $percentage_NAs
#> [1] "9.03%"
#>
#> $longest_na_gap
#> [1] 3
#>
#> $most_frequent_na_gap
#> [1] 1
#>
#> $most_weighty_na_gap
#> [1] 1
#>
#> $df_distribution_na_gaps
#> [1] 10 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [76] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [126] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>

# Example 3: Same as example 1, just written with the pipe operator
library("magrittr")  # provides the %>% pipe operator
tsNH4 %>% statsNA()
# Prints exactly the same output as Example 1
```
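The summary figures printed in Example 1 follow directly from its "Overview NA series" part, which a short base R check can illustrate. The gap sizes and occurrence counts below are copied from that output; nothing else is assumed, and no package is needed.

```r
# Gap sizes and how often each occurs, copied from "Overview NA series" in Example 1
size  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 17, 21, 25, 26, 27, 32, 42, 91, 157)
times <- c(68, 26, 16, 10, 8, 4, 2, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1)

n_gaps       <- sum(times)                    # 155 ("Number of Gaps")
n_na         <- sum(size * times)             # 883 ("Number of Missing Values")
avg_gap      <- n_na / n_gaps                 # 5.696774 ("Average Gap Size")
most_freq    <- size[which.max(times)]        # 1 ("Most frequent gap size")
most_weighty <- size[which.max(size * times)] # 157 ("Gap size accounting for most NAs")

cat("Number of Gaps:", n_gaps, "\n")
cat("Number of Missing Values:", n_na, "\n")
cat("Average Gap Size:", avg_gap, "\n")
```

The same relations hold for Example 2: 10 gaps of size 1 plus 1 gap of size 3 give 11 gaps and 13 NAs, an average gap size of 13 / 11 = 1.181818, and size 1 accounts for the most NAs (10 versus 3).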