Print summary stats about the distribution of missing values in a univariate time series.

statsNA(x, bins = 4, print_only = TRUE)

Arguments

x

Numeric Vector (vector) or Time Series (ts) object containing NAs

bins

Split number for bin stats. Number of bins the time series is divided into. For each bin, the number and percentage of missing values is printed. The default value is 4, which means stats are shown for the 1st, 2nd, 3rd, and 4th quarter of the time series.

print_only

Choose whether the function prints or returns. For print_only = TRUE, the function has no return value and only prints the missing value stats. If print_only is set to FALSE, nothing is printed and the function returns a list. The printed output gives slightly more information, since the returned list does not include "Stats for Bins" and "Overview NA series".
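To make the bins argument more concrete, the following is a minimal base R sketch of how a series could be split into equally sized bins and the missing values counted per bin. It is not taken from the package source: the toy vector x, the helper variable bin_id, and the assumption that the series length is divisible by bins are all illustrative, and statsNA may handle remainders differently.

```r
# Toy series with 12 observations, 5 of them missing (illustrative data).
x <- c(1, NA, 3, NA, NA, 6, 7, 8, NA, 10, 11, NA)
bins <- 4

# Assign each observation to one of `bins` consecutive chunks.
# Assumption: length(x) is divisible by bins.
bin_id <- ceiling(seq_along(x) * bins / length(x))

# Number and percentage of NAs per bin.
na_per_bin  <- tapply(is.na(x), bin_id, sum)
obs_per_bin <- tapply(x, bin_id, length)
pct_per_bin <- 100 * na_per_bin / obs_per_bin

data.frame(
  bin     = names(na_per_bin),
  values  = as.vector(obs_per_bin),
  NAs     = as.vector(na_per_bin),
  percent = round(as.vector(pct_per_bin), 1)
)
```

With bins = 4, each bin here covers three observations, mirroring the "quarters" that statsNA reports by default.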

Value

A list containing the stats. Note that the function only returns a value if print_only = FALSE.

Details

Prints the following information about the missing values in the time series:

  • "Length of time series" - Number of observations in the time series (including NAs)

  • "Number of Missing Values" - Number of missing values in the time series

  • "Percentage of Missing Values" - Percentage of missing values in the time series

  • "Number of Gaps" - Number of NA gaps (consisting of one or more consecutive NAs) in the time series

  • "Average Gap Size" - Average size of consecutive NAs for the NA gaps in the time series

  • "Stats for Bins" - Number/percentage of missing values for the split into bins

  • "Longest NA gap" - Longest series of consecutive missing values (NAs in a row) in the time series

  • "Most frequent gap size" - Most frequently occurring gap size (series of consecutive NAs) in the time series

  • "Gap size accounting for most NAs" - The series of consecutive missing values that accounts for most missing values overall in the time series

  • "Overview NA series" - Overview of how often each gap size (series of consecutive NAs) occurs. Gap sizes occurring 0 times are skipped
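The gap-related statistics in the list above can be approximated in base R with run-length encoding. The sketch below is only an illustration under stated assumptions, not the package's implementation: the toy vector x and all variable names are hypothetical, the series is assumed to contain at least one NA, and ties are resolved by taking the smallest gap size, which may differ from what statsNA does.

```r
# Toy series (illustrative data) with NA gaps of size 1, 2, 1 and 3.
x <- c(5, NA, 7, NA, NA, 2, 4, NA, 9, NA, NA, NA, 1)

# Run-length encode the NA indicator; runs with value TRUE are NA gaps.
runs      <- rle(is.na(x))
gap_sizes <- runs$lengths[runs$values]

length_series  <- length(x)                         # includes NAs
number_nas     <- sum(is.na(x))
percentage_nas <- 100 * number_nas / length_series
number_gaps    <- length(gap_sizes)
average_gap    <- mean(gap_sizes)
longest_gap    <- max(gap_sizes)

# How often each gap size occurs; sizes occurring 0 times do not appear.
gap_table <- table(gap_sizes)
sizes     <- as.integer(names(gap_table))
counts    <- as.vector(gap_table)

# Most frequent gap size, and the gap size that accounts for the most NAs
# overall (size multiplied by number of occurrences).
most_frequent <- sizes[which.max(counts)]
most_weighty  <- sizes[which.max(sizes * counts)]

print(gap_table)
```

For this toy series the sketch finds 4 gaps with an average size of 1.75. The most frequent gap size is 1, while the gap size accounting for most NAs is 3.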

Note that you can choose whether the function returns a list or only prints the information (see the description of the parameter "print_only").

Author

Steffen Moritz

Examples

# Example 1: Print stats about the missing data in tsNH4
statsNA(tsNH4)
#> [1] "Length of time series:"
#> [1] 4552
#> [1] "-------------------------"
#> [1] "Number of Missing Values:"
#> [1] 883
#> [1] "-------------------------"
#> [1] "Percentage of Missing Values:"
#> [1] "19.4%"
#> [1] "-------------------------"
#> [1] "Number of Gaps:"
#> [1] 155
#> [1] "-------------------------"
#> [1] "Average Gap Size:"
#> [1] 5.696774
#> [1] "-------------------------"
#> [1] "Stats for Bins"
#> [1] " Bin 1 (1138 values from 1 to 1138) : 233 NAs (20.5%)"
#> [1] " Bin 2 (1138 values from 1139 to 2276) : 433 NAs (38%)"
#> [1] " Bin 3 (1138 values from 2277 to 3414) : 135 NAs (11.9%)"
#> [1] " Bin 4 (1138 values from 3415 to 4552) : 82 NAs (7.21%)"
#> [1] "-------------------------"
#> [1] "Longest NA gap (series of consecutive NAs)"
#> [1] "157 in a row"
#> [1] "-------------------------"
#> [1] "Most frequent gap size (series of consecutive NA series)"
#> [1] "1 NA in a row (occurring 68 times)"
#> [1] "-------------------------"
#> [1] "Gap size accounting for most NAs"
#> [1] "157 NA in a row (occurring 1 times, making up for overall 157 NAs)"
#> [1] "-------------------------"
#> [1] "Overview NA series"
#> [1] " 1 NA in a row: 68 times"
#> [1] " 2 NA in a row: 26 times"
#> [1] " 3 NA in a row: 16 times"
#> [1] " 4 NA in a row: 10 times"
#> [1] " 5 NA in a row: 8 times"
#> [1] " 6 NA in a row: 4 times"
#> [1] " 7 NA in a row: 2 times"
#> [1] " 8 NA in a row: 3 times"
#> [1] " 9 NA in a row: 2 times"
#> [1] " 10 NA in a row: 1 times"
#> [1] " 11 NA in a row: 1 times"
#> [1] " 12 NA in a row: 2 times"
#> [1] " 14 NA in a row: 1 times"
#> [1] " 16 NA in a row: 1 times"
#> [1] " 17 NA in a row: 1 times"
#> [1] " 21 NA in a row: 1 times"
#> [1] " 25 NA in a row: 1 times"
#> [1] " 26 NA in a row: 1 times"
#> [1] " 27 NA in a row: 1 times"
#> [1] " 32 NA in a row: 1 times"
#> [1] " 42 NA in a row: 2 times"
#> [1] " 91 NA in a row: 1 times"
#> [1] " 157 NA in a row: 1 times"
# Example 2: Return list with stats about the missing data in tsAirgap
statsNA(tsAirgap, print_only = FALSE)
#> $length_series
#> [1] 144
#>
#> $number_NAs
#> [1] 13
#>
#> $number_na_gaps
#> [1] 11
#>
#> $average_size_na_gaps
#> [1] 1.181818
#>
#> $percentage_NAs
#> [1] "9.03%"
#>
#> $longest_na_gap
#> [1] 3
#>
#> $most_frequent_na_gap
#> [1] 1
#>
#> $most_weighty_na_gap
#> [1] 1
#>
#> $df_distribution_na_gaps
#>   [1] 10  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#> [101]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#> [126]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>
# Example 3: Same as example 1, just written with pipe operator
tsNH4 %>% statsNA()
#> [1] "Length of time series:"
#> [1] 4552
#> [1] "-------------------------"
#> [1] "Number of Missing Values:"
#> [1] 883
#> [1] "-------------------------"
#> [1] "Percentage of Missing Values:"
#> [1] "19.4%"
#> [1] "-------------------------"
#> [1] "Number of Gaps:"
#> [1] 155
#> [1] "-------------------------"
#> [1] "Average Gap Size:"
#> [1] 5.696774
#> [1] "-------------------------"
#> [1] "Stats for Bins"
#> [1] " Bin 1 (1138 values from 1 to 1138) : 233 NAs (20.5%)"
#> [1] " Bin 2 (1138 values from 1139 to 2276) : 433 NAs (38%)"
#> [1] " Bin 3 (1138 values from 2277 to 3414) : 135 NAs (11.9%)"
#> [1] " Bin 4 (1138 values from 3415 to 4552) : 82 NAs (7.21%)"
#> [1] "-------------------------"
#> [1] "Longest NA gap (series of consecutive NAs)"
#> [1] "157 in a row"
#> [1] "-------------------------"
#> [1] "Most frequent gap size (series of consecutive NA series)"
#> [1] "1 NA in a row (occurring 68 times)"
#> [1] "-------------------------"
#> [1] "Gap size accounting for most NAs"
#> [1] "157 NA in a row (occurring 1 times, making up for overall 157 NAs)"
#> [1] "-------------------------"
#> [1] "Overview NA series"
#> [1] " 1 NA in a row: 68 times"
#> [1] " 2 NA in a row: 26 times"
#> [1] " 3 NA in a row: 16 times"
#> [1] " 4 NA in a row: 10 times"
#> [1] " 5 NA in a row: 8 times"
#> [1] " 6 NA in a row: 4 times"
#> [1] " 7 NA in a row: 2 times"
#> [1] " 8 NA in a row: 3 times"
#> [1] " 9 NA in a row: 2 times"
#> [1] " 10 NA in a row: 1 times"
#> [1] " 11 NA in a row: 1 times"
#> [1] " 12 NA in a row: 2 times"
#> [1] " 14 NA in a row: 1 times"
#> [1] " 16 NA in a row: 1 times"
#> [1] " 17 NA in a row: 1 times"
#> [1] " 21 NA in a row: 1 times"
#> [1] " 25 NA in a row: 1 times"
#> [1] " 26 NA in a row: 1 times"
#> [1] " 27 NA in a row: 1 times"
#> [1] " 32 NA in a row: 1 times"
#> [1] " 42 NA in a row: 2 times"
#> [1] " 91 NA in a row: 1 times"
#> [1] " 157 NA in a row: 1 times"