Print summary stats about the distribution of missing values in a univariate time series.

statsNA(x, bins = 4, print_only = TRUE)

## Arguments

x Numeric Vector (vector) or Time Series (ts) object containing NAs

bins Split number for bin stats. Number of bins the time series is divided into. For each bin, the number and percentage of missing values is printed. The default value is 4, which means stats for the 1st, 2nd, 3rd and 4th quarter of the time series are shown.

print_only Choose whether the function prints or returns. For print_only = TRUE the function has no return value and only prints the missing value stats. If print_only is set to FALSE, nothing is printed and the function returns a list. Printing gives slightly more information, since the returned list does not include "Stats for Bins" and "Overview NA series".
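To make the effect of bins more concrete, the following base R sketch counts missing values per bin for a small made-up vector. It is an illustration only, not the code statsNA uses: the vector x, the helper names bin_length and bin_id, and the rule of cutting the series into consecutive bins of ceiling(length / bins) observations are all assumptions made for this example.

```r
# Illustrative sketch only: per-bin NA counts in base R.
# Assumption: bins are consecutive blocks of ceiling(n / bins) observations.
x <- c(1, NA, 3, NA, NA, 6, 7, NA, NA, NA, 11, NA)  # made-up example data
bins <- 4

bin_length <- ceiling(length(x) / bins)        # observations per bin (assumed rule)
bin_id <- ceiling(seq_along(x) / bin_length)   # bin index of every observation

na_per_bin  <- tapply(is.na(x), bin_id, sum)         # number of NAs in each bin
pct_per_bin <- 100 * tapply(is.na(x), bin_id, mean)  # percentage of NAs in each bin

print(na_per_bin)
print(round(pct_per_bin, 1))
```

With 12 observations and bins = 4, every bin holds three values, so the per-bin counts add up to the total number of NAs in x.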

## Value

A list containing the stats. Note that the function only returns a value if print_only = FALSE.

## Details

Prints the following information about the missing values in the time series:

• "Length of time series" - Number of observations in the time series (including NAs)

• "Number of Missing Values" - Number of missing values in the time series

• "Percentage of Missing Values" - Percentage of missing values in the time series

• "Number of Gaps" - Number of NA gaps (consisting of one or more consecutive NAs) in the time series

• "Average Gap Size" - Average size of consecutive NAs for the NA gaps in the time series

• "Stats for Bins" - Number/percentage of missing values for the split into bins

• "Longest NA gap" - Longest series of consecutive missing values (NAs in a row) in the time series

• "Most frequent gap size" - Most frequently occurring length of consecutive missing values in the time series

• "Gap size accounting for most NAs" - The series of consecutive missing values that accounts for most missing values overall in the time series

• "Overview NA series" - Overview of how often each series of consecutive missing values occurs. Series occurring 0 times are skipped

Note that you can choose whether the function returns a list or only prints the information (see the description of the parameter print_only).
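The gap-related figures in the list above can be reproduced by hand with run-length encoding. The base R sketch below is a hedged illustration, not the implementation behind statsNA: the vector x and all variable names are made up for this example, and it assumes the series contains at least one NA (with no NAs, max() would be called on an empty vector).

```r
# Illustrative sketch only: gap statistics via run-length encoding in base R.
x <- c(1, NA, 3, NA, NA, 6, 7, NA, NA, NA, 11, NA)  # made-up example data

runs <- rle(is.na(x))                   # runs of consecutive NA / non-NA values
gap_sizes <- runs$lengths[runs$values]  # keep the lengths of the NA runs only

n_obs       <- length(x)                # length of the series, NAs included
n_na        <- sum(is.na(x))            # number of missing values
pct_na      <- 100 * n_na / n_obs       # percentage of missing values
n_gaps      <- length(gap_sizes)        # number of gaps
avg_gap     <- mean(gap_sizes)          # average gap size
longest_gap <- max(gap_sizes)           # longest NA gap

gap_table <- table(gap_sizes)           # how often each gap size occurs
sizes <- as.integer(names(gap_table))   # the distinct gap sizes
most_frequent <- sizes[which.max(gap_table)]

# NAs contributed by each gap size = gap size * number of occurrences
na_per_size <- sizes * as.vector(gap_table)
most_weighty <- sizes[which.max(na_per_size)]
```

For this vector the gaps have sizes 1, 2, 3 and 1, so there are 4 gaps with an average size of 1.75. The most frequent gap size is 1, while the gap size accounting for most NAs is 3. The average gap size always equals the number of missing values divided by the number of gaps, which matches Example 1 below (883 / 155 = 5.696774).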

## See also

ggplot_na_distribution, ggplot_na_distribution2, ggplot_na_gapsize

## Author

Steffen Moritz

## Examples

# Example 1: Print stats about the missing data in tsNH4
statsNA(tsNH4)
#> [1] "Length of time series:"
#> [1] 4552
#> [1] "-------------------------"
#> [1] "Number of Missing Values:"
#> [1] 883
#> [1] "-------------------------"
#> [1] "Percentage of Missing Values:"
#> [1] "19.4%"
#> [1] "-------------------------"
#> [1] "Number of Gaps:"
#> [1] 155
#> [1] "-------------------------"
#> [1] "Average Gap Size:"
#> [1] 5.696774
#> [1] "-------------------------"
#> [1] "Stats for Bins"
#> [1] "  Bin 1 (1138 values from 1 to 1138) :      233 NAs (20.5%)"
#> [1] "  Bin 2 (1138 values from 1139 to 2276) :      433 NAs (38%)"
#> [1] "  Bin 3 (1138 values from 2277 to 3414) :      135 NAs (11.9%)"
#> [1] "  Bin 4 (1138 values from 3415 to 4552) :      82 NAs (7.21%)"
#> [1] "-------------------------"
#> [1] "Longest NA gap (series of consecutive NAs)"
#> [1] "157 in a row"
#> [1] "-------------------------"
#> [1] "Most frequent gap size (series of consecutive NA series)"
#> [1] "1 NA in a row (occurring 68 times)"
#> [1] "-------------------------"
#> [1] "Gap size accounting for most NAs"
#> [1] "157 NA in a row (occurring 1 times, making up for overall 157 NAs)"
#> [1] "-------------------------"
#> [1] "Overview NA series"
#> [1] "  1 NA in a row: 68 times"
#> [1] "  2 NA in a row: 26 times"
#> [1] "  3 NA in a row: 16 times"
#> [1] "  4 NA in a row: 10 times"
#> [1] "  5 NA in a row: 8 times"
#> [1] "  6 NA in a row: 4 times"
#> [1] "  7 NA in a row: 2 times"
#> [1] "  8 NA in a row: 3 times"
#> [1] "  9 NA in a row: 2 times"
#> [1] "  10 NA in a row: 1 times"
#> [1] "  11 NA in a row: 1 times"
#> [1] "  12 NA in a row: 2 times"
#> [1] "  14 NA in a row: 1 times"
#> [1] "  16 NA in a row: 1 times"
#> [1] "  17 NA in a row: 1 times"
#> [1] "  21 NA in a row: 1 times"
#> [1] "  25 NA in a row: 1 times"
#> [1] "  26 NA in a row: 1 times"
#> [1] "  27 NA in a row: 1 times"
#> [1] "  32 NA in a row: 1 times"
#> [1] "  42 NA in a row: 2 times"
#> [1] "  91 NA in a row: 1 times"
#> [1] "  157 NA in a row: 1 times"
# Example 2: Return list with stats about the missing data in tsAirgap
statsNA(tsAirgap, print_only = FALSE)
#> $length_series
#> [1] 144
#>
#> $number_NAs
#> [1] 13
#>
#> $number_na_gaps
#> [1] 11
#>
#> $average_size_na_gaps
#> [1] 1.181818
#>
#> $percentage_NAs
#> [1] "9.03%"
#>
#> $longest_na_gap
#> [1] 3
#>
#> $most_frequent_na_gap
#> [1] 1
#>
#> $most_weighty_na_gap
#> [1] 1
#>
#> $df_distribution_na_gaps
#>   [1] 10  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#> [101]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#> [126]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
#>
# Example 3: Same as example 1, just written with pipe operator
tsNH4 %>% statsNA()
#> [1] "Length of time series:"
#> [1] 4552
#> [1] "-------------------------"
#> [1] "Number of Missing Values:"
#> [1] 883
#> [1] "-------------------------"
#> [1] "Percentage of Missing Values:"
#> [1] "19.4%"
#> [1] "-------------------------"
#> [1] "Number of Gaps:"
#> [1] 155
#> [1] "-------------------------"
#> [1] "Average Gap Size:"
#> [1] 5.696774
#> [1] "-------------------------"
#> [1] "Stats for Bins"
#> [1] "  Bin 1 (1138 values from 1 to 1138) :      233 NAs (20.5%)"
#> [1] "  Bin 2 (1138 values from 1139 to 2276) :      433 NAs (38%)"
#> [1] "  Bin 3 (1138 values from 2277 to 3414) :      135 NAs (11.9%)"
#> [1] "  Bin 4 (1138 values from 3415 to 4552) :      82 NAs (7.21%)"
#> [1] "-------------------------"
#> [1] "Longest NA gap (series of consecutive NAs)"
#> [1] "157 in a row"
#> [1] "-------------------------"
#> [1] "Most frequent gap size (series of consecutive NA series)"
#> [1] "1 NA in a row (occurring 68 times)"
#> [1] "-------------------------"
#> [1] "Gap size accounting for most NAs"
#> [1] "157 NA in a row (occurring 1 times, making up for overall 157 NAs)"
#> [1] "-------------------------"
#> [1] "Overview NA series"
#> [1] "  1 NA in a row: 68 times"
#> [1] "  2 NA in a row: 26 times"
#> [1] "  3 NA in a row: 16 times"
#> [1] "  4 NA in a row: 10 times"
#> [1] "  5 NA in a row: 8 times"
#> [1] "  6 NA in a row: 4 times"
#> [1] "  7 NA in a row: 2 times"
#> [1] "  8 NA in a row: 3 times"
#> [1] "  9 NA in a row: 2 times"
#> [1] "  10 NA in a row: 1 times"
#> [1] "  11 NA in a row: 1 times"
#> [1] "  12 NA in a row: 2 times"
#> [1] "  14 NA in a row: 1 times"
#> [1] "  16 NA in a row: 1 times"
#> [1] "  17 NA in a row: 1 times"
#> [1] "  21 NA in a row: 1 times"
#> [1] "  25 NA in a row: 1 times"
#> [1] "  26 NA in a row: 1 times"
#> [1] "  27 NA in a row: 1 times"
#> [1] "  32 NA in a row: 1 times"
#> [1] "  42 NA in a row: 2 times"
#> [1] "  91 NA in a row: 1 times"
#> [1] "  157 NA in a row: 1 times"