ROV BIZSTATS QUICK REFERENCE GUIDE: ANALYTICS SUMMARY

The material below comprises excerpts from books by Dr. Johnathan Mun, our CEO and founder, such as Readings in Certified Quantitative Risk Management, 3rd Edition, and Quantitative Research Methods Using Risk Simulator and ROV BizStats Software Applying Econometrics, Multivariate Regression, Parametric and Nonparametric Hypothesis Testing, Monte Carlo Risk Simulation, Predictive Modeling, and Optimization, 4th Edition (https://www.amazon.com/author/johnathanmun). All screenshots and analytical models are run using the ROV Risk Simulator and ROV BizStats software applications. Statistical results shown are computed using Risk Simulator or BizStats. Online Training Videos are also available on these topics as well as the Certified in Quantitative Risk Management (CQRM) certification program. All materials are copyrighted as well as patent protected under international law, with all rights reserved.

The following is a quick reference guide to all the analytics and methods in ROV’s BizStats software. It begins with an alphabetical presentation of each model that includes a description of what the method or model does, a short tip that is also visible in the BizStats software, the required data inputs, and examples of data inputs. Note that additional examples of data types and how data variables should be arranged are provided in the data type sections earlier in Chapter 8. In that chapter, the methods and models are also arranged by category (e.g., multivariate methods versus single-variable methods, or stochastic models versus reliability and consistency methods).

  • ANCOVA (Single Factor Multiple Treatments). Performs ANCOVA or Analysis of Covariance with multiple repeated treatments (Group 1) that removes the Group 2 covariate effects. The net effects after covariates have been accounted for will be tested against the null hypothesis that the various treatments in Group 1 are identical to each other after accounting for the effects of covariates in Group 2.
    • Short Tip: Analysis of Covariance with multiple repeated treatments (Group 1) that removes the Group 2 covariate effects (H0: the various treatments are identical).
    • Model Input: Data Type D. Two groups of variables are required. Both groups are required to have the same number of variables. Group 1 has the main variables to test where each variable is a type of treatment like ANOVA. Group 2 has the covariates whose effects the analysis will integrate into the model.
      • Group 1 Main Variables, Group 2 Covariate Variables:
        • >VAR1; VAR2; VAR3; VAR4; …
        • >VAR5; VAR6; VAR7; VAR8; …
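The covariate-removal idea can be sketched outside BizStats as a simplified approximation (not the software's exact algorithm): regress each treatment's values on its covariate, then run a one-way ANOVA on the residuals. All data and variable names below are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical data: 3 treatments (Group 1) and their matched covariates (Group 2)
covs = [rng.normal(10, 2, 30) for _ in range(3)]
treats = [2.0 * c + rng.normal(0, 1, 30) for c in covs]  # no true treatment effect simulated

def residuals(y, x):
    # Remove the covariate's linear effect via simple OLS
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# ANOVA on the covariate-adjusted residuals (H0: treatments identical after covariates)
resid = [residuals(y, x) for y, x in zip(treats, covs)]
f_stat, p_value = stats.f_oneway(*resid)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

Because no treatment effect was simulated here, the p-value should generally be large once the covariate effects are removed.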
  • ANOVA (MANOVA General Linear Model). Runs the Multiple ANOVA (MANOVA) with multiple numerical dependent variables against one alphanumeric categorical independent variable. Extends the ANOVA single factor multiple treatments to include multiple simultaneous dependent variables. The null hypothesis tested is that there is zero mean difference among all the variables. The computed statistics include the standard F statistics as well as Pillai’s Trace, Wilks’ Lambda, and Hotelling’s Trace, which modify the degrees of freedom and sums of squares to adjust for the simultaneous tests of multiple dependent variables.
    • Short Tip: MANOVA with multiple numerical dependent variables and one alphanumeric categorical independent variable (H0: no difference among all the variables).
    • Model Input: Data Type C. Three or more input variables are required. Different variables are arranged in columns and all variables must have at least 6 data points each, with the same number of total data points or rows per variable. Must also have one variable for Categories, which can be alphanumeric.
      • Categories, Variables:
        • >VAR10
        • >VAR1;  VAR2;  VAR3; …
  • ANOVA (MANOVA 2-Factor Replication General Linear Model). Runs the Multiple ANOVA (MANOVA) with multiple numerical dependent variables against two alphanumeric categorical independent variables. Extends the Two-Way ANOVA to include multiple simultaneous dependent variables. The null hypothesis tested is that there is zero mean difference among all the variables. The computed statistics include the standard F statistics as well as Pillai’s Trace, Wilks’ Lambda, and Hotelling’s Trace, which modify the degrees of freedom and sums of squares to adjust for the simultaneous tests of multiple dependent variables.
    • Short Tip: MANOVA with multiple numerical dependent variables and two alphanumeric categorical independent variables (H0: no difference among all the dependent variables compared against the independent variables and their interactions).
    • Model Input: Data Type C. Four or more input variables are required. Different variables are arranged in columns and all variables must have at least 6 data points each, with the same number of total data points or rows per variable. Must also have two variables for Categories, which can be alphanumeric.
      • Categories, Variables:
        • >VAR10;  VAR11
        • >VAR1;  VAR2;  VAR3; …
  • ANOVA (Randomized Blocks Multiple Treatments). The sampling distribution is assumed to be approximately normal, and there exists a block variable for which ANOVA will control (i.e., it blocks the effects of this variable by controlling for it in the experiment). This analysis can test for the effects of the one dependent variable divided into different treatment groups as well as the effectiveness of the different levels of the one control or block variable. If the calculated p-value for the treatment or block is less than or equal to the significance level used in the test, then reject the null hypothesis and conclude that there is a significant difference among the different treatments or blocks.
    • Short Tip: ANOVA with blocking variables (H0: no difference among all the treatment variables and no effects of blocking variables).
    • Model Input: Data Type C. Three or more input variables are required. Different treatment variables are arranged in columns and blocking variables are arranged in rows, and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
  • ANOVA (Single Factor Multiple Treatments). An extension of the two-variable t-test, looking at one numerical dependent variable against one categorical independent variable that is separated into multiple treatment groups, where the sampling distribution is assumed to be approximately normal. A two-tailed hypothesis test evaluates the null hypothesis that the population means of each treatment are statistically identical to the rest of the group, indicating that there is no effect among the different treatment groups.
    • Short Tip: Runs ANOVA with multiple treatments (H0: no difference among all the treatment groups).
    • Model Input: Data Type C. Three or more input variables are required. Different treatment variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
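The underlying F-test can be reproduced with generic tools. The sketch below (hypothetical data) computes the one-way ANOVA F statistic by hand and cross-checks it against SciPy:

```python
import numpy as np
from scipy import stats

# Hypothetical treatment groups arranged in columns
g1 = np.array([5.1, 4.8, 5.5, 5.0, 4.9])
g2 = np.array([6.2, 6.0, 5.8, 6.4, 6.1])
g3 = np.array([5.6, 5.4, 5.9, 5.7, 5.5])
groups = [g1, g2, g3]

grand = np.concatenate(groups).mean()
# Between-group (treatment) and within-group (error) sums of squares
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)

f_scipy, p = stats.f_oneway(g1, g2, g3)
print(f_manual, f_scipy, p)  # the two F statistics agree
```

A small p-value leads to rejecting H0 and concluding the treatment group means differ.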
  • ANOVA (Single Factor Repeated Measures). A modification of the ANOVA single-factor model looking at one numerical dependent variable that is tested repeatedly. These repeated measures are separated into multiple columns or test groups. A two-tailed hypothesis test evaluates the null hypothesis that the population means of each treatment are statistically identical to the rest of the group, indicating that there is no effect among the different repeated measurement groups.
    • Short Tip: Runs ANOVA with repeated measures (H0: no difference among all the repeated tests).
    • Model Input: Data Type C. Three or more input variables are required. Different repeated test values are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
  • ANOVA (Two-Way Analysis). An extension of the Single Factor and Randomized Block ANOVAs that simultaneously examines the effects of one numerical dependent variable against two categorical independent variables (two factors along with the effects of interactions between the different levels of these two factors). Unlike the randomized block design, this model examines the interactions between different levels of the factors or independent variables. In a two-factor experiment, interaction exists when the effect of a level for one factor depends on which level of the other factor is present. There are three sets of null and alternate hypotheses to be tested.
    • Short Tip: Runs Two-Way ANOVA with multiple treatments with one numerical dependent variable and two categorical independent variables (H0: no difference among all the treatment variables for each row factor, column factor, and interactions between factors).
    • Model Input: Data Type C. Three or more input variables are required. Different column factor variables are arranged in columns and second replicated row factors are arranged as rows where all variables must have at least 4 data points each, with the same number of total data points or rows per variable. The total number of rows must be divisible by the number of row replications. For example, row factors can be arranged as A1, A1, A2, A2, A3, A3, A4, A4 for 8 rows with 4 factors, implying a replication of 2.
      • Variables, Replication:
        • >VAR1;  VAR2;  VAR3; …
        • >2
  • ARIMA. Autoregressive Integrated Moving Average is used for forecasting time-series data using its own historical data by itself or with exogenous or independent variables. The first segment is the autoregressive (AR) term corresponding to the number of lagged values of the residual in the unconditional forecast model. The model captures the historical variation of actual data to a forecasting model and uses this variation or residual to create a better predicting model. The second segment is the integration order (I) term corresponding to the number of times the time series is differenced to make the data stationary. This element accounts for any nonlinear growth rates existing in the data. The third segment is the moving average (MA) term, which is essentially the moving average of lagged forecast errors. By incorporating this lagged forecast errors term, the model learns from its forecast errors or mistakes and corrects them through a moving average calculation. The ARIMA model follows the Box–Jenkins methodology with each term representing steps taken in the model construction until random noise remains.
    • Short Tip: Runs the Autoregressive Integrated Moving Average ARIMA(p,d,q) model using historical time series and optionally with other exogenous variables.
    • Model Input: Data Types A and C. One input variable is required, although additional exogenous variables can be added as required.
      • Historical Time-Series Variable, AR(p), I(d), MA(q), Iterations (Optional, default set at 100), Forecast Periods (Optional, default set at 5), Backcasting (Optional, default set at 0), Use Exogenous Variables (Optional, default set at 0), Exogenous Variables (Optional, default set at 0):
        • >VAR1
        • >1
        • >0
        • >1
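The autoregressive component alone can be illustrated with a minimal sketch (hypothetical simulated data; full ARIMA estimation is more involved than this): fit the AR(1) term by regressing the series on its own one-period lag.

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulate a hypothetical AR(1) series: y_t = 0.7 * y_{t-1} + noise
n, phi = 300, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal(0, 1)

# Estimate the AR coefficient by OLS of y_t on y_{t-1}
phi_hat, intercept = np.polyfit(y[:-1], y[1:], 1)
print(f"estimated AR(1) coefficient: {phi_hat:.3f}")  # close to the true 0.7
```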
  • Auto ARIMA. Runs some common combinations of ARIMA models (low-order p, d, q) and returns the best models.
    • Short Tip: Runs multiple ARIMA(p,d,q) models with low-order p, d, q values, then ranks and returns the best models.
    • Model Input: Data Types A and C. One input variable is required, although additional exogenous variables can be added as required. ARIMA models typically require large amounts of data (e.g., 30–50 data points).
      • Historical Time-Series Variable, Iterations (Optional, default set at 100), Forecast Periods (Optional, default set at 5), Backcasting (Optional, default set at 0), Use Exogenous Variables (Optional, default set at 0), Exogenous Variables (Optional, default set at 0):
        • >VAR1
  • Auto Econometrics (Detailed and Quick). Runs some common combinations of Basic Econometrics models and returns the best models using different algorithms.
    • Short Tip: Runs Auto Econometrics by testing multiple combinations of models that provide the best fit for your data, including linear, nonlinear, logarithmic, and interaction models.
    • Model Input: Data Type C. One dependent variable and one or multiple independent variables are required.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2; VAR3; …
  • Autocorrelation and Partial Autocorrelation. One very simple approach to test for autocorrelation is to graph the time series of a regression equation’s residuals. If these residuals exhibit some cyclicality, then autocorrelation exists. Another more robust approach to detect autocorrelation is the use of the Durbin–Watson statistic, which estimates the potential for a first-order autocorrelation. The Durbin–Watson test employed also identifies model misspecification, that is, if a time-series variable is correlated to itself one period prior. Many time-series data tend to be autocorrelated to their historical values. Autocorrelation is applicable only to time-series data. This relationship can exist for multiple reasons, including the variables’ spatial relationships (similar time and space), prolonged economic shocks and events, psychological inertia, smoothing, seasonal adjustments of the data, and so forth.
    • Short Tip: Runs Autocorrelation and Partial Autocorrelation on your time-series data up to 20 time-lag periods depending on data availability.
    • Model Input: Data Type A. One input variable is required with at least 5 data points or rows of data.
      • Variable:
        • >VAR1
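The lag-k sample autocorrelation itself is a short computation. This sketch (hypothetical data) implements it directly and contrasts a strongly autocorrelated series with white noise:

```python
import numpy as np

def acf(x, max_lag=20):
    # Sample autocorrelation: lag-k autocovariance over the lag-0 variance
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:-k]) / denom if k else 1.0
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
trend = np.cumsum(rng.normal(0, 1, 200))  # a random walk is strongly autocorrelated
noise = rng.normal(0, 1, 200)             # white noise is not
print(acf(trend, 3))
print(acf(noise, 3))
```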
  • Autocorrelation Durbin–Watson AR(1) Test. Runs the Durbin–Watson test for autocorrelation of one lag or AR(1) process.
    • Short Tip: Runs the Durbin–Watson test for autocorrelation of one lag or AR(1) process.
    • Model Input: Data Type C. One dependent variable and one or multiple independent variables are required.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3; …
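The Durbin–Watson statistic on regression residuals is straightforward to compute by hand. A minimal sketch with hypothetical data and independent errors (DW near 2 indicates no AR(1) autocorrelation; values toward 0 indicate positive, toward 4 negative, autocorrelation):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)  # hypothetical data with independent errors

# OLS fit and residuals
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

# Durbin–Watson statistic: sum of squared successive residual differences
# over the sum of squared residuals
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(f"DW = {dw:.3f}")  # close to 2 for independent residuals
```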
  • Bonferroni Test (Single Variable with Repetition). The Bonferroni Test is an adjustment made to p-values when multiple dependent or independent statistical T-tests are being performed simultaneously on a single dataset. Simultaneous confidence intervals are computed and compared against multiple individual tests. This single variable with repetition corrections test is applied to one group of multiple variables at once.
    • Short Tip: Corrects for p-values on multiple independent tests and runs simultaneous confidence intervals (H0: the individual expected means are equal to the goals).
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have the same number of data points or rows. The total number of goals needs to match the number of variables. Alpha is 0.05 by default and can optionally be changed by the user.
      • Variables, Goals Tested, Alpha Level (Optional, default is 0.05):
        • >VAR1;  VAR2; VAR3; …
        • >7;  8;  5; …
        • >0.05
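The correction itself can be sketched with generic tools (hypothetical data and goals; BizStats additionally reports simultaneous confidence intervals): run one t-test per variable against its goal, then multiply each p-value by the number of simultaneous tests, capped at 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical variables and their tested goal means
data = {"VAR1": rng.normal(7, 1, 40),
        "VAR2": rng.normal(8, 1, 40),
        "VAR3": rng.normal(6, 1, 40)}
goals = {"VAR1": 7, "VAR2": 8, "VAR3": 5}  # VAR3's goal is deliberately off

m = len(data)  # number of simultaneous tests
for name, values in data.items():
    t, p = stats.ttest_1samp(values, goals[name])
    p_bonf = min(1.0, p * m)  # Bonferroni correction
    print(f"{name}: raw p = {p:.4f}, Bonferroni p = {p_bonf:.4f}")
```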
  • Bonferroni Test (Two Variables with Repetition). The Bonferroni Test is an adjustment made to p-values when multiple dependent or independent statistical T-tests are being performed simultaneously on a single dataset. Simultaneous confidence intervals are computed and compared against multiple individual tests. This two-variable with repetition corrections test is applied on two groups of multiple variables each. The null hypothesis tested is that the individual expected differences are all equal to zero.
    • Short Tip: Corrects for p-values on multiple independent tests and runs simultaneous confidence intervals (H0: the individual expected differences are equal to zero).
    • Model Input: Data Type D. Two groups of variables are required. In each group, two or more input variables are required with the same number of data points or rows.
      • Group 1’s Variables, Group 2’s Variables, Alpha Level (Optional, default is 0.05):
        • >VAR1;  VAR2;  VAR3; …
        • >VAR4;  VAR5;  VAR6; …
        • >0.05
  • Box–Cox Normal Transformation. Takes your existing dataset and transforms it into normally distributed data. The original dataset is tested using the Shapiro–Wilk test for normality (H0: data is assumed to be normal), then transformed using the Box–Cox method either using your custom Lambda parameter or internally optimized Lambda. The transformed data is tested again for normality using Shapiro–Wilk.
    • Short Tip: Transforms your existing data into normally distributed data that is tested using Shapiro–Wilk (H0: data is assumed to be normal) and visualized in a QQ Chart.
    • Model Input: Data Type A. One input variable is required with at least 5 rows of data. Optionally enter a nonzero Lambda value (positive or negative values only, but no zeros are allowed).
      • Variable, Lambda (Optional, default computed internally but can be overridden with any nonzero value):
        • >VAR1
        • >0.2
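The same transform-and-retest workflow can be reproduced with SciPy (hypothetical skewed data; `boxcox` optimizes Lambda internally when none is supplied, mirroring the default behavior described above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
raw = rng.lognormal(mean=0.0, sigma=0.8, size=200)  # skewed, clearly non-normal

# Shapiro–Wilk before transformation (H0: data is normal)
_, p_before = stats.shapiro(raw)

# Box–Cox with internally optimized Lambda (data must be strictly positive)
transformed, lam = stats.boxcox(raw)
_, p_after = stats.shapiro(transformed)

print(f"lambda = {lam:.3f}, Shapiro p before = {p_before:.2e}, after = {p_after:.3f}")
```

For lognormal data the optimized Lambda lands near zero, i.e., close to a log transform.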
  • Box’s Test for Homogeneity of Covariance. Runs the Box Test of Covariance Homogeneity on two groups of variables’ covariance matrices. The null hypothesis tested is that there is zero difference between the two covariance matrices.
    • Short Tip: Tests if two covariance matrices are homogeneous (H0: no difference between the two covariance matrices).
    • Model Input: Data Type D. Two groups of variables are required. In each group, two or more input variables are required with at least 5 data points each and the same total number of data points or rows.
      • Group 1’s Variables, Group 2’s Variables:
        • >VAR1;  VAR2;  VAR3; …
        • >VAR4;  VAR5;  VAR6; …
  • Charts. Generates various 2D and 3D charts (area, bar, line, point, and scatter) as well as QQ charts, Box-Whisker charts, and Pareto charts. Most of these charts take Data Types A and C (this just means either one or multiple series will be charted), with the exception of Pareto Charts, which require only Data Type C.
    • 2D and 3D Area, Bar, Line, Point, Scatter.
      • Short Tip: Generates the selected 2D or 3D chart with one, two, three, or multiple variables.
      • Model Input: Data Types A and C. One or more input variables are required with at least 3 rows of data. Optionally add other variables to chart.
          • Variables:
          • >VAR1;  VAR2;  VAR3; …
  • Box-Whisker Charts. Box plots or box-and-whisker plots graphically depict numerical data using their descriptive statistics: the smallest observation (Minimum), First Quartile or 25th Percentile (Q1), Median or Second Quartile or 50th Percentile (Q2), Third Quartile (Q3), and largest observation (Maximum). A box plot may also indicate which observations, if any, might be considered outliers.
    • Short Tip: Generates a Box-Whisker chart with one, two, three, or multiple variables.
    • Model Input: Data Types A and C. At least one input variable is required with at least 3 rows of data. Optionally add other variables to chart.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
  • 2D and 3D Pareto Charts. A Pareto chart contains both a bar chart and a line graph. Individual values are represented in descending order by the bars and the cumulative total is represented by the ascending line. Also known as the “80-20” chart: by focusing on the top few variables, you already account for more than 80% of the cumulative effects of the total.
    • Short Tip: Generates a 2D and 3D Pareto chart with two, three, or multiple variables. Each variable has one data point only.
    • Model Input: Data Type C. At least two or more input variables are required with exactly 1 row of data for each variable. Optionally add other variables to chart.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
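The descending bars and ascending cumulative line of a Pareto chart reduce to a sort and a cumulative sum. A minimal sketch with hypothetical defect categories:

```python
import numpy as np

# Hypothetical defect counts per category, one data point per variable
categories = ["Scratch", "Dent", "Misalign", "Stain", "Crack"]
counts = np.array([120, 80, 30, 15, 5])

order = np.argsort(counts)[::-1]          # bars in descending order
sorted_counts = counts[order]
cum_pct = np.cumsum(sorted_counts) / sorted_counts.sum() * 100  # ascending line

for cat, c, pct in zip(np.array(categories)[order], sorted_counts, cum_pct):
    print(f"{cat:9s} {c:4d}  cumulative {pct:5.1f}%")
```

Here the top two categories alone already account for 80% of the total, the "80-20" pattern the chart is named for.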
  • Q-Q Normal Chart. This Quantile-Quantile chart is a normal probability plot, which is a graphical method for comparing a probability distribution with the normal distribution by plotting their quantiles against each other.
    • Short Tip: Generates QQ Normal chart where the CDF distribution is mapped against the user’s raw data to see its fit.
    • Model Input: Data Type A. Only one input variable is required.
      • Variable:
        • >VAR1
  • Coefficient of Variation Homogeneity Test. Returns the coefficient of variation (CV) calculations for each of the input variables (standard deviation divided by the mean), as a unitless and relative measure of risk and uncertainty. Then, a pooled Chi-Square test is applied to test the null hypothesis that these CV values are homogeneous and statistically similar, and a Shapiro–Wilk test is also applied to test the normality of the variables’ dataset.
    • Short Tip: Tests if the coefficient of variations from different variables are similar (H0: all CVs are equal or homogeneous).
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each. A different number of total data points or rows per variable is allowed.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
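The CV computation itself (sample standard deviation divided by the mean) is easy to reproduce; the pooled Chi-Square and Shapiro–Wilk tests BizStats layers on top are omitted in this sketch with hypothetical data:

```python
import numpy as np

# Hypothetical variables with different scales but similar relative spread
sales = np.array([100, 110, 95, 105, 102, 98])
costs = np.array([10.2, 11.1, 9.4, 10.6, 10.1, 9.8])

def cv(x):
    # Coefficient of variation: a unitless, relative measure of risk
    return np.std(x, ddof=1) / np.mean(x)

print(f"CV(sales) = {cv(sales):.4f}, CV(costs) = {cv(costs):.4f}")
```

Because CV is unitless, the two variables are directly comparable even though their scales differ by an order of magnitude.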
  • Cointegration Test (Engle–Granger). Runs the Engle–Granger test for any cointegration of two nonstationary time-series variables. If there are two time-series variables that are nonstationary to order one, I(1), and if a linear combination of these two series is stationary at I(0), then these two variables are, by definition, cointegrated. Many macroeconomic data are I(1), and conventional forecasting and modeling methods do not apply due to the nonstandard properties of unit root processes. The Cointegration test can be applied to identify the presence of cointegration, and if confirmed to exist, a subsequent Error Correction Model can then be used to forecast the time-series variables.
    • Short Tip: Runs the Engle–Granger test for any cointegration of two nonstationary time-series variables.
    • Model Input: Data Type B. Exactly two input variables are required. Variables are arranged in columns and both variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2
  • Combinatorial Fuzzy Logic. Applies fuzzy logic algorithms for forecasting time-series data by combining forecast methods to create an optimized model. Fuzzy logic is a probabilistic logic dealing with reasoning that is approximate rather than fixed and exact, where fuzzy logic variables may have a truth value that ranges in degree between 0 and 1.
    • Short Tip: Computes time-series forecasts using fuzzy logic combining and optimizing multiple forecast methods into one unified forecast.
    • Model Input: Data Type A. Only one input variable is required.
      • Variable:
        • >VAR1
  • Control Charts: C, NP, P, R, U, X, XMR. Sometimes specification limits of a process are not set; instead, statistical control limits are computed based on the actual data collected (e.g., the number of defects in a manufacturing line). The upper control limit (UCL) and lower control limit (LCL) are computed, as are the central line (CL) and other sigma levels. The resulting chart is called a control chart, and if the process is out of control, the actual defect line will be outside of the UCL and LCL lines for a certain number of times.
    • C Chart. The variable is an attribute (e.g., defective or non-defective), the data collected are the total number of defects (actual count in units), there are multiple measurements in a sample experiment, multiple experiments are run with a constant number of samples collected in each, and the average number of defects of the collected data is of interest.
      • Short Tip: Control C Chart depicting and measuring upper and lower control levels on the number of defects.
      • Model Input: Data Type A. Only one input variable is required.
        • Defective Units:
          • >VAR1
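The C chart's control limits follow from modeling defect counts as Poisson (variance equals mean). A minimal sketch with hypothetical counts:

```python
import numpy as np

# Hypothetical defect counts from repeated experiments with constant sample size
defects = np.array([4, 6, 3, 5, 7, 2, 4, 5, 6, 3])

# C chart limits: CL is the mean count; Poisson variance = mean, so sigma = sqrt(CL)
cl = defects.mean()
ucl = cl + 3 * np.sqrt(cl)
lcl = max(0.0, cl - 3 * np.sqrt(cl))  # counts cannot go below zero

out_of_control = defects[(defects > ucl) | (defects < lcl)]
print(f"CL = {cl:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}, flagged: {out_of_control}")
```

Points falling outside the UCL/LCL band would flag the process as out of control.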
  • NP Chart. The variable is an attribute (e.g., defective or non-defective), the data collected are proportions of defects (or the number of defects in a specific sample), there are multiple measurements in a sample experiment, multiple experiments are run with a constant number of samples collected in each, and the average proportion of defects of the collected data is of interest.
    • Short Tip: Control NP Chart depicting and measuring upper and lower control levels on the proportions of defects.
    • Model Input: Data Type A. Only one input variable is required, and a second manual numerical input of the sample size.
      • Defective Units, Sample Size:
        • >VAR1
        • >20
  • P Chart. The variable is an attribute (e.g., defective or non-defective), the data collected are proportions of defects (or the number of defects in a specific sample), there are multiple measurements in a sample experiment, multiple experiments are run with a different number of samples in each, and the average proportion of defects of the collected data is of interest.
    • Short Tip: Control P Chart depicting and measuring upper and lower control levels using defective units compared to sample size.
    • Model Input: Data Type B. Only one input variable is required, and a second manual numerical input of the sample size.
      • Defective Units, Sample Size:
        • >VAR1
        • >VAR2
  • R Chart. The variable has raw data values, there are multiple measurements in a sample experiment, multiple experiments are run, and the range of the collected data is of interest.
    • Short Tip: Control R Chart depicting and measuring upper and lower control levels using repeated defective unit measurements.
    • Model Input: Data Type C. Multiple variables of measurements of defective units required.
      • Defective Units Measurement Variables:
        • >VAR1; VAR2; VAR3; …
  • U Chart. The variable is an attribute (e.g., defective or non-defective), the data collected are the total number of defects (actual count in units), there are multiple measurements in a sample experiment, multiple experiments are run with a different number of samples collected in each, and the average number of defects of the collected data is of interest.
    • Short Tip: Control U Chart depicting and measuring upper and lower control levels on the total units of defects.
    • Model Input: Data Type A. Only one input variable is required, and a second manual numerical input of the sample size.
      • Defective Units, Sample Size:
        • >VAR1
        • >20
  • X Chart. The variable has raw data values, there are multiple measurements in a sample experiment, multiple experiments are run, and the average of the collected data is of interest.
    • Short Tip: Control X Chart depicting and measuring upper and lower control levels using multiple repeated defective unit measurements.
    • Model Input: Data Type C. Multiple variables of measurements of defective units required.
      • Defective Units Measurement Variables:
        • >VAR1; VAR2; VAR3; …
  • XMR Chart. The variable has raw data values, there is a single measurement taken in each sample experiment, multiple experiments are run, and the actual value of the collected data is of interest.
    • Short Tip: Control XMR Chart depicting and measuring upper and lower control levels using individual measurements and moving ranges.
    • Model Input: Data Type A. Only one input variable is required.
      • Defective Units:
        • >VAR1
  • Correlation Matrix (Linear and Nonlinear). Computes the Pearson’s linear product-moment correlations (commonly referred to as the Pearson’s R) as well as the nonlinear Spearman rank-based correlation between variable pairs and returns them as a correlation matrix. The correlation coefficient ranges between –1.0 and +1.0, inclusive. The sign indicates the direction of association between the variables, while the coefficient indicates the magnitude or strength of association.
    • Short Tip: Runs the linear Pearson and nonlinear nonparametric Spearman correlations as well as significance p-values (H0: each correlation is equal to zero).
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of data points for all variables.
      • Variables:
        • >VAR1; VAR2; VAR3; …
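The distinction between the two coefficients can be seen with a short sketch (hypothetical data): Pearson measures linear association, while Spearman, computed on ranks, captures any monotonic relationship, including nonlinear ones.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(0, 1, 50)
y = 2 * x + rng.normal(0, 0.5, 50)   # linearly related to x
z = np.exp(x)                         # monotonic but nonlinear in x

# Pearson captures linear association; Spearman (rank-based) captures monotonic
r_lin, p_lin = stats.pearsonr(x, y)
rho, p_rho = stats.spearmanr(x, z)
print(f"Pearson(x, y) = {r_lin:.3f} (p = {p_lin:.2e}), Spearman(x, z) = {rho:.3f}")
```

Note that Spearman(x, z) equals 1 exactly because the exponential transform preserves ranks.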
  • Covariance Matrix. Runs the variance-covariance matrix for a sample and population as well as Pearson’s linear correlation matrix. For additional details on correlations, run the Correlation Matrix method instead for Pearson’s linear, nonparametric Spearman rank nonlinear, and significant p-values on correlations.
    • Short Tip: Generates variance-covariance and correlation matrices.
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have the same number of data points or rows.
      • Variables:
        • >VAR1; VAR2; VAR3; …
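The sample versus population distinction is just the denominator (n − 1 versus n). A minimal sketch with hypothetical data:

```python
import numpy as np

a = np.array([2.0, 4.0, 6.0, 8.0])
b = np.array([1.0, 3.0, 2.0, 5.0])

sample_cov = np.cov(a, b, ddof=1)      # sample covariance (n - 1 denominator)
population_cov = np.cov(a, b, ddof=0)  # population covariance (n denominator)
corr = np.corrcoef(a, b)               # Pearson linear correlation matrix

print(sample_cov, population_cov, corr, sep="\n")
```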
  • Cox Regression. Runs the Cox’s proportional hazards model for survival time and tests the effect of several variables upon the time a specified event takes to happen.
    • Short Tip: Runs a Cox Regression proportional hazards model.
    • Model Input: Data Type C. Multiple input variables are required. Different variables are arranged in columns and all variables must have the same number of data points or rows.
      • Survived, Dead, Independent Variables:
        • >VAR1
        • >VAR2
        • >VAR3; VAR4; VAR5; …
  • Cubic Spline. Interpolates missing values of a time-series dataset and extrapolates values of future forecast periods using nonlinear cubic spline curves. Spline curves can also be used to forecast or extrapolate values of future time periods beyond the time period of available data, and the data can be linear or nonlinear.
    • Short Tip: Interpolates and extrapolates a data series with missing values.
    • Model Input: Data Type B. Two input variables are required. Different variables are arranged in columns and all variables must have at least 5 data points each, with the same number of total data points or rows per variable.
      • Known X Values, Known Y Values, Starting Period, Ending Period, Step Size:
        • >VAR1
        • >VAR2
        • >3
        • >8
        • >0.5
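The interpolate-then-extrapolate pattern can be reproduced with SciPy's `CubicSpline` (hypothetical known points, e.g., rates at known maturities; the starting period, ending period, and step size mirror the inputs above):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical known points with gaps (x = 3 and x = 6 are missing)
known_x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 8.0])
known_y = np.array([1.2, 1.8, 2.9, 3.1, 3.6, 3.7])

spline = CubicSpline(known_x, known_y)

# Evaluate from period 3 to 8.5 in steps of 0.5: interpolates the gaps
# and extrapolates beyond the last known x = 8
grid = np.arange(3.0, 8.5 + 1e-9, 0.5)
for x in grid:
    print(f"x = {x:3.1f} -> y = {spline(x):.3f}")
```

Extrapolated values (here beyond x = 8) should be treated cautiously, since cubic splines can diverge quickly outside the data range.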
  • Custom Econometric Model. Applicable for forecasting time-series and cross-sectional data and for modeling relationships among variables; allows you to create custom multiple regression models. Econometrics refers to a branch of business analytics, modeling, and forecasting techniques for modeling the behavior of or forecasting certain business, financial, economic, physical science, and other variables. Running the Basic Econometrics models is like regular regression analysis except that the dependent and independent variables can be modified before a regression is run.
    • Short Tip: Customizes your linear and nonlinear regression model using custom independent variables.
    • Model Input: Data Type C. One dependent variable and multiple independent variables are required.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2; LN(VAR3); (VAR4)^2; LAG(VAR5,1); (VAR6*VAR7)
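The modify-then-regress idea can be sketched generically (hypothetical data and coefficients; BizStats parses expressions like LN(VAR3) for you, whereas here the transforms are applied by hand before an ordinary least-squares fit):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100
var2 = rng.uniform(1, 10, n)
var3 = rng.uniform(1, 10, n)
# Hypothetical dependent variable built from transformed regressors
y = 1.5 + 0.8 * var2 + 2.0 * np.log(var3) + rng.normal(0, 0.3, n)

# Build the custom design matrix by hand: intercept, VAR2, LN(VAR3)
X = np.column_stack([np.ones(n), var2, np.log(var3)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated coefficients: {np.round(beta, 3)}")  # near [1.5, 0.8, 2.0]
```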
  • Data Analysis: Cross Tabulation. Used to find alphanumeric values (number and word combinations) and to find unique values and then perform a cross-tabulation.
    • Short Tip: Runs Cross Tabulation on unique alphanumeric values or text.
    • Model Input: Data Type B. Two input variables are required. Both variables can be numerical, alphabetical, or alphanumeric.
      • Variable 1 (Alphanumeric), Variable 2 (Alphanumeric):
        • >VAR1
        • >VAR2
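The underlying operation is a frequency count of each unique pair of values; a standard-library sketch with made-up data:

```python
# Count how often each (VAR1, VAR2) combination occurs.
from collections import Counter

var1 = ["A1", "A1", "B2", "B2", "A1"]       # alphanumeric variable 1
var2 = ["Yes", "No", "Yes", "Yes", "Yes"]   # alphanumeric variable 2

table = Counter(zip(var1, var2))  # {(value1, value2): frequency}
```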
  • Data Analysis: New Values Only. Used to find new values in the Main Variable that do not exist in the Reference Variables, and to find the values that already exist in the Reference Variable as well as values that are Duplicates if both variables are combined.
    • Short Tip: Finds alphanumeric data in the main variable that either exists or does not exist in the reference variable, as well as identifies duplicates if both variables are combined.
    • Model Input: Data Type B. Two input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Main Variable (Alphanumeric), Reference Variable (Alphanumeric), Number of Characters (Optional, default set to all characters):
        • >VAR1
        • >VAR2
        • >5
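The set logic is simple to sketch; here the optional Number of Characters input is modeled by comparing only the first N characters (all values are hypothetical):

```python
n_chars = 5  # Number of Characters (optional; default is all characters)

main = ["Alpha-001", "Alpha-002", "Gamma-009"]   # Main Variable
reference = ["Alpha-001", "Beta-005"]            # Reference Variable

def key(s):
    return s[:n_chars]          # compare only the first N characters

ref_keys = {key(r) for r in reference}
new_values = [m for m in main if key(m) not in ref_keys]   # not in reference
existing = [m for m in main if key(m) in ref_keys]         # already exist
```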
  • Data Analysis: Subtotal by Category. Used to find value subtotals based on unique categories.
    • Short Tip: Computes subtotals based on unique categories.
    • Model Input: Data Type B. Two input variables are required: Category can be alphanumeric whereas Values must be numerical.
      • Category (Alphanumeric), Values (Numeric):
        • >VAR1
        • >VAR2
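The computation is a grouped sum; a standard-library sketch with illustrative categories and values:

```python
from collections import defaultdict

category = ["East", "West", "East", "North", "West"]  # Category (VAR1)
values = [100.0, 50.0, 25.0, 10.0, 40.0]              # Values (VAR2)

subtotals = defaultdict(float)
for cat, val in zip(category, values):
    subtotals[cat] += val       # accumulate per unique category
```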
  • Data Analysis: Unique Values Only. Identifies the values that are unique in each variable. Data can be alphanumeric, and the first N characters can optionally be used to determine uniqueness.
    • Short Tip: Finds unique alphanumeric values in each variable.
    • Model Input: Data Types B and C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Main Variables (Alphanumeric), Number of Characters:
        • >VAR1; VAR2; VAR3; …
        • >5
  • Data Descriptive Statistics. Almost all distributions can be described within four moments (some distributions require one moment, while others require two moments, and so forth). This tool computes the four moments and associated descriptive statistics.
    • Short Tip: Computes various moments and descriptive statistics.
    • Model Input: Data Type A. One input variable is required.
      • Variable:
        • >VAR1
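The four moments can be computed from first principles; a sketch using population formulas on a small illustrative dataset (BizStats may report sample-adjusted variants as well):

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

mean = sum(data) / n                                           # 1st moment
var = sum((x - mean) ** 2 for x in data) / n                   # 2nd (population)
sd = math.sqrt(var)
skew = sum((x - mean) ** 3 for x in data) / (n * sd ** 3)      # 3rd moment
kurt = sum((x - mean) ** 4 for x in data) / (n * sd ** 4) - 3  # 4th, excess
```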
  • Deseasonalize. This model deseasonalizes and detrends your original data to take out any seasonal and trending components. In forecasting models, the process eliminates the effects of accumulating datasets from seasonality and trend to show only the absolute changes in values and to allow potential cyclical patterns to be identified by removing the general drift, tendency, twists, bends, and effects of seasonal cycles of a set of time-series data.
    • Short Tip: Deseasonalizes a time-series dataset.
    • Model Input: Data Type A. One input variable is required, and the number of periods per season.
      • Variable, Periodicity:
        • >VAR1
        • >4
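One simple deseasonalization scheme (a ratio-to-mean sketch, not necessarily the exact BizStats algorithm) computes a seasonal index per period position and divides each value by its index:

```python
# Quarterly data (Periodicity = 4) with a purely seasonal pattern.
data = [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0]
periodicity = 4

grand_mean = sum(data) / len(data)
index = []
for s in range(periodicity):
    season = data[s::periodicity]                # every 4th observation
    index.append((sum(season) / len(season)) / grand_mean)

deseasonalized = [x / index[i % periodicity] for i, x in enumerate(data)]
```

For this purely seasonal series, the deseasonalized values collapse to the grand mean, as expected.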
  • Discriminant Analysis (Linear). LDA (Linear Discriminant Analysis) is related to ANOVA and multivariate regression analysis, which attempt to model one dependent variable as a linear combination of other independent variables. Discriminant Analysis has continuous independent variables and a categorical dependent variable.
    • Short Tip: Continuous independent variables are used to linearly explain and model a categorical dependent variable.
    • Model Input: Data Type C. One dependent variable (categorical data) and one or multiple independent variables.
      • Categorical Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2; VAR3; …
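For two classes, the linear discriminant can be sketched directly: project onto w = S⁻¹(μ₁ − μ₀) and split at the midpoint. The data below are illustrative only; BizStats handles the general multi-class case:

```python
import numpy as np

X0 = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1]])   # class 0 observations
X1 = np.array([[4.0, 4.0], [4.2, 3.9], [3.8, 4.1]])   # class 1 observations

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # pooled scatter
w = np.linalg.solve(S, mu1 - mu0)                        # discriminant axis
threshold = w @ (mu0 + mu1) / 2.0

def predict(x):
    return int(w @ x > threshold)   # 1 if closer to class 1 along w
```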
  • Discriminant Analysis (Quadratic). QDA (Quadratic Discriminant Analysis) is related to ANOVA and multivariate regression analysis, which attempt to model one dependent variable as a nonlinear combination of other independent variables. Discriminant Analysis has continuous independent variables and a categorical dependent variable. This QDA is a nonlinear version of the LDA (Linear Discriminant Analysis).
    • Short Tip: Continuous independent variables are used to nonlinearly explain and model a categorical dependent variable.
    • Model Input: Data Type C. One dependent variable (categorical data) and one or multiple independent variables.
      • Categorical Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3; …
  • Distributional Fitting. Which distribution does an analyst or engineer use for an input variable in a model? What are the relevant distributional parameters? The null hypothesis tested is that the fitted distribution is the same distribution as the population from which the sample data to be fitted comes.
    • Short Tip: Performs various distributional fitting methods to identify the best-fitting distribution.
    • Model Input: Data Type A. One variable is required.
      • Variable:
        • >VAR1
    • Akaike Information Criterion (AIC). Rewards goodness-of-fit but also includes a penalty that is an increasing function of the number of estimated parameters (although AIC penalizes the number of parameters less strongly than other methods).
    • Anderson–Darling (AD). When applied to testing if a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting departures from normality and is powerful for testing normal tails. However, in non-normal distributions, this test lacks power compared to others.
    • Kolmogorov–Smirnov (KS). A nonparametric test for the equality of continuous probability distributions that can be used to compare a sample with a reference probability distribution, making it useful for testing abnormally shaped and non-normal distributions.
    • Kuiper’s Statistic (K). Related to the KS test, but it is as sensitive in the tails as at the median and is invariant under cyclic transformations of the independent variable. This test is invaluable when testing for cyclic variations over time. In comparison, the AD test provides equal sensitivity at the tails and the median, but it does not provide the cyclic invariance.
    • Schwarz/Bayes Information Criterion (SC/BIC). The SC/BIC introduces a penalty term for the number of parameters in the model with a larger penalty than AIC.
    • Discrete (Chi-Square). The Chi-Square test is used to perform distributional fitting on discrete data.
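For a concrete sense of the information criteria, here is a sketch (hypothetical data) that fits a normal distribution by maximum likelihood and computes AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L:

```python
import math

data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
n = len(data)
k = 2  # estimated parameters: mean and variance

mu = sum(data) / n
sigma2 = sum((x - mu) ** 2 for x in data) / n   # MLE variance

# Normal log-likelihood at the MLE
log_lik = sum(-0.5 * math.log(2 * math.pi * sigma2)
              - (x - mu) ** 2 / (2 * sigma2) for x in data)

aic = 2 * k - 2 * log_lik
bic = k * math.log(n) - 2 * log_lik   # heavier penalty once n exceeds e^2
```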
  • Diversity Index (Shannon, Brillouin, Simpson). Diversity measures the probability distribution of observations or frequencies among different categories and computes the probability that any two items randomly selected will belong to the same category. Three indices are computed: Shannon’s Diversity Index for a sample of categorical data (frequencies of occurrence among different categories), Brillouin’s Diversity Index for when the entire population is present, and Simpson’s Diversity Index for sampling with replacement within a large population. The closer the diversity indices to the maximum, the higher the level of diversity.
    • Short Tip: Measures the diversity of a dataset using frequencies of various categories as inputs. The closer the diversity indices to the maximum, the higher the level of diversity.
    • Model Input: One input variable is required with at least 3 rows of data.
      • Frequency:
        • >VAR1
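Shannon's and Simpson's indices can be sketched directly from the frequencies (Brillouin's index is similar in spirit but uses factorials of the counts):

```python
import math

freq = [25, 25, 25, 25]            # frequencies across four categories
n = sum(freq)
p = [f / n for f in freq]          # category proportions

shannon = -sum(pi * math.log(pi) for pi in p)   # maximum is ln(#categories)
simpson = 1 - sum(pi ** 2 for pi in p)          # chance two draws differ
```

With equal frequencies, Shannon's index hits its maximum of ln(4), reflecting maximal diversity.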
  • Eigenvalues and Eigenvectors. Runs and calculates the Eigenvalues and Eigenvectors of your data matrix.
    • Short Tip: Calculates the Eigenvalues and Eigenvectors of your data matrix.
    • Model Input: Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable. The total number of variables must match the number of rows, i.e., the data entered should be in an N × N matrix.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
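An equivalent computation can be sketched in numpy on an illustrative 3 × 3 matrix:

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 4.0, 9.0]])    # N x N data matrix

eigenvalues, eigenvectors = np.linalg.eig(A)
# Each column v of `eigenvectors` satisfies A v = lambda v.
```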
  • Endogeneity Test with Two-Stage Least Squares (Durbin–Wu–Hausman). Tests if a regressor is endogenous using the two-stage least squares method and applying the Durbin–Wu–Hausman test. A Structural Model and a (2SLS) Reduced Model are both computed in a 2SLS paradigm, and a Hausman test is administered to test if one of the variables is endogenous.
    • Short Tip: Tests if a regressor is endogenous using the two-stage least squares method and applying the Durbin–Wu–Hausman test.
    • Model Input: Data Type C.
      • Structural Dependent Variable, Test Variable, Structural Independent Variables, Reduced Equation Independent Variables:
        • >VAR1
        • >VAR2
        • >VAR3
        • >VAR4;  VAR5;  VAR6;  VAR7
  • Endogenous Model (Instrumental Variables with Two-Stage Least Squares). Runs two-stage least squares with instrumental variables on a bivariate model for slope estimation.
    • Short Tip: Runs two-stage least squares with instrumental variables on a bivariate model for slope estimation.
    • Model Input: Data Type C.
      • Dependent Variable, Endogenous Variable, Instrumental Variables:
        • >VAR1
        • >VAR2
        • >VAR3; VAR4; VAR5; VAR6; VAR7
  • Error Correction Model (Engle–Granger). Runs an error correction model assuming the variables exhibit cointegration. If two time-series variables are nonstationary in the first order, I(1), and when both variables are cointegrated, we can run an error correction model for estimating the short-term and long-term effects of one time-series on another. The error correction comes from previous periods’ deviation from a long-run equilibrium, where the error influences its short-run dynamics.
    • Short Tip: Runs an error correction model assuming the variables exhibit cointegration.
    • Model Input: Data Type B. Exactly two input variables are required. Variables are arranged in columns and both variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Dependent Variable, Independent Variable:
        • >VAR1
        • >VAR2
  • Exponential J-Curve. This method generates an exponential growth where the value of the next period depends on the current period’s level and the increase is exponential. Over time, the values will increase significantly from one period to another. This model can be used in forecasting biological growth and chemical reactions over time.
    • Short Tip: Generates a time-series forecast using the Exponential J-curve.
    • Model Input: Data Type A. Requires three simple manual inputs: starting value of the forecast, the periodic growth rate in percent, and the total number of periods to forecast.
      • Starting Value, Growth Rate (%), Forecast Periods:
        • >400
        • >3
        • >100
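The J-curve recursion is value_t = value_{t−1} × (1 + g), i.e., start × (1 + g)^t; a sketch using the example inputs above:

```python
start_value = 400.0   # Starting Value
growth_rate = 0.03    # Growth Rate of 3% per period
periods = 100         # Forecast Periods

forecast = [start_value * (1 + growth_rate) ** t for t in range(periods + 1)]
```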
  • Factor Analysis. Runs Factor Analysis to analyze interrelationships within large numbers of variables and to simplify those factors into a smaller number of common factors. The method condenses the information contained in the original set of variables into a smaller set of implicit factor variables with minimal loss of information. The analysis is related to Principal Component Analysis (PCA), using the correlation matrix and applying PCA coupled with a Varimax matrix rotation to simplify the factors.
    • Short Tip: Runs Factor Analysis to analyze interrelationships within large numbers of variables and to simplify those factors into a smaller number of common factors.
    • Model Input: Data Type C. Requires at least three or more variables with an equal number of rows.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
  • Forecast Accuracy: All Goodness of Fit Measures. Runs various forecast accuracy and forecast error measurements using actual and forecast values. Models to run include multiple R, R-squared, standard error of estimates, Akaike, Bayes, Log-Likelihood, Hannan–Quinn, SSE (sums of squared errors), MAD (mean absolute deviation), MAPE (mean absolute percentage error), MSE (mean squared error), RMSE (root mean squared error), MdAE (median absolute error), MdAPE (median absolute percentage error), RMSLE (root mean square log error), RMSPE (root mean square percentage error loss), RMdSPE (root median square percentage error loss), sMAPE (symmetrical mean absolute percentage error), Theil’s U1 (Theil’s measure for accuracy), and Theil’s U2 (Theil’s measure for quality).
    • Short Tip: Runs various forecast accuracy and forecast error measurements using your forecast errors.
    • Data Input: Data Type B. Actual data variable, forecast data variable, and a manual input of the number of regressors used to generate your forecast and subsequent errors.
      • Actuals, Forecasts, Total Number of Variables (Dep. + Indep.):
        • >VAR1
        • >VAR2
        • >6
  • Forecast Accuracy: Akaike, Bayes, Schwarz, MAD, MSE, RMSE. Runs various forecast accuracy and forecast error measurements using your forecast errors. Models to run include the Akaike Information Criterion (AIC), Bayes and Schwarz Criterion (BSC), AIC Correction (Augmented AIC), BSC Correction (Augmented BSC), Mean Absolute Deviation (MAD), Mean Squared Errors (MSE), and Root Mean Squared Error (RMSE).
    • Short Tip: Runs various forecast accuracy and forecast error measurements using your forecast errors.
    • Data Input: Data Type A. One input variable of the forecast errors is required, as is a manual input of the number of regressors used to generate your forecast and subsequent errors.
      • Forecast Errors, Total Number of Variables (Dep. + Indep.):
        • >VAR1
        • >6
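The deviation-based measures can be computed from the errors alone; a sketch of MAD, MSE, and RMSE on hypothetical forecast errors:

```python
import math

errors = [1.0, -2.0, 0.5, -0.5, 2.0, -1.0]   # actual minus forecast
n = len(errors)

mad = sum(abs(e) for e in errors) / n          # Mean Absolute Deviation
mse = sum(e ** 2 for e in errors) / n          # Mean Squared Error
rmse = math.sqrt(mse)                          # Root Mean Squared Error
```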
  • Forecast Accuracy: Diebold–Mariano (Dual Competing Forecasts). Runs the Diebold–Mariano Test and Harvey–Leybourne–Newbold Test comparing two forecasts to see if there is a difference. The null hypothesis tested is that there is no significant difference between the two forecasts.
    • Short Tip: Tests if two forecasts are similarly valid (H0: no difference between the two forecasts).
    • Model Input: Data Type C. Three variables are required: Actual Data, First Forecasted Data, and Second Forecasted Data, with at least 5 rows of data for each variable. Each variable must have the same number of rows.
      • Actual (=1), Forecast 1; Forecast 2:
        • >VAR1
        • >VAR2;  VAR3
  • Forecast Accuracy: Pesaran–Timmermann (Single Directional Forecast). Runs the Pesaran–Timmermann Test to see if the forecast can adequately track directional changes in the data. The null hypothesis tested is that the forecast does not track directional changes in the data.
    • Short Tip: Tests if the forecast adequately tracked directional changes in the data (H0: forecast does not track directional changes).
    • Model Input: Data Type B. Two variables are required: Actual Data and Forecast, with at least 5 rows of data for each variable. Each variable must have the same number of rows.
      • Actual, Forecast:
        • >VAR1
        • >VAR2
  • Generalized Linear Models (Logit with Binary Outcomes). Limited dependent variables techniques are used to forecast the probability of something occurring given some independent variables (e.g., predicting if a credit line will default given the obligor’s characteristics such as age, salary, credit card debt levels; or the probability a patient will have lung cancer based on age and number of cigarettes smoked monthly, and so forth). The dependent variable is limited (i.e., binary 1 and 0 for default/cancer, or limited to integer values 1, 2, 3, etc.). Traditional regression analysis will not work as the predicted probability is usually less than zero or greater than one, and many of the required regression assumptions are violated (e.g., independence and normality of the errors). We also have a vector of independent variable regressors, X, which are assumed to influence the outcome, Y. A typical ordinary least squares regression approach is invalid because the regression errors are heteroskedastic and non-normal, and the resulting estimated probability estimates will return nonsensical values of above 1 or below 0. This analysis handles these problems using an iterative optimization routine to maximize a log-likelihood function when the dependent variables are limited.
    • Short Tip: Runs a Binary Logistic Regression model with one binary (0/1) dependent variable and multiple independent variables.
    • Model Input: Data Type C. One binary dependent variable is required with 0 and 1 values, and multiple continuous or categorical independent variables.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2; VAR3; …
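The iterative maximum-likelihood routine mentioned above can be sketched as Newton–Raphson updates on synthetic data (illustrative only; BizStats reports the full regression output):

```python
import numpy as np

# Synthetic binary outcomes driven by one regressor (true betas: -0.5, 2.0)
rng = np.random.default_rng(1)
x = rng.normal(0, 1, 200)
X = np.column_stack([np.ones(200), x])
p_true = 1 / (1 + np.exp(-(-0.5 + 2.0 * x)))
y = (rng.uniform(size=200) < p_true).astype(float)

beta = np.zeros(2)
for _ in range(25):                             # Newton-Raphson iterations
    p = 1 / (1 + np.exp(-X @ beta))             # current fitted probabilities
    W = p * (1 - p)                             # observation weights
    hessian = X.T @ (X * W[:, None])
    gradient = X.T @ (y - p)
    beta += np.linalg.solve(hessian, gradient)  # climb the log-likelihood
```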
  • Generalized Linear Models (Logit with Bivariate Outcomes). Runs the Multivariate Logistic Regression or Logit Model with two dependent bivariate variables (Number of Successes and Failures) that are dependent on one or more independent variables. Instead of the standard Logit Model that requires raw data of 0 and 1 as a single variable, we can use this Generalized Linear Model (GLM) Logit with Bivariate Outcomes model with successes and failures (frequency counts) as two separate variables.
    • Short Tip: Runs the General Linear Model Logit Regression with two dependent variables (counts of successes and failures).
    • Model Input: Data Type C. Two Dependent Variables are required (Number of Successes and Failures), and one or more Independent Variables are allowed, with the same number of total data points or rows per variable.
      • Independent Variables, Successes, Failures:
        • >VAR1;  VAR2; …
        • >VAR3
        • >VAR4
  • Generalized Linear Models (Probit with Binary Outcomes). A Probit model (sometimes also known as a Normit model) is a popular alternative specification for a binary response model. It employs a Probit function estimated using maximum likelihood estimation and is called Probit regression. The Probit and logistic regression models tend to produce very similar predictions, where the parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding Probit model. The choice of using a Probit or Logit is entirely up to convenience, and the main distinction is that the logistic distribution has a higher kurtosis (fatter tails) to account for extreme values. For example, suppose that house ownership is the decision to be modeled, and this response variable is binary (home purchase or no home purchase) and depends on a series of independent variables Xi such as income, age, and so forth, such that Ii = β0 + β1X1 +…+ βnXn, where the larger the value of Ii, the higher the probability of home ownership. For each family, a critical I* threshold exists where, if exceeded, the house is purchased; otherwise, no home is purchased. The outcome probability (Pi) is assumed to be normally distributed, such that Pi = CDF(Ii) using a standard-normal cumulative distribution function (CDF). Therefore, use the estimated coefficients exactly like those of a regression model and, using the estimated Y, apply a standard-normal distribution to compute the probability.
    • Short Tip: Runs a Binary Probit Regression model with one binary (0/1) dependent variable and multiple independent variables.
    • Model Input: Data Type C. One binary dependent variable is required with 0 and 1 values, and multiple continuous or categorical independent variables.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3; …
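The final probability step, Pi = CDF(Ii), can be sketched with the standard-normal CDF expressed through the error function (the index value below is hypothetical):

```python
import math

def std_normal_cdf(z):
    # Standard-normal CDF via erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical estimated index I = b0 + b1*income + ... for one observation
index = 0.0
probability = std_normal_cdf(index)   # exactly 0.5 at the threshold
```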
  • Generalized Linear Models (Probit with Bivariate Outcomes). Runs the Multivariate Probit Regression or Probit Model with two dependent bivariate variables (Number of Successes and Failures) that are dependent on one or more independent variables. Instead of the standard Probit Model that requires raw data of 0 and 1, we can use this Generalized Linear Model (GLM) Probit with Bivariate Outcomes model where successes and failures are frequency counts.
    • Short Tip: Runs the General Linear Model Probit Regression with two dependent variables (counts of successes and failures).
    • Model Input: Data Type C. Two Dependent Variables are required (Number of Successes and Failures), and one or more Independent Variables are allowed, with the same number of total data points or rows per variable.
      • Independent Variables, Successes, Failures:
        • >VAR1; VAR2; …
        • >VAR3
        • >VAR4
  • Generalized Linear Models (Tobit with Censored Data). The Tobit model (Censored Tobit) is an econometric and biometric modeling method used to describe the relationship between a non-negative dependent variable Yi and one or more independent variables Xi. A Tobit model is an econometric model in which the dependent variable is censored; that is, the dependent variable is censored because values below zero are not observed. The Tobit model assumes that there is a latent unobservable variable Y*. This variable is linearly dependent on the Xi variables via a vector of βi coefficients that determine their interrelationships. In addition, there is a normally distributed error term Ui to capture random influences on this relationship. The observable variable Yi is defined to be equal to the latent variable whenever the latent variable is above zero, and Yi is assumed to be zero otherwise. That is, Yi = Y* if Y* > 0 and Yi = 0 if Y* ≤ 0. If the relationship parameter βi is estimated by using ordinary least squares regression of the observed Yi on Xi, the resulting regression estimators are inconsistent and yield downward-biased slope coefficients and an upward-biased intercept.
    • Short Tip: Runs a Tobit Regression model with one limited or censored dependent variable and multiple independent variables.
    • Model Input: Data Type C. One censored dependent variable is required and multiple continuous or categorical independent variables.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3; …
  • Granger Causality. Tests if one variable Granger causes another variable and vice versa, using restricted autoregressive lags and unrestricted distributive lag models. Predictive causality in finance and economics is tested by measuring the ability to predict the future values of a time series using prior values of another time series. A simpler definition might be that a time-series variable X Granger causes another time-series variable Y if predictions of the value of Y based solely on its own prior values and on the prior values of X are comparatively better than predictions of Y based solely on its own past values.
    • Short Tip: Tests if one variable Granger causes another variable and vice versa, using restricted autoregressive lags and unrestricted distributive lag models.
    • Model Input: Data Type B. Exactly two input variables are required. Variables are arranged in columns and both variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables, Maximum Lags:
        • >VAR1; VAR2
        • >3
  • Grubbs Test for Outliers. Runs the Grubbs test for outliers to test the null hypothesis that all the values are from the same normal population with no outliers.
    • Short Tip: Tests for outliers in your data (H0: there are no outliers).
    • Model Input: Data Type A. One input variable is required with at least 3 rows of data.
      • Variable:
        • >VAR1
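The Grubbs statistic itself is the largest absolute deviation from the mean, in units of the sample standard deviation; a sketch on made-up data (the critical-value comparison, which uses the t-distribution, is omitted here):

```python
import statistics

data = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 9.0]   # 9.0 is a suspect outlier

mean = statistics.mean(data)
sd = statistics.stdev(data)                   # sample standard deviation
g = max(abs(x - mean) for x in data) / sd     # Grubbs statistic
# Reject H0 (no outliers) if g exceeds the critical value for n and alpha.
```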
  • Heteroskedasticity Test (Breusch–Pagan–Godfrey). Runs the Breusch–Pagan–Godfrey Test for Heteroskedasticity. It uses the main model to obtain error estimates and, using squared estimates, a restricted model is run, and the Breusch–Pagan–Godfrey test is computed. The null hypothesis is that the time series is homoskedastic.
    • Short Tip: Tests for heteroskedasticity (H0: time series is homoskedastic).
    • Data Input: Data Type C. One Dependent Variable and one or multiple Independent Variables are required.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3;  VAR4; …
  • Heteroskedasticity Test (Lagrange Multiplier). Runs the Lagrange Multiplier Test for Heteroskedasticity. It uses the main model to obtain error estimates and, using squared estimates, a restricted model is run, and the Lagrange Multiplier test is computed. The null hypothesis is that the time series is homoskedastic.
    • Short Tip: Tests for heteroskedasticity (H0: time series is homoskedastic).
    • Data Input: Data Type C. One Dependent Variable and one or multiple Independent Variables are required.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3;  VAR4; …
  • Heteroskedasticity Test (Wald–Glejser). Runs the Wald–Glejser Test for Heteroskedasticity. It uses the main model to obtain error estimates and, using squared estimates, a restricted model is run, and the Wald–Glejser test is computed. The null hypothesis is that the time series is homoskedastic.
    • Short Tip: Tests for heteroskedasticity (H0: time series is homoskedastic).
    • Data Input: Data Type C. One Dependent Variable and one or multiple Independent Variables are required.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3;  VAR4; …
  • Heteroskedasticity (Wald’s on Individual Variables). Several tests exist to test for the presence of heteroskedasticity, that is, volatilities or uncertainties (the standard deviation or variance of a variable is nonconstant over time). Applicable only to time-series data, these tests can also be used for testing misspecifications and nonlinearities. The test is based on the null hypothesis of no heteroskedasticity.
    • Short Tip: Runs the Wald’s test for heteroskedasticity on each of the independent variables (H0: each independent variable is homoskedastic).
    • Model Input: Data Type C. One dependent variable and multiple continuous or categorical independent variables are required.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3; …
  • Hotelling T-Square: 1 VAR with Related Measures. Runs the Hotelling T-Square Test for one sample set of multiple related features (variables). For example, features such as usefulness, attractiveness, durability, and interest level on a single new product are collected and listed as column variables. The null hypothesis tested is that there is zero difference between all the related features (variables) against their respective goals. The Hotelling T-Square for One Variable with Related Measures is an extension of the T-Test for Independent Variables and Bonferroni adjustments applied simultaneously to multiple variables.
    • Short Tip: Simultaneously tests multiple features of one group of multiple variables (H0: no difference between the feature variables against their goals).
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables (features) must have at least 5 data points each, with the same number of total data points or rows per variable. The number of Goals entered must match the number of variables.
      • Data, Goals:
        • >VAR1;  VAR2;  VAR3; …
        • >7
        • >8
  • Hotelling T-Square: 2 VAR Dependent Pair with Related Measures. Runs the Hotelling T-Square Two Paired Group Test for two sample sets of multiple related features (variables). For example, features such as usefulness, attractiveness, durability, and interest level on two new products are collected and listed as column variables. The null hypothesis tested is that there is zero difference between all the related features (variables) compared across the two groups against their respective goals. The Hotelling T-Square for Two Dependent Variables with Related Measures is an extension of the T-Test for Dependent Variables and Bonferroni adjustments applied simultaneously to multiple paired variables.
    • Short Tip: Simultaneously tests multiple features of two groups of paired variables (H0: no difference between the feature variables of the two groups against their respective goals).
    • Model Input: Data Type D. Exactly two groups are required. Two or more input variables are required in each group. Different variables are arranged in columns and all variables (features) must have at least 5 data points each. All variables in both groups must have an equal number of data rows. The number of Group 2 Variables needs to be equal to the number of Group 1 Variables. The number of Goals must match the number of variables in Group 1 and are optional inputs (the default setting is that all goals are equal to 0).
      • Group 1 Variables, Group 2 Variables, Goals:
        • >VAR1;  VAR2;  VAR3; …
        • >VAR6;  VAR7;  VAR8; …
        • >7
        • >8
  • Hotelling T-Square: 2 VAR Indep. Equal Variance with Related Measures. Runs the Hotelling T-Square Two Independent Groups with Equal Variance Related Measures Test for two sample sets of multiple related features (variables). For example, features such as usefulness, attractiveness, durability, and interest level on two new products are collected and listed as column variables. The null hypothesis tested is that there is zero difference between all the related features (variables) compared across the two groups. The Hotelling T-Square Two Independent Groups with Equal Variance Related Measures is an extension of the T-Test for Independent Variables with Equal Variance and Bonferroni adjustments applied simultaneously to multiple independent variables.
    • Short Tip: Simultaneously tests multiple features of two groups of multiple variables with equal variance (H0: no difference between the feature variables of the two groups).
    • Model Input: Data Type D. Exactly two groups are required. Two or more input variables are required in each group. Different variables are arranged in columns and all variables (features) must have at least 5 data points each. All variables in each group must have an equal number of rows, but the number of rows in Group 1 and Group 2 can be different. The number of Group 2 Variables needs to be equal to the number of Group 1 Variables.
      • Group 1 Variables, Group 2 Variables:
        • >VAR1;  VAR2;  VAR3; …
        • >VAR6;  VAR7;  VAR8; …
  • Hotelling T-Square: 2 VAR Indep. Unequal Variance with Related Measures. Runs the Hotelling T-Square Two Independent Groups with Unequal Variance Related Measures Test for two sample sets of multiple related features (variables). For example, features such as usefulness, attractiveness, durability, and interest level on two new products are collected and listed as column variables. The null hypothesis tested is that there is zero difference between all the related features (variables) compared across the two groups. The Hotelling T-Square Two Independent Groups with Unequal Variance Related Measures is an extension of the T-Test for Independent Variables with Unequal Variance and Bonferroni adjustments applied simultaneously to multiple independent variables.
    • Short Tip: Simultaneously tests multiple features of two groups of multiple variables with unequal variance (H0: no difference between the feature variables of the two groups).
    • Model Input: Data Type D. Exactly two groups are required. Two or more variables are required in each group, with more than 5 rows of data in each variable. All variables in each group must have an equal number of rows, but the number of rows in Group 1 and Group 2 can be different. The number of Group 2 Variables needs to be equal to the number of Group 1 Variables.
      • Group 1 Variables, Group 2 Variables:
        • >VAR1;  VAR2;  VAR3; …
        • >VAR6;  VAR7;  VAR8; …
  • Internal Consistency Reliability: Cronbach’s Alpha (Non-Dichotomous Data). Cronbach’s Alpha measures the internal consistency and reliability for continuous and non-dichotomous data, including questionnaire and Likert scale data. A high alpha (> 0.7) implies strong reliability versus the null hypothesis tested of alpha equal to zero, where there is no internal consistency and no reliability among the raters. Each question is set up as a different column variable versus the rows of data, which are different raters’ assessments or answers.
    • Short Tip: Checks for internal consistency and reliability of different peoples’ responses to the same questions (H0: there is zero alpha reliability and there is no internal consistency).
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 5 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
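As an illustration of the computation (a minimal numpy sketch, not the BizStats implementation), Cronbach’s alpha can be derived from the individual item variances and the variance of each rater’s total score:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_raters x k_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each question (column)
    total_var = items.sum(axis=1).var(ddof=1)  # variance of each rater's total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

With perfectly consistent raters (every column identical), the function returns 1.0, the theoretical maximum.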
  • Internal Consistency Reliability: Guttman’s Lambda and Split Half Model. Internal consistency and reliability imply that the measurements of an experiment will be consistent over repeated tests of the same subject under identical conditions. Guttman’s test and the Split Half test take an existing dataset and divide it into multiple replicable internal tests. These tests measure the consistency and reliability of different responses to the same question, where low correlations and lambda scores mean low reliability and low consistency, and higher lambda and correlation scores (> 0.7) imply a higher level of reliability.
    • Short Tip: Measures consistency and reliability of different responses to the same questions where low correlations and lambda scores mean low reliability and low consistency.
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable. The total number of variables must be even.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
  • Internal Consistency Reliability: Kuder–Richardson Statistic (Dichotomous Data). Kuder–Richardson 20 and 21 Statistics measure the internal consistency of measurements of dichotomous and binary responses, and are typically between 0 and 1. The higher the value, the higher the level of consistency, and the method is similar to Cronbach’s Alpha. The KR 20 and KR 21 statistics measure the internal consistency and reliability for dichotomous data. A high KR statistic (> 0.7) implies strong reliability.
    • Short Tip: Checks for internal consistency and reliability of different peoples’ responses to the same questions.
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 5 data points, with an identical number of total data points/rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
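The KR-20 statistic follows the same pattern as Cronbach’s alpha but uses the item proportions p and q = 1 − p for binary responses. A minimal numpy sketch, assuming respondents in rows and dichotomous (0/1) items in columns:

```python
import numpy as np

def kr20(responses):
    """KR-20 for an (n_respondents x k_items) array of 0/1 answers."""
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                    # proportion answering 1 per item
    q = 1 - p
    total_var = X.sum(axis=1).var(ddof=1) # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)
```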
  • Inter-rater Reliability: Cohen’s Kappa. Cohen’s Kappa test measures the reliability of two independent raters by measuring their agreement levels and accounting for agreement that occurs by random chance. The null hypothesis tested is that the two independent researchers’ judgments are reliable or have no significant difference. Enter the data as an N × N matrix (rows as one judge’s responses to various questions and columns as the second judge’s responses to the same questions).
    • Short Tip: Measures the reliability of two independent raters and their agreement levels (H0: both sets of judgments agree and are reliable compared to one another).
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 5 data points each, with the same number of total data points or rows per variable. The number of columns must equal the number of rows (N × N matrix).
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
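Cohen’s kappa compares the observed agreement on the diagonal of the N × N table against the agreement expected by chance from the marginal totals. A rough numpy sketch (an illustration, not the BizStats implementation):

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from an N x N agreement (contingency) table."""
    T = np.asarray(table, dtype=float)
    n = T.sum()
    po = np.trace(T) / n                          # observed agreement
    pe = (T.sum(axis=1) @ T.sum(axis=0)) / n**2   # chance-expected agreement
    return (po - pe) / (1 - pe)
```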
  • Inter-rater Reliability: Inter-Class Correlation (ICC). The Interclass Correlation (ICC) Reliability tests the reliability of ratings by comparing the variability of various ratings of the same subject to the total variation across all ratings and all subjects simultaneously. A high ICC indicates a high level of reliability, and the analysis can be applied to Likert scales and any other quantitative scales. The variable columns are each judge’s responses to different subjects (rows).
    • Short Tip: Measures reliability of different judges’ responses to the same subjects where low correlations mean low reliability and low consistency.
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 5 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
  • Inter-rater Reliability: Kendall’s W (No Ties). Runs the Kendall’s W Measure of Concordance between raters. Each column is a different item, and each row is a judge’s or rater’s value. The null hypothesis tested is that there is no agreement among different judges (W = 0), indicating no reliability among raters.
    • Short Tip: Measures inter-rater concordance (H0: there is zero concordance among different raters, indicating W = 0 or no inter-rater reliability).
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
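Kendall’s W is computed from the spread of the items’ rank totals across raters. A minimal numpy sketch for the no-ties case, with raters in rows and items in columns as described above (an illustration, not the BizStats implementation):

```python
import numpy as np

def kendalls_w(ratings):
    """Kendall's W for an (m_raters x n_items) array, assuming no ties."""
    R = np.asarray(ratings, dtype=float)
    m, n = R.shape
    ranks = R.argsort(axis=1).argsort(axis=1) + 1  # rank within each rater's row
    Rj = ranks.sum(axis=0)                         # rank totals per item
    S = ((Rj - Rj.mean()) ** 2).sum()
    return 12 * S / (m**2 * (n**3 - n))            # 1 = perfect concordance
```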
  • Inter-rater Reliability: Kendall’s W (with Ties). Runs the Kendall’s W Measure of Concordance between raters after adjusting for ties. Each column is a different item, and each row is a judge’s or rater’s value. The null hypothesis tested is that there is no agreement among different judges (W = 0), indicating no reliability among raters.
    • Short Tip: Measures inter-rater concordance adjusted for ties (H0: there is zero concordance among different raters, indicating W = 0 or no inter-rater reliability).
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
  • Kendall’s Tau Correlation (No Ties). Kendall’s Tau is a nonparametric correlation coefficient, considering concordance or discordance, based on all possible pairwise combinations of ordinal ranked data. The null hypothesis tested is that there is zero correlation between the two variables.
    • Short Tip: Nonparametric Kendall’s Tau concordance correlation (H0: there is zero correlation between the two variables).
    • Model Input: Data Type B. Two input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2
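Kendall’s tau counts concordant minus discordant pairs over all pairwise combinations. A small Python sketch of the no-ties (tau-a) version:

```python
import numpy as np

def kendall_tau_a(x, y):
    """Kendall's tau-a (no tie correction) from pairwise concordance."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    s = 0.0
    for i in range(n - 1):
        # +1 for each concordant pair, -1 for each discordant pair
        s += np.sum(np.sign(x[i+1:] - x[i]) * np.sign(y[i+1:] - y[i]))
    return 2.0 * s / (n * (n - 1))
```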
  • Kendall’s Tau Correlation (with Ties). Kendall’s Tau with Ties is a nonparametric correlation coefficient, considering concordance or discordance, and correcting for ties, based on all possible pairwise combinations of ordinal ranked data. The null hypothesis tested is that there is zero correlation between the two variables.
    • Short Tip: Nonparametric Kendall’s Tau concordance correlation corrected for ties (H0: there is zero correlation between the two variables).
    • Model Input: Data Type B. Two input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2
  • Linear Interpolation. Sometimes interest rates or any type of time-dependent rates may have missing values. For instance, the Treasury rates for Years 1, 2, and 3 exist, and then jump to Year 5, skipping Year 4. Using linear interpolation (i.e., assuming the rates during the missing periods are linearly related), we can determine and “fill in,” or interpolate, the missing values.
    • Short Tip: Fills in the missing points in a data series.
    • Model Input: Data Type B. Two input variables are required. Different variables are arranged in columns and all variables must have at least 3 data points each, with the same number of total data points or rows per variable.
      • Periods, Values, Required Value for Period:
        • >VAR1
        • >VAR2
        • >5
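For a quick check outside BizStats, numpy’s interp performs the same linear fill; the Treasury rates below are hypothetical:

```python
import numpy as np

# Hypothetical Treasury rates (%), with the Year 4 rate missing
periods = np.array([1, 2, 3, 5])
rates = np.array([2.0, 2.4, 2.9, 3.7])

# Linearly interpolate the missing Year 4 rate between Years 3 and 5
rate_y4 = np.interp(4, periods, rates)  # midway between 2.9 and 3.7 -> 3.3
```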
  • Logistic S-Curve. The S-curve, or logistic growth curve, starts off like a J-curve, with exponential growth rates. Over time, the environment becomes saturated (e.g., market saturation, competition, overcrowding), the growth slows, and the forecast value eventually ends up at a saturation or maximum level. The S-curve model is typically used in forecasting market share or sales growth of a new product from market introduction until maturity and decline, population dynamics, growth of bacterial cultures, and other naturally occurring variables.
    • Short Tip: Generates a time-series forecast using the Logistic S-curve.
    • Model Input: Data Type A. Requires four simple manual inputs: assumed growth rates (%), starting value of the forecast, maximum capacity value, and the total number of periods to forecast.
      • Assumed Growth Rate (%), Initial Starting Value, Max Capacity Value, Forecast Periods:
        • >10
        • >10
        • >1200
        • >120
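A minimal discrete-time sketch of the logistic growth recursion, using the example inputs above (10% growth, starting value 10, capacity 1,200, 120 periods). This is an illustration of the S-curve behavior, not the exact BizStats formula:

```python
import numpy as np

def logistic_forecast(growth, start, capacity, periods):
    """Logistic growth: each period's increase slows as the value nears capacity."""
    values = [start]
    for _ in range(periods):
        y = values[-1]
        values.append(y + growth * y * (1 - y / capacity))
    return np.array(values)
```

The series rises exponentially at first, then flattens as it approaches the saturation level of 1,200.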
  • Mahalanobis Distance. The Mahalanobis distance measures the distance between a point X and a distribution Y, based on a multidimensional generalization of the number of standard deviations X is away from the average of Y. The Mahalanobis distance reduces to the standard Euclidean distance when the variables are uncorrelated and standardized. The null hypothesis tested is that there are no outliers in each of the data rows.
    • Short Tip: Checks for outliers in each row of data.
    • Model Input: Data Type C. Two or more input variables are required. Different variables are arranged in columns and all variables must have at least 5 data points each, with the same number of total data points or rows per variable.
      • Variables:
        • >VAR1;  VAR2;  VAR3; …
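A minimal numpy sketch that scores each row’s distance from the column means using the inverse covariance matrix (assumes the covariance matrix is invertible; not the BizStats implementation):

```python
import numpy as np

def mahalanobis_rows(X):
    """Mahalanobis distance of each row from the column means."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance
    d = X - mu
    # Quadratic form d_i' * cov_inv * d_i for every row i
    return np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))
```

Rows with unusually large distances are candidate outliers.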
  • Markov Chain. The Markov Chain models the probability of a future state that depends on the current state, that is, a mathematical system that undergoes transitions from one state to another. Linked together, these transitions form a chain: a random process characterized as memoryless, where the next state depends only on the current state and not on the sequence of events that preceded it, and which reverts to a long-run steady-state level. It is typically used to forecast the market share of two competitors.
    • Short Tip: Generates a time series of a two-state Markov Chain of alternating states.
    • Model Input: Data Type A. Requires two simple manual inputs, State 1 Probability and State 2 Probability.
      • State 1, State 2:
        • >10
        • >10
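The long-run steady state of a two-state chain can be found as the eigenvector of the transition matrix for eigenvalue 1. A small numpy sketch with hypothetical transition probabilities (an illustration, not the BizStats implementation):

```python
import numpy as np

# Hypothetical two-state transition matrix: P[i, j] = probability of moving
# from state i to state j (each row sums to 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# The steady state is the left eigenvector of P for eigenvalue 1,
# normalized so the probabilities sum to 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()  # long-run share of time spent in each state
```

For this matrix the chain settles at two-thirds of the time in state 1 and one-third in state 2.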
  • Markov Chain Transition Risk Matrix. The Markov Chain Transition Matrix models the probability of future states using a mathematical system that undergoes transitions from one state to another. It is an extension of the two-state Markov Chain.
    • Short Tip: Generates a time series of a multiple-state Markov Chain of alternating states as a risk transition matrix.
    • Model Input: Data Type A. Requires one variable of historical data and a simple manual input of the number of states to model.
      • Variable, Number of States:
        • >VAR1
        • >5
  • Multiple Poisson Regression (Population and Frequency). The Poisson Regression is similar to the Logit Regression in that the dependent variable can take on only non-negative values; in addition, the underlying distribution of the data is assumed to be Poisson, drawn from a known population size.
    • Short Tip: Runs Poisson Regression with non-negative dependent variables where all variables follow a Poisson distribution with some known population size.
    • Model Input: Data Type C. One Dependent Variable, one Population Size or Frequency variable, and one or more Independent Variables are allowed, with the same number of total data points or rows per variable.
      • Dependent Variable, Population or Frequency, Independent Variables:
        • >VAR1
        • >VAR2
        • >VAR3;  VAR4;  VAR5
  • Multiple Regression (Deming Regression with Known Variance). In regular multivariate regressions, the dependent variable Y is modeled and predicted by independent variables Xi with some error ε. In a Deming regression, however, we further assume that the data collected for Y and X have additional uncertainties and errors, or variances, which are used to provide a more relaxed fit in the Deming model.
    • Short Tip: Runs a bivariate regression assuming the variables have additional uncertainties or variances.
    • Model Input: Data Type B. Two variables are required, the Dependent Variable and the Independent Variable, each with at least 5 rows of data. The variances of these two variables are also required.
      • Dependent Variable, Independent Variable, Dependent Variable’s Variance, Independent Variable’s Variance:
        • >VAR1
        • >VAR2
        • >0.9
        • >0.2
  • Multiple Regression (Linear). Multivariate linear regression is used to model the relationship structure and characteristics of a certain dependent variable as it depends on other independent exogenous variables. Using the modeled relationship, we can forecast the future values of the dependent variable. The accuracy and goodness-of-fit for this model can also be determined. Linear and nonlinear models can be fitted in the multiple regression analysis.
    • Short Tip: Runs a multiple linear regression.
    • Model Input: Data Type C. Two sets of variables are required: One Dependent Variable and One or Multiple Independent Variables, with at least 5 rows of data in each variable, with the same number of total data points or rows per variable.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3; …
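As a quick illustration of the underlying computation (not the BizStats implementation), ordinary least squares coefficients can be obtained with numpy’s least-squares solver:

```python
import numpy as np

def linear_regression(y, X):
    """OLS coefficients (intercept first) via least squares."""
    y = np.asarray(y, float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, float)])  # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

On data generated exactly as y = 1 + 2x1 + 3x2, the solver recovers the coefficients [1, 2, 3].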
  • Multiple Regression (Nonlinear). Multivariate nonlinear regression is used to model the relationship structure and characteristics of a certain dependent variable as it depends on other independent exogenous variables. Using the modeled relationship, we can forecast the future values of the dependent variable. The accuracy and goodness-of-fit for this model can also be determined. Linear and nonlinear models can be fitted in the multiple regression analysis.
    • Short Tip: Runs a multiple nonlinear regression.
    • Model Input: Data Type C. Two sets of variables are required: One Dependent Variable and One or Multiple Independent Variables, with at least 5 rows of data in each variable, with the same number of total data points or rows per variable.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3; …
  • Multiple Regression (Ordinal Logistic Regression). Runs a multivariate ordinal logistic regression with two predictor variables and multiple frequencies of ordered variables. For instance, the two categorical predictor variables could be Gender (0/1) and Age (1–5), with five variables filled with the numbers or frequencies of people who responded Strongly Agree, Agree, Neutral, Disagree, or Strongly Disagree, which presumably are ordered. Note that this is an ordinal dataset because the Age variable is ordered, and it is multinomial because we are forecasting the multiple variables’ frequencies and probabilities.
    • Short Tip: Runs a multivariate ordinal logistic regression with two predictor variables and multiple frequencies of ordered variables.
    • Model Input: Data Type C. Two sets of variables are required: Two Predictor Variables and One or Multiple Variables of Frequency Counts, with at least 5 rows of data in each variable, with the same number of total data points or rows per variable.
      • Predictor Variables, Frequency Count Variables:
        • >VAR1;  VAR2
        • >VAR3;  VAR4;  VAR5; …
  • Multiple Regression (Through Origin). Runs a multiple linear regression but without an intercept.
    • Short Tip: Runs a multiple linear regression but without an intercept.
    • Model Input: Data Type C. Two sets of variables are required: One Dependent Variable and One or Multiple Independent Variables, with at least 5 rows of data in each variable, with the same number of total data points or rows per variable.
      • Dependent Variable, Independent Variables:
        • >VAR1
        • >VAR2;  VAR3; …
  • Multiple Regression (Two-Variable Functional Form Tests). Runs a bivariate regression test on multiple functional forms including Linear, Linear Log, Reciprocal, Quadratic, Log Linear, Log Reciprocal, Log Quadratic, Double Log, and Logistic.
    • Short Tip: Runs a bivariate regression test on multiple functional forms.
    • Model Input: Data Type B. Two variables are required: One Dependent Variable and One Independent Variable, with at least 5 rows of data in each variable, with the same number of total data points or rows per variable.
      • Dependent Variable, Independent Variable:
        • >VAR1
        • >VAR2
  • Multiple Ridge Regression (Low Variance, High Bias, High VIF). A Ridge Regression comes with a higher bias than an Ordinary Least Squares multiple regression but has less variance. It is more suitable in situations with high Variance Inflation Factors and multicollinearity or when there is a high number of variables compared to data points.
    • Short Tip: Multiple regression adjusted for high VIF multicollinearity or when there is a high number of independent variables compared to available data points.
    • Model Input: Data Type C. One Dependent Variable is required, and one or more Independent Variables are allowed, with the same number of total data points or rows per variable. Lambda is an optional input.
      • Dependent Variable, Independent Variables, Lambda (Optional, default is 0.1):
        • >VAR1
        • >VAR2
        • >0.9
  • Multiple Weighted Regression (Fixing Heteroskedasticity). Runs a Multivariate Regression on Weighted Variables to correct for heteroskedasticity in all the variables. The weights used to adjust these variables are the user-input standard deviations.
    • Short Tip: Multiple regression modeling on weight-adjusted variables to account for and correct heteroskedasticity.
    • Model Input: Data Type C. One Dependent Variable is required, and one or more Independent Variables are allowed, with the same number of total data points or rows per variable. Finally, a Weight input variable is required, which is a series of Standard Deviations.
      • Dependent Variable, Independent Variables, Weights in Stdev:
        • >VAR1
        • >VAR2;  VAR3;  VAR4; …
        • >VAR5
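Weighted least squares can be sketched by dividing each observation by its standard deviation and then running OLS on the rescaled data (an illustration, not the BizStats implementation):

```python
import numpy as np

def weighted_regression(y, X, sd):
    """WLS: rescale each observation by 1/sd, then ordinary least squares."""
    y = np.asarray(y, float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, float)])  # intercept column
    w = 1.0 / np.asarray(sd, float)  # observations with larger sd get less weight
    beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta
```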
  • Neural Network. Commonly used to refer to a network or circuit of biological neurons, modern usage of the term neural network often refers to artificial neural networks that consist of artificial neurons, or nodes, recreated in a software environment. Such networks attempt to mimic the neurons in the human brain in ways of thinking and identifying patterns and, in our situation, identifying patterns for the purposes of forecasting time-series data.
    • Linear. Applies a linear function.
    • Nonlinear Logistic. Applies a nonlinear logistic function.
    • Nonlinear Cosine with Hyperbolic Tangent. Applies a nonlinear cosine with hyperbolic tangent function.
    • Nonlinear Hyperbolic Tangent. Applies a nonlinear hyperbolic tangent function.
      • Short Tip: Runs a time-series neural network forecast through pattern recognition algorithms (linear, logistic, cosine, hyperbolic).
      • Model Input: Data Type A. Data Variable, Layers, Testing Set, Forecast Periods, and Apply Multiphased Optimization (Optional, default set to 0 or no optimization).
        • Data Variable, Layers, Testing Set, Forecast Periods, Apply Multiphased Optimization:
          • >VAR1
          • >3
          • >20
          • >5
          • >1
  • Nominal Data Contingency Analysis (McNemar’s Marginal Homogeneity). Runs the McNemar’s test on a pair of alphanumeric nominal data and creates 2 × 2 contingency tables with dichotomous traits. The test determines if the row and column variables’ marginal probabilities are equal, that is, if there is marginal homogeneity. The null hypothesis is marginal homogeneity where the two marginal probabilities for each outcome are the same.
    • Short Tip: Runs the McNemar’s test on a pair of alphanumeric nominal data and creates 2 × 2 contingency tables with dichotomous traits.
    • Model Input: Data Type B. Two alphanumeric variables are required, with the same number of total data points or rows per variable.
      • Variable 1, Variable 2:
        • >VAR1;  VAR2
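The McNemar statistic depends only on the two discordant cells b and c of the 2 × 2 table. A minimal Python sketch; the statistic is compared against a chi-square distribution with 1 degree of freedom (critical value 3.84 at the 5% level):

```python
def mcnemar_statistic(table):
    """Chi-square statistic from the discordant cells of a 2x2 paired table."""
    b, c = table[0][1], table[1][0]  # off-diagonal (disagreement) counts
    return (b - c) ** 2 / (b + c)
```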