Achieving Harmony Between Model and Data

Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size

Using Forward Stepwise Regression to Control Overfitting

algorithm for best subset selection

Figure 3-13: Wine quality prediction error using forward stepwise regression

  • x-axis: Number of attributes

Evaluating and Understanding Your Predictive Model

Several other plots help in understanding the performance of a trained algorithm and can point the way to improving it.

  • Figure 3-14: Actual taste scores versus predictions generated with forward stepwise regression
  • Figure 3-15: Histogram of wine taste prediction error with forward stepwise regression

The number of attributes to be incorporated in the solution can be called a complexity parameter. Models with larger complexity parameters have more free parameters and are more likely to overfit the data than less-complex models.
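To make the idea of a complexity parameter concrete, the following is a minimal sketch of forward stepwise regression, not the book's fwdStepwiseWine.py listing. It assumes NumPy, uses synthetic data in place of the wine data set, and the helper names (`rmse_out_of_sample`, `forward_stepwise`) are hypothetical. At each step it adds the one attribute that gives the lowest error on held-out data, so the number of attributes chosen plays the role of the complexity parameter.

```python
import numpy as np

def rmse_out_of_sample(x_train, y_train, x_test, y_test, cols):
    """Fit ordinary least squares on the chosen columns; return held-out RMSE."""
    # Prepend a column of ones so the fit includes an intercept.
    a_train = np.column_stack([np.ones(len(x_train)), x_train[:, cols]])
    a_test = np.column_stack([np.ones(len(x_test)), x_test[:, cols]])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    resid = y_test - a_test @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

def forward_stepwise(x_train, y_train, x_test, y_test):
    """Greedily add the attribute that gives the lowest held-out RMSE."""
    n_attrs = x_train.shape[1]
    chosen, errors = [], []
    for _ in range(n_attrs):
        candidates = [j for j in range(n_attrs) if j not in chosen]
        trial = [rmse_out_of_sample(x_train, y_train, x_test, y_test, chosen + [j])
                 for j in candidates]
        best = int(np.argmin(trial))
        chosen.append(candidates[best])
        errors.append(trial[best])
    return chosen, errors

# Synthetic stand-in data (an assumption): only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 6))
y = 3.0 * x[:, 0] - 2.0 * x[:, 3] + rng.normal(scale=0.5, size=300)

# Hold out every third row for error estimation.
test_mask = np.arange(300) % 3 == 0
order, errs = forward_stepwise(x[~test_mask], y[~test_mask],
                               x[test_mask], y[test_mask])

for k, (j, e) in enumerate(zip(order, errs), start=1):
    print(f"{k} attributes (added column {j}): RMSE = {e:.3f}")

# The attribute count with the lowest held-out error is the chosen complexity.
best_k = int(np.argmin(errs)) + 1
print("Best number of attributes:", best_k)
```

Plotting `errs` against the attribute count gives the kind of curve that Figure 3-13 describes: error drops quickly while informative attributes are added, then flattens or rises once the extra attributes only fit noise.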

Control Overfitting by Penalizing Regression Coefficients—Ridge Regression

first introduction to penalized linear regression

coefficient penalized regression [...] making all the coefficients smaller instead of making some of them zero.
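The shrinking behavior can be illustrated with a short sketch. This is not the book's ridgeWine.py listing: it assumes NumPy, uses synthetic data, solves the ridge problem in closed form with a hypothetical helper `ridge_fit`, and assumes a base-10 logarithm and its own range of alpha values for the sweep. The intercept is left unpenalized by centering the data first.

```python
import numpy as np

def ridge_fit(x, y, alpha):
    """Minimize mean squared error + alpha * sum(beta**2); intercept unpenalized."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    m, n = xc.shape
    # Setting the gradient to zero gives (Xc'Xc/m + alpha*I) beta = Xc'yc/m.
    beta = np.linalg.solve(xc.T @ xc / m + alpha * np.eye(n), xc.T @ yc / m)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic stand-in data (an assumption): two of five attributes are pure noise.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 5))
y = x @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.3, size=200)
x_train, y_train, x_test, y_test = x[:150], y[:150], x[150:], y[150:]

# Sweep alpha from a heavy penalty (1000) down to a light one (0.001).
alphas = [0.1 ** i for i in range(-3, 4)]
norms, rmses = [], []
for alpha in alphas:
    b0, beta = ridge_fit(x_train, y_train, alpha)
    pred = b0 + x_test @ beta
    rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
    norms.append(float(np.linalg.norm(beta)))
    rmses.append(rmse)
    print(f"-log10(alpha) = {-np.log10(alpha):5.1f}  "
          f"|beta| = {norms[-1]:.3f}  RMSE = {rmse:.3f}")
```

With a large alpha every coefficient is pulled toward zero together and none is dropped outright. As alpha decreases the coefficient vector grows toward the ordinary least squares solution, so alpha acts as the complexity parameter in the same way the attribute count does for forward stepwise regression.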

Equation 3-15: Ridge regression minimization problem
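The equation itself is not reproduced in this outline. For orientation, the ridge regression problem is commonly written in the following form. The notation here is an assumption rather than a quotation of Equation 3-15: m is the number of rows, x_i is the i-th row of attributes, y_i is its label, and alpha is the penalty weight.

```latex
\beta_0^{*}, \beta^{*} =
  \operatorname*{arg\,min}_{\beta_0,\, \beta}
  \left(
    \frac{1}{m} \sum_{i=1}^{m} \bigl( y_i - (\beta_0 + x_i \beta) \bigr)^{2}
    + \alpha \, \beta^{T} \beta
  \right)
```

Setting alpha to 0 recovers ordinary least squares. As alpha grows, the penalty term dominates and drives every coefficient in beta toward zero, leaving only the intercept.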

Listing 3-5: Predicting Wine Taste with Ridge Regression—ridgeWine.py

Figure 3-16: Wine quality prediction error using ridge regression

  • x-axis: -log(alpha)
  • y-axis: RMS error

Figure 3-17: Actual taste scores versus predictions generated with ridge regression

Figure 3-18: Histogram of wine taste prediction error with ridge regression

CODE

  • Listing 3-3: Forward Stepwise Regression: Wine Quality Data—fwdStepwiseWine.py
  • Figure 3-13: Wine quality prediction error using forward stepwise regression
  • Listing 3-4: Forward Stepwise Regression Output—fwdStepwiseWineOutput.txt
  • Figure 3-14: Actual taste scores versus predictions generated with forward stepwise regression
  • Figure 3-15: Histogram of wine taste prediction error with forward stepwise regression
  • Listing 3-5: Predicting Wine Taste with Ridge Regression—ridgeWine.py
  • Figure 3-16: Wine quality prediction error using ridge regression
  • Listing 3-6: Ridge Regression Output—ridgeWineOutput.txt
  • Figure 3-17: Actual taste scores versus predictions generated with ridge regression
  • Figure 3-18: Histogram of wine taste prediction error with ridge regression
  • Listing 3-7: Rocks Versus Mines Using Ridge Regression—classifierRidgeRocksVMines.py
  • Listing 3-8: Output from Classification Model for Rocks Versus Mines Using Ridge Regression—classifierRidgeRocksVMinesOutput.txt
  • Figure 3-19: AUC for the rocks-versus-mines classifier using ridge regression
  • Figure 3-20: Plot of actual versus prediction for the rocks-versus-mines classifier using ridge regression