Achieving Harmony Between Model and Data
Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size
Using Forward Stepwise Regression to Control Overfitting
algorithm for best subset selection
Figure 3-13: Wine quality prediction error using forward stepwise regression
- x-axis: Number of attributes
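The greedy procedure behind forward stepwise regression can be sketched as follows. This is a minimal illustration, not the book's fwdStepwiseWine.py: the function names `rms_error` and `forward_stepwise`, the synthetic data, and the every-third-row holdout are all assumptions made for the example.

```python
import numpy as np

def rms_error(X_train, y_train, X_test, y_test, cols):
    """Fit ordinary least squares on the chosen columns; return held-out RMS error."""
    # Prepend a column of ones so the fit includes an intercept.
    A_train = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
    A_test = np.column_stack([np.ones(len(X_test)), X_test[:, cols]])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    resid = y_test - A_test @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

def forward_stepwise(X_train, y_train, X_test, y_test):
    """Greedily add, one at a time, the attribute that gives the lowest held-out error."""
    remaining = list(range(X_train.shape[1]))
    selected, errors = [], []
    while remaining:
        # Try each unused attribute together with those already selected.
        trials = [(rms_error(X_train, y_train, X_test, y_test, selected + [j]), j)
                  for j in remaining]
        best_err, best_j = min(trials)
        selected.append(best_j)
        remaining.remove(best_j)
        errors.append(best_err)
    return selected, errors

# Synthetic stand-in data (assumption): only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=300)

# Hold out every third row for out-of-sample error.
test_mask = np.arange(300) % 3 == 0
selected, errors = forward_stepwise(X[~test_mask], y[~test_mask],
                                    X[test_mask], y[test_mask])

for k, (j, err) in enumerate(zip(selected, errors), start=1):
    print(f"{k} attributes (added column {j}): RMS error {err:.3f}")
```

Plotting `errors` against the number of attributes gives a curve of the kind the figure caption describes; the attribute count with the lowest held-out error is a reasonable choice of model size.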
Evaluating and Understanding Your Predictive Model
Several other plots are helpful in understanding the performance of a trained algorithm and can point the way to making improvements in its performance.
- Figure 3-14: Actual taste scores versus predictions generated with forward stepwise regression
- Figure 3-15: Histogram of wine taste prediction error with forward stepwise regression
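The numbers behind these two plots can be computed without any plotting library. The sketch below uses made-up stand-in values for the actual and predicted scores (an assumption; it does not use the wine data) and derives the prediction errors, their histogram bins, and the correlation that an actual-versus-predicted scatter plot would show.

```python
import numpy as np

# Stand-in data (assumption): integer-valued scores and noisy predictions of them.
rng = np.random.default_rng(1)
actual = rng.integers(3, 9, size=200).astype(float)
predicted = actual + rng.normal(scale=0.7, size=200)

# Prediction error per example, as plotted in an error histogram.
errors = actual - predicted
counts, bin_edges = np.histogram(errors, bins=10)

# Summary numbers that complement the two plots.
rms = float(np.sqrt(np.mean(errors ** 2)))
corr = float(np.corrcoef(actual, predicted)[0, 1])

print(f"RMS error: {rms:.3f}, correlation(actual, predicted): {corr:.3f}")
for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"[{left:6.2f}, {right:6.2f}): {n}")
```

A histogram that is skewed or has heavy tails, or a scatter plot whose spread varies with the actual score, would both suggest places where the model could be improved.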
The number of attributes incorporated in the solution acts as a complexity parameter. Models with larger complexity parameters have more free parameters and are more likely to overfit the data than less complex models.
Control Overfitting by Penalizing Regression Coefficients—Ridge Regression
first introduction to penalized linear regression
coefficient penalized regression [...] making all the coefficients smaller instead of making some of them zero.
Equation 3-15: Ridge regression minimization problem
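The equation itself is not reproduced in this outline. For orientation, the ridge problem is commonly written in the following form. The notation is an assumption: m rows, attribute vectors x_i, labels y_i, intercept β_0, coefficient vector β, and penalty weight α, and it may differ in scaling from the book's version.

```latex
\beta_0^{*}, \beta^{*} = \arg\min_{\beta_0,\, \beta}
\left( \frac{1}{m} \sum_{i=1}^{m} \bigl( y_i - (\beta_0 + x_i \beta) \bigr)^2
+ \alpha\, \beta^{T} \beta \right)
```

With α = 0 this reduces to ordinary least squares; as α grows, the penalty term α β^T β pulls all the coefficients toward zero.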
Listing 3-5: Predicting Wine Taste with Ridge Regression—ridgeWine.py
Figure 3-16: Wine quality prediction error using ridge regression
- x-axis: -log(alpha)
- y-axis: RMS error
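A sweep over the penalty weight of the kind this figure summarizes might look like the sketch below. It is not the book's ridgeWine.py: the helper `ridge_fit`, the synthetic data, the holdout split, and the grid of alpha values are assumptions, and the fit minimizes the sum of squared errors plus alpha times the squared coefficient norm, leaving the intercept unpenalized.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit on centered data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Penalized normal equations: (Xc^T Xc + alpha * I) beta = Xc^T yc.
    # Centering keeps the intercept out of the penalty.
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return float(y_mean - x_mean @ beta), beta

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + 0.5 * X[:, 4] + rng.normal(scale=0.5, size=300)

# Hold out every third row for out-of-sample error.
test_mask = np.arange(300) % 3 == 0
X_train, y_train = X[~test_mask], y[~test_mask]
X_test, y_test = X[test_mask], y[test_mask]

# Hypothetical grid: alpha = 1, 0.1, 0.01, ..., so -log10(alpha) = 0, 1, 2, ...
alphas = [0.1 ** i for i in range(7)]
rms_errors = []
for alpha in alphas:
    intercept, beta = ridge_fit(X_train, y_train, alpha)
    resid = y_test - (intercept + X_test @ beta)
    rms_errors.append(float(np.sqrt(np.mean(resid ** 2))))

for alpha, err in zip(alphas, rms_errors):
    print(f"-log10(alpha) = {-np.log10(alpha):.0f}: RMS error {err:.4f}")
```

Here alpha plays the role that the number of attributes plays in forward stepwise regression: it is the complexity parameter, with larger alpha giving smaller coefficients and a less complex model.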
Figure 3-17: Actual taste scores versus predictions generated with ridge regression
Figure 3-18: Histogram of wine taste prediction error with ridge regression
- CODE
  - Listing 3-3: Forward Stepwise Regression: Wine Quality Data—fwdStepwiseWine.py
  - Figure 3-13: Wine quality prediction error using forward stepwise regression
  - Listing 3-4: Forward Stepwise Regression Output—fwdStepwiseWineOutput.txt
  - Figure 3-14: Actual taste scores versus predictions generated with forward stepwise regression
  - Figure 3-15: Histogram of wine taste prediction error with forward stepwise regression
  - Listing 3-5: Predicting Wine Taste with Ridge Regression—ridgeWine.py
  - Figure 3-16: Wine quality prediction error using ridge regression
  - Listing 3-6: Ridge Regression Output—ridgeWineOutput.txt
  - Figure 3-17: Actual taste scores versus predictions generated with ridge regression
  - Figure 3-18: Histogram of wine taste prediction error with ridge regression
  - Listing 3-7: Rocks Versus Mines Using Ridge Regression—classifierRidgeRocksVMines.py
  - Listing 3-8: Output from Classification Model for Rocks Versus Mines Using Ridge Regression—classifierRidgeRocksVMinesOutput.txt
  - Figure 3-19: AUC for the rocks-versus-mines classifier using ridge regression
  - Figure 3-20: Plot of actual versus prediction for the rocks-versus-mines classifier using ridge regression