Bowles Notebooks

For practical reasons, the notebooks for the DSCI course have been assigned to individual Bowles chapters. The following overview shows which Bowles notebooks were incorporated into which DSCI notebooks.

DSCI notebook | Bowles code (Python 2.7) | Figures etc. (if applicable)
(ipynb)dsci_intro_1 No Bowles material; plain Python only, covered in the introductory session
(ipynb)dsci_intro_2 Differences between list, np.ndarray, pd.DataFrame and pd.Series
(ipynb)Bowles_2.1_titanic
(ipynb)Bowles_2.2_rocks Listing 2-1: Sizing Up a New Data Set—rockVmineSummaries.py (Output: outputRocksVMinesSummaries.txt)
Listing 2-2: Determining the Nature of Attributes—rockVmineContents.py (Output: outputRocksVMinesContents.txt)
Listing 2-3: Summary Statistics for Numeric and Categorical Attributes—rVMSummaryStats.py (Output: outputSummaryStats.txt)
Listing 2-4: Quantile-Quantile Plot for 4th Rocks versus Mines Attribute—qqplotAttribute.py
Listing 2-5: Using Python Pandas to Read and Summarize Data—pandasReadSummarize.py
(ipynb)Bowles_2.3_rocks Listing 2-6: Parallel Coordinates Graph for Real Attribute Visualization—linePlots.py
Listing 2-7: Cross Plotting Pairs of Attributes—corrPlot.py Figure 2-4: Cross-plot of rocks versus mines attributes 2 and 3
Listing 2-8: Correlation between Classification Target and Real Attributes—targetCorr.py
Listing 2-9: Pearson’s Correlation Calculation for Attributes 2 versus 3 and 2 versus 21 - corrCalc.py Figure 2-5: Cross-plot of rocks versus mines attributes 2 and 21
Listing 2-10: Presenting Attribute Correlations Visually—sampleCorrHeatMap.py
(ipynb)Bowles_2.4_abalone Listing 2-11: Read and Summarize the Abalone Data Set—abaloneSummary.py
Listing 2-12: Parallel Coordinate Plot for Abalone Data—abaloneParallelPlot.py Equation 2-5: Using logit transform for soft range compression
Listing 2-13: Correlation Calculations for Abalone Data—abaloneCorrHeat.py
(ipynb)Bowles_2.5_wine Listing 2-14: Wine Data Summary—wineSummary.py
Listing 2-15: Producing a Parallel Coordinate Plot for Wine Data—wineParallelPlot.py Figure 2-19: Correlation heat map for the wine data
(ipynb)Bowles_2.6_glass Listing 2-16: Summary of Glass Data Set—glassSummary.py Figure 2-20: Box plot of the glass data
Listing 2-17: Parallel Coordinate Plot for the Glass Data Figure 2-21: Parallel coordinate plot for the glass data
(ipynb)Bowles_3.3_rocks Listing 3-1: Comparison of MSE, MAE and RMSE—regressionErrorMeasures.py Figure 3-9: Confusion matrix example
Listing 3-2: Measuring Performance for Classifier Trained on Rocks-Versus-Mines—classifierPerformance_RocksVMines.py Table 3-2: Dependence of Misclassification Error on Decision Threshold
Table 3-3: Cost of Mistakes for Different Decision Thresholds
Figure 3-10: In-sample ROC for rocks-versus-mines classifier
Figure 3-11: Out-of-sample ROC for rocks-versus-mines classifier
(ipynb)Bowles_3.4_wine_rocks Listing 3-3: Forward Stepwise Regression: Wine Quality Data—fwdStepwiseWine.py Figure 3-13: Wine quality prediction error using forward stepwise regression
Listing 3-4: Forward Stepwise Regression Output—fwdStepwiseWineOutput.txt
Figure 3-14: Actual taste scores versus predictions generated with forward stepwise regression
Figure 3-15: Histogram of wine taste prediction error with forward stepwise regression
Listing 3-5: Predicting Wine Taste with Ridge Regression—ridgeWine.py Figure 3-16: Wine quality prediction error using ridge regression
Listing 3-6: Ridge Regression Output—ridgeWineOutput.txt
Figure 3-17: Actual taste scores versus predictions generated with ridge regression
Figure 3-18: Histogram of wine taste prediction error with ridge regression
Listing 3-7: Rocks Versus Mines Using Ridge Regression—classifierRidgeRocksVMines.py Listing 3-8: Output from Classification Model for Rocks Versus Mines Using Ridge Regression—classifierRidgeRocksVMinesOutput.txt
Figure 3-19: AUC for the rocks-versus-mines classifier using ridge regression
Figure 3-20: Plot of actual versus prediction for the rocks-versus-mines classifier using ridge regression
(ipynb)Bowles_4.3_wine Listing 4-1: LARS Algorithm for Predicting Wine Taste—larsWine2.py Figure 4-3: Coefficient curves for LARS regression on wine data.
Listing 4-2: 10-Fold Cross-Validation to Determine Best Set of Coefficients—larsWineCV.py Figure 4-4: Cross-validated mean square error for LARS on wine data.
Listing 4-3: Glmnet Algorithm—glmnetWine.py Figure 4-6: Coefficient curves for glmnet models for predicting wine taste
(ipynb)Bowles_4.4_rocks_wine_abalone Listing 4-4: Converting a Classification Problem to an Ordinary Regression Problem by Assigning Numeric Values to Binary Labels Figure 4-7: Coefficient curves for rocks versus mines classification problem solved by converting to labels
Listing 4-5: Basis Expansion for Wine Taste Prediction Figure 4-8: Functions generated to expand wine attribute session
Listing 4-6: Coding Categorical Variable for Penalized Linear Regression - Abalone Data—larsAbalone.py
(ipynb)Bowles_5.2_wine Listing 5-1: Using Cross-Validation to Estimate Out-of-Sample Error with Lasso Modeling Wine Taste—wineLassoCV.py Figure 5-1: ... un-normalized Y
Figure 5-2: ... normalized Y
Figure 5-3: ... un-normalized X and Y
Listing 5-2: Lasso Training on Full Data Set—wineLassoCoefCurves.py Figure 5-4: Coefficient curves for Lasso trained to predict wine quality
Figure 5-5: Coefficient curves for Lasso trained on un-normalized Xs
Listing 5-3: Using Out-of-Sample Error to Evaluate New Attributes for Predicting Wine Quality—wineExpandedLassoCV.py Figure 5-6: Cross-validation error curves for Lasso trained on wine quality data with expanded feature set
(ipynb)Bowles_5.3_rocks Listing 5-4: Using ElasticNet Regression to Build a Binary (Two-Class) Classifier—rocksVMinesENetRegCV.py Figure 5-7: Out-of-sample classifier misclassification performance
Figure 5-8: Out-of-sample classifier AUC performance
Figure 5-9: Receiver operating characteristic for best performing classifier
(ipynb)Bowles_5.4_rocks Listing 5-5: Coefficient Trajectories for ElasticNet Trained on Rocks versus Mines Data—rocksVMinesCoefCurves.py Figure 5-10: Coefficient curves for ElasticNet trained on rocks versus mines data
Listing 5-6: Penalized Logistic Regression Trained on Rocks versus Mines Data—rocksVMinesGlmnet.py Figure 5-11: Coefficient curves for ElasticNet penalized logistic regression trained on rocks versus mines data
(ipynb)Bowles_5.5_glass Listing 5-7: Multiclass Classification with Penalized Linear Regression - Classifying Crime Scene Glass Samples—glassENetRegCV.py Figure 5-12: Misclassification error rates using penalized linear regression for glass classification
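The dsci_intro_2 notebook contrasts Python lists, np.ndarray, pd.Series and pd.DataFrame. Below is a minimal sketch, with made-up values, of the differences most likely to matter in the Bowles material. On a list, `+` concatenates and `* 2` repeats. On an array, arithmetic is element-wise. On a Series, values are first aligned by index label.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, 3.0]

# list: "* 2" repeats the elements instead of doubling them
doubled_list = values * 2            # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

# np.ndarray: arithmetic is element-wise and all elements share one dtype
arr = np.array(values)
doubled_arr = arr * 2                # array([2., 4., 6.])

# pd.Series: a 1-D array plus an index; arithmetic aligns on index labels
s = pd.Series(values, index=["a", "b", "c"])
t = pd.Series([10.0, 20.0], index=["b", "c"])
aligned = s + t                      # "a" has no partner in t, so it becomes NaN

# pd.DataFrame: a 2-D table whose columns are Series sharing one row index
df = pd.DataFrame({"x": values, "y": [4.0, 5.0, 6.0]})
col = df["x"]                        # selecting one column returns a pd.Series

print(doubled_list, doubled_arr, aligned, df.shape, type(col).__name__, sep="\n")
```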
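Listing 2-9 computes Pearson's correlation between pairs of attributes, and Listings 2-10 and 2-13 visualise a whole correlation matrix. Below is a minimal NumPy sketch of the coefficient itself. The helper `pearson` and the two toy columns are hypothetical and are not the sonar data. For a complete data set, pandas' `DataFrame.corr()` returns the Pearson correlation matrix that a heat map displays.

```python
import numpy as np

def pearson(x, y):
    """Pearson's r: sum((x - mean_x) * (y - mean_y)) divided by
    sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical toy values standing in for two attribute columns
attr_a = [0.02, 0.04, 0.03, 0.05, 0.07]
attr_b = [0.03, 0.05, 0.04, 0.06, 0.09]

r = pearson(attr_a, attr_b)
# Cross-check against NumPy's correlation matrix
assert np.isclose(r, np.corrcoef(attr_a, attr_b)[0, 1])
print(round(r, 4))
```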
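Equation 2-5 refers to a logit transform used for soft range compression when colouring the lines of the abalone parallel coordinate plot. The sketch below assumes the form 1 / (1 + exp(-z)), applied to the z-scored target. Statisticians call this the logistic function and usually reserve "logit" for its inverse, so check the notebook for the exact formula. Values far from the mean approach 0 or 1 but never leave the interval. The helper name and the ring counts are hypothetical.

```python
import numpy as np

def soft_compress(target):
    """z-score the values, then squash them into (0, 1) with 1 / (1 + exp(-z))."""
    t = np.asarray(target, dtype=float)
    z = (t - t.mean()) / t.std()
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical ring counts; 29 is an outlier
rings = np.array([7, 8, 9, 10, 11, 29])
colors = soft_compress(rings)
print(np.round(colors, 3))
```

The resulting values can be passed to a matplotlib colormap such as `plt.cm.RdYlBu`, which accepts floats in [0, 1].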
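Listing 3-1 compares MSE, MAE and RMSE. For errors e = target - prediction:

- MSE is the mean of e².
- MAE is the mean of |e|.
- RMSE is the square root of MSE, so it is in the units of the target.

Below is a minimal sketch with made-up numbers.

```python
import math

def error_measures(targets, predictions):
    """Return (MSE, MAE, RMSE) for paired targets and predictions."""
    errors = [t - p for t, p in zip(targets, predictions)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae, math.sqrt(mse)

# Made-up example values
targets = [3.0, -0.5, 2.0, 7.0]
predictions = [2.5, 0.0, 2.0, 8.0]
mse, mae, rmse = error_measures(targets, predictions)
print(mse, mae, round(rmse, 4))   # 0.375 0.5 0.6124
```

An error of size 1.0 contributes four times as much to the MSE as an error of 0.5, but only twice as much to the MAE. This is why MSE and RMSE react more strongly to large errors.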
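Listing 3-2 and Tables 3-2 and 3-3 show how the misclassification error and the cost of mistakes depend on the decision threshold. Below is a minimal sketch with these assumptions:

- Labels are coded 1 for a mine and 0 for a rock.
- A score above the threshold predicts a mine.
- The scores and the costs `FP_COST` and `FN_COST` are hypothetical.

Raising the threshold trades false positives for false negatives, so the cheapest threshold depends on which mistake costs more.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fn, fp, tn) for 0/1 labels when score > threshold predicts 1."""
    tp = fn = fp = tn = 0
    for label, score in zip(labels, scores):
        if score > threshold:
            if label == 1:
                tp += 1
            else:
                fp += 1
        else:
            if label == 1:
                fn += 1
            else:
                tn += 1
    return tp, fn, fp, tn

labels = [1, 1, 1, 0, 0, 0]             # hypothetical: 1 = mine, 0 = rock
scores = [0.9, 0.6, 0.4, 0.55, 0.2, 0.1]
FP_COST, FN_COST = 1.0, 10.0            # hypothetical costs of each mistake

for threshold in (0.25, 0.5, 0.75):
    tp, fn, fp, tn = confusion_counts(labels, scores, threshold)
    error_rate = (fp + fn) / len(labels)
    cost = fp * FP_COST + fn * FN_COST
    print(f"threshold={threshold:.2f} tp={tp} fn={fn} fp={fp} tn={tn} "
          f"error={error_rate:.3f} cost={cost:.1f}")
```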
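Figures 3-10, 3-11 and 3-19 show ROC curves and AUC values. AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half. That gives a short, dependency-free way to compute it. This pairwise version costs O(n_pos x n_neg) and is only meant for illustration; `sklearn.metrics.roc_auc_score` computes the same quantity efficiently. The scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC = fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]   # hypothetical: 1 = mine, 0 = rock
scores = [0.9, 0.6, 0.4, 0.55, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))          # 0.8889 (8 of 9 pairs ranked correctly)
```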
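Listing 3-3 applies forward stepwise regression to the wine data. Below is a minimal sketch of the greedy idea on synthetic data. It starts with no attributes and, at each step, adds the one that gives the lowest held-out RMSE together with those already chosen. Two choices are assumptions of this sketch, not necessarily what fwdStepwiseWine.py does: every third row is held out as the test set, and the fits use `np.linalg.lstsq`.

```python
import numpy as np

def forward_stepwise(X_train, y_train, X_test, y_test):
    """Greedy forward selection. At each step, add the attribute that gives the
    lowest test RMSE when fitted together with the attributes already chosen.
    Returns the attribute order and the RMSE after each step."""
    n_attr = X_train.shape[1]
    chosen, history = [], []
    for _ in range(n_attr):
        best_rmse, best_j = None, None
        for j in range(n_attr):
            if j in chosen:
                continue
            cols = chosen + [j]
            # Least-squares fit with an intercept column
            A = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
            B = np.column_stack([np.ones(len(X_test)), X_test[:, cols]])
            rmse = float(np.sqrt(np.mean((y_test - B @ coef) ** 2)))
            if best_rmse is None or rmse < best_rmse:
                best_rmse, best_j = rmse, j
        chosen.append(best_j)
        history.append(best_rmse)
    return chosen, history

# Synthetic data: the target depends mainly on columns 2 and 0
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 0] + 0.1 * rng.normal(size=300)

# Hold out every third row as the test set
test = np.arange(300) % 3 == 0
chosen, history = forward_stepwise(X[~test], y[~test], X[test], y[test])
print(chosen)
print(np.round(history, 3))
```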
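Listings 3-5 and 3-7 use ridge regression. It adds a penalty alpha * ||w||² to the squared error, which shrinks the coefficients towards zero as alpha grows. Below is a minimal closed-form sketch in NumPy on synthetic data; `ridge_fit` is a hypothetical helper. X and y are centred so that the intercept is not penalised. This is the same objective that scikit-learn's `Ridge` minimises with `fit_intercept=True`.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Minimise sum((y - b0 - X @ w)**2) + alpha * sum(w**2).
    X and y are centred so the intercept b0 is not penalised."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Normal equations with the ridge term: (Xc'Xc + alpha*I) w = Xc'yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b0 = y_mean - x_mean @ w
    return b0, w

# Synthetic data with known coefficients
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3 + 0.1 * rng.normal(size=100)

for alpha in (0.0, 10.0, 100.0):
    b0, w = ridge_fit(X, y, alpha)
    print(alpha, round(float(b0), 3), np.round(w, 3))
```

With alpha = 0 the result is ordinary least squares. Larger values of alpha shrink the norm of w.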
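Listing 4-2 uses 10-fold cross-validation to pick the best set of LARS coefficients. Below is a sketch of the fold mechanics, assuming row i goes to fold i mod 10. If the rows are sorted (for example by target), shuffle them first. To keep the example short, the model evaluated on each fold is plain least squares rather than LARS, and the data are synthetic.

```python
import numpy as np

def kfold_indices(n_rows, n_folds):
    """Yield (train_idx, test_idx) with row i assigned to fold i % n_folds."""
    idx = np.arange(n_rows)
    for k in range(n_folds):
        yield idx[idx % n_folds != k], idx[idx % n_folds == k]

def cv_mse(X, y, n_folds=10):
    """Average out-of-fold mean squared error of a least-squares fit."""
    fold_errors = []
    for train, test in kfold_indices(len(y), n_folds):
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(len(test)), X[test]])
        fold_errors.append(np.mean((y[test] - B @ coef) ** 2))
    return float(np.mean(fold_errors))

# Synthetic data with noise variance 0.25
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + 0.5 * rng.normal(size=120)
m = cv_mse(X, y)
print(round(m, 3))
```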
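Listing 4-4 turns rocks versus mines into an ordinary regression problem by assigning numeric values to the binary labels. Below is a minimal sketch with these assumptions:

- Mines are coded 1.0 and rocks 0.0.
- A fitted value above 0.5 is read as a mine.
- The single attribute and its values are hypothetical.

```python
import numpy as np

# Hypothetical labels and one attribute value per example
labels_text = ["M", "R", "M", "R", "R", "M"]
x = np.array([0.9, 0.2, 0.7, 0.1, 0.3, 0.8])

# Code the binary labels numerically: mine -> 1.0, rock -> 0.0
y = np.array([1.0 if lab == "M" else 0.0 for lab in labels_text])

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(len(x)), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Read a fitted value above 0.5 as "M", otherwise "R"
scores = A @ coef
predicted = ["M" if s > 0.5 else "R" for s in scores]
print(np.round(scores, 3))
print(predicted)
```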
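Listing 4-6 codes the categorical Sex attribute of the abalone data (M, F, I) numerically before penalized linear regression. The sketch below shows one common scheme, not necessarily the exact coding in larsAbalone.py. It uses one 0/1 column per level except a baseline, here I (infant). Dropping one level matters because three indicator columns always sum to one, which makes them collinear with the intercept. `code_sex` and the numeric values are hypothetical.

```python
def code_sex(sex_values):
    """Code M/F/I as two 0/1 columns [is_M, is_F]; I (infant) is the baseline."""
    coded = []
    for sex in sex_values:
        coded.append([1.0 if sex == "M" else 0.0, 1.0 if sex == "F" else 0.0])
    return coded

# Hypothetical rows: Sex followed by two numeric attributes
rows = [["M", 0.50, 0.40], ["F", 0.55, 0.43], ["I", 0.30, 0.22]]
codes = code_sex([r[0] for r in rows])
X = [code + row[1:] for code, row in zip(codes, rows)]
print(X)
```

Any unexpected value (for example a lowercase "m") would silently be coded like the baseline, so it is worth validating the levels first.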
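Listing 5-1 and Figures 5-1 to 5-3 compare Lasso cross-validation results with and without normalization of X and Y. Normalizing usually means z-scoring each column to mean 0 and standard deviation 1. Because the penalty treats all coefficients alike, this stops an attribute's units from deciding how strongly it is penalized. Below is a minimal sketch with hypothetical data. New rows should be scaled with the training means and standard deviations, and the same applies to the target. A zero-variance column would cause a division by zero, which the sketch does not handle.

```python
import numpy as np

def normalize_columns(X):
    """Return z-scored columns plus the means and standard deviations used."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

# Hypothetical attributes on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
X_norm, mean, std = normalize_columns(X_train)

# New rows are scaled with the training statistics, not their own
x_new = np.array([[2.0, 250.0]])
x_new_norm = (x_new - mean) / std
print(np.round(X_norm, 3))
print(np.round(x_new_norm, 3))
```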