Chapter 1: Two Essential Algorithms

Why Are These Two Algorithms So Useful?

Algorithms

  • function approximation problems
    • linear regression
      • penalized linear regression methods
    • logistic regression
    • ensemble methods
      • boosted decision tree (denoted by BSTDT in Table 1-3)
      • Random Forests (denoted by RF in Table 1-3)
  • k nearest neighbors (KNNs)
  • artificial neural nets (ANNs)
  • support vector machines (SVMs)

Studies

  • Caruana, Rich, and Alexandru Niculescu‐Mizil (2006)
    • The reason for that is that these data sets have few attributes (at most 200) relative to examples (5,000 in each data set). There's plenty of data to resolve a model with so few attributes, and yet the training sets are small enough that the training time is not excessive.
  • Caruana, Rich, Nikos Karampatziakis, and Ainur Yessenalina (2008)
    • how do these algorithms compare on big data?
    • genomic problems have several tens of thousands of attributes (one attribute per gene)
    • text mining problems can have millions of attributes (one attribute per distinct word or per distinct pair of words)
    • Linear (logistic) regression is in the top three for 5 of the 11 test cases used in the study
    • The study demonstrates that penalized linear regression can provide the best answers available in many cases and be near the top even in cases where it is not the best.

JB: Distinguish between problems and algorithms: that is an n:m mapping. Spoiler warning: a regression algorithm can also be used to solve classification problems.
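
A minimal sketch of that spoiler, assuming scikit-learn and a synthetic dataset: code the class labels as 0/1, fit an ordinary regression model, and threshold its continuous output to get class predictions.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LinearRegression

    # Synthetic binary classification data; labels are already 0/1.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    reg = LinearRegression().fit(X, y)    # treat the 0/1 labels as numbers
    scores = reg.predict(X)               # continuous predictions
    labels = (scores > 0.5).astype(int)   # threshold turns them into classes

    print("training accuracy:", (labels == y).mean())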

Summary (p. 5)

  • predictive performance: ok
  • training speed
    • particularly early in development when iterations are required to home in on the best approach
  • prediction speed
    • quickly enough for high‐speed trading or Internet ad insertions
  • easy to use
    • few tunable (hyper)parameters
    • well-defined, well-structured input types
  • solve several generic problem types
  • feature selection
    • indicate variable relevance
    • ranking included
  • It is not unusual to be able to arrange the input data and generate a first trained model and performance predictions within an hour or two of starting a new problem.

The algorithms discussed in this book have the beneficial property of providing metrics on the utility of each attribute in producing predictions. (17)
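
A hedged illustration of those metrics, using scikit-learn on synthetic data: penalized regression exposes coefficient magnitudes, Random Forests expose feature_importances_, and either can be sorted into a relevance ranking.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Lasso

    X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                           random_state=1)

    lasso = Lasso(alpha=1.0).fit(X, y)
    forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)

    # Rank attributes by each model's utility metric (most relevant first).
    print("Lasso ranking:", np.argsort(-np.abs(lasso.coef_)))
    print("RF ranking:   ", np.argsort(-forest.feature_importances_))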

Terms to explain

  • predictive model
  • function approximation
  • binary classification
  • attribute
  • positive example
  • classification problem
  • regression problem
  • unbalanced
  • feature selection
  • feature engineering

1.2 What Are Penalized Regression Methods?

Terms

  • ordinary least squares (OLS) regression
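
A minimal sketch of the difference, assuming scikit-learn and synthetic data: OLS minimizes squared error alone, while the penalized variants add a penalty on coefficient size (alpha sets its strength), which shrinks coefficients and, for Lasso, zeroes some out entirely.

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, LinearRegression, Ridge

    # Few examples relative to attributes: the setting where penalties help.
    X, y = make_regression(n_samples=50, n_features=20, noise=5.0,
                           random_state=0)

    ols = LinearRegression().fit(X, y)   # squared error only
    ridge = Ridge(alpha=1.0).fit(X, y)   # + alpha * sum of squared coefficients
    lasso = Lasso(alpha=1.0).fit(X, y)   # + alpha * sum of absolute coefficients

    print("nonzero OLS coefficients:  ", (ols.coef_ != 0).sum())
    print("nonzero Ridge coefficients:", (ridge.coef_ != 0).sum())
    print("nonzero Lasso coefficients:", (lasso.coef_ != 0).sum())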

1.3 What Are Ensemble Methods?

combine base learners

  • e.g. binary decision trees

bagging (bootstrap aggregation): train each base learner on a random subset of the data

"""The trick is how to generate large numbers of independent models, particularly if they are all using the same base learner."""

handles the instability of the base learner
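
A sketch of bagging under these assumptions (scikit-learn, synthetic data): many binary decision trees, each trained on a random bootstrap sample, whose aggregated votes smooth out the instability of the individual trees.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    bagger = BaggingClassifier(
        estimator=DecisionTreeClassifier(),  # the unstable base learner
        n_estimators=100,                    # number of models to aggregate
        bootstrap=True,                      # each tree sees a random resample
        random_state=0,
    ).fit(X, y)

    print("training accuracy:", bagger.score(X, y))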

1.4 How to Decide Which Algorithm to Use (p.11)

series of experiments (11)

Penalized linear regression will generally be faster than an ensemble method, and the time difference can be a material factor in the development process.

The basic idea is that on problems that are not complex, or for which sufficient data are not available, linear methods may achieve better overall performance than more complicated ensemble methods.
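
One way to run that series of experiments, sketched with scikit-learn on synthetic data: cross-validate a penalized linear model and an ensemble side by side and time both, so the speed/performance trade-off becomes visible.

    import time
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=1000, n_features=30, noise=10.0,
                           random_state=0)

    for name, model in [("LassoCV", LassoCV(cv=5)),
                        ("RandomForest",
                         RandomForestRegressor(n_estimators=200))]:
        start = time.perf_counter()
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        elapsed = time.perf_counter() - start
        print(f"{name}: mean R^2 = {scores.mean():.3f}, time = {elapsed:.1f}s")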

1.5 The Process Steps for Building a Predictive Model

p14.3

develop a set of features

train model, estimate performance

Improve the process cyclically (one cycle sketched after this list)

  • pull out the examples that show the worst performance
  • add other features
  • """bifurcate the data and train different models on different populations

Framing a Machine Learning Problem

Fig 1-5: "What does "better" mean?

Feature Extraction and Feature Engineering

Feature engineering is the process of manipulating and combining features to arrive at more informative ones.

NOTE Data preparation and feature engineering are estimated to take 80 to 90 percent of the time required to develop a machine learning model.
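
A tiny sketch of feature engineering; the column names here are hypothetical, not from the book: two raw measurements are combined into a ratio that is often more informative than either alone.

    import pandas as pd

    # Hypothetical raw features for some trips.
    df = pd.DataFrame({"distance_km": [12.0, 3.5, 40.0],
                       "duration_h": [0.4, 0.1, 1.1]})

    # Derived feature: average speed, a combination of the two raw columns.
    df["avg_speed_kmh"] = df["distance_km"] / df["duration_h"]
    print(df)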

Determining Performance of a Trained Model

systematic ways to hold out data

BEWARE: """ “leak” [data] into the training process