The Anatomy of a New Problem

Different Types of Attributes and Labels Drive Modeling Choices

  • numeric (including ordinal)
  • categorical
    • no order relation among the values
  • When the labels are numeric, the problem is called a regression problem. When the labels are categorical, the problem is called a classification problem. If the categorical target takes only two values, the problem is called a binary classification problem. If it takes more than two values, the problem is called a multiclass classification problem.
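The definitions above amount to a simple decision rule. Below is a minimal sketch of that rule in Python. `problem_type` is a hypothetical helper, and it assumes that a label's Python type (int or float versus anything else) reflects whether the label is numeric. Real data often stores categories as integer codes such as 0 and 1, so in practice the choice should follow what the label means, not how it happens to be stored.

```python
def problem_type(labels):
    """Name the problem type implied by a sequence of labels.

    Hypothetical helper. Assumption: int and float labels are numeric;
    anything else (strings, booleans, ...) is treated as categorical.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("no labels given")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(y, (int, float)) and not isinstance(y, bool) for y in labels):
        return "regression"
    n_values = len(set(labels))
    if n_values == 2:
        return "binary classification"
    if n_values > 2:
        return "multiclass classification"
    raise ValueError("a categorical target must take at least two values")


print(problem_type([1.5, 2.0, 3.7]))       # regression
print(problem_type(["yes", "no", "yes"]))  # binary classification
print(problem_type(["a", "b", "c"]))       # multiclass classification
```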

Things to Notice about Your New Data Set

  • Items to Check
    • Number of rows and columns
    • Number of categorical variables and number of unique values for each
    • Number of missing values
    • Summary statistics for attributes and labels
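The checklist above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a prescribed procedure: `check_data_set` and `MISSING_MARKERS` are hypothetical names, the data is assumed to arrive as equal-length rows of strings (for example from `csv.reader`), the strings "", "?" and "NA" are assumed to mark missing values, and a column counts as numeric only when every non-missing value parses as a float.

```python
import statistics

# Assumption: these strings mark a missing value; real data sets vary.
MISSING_MARKERS = frozenset({"", "?", "NA"})


def to_float(value):
    """Return value parsed as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def check_data_set(rows, missing_markers=MISSING_MARKERS):
    """Run the checks listed above on rows of strings (hypothetical helper).

    Assumes every row has the same number of fields. A column is treated
    as numeric when every non-missing value parses as a float; otherwise
    it is treated as categorical.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    report = {"rows": n_rows, "columns": n_cols, "missing": 0,
              "categorical": {}, "numeric": {}}
    for j in range(n_cols):
        present = [row[j] for row in rows if row[j].strip() not in missing_markers]
        report["missing"] += n_rows - len(present)
        if not present:
            continue  # an all-missing column is left out of both summaries
        numbers = [to_float(v) for v in present]
        if all(x is not None for x in numbers):
            # Numeric column: summary statistics over the non-missing values.
            report["numeric"][j] = {
                "mean": statistics.mean(numbers),
                "stdev": statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
                "min": min(numbers),
                "max": max(numbers),
            }
        else:
            # Categorical column: record how many distinct values it takes.
            report["categorical"][j] = len(set(present))
    return report


rows = [
    ["5.1", "red", "0.2"],
    ["4.9", "white", "?"],
    ["6.3", "red", "0.4"],
]
report = check_data_set(rows)
print(report["rows"], report["columns"], report["missing"])      # 3 3 1
print(report["categorical"])                                     # {1: 2}
print(report["numeric"][0]["min"], report["numeric"][0]["max"])  # 4.9 6.3
```

The standard library keeps the sketch dependency-free; with pandas, the same checks map roughly onto `DataFrame.shape`, `isna().sum()`, `nunique()` and `describe()`.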