Introduction to CRISP-DM
Source: CRISP-DM 1.0: Step-by-step data mining guide. Pete Chapman (NCR), Julian Clinton (SPSS), Randy Kerber (NCR), Thomas Khabaza (SPSS), Thomas Reinartz (DaimlerChrysler), Colin Shearer (SPSS) and Rüdiger Wirth (DaimlerChrysler). © 2000 SPSS Inc. CRISPMWP-1104.
2 Data mining problem types
| Problem type | Appropriate technique | Example / reference |
|---|---|---|
| 2.1 Data description and summarization | | Iris dataset: attributes > pandas describe() |
| 2.2 Segmentation | Clustering techniques | Iris kNN 3D |
| | Neural networks | |
| | Visualization | |
| 2.3 Concept descriptions | Rule induction methods | |
| | Conceptual clustering | |
| 2.4 Classification | Discriminant analysis | |
| | Rule induction methods | CRISP-DM, p. 68 (bottom): If SEX = male and AGE > 51 then CUSTOMER = loyal ... |
| | Decision tree learning | Iris decision tree |
| | Neural networks | |
| | K nearest neighbor | https://medium.com/@srishtisawla/k-nearest-neighbors-f77f6ee6b7f5 |
| | Case-based reasoning | |
| | Genetic algorithms | |
| 2.5 Prediction | Regression analysis | Analytics Vidhya: ridge and lasso regression > regression line |
| | Regression trees | |
| | Neural networks | |
| | K nearest neighbor | |
| | Box-Jenkins methods | |
| | Genetic algorithms | |
| 2.6 Dependency analysis | Correlation analysis | https://en.wikipedia.org/wiki/Correlation_and_dependence > Examples |
| | Regression analysis | |
| | Association rules | KDnuggets > grocery transactions |
| | Bayesian networks | http://users.sussex.ac.uk/~christ/crs/kr-ist/lec09a.html > Reasoning as propagation |
| | Inductive logic programming | |
| | Visualization techniques | Tableau: Market Basket Analysis / Heatmap |
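The association-rules row under dependency analysis depends on two measures. Support is the share of transactions containing an itemset, and confidence is support(X and Y) / support(X) for a rule X -> Y. The following is a minimal sketch in plain Python. It assumes a small basket list that is invented for illustration (these are not the KDnuggets grocery data) and uses hypothetical thresholds (min_support = 0.4, min_confidence = 0.7). To stay short, it checks only one-item -> one-item rules by brute force. A real project would use an Apriori or FP-growth implementation instead.

```python
from itertools import combinations

# Hypothetical grocery transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent | consequent) / support(antecedent), i.e. P(Y | X)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force all one-item -> one-item rules passing both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare for either rule direction
        for x, y in ((a, b), (b, a)):
            conf = confidence({x}, {y}, transactions)
            if conf >= min_confidence:
                rules.append((x, y, pair_support, conf))
    return rules


if __name__ == "__main__":
    for x, y, sup, conf in pair_rules(TRANSACTIONS):
        print(f"{x} -> {y}: support={sup:.2f}, confidence={conf:.2f}")
```

Support and confidence can also be applied to classification rules such as the CRISP-DM example "If SEX = male and AGE > 51 then CUSTOMER = loyal". There, they measure how often the rule applies and how often it is right when it does.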
Decision tree for choosing an algorithm: scikit-learn.org > scikit-learn algorithm cheat sheet
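The cheat sheet is a rule-of-thumb flowchart. A complementary, empirical check is to score a few candidates from the classification row (2.4) on the same data with cross-validation. The following is a minimal sketch that assumes scikit-learn is installed. The Iris dataset, 5-fold cross-validated accuracy, and k = 5 neighbors are illustrative choices only; they do not come from CRISP-DM or from the cheat sheet.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(cv=5, random_state=0):
    """Return the mean cross-validated accuracy of a few classification techniques."""
    X, y = load_iris(return_X_y=True)
    candidates = {
        # Discriminant analysis
        "linear discriminant analysis": LinearDiscriminantAnalysis(),
        # Decision tree learning; a fixed seed makes tie-breaking between splits reproducible
        "decision tree": DecisionTreeClassifier(random_state=random_state),
        # k nearest neighbor; distances are scale-sensitive, so standardize first
        "k nearest neighbor (k=5)": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
    }
    # For classifiers, an integer cv selects stratified k-fold without shuffling,
    # so every candidate is scored on the same folds on every run.
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in candidates.items()
    }


if __name__ == "__main__":
    ranked = sorted(compare_classifiers().items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name:28s} {score:.3f}")
```

A higher score on a single dataset is only evidence for one candidate. The comparison becomes meaningful once it is tied to the test design and the success criteria defined in the project.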
Phases, Tasks, Outputs
Figure 3: Generic tasks (bold) and outputs (italic) of the CRISP-DM reference model (CRISP-DM, p. 12)
| Phase | Generic task | Output |
|---|---|---|
| Business Understanding | Determine Business Objectives | Background |
| | | Business Objectives |
| | | Business Success Criteria |
| | Assess Situation | Inventory of Resources |
| | | Requirements, Assumptions, and Constraints |
| | | Risks and Contingencies |
| | | Terminology |
| | | Costs and Benefits |
| | Determine Data Mining Goals | Data Mining Goals |
| | | Data Mining Success Criteria |
| | Produce Project Plan | Project Plan |
| | | Initial Assessment of Tools and Techniques |
| Data Understanding | Collect Initial Data | Initial Data Collection Report |
| | Describe Data | Data Description Report |
| | Explore Data | Data Exploration Report |
| | Verify Data Quality | Data Quality Report |
| Data Preparation | Select Data | Rationale for Inclusion/Exclusion |
| | Clean Data | Data Cleaning Report |
| | Construct Data | Derived Attributes |
| | | Generated Records |
| | Integrate Data | Merged Data |
| | Format Data | Reformatted Data |
| | (phase outputs) | Dataset |
| | | Dataset Description |
| Modeling | Select Modeling Techniques | Modeling Technique |
| | | Modeling Assumptions |
| | Generate Test Design | Test Design |
| | Build Model | Parameter Settings |
| | | Models |
| | | Model Descriptions |
| | Assess Model | Model Assessment |
| | | Revised Parameter Settings |
| Evaluation | Evaluate Results | Assessment of Data Mining Results w.r.t. Business Success Criteria |
| | | Approved Models |
| | Review Process | Review of Process |
| | Determine Next Steps | List of Possible Actions |
| | | Decision |
| Deployment | Plan Deployment | Deployment Plan |
| | Plan Monitoring and Maintenance | Monitoring and Maintenance Plan |
| | Produce Final Report | Final Report |
| | | Final Presentation |
| | Review Project | Experience Documentation |
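The reference model can serve as a project checklist if it is encoded as a nested mapping (phase -> generic task -> outputs). The following is a minimal sketch with a hypothetical layout and hypothetical helper names (`REFERENCE_MODEL`, `checklist`, `producer_of`); none of these are part of CRISP-DM. "Produce Project Plan" is included because Figure 3 of the guide lists it. Dataset and Dataset Description are outputs of the Data Preparation phase as a whole, so they are filed under a pseudo-task "(phase outputs)".

```python
# Generic tasks and outputs of the CRISP-DM reference model (Figure 3, p. 12).
# The nested-dict layout is a hypothetical choice for this sketch.
REFERENCE_MODEL = {
    "Business Understanding": {
        "Determine Business Objectives": [
            "Background", "Business Objectives", "Business Success Criteria",
        ],
        "Assess Situation": [
            "Inventory of Resources",
            "Requirements, Assumptions, and Constraints",
            "Risks and Contingencies",
            "Terminology",
            "Costs and Benefits",
        ],
        "Determine Data Mining Goals": ["Data Mining Goals", "Data Mining Success Criteria"],
        # Listed in Figure 3 of the guide.
        "Produce Project Plan": ["Project Plan", "Initial Assessment of Tools and Techniques"],
    },
    "Data Understanding": {
        "Collect Initial Data": ["Initial Data Collection Report"],
        "Describe Data": ["Data Description Report"],
        "Explore Data": ["Data Exploration Report"],
        "Verify Data Quality": ["Data Quality Report"],
    },
    "Data Preparation": {
        "Select Data": ["Rationale for Inclusion/Exclusion"],
        "Clean Data": ["Data Cleaning Report"],
        "Construct Data": ["Derived Attributes", "Generated Records"],
        "Integrate Data": ["Merged Data"],
        "Format Data": ["Reformatted Data"],
        # Outputs of the whole phase rather than of a single task.
        "(phase outputs)": ["Dataset", "Dataset Description"],
    },
    "Modeling": {
        "Select Modeling Techniques": ["Modeling Technique", "Modeling Assumptions"],
        "Generate Test Design": ["Test Design"],
        "Build Model": ["Parameter Settings", "Models", "Model Descriptions"],
        "Assess Model": ["Model Assessment", "Revised Parameter Settings"],
    },
    "Evaluation": {
        "Evaluate Results": [
            "Assessment of Data Mining Results w.r.t. Business Success Criteria",
            "Approved Models",
        ],
        "Review Process": ["Review of Process"],
        "Determine Next Steps": ["List of Possible Actions", "Decision"],
    },
    "Deployment": {
        "Plan Deployment": ["Deployment Plan"],
        "Plan Monitoring and Maintenance": ["Monitoring and Maintenance Plan"],
        "Produce Final Report": ["Final Report", "Final Presentation"],
        "Review Project": ["Experience Documentation"],
    },
}


def checklist(model):
    """Flatten the model into (phase, task, output) rows in document order."""
    return [
        (phase, task, output)
        for phase, tasks in model.items()
        for task, outputs in tasks.items()
        for output in outputs
    ]


def producer_of(output, model):
    """Return the (phase, task) pair that lists `output`, or None if absent."""
    for phase, task, out in checklist(model):
        if out == output:
            return phase, task
    return None


if __name__ == "__main__":
    for phase, task, output in checklist(REFERENCE_MODEL):
        print(f"[ ] {phase} / {task}: {output}")
```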