Introduction to CRISP-DM
Source: CRISP-DM 1.0, Step-by-step data mining guide. Pete Chapman (NCR), Julian Clinton (SPSS), Randy Kerber (NCR), Thomas Khabaza (SPSS), Thomas Reinartz (DaimlerChrysler), Colin Shearer (SPSS), and Rüdiger Wirth (DaimlerChrysler). © 2000 SPSS Inc. CRISPMWP-1104
2 Data mining problem types
Problem type | Appropriate technique | Example |
2.1 Data description and summarization | | Iris: attributes > pandas describe |
2.2 Segmentation | Clustering techniques | Iris kNN 3D |
2.2 Segmentation | Neural networks | |
2.2 Segmentation | Visualization | |
2.3 Concept descriptions | Rule induction methods | |
2.3 Concept descriptions | Conceptual clustering | |
2.4 Classification | Discriminant analysis | |
2.4 Classification | Rule induction methods | CRISP-DM, p. 68, bottom: If SEX = male and AGE > 51 then CUSTOMER = loyal ... |
2.4 Classification | Decision tree learning | Iris decision tree |
2.4 Classification | Neural networks | |
2.4 Classification | K nearest neighbor | https://medium.com/@srishtisawla/k-nearest-neighbors-f77f6ee6b7f5 |
2.4 Classification | Case-based reasoning | |
2.4 Classification | Genetic algorithms | |
2.5 Prediction | Regression analysis | Analytics Vidhya: ridge, lasso > regression line |
2.5 Prediction | Regression trees | |
2.5 Prediction | Neural networks | |
2.5 Prediction | K nearest neighbor | |
2.5 Prediction | Box-Jenkins methods | |
2.5 Prediction | Genetic algorithms | |
2.6 Dependency analysis | Correlation analysis | https://en.wikipedia.org/wiki/Correlation_and_dependence > examples |
2.6 Dependency analysis | Regression analysis | |
2.6 Dependency analysis | Association rules | KDnuggets > grocery transactions |
2.6 Dependency analysis | Bayesian networks | http://users.sussex.ac.uk/~christ/crs/kr-ist/lec09a.html > Reasoning as propagation |
2.6 Dependency analysis | Inductive logic programming | |
2.6 Dependency analysis | Visualization techniques | Tableau: Market Basket Analysis / Heatmap |
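The rule quoted for rule induction methods (If SEX = male and AGE > 51 then CUSTOMER = loyal) shows the kind of model such a method produces. A minimal sketch of applying one such rule follows. The field names and the sample customers are hypothetical, and the fallback label "unknown" is an assumption, since the quoted rule only states when a customer is loyal.

```python
# Applying a single induced classification rule to customer records.
# Field names ("sex", "age") and the sample records are hypothetical.

def classify_customer(record):
    """Return "loyal" if the rule fires, otherwise "unknown"."""
    # Rule from the guide: If SEX = male and AGE > 51 then CUSTOMER = loyal
    if record["sex"] == "male" and record["age"] > 51:
        return "loyal"
    # Assumption: the rule makes no statement about all other customers.
    return "unknown"


customers = [
    {"sex": "male", "age": 60},
    {"sex": "male", "age": 40},
    {"sex": "female", "age": 70},
]

labels = [classify_customer(c) for c in customers]
print(labels)  # ['loyal', 'unknown', 'unknown']
```

A real rule induction method would learn many such rules from data; this sketch only shows how one learned rule is evaluated.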
Decision tree for selecting an algorithm: scikit-learn.org > scikit-learn algorithm cheat sheet
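Once a technique has been selected, for instance decision tree learning on the Iris data as listed under 2.4 Classification, it can be tried out in a few lines of scikit-learn. The following is a sketch only: the 70/30 split, `max_depth=3`, and `random_state=0` are assumptions chosen for illustration and are not taken from the guide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load the Iris data set bundled with scikit-learn.
iris = load_iris()

# Hold out 30% of the records for testing (assumed split, stratified by class).
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Fit a shallow tree so the learned rules stay readable (assumed depth limit).
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Mean accuracy on the held-out records.
accuracy = tree.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Show the learned tree as nested if/else conditions on the attributes.
print(export_text(tree, feature_names=list(iris.feature_names)))
```

The printed tree reads much like the induced rules mentioned for classification: each path from the root to a leaf is one condition chain ending in a class.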
Phases, Tasks, Outputs
Figure 3: Generic tasks (bold) and outputs (italic) of the CRISP-DM reference model (CRISP-DM, p. 12)
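Phase | Generic task | Output |
Business Understanding | Determine Business Objectives | Background |
Business Understanding | Determine Business Objectives | Business Objectives |
Business Understanding | Determine Business Objectives | Business Success Criteria |
Business Understanding | Assess Situation | Inventory of Resources |
Business Understanding | Assess Situation | Requirements, Assumptions, and Constraints |
Business Understanding | Assess Situation | Risks and Contingencies |
Business Understanding | Assess Situation | Terminology |
Business Understanding | Assess Situation | Costs and Benefits |
Business Understanding | Determine Data Mining Goals | Data Mining Goals |
Business Understanding | Determine Data Mining Goals | Data Mining Success Criteria |
Data Understanding | Collect Initial Data | Initial Data Collection Report |
Data Understanding | Describe Data | Data Description Report |
Data Understanding | Explore Data | Data Exploration Report |
Data Understanding | Verify Data Quality | Data Quality Report |
Data Preparation | Select Data | Rationale for Inclusion/Exclusion |
Data Preparation | Clean Data | Data Cleaning Report |
Data Preparation | Construct Data | Derived Attributes |
Data Preparation | Construct Data | Generated Records |
Data Preparation | Integrate Data | Merged Data |
Data Preparation | Format Data | Reformatted Data |
Data Preparation | | Dataset |
Data Preparation | | Dataset Description |
Modeling | Select Modeling Techniques | Modeling Technique |
Modeling | Select Modeling Techniques | Modeling Assumptions |
Modeling | Generate Test Design | Test Design |
Modeling | Build Model | Parameter Settings |
Modeling | Build Model | Models |
Modeling | Build Model | Model Descriptions |
Modeling | Assess Model | Model Assessment |
Modeling | Assess Model | Revised Parameter Settings |
Evaluation | Evaluate Results | Assessment of Data Mining Results w.r.t. Business Success Criteria |
Evaluation | Evaluate Results | Approved Models |
Evaluation | Review Process | Review of Process |
Evaluation | Determine Next Steps | List of Possible Actions |
Evaluation | Determine Next Steps | Decision |
Deployment | Plan Deployment | Deployment Plan |
Deployment | Plan Monitoring and Maintenance | Monitoring and Maintenance Plan |
Deployment | Produce Final Report | Final Report |
Deployment | Produce Final Report | Final Presentation |
Deployment | Review Project | Experience Documentation |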
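To make one of these outputs concrete, the following sketch produces a very small Data Quality Report for the Verify Data Quality task by counting missing values per attribute. The records, the attribute names, and the convention that `None` marks a missing value are all hypothetical; a real report would cover further checks.

```python
from collections import Counter

def missing_value_report(records):
    """Count missing (None) values per attribute across all records."""
    missing = Counter()
    attributes = set()
    for record in records:
        attributes.update(record)  # collect every attribute name seen
        for name, value in record.items():
            if value is None:
                missing[name] += 1
    # List attributes without missing values too, so the report is complete.
    return {name: missing[name] for name in sorted(attributes)}


# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "sex": "female", "income": 52000},
    {"age": None, "sex": "male", "income": 48000},
    {"age": 51, "sex": None, "income": None},
    {"age": None, "sex": "male", "income": 61000},
]

report = missing_value_report(records)
for name, count in report.items():
    print(f"{name}: {count} of {len(records)} values missing")
```

Attributes with many missing values are candidates for the Clean Data task in the next phase.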