Introduction to CRISP-DM

Source: CRISP-DM 1.0: Step-by-step data mining guide. Pete Chapman (NCR), Julian Clinton (SPSS), Randy Kerber (NCR), Thomas Khabaza (SPSS), Thomas Reinartz (DaimlerChrysler), Colin Shearer (SPSS), and Rüdiger Wirth (DaimlerChrysler). © 2000 SPSS Inc. CRISPMWP-1104

2 Data mining problem types

| Problem type | Appropriate technique | Example |
|---|---|---|
| 2.1 Data description and summarization | | Iris: attributes > pandas describe |
| 2.2 Segmentation | Clustering techniques | Iris kNN 3D |
| | Neural networks | |
| 2.3 Concept descriptions | Rule induction methods | |
| | Conceptual clustering | |
| 2.4 Classification | Discriminant analysis | |
| | Rule induction methods | CRISP-DM, p. 68, bottom: If SEX = male and AGE > 51 then CUSTOMER = loyal ... |
| | Decision tree learning | Iris decision tree |
| | Neural networks | |
| | K nearest neighbor | https://medium.com/@srishtisawla/k-nearest-neighbors-f77f6ee6b7f5 |
| | Case-based reasoning | |
| | Genetic algorithms | |
| 2.5 Prediction | Regression analysis | Analytics Vidhya ridge lasso > regression line |
| | Regression trees | |
| | Neural networks | |
| | K nearest neighbor | |
| | Box-Jenkins methods | |
| | Genetic algorithms | |
| 2.6 Dependency analysis | Correlation analysis | https://en.wikipedia.org/wiki/Correlation_and_dependence > Examples |
| | Regression analysis | |
| | Association rules | KDnuggets > grocery transactions |
| | Bayesian networks | http://users.sussex.ac.uk/~christ/crs/kr-ist/lec09a.html > Reasoning as propagation |
| | Inductive logic programming | |
| | Visualization techniques | Tableau: Market Basket Analysis / Heatmap |
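
The Iris pointers in the table above (pandas describe for 2.1, correlation analysis for 2.6) can be tried out in a few lines. The following is a minimal sketch, assuming pandas and scikit-learn are installed and that the copy of Iris bundled with scikit-learn is an acceptable stand-in for the dataset meant here; the variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris as a DataFrame: four numeric attributes plus the class label.
# Assumption: scikit-learn's bundled copy of the dataset is used.
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"target": "species"})
attributes = df.drop(columns="species")

# 2.1 Data description and summarization:
# count, mean, std, min, quartiles, and max per attribute.
summary = attributes.describe()
print(summary)

# 2.6 Dependency analysis:
# pairwise Pearson correlation between the numeric attributes.
corr = attributes.corr()
print(corr.round(2))
```

The summary gives a first impression of ranges and spread per attribute, while the correlation matrix shows which attributes move together, for example petal length and petal width.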

Decision tree for choosing algorithms: scikit-learn.org > scikit-learn algorithm cheat sheet
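
Whichever branch of the cheat sheet applies, the candidates it suggests can also be compared empirically. The following is a minimal sketch, assuming scikit-learn and its bundled Iris data, with two techniques from the classification row (2.4). The hyperparameters `max_depth=3` and `n_neighbors=5` and the 5-fold setup are arbitrary illustrative choices, not values from CRISP-DM.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Feature matrix and class labels of the Iris dataset.
X, y = load_iris(return_X_y=True)

# Two candidate techniques for problem type 2.4 (classification).
# The hyperparameter values are illustrative assumptions.
candidates = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "k nearest neighbor": KNeighborsClassifier(n_neighbors=5),
}

# Mean accuracy over 5 cross-validation folds per candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Cross-validation keeps the comparison fair because every candidate is scored on the same held-out folds instead of on the data it was trained on.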

Phases, Tasks, Outputs

Figure 3: Generic tasks (bold) and outputs (italic) of the CRISP-DM reference model (CRISP-DM, p. 12)

| Phase | Generic task | Outputs |
|---|---|---|
| Business Understanding | Determine Business Objectives | Background; Business Objectives; Business Success Criteria |
| | Assess Situation | Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Costs and Benefits |
| | Determine Data Mining Goals | Data Mining Goals; Data Mining Success Criteria |
| Data Understanding | Collect Initial Data | Initial Data Collection Report |
| | Describe Data | Data Description Report |
| | Explore Data | Data Exploration Report |
| | Verify Data Quality | Data Quality Report |
| Data Preparation | Select Data | Rationale for Inclusion/Exclusion |
| | Clean Data | Data Cleaning Report |
| | Construct Data | Derived Attributes; Generated Records |
| | Integrate Data | Merged Data |
| | Format Data | Reformatted Data |
| | (phase-level outputs) | Dataset; Dataset Description |
| Modeling | Select Modeling Techniques | Modeling Technique; Modeling Assumptions |
| | Generate Test Design | Test Design |
| | Build Model | Parameter Settings; Model Descriptions |
| | Assess Model | Model Assessment; Revised Parameter Settings |
| Evaluation | Evaluate Results | Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models |
| | Review Process | Review of Process |
| | Determine Next Steps | List of Possible Actions |
| Deployment | Plan Deployment | Deployment Plan |
| | Plan Monitoring and Maintenance | Monitoring and Maintenance Plan |
| | Produce Final Report | Final Report; Final Presentation |
| | Review Project | Experience Documentation |