Kaggle Learn (KL)

To work online only:

  • Create a (more or less) anonymous account at Kaggle.

  • Find the courses that are relevant to our module IM 870 “Data Science” directly at Kaggle Learn.

To work offline in the dsci-lab:

  • Change directory to /home/dsci/b/KaggleLearn/.

  • Find the Kaggle notebooks in the corresponding subdirectories.

The local ipynb files are already adapted to import the relevant files properly from /home/dsci/b/KaggleLearn/input/...
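If you want a notebook to run both online and offline without editing paths, one option is a small lookup that picks whichever input directory exists. The following is only a sketch: `resolve_input_dir` is a hypothetical helper, not part of the adapted notebooks, and `../input` as the Kaggle-side location is an assumption based on the usual Kaggle notebook layout.

```python
from pathlib import Path

# Assumed locations: the local path is the one named above; "../input" is the
# directory Kaggle notebooks conventionally read from (an assumption here).
LOCAL_INPUT = Path("/home/dsci/b/KaggleLearn/input")
KAGGLE_INPUT = Path("../input")


def resolve_input_dir(candidates=(LOCAL_INPUT, KAGGLE_INPUT)):
    """Return the first candidate directory that exists (hypothetical helper)."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    # None of the candidates exist: fail loudly instead of guessing a path.
    raise FileNotFoundError(
        "No input directory found among: " + ", ".join(str(c) for c in candidates)
    )
```

A notebook cell could then build file paths relative to `resolve_input_dir()`, appending whatever dataset subdirectory and file name the respective lesson uses.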

KL > Python

Kaggle: https://www.kaggle.com/learn/python; dsci-lab: ~/b/KaggleLearn/Python/Python.ipynb

  • 1 Hello, Python

  • 2 Functions and Getting Help

  • 3 Booleans and Conditionals

  • 4 Lists

  • 5 Loops and List Comprehensions

KL > Intro to Machine Learning


  • 1 How Models Work: The first step if you’re new to machine learning

  • 2 Basic Data Exploration: Load and understand your data

  • 3 Your First Machine Learning Model: Building your first model. Hurray!

  • 4 Model Validation: Measure the performance of your model, so you can test and compare alternatives

  • 5 Underfitting and Overfitting: Fine-tune your model for better performance.

  • 6 Random Forests: Using a more sophisticated machine learning algorithm.

KL > Intermediate Machine Learning


  • 2 Missing Values: Missing values happen. Be prepared for this common challenge in real datasets.

  • 3 Categorical Variables: There’s a lot of non-numeric data out there. Here’s how to use it for machine learning

  • 4 Pipelines: A critical skill for deploying (and even testing) complex models with pre-processing

  • 5 Cross-Validation: A better way to test your models

  • 6 XGBoost: The most accurate modeling technique for structured data

  • 7 Data Leakage: Find and fix this problem that ruins your model in subtle ways

KL > Data Visualization


  • 2 Line Charts: Visualize trends over time

  • 3 Bar Charts and Heatmaps: Use color or length to compare categories in a dataset

  • 4 Scatter Plots: Leverage the coordinate plane to explore relationships between variables

  • 5 Distributions: Create histograms and density plots

  • 6 Choosing Plot Types and Custom Styles: Customize your charts and make them look snazzy

KL > Pandas


  • 1 Creating, Reading and Writing: You can’t work with data if you can’t read it. Get started here.

  • 2 Indexing, Selecting & Assigning: Pro data scientists do this dozens of times a day. You can, too!

  • 3 Summary Functions and Maps: Extract insights from your data.

  • 4 Grouping and Sorting: Scale up your level of insight. The more complex the dataset, the more this matters

  • 5 Data Types and Missing Values: Deal with the most common progress-blocking problems

  • 6 Renaming and Combining: Data comes in from many sources. Help it all make sense together

KL > Feature Engineering

Source: Formerly https://www.kaggle.com/learn/feature-engineering (replaced by a module of the same name with different content), now only available at https://www.kaggle.com/matleonard/code:

  • 1 Baseline Model: Building a baseline model as a starting point for feature engineering

  • 2 Categorical Encodings: There are many ways to encode categorical data for modeling. Some are pretty clever.

  • 3 Feature Generation: The frequently useful case where you can combine data from multiple rows into useful features

  • 4 Feature Selection: You can make a lot of features. Here’s how to get the best set of features for your model.

  • TBD: Review and evaluate the new content from https://www.kaggle.com/learn/feature-engineering

All Kaggle Learn notebooks have been published under the Apache 2.0 open source license. We are allowed to modify the notebooks slightly (which we have carefully done in a few places, mainly to read local files) and to redistribute our modifications, also under the Apache 2.0 license.