Kaggle Learn
=====

The courses relevant to our module IM 870 "Data Science" are available directly at [Kaggle Learn](https://www.kaggle.com/learn/overview). A short overview follows.

## Stuff related to dsci-pytrans

* 1 Hello, Python
* 2 Functions and Getting Help
* 3 Booleans and Conditionals
* 4 Lists
* 5 Loops and List Comprehensions

## Main course dsci

:

* 1 How Models Work: The first step if you're new to machine learning
* 2 Basic Data Exploration: Load and understand your data
* 3 Your First Machine Learning Model: Building your first model. Hurray!
* 4 Model Validation: Measure the performance of your model, so you can test and compare alternatives
* 5 Underfitting and Overfitting: Fine-tune your model for better performance
* 6 Random Forests: Using a more sophisticated machine learning algorithm

:

* 2 Missing Values: Missing values happen. Be prepared for this common challenge in real datasets.
* 3 Categorical Variables: There's a lot of non-numeric data out there. Here's how to use it for machine learning.
* 4 Pipelines: A critical skill for deploying (and even testing) complex models with pre-processing
* 5 Cross-Validation: A better way to test your models
* 6 XGBoost: The most accurate modeling technique for structured data
* 7 Data Leakage: Find and fix this problem that ruins your model in subtle ways

:

* 2 Line Charts: Visualize trends over time
* 3 Bar Charts and Heatmaps: Use color or length to compare categories in a dataset
* 4 Scatter Plots: Leverage the coordinate plane to explore relationships between variables
* 5 Distributions: Create histograms and density plots
* 6 Choosing Plot Types and Custom Styles: Customize your charts and make them look snazzy

:

* 1 Creating, Reading and Writing: You can't work with data if you can't read it. Get started here.
* 2 Indexing, Selecting & Assigning: Pro data scientists do this dozens of times a day. You can, too!
* 3 Summary Functions and Maps: Extract insights from your data.
* 4 Grouping and Sorting: Scale up your level of insight. The more complex the dataset, the more this matters.
* 5 Data Types and Missing Values: Deal with the most common progress-blocking problems
* 6 Renaming and Combining: Data comes in from many sources. Help it all make sense together.

:

* 1 Baseline Model: Building a baseline model as a starting point for feature engineering
* 2 Categorical Encodings: There are many ways to encode categorical data for modeling. Some are pretty clever.
* 3 Feature Generation: The frequently useful case where you can combine data from multiple rows into useful features
* 4 Feature Selection: You can make a lot of features. Here's how to get the best set of features for your model.

All Kaggle Learn notebooks have been published under the [Apache 2.0 open source license](http://www.apache.org/licenses/LICENSE-2.0). The license allows us to modify the notebooks (which we have done carefully in a few places, mainly to read local files) and to redistribute our modified versions, also under the Apache 2.0 license.

We advise creating a Kaggle account in order to go through the course material.

For a legacy snapshot of the above courses see [Moodle -> Legacy Snaphots -> KaggleLearn_ws2020.tar.gz](https://moodle.haw-landshut.de/mod/resource/view.php?id=311365).
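
The notebook modification mentioned above, reading local files rather than files from Kaggle's input directory, could look roughly like the following sketch. It is only an illustration and not the actual change made to the notebooks: the helper `resolve_data_file`, the local folder name `data`, and the assumption that Kaggle notebooks read from `../input/<dataset>/` are all hypothetical choices made for this example.

```python
from pathlib import Path

# Assumption: Kaggle notebooks usually read their data from "../input/<dataset>/".
KAGGLE_INPUT = Path("../input")
# Hypothetical local folder that mirrors the Kaggle layout; adjust to your setup.
LOCAL_DATA = Path("data")


def resolve_data_file(dataset: str, filename: str,
                      local_root: Path = LOCAL_DATA,
                      kaggle_root: Path = KAGGLE_INPUT) -> Path:
    """Return the local copy of a data file if it exists, else the Kaggle path."""
    local_path = local_root / dataset / filename
    if local_path.is_file():
        # A local copy is present, so the notebook can run offline.
        return local_path
    # Fall back to the location used when the notebook runs on Kaggle.
    return kaggle_root / dataset / filename
```

In a notebook, the returned path can be passed to a reader such as `pandas.read_csv`, which accepts path-like objects, so the same cell works both locally and on Kaggle.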