# dsci 2020-11-27: Pandas


## Pain Points

Converting back and forth between numpy and pandas
* Do the data wrangling in pandas as far as possible
* scikit-learn machine learning then works with numpy
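
The hand-over between the two libraries could look roughly like the following sketch. The toy data and the column names (`height`, `weight`, `label`, `bmi`) are made up for illustration; only `DataFrame.assign`, `DataFrame.to_numpy` and the `pd.DataFrame` constructor are actual pandas API.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "height": [1.60, 1.75, 1.82],
    "weight": [55.0, 72.0, 90.0],
    "label": [0, 1, 1],
})

# Wrangling stays in pandas: derive a new feature using labelled columns.
df = df.assign(bmi=df["weight"] / df["height"] ** 2)

# Convert to numpy only at the boundary to the machine-learning code.
feature_cols = ["height", "weight", "bmi"]
X = df[feature_cols].to_numpy()   # 2-D array, shape (n_samples, n_features)
y = df["label"].to_numpy()        # 1-D array, shape (n_samples,)

# Going back: wrap the array in a DataFrame to restore column names and index.
df_back = pd.DataFrame(X, columns=feature_cols, index=df.index)
```

Converting once, at the point where the estimator is called, keeps the labelled columns available for the whole wrangling part and should reduce the number of round trips.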

For numpy and pandas, a deeper understanding of the underlying data structures is advisable; recommended reading: Jake VanderPlas: Python Data Science Handbook (see below).

Copy or in place?
* As learners, we ideally always want to work on copies
* because that suits Jupyter notebooks better:
   * easier to restart from earlier cells
   * fewer surprises from side effects
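
A minimal sketch of the copy-based style, assuming a small made-up DataFrame (`raw` and its columns are hypothetical). Each step binds its result to a new name, so re-running an earlier notebook cell starts from the same unchanged data.

```python
import pandas as pd

# Hypothetical example data with one missing score.
raw = pd.DataFrame({
    "name": ["Anna", "Ben", "Cleo"],
    "score": [12.5, None, 9.0],
})

# Each method call returns a new object; `raw` itself is left untouched.
cleaned = raw.dropna(subset=["score"])
ranked = cleaned.sort_values("score", ascending=False).reset_index(drop=True)

# If a column has to be overwritten, take an explicit copy first.
work = raw.copy()
work["score"] = work["score"].fillna(0.0)
```

After these cells, `raw` still has three rows including the missing value, so any cell that depends on it can be re-run without surprises.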

## Pandas Flashcards

A great source:

* <https://www.w3resource.com/python-exercises/pandas/index.php>

Recommendation: work through the website and create (physical or virtual) flashcards (a simple Excel spreadsheet will do as well). Example:

* Source and question, e.g. <https://www.w3resource.com/python-exercises/pandas/index.php> > *Select the 'name' and 'score' columns from the following DataFrame*
* Identify the trick, here: `df[['name', 'score']]`
* [Solution](https://www.w3resource.com/python-exercises/pandas/python-pandas-data-frame-exercise-5.php)
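
The trick on this card can be tried out with a short sketch. The DataFrame below is a made-up stand-in, not the one from the exercise; the point is the difference between indexing with a list of names and with a single name.

```python
import pandas as pd

# Hypothetical stand-in data; the exercise uses its own DataFrame.
df = pd.DataFrame({
    "name": ["Anna", "Ben", "Cleo"],
    "score": [12.5, 9.0, 16.5],
    "attempts": [1, 3, 2],
})

# A list of column names (double brackets) returns a DataFrame.
subset = df[["name", "score"]]

# A single column name returns a Series.
single = df["name"]

# Equivalent label-based selection with .loc.
same = df.loc[:, ["name", "score"]]
```

A possible back side for the card: double brackets pass a list and yield a DataFrame, single brackets pass one label and yield a Series.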

JB shows his solution


## Literature on Pandas

* <https://chrisalbon.com/> > "Preprocessing Structured Data"
  * where only the numpy solution is given, look up the pandas solution as well!
* Jake VanderPlas: Python Data Science Handbook, online:
  * <https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html>
  * <https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html>
* Stepanek, Hannah: Thinking in Pandas: How to Use the Python Data Analysis Library the Right Way. 2020 (PDF available online via the HAW LA library)