Build the dsci-lab yourself

Note: You do not need to build the dsci-lab yourself. We recommend that you download the fully configured dsci-lab as a ready-to-use virtual machine instead.

However, if you are a lecturer or a deeply interested student, you might want to learn how we set up the dsci-lab. Here are the steps.

Download an ISO image

Download an ISO image (ca. 1.55 GB) of the 64-bit LTS release of XUbuntu, 20.04 (Focal Fossa). Starting from https://xubuntu.org/download/ you can pick a mirror, e.g. http://ftp.uni-kl.de/pub/linux/ubuntu-dvd/xubuntu/releases/20.04/release/ and download the file from there:

http://ftp.uni-kl.de/pub/linux/ubuntu-dvd/xubuntu/releases/20.04/release/xubuntu-20.04-desktop-amd64.iso

Right-click on the file, choose “save as”, and save it to disk.
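Before booting from the image it can be worth checking that the download is intact. The following is a minimal sketch, assuming that the mirror publishes a file named SHA256SUMS next to the ISO (the usual convention for Ubuntu flavours; please check the directory listing) and that you saved it into the same directory as the ISO. The function name verify_iso is made up for this illustration.

```shell
# verify_iso ISO_FILE
# Succeeds only if the SHA-256 checksum listed for ISO_FILE in the file
# SHA256SUMS (expected in the same directory) matches the file on disk.
verify_iso() {
    iso="$1"
    dir="$(dirname "$iso")"
    name="$(basename "$iso")"
    (
        cd "$dir" || exit 1
        # Keep only the line for our file, then let sha256sum compare.
        grep -F -- "$name" SHA256SUMS | sha256sum -c - >/dev/null 2>&1
    )
}
```

Usage, once both files are in ~/Downloads: verify_iso ~/Downloads/xubuntu-20.04-desktop-amd64.iso && echo "checksum ok"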

Install Basic System

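Start the Oracle VM VirtualBox Manager. The menu labels below are quoted from a German-language VirtualBox; the English labels are given in parentheses.

  • Maschine > Neu (Machine > New) > Name: e.g. xubuntu-august-2020, Typ (Type): Linux, Version: Ubuntu (64-bit)

  • Speichergröße (Memory size): 1024 MB

  • Platte (Hard disk): Festplatte erzeugen (Create a virtual hard disk now)

    • Dateityp der Festplatte (Hard disk file type): VDI

    • Art der Speicherung (Storage on physical hard disk): dynamisch alloziert (Dynamically allocated)

    • Dateiname und Größe (File location and size): enter at least 100 GB here > Erzeugen (Create)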
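The same virtual machine can also be created without the GUI. The following is a minimal sketch using the VBoxManage command-line tool that ships with VirtualBox. The helper names run and create_vm, the controller name SATA and the disk path (VirtualBox's default machine folder is assumed) are our own choices for illustration and not part of the original setup. By default the script only prints the commands (dry run); set DRY_RUN=0 to really execute them.

```shell
# Print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# create_vm NAME DISK_PATH
create_vm() {
    vm="$1"      # VM name, e.g. xubuntu-august-2020
    disk="$2"    # path of the VDI file to create
    run VBoxManage createvm --name "$vm" --ostype Ubuntu_64 --register
    run VBoxManage modifyvm "$vm" --memory 1024
    # --size is given in MB; "Standard" is the dynamically allocated variant.
    run VBoxManage createmedium disk --filename "$disk" --size 102400 \
        --format VDI --variant Standard
    run VBoxManage storagectl "$vm" --name SATA --add sata
    run VBoxManage storageattach "$vm" --storagectl SATA --port 0 --device 0 \
        --type hdd --medium "$disk"
}

create_vm xubuntu-august-2020 \
    "$HOME/VirtualBox VMs/xubuntu-august-2020/xubuntu-august-2020.vdi"
```

Reading the printed commands first and only then running with DRY_RUN=0 keeps the risk of creating a half-configured machine low.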

Start the machine xubuntu-august-2020. A window will pop up: “Medium für Start auswählen” (Select start-up disk)

  • Medium hinzufügen (Add) > locate xubuntu-20.04-desktop-amd64.iso on your disk (ca. 1.55 GB) > Auswählen (Choose)

  • Starten (Start)

VirtualBox will start the ISO image.

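  • Select a language, e.g. “Deutsch”. (The labels below are those of the German installer.)

  • Install XUbuntu

  • Aktualisierungen herunterladen (download updates while installing)

  • Installationsart (installation type): Festplatte löschen und installieren (erase disk and install). This will erase only your newly allocated virtual hard disk, NOT the disk of your host system. > Installieren (Install)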

  • Enter your name, user name etc. WRITE DOWN your password in a secure location!

    • Name: Data Scientist

    • Name des Rechners (computer name): dsci-lab-march-2021

    • Benutzername (user name): data

    • password: datadata (this is an initial and very weak password, you MUST change it later!)

“Die Installation ist abgeschlossen. Sie müssen jetzt den Rechner neu starten, um das System zu benutzen” (The installation is complete. You must now restart the computer in order to use the system) > Jetzt neu starten (Restart now)

  • “Remove installation medium”: (nothing to do), “press ENTER”: press Enter!

  • log in as user Data Scientist (user name data), password datadata

From here on we will use the command line wherever possible. Open a new terminal by pressing Ctrl-Alt-T (Strg-Alt-T on a German keyboard).

Update your installation

sudo apt update && sudo apt upgrade
sudo reboot

Manually type in (and don’t forget to add the ampersand “&” at the end of the line):

firefox http://jbusse.de/dsci-lab/dsci-lab-build.html &

If things worked well, Firefox will open the page you are currently reading inside your new Linux machine. This allows you to copy & paste the commands below into your terminal.

Install Guest Additions

Building the new kernel modules requires gcc, make and perl:

sudo apt install gcc make perl

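In the VirtualBox menu choose Geräte > Gasterweiterungen einlegen (Devices > Insert Guest Additions CD image). A window pops up, showing the directory /media/data/VBox_GAs_6.1.6/ (the version number depends on your VirtualBox release). Right-click on the background and choose “Terminal hier öffnen” (Open Terminal Here); a new terminal opens. Type in: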

sudo ./VBoxLinuxAdditions.run

Reboot the VM:

sudo reboot

You should now be able to resize the VM window, activate bidirectional copy & paste between the Windows host and the VM, etc.
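If resizing or copy & paste does not work, a quick check is whether the Guest Additions kernel module is loaded at all. A minimal sketch, assuming the module is listed under the name vboxguest in /proc/modules; the function name guest_additions_loaded is made up for this illustration.

```shell
# guest_additions_loaded [MODULES_FILE]
# Succeeds if the vboxguest kernel module appears in the module list
# (default: /proc/modules, the list of currently loaded modules).
guest_additions_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^vboxguest ' "$modules_file"
}

if guest_additions_loaded 2>/dev/null; then
    echo "Guest Additions kernel module is loaded"
else
    echo "vboxguest is not loaded; re-run VBoxLinuxAdditions.run and read its output"
fi
```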

Basic Packages

At this stage you have a clean, brand new virtual XUbuntu machine with the VirtualBox Guest Additions enabled.

To get our dsci-lab version you have to install some more packages. You can do so by simply copying the following commands into a terminal (open a new terminal e.g. by pressing Ctrl-Alt-T):

LaTeX (optional, not contained in dsci-lab-march-2021)

sudo apt install texlive-xetex fonts-freefont-otf latexmk

Mindmap (incl. Java):

sudo apt install freeplane

Add your favourite programming editor, e.g.

sudo apt install emacs
sudo apt install vim

Jupyter and Conda

Jupyter

sudo apt install jupyter

Conda:

cd Downloads/
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x ./Miniconda3-latest-Linux-x86_64.sh 
./Miniconda3-latest-Linux-x86_64.sh 

(Instead of installing Miniconda you might also want to install Anaconda, https://www.anaconda.com/products/individual. Anaconda is much more complete than Miniconda, but IMHO too “fat”. In our dsci-lab we prefer a lightweight system. This allows you to look “under the hood” more easily, to understand what is going on, and to maintain the whole system. The dependencies in our setup are complex enough anyhow.)

After you have installed Conda, close your terminal (Ctrl-D) and open a new one (e.g. with Ctrl-Alt-T). (Why close and open? If you let the installer initialise conda, it added a few lines to your ~/.bashrc, and these only take effect in a newly started shell. Conda puts an extra virtual environment layer over the standard Python installation, so we can work with multiple Python configurations in parallel. To learn more about conda virtual environments see https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-conda.html)

Your terminal prompt should now start with (base), which is the name of your current conda environment:

(base) jb@jb-ThinkPad-X250:~$

As said: Miniconda is minimalistic. Thus we have to install some modules ourselves. Some important ones are:

conda install pandas numpy matplotlib scikit-learn seaborn xgboost
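To see whether the packages actually arrived in the active environment, you can try to import each of them. A minimal sketch; the function name check_modules and the PYTHON override variable are made up for this illustration. Note that scikit-learn is imported under the name sklearn.

```shell
# check_modules MODULE...
# Try to import each module; return non-zero if any import fails.
# The interpreter defaults to python3 and can be overridden via PYTHON.
check_modules() {
    missing=0
    for mod in "$@"; do
        if "${PYTHON:-python3}" -c "import $mod" >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=1
        fi
    done
    return "$missing"
}

check_modules pandas numpy matplotlib sklearn seaborn xgboost \
    || echo "Install the missing packages with conda install"
```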

Notes:

  • We install these packages into the conda environment base. If we decide to create another conda environment, it will be empty again, and we have to populate it with libraries again. This is the reason (a) why we prefer lightweight environments, and (b) why we want to learn how to install libraries ourselves.

  • Caveat: Do not install Conda with sudo; install it as a normal user. Each user's installation and each conda environment is completely independent of the others. There is no system-wide installation.
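The first note can be tried out directly: a newly created environment contains only what you ask for. A minimal sketch, run as a normal user; the function name new_env and the environment name dsci-test used in the usage line are hypothetical.

```shell
# new_env NAME [PACKAGE...]
# Create a fresh conda environment with Python and the given packages.
new_env() {
    name="$1"
    shift
    conda create --yes --name "$name" python "$@"
}

if command -v conda >/dev/null 2>&1; then
    conda env list    # show the environments that already exist
else
    echo "conda not found on PATH; open a new terminal after installing Miniconda"
fi
```

Usage: new_env dsci-test pandas numpy, followed by conda activate dsci-test. The prompt then shows (dsci-test) instead of (base), and packages installed only in base are no longer importable.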

Keep conda current:

conda update conda

Install jupytext:

conda install -c conda-forge jupytext

Zotero

Download Zotero from https://www.zotero.org/download/ to /home/data/Downloads/

Install Zotero:

cd Downloads
bunzip2 Zotero-5.0.96_linux-x86_64.tar.bz2
tar -xvf Zotero-5.0.96_linux-x86_64.tar
mv Zotero_linux-x86_64/ ../
cd

Also install the Zotero Connector for Firefox from the Zotero site.

Launch Zotero the first time:

~/Zotero_linux-x86_64/zotero &

Enhance Zotero with the Better BibTeX plugin, see https://retorque.re/zotero-better-bibtex/

Zotero will ask you to restart it several times: do it. You will find that Zotero has created the new directory ~/Zotero. Add this directory to the list of directories that are backed up daily ;-)

Zotero allows you to link to your own files within a base directory of your choice (e.g. below your home directory). Create a directory where you would like to store literature you download from the web. I myself use this location, for example:

/home/data/a/l2/linked_zotero_files

(/home/data/a/l2 is a location which gets a weekly backup, as opposed e.g. to /home/data/a/l, which holds self-made data and thus gets backed up on a daily basis.)

Tell Zotero the location of your own linked_zotero_files folder:

  • Zotero > Erweitert (Advanced) > Dateien und Ordner (Files and Folders) > Basisverzeichnis für verknüpfte Dateianhänge (Linked Attachment Base Directory) > Auswählen (Choose) > /home/data/a/l2/linked_zotero_files

(Hint: Due to a Zotero bug (as of Nov 2020) the name of this directory must not start with the string “zotero”.)

jupyter-book

Jupyter-book v0.8 (March 2021, cf. https://jupyterbook.org/intro.html) uses the documentation tool Sphinx to create websites and (via LaTeX) a PDF book out of a collection of Jupyter notebooks.

pip install -U jupyter-book

Test the installation: Build the book according to https://jupyterbook.org/start/build.html

mkdir test
cd test
jupyter-book create mybookname
jupyter-book build mybookname
jupyter-book build mybookname/ --builder pdflatex
atril mybookname/_build/latex/book.pdf &
cd

Shared Folders

Create the mount point once:

mkdir a

In the VirtualBox menu: Geräte > Gemeinsame Ordner (Devices > Shared Folders) >

  • click the icon “Ordner+” (“adds a new shared folder”)

  • Ordner-Pfad (Folder Path): choose a folder on the host machine

  • Ordner-Name (Folder Name): e.g. “arbitraryname” (any name will do, it only has to be unique)

  • Permanent erzeugen (Make Permanent): check

After every boot of the virtual machine:

sudo mount  -t vboxsf -o uid=1000,gid=1000 arbitraryname ~/a

Note: Once you have typed this rather complicated command, it is stored in your bash history. Press Ctrl-R and type mount to get it back immediately, so it costs no effort.
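If you would rather not rely on the history, the mount command can be wrapped in a small helper. A minimal sketch; the function name mount_shared is made up for this illustration, and instead of the hard-coded 1000 it uses the user and group ID of the calling user (on a fresh single-user installation both are typically 1000).

```shell
# mount_shared SHARE MOUNTPOINT
# Mount a VirtualBox shared folder unless something is already mounted there.
mount_shared() {
    share="$1"
    target="$2"
    mkdir -p "$target"
    if mountpoint -q "$target"; then
        echo "$target is already mounted"
        return 0
    fi
    # Use the IDs of the calling user instead of hard-coding 1000.
    sudo mount -t vboxsf -o "uid=$(id -u),gid=$(id -g)" "$share" "$target"
}
```

Usage: put the function into your ~/.bashrc and call mount_shared arbitraryname ~/a after booting. Calling it a second time does no harm, because the mountpoint check skips the mount.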