Project Jupyter created an indispensable tool for every Data Scientist that finally made exploring and visualizing data a fun thing to do: Jupyter Notebooks. Although the Notebooks app works great, the project has built a new, more powerful and extensible front end: JupyterLab.
Tell me, where does it hurt?
Finally, I wanted to make the switch and tried to install JupyterLab 1.0.4 based on a Vagrant box (ubuntu/bionic64). Until that point it was all fun and games. As recommended in the official docs, I went with Anaconda to install the various required Python packages, as I didn’t want to drown in a hot lava sea down in the dependency hell. In the past, I had good experiences installing Jupyter Notebook using Anaconda. However, I couldn’t get JupyterLab installed with Anaconda this time and wasted several hours scratching my head. Let me give you some impressions:
- Packages I like to use in notebooks (such as plotly for data visualization) ended up in other virtual environments than the actual jupyter package, although I never specified a particular environment name. Python virtual environments are still some kind of a book of seven seals for me.
- Importing pandas in a notebook gave me complaints about old versions of numpy. I had hoped that Anaconda would automatically take care of this matter.
- conda warned me about being out of date (although I installed the latest version available from the Anaconda website). Trying to update it as documented gave me error messages, and my intensive online research didn’t turn up anything that solved the problem.
I’ll be honest: I’m probably simply too stupid to use Anaconda. In my experience, when I have issues with widely used software, the software is rarely the buggy part; more often the bug is sitting in front of my computer. In fact, Anaconda can install not only Python packages but also the non-Python packages that many Python packages need to work properly. However, after trying very hard for several hours, I chose to take another approach.
Running the installation with pip3
I took the easiest route on the map and installed all my packages with the Python 3 package manager pip, more specifically pip3. One immediate observation was a much quicker download and installation process; it felt orders of magnitude faster. On the other hand, you really need to know which packages you need to install on Ubuntu and for your local Python 3 installation. An article from idroot.us served as a guideline to this setup process.
The commands shown here have been tested on an Ubuntu 18.04 Vagrant box and should work on other Ubuntu systems as well.
Python 3
At first, we need to install a set of packages on the operating system level to enable the usage of Python 3.
sudo apt update
sudo apt install -y \
    python3 \
    python3-pip \
    python3-dev \
    curl
Running installations with apt on Ubuntu and Debian requires root permissions. Because of this, do not forget to use sudo. As not all flavors of Ubuntu out there are always fully equipped with all kinds of tools, I added curl to the list of packages to be installed. We’ll need this one later.
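Before moving on, it can be worth confirming that the tools from this step are actually available in your shell. The following is only a minimal sketch; the helper name `missing_tools` is made up for illustration and is not part of any package.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# No output means all three commands were found
missing_tools python3 pip3 curl
```

If a name is printed, the corresponding apt package probably did not install cleanly and the apt output is the place to look.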
Popular packages for Data Science purposes
Working with Notebooks in JupyterLab doesn’t make any sense without some prominent libraries from the Data Science community. These will be installed using the now available pip3 command:
pip3 install pandas \
    fastparquet \
    pyarrow \
    tables \
    plotly \
    seaborn \
    xlrd
While pandas is the most-used library for data analysis in Python, fastparquet and pyarrow are packages that allow you to persist your raw or processed data to disk in compressed formats which can be reloaded into memory very fast. I personally prefer the HDF format (fueled by the tables package), as it is very tolerant about the data types in your data frames and is also amazingly quick when loading gigabytes of data.
plotly and seaborn are great options to visualize data in your notebook, as well as for interactive plots to be used in presentations. When installing seaborn, another, more basic visualization library (matplotlib) will be installed as a dependency; pip3 will take care of that.
xlrd lets you use pandas to read Excel sheets. Pretty useful.
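To confirm that the packages really ended up in the Python 3 installation you intend to use, a quick import check can help. This is a sketch under the assumption that python3 is the interpreter pip3 installed into; `check_imports` is a hypothetical helper name.

```shell
# Hypothetical helper: try to import each named module with python3
# and report whether the import succeeded.
check_imports() {
    for module in "$@"; do
        if python3 -c "import $module" >/dev/null 2>&1; then
            echo "OK      $module"
        else
            echo "MISSING $module"
        fi
    done
}

# The import names happen to match the package names installed above
check_imports pandas fastparquet pyarrow tables plotly seaborn xlrd
```

A module reported as MISSING usually means it was installed for a different interpreter than the one python3 resolves to.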
JupyterLab
JupyterLab requires Jupyter Core to be installed. One part of it is installed on the operating system level, the other one again using pip3:
sudo apt install -y ipython
pip3 install jupyter jupyterlab
In the upcoming steps we will want to use the CLI of the newly installed jupyter. For this purpose we need to make the path of the CLI known to our shell:
export PATH=$PATH:~/.local/bin
source ~/.bashrc
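Keep in mind that an export typed into a terminal only lasts for that shell session. One possible way to handle this is to extend PATH only when the entry is missing and, if desired, persist the line in ~/.bashrc. A minimal sketch, where `path_contains` is a hypothetical helper:

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Extend PATH for this session only if the entry is not there yet
if ! path_contains "$HOME/.local/bin"; then
    export PATH="$PATH:$HOME/.local/bin"
fi

# To persist the change for future shells, append it to ~/.bashrc once:
# echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
```

Running the snippet twice leaves PATH unchanged the second time, which avoids the duplicate entries that pile up with a plain export.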
JupyterLab Extensions
In comparison to the established Notebooks app, JupyterLab has a redesigned component that allows developers to build extensions for JupyterLab to make data analysis even more fun. Installing extensions requires a supported version of Node.js. We’ll go with the LTS release.
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.8/install.sh | bash
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"
nvm install --lts
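After the nvm step it is worth checking which Node.js version the shell actually picks up, so you can compare it with the version the JupyterLab documentation asks for. The sketch below only reads the version string; `node_major` is a hypothetical helper, not an nvm or Node.js command.

```shell
# Hypothetical helper: extract the major version from a string such as "v10.16.3"
node_major() {
    version="${1#v}"        # drop the leading "v", if any
    echo "${version%%.*}"   # keep everything before the first dot
}

# Report the active Node.js version if node is on the PATH
if command -v node >/dev/null 2>&1; then
    echo "Node.js major version: $(node_major "$(node --version)")"
else
    echo "node not found; open a new shell or source nvm.sh first"
fi
```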
Although JupyterLab is fairly new, its community has already implemented some nice extensions, which we can now install using the jupyter labextension install command (be patient, this can take some time …):
jupyter labextension install \
    @jupyterlab/toc \
    jupyterlab-chart-editor \
    jupyterlab-spreadsheet
@jupyterlab/toc displays a clickable table of contents in JupyterLab’s sidebar, which is very useful in longer notebooks.
@jupyterlab/plotly-extension is required for Plotly charts to work properly in JupyterLab. That was one obstacle that kept me from switching to JupyterLab until now.
jupyterlab-chart-editor allows a WYSIWYG-like manipulation of charts within the graph.
jupyterlab-spreadsheet makes it possible to display Excel sheets directly in the JupyterLab canvas.
You can display all installed extensions using the command:
jupyter labextension list
UPDATE (2019-01-29): I had to remove the extension @jupyterlab/plotly-extension from my list of extensions to install, so I also updated it in this post. The extension caused the setup process to stall indefinitely (or at least so long that I lost my patience :-)). It also looks like this extension was deprecated. Apparently there is a way to get Plotly set up in JupyterLab, and according to this GitHub Issue it should work, but I haven’t tested it myself yet.
Run JupyterLab as a system service
Now, we will register the command to start JupyterLab as a Linux systemd unit.
CAUTION: The procedure shown below will configure JupyterLab to run listening on all network interfaces and without any authentication! Also, it will only be possible to connect to JupyterLab using HTTP. HTTPS would require setting up certificates. If you are looking for instructions to secure your Jupyter installation with Let’s Encrypt, check out this blog post.
Define home directory and data directory (adjust to your needs):
USER_HOME_DIR=$(echo ~)
DATA_DIR="/mydata"
Create the directory for user-level systemd unit files:
mkdir -p ~/.config/systemd/user/
Create the service by echoing the text into a service unit file:
echo "[Unit]
Description=JupyterLab

[Service]
ExecStart=$USER_HOME_DIR/.local/bin/jupyter lab --no-browser --port=8888 --ip=0.0.0.0 --NotebookApp.token= --notebook-dir=$DATA_DIR
WorkingDirectory=$DATA_DIR

[Install]
WantedBy=default.target" > ~/.config/systemd/user/jupyterlab.service
The above snippet will create the service definition. You may want to adjust the value of DATA_DIR. It points to the path on your Ubuntu box to which JupyterLab will have access. Ideally this is where all your existing notebooks and data sets reside.
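Because a unit file depends on its line breaks, a here-document can be easier to read and edit than one long quoted string. The following is an alternative sketch rather than the method used in this post: `render_unit` is a hypothetical helper that prints the unit text to standard output, so you can inspect it before writing the file.

```shell
# Hypothetical helper: print a systemd unit for JupyterLab.
#   $1 = home directory of the user, $2 = data directory
render_unit() {
    cat <<EOF
[Unit]
Description=JupyterLab

[Service]
ExecStart=$1/.local/bin/jupyter lab --no-browser --port=8888 --ip=0.0.0.0 --NotebookApp.token= --notebook-dir=$2
WorkingDirectory=$2

[Install]
WantedBy=default.target
EOF
}

# Inspect the result first ...
render_unit "$HOME" "/mydata"
# ... then write it to the unit file once it looks right:
# render_unit "$HOME" "/mydata" > ~/.config/systemd/user/jupyterlab.service
```

Passing both directories as arguments keeps the helper independent of variables set earlier in the shell session.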
To set a password for accessing JupyterLab, run the command jupyter notebook password and type the password when prompted.
To enable the service, we will use systemctl --user enable. As we are setting the service up as a user-specific service, we also need to run loginctl enable-linger. Otherwise, the service would automatically be stopped as soon as there are no open sessions left for our current user:
systemctl --user enable jupyterlab.service
loginctl enable-linger
Here we go!
Let’s start up JupyterLab!
systemctl --user start jupyterlab
A few seconds later you should be able to open JupyterLab on:
http://<IP or hostname>:8888
If you experience any issues here, check the status of the service to get more details:
systemctl --user status jupyterlab
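If the status output alone does not tell you enough, it can help to poll the port for a moment and then look at the service log. The following is a rough sketch that assumes curl is installed and that JupyterLab listens on port 8888 as configured above; `wait_for_http` is a hypothetical helper name.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
#   $1 = URL, $2 = number of attempts (default 10), $3 = pause in seconds (default 1)
wait_for_http() {
    url="$1"; attempts="${2:-10}"; pause="${3:-1}"
    while [ "$attempts" -gt 0 ]; do
        # --silent hides progress output, --max-time caps each attempt at 2 seconds
        if curl --silent --output /dev/null --max-time 2 "$url"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$pause"
    done
    return 1
}

# Usage on the machine running JupyterLab:
# wait_for_http http://localhost:8888 && echo "JupyterLab is answering"
# If it never answers, the service log may explain why:
# journalctl --user-unit=jupyterlab
```

Any HTTP response counts as an answer here, including a redirect to the login page, so this only tells you that the server is up, not that you are authenticated.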
Final thoughts
Now that you’ve launched your spaceship towards JupyterLab, I wish you luck working on your notebooks. Live long and prosper!