When you have set up Apache Spark and use Jupyter to run analyses on it, you’ll need to connect to the Jupyter notebooks by forwarding the port the notebooks run on to your local machine.
Depending on how the server that runs Spark is secured, you might need to do that through a “jump box”, a server that is hardened to prevent unauthorized access and lets you access a network that’s otherwise not directly accessible from the Internet.
If you’re as untrained in using ssh as I am, it can be a bit frustrating to set that up yourself because it’s not entirely obvious when googling around. In the tradition of writing things up so that I don’t have to google them over and over again, here’s how to do it.
The first thing to know: there’s a file called ~/.ssh/config where you can “store” ssh connections instead of typing them in manually all the time. That’s what makes it possible to type ssh my-server and access your server instead of ssh my-username@my-host-address -i /path/to/ssh-key-file. Blew my mind when I learned this.
Open ~/.ssh/config in your editor of choice, then add the following:
Host jump-box
    HostName jumpbox.yourdomain.com
    User your-user-name-on-jump-box
    IdentityFile /Users/local-user-name/.ssh/ssh_key_file_for_jump-box
    ForwardAgent yes

Host jupyter-box
    User your-username-on-jupyter-server
    ForwardAgent yes
    ProxyCommand ssh -q jump-box nc address-of-jupyter-box 22
    IdentityFile /Users/local-user-name/.ssh/ssh_key_for_jupyter-box
When you’ve added this to your ~/.ssh/config file, all you need to do to connect to the protected server and forward the port to access your Jupyter notebooks is:
$ ssh -L 8889:localhost:8889 jupyter-box
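In this case, we’re forwarding port 8889, which is the port that my Jupyter notebooks are running on. Once the connection is up, the notebooks are reachable in your local browser at http://localhost:8889.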
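If your notebooks run on a different port, or you just want the tunnel without a remote shell, a small shell function can save some typing. This is only a sketch: the function name jupyter_tunnel and the DRY_RUN variable are my own inventions, and the defaults simply reuse the port and host alias from the example above. The -N flag tells ssh not to run a remote command, so the session does nothing but forward the port.

```shell
# Hypothetical helper: open (or just print) an SSH tunnel to a Jupyter port.
# Usage: jupyter_tunnel [port] [host]
# Defaults reuse the values from this post: port 8889, host alias jupyter-box.
jupyter_tunnel() {
    port="${1:-8889}"
    host="${2:-jupyter-box}"
    # -N: do not run a remote command, only forward the port.
    # -L local_port:target_host:target_port; "localhost" is resolved on the
    # remote side, so it refers to the Jupyter server itself.
    set -- ssh -N -L "${port}:localhost:${port}" "$host"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

Put it in your shell startup file and call jupyter_tunnel to forward 8889, or jupyter_tunnel 8888 for another port. Setting DRY_RUN=1 beforehand only prints the ssh command, which is handy for checking what would run.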