Installation
Installation instructions for different run modes of STREAMLINE.
Google Colab Notebook
No installation is required to run STREAMLINE in the included Google Colab Notebook. The only other step is to make sure that you have a Google account (free) and click the link below:
https://colab.research.google.com/drive/14AEfQ5hUPihm9JB2g730Fu3LiQ15Hhj2?usp=sharing
Local Installation
The instructions below are for installing STREAMINE locally in order to run it either in the included Jupyter Noteook or from the command line.
Prerequisites
First, be sure to install or confirm previous installation of the following prerequisites.
Git
Install git (if not already installed).
You can test for an existing installation by typing
git
in your command-line.Git installation instructions can be found here.
Anaconda
We recommend installing the most recent stable version of Anaconda3 appropriate for your operating system (if not already installed) which automatically includes Python3 and a number of other common packages used by STREAMLINE (e.g. pandas, scikit-learn, etc.). Python3 is the most essential prerequisite here. Additional required Python packages will automatically be installed by the installation commands (below).
You can test for an existing installation by typing
conda
on your command-line.Anaconda installation instructions can be found here
If you already have Anaconda installed, we recommend updating conda and anaconda (as follows) prior to running STREAMLINE to avoid downstream module/environment errors
conda update conda
conda update anaconda
While STREAMLINE can run on native Python3, this is not generally recomended, since issue resolution becomes complex, especially in MacOS based systems.
Installation Commands
After confirming that you have the prerequisites above, navigate to the directory where you want to save STREAMLINE, and use the following commands in the command-line terminal:
git clone --single-branch https://github.com/UrbsLab/STREAMLINE
cd STREAMLINE
pip install -r requirements.txt
The above 3 commands do the following:
Download the most recent release repository of STREAMLINE
Navigate to the root STREAMLINE directory from where the package can run
Install all other packages required to run STREAMLINE on the local system (see
requirements.txt
for the complete list of these packages)
Now the STREAMLINE package can be run from the STREAMLINE root directory.
Troubleshooting
If you see errors or warnings when running the above commands, this may indicate that the required packages might not be installing properly, which may prevent STREAMLINE from running to completion. We recommend always testing the STREAMLINE installation first by running it (in the desired run mode) on the demonstration data with included/example default run parameters. Issues related to installation will be evident if you get ‘module’ related errors when running STREAMLINE. This is most likely to happen if you are working from a previous installation of Anaconda, and you should update both conda
and anaconda
first and then retry the commands above.
Jupyter Notebook
If you with to run STREAMLINE using the included Jupyter Notebook, additionally do the following:
Make sure the jupyter package is installed using the following command:
pip install jupyter
Run Jupyter Notebook using the command
jupyter notebook
Within the web page that opens, navigate into the saved STREAMLINE folder and open the
STREAMLINE-Notebook.ipynb
file.
For more information on Jupyter Notebook, click here.
Cluster Installation
STREAMLINE installation for a CPU computing cluster (i.e. HPC) is essentially the same as for local installation, but may include extra steps or troubleshooting based on your HPC setup. As for local installation, generally within your cluster home/working directory, you’ll want to install Git, Anaconda (with Python3), and use the installation commands to download STREAMLINE and other required Python packages.
Cluster Compatability
We have set up STREAMLINE to be able to able to run on 7 different types of HPC clusters using dask_jobqueue
including [LSF, SLURM, PBS, OAR, Moab, SGE, HTCondor] as documented here. To date we have explicitly tested STREAMLINE only on LSF and SLURM clusters.
Additional Tools
Here we recommend additional tools that may help in running big jobs across all STREAMLINE phases, from a single command. These tool include terminal emulators like tmux
and screen
and terminal text editors like nano
and vim
. In most likelihood these would already be installed in your cluster or available as modules in your cluster.
Terminal Emulators
A terminal emulator is particularly important when you want to run all phases of STREAMLINE automatically from a single command. To achieve this, STREAMLINE runs a script on the head node (i.e. job submission node) that monitors phase completion and submits new jobs for the next phase. Typically, closing your terminal would interupt this process.
Terminal emulator programs allow you to create several “pseudo terminals” from a single terminal. They decouple your programs from the main terminal, protecting them from accidentally disconnecting. You can detach tmux
or screen
from the login terminal, and all your programs will continue to run safely in the background. Later, we can reattach them to the same or a different terminal to monitor the process.
These are also very useful for running multiple programs with a single connection, such as when you’re remotely connecting to a machine using Secure Shell (SSH).
We recommend using tmux
as a terminal emulator. A quick guide on using it can be found here.
Terminal Text Editors
Terminal text editors are simple text editor programs that allow you to edit files through the terminal. These are particularly useful here for quickly editing the configuration file that specifies all STREAMLINE run parameters for running all phases from a single command.
We recommend using nano
as a text editor. A quick guide on using it can be found here
Known Installation Issues
Scipy version error, the way the STREAMLINE is set up it needs scipy>=1.8.0. If you find this is not true or get a version error output, please run
pip install --upgrade scipy
The lightgbm package on pypi doesn’t work out of the box using pip on MacOS. The following command should solve the problem:
conda install -c conda-forge lightgbm
The most recent skrebate package (0.7) does not install correctly with the standard
pip install skrebate
command, however it does withpip install skrebate==0.7
.