PyTorch is a GPU-accelerated tensor computation framework with a Python front end. Its functionality can be extended with common Python libraries such as NumPy, SciPy, and Cython. Automatic differentiation is performed with a tape-based system at both the functional and neural-network layer levels. This combination offers the flexibility and speed of a deep learning framework along with accelerated NumPy-like functionality.
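As a brief illustration of the tape-based autodifferentiation (a minimal sketch, not part of the cluster setup): PyTorch records operations on tensors created with requires_grad=True and replays that tape in reverse to compute gradients.

```python
#!/usr/bin/env python
# Minimal autograd sketch: operations on tensors with requires_grad=True
# are recorded on a tape, and backward() replays it to compute gradients.
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x**2 + 2*x      # y = x^2 + 2x, recorded as it executes
y.backward()        # walk the recorded tape in reverse
print(x.grad)       # dy/dx = 2x + 2 = 8 at x = 3
```

The same mechanism drives training of full neural networks, where backward() populates the .grad field of every parameter.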
These instructions are intended to help you install PyTorch on the FASRC cluster.
For general information on running GPU jobs refer to our user documentation.
To set up PyTorch with GPU support in your user environment, please follow the steps below:
(1) Start an interactive job requesting GPUs, e.g.,
$ srun --pty -p gpu -t 0-06:00 --mem=8000 --gres=gpu:1 /bin/bash
(2) Load required software modules, e.g.,
$ module load python/3.6.3-fasrc02
$ module load cuda/10.1.243-fasrc01
$ module load cudnn/7.6.5.32_cuda10.1-fasrc01
(3) Create a conda environment, e.g.,
$ conda create -n pt1.3_cuda10 python=3.7 pip numpy wheel matplotlib
(4) Activate the new conda environment:
$ source activate pt1.3_cuda10
(pt1.3_cuda10)
(5) Install PyTorch with conda:
$ conda install pytorch=1.3 torchvision cudatoolkit=10.0 -c pytorch
For an interactive session to work with the GPUs you can use the following:
$ srun --pty -p gpu -t 0-06:00 --mem=8000 --gres=gpu:1 /bin/bash
Load the required software modules and source your PyTorch conda environment.
[username@holygpu2c0716 ~]$ module load cuda/10.1.243-fasrc01 cudnn/7.6.5.32_cuda10.1-fasrc01 python/3.6.3-fasrc02 && source activate pt1.3_cuda10
(pt1.3_cuda10)
Test PyTorch interactively:
(pt1.3_cuda10) $ python check_gpu.py
Using device: cuda
Tesla V100-PCIE-32GB
Memory Usage:
Allocated: 0.0 GB
Cached: 0.0 GB
tensor([[-1.2709, 2.0035]], device='cuda:0')
The script check_gpu.py checks whether GPUs are available and, if so, sets up the device to use them:
#!/usr/bin/env python
import torch
# Setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()
# Print out additional information when using CUDA
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3, 1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_cached(0)/1024**3, 1), 'GB')
    print()

# Run a small test on the available device
T = torch.randn(1, 2).to(device)
print(T)
An example batch-job submission script is included below:
#!/bin/bash
#SBATCH -c 1
#SBATCH -N 1
#SBATCH -t 0-00:30
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=4G
#SBATCH -o pytorch_%j.out
#SBATCH -e pytorch_%j.err
# Load software modules and source conda environment
module load Anaconda3/5.0.1-fasrc02
module load cuda/10.1.243-fasrc01 cudnn/7.6.5.32_cuda10.1-fasrc01
source activate pt1.3_cuda10
# Run program
srun -c 1 --gres=gpu:1 python check_gpu.py
If you name the above batch-job submission script run.sbatch, for instance, the job is submitted with:
$ sbatch run.sbatch
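The device-selection pattern used in check_gpu.py extends naturally to your own models: move both the model and its input tensors to the selected device before computing. A minimal sketch (the layer sizes and batch size here are arbitrary, for illustration only):

```python
import torch
import torch.nn as nn

# Select the GPU if available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Move the model's parameters and the input batch to the same device;
# mixing devices raises a runtime error.
model = nn.Linear(4, 2).to(device)   # arbitrary layer sizes
x = torch.randn(8, 4).to(device)     # arbitrary batch of 8 samples
out = model(x)
print(out.shape)                     # torch.Size([8, 2])
```

Because the device is chosen at runtime, the same script runs unchanged in a GPU batch job or on a CPU-only node.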