Ubuntu 20.04 LTS Configuration Guide for Deep Learning Model Training on Nvidia RTX GPUs

Anurag Bhatt
Jun 19, 2023



Introduction

In this article, we will guide you through the process of configuring your Ubuntu 20.04.6 LTS system with an Nvidia RTX GPU for training deep learning models using popular frameworks like TensorFlow or PyTorch. By following our step-by-step instructions, you’ll be able to harness the power of your Nvidia RTX GPU and leverage the capabilities of Ubuntu for efficient deep-learning model training.

Hardware Requirements

While these instructions are applicable to various RTX Series GPUs, it’s important to note that I have personally tested them on my system equipped with an Nvidia RTX 3080Ti and Ubuntu 20.04.6 LTS. So, let's start the installation.

Nvidia GPU Driver Installation

To begin configuring our Nvidia GPU, the first step is to install the Nvidia GPU driver on your system. To check if the driver is already installed, open your terminal and enter the following command:

nvidia-smi

If the output displayed matches the screenshot below, it means the driver is already installed, and you can skip this section and proceed to the ‘Conda Installation’ section.

Fig 1

However, if you do not see the expected output, it indicates that the driver is not installed. In that case, follow the steps below to install it.

Step 1: Installing the Nvidia GPU Driver

To start the installation process, open the ‘Software & Updates’ panel on your system. You can find it in your applications. Once opened, navigate to the ‘Additional Drivers’ section. The window should resemble the screenshot below.

Fig 2

In the above screenshot, you will notice various Nvidia driver versions listed. We recommend selecting a driver version that includes the terms ‘metapackage’ and ‘proprietary.’ In the above screenshot, four driver versions are highlighted in red boxes: 470, 515, 525, and 530.

When selecting a version, keep in mind that some driver versions can cause persistent network connection issues; we will discuss how to resolve these in a future article. For now, we advise selecting the oldest listed version, as it tends to be the most stable. In this demonstration, we will install ‘nvidia-driver-470.’

To proceed with the installation, open the command line and enter the following command:

sudo apt install nvidia-driver-470

After typing the command, you will be prompted to enter your password. Once entered, you will need to type ‘Y’ to confirm the installation. The installation process will then begin. Sit back, relax, and let the installation progress.

After successfully installing the Nvidia GPU driver, the next step is to update the system and verify the installation. Open your terminal and enter the following command:

sudo apt update

Once the update is complete, enter the command below to verify the installation:

nvidia-smi

Running this command will display the output, which should resemble the screenshot labeled ‘Fig 1.’ If the output matches the expected results, congratulations! Your Nvidia driver is now installed and functioning properly.

If the output doesn’t match the expected result, restart the system and run the command again.

This verification step confirms that the Nvidia GPU driver is correctly recognized by the system, providing you with access to the powerful capabilities of your Nvidia GPU for deep learning model training. Now, you can proceed to the next steps and configure your system for efficient deep-learning tasks.
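If you prefer a scripted check over eyeballing the nvidia-smi table, the following is a minimal Python sketch using only the standard library. It queries the GPU name and driver version with nvidia-smi’s machine-readable output flags, and degrades gracefully on a machine where the driver is absent:

```python
import shutil
import subprocess

# Look for the nvidia-smi binary; if present, query the GPU name and
# driver version in CSV form (--query-gpu and --format are standard flags).
smi = shutil.which("nvidia-smi")
if smi:
    result = subprocess.run(
        [smi, "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    report = (result.stdout.strip() or result.stderr.strip()
              or "nvidia-smi returned no output")
else:
    report = "nvidia-smi not found: the driver is not installed or not on PATH"
print(report)
```

On a correctly configured system this prints something like `NVIDIA GeForce RTX 3080 Ti, 470.xx`; otherwise the fallback message tells you the driver step still needs attention.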

Conda Installation

In this section, we will walk you through the process of installing Miniconda, a lightweight version of the Anaconda distribution, which provides a hassle-free way to manage your Python environments for deep learning. Follow the steps below to install Miniconda on your Ubuntu system.

Step 1: Installing Miniconda

To begin, open your terminal and copy the command below, then paste it into the terminal and press Enter. This command will download the Miniconda installer:

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh

Next, execute the following command to run the Miniconda installer:

bash Miniconda3-latest-Linux-x86_64.sh

Follow the prompts in the installer to proceed with the installation. You will be asked to review the license terms, specify the installation location, and set up the Miniconda base environment. Once the installation is complete, you can move on to the next step.

Step 2: Updating System Packages

Before proceeding, it’s essential to update the system packages. Type the following command in the terminal:

sudo apt update

This command refreshes your system’s package lists, ensuring that subsequent installations pull the latest software versions and dependencies.

Step 3: Verifying Miniconda Installation

To verify that Miniconda is successfully installed, type the following command in the terminal:

conda -V

If the installation was successful, this command will display the version of Conda, which is the package manager included with Miniconda. This confirms that Miniconda is installed correctly and ready to be used for managing Python environments.

You have successfully installed Miniconda on your Ubuntu system. With Miniconda, you can easily create isolated Python environments and install the necessary packages for your deep learning projects. In the upcoming sections, we will explore how to configure these environments and install CUDA drivers, as well as TensorFlow or PyTorch, to facilitate deep learning model training.

CUDA Toolkit Installation

Moving forward with the configuration process, we will now create a Conda environment named ‘notebook’ and install the CUDA toolkit and cuDNN within it. Follow the steps below to set up the environment and install the required components.

Step 1: Creating and Activating the ‘notebook’ Environment

To create the ‘notebook’ environment, open your terminal and enter the following command:

conda create --name notebook python=3.9

This command will create a new Conda environment named ‘notebook’ with Python 3.9. You can give the environment any name you like; I am using ‘notebook’.

Once the environment is created, you can activate it using the following command:

conda activate notebook

To deactivate the environment and switch back to your base environment, use the command:

conda deactivate

Ensure that the ‘notebook’ environment is activated for the rest of the installation steps.

Step 2: Installing CUDA and cuDNN

Now, we will install CUDA and cuDNN within the ‘notebook’ environment.

It’s important to note that for every new environment you create using Conda, you need to install CUDA and cuDNN separately.

Within the activated ‘notebook’ environment, enter the following command to install CUDA using the Conda package manager:

conda install -c conda-forge cudatoolkit=11.8.0

Next, use pip to install the specific version of cuDNN required. Execute the following command:

pip install nvidia-cudnn-cu11==8.6.0.163

After the installation is complete, create a directory and set environment variables using the commands below:

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

These commands will create the necessary directory and set the environment variables required for CUDA and cuDNN.
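To make it clearer what the activate.d hook computes, here is an illustrative Python sketch of the same path composition. The paths below are hypothetical examples; on a real system, CUDNN_PATH is derived from wherever the nvidia-cudnn-cu11 package was installed:

```python
import os

# Hypothetical example paths; on a real system CUDNN_PATH comes from
# `python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)"`.
conda_prefix = "/home/user/miniconda3/envs/notebook"
cudnn_path = f"{conda_prefix}/lib/python3.9/site-packages/nvidia/cudnn"
existing = os.environ.get("LD_LIBRARY_PATH", "")

# Same composition as the `export LD_LIBRARY_PATH=...` line in env_vars.sh:
# keep any existing entries, then append the env's lib dir and cuDNN's lib dir.
ld_library_path = ":".join(
    p for p in (existing, f"{conda_prefix}/lib/", f"{cudnn_path}/lib") if p
)
print(ld_library_path)
```

Because the script lives in activate.d, Conda re-runs it every time you activate the environment, so the loader can always find the CUDA and cuDNN shared libraries.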

You have successfully created the ‘notebook’ environment and installed the CUDA toolkit and cuDNN within it. In the next steps, we will proceed with installing TensorFlow or PyTorch, depending on your preference.

Install TensorFlow

Step 1: Installing TensorFlow with pip

To begin, open your terminal and enter the following command:

pip install tensorflow==2.12.*

It is recommended to use pip for installing TensorFlow, as pip packages are the officially supported release channel.

Step 2: Verifying the TensorFlow Installation

Before verifying GPU support, close and reopen the terminal, activate the ‘notebook’ environment again, and then follow the instructions below.

To verify the successful installation of TensorFlow, enter the following command in your terminal:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Executing this command will output a list of GPU devices if TensorFlow has been installed correctly, as shown in the screenshot below.

Fig 3

If you see a list of GPU devices, it signifies that TensorFlow is installed and ready to utilize the computational capabilities of your Nvidia GPU for deep learning tasks.

Congratulations on successfully connecting your GPU with TensorFlow! You are now ready to unleash the power of deep learning and enhance the efficiency of your training tasks. Get ready to explore a wide range of deep learning applications and maximize your training performance.
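Beyond listing devices, a slightly fuller smoke test is to run a small computation and record the outcome. This is a minimal sketch, assuming TensorFlow 2.12 is installed in the active environment; it degrades gracefully if TensorFlow is missing or no GPU is visible:

```python
# Smoke test: run a small matmul if TensorFlow is importable, and record
# whether the CUDA stack exposed a GPU to it.
try:
    import tensorflow as tf
except ImportError:
    status = "tensorflow-not-installed"
else:
    a = tf.random.normal((256, 256))
    b = tf.random.normal((256, 256))
    c = tf.matmul(a, b)  # runs on the GPU automatically when one is available
    # list_physical_devices('GPU') is non-empty when drivers, CUDA, and cuDNN
    # are all wired up correctly.
    status = "gpu" if tf.config.list_physical_devices("GPU") else "cpu-only"
print(status)
```

If this prints `cpu-only` even though nvidia-smi works, revisit the cuDNN environment-variable step and make sure you reopened the terminal.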

The above steps can be applied in a similar manner to PyTorch. Instead of installing TensorFlow, you can install PyTorch by using the following command:

pip install torch

You can check whether a GPU is available by running the following Python code:

import torch
print(torch.cuda.is_available())
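A common follow-up pattern, once the check above returns True, is to select the device once and reuse it everywhere. This sketch assumes only that torch may or may not be importable, and falls back to the CPU so the same script runs on machines without a GPU:

```python
try:
    import torch
except ImportError:
    device = "unavailable"  # torch is not installed in this environment
else:
    # Fall back to CPU so the same script also runs on GPU-less machines.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.ones(3, device=device)  # tensors created directly on the device
print(device)
```

Passing `device=` at tensor creation avoids a separate host-to-device copy later with `.to("cuda")`.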

Personal Advice

In software installation and troubleshooting, there are times when everything is installed correctly yet the expected output remains elusive. In such situations, a simple but effective tip is to close the terminal and open it again, then rerun the command that failed. Reopening the terminal reloads the environment variables set during installation, which is often all that was missing. It may sound trivial, but sometimes the most straightforward steps yield the best results.

Ubuntu 22.04

If you are using Ubuntu 22.04, you might encounter the following error related to the libdevice directory and ptxas invocation when working with CUDA:

Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice.
...
Couldn't invoke ptxas --version
...
InternalError: libdevice not found at ./libdevice.10.bc [Op:__some_op]

To resolve this error and ensure the smooth execution of your CUDA-related tasks, follow the steps below:

Step 1: Installing NVCC

First, install the NVCC compiler by executing the following command:

conda install -c nvidia cuda-nvcc=11.3.58

This command will install the necessary NVCC compiler version.

Step 2: Configuring the XLA CUDA Directory

To configure the XLA CUDA directory, enter the following commands:

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

These commands create the necessary directory and set the XLA_FLAGS environment variable.
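For clarity, this small Python sketch shows the flag value the printf line writes. CONDA_PREFIX is read from the environment if set; the fallback path is a hypothetical example:

```python
import os

# The activate.d hook expands $CONDA_PREFIX at activation time; here we
# read it from the environment, with an example path as a fallback.
conda_prefix = os.environ.get(
    "CONDA_PREFIX", "/home/user/miniconda3/envs/notebook"
)
xla_flags = f"--xla_gpu_cuda_data_dir={conda_prefix}/lib/"
print(xla_flags)
```

XLA reads this flag to locate the CUDA data directory, which is where the libdevice file copied in the next step will live.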

Step 3: Copying the libdevice File

To resolve the missing libdevice file issue, execute the commands below:

mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/

These commands create the required directory and copy the libdevice file to the specified path.

By following these steps, you should be able to resolve the libdevice-related error and ensure smooth CUDA operations on your Ubuntu 22.04 system.

References:

For an official and up-to-date installation procedure for CUDA and related packages, you can refer to the TensorFlow documentation at the following link:

Link to Tensorflow Installation procedure

Please note that the commands provided in this article may become outdated over time as new versions and updates are released. To ensure that you have the latest information and instructions, it is always recommended to refer to the official documentation.

If you need any clarification or come across any errors while following this article, please leave a comment, and I will be happy to assist you. Your feedback is valuable, and I will make sure to address any concerns and provide further guidance.

Thank you for reading, and happy deep learning model training with your Nvidia GPU!

Peace out!
