I have an EC2 remote desktop with NVIDIA GRID drivers and NICE DCV already installed, but recently I needed to add CUDA support so that I can do accelerated rendering in Blender.

I got the basic steps from this NVIDIA page, but want to add my own notes.

First, to do the CUDA base install, run these commands (from the link above):

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3

Note that the install process may take a while: the .deb itself is several GB, and the cuda-toolkit isn’t small either.

The NVIDIA page (linked above) will tell you to install CUDA drivers, but you can skip that step if you’ve already got NVIDIA GPU drivers installed. At least, it wasn’t necessary for me (Blender can see the GPU once the CUDA base install is complete).
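One post-install gotcha: the cuda-toolkit-12-3 package installs everything under /usr/local/cuda-12.3 but doesn't put nvcc on your PATH. A sketch of the environment setup, assuming the default install location (adjust the version in the paths if yours differs):

```shell
# cuda-toolkit-12-3 installs to /usr/local/cuda-12.3 by default, but nvcc
# is not added to PATH automatically; add these lines to ~/.bashrc:
export PATH=/usr/local/cuda-12.3/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

After opening a new shell (or sourcing ~/.bashrc), nvcc --version should report the toolkit release.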

Adding nvcc and other CUDA tools

I also wanted the ability to compile CUDA programs, so I installed the NVIDIA CUDA toolkit as well:

sudo apt install nvidia-cuda-toolkit

Note: this pulls in a massive set of dependencies. On my system, it was nearly 4GB!

If you want to test that CUDA is really working, create a file called cudaDeviceInfo.cu and paste in the following code:

#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int nDevices;
    cudaGetDeviceCount(&nDevices);
    for (int i = 0; i < nDevices; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device Number: %d\n", i);
        printf("  Device name: %s\n", prop.name);
        printf("  Memory Clock Rate (KHz): %d\n",
               prop.memoryClockRate);
        printf("  Memory Bus Width (bits): %d\n",
               prop.memoryBusWidth);
        printf("  Peak Memory Bandwidth (GB/s): %f\n\n",
               2.0*prop.memoryClockRate*(prop.memoryBusWidth/8)/1.0e6);
    }
    return 0;
}

Compile and run with:

nvcc -o cudaDeviceInfo cudaDeviceInfo.cu
./cudaDeviceInfo

My EC2 instance was a g4dn.xlarge, so running the command above (correctly) showed a single T4 GPU:

Device Number: 0
  Device name: Tesla T4
  Memory Clock Rate (KHz): 5001000
  Memory Bus Width (bits): 256
  Peak Memory Bandwidth (GB/s): 320.064000
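That bandwidth figure is easy to sanity-check by hand: peak bandwidth is 2 transfers per clock (DDR) × memory clock in kHz × bus width in bytes, divided by 10^6 to get GB/s. Plugging in the T4 numbers above:

```shell
# 2 (DDR) * 5001000 kHz * (256/8 = 32 bytes) / 1e6 = GB/s
awk 'BEGIN { printf "%.6f\n", 2.0 * 5001000 * (256 / 8) / 1.0e6 }'
# prints 320.064000 -- matching the program's output
```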

That’s it. Happy Hacking!