Commit 76a7d23b authored by Jakub Klinkovský

update README and hardware overview

parent 9d950e72
+4 −3
# GPU cluster
# GPX cluster

Information about the private GPU cluster located in a former prison cell at Trojanova 13, Prague.
Information about the GPX cluster located in a former prison cell at Trojanova 13, Prague.

## Getting started

@@ -11,9 +11,10 @@ If not, you can contact the [administrator](https://mmg-gitlab.fjfi.cvut.cz/gitl
The cluster can be accessed via SSH on port 22.
For example:

    ssh <user>@gp3.fjfi.cvut.cz
    ssh <username>@<nodename>.fjfi.cvut.cz

User authentication is done by _public-key cryptography_ using the key established during account registration.
The available _node names_ are `gp1`, `gp2`, ..., `gp9`.
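The naming scheme above can be sketched as a small shell helper. This is only an illustration: the function names `node_fqdn` and `list_login_nodes` are hypothetical, and the key path in the commented command is a placeholder for whatever key was registered with the account.

```shell
# Hypothetical helper: print the fully qualified name of a login node.
node_fqdn() {
    printf '%s.fjfi.cvut.cz\n' "$1"
}

# List all nine login nodes named above (gp1 ... gp9).
list_login_nodes() {
    i=1
    while [ "$i" -le 9 ]; do
        node_fqdn "gp$i"
        i=$((i + 1))
    done
}

# Example: connect to the main login node with an explicit key file.
# Both the user name and the key path are placeholders.
# ssh -i ~/.ssh/id_ed25519 <username>@"$(node_fqdn gp3)"
```

Calling `list_login_nodes` prints one host name per line, which can be handy for checking which nodes respond.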

After a successful login, a short information message about the system will be displayed.
You can also follow the documentation below to find more details.
+124 −6
@@ -2,7 +2,41 @@

<img src="DSC08103_popis.JPG" title="GPU cluster" width=50%>

## Login node (gp3.fjfi.cvut.cz)
## Login nodes

This section lists all nodes that have a DNS record and are available via SSH from the public network.

The main login node is __gp3.fjfi.cvut.cz__, which has the largest disk array and is connected to the [compute nodes](#compute-nodes-gp11-14) via a fast 10 Gbit Ethernet network.

### gp1.fjfi.cvut.cz

- CPU:
[Intel Xeon E5-2630 v3](https://ark.intel.com/content/www/us/en/ark/products/83356/intel-xeon-processor-e5-2630-v3-20m-cache-2-40-ghz.html)
  (8 cores @ 2.4-3.2 GHz, 20 MiB cache)
- RAM:
  8× 16 GiB DDR4 ECC 2133 MT/s
- GPU:
[Nvidia Quadro P6000](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#Quadro_Pxxx_series)
  (3840 cores @ 1.5 GHz, 24 GiB GDDR5X, compute capability 6.1)
- Local storage:
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)
    - `/home/`: 1 TB WD Caviar Black

### gp2.fjfi.cvut.cz

- CPU:
[AMD Ryzen 9 5950X](https://www.amd.com/en/products/cpu/amd-ryzen-9-5950x)
  (16 cores @ 3.4-4.9 GHz, 64 MiB cache)
- RAM:
  4× 32 GiB DDR4 3200 MT/s CL16
- GPU:
[Nvidia GeForce GTX 1060](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_series)
  (1280 cores @ 1.77 GHz, 6 GiB GDDR5X, compute capability 6.1)
- Local storage:
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)
    - `/mnt/gp2/`: 4× 6 TB disks (Seagate IronWolf, WD Gold, Seagate Exos) (RAID 0)

### gp3.fjfi.cvut.cz

- CPU:
[Intel Core i9-9900KF](https://ark.intel.com/content/www/us/en/ark/products/190887/intel-core-i9-9900kf-processor-16m-cache-up-to-5-00-ghz.html)
@@ -13,13 +47,92 @@
[Nvidia GeForce GTX 1080Ti](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_series)
  (3584 cores @ 1.62 GHz, 11 GiB GDDR5X, compute capability 6.1)
- Local storage:
    - `/`: 120 GB SSD (KINGSTON SA400S37120G)
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)
    - `/mnt/gp3/`: 4× 16 TB Seagate Exos X16 (RAID 0)

The `/mnt/gp3/` file system is __not backed up__ and since it is on RAID 0, even __a single drive failure would mean destruction of all data__.
Hence, users are advised either not to keep valuable data here or to make their own backups.
### gp4.fjfi.cvut.cz

The `/mnt/gp3/` storage is shared with the compute nodes over the network.
- CPU:
[Intel Core i7-9800X](https://ark.intel.com/content/www/us/en/ark/products/189122/intel-core-i7-9800x-x-series-processor-16-5m-cache-up-to-4-50-ghz.html)
  (8 cores @ 3.8-4.5 GHz, 16 MiB cache)
- RAM:
  4× 16 GiB DDR4 3200 MT/s CL16
- GPU:
[Nvidia GeForce RTX 3060](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#RTX_30_series)
  (1792 cores @ 1.78 GHz, 12 GiB GDDR6, compute capability 8.6)
- Local storage:
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)
    - `/home/`: 1 TB WD Re

### gp5.fjfi.cvut.cz

- CPU:
[Intel Xeon E5-2640 v4](https://ark.intel.com/content/www/us/en/ark/products/92984/intel-xeon-processor-e5-2640-v4-25m-cache-2-40-ghz.html)
  (10 cores @ 2.4-3.4 GHz, 25 MiB cache)
- RAM:
  4× 16 GiB DDR4 ECC 2133 MT/s
- GPU:
[Nvidia GeForce RTX 3060](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#RTX_30_series)
  (1792 cores @ 1.78 GHz, 12 GiB GDDR6, compute capability 8.6)
- Local storage:
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)
    - `/home/`: 1 TB WD Red

### gp6.fjfi.cvut.cz

- CPU:
[Intel Core i9-9820X](https://ark.intel.com/content/www/us/en/ark/products/189121/intel-core-i9-9820x-x-series-processor-16-5m-cache-up-to-4-20-ghz.html)
  (10 cores @ 3.3-4.2 GHz, 16 MiB cache)
- RAM:
  4× 16 GiB DDR4 2666 MT/s
- GPU:
[Nvidia GeForce RTX 3060](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#RTX_30_series)
  (1792 cores @ 1.78 GHz, 12 GiB GDDR6, compute capability 8.6)
- Local storage:
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)
    - `/home/`: 1 TB WD Red

### gp7.fjfi.cvut.cz

- CPU:
[Intel Core i9-9900KF](https://ark.intel.com/content/www/us/en/ark/products/190887/intel-core-i9-9900kf-processor-16m-cache-up-to-5-00-ghz.html)
  (8 cores @ 3.6-5.0 GHz, 16 MiB cache)
- RAM:
  3× 16 GiB DDR4 2666 MT/s CL16
  1× 8 GiB DDR4 2133 MT/s
- GPU:
[Nvidia GeForce RTX 3060](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#RTX_30_series)
  (1792 cores @ 1.78 GHz, 12 GiB GDDR6, compute capability 8.6)
- Local storage:
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)
    - `/home/`: 2 TB WD Red

### gp8.fjfi.cvut.cz

- CPU:
[AMD Ryzen 9 5950X](https://www.amd.com/en/products/cpu/amd-ryzen-9-5950x)
  (16 cores @ 3.4-4.9 GHz, 64 MiB cache)
- RAM:
  4× 32 GiB DDR4 3200 MT/s CL16
- GPU:
[Radeon RX 570](https://www.amd.com/en/support/graphics/radeon-500-series/radeon-rx-500-series/radeon-rx-570)
  (2048 cores @ 1.17 GHz, 8 GiB GDDR5, architecture gfx803)
- Local storage:
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)
    - `/mnt/gp8/`: 2× 1 TB WD Green (RAID 0)

### gp9.fjfi.cvut.cz

- CPU:
[Intel Xeon E5-2640 v4](https://ark.intel.com/content/www/us/en/ark/products/92984/intel-xeon-processor-e5-2640-v4-25m-cache-2-40-ghz.html)
  (10 cores @ 2.4-3.4 GHz, 25 MiB cache)
- RAM:
  8× 16 GiB DDR4 ECC 2400 MT/s
- GPU:
[Nvidia Tesla P100](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#Tesla)
  (3584 cores @ 1.48 GHz, 16 GiB HBM2, compute capability 6.0)
- Local storage:
    - `/`: 4× 480 GB SSD (Intel SSDSC2BB48) (RAID 0)

## Compute nodes (gp[11-14])

@@ -32,7 +145,7 @@ The `/mnt/gp3/` storage is shared with compute nodes over network.
[Nvidia GeForce RTX 2070 Super OC](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_20_series)
  (2560 cores @ 1.78 GHz, 8 GiB GDDR6, compute capability 7.5)
- Local storage:
    - `/`: 120 GB SSD (KINGSTON SA400S37120G)
    - `/`: 960 GB SSD (KINGSTON SA400S37960G)

## Network

@@ -40,6 +153,11 @@ All compute nodes together with the login node are connected to the 10 Gbit Ethe
The compute nodes are not accessible from the outside network; they must be accessed from the login node.
Internet access from the compute nodes is provided via [NAT](https://en.wikipedia.org/wiki/Network_address_translation) on the login node.
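One possible way to reach a compute node in a single step is OpenSSH's `-J` (ProxyJump) option, which tunnels the connection through the login node. The sketch below only builds the command line. The helper name `compute_ssh_cmd` and the user `alice` are hypothetical, and it assumes that the short names `gp11` to `gp14` resolve on the login node and that jumping through it is permitted.

```shell
# Hypothetical helper: build an ssh command that reaches a compute node
# through the login node gp3 using OpenSSH's ProxyJump option (-J).
# Assumption: the short compute node name resolves on the login node.
compute_ssh_cmd() {
    user=$1
    node=$2
    printf 'ssh -J %s@gp3.fjfi.cvut.cz %s@%s\n' "$user" "$user" "$node"
}

compute_ssh_cmd alice gp11
# prints: ssh -J alice@gp3.fjfi.cvut.cz alice@gp11
```

If this does not work in your setup, logging in to the login node first and running `ssh` from there remains the documented route.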

The `/mnt/gp3/` file system is __not backed up__ and since it is on RAID 0, even __a single drive failure would mean destruction of all data__.
Hence, users are advised either not to keep valuable data here or to make their own backups.

The `/mnt/gp3/` storage is shared with the compute nodes over the network.
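To give a rough sense of the RAID 0 risk: if each of the `n` drives fails independently with probability `p` over some period, the array survives only if all of them survive, so it is lost with probability `1 - (1 - p)^n`. The sketch below evaluates this. The helper name `raid0_failure_prob` is hypothetical, and the 2 % per-drive failure probability is purely an illustrative assumption, not a measured figure for these disks.

```shell
# Hypothetical helper: probability that a RAID 0 array of n drives loses
# its data, assuming independent drives with per-drive failure probability p.
raid0_failure_prob() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - p)^n }'
}

# Four drives (as in /mnt/gp3/), assumed 2 % failure probability each.
raid0_failure_prob 0.02 4
# prints: 0.0776
```

Under these assumptions the four-drive array is roughly four times as likely to lose data as a single drive.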

## Other nodes

Other nodes (gp{1,2,4,5,6}) are not connected to the 10 Gbit switch and cannot be used for distributed computations.