Commit 171972c5 authored by Jakub Klinkovský's avatar Jakub Klinkovský

doc: add hardware overview

parent 71b989c6
# GPU cluster

Documentation and scripts for the GPU cluster
Documentation:

- [Hardware overview](./doc/hardware-overview.md)

# Hardware overview

<img src="DSC08103_popis.JPG" alt="GPU cluster" title="GPU cluster" width="50%">

## Login node (gp3.fjfi.cvut.cz)

- CPU:
[Intel Core i9-9900KF](https://ark.intel.com/content/www/us/en/ark/products/190887/intel-core-i9-9900kf-processor-16m-cache-up-to-5-00-ghz.html)
  (8 cores @ 3.6-5.0 GHz, 16 MiB cache)
- RAM:
  2× 16 GiB DDR4 2666 MT/s
- GPU:
[Nvidia GeForce GTX 1080Ti](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_series)
  (3584 cores @ 1.62 GHz, 11 GiB GDDR5X, compute capability 6.1)
- Local storage:
    - `/`: 120 GB SSD (KINGSTON SA400S37120G)
    - `/local/`: 4× 16 TB Seagate Exos X16 (RAID 0)

The `/local/` file system is __not backed up__, and since it is on RAID 0, even __a single drive failure would destroy all data__.
Users are therefore advised not to keep valuable data here, or to make their own backups if needed.

The `/local/` storage is shared with the compute nodes over the network.
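
From a compute node, the mount can be inspected with standard tools; a minimal sketch (this document does not specify the export protocol, so the commands only report whatever is configured, and print a short note when run on a machine where `/local` is not mounted):

```shell
# Show which filesystem backs /local and how much space is left;
# degrade gracefully on machines where /local does not exist.
findmnt /local 2>/dev/null || echo "/local is not mounted here"
df -h /local 2>/dev/null || true
```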

## Compute nodes (gp[11-14])

- CPU:
[Intel Core i7-9800X](https://ark.intel.com/content/www/us/en/ark/products/189122/intel-core-i7-9800x-x-series-processor-16-5m-cache-up-to-4-50-ghz.html)
  (8 cores @ 3.8-4.5 GHz, 16 MiB cache)
- RAM:
  1× 16 GiB DDR4 2666 MT/s CL16
- GPU:
[Nvidia GeForce RTX 2070 Super OC](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_20_series)
  (2560 cores @ 1.78 GHz, 8 GiB GDDR6, compute capability 7.5)
- Local storage:
    - `/`: 120 GB SSD (KINGSTON SA400S37120G)
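
The specifications above can be verified directly on any node with standard Linux tools; a quick sketch (the GPU query is skipped gracefully on machines without the Nvidia driver):

```shell
# Print CPU model, installed memory, local disks, and (if available)
# GPU name and memory for the node this runs on.
lscpu | grep 'Model name'
free -h | grep '^Mem'
lsblk -d -o NAME,SIZE,MODEL
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
fi
```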

## Network

All compute nodes together with the login node are connected to the 10 Gbit Ethernet switch ([TP-Link T1700X-16TS](https://www.tp-link.com/us/business-networking/smart-switch/t1700x-16ts/)).
The compute nodes are not accessible from the outside network; they must be accessed through the login node.
Internet access from the compute nodes is provided via [NAT](https://en.wikipedia.org/wiki/Network_address_translation) on the login node.
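
In practice this means connecting through the login node, which OpenSSH can do transparently with `ProxyJump`; a sketch of a client-side `~/.ssh/config` entry (the `gp11`–`gp14` hostnames are taken from this document, the username is a placeholder):

```
# ~/.ssh/config on the user's own machine -- jump through the login node
Host gp11 gp12 gp13 gp14
    ProxyJump gp3.fjfi.cvut.cz
    User your_username
```

With this in place, `ssh gp11` from outside the cluster first connects to `gp3.fjfi.cvut.cz` and tunnels the session through it.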

## Other nodes

Other nodes (gp{1,2,4,5,6}) are not connected to the 10 Gbit switch and cannot be used for distributed computations.
They also do not share a common hardware specification; see http://mmg.fjfi.cvut.cz/mmg/gpu for details.