Journal of Data Science


High Performance Computing Cluster Setup: A Tutorial
Marius Hofert

https://doi.org/10.6339/24-JDS1159
Pub. online: 26 November 2024      Type: Computing In Data Science      Open Access

Received: 26 May 2024
Accepted: 30 October 2024
Published: 26 November 2024

Abstract

When computations such as statistical simulations need to be carried out on a high performance computing (HPC) cluster, researchers and practitioners typically face the same questions. How do I interact with an HPC cluster? Do I need to type a long host name and a password on every single login or file transfer? Why does code that works locally no longer run on the HPC cluster? How can I install the latest versions of software on an HPC cluster to match my local setup? How can I submit a job and monitor its progress? This tutorial answers such questions with experiments on an example HPC cluster.

Supplementary material

The Supplementary Material explains in detail how R can be installed entirely in one’s home directory, without permission to write to system directories. It also provides more details about the options specified in our starter.sh Bash script.



Copyright
2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
cluster computing; connection setup; Slurm Workload Manager; software installation; tips

Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
