BwHPC BPG Data Management

1 Local File Systems

In addition to computing capacity, the bwHPC clusters are equipped with parallel file systems. For local data management, it is important to distinguish between data that is persistent and frequently reused and data for which fast access during a job's lifetime is decisive.

Each registered user is provided with a $HOME directory in the parallel file system. Files stored in this directory are secured by regular backups, but $HOME does not offer quick access from the compute nodes. For data that is read or written during a job's lifetime, additional storage without backup is made available temporarily. Since the implementation varies between the bwHPC clusters, please consult the pages of bwUniCluster or bwForCluster JUSTUS for details.

Directory | Characteristics | Kind of Data
$HOME | with backup, limited, global file system | software packages, configuration files, important results, ...
Workspaces, $WORK, ... | quick access, limited, temporary, global file system | input/output files
$TMPDIR, $TMP | local file system, temporarily limited to batch job's lifetime | intermediate results

As a matter of principle, the following rule should be observed: Do not compute in $HOME!
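The rule above can be illustrated with a minimal Python sketch: input is staged into a scratch directory below $TMPDIR, all work happens there, and only the result is copied back. The function names `run_in_scratch` and `count_lines` are hypothetical, and the fallback to the system temp directory is an assumption made so the sketch also runs outside a batch job.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_in_scratch(input_file, result_dir, compute):
    """Stage input into scratch space, compute there, copy the result back."""
    # Assumption: $TMPDIR points to job-local storage inside a batch job.
    # Outside a job, fall back to the system temp directory.
    scratch_root = os.environ.get("TMPDIR", tempfile.gettempdir())
    scratch = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    try:
        local_input = Path(shutil.copy2(input_file, scratch))
        # All intermediate reads and writes happen inside the scratch directory.
        local_output = compute(local_input, scratch)
        Path(result_dir).mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy2(local_output, result_dir))
    finally:
        # Scratch space is temporary, so clean up in every case.
        shutil.rmtree(scratch, ignore_errors=True)


def count_lines(local_input, scratch):
    """Toy compute step: write the number of input lines to a result file."""
    out = scratch / "line_count.txt"
    out.write_text(str(len(local_input.read_text().splitlines())))
    return out
```

In a real job, `result_dir` would typically be a workspace, with only important final results moved to $HOME afterwards.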

Disk space, like all HPC resources, is limited. If the available disk space is not sufficient, external storage services such as bwFileStorage can be used.

2 External Storage

2.1 bwFileStorage

Every user of the bwHPC clusters can use the storage service bwFileStorage. Since authentication and authorization are implemented via bwIDM mechanisms, group memberships on the bwHPC clusters and on bwFileStorage are identical.

In general, data can be transferred between bwFileStorage and the bwHPC clusters with common transfer tools such as scp, sftp or rsync. On bwUniCluster, bwFileStorage is additionally mounted, as a prototype, via dedicated hardware, the so-called data mover nodes. There, data transfer is executed remotely from the login nodes through the user interface rdata.
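As an illustration of a transfer with one of the common tools, the following Python sketch assembles an rsync command line for an upload to bwFileStorage. It only builds and prints the command; nothing is executed. The host name is the SSH host listed in the Hosts section of this page, while the user name `ab1234`, the file name and the remote directory are hypothetical placeholders.

```python
import shlex

# Host name taken from the Hosts section of this page (bwFileStorage, SSH).
BWFILESTORAGE_SSH_HOST = "bwfilestorage-login.lsdf.kit.edu"


def rsync_command(user, source, remote_dir, host=BWFILESTORAGE_SSH_HOST):
    """Return the argument list for an rsync upload (nothing is executed)."""
    return [
        "rsync",
        "-a",         # archive mode: recurse and keep permissions and timestamps
        "--partial",  # keep partially transferred files so a rerun can resume
        source,
        f"{user}@{host}:{remote_dir}",
    ]


# Hypothetical user name and paths, for illustration only.
cmd = rsync_command("ab1234", "results.tar.gz", "backup/")
print(shlex.join(cmd))
```

To actually run the transfer, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where rsync is installed.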

bwFileStorage serves as central storage not only between bwHPC systems but also between local workstations and bwHPC clusters. On workstations, bwFileStorage can be mounted with tools such as sshfs or cifs (only on KIT workstations), so that pre- and post-processing of computing data can be performed locally.

3 Data Transfer

Transferring a few large files achieves higher throughput than transferring many small files. It is therefore recommended to collect files into a compressed archive with tools like zip, tar, xz or others before the transfer.
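The recommendation can be sketched in Python with the standard-library `tarfile` module, which packs a whole directory into a single gzip-compressed archive before it is transferred. The function name `bundle` is hypothetical; on the command line, tar or zip serve the same purpose.

```python
import tarfile
from pathlib import Path


def bundle(source_dir, archive_path):
    """Pack source_dir into one gzip-compressed tar archive; return member names."""
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the directory name, not absolute paths.
        archive.add(str(source_dir), arcname=source_dir.name)
    # Re-open the archive to list what was actually stored.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

The resulting single file can then be moved with any of the transfer tools listed below, and unpacked at the destination.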

3.1 Transfer Tools

Type | Software | Remark | Executable on (Local°, bwUniCluster, bwForCluster, bwFileStorage); data transfer from/to (www, bwHPC cluster, bwFileStorage)
Command line tool | scp | Throughput < 150 MB/s (depending on cipher) | + + + + + +
Command line tool | sftp | | + + + + + +
Command line tool | rsync | | + + + + + +
Command line tool | rdata | Throughput up to 350-400 MB/s | + +
Command line tool | wget | Download only | + + + + + +
Client | WinSCP | based on SCP/SFTP, Windows only | + + +
Client | FileZilla | based on SFTP | + + +

° Depending on workstation's OS.

An extended list of tools can be found here.

3.2 Hosts

System | Host
bwUniCluster | uc1.scc.kit.edu
bwForCluster JUSTUS | justus.uni-ulm.de
bwForCluster MLS&WISO Production | bwfor.cluster.uni-mannheim.de
bwForCluster MLS&WISO Production | bwforcluster.bwservices.uni-heidelberg.de
bwFileStorage | bwfilestorage.lsdf.kit.edu
bwFileStorage (SSH) | bwfilestorage-login.lsdf.kit.edu
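For scripted transfers, the host names above can be kept in a small lookup table. The following Python sketch builds the `user@host:path` form accepted by scp and rsync. The dictionary keys are hypothetical short labels chosen for this sketch, as is the user name `ab1234`; only the host names come from the table.

```python
# Host names copied from the table above; the keys are hypothetical labels.
HOSTS = {
    "bwunicluster": "uc1.scc.kit.edu",
    "justus": "justus.uni-ulm.de",
    "mlswiso-mannheim": "bwfor.cluster.uni-mannheim.de",
    "mlswiso-heidelberg": "bwforcluster.bwservices.uni-heidelberg.de",
    "bwfilestorage": "bwfilestorage.lsdf.kit.edu",
    "bwfilestorage-ssh": "bwfilestorage-login.lsdf.kit.edu",
}


def remote_spec(system, user, path):
    """Build the user@host:path form accepted by scp and rsync."""
    try:
        host = HOSTS[system]
    except KeyError:
        raise ValueError(f"unknown system {system!r}; known: {sorted(HOSTS)}") from None
    return f"{user}@{host}:{path}"


# Hypothetical user name and path, for illustration only.
print(remote_spec("justus", "ab1234", "data/input.tar.gz"))
```

Keeping the hosts in one place means a changed host name needs to be updated only once in a transfer script.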