Getting Started with HPC storage systems ...
HPC users have a number of file systems available for their
use. Effective use of the HPC resources requires some
understanding of the types of available file systems and
their intended use.
The general types of storage available are:
- home directory
- local scratch space
- shared scratch space
- mass storage space
The following sections will describe each of these in some detail
including the intended use of these storage resources.
Home Directory
Each user has a home directory. The same home directory is
used on both Linux clusters (henry2 and tim). A user's home
directory on the POWER5 system is the p5 subdirectory of their
cluster home directory. For example, a user whose cluster home
directory is /home/desmith has the POWER5 home directory
/home/desmith/p5. The home directory quota covers both cluster
and POWER5 use.
Total available space in the home file system is relatively small
and quotas are used to manage the available space. Home directories
are intended to be used to hold commonly used scripts,
environment configuration files, and modest size source trees.
Home directories are backed up daily. Only one copy of
each file is retained in the backup. Files that were deleted
more than 7 days ago may be removed from the backup.
Scratch Space
Scratch space is intended to be used for the storage
requirements for running jobs. In particular, large input or
output files should use scratch space during job execution.
Scratch file systems are world writable. Users should create
a directory for their use to avoid potential file name conflicts
with other users.
Scratch space is not backed up.
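Creating a private directory on a scratch file system can be sketched as below. This is a minimal illustration, not a site-provided tool: the function name make_scratch_dir and the SCRATCH_ROOT variable are assumptions, and SCRATCH_ROOT defaults to a temporary directory so the sketch can be tried anywhere. On the clusters it would be set to one of the scratch file systems described in the following sections.

```shell
#!/bin/bash
# make_scratch_dir ROOT
# Creates ROOT/<user name>, restricts it to the owner, and prints its path.
make_scratch_dir() {
    local root="$1"
    local dir="$root/${USER:-$(id -un)}"
    mkdir -p "$dir" || return 1
    chmod 700 "$dir" || return 1
    printf '%s\n' "$dir"
}

# SCRATCH_ROOT is a hypothetical variable; it falls back to a temporary
# directory when it is not set.
SCRATCH_ROOT="${SCRATCH_ROOT:-$(mktemp -d)}"
workdir="$(make_scratch_dir "$SCRATCH_ROOT")"
echo "Working directory: $workdir"
```

Naming the directory after the user keeps files from different users apart on a world-writable file system.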
Local Scratch Space
Local scratch space is directly connected to the compute node.
On the Linux clusters the local scratch file system available
to users is /scratch. Its contents are visible only on the node
to which the file system is connected. Because there is no way
to know ahead of time which nodes a job will be assigned, use of
local scratch space must be managed from the user's LSF script,
including both moving files to the space and removing them after
execution completes. Files in local scratch space on the
clusters are subject to immediate removal when the LSF job
completes.
No local scratch space is available to user jobs on
the POWER5 system.
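For the Linux clusters, the staging pattern described above might look like the following LSF script. It is a sketch under stated assumptions rather than a recommended template: it covers a single-node job only, and the function name stage_and_run, the job name, the time limit, the input and result directories, and ./my_program are all hypothetical. The program is assumed to be staged along with the input files.

```shell
#!/bin/bash
#BSUB -J staged_job        # job name (hypothetical)
#BSUB -n 1                 # one slot, so the job stays on a single node
#BSUB -W 60                # wall-clock limit in minutes
#BSUB -o out.%J            # standard output file; %J expands to the job ID
#BSUB -e err.%J            # standard error file

# stage_and_run SRC_DIR SCRATCH_ROOT DEST_DIR COMMAND [ARGS...]
# Copies the contents of SRC_DIR into a private directory under
# SCRATCH_ROOT, runs COMMAND there, copies the directory contents to
# DEST_DIR, and removes the scratch copy. Returns COMMAND's exit status.
stage_and_run() {
    local src="$1" root="$2" dest="$3"
    shift 3
    local work="$root/${USER:-$(id -un)}/job_${LSB_JOBID:-$$}"
    local status

    mkdir -p "$work" "$dest" || return 1
    cp -r "$src"/. "$work"/ || { rm -rf "$work"; return 1; }

    ( cd "$work" && "$@" )
    status=$?

    # Save everything in the working directory, then clean up.
    cp -r "$work"/. "$dest"/
    rm -rf "$work"
    return "$status"
}

# Run only inside an LSF job, where LSF sets LSB_JOBID. The input and
# result paths and the program name are placeholders.
if [ -n "${LSB_JOBID:-}" ]; then
    stage_and_run "$HOME/input" /scratch \
        "/share/$USER/results_$LSB_JOBID" ./my_program
fi
```

Results are copied out before the script ends because files left in local scratch space may be removed as soon as the job completes. The staged input files are copied to the result directory along with the output.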
Shared Scratch Space
In addition to local scratch space, the Linux clusters have
shared scratch space. These file systems are available via NFS
from the login nodes and from all compute nodes on both
clusters. /share and /share3 are currently available to any
user needing scratch space.
Shared scratch file systems are subject to
periodic purge and are not backed up.
Any file in shared scratch space is subject
to removal at any time. A purge is used to
maintain free space in the file system. While
the purge generally allows files to remain on
the shared scratch file systems for a week or
more, during periods of high disk use this may
not be true and files that are only a day or
two old may also be removed by the purge.
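One way to see which files have been sitting in shared scratch space long enough to be at risk is to list them by age. The sketch below assumes that the purge considers modification time, which is not stated here, and the function name list_old_files is hypothetical.

```shell
#!/bin/bash
# list_old_files DIR [DAYS]
# Prints regular files under DIR whose age, counted in whole days since
# the last modification, is greater than DAYS (default 1).
list_old_files() {
    local dir="$1" days="${2:-1}"
    find "$dir" -type f -mtime +"$days" -print
}

# Example, assuming a per-user directory under /share:
#   list_old_files "/share/$USER" 1
```

Files reported here should be copied elsewhere if they are still needed.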
As with local scratch space this storage is
intended to provide large storage space required
by jobs during execution.
On the POWER5 system the /scratch file system
is available from all nodes. This file system
uses IBM's General Parallel File System (GPFS).
The storage is connected to each node by Fibre
Channel.
A GPFS file system is also available on the
Linux cluster (henry2). This file system, /gpfs_share,
has a 1 TB per-project quota. Codes that spend significant
time doing parallel I/O, and any code using MPI-IO,
should use /gpfs_share. Projects requiring more than
1 TB of run-time storage may purchase additional GPFS
or NFS storage.
Mass Storage System
Mass storage space is intended to hold important
files that are too large to be stored in users'
home directories. Users requiring mass storage
space should request that a mass storage directory
be created for their use.
It is anticipated that research groups will have
a group quota of up to 100 GB for mass storage space,
with the option to purchase additional quota if
required.
Mass storage space is available from all systems
intended to support interactive logins. A backup
of mass storage space is maintained as described
below.
Mass storage space is not available from
compute nodes and cannot be used as an alternative
to scratch space for running jobs.
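Because mass storage is reachable only from login nodes, results worth keeping have to be copied there after the job finishes. A minimal sketch of bundling a result directory into a single archive is shown below. The function name archive_results and the example paths are hypothetical, and packing results into one compressed archive is an illustrative choice, not a site requirement.

```shell
#!/bin/bash
# archive_results SRC_DIR DEST_DIR
# Packs SRC_DIR into DEST_DIR/<name of SRC_DIR>.tar.gz, checks that the
# archive can be listed, and prints the archive path.
archive_results() {
    local src="$1" dest="$2"
    local name archive
    name="$(basename "$src")"
    archive="$dest/$name.tar.gz"
    mkdir -p "$dest" || return 1
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    tar -tzf "$archive" > /dev/null || return 1
    printf '%s\n' "$archive"
}

# Example, run on a login node, with placeholder directories:
#   archive_results "/share/$USER/results_1234" "/ncsu/volume1/$USER"
```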
Configuration
There are currently two mass storage file systems,
/ncsu/volume1 and /ncsu/volume2. Each user is provided
a directory on only one of these file systems.
Each file system has 8 TB of disk space.
The disks for the mass storage file systems are part
of the university Dell/EMC storage management
system that is also used for AFS, IMAP, NDS,
and other university storage needs.
Separate file servers are used for /ncsu/volume1
and /ncsu/volume2. Volume1 is served by an IBM
JS20 running AIX. Volume2 is served by a Sun 280R
running Solaris and using Veritas Volume Manager
and Veritas File System. Both file systems are available
from login nodes via NFS.
Backups
The /home, /ncsu/volume1, and /ncsu/volume2 file
systems are backed up daily to a tape library. One
copy of each file is maintained in the tape library.
When a file is modified on disk the new version
of the file replaces any previous backup of that
file.
Files removed from the /home, /ncsu/volume1, or
/ncsu/volume2 file systems remain in the
backup for at least one week.
A consequence of the backup policy is that
when a file is updated under the same name, the
daily backup overwrites the previously backed-up
version. If previous versions of a file may be
needed, save each version under a unique file
name.
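One possible naming scheme is to copy a file to a time-stamped name before changing it, so that the backup holds each copy under its own name. The sketch below is only an illustration. The function name save_version and the time-stamp suffix are assumptions, not a site convention.

```shell
#!/bin/bash
# save_version FILE
# Copies FILE to FILE.YYYYMMDD-HHMMSS in the same directory, preserving
# its modification time and permissions, and prints the name of the copy.
save_version() {
    local file="$1"
    local copy
    if [ ! -f "$file" ]; then
        echo "save_version: no such file: $file" >&2
        return 1
    fi
    copy="$file.$(date +%Y%m%d-%H%M%S)"
    cp -p "$file" "$copy" || return 1
    printf '%s\n' "$copy"
}

# Example with a placeholder file name:
#   save_version "$HOME/model/params.txt"
```

Each saved copy counts against the quota of the file system that holds it, so old versions should be removed once they are no longer needed.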
HSM
An additional level of management is used
on /ncsu/volume1. Tivoli Space Manager
migrates older, larger files from the file
system disk to tape. Migrated files are retrieved
automatically when they are accessed.
Space Manager seeks to keep disk usage
on /ncsu/volume1 between 85% and 90%.
Last modified: February 01 2007 18:36:45.
Copyright © 2003-2007 by
NC State University and
others, All Rights Reserved.
Site contact: Eric Sills, E-mail: eric_sills at ncsu dot edu,
Tel: 919-513-0324, Fax: 919-513-1893,
HPC and Grid Operations, Information Technology Division,
Box 7109, North Carolina State University, Raleigh,
NC 27695-7914, USA