High Performance and Grid Computing
   
 Services ...

    Overview

    NC State Office of Information Technology (OIT) offers a number of intermediate level HPC services to support research and instruction. These services are available to all NC State faculty.

      To access HPC services an NC State faculty member requests an HPC project. Once the project is established, normally within one business day of the request, the faculty member can add individual accounts for students or collaborators. The faculty member who requested the project is responsible for all resource use by that project.

    Distributed Memory Computing

    Distributed memory computing services are provided by the Linux cluster henry2. Access is via ssh to login nodes attached to the cluster, from which jobs can be submitted to the resource management and queuing system.

    Resource-intensive interactive access (e.g., application graphical user interfaces) is also available using an HPC image from the Virtual Computing Laboratory (VCL).

    Shared Memory Computing

    Shared memory computing services are provided by an IBM POWER5 system running AIX. Access is via ssh to a login node that is part of the POWER5 system. From the login node, compute-intensive shared memory jobs are submitted to the resource management and queuing system.

    Storage

    Three types of storage service are available for users of the distributed memory and shared memory compute services:

    1. Home directory storage is shared between the distributed memory and shared memory services and provides up to about a gigabyte of backed up storage.
    2. Independent scratch storage services, including parallel file systems, are available for distributed memory and shared memory jobs, providing up to a few terabytes of volatile storage with no backups.
    3. A shared mass storage service provides several terabytes of backed up storage for important files.

    Applications

    A suite of applications is maintained for use on HPC compute services. These applications include Fortran and C/C++ compilers, code development tools, and libraries with which users can build their own custom applications.

    Consulting and Collaboration

    HPC computational science staff are available to assist with effective and efficient utilization of HPC compute services. This assistance may take the form of help with short, specific questions or issues using HPC services (consulting) or more general sustained assistance to a project or activity using HPC services (collaboration).




    Disaster Recovery and Business Continuity Plan

    Overview

    HPC hardware resources are distributed between two data centers on the NC State campus. As described in more detail below, there is some balancing of resources between the two data centers. However, HPC users currently play a vital role in ensuring that their activities are resilient and can continue with minimal disruption in the case of any event affecting HPC services.

    Hardware is easily replaced. Software, data, and staff expertise are the components that are essential for continuity of work after an event.

    Distributed Memory Compute Resources

    HPC distributed memory compute resources share a common hardware architecture with the Virtual Computing Laboratory (VCL) that is based on IBM BladeCenter technology. VCL/HPC BladeCenter hardware is distributed between NC State data center 1 (DC1) and data center 2 (DC2) with more resources in DC2 than in DC1.

    In the event of one of the data centers being unavailable, the BladeCenter resources in the available data center would provide a reduced level of service - requiring that applications using BladeCenter services be prioritized.

    Shared Memory Compute Resource

    The HPC shared memory compute resource is the IBM POWER5 system, which is located in DC1. Software currently running on the POWER5 system can be run on the BladeCenter hardware, albeit at substantially reduced performance.

    Therefore, in the event that DC1 (or just the POWER5) were unavailable, work from the POWER5 would be shifted to the BladeCenter. This work would have to be prioritized along with distributed memory work to determine which jobs would receive priority.

    HPC Storage

    1. Home directory storage is physically located on disks in DC2 and backed up to a tape library in DC2. An event affecting DC2 would affect both the primary and backup copy of HPC home directory storage.
    2. Scratch storage is physically located with the resource it is supporting - POWER5 scratch storage in DC1 and BladeCenter cluster scratch storage in DC2. Due to the nature of scratch storage it is not backed up and it should not contain any critical data that could not be regenerated.
    3. Mass storage is distributed between DC1 and DC2, utilizing the university storage management system (SMS) hardware for disk storage and the HPC tape library for tape storage.

      HPC uses 16 TB of SMS space: 8 TB is physically located in DC1 and 8 TB in DC2. All HPC SMS space is backed up to the HPC tape library in DC2. Therefore, no data would be at risk from an event affecting DC1, because the DC1 disks have a backup in DC2. An event affecting DC2, however, could affect both the primary and backup copies of all data physically residing on DC2 disks, since that data is backed up to the DC2 tape library.
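    The storage layout described above can be sketched as a small model that reports what happens to each storage class if one data center is lost. This is only an illustrative sketch: the class and function names are hypothetical, and it assumes the simplest case, in which an event makes everything in the affected data center unavailable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Store:
    name: str
    primary: str           # data center holding the disk copy
    backup: Optional[str]  # data center holding the tape backup, None if not backed up


# Locations as stated in the plan; the names are labels chosen for this sketch.
STORES = [
    Store("home directories", "DC2", "DC2"),
    Store("POWER5 scratch", "DC1", None),
    Store("BladeCenter scratch", "DC2", None),
    Store("mass storage (DC1 half)", "DC1", "DC2"),
    Store("mass storage (DC2 half)", "DC2", "DC2"),
]


def outcome(store: Store, failed_dc: str) -> str:
    """Classify a store after the loss of one data center."""
    if store.primary != failed_dc:
        return "primary intact"
    if store.backup is not None and store.backup != failed_dc:
        return "restorable from backup"
    return "lost"


def report(failed_dc: str) -> dict:
    """Map each store name to its outcome for the given failed data center."""
    return {s.name: outcome(s, failed_dc) for s in STORES}


if __name__ == "__main__":
    for dc in ("DC1", "DC2"):
        print(f"Event affecting {dc}:")
        for name, result in report(dc).items():
            print(f"  {name}: {result}")
```

    Note that "primary intact" does not mean fully protected: if DC2 is lost, the mass storage on DC1 disks survives but no longer has a backup copy.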

    The bottom line on HPC storage is that there are scenarios that place the centrally managed data (both primary and backup copies) at risk. The HPC group is seeking affordable ways to improve the resilience of centrally managed HPC data. However, for now users must take responsibility for maintaining a copy of critical source code and data in a location outside the centrally managed HPC storage.
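    One way a user might keep such an outside copy is sketched below: it copies a directory tree to a second location and verifies each file by checksum. This is a minimal sketch, not an HPC-provided tool; the function names and the example paths are hypothetical, and it assumes the destination is a directory outside the source tree that the user has already made reachable (for example, a mounted departmental file system).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir: Path, dest_dir: Path) -> list:
    """Copy every file under src_dir to dest_dir and verify each copy.

    Raises OSError if a copied file does not match its source.
    Returns the list of destination paths that were written.
    """
    copied = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dest = dest_dir / src.relative_to(src_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dest):
            raise OSError(f"checksum mismatch for {src}")
        copied.append(dest)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; substitute a real project directory and a
    # destination outside the centrally managed HPC storage.
    files = copy_and_verify(Path("my_project"), Path("/mnt/offsite/my_project"))
    print(f"copied and verified {len(files)} files")
```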

    Applications

    HPC application software used on both the POWER5 system and the BladeCenter cluster is stored on disks physically located in DC2 and backed up to the tape library in DC2. These applications were delivered on media that is stored in HLB or were downloaded via the network. Therefore, in the event that the primary and backup copies of the applications were lost, they could be reloaded from physical media or downloaded again.

    License keys for the applications also reside on disks in DC2 and are backed up to the tape library in DC2. The keys were originally delivered via email, which is stored on the campus email server(s). If the DC2 copies of the license keys were lost, the keys could be restored from that email.

    HPC Staff Resources

    There are three full-time HPC staff members, and there is only limited overlap between their responsibilities. The absence of a single HPC staff member would result in a reduced level of service across a broad range of activities, as another staff member would have to prioritize the activities of two full-time positions, one of which includes unfamiliar responsibilities. The absence of two (or more) HPC staff members would leave substantial areas (or all) with no knowledgeable person to fulfill those responsibilities.

    HPC Business Continuity

    Data essential for HPC business continuity is maintained in a MySQL database operated by OIT Systems. This database is accessed using web interfaces that are implemented on web servers operated by both OIT Systems and HPC.

    Internal HPC operational documentation is maintained on a web server operated by HPC. This documentation is backed up to HPC SMS storage physically located in DC1 with backup in DC2.

Last modified: June 06 2008 12:40:31.
Copyright © 2003-2007 by NC State University and others, All Rights Reserved.

Site contact: Eric Sills, E-mail: eric_sills at ncsu dot edu , Tel: 919-513-0324, Fax: 919-513-1893, HPC and Grid Operations, Information Technology Division, Box 7109, North Carolina State University, Raleigh, NC27695-7914, USA