HPC Systems Admin
Company: TPA technologies
Location: Brookline
Posted on: June 17, 2022
Job Description:
HPC Systems Administrator

Skills

HPC:
- Manage multi-vendor filesystems such as XFS and GPFS, including upgrades, patching, space management, GPFS cluster management, and diagnostics
- Manage the workload schedulers Torque and Slurm, including upgrades, patches, diagnostics, user resource consumption, and configuration
- Manage Bright cluster management software, including creating and maintaining images, managing job schedules, and upgrades/patching
- Install and configure hardware, operating systems, and commercial software packages; manage user accounts; tune system performance; install system-wide software; and allocate mass storage space

Systems Administration:
- Advanced RHEL systems administration, including hardware setup, upgrades, and patching
- Remediation of vulnerabilities
- Performance tuning and server hardening
- Disk space management
- Diagnostics (slowness, nodes down, etc.)

Software experience:
- Torque
- Slurm
- Bright
- Moab
- MATLAB
- Experience working with containers (Docker, Singularity, Podman, Kubernetes) a plus
- Experience working with Git and supporting CI/CD pipelines a plus

Job Responsibilities:
- Install, configure, fine-tune, and troubleshoot multi-vendor Linux HPC servers
- Build and deploy open source software and software from vendors/partners
- Diagnose and resolve system operational problems quickly and effectively
- Verify full operation of systems, including network, systems, and storage performance
- Configure the scheduling and queuing system
- Troubleshoot and maintain InfiniBand and Ethernet networks
- Understand, maintain, and support the high-performance parallel storage system
- Assist users and the research team running applications on the HPC cluster
- Manage, maintain, monitor, and control interactive and batch processes (scheduled and unscheduled)

Requirements:
- Expert knowledge of HPC server hardware, including HP and Dell
- Expert knowledge of CentOS and Red Hat
- Expert knowledge of related parallel distributed file systems such as IBM GPFS
- Advanced knowledge of cluster storage systems, including Isilon
- Advanced knowledge of the Linux operating system, such as kernel compiles, boot command-line options, SELinux, rpm, and yum
- Advanced proficiency with NIS, NFS, autofs, TCP/IP, Linux network configuration, local storage, lm_sensors, and IPMI required
- Intermediate knowledge of HPC resource managers such as PBS, Torque, and Moab
- Intermediate knowledge of Bash, Perl, PHP, awk, sed, grep, and HTML
- Intermediate skill with scripting tools and leveraging solutions
- Ability to provide day-to-day 24x7 support and participate in an on-call rotation
- Must be able to lift and move 40 lbs
- Hybrid preferred, but open to remote
Keywords: TPA technologies, Brookline, HPC Systems Admin, Other, Brookline, Massachusetts