Who Can Benefit
This course is intended for students responsible for installing, configuring, using, and troubleshooting Sun Grid Engine.
Prerequisites
Skills Gained
- Perform installation tasks and run basic user commands, including: describing the features and architecture of the Grid Engine software; describing distributed resource management (DRM); performing a Network File System (NFS) installation of Grid Engine; and submitting batch and interactive jobs to the grid
- Configure Grid Engine, including performing basic queue, host, and cluster configuration; administering complexes; and integrating applications in the grid
- Perform advanced tasks, including administering and troubleshooting Grid Engine and configuring scheduling policies
Course Content
Module 1 - Introducing the Grid Software
- Define grid computing and DRM
- Describe the types of grids
- Define the architecture of Sun Grid Engine
- Describe jobs, queues, user types, and host types
- Schedule jobs in the grid
- Describe the flow of information in Sun Grid Engine
- Define High Performance Computing (HPC) environments
- Describe the Grid Engine project
Module 2 - Installing the Grid Software
- Describe the various types of Grid Engine installations
- Describe the various spooling types
- Describe the default scheduler profiles
- Perform an NFS installation of Grid Engine
- Describe the contents of the primary Grid Engine directories
Module 3 - Submitting Jobs to the Grid
- Describe Grid Engine commands and job types
- Submit batch jobs using the qsub command
- Submit interactive jobs using the qsh, qrsh, qtcsh, and qlogin commands
- Obtain status information for submitted jobs
- Administer submitted jobs
Module 4 - Modifying Configuration Parameters
- Describe the types of Grid Engine parameters you can configure
- Configure the Grid Engine cluster parameters
- Configure the Grid Engine host parameters
- Configure the Grid Engine queue parameters
- Configure the Grid Engine scheduler
- Configure Grid Engine users
Module 5 - Configuring Resource Management and Load Parameters
- Describe items that affect resource management: job requirements; resources; global, queue, host, and user-defined resource attributes; and inheritance rules
- Administer the system complex list, including global, host, user-defined, and queue-related resource attributes
- Configure the default load parameters and define custom load sensors
Module 6 - Controlling the Event Chain and Integrating Applications
- Define the Grid Engine event chain and application integration
- Describe the execution methods
- Integrate custom applications into Grid Engine
- Integrate HPC environments
Module 7 - Administering and Troubleshooting Sun Grid Engine
- Perform routine administration of Grid Engine queues
- Examine Grid Engine log files to troubleshoot failures
- Use command debugging to resolve failed job submissions
- Obtain and apply patches for Grid Engine
- Back up the Grid Engine system configuration
- Diagnose problems in Grid Engine
Module 8 - Resource Allocation and Scheduling Policies
- Describe resource allocation and scheduling
- Describe and configure Grid Engine scheduling parameters
- Describe and configure the share tree scheduling policy
- Describe and configure the functional scheduling policy
- Describe and configure the override scheduling policy
Module 9 - Usage Accounting and Reporting
- Describe the various methods of gathering accounting and reporting statistics
- Install and configure the Grid Engine reporting database module
- Install and use the Grid Engine web-based Accounting and Reporting Console (ARCo)




