Simultaneous and heterogeneous multithreading

From Wikipedia, the free encyclopedia

Simultaneous and heterogeneous multithreading (SHMT) is a software framework that takes advantage of heterogeneous computing systems that contain a mixture of central processing units (CPUs), graphics processing units (GPUs), and special purpose machine learning hardware, for example Tensor Processing Units (TPUs).[1][2]

Each component processes information differently. Often data has to move among processors, which can create bottlenecks, with one processor starving while waiting on another to finish.[1]

The system defines virtual processors and virtual operations (VOPs). VOPs decompose into one or more high-level operations (HLOPs). It then distributes the operations across the processors. The runtime system then dynamically maps virtual processors to physical processors, assessing resource availability in order to keep all the processors busy. The scheduler employs a light-weight, quality-aware work-stealing (QAWS) policy.[1]

Conventional runtimes use assign one processor (set) to each subtask, leaving other types of processors idle. In other words, the CPU(s) run (possibly in parallel), then when that subtask completes, the next subtask is handed to the GPU(s). When they finish the next subtask is handed to the TPU(s).[2]

Adding software pipelining allows the second subtask to run using partial results from the first subtask, which improves resource utilization.[2]

SHMT takes things a step further, identifying subtasks that can run independently of others to the appropriate processor type, allow even better parallelism. Some subtasks can be performed on multiple processor types. SHMT can divide a single subtask across such processor types. Thus the fundamental breakthrough is to keep more processors working simultaneously, reducing time and energy costs.[2]

Benchmark

See also

References

Related Articles

Wikiwand AI