Draft:Apache TVM
From Wikipedia, the free encyclopedia
Apache TVM
| Apache TVM | |
|---|---|
| Original author | |
| Developer | Apache Software Foundation |
| Operating system | |
| Platform | x86-64, ARM64, Vulkan, CUDA, Metal |
| Available in | English |
| Website | tvm |
| Repository | github |
Apache TVM is an open-source end-to-end machine learning compiler framework. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend, including central processing units (CPUs), graphics processing units (GPUs), and machine learning accelerators. TVM effectively bridges the gap between deep learning frameworks (such as PyTorch, TensorFlow, and MXNet) and the diverse hardware backends available for deployment.
History
TVM began as a research project at the Paul G. Allen School of Computer Science & Engineering at the University of Washington within the SAMPL group. It was initially proposed by Tianqi Chen and collaborators to address the growing fragmentation between high-level machine learning frameworks and the exploding variety of hardware targets.
- 2017: TVM is released as an open-source project.
- 2019: The project enters the Apache Incubator.
- 2020: The Apache Software Foundation announces TVM as a top-level project.
Architecture and Design
The TVM stack is designed to provide high performance and portability for deep learning inference workloads. It decomposes the compilation process into several layers, using various Intermediate Representations (IR) to optimize models at different levels of abstraction.
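The layered lowering described above can be sketched conceptually in plain Python. This is an invented illustration of the idea of progressive lowering through intermediate representations, not TVM's actual API or IR data structures:

```python
# Conceptual sketch of multi-level lowering: a high-level operator graph
# is lowered to explicit loop nests, then to target-specific code.
# All names and IR encodings here are invented for illustration; TVM's
# real IRs (Relax, TensorIR) are far richer.

def lower_to_loops(op):
    """Lower a coarse operator to an explicit loop nest (high-level -> loop-level IR)."""
    if op == "add":
        return "for i in range(n): c[i] = a[i] + b[i]"
    if op == "relu":
        return "for i in range(n): c[i] = max(a[i], 0)"
    raise NotImplementedError(op)

def lower_to_target(loop_ir, target):
    """Emit target-specific code from the loop-level program (loop-level IR -> codegen)."""
    if target == "c":
        return loop_ir.replace("for i in range(n):", "for (int i = 0; i < n; i++)")
    raise NotImplementedError(target)

graph = ["add", "relu"]                          # high-level operator graph
loops = [lower_to_loops(op) for op in graph]     # loop-level programs
code = [lower_to_target(l, "c") for l in loops]  # target code strings
```

Each stage exposes different optimization opportunities: graph-level passes (e.g., operator fusion) apply before lowering, while loop-level passes (e.g., tiling) apply after.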
1. Frontends (Import)
TVM supports importing trained models from most major deep learning frameworks, including:
- PyTorch
- TensorFlow / Keras
- ONNX (Open Neural Network Exchange)
- MXNet
- PaddlePaddle
2. Relax (High-Level IR)
Relax is the primary high-level Intermediate Representation. Relax addresses the key limitations of the older, now-deprecated Relay IR in handling dynamic workloads by introducing:
- First-Class Symbolic Shapes: tensor shapes may contain symbolic dimensions (e.g., a dynamic batch size) that are tracked and reasoned about at compile time.
- Cross-Level Interaction: Relax functions can directly call into lower-level tensor programs and external library routines within a single representation.
- Dataflow and Control Flow: side-effect-free dataflow regions are marked explicitly, while general control flow remains expressible.
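The symbolic-shape idea can be illustrated with a hand-rolled shape-inference sketch. This is not Relax's actual shape machinery; it only shows how a symbolic dimension (here the string "n") can propagate through a program at compile time:

```python
# Toy illustration of first-class symbolic shapes: dimensions may be
# symbolic names rather than fixed integers, and shape inference
# propagates them through operators. Invented for illustration.

def matmul_shape(a_shape, b_shape):
    """Infer the result shape of (m, k) x (k, n) -> (m, n).
    Dimensions may be symbolic strings or concrete ints; the shared
    inner dimension must agree."""
    (m, k1), (k2, n) = a_shape, b_shape
    assert k1 == k2, f"inner dimensions differ: {k1} vs {k2}"
    return (m, n)

# A symbolic batch dimension "n" survives the whole program:
hidden = matmul_shape(("n", 784), (784, 128))
logits = matmul_shape(hidden, (128, 10))
```

Because the batch dimension stays symbolic, a compiler built this way can generate one program that serves any batch size, rather than recompiling per shape.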
3. TensorIR (TIR)
TIR is the low-level IR where the specific implementation of operators (like matrix multiplication or attention mechanisms) is defined. In the Unity architecture, Relax functions often call into TensorIR functions. Optimizations at this level focus on:
- Loop Transformations: Tiling, vectorization, and unrolling
- Memory Management: Optimizing buffer allocation and memory scope (e.g., global vs. shared memory on GPUs)
- Hardware Intrinsics: Mapping operations to specific hardware accelerators (e.g., Tensor Cores).
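Loop tiling, the first transformation listed above, can be shown in plain Python. This is an illustrative sketch of the transformation itself, not TVM's scheduling API; the tiled variant computes the same matrix product but iterates over small cache-friendly blocks:

```python
# Loop tiling illustrated on a square matrix multiplication.
# Both functions compute the same result; the tiled version splits each
# loop into an outer tile loop and an inner intra-tile loop, improving
# locality on real hardware.

def matmul_naive(a, b, n):
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_tiled(a, b, n, t=2):
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, t):          # outer loops walk over tiles
        for j0 in range(0, n, t):
            for k0 in range(0, n, t):
                for i in range(i0, min(i0 + t, n)):   # inner loops stay
                    for j in range(j0, min(j0 + t, n)):  # within one tile
                        for k in range(k0, min(k0 + t, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c

n = 4
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j + 1 for j in range(n)] for i in range(n)]
assert matmul_naive(a, b, n) == matmul_tiled(a, b, n)
```

In TVM such transformations are expressed as schedule primitives applied to a TensorIR function, leaving the computation definition unchanged while reshaping the loop structure.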
4.
