Apache Arrow


Apache Arrow is a language-agnostic software framework for developing data analytics applications that process columnar data. It defines a standardized column-oriented memory format that can represent flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware.[2][3][4][5][6] This reduces or eliminates factors that limit the feasibility of working with large data sets, such as the cost, volatility, or physical constraints of dynamic random-access memory.[7]
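To make the idea of a column-oriented memory format concrete, the following is a simplified sketch in plain Python, not the Arrow library itself. It packs a nullable integer column into a validity bitmap plus a contiguous values buffer, and a nested (list) column into an offsets buffer plus flattened child values, which is the general shape of Arrow's layout for flat and hierarchical data. The function names are hypothetical, and details such as buffer padding and alignment are omitted.

```python
from array import array


def build_int32_column(values):
    """Pack ints/None into (validity bitmap, values buffer). Hypothetical helper."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per slot
    data = array("i")  # C int, 32-bit on common platforms (assumption)
    for i, v in enumerate(values):
        if v is not None:
            # Least-significant-bit numbering: slot i lives in byte i // 8, bit i % 8.
            validity[i // 8] |= 1 << (i % 8)
            data.append(v)
        else:
            data.append(0)  # placeholder; the slot's value is ignored for nulls
    return bytes(validity), data


def build_list_column(lists):
    """Pack a list of int lists into (offsets buffer, flattened child values)."""
    offsets = array("i", [0])  # n + 1 offsets for n list slots
    child = array("i")
    for item in lists:
        child.extend(item)
        offsets.append(len(child))  # end of this slot == start of the next
    return offsets, child


# Flat column with one null.
validity, data = build_int32_column([1, None, 3])

# Hierarchical column: list slot i spans child[offsets[i]:offsets[i + 1]].
offsets, child = build_list_column([[1, 2], [], [3]])
second_list = list(child[offsets[1]:offsets[2]])  # the empty list
```

Because every value of a column sits in one contiguous buffer, a scan over the column touches sequential memory, which is what makes this layout friendly to CPU caches and vectorized execution.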

Developer: Apache Software Foundation
Initial release: October 10, 2016
Stable release: 22.0.0[1] (October 24, 2025)
Written in: C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, Rust
Type: Data format, algorithms
License: Apache License 2.0
Website: arrow.apache.org
Repository: github.com/apache/arrow

Interoperability

Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas and other data processing libraries. The project includes native software libraries written in C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python (PyArrow[8]), R, Ruby, and Rust. Arrow allows for zero-copy reads and fast data access and interchange without serialization overhead between these languages and systems.[2]

Applications

Arrow has been used in diverse domains, including analytics,[9] genomics,[10][7] and cloud computing.[11]

Comparison to Apache Parquet and ORC

Apache Parquet and Apache ORC are popular examples of on-disk columnar data formats. Arrow is designed as a complement to these formats for processing data in-memory.[12] The hardware trade-offs for in-memory processing differ from those for on-disk storage.[13] The Arrow and Parquet projects include libraries for converting data between the two formats.[14]

Governance

Apache Arrow was announced by The Apache Software Foundation on February 17, 2016,[15] with development led by a coalition of developers from other open source data analytics projects.[16][17][6][18][19] The initial codebase and Java library were seeded by code from Apache Drill.[15]

