“Apache Arrow 1.0” Released, columnar format is now stable

On July 24, the development team behind “Apache Arrow”, a cross-language development platform for in-memory analytics, announced its latest major release, “Apache Arrow 1.0.0”.

Apache Arrow is a language-independent platform for in-memory data processing, intended for building high-performance applications that process and transport large data sets. It provides a columnar memory format, which consists of a language-agnostic specification for in-memory data structures and a protocol for serialization. It also comes with a set of libraries for C, C++, C#, Go, Java, JavaScript, Python, MATLAB, R, Ruby, and Rust.
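To illustrate what "columnar" means here, the following is a minimal pure-Python sketch, not the Arrow library API. It assumes a simplified layout in which each field is stored as a validity bitmap (one bit per value, least-significant bit first) plus one contiguous buffer of fixed-width values. The sample rows and the helper name `to_column` are hypothetical.

```python
import struct

# Row-oriented records, as an application might hold them (hypothetical data).
rows = [
    {"id": 1, "score": 9.5},
    {"id": 2, "score": None},   # missing value
    {"id": 3, "score": 7.25},
]

def to_column(values, fmt):
    """Pack one field into a validity bitmap plus a contiguous values buffer."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)   # least-significant bit first
    # Null slots still occupy space in the values buffer; fill them with zero.
    filled = [0 if v is None else v for v in values]
    data = struct.pack(f"<{len(values)}{fmt}", *filled)
    return bytes(bitmap), data

# One pair of buffers per column instead of one object per row.
id_validity, id_data = to_column([r["id"] for r in rows], "i")           # int32
score_validity, score_data = to_column([r["score"] for r in rows], "d")  # float64
```

Because every value of a column sits in one contiguous buffer, a consumer written in another language can read it in place, provided both sides agree on the layout.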

Despite the version number “1.0”, Apache Arrow 1.0 is the 18th major release. It marks the transition to binary stability of the columnar format, and it is the first release to follow Semantic Versioning.

The columnar format is now stable and therefore guarantees backward compatibility. Its metadata version has been updated to V5. In addition, dictionary indices may now be unsigned integers as well as signed integers.
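One practical effect of unsigned dictionary indices can be shown with a small pure-Python sketch. It does not use the Arrow libraries. The function `dictionary_encode` and the sample values are hypothetical, and the standard `array` module stands in for an index buffer. With the same one-byte index width, an unsigned type can address 256 dictionary entries, while a signed type can address only 128.

```python
from array import array

def dictionary_encode(values, typecode):
    """Return (indices, dictionary), with indices stored using `typecode`."""
    dictionary = []   # distinct values, in first-seen order
    positions = {}    # value -> position in `dictionary`
    indices = array(typecode)
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        # array.append raises OverflowError if the index does not fit the type.
        indices.append(positions[v])
    return indices, dictionary

# 1000 values drawn from 200 distinct keys.
values = [f"key-{i % 200}" for i in range(1000)]

# 200 distinct values fit in unsigned 8-bit indices (0..255).
idx_u8, dict_u8 = dictionary_encode(values, "B")

# They do not fit in signed 8-bit indices (-128..127).
try:
    dictionary_encode(values, "b")
    fits_signed = True
except OverflowError:
    fits_signed = False
```

In this sketch the signed variant would need two bytes per index to hold the same dictionary, which doubles the size of the index buffer.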

A “Feature” enum has been added so that a producer can declare which optional features an IPC stream uses. The IPC format also gains an option for buffer compression using LZ4 or ZStandard.
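The idea of per-buffer compression can be sketched in pure Python. This is an illustration under stated assumptions, not the Arrow implementation. The standard library `zlib` codec is swapped in for LZ4 or ZStandard so that the example runs without extra packages, the function names are hypothetical, and the framing (an 8-byte little-endian length prefix followed by the compressed bytes) is a simplification chosen for this sketch.

```python
import struct
import zlib

def compress_buffer(buf: bytes) -> bytes:
    """Frame one buffer: uncompressed length (int64, little-endian) + payload."""
    # zlib is a stand-in here; the release notes name LZ4 and ZStandard.
    return struct.pack("<q", len(buf)) + zlib.compress(buf)

def decompress_buffer(framed: bytes) -> bytes:
    """Reverse compress_buffer and check the recorded length."""
    (length,) = struct.unpack_from("<q", framed, 0)
    out = zlib.decompress(framed[8:])
    if len(out) != length:
        raise ValueError("decompressed size does not match the length prefix")
    return out

# A values buffer of 1000 identical int32 entries compresses well.
values_buffer = struct.pack("<1000i", *([42] * 1000))
framed = compress_buffer(values_buffer)
restored = decompress_buffer(framed)
```

Compressing each buffer separately, rather than the whole message, lets a reader decompress only the columns it needs.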

This release also includes many other feature improvements.

Apache Arrow
https://arrow.apache.org