decide if we want to support Pandas StringArray and ArrowStringArray internally #1181

@gdementen

In 0.35.1, I fixed the issues converting Pandas Series and DataFrames which use StringArray or ArrowStringArray internally (by using .to_numpy() instead of .values). However, this often yields object arrays and prevents zero-copy conversion for string types, which is clearly suboptimal.
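For illustration, here is a minimal sketch of the difference between the two accessors on a string-typed Series. It only uses public pandas calls; which concrete extension array backs the Series (StringArray or ArrowStringArray) depends on the installed pandas/pyarrow versions and options, so the printed names are not shown here.

```python
import numpy as np
import pandas as pd

# A Series backed by a pandas string extension array (StringArray or
# ArrowStringArray, depending on the installed pandas/pyarrow and options).
s = pd.Series(["a", "b", "c"], dtype="string")

ext = s.values      # the extension array itself, not a numpy array
arr = s.to_numpy()  # always a numpy.ndarray (typically object dtype for strings)

print(type(ext).__name__)
print(type(arr).__name__, arr.dtype)
```

Code which assumes a numpy array therefore needs .to_numpy(), at the cost of materializing the strings as Python objects.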

We might want to support those types, instead of only numpy arrays, everywhere we currently use numpy arrays (Axis labels, group keys, Array .data and probably other places). This is a very large undertaking (especially regarding testing) for an uncertain benefit: it could be very large in some cases but will probably be modest, if not insignificant, for our current models.

See https://dev.to/kaniel_outis/pandas-30s-pyarrow-string-revolution-a-deep-dive-into-memory-and-performance-357g for benchmarks.
