decide if we want to support Pandas StringArray and ArrowStringArray internally #1181

In 0.35.1, I fixed the issues with converting Pandas Series and DataFrames that use those types internally (by using `.to_numpy()` instead of `.values`). However, this often yields object-dtype arrays and prevents zero-copy conversion of string data, which is clearly less than optimal.
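A minimal sketch of the difference, assuming pandas >= 1.3 (for the `"string[python]"` and `"string[pyarrow]"` dtype aliases). `conversion_report` is a hypothetical helper for illustration only. `.values` returns the extension array itself (not a NumPy array), while `.to_numpy()` always returns a NumPy array, which for string data has object dtype.

```python
import importlib.util

import numpy as np
import pandas as pd


def conversion_report(series):
    """Hypothetical helper: summarise how a string Series converts to numpy."""
    values = series.values          # may be an ExtensionArray, not an ndarray
    converted = series.to_numpy()   # always a numpy ndarray
    return {
        "values_type": type(values).__name__,
        "values_is_ndarray": isinstance(values, np.ndarray),
        "to_numpy_dtype": converted.dtype,
    }


# pandas' own string array (Python str objects)
py_strings = pd.Series(["a", "bb", "ccc"], dtype="string[python]")
print(conversion_report(py_strings))

# Arrow-backed string array, only when pyarrow is installed
if importlib.util.find_spec("pyarrow") is not None:
    arrow_strings = pd.Series(["a", "bb", "ccc"], dtype="string[pyarrow]")
    print(conversion_report(arrow_strings))
```

For the Arrow-backed storage, producing an object array means creating Python `str` objects from the Arrow buffers, so that conversion cannot be zero-copy.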

We might want to support those types, in addition to NumPy arrays, everywhere we currently use NumPy arrays (Axis labels, group keys, `Array.data`, and probably other places). This would be a very large undertaking (especially regarding testing) for an uncertain benefit: it could be very large in some cases, but will probably be modest, if not negligible, for our current models.

See https://dev.to/kaniel_outis/pandas-30s-pyarrow-string-revolution-a-deep-dive-into-memory-and-performance-357g for benchmarks.
