
[python] Fix pyarrow6 compatibility issue with path parsing and get_file_info#7292

Open
XiaoHongbo-Hope wants to merge 12 commits into apache:master from XiaoHongbo-Hope:py36_path_fix_again


XiaoHongbo-Hope commented Feb 14, 2026

Purpose

  1. REST reads and writes fail with the following error:
    oss://bucket_name/xxxx/yyy/sample/bucket-0/test-0.parquet: When listing objects under key 'yyy/sample/bucket-0' in bucket 'xxxx': AWS Error [code 133]: The specified key does not exist.

PR #7180 addressed OSS + PyArrow 6 by passing the key only (no netloc), but PyArrow 6 still parses the first path segment as the bucket.

Behavior difference:

  • PyArrow 6: treats path a/b/c as bucket=a, key=b/c
  • PyArrow 7+: uses the bucket from the connection and treats the entire string a/b/c as the key

  2. get_file_info() inconsistency:
    • PyArrow 7+: returns FileInfo with FileType.NotFound for non-existent paths
    • PyArrow 6: raises OSError with message AWS Error [code 133]: The specified key does not exist
This PR fixes both issues.
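
For the path-parsing difference, one plausible shape is a version-aware conversion from the oss:// URI to the string handed to the filesystem. This is only a sketch under stated assumptions: `to_filesystem_path` and its `pyarrow_major` parameter are hypothetical, and prepending the bucket for PyArrow 6 is inferred from the behavior described above rather than taken from the PR's diff.

```python
from urllib.parse import urlparse


def to_filesystem_path(uri, pyarrow_major):
    """Convert an oss://bucket/key URI to the path string for the filesystem.

    `pyarrow_major` is the major version of the installed PyArrow, e.g.
    int(pyarrow.__version__.split(".")[0]).
    """
    parsed = urlparse(uri)
    if not parsed.netloc:
        raise ValueError("URI has no bucket: %r" % uri)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if pyarrow_major < 7:
        # PyArrow 6 takes the first path segment as the bucket, so the
        # bucket must lead the path (assumed workaround).
        return bucket + "/" + key
    # PyArrow 7+ takes the bucket from the connection; pass the key only.
    return key
```

With the URI from the error message above, PyArrow 6 would receive `bucket_name/xxxx/yyy/sample/bucket-0/test-0.parquet`, while PyArrow 7+ would receive `xxxx/yyy/sample/bucket-0/test-0.parquet`.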

Tests

file_io_test.test_exists
file_io_test.test_filesystem_path_conversion

API and Format

Documentation

Generative AI tooling

Co-authored by Cursor 2.4.28

XiaoHongbo-Hope changed the title from "[python] Fix pyArrow 6 compatibility for S3/OSS FileIO" to "[python] Fix pyArrow 6 compatibility with path and get_file_info" Feb 14, 2026
XiaoHongbo-Hope changed the title from "[python] Fix pyArrow 6 compatibility with path and get_file_info" to "[python] Fix pyArrow6 compatibility issue with path and get_file_info" Feb 14, 2026
XiaoHongbo-Hope marked this pull request as ready for review February 14, 2026 14:03
XiaoHongbo-Hope changed the title from "[python] Fix pyArrow6 compatibility issue with path and get_file_info" to "[python] Fix pyarrow6 compatibility issue with path and get_file_info" Feb 14, 2026
XiaoHongbo-Hope changed the title from "[python] Fix pyarrow6 compatibility issue with path and get_file_info" to "[python] Fix pyarrow6 compatibility issue with path parsing and get_file_info" Feb 14, 2026
