
Fix UnicodeDecodeError on Windows by standardizing to UTF-8 encoding #311

Open
dvmukul wants to merge 1 commit into Davidyz:main from dvmukul:fix/missing-utf8-encodings

dvmukul commented Apr 7, 2026

This PR standardises text file I/O on UTF-8 across the codebase by adding an explicit encoding='utf-8' to open() calls.

Rationale

On Windows, the default locale encoding (e.g., CP1252 or CP949) usually differs from UTF-8. When vectorcode reads source or configuration files containing non-ASCII characters without an explicit encoding, open() falls back to the locale encoding and the read can raise a UnicodeDecodeError.
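
The failure can be reproduced on any platform by forcing the encoding that a Windows locale would pick. The snippet below is a minimal sketch, not code from vectorcode: the helper names `read_text_with` and `demo` and the sample file contents are assumptions made for illustration. It writes a UTF-8 file containing Cyrillic text, then reads it back once as CP1252 and once as UTF-8.

```python
import tempfile
from pathlib import Path


def read_text_with(path, encoding):
    """Read a whole file using an explicitly chosen encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    """Return (cp1252_failed, utf8_round_trips) for a UTF-8 sample file."""
    # Hypothetical file contents. "А" (U+0410) encodes to the bytes D0 90 in
    # UTF-8, and byte 0x90 has no mapping in Python's cp1252 codec.
    sample = "# Автор: example\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.py"
        path.write_bytes(sample.encode("utf-8"))

        # Simulate what open() does on a CP1252 Windows locale.
        try:
            read_text_with(path, "cp1252")
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True

        # The explicit UTF-8 read recovers the original text.
        utf8_round_trips = read_text_with(path, "utf-8") == sample
    return cp1252_failed, utf8_round_trips


if __name__ == "__main__":
    print(demo())
```

Not every non-ASCII character triggers the exception. Many UTF-8 byte sequences decode under CP1252 without error and produce silently corrupted text instead, which is another reason to pass the encoding explicitly.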

This has been reported in Issue #307, specifically for the query command.

Changes

  • Updated src/vectorcode/subcommands/query/__init__.py to use UTF-8 when reading document source during query result building.
  • Updated src/vectorcode/subcommands/vectorise.py to use UTF-8 for include/exclude specifications.
  • Updated src/vectorcode/subcommands/init.py to use UTF-8 for git hook and configuration file operations.
  • Updated src/vectorcode/cli_utils.py to use UTF-8 for general configuration loading.
  • Updated src/vectorcode/mcp_main.py to use UTF-8 for MCP tool result processing.

These changes keep vectorcode's behaviour consistent across platforms when it handles UTF-8-encoded source and configuration files.
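
The pattern applied in each of the files above can be sketched as follows. This is an illustrative example under stated assumptions rather than the actual diff: `write_spec_lines` and `read_spec_lines` are hypothetical helpers, and the one-pattern-per-line spec format is assumed for the demonstration.

```python
import os
import tempfile


def write_spec_lines(path, patterns):
    """Write one pattern per line, always encoded as UTF-8."""
    # Hypothetical helper: the explicit encoding makes the bytes on disk
    # independent of the platform's locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(patterns) + "\n")


def read_spec_lines(path):
    """Read non-empty lines from a spec file, always decoding as UTF-8."""
    # Without encoding="utf-8", open() would use the locale encoding here.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spec = os.path.join(tmp, "example.spec")
        write_spec_lines(spec, ["docs/文档/*.md", "src/**/*.py"])
        print(read_spec_lines(spec))
```

Passing the encoding on both the write and the read side matters: a file written with the locale encoding and read back as UTF-8 can fail in the same way as the reverse case.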

Added explicit encoding='utf-8' to multiple open() calls to ensure
consistent behavior across different platforms, specifically fixing
UnicodeDecodeError on Windows when reading repository files or
configuration specs.

Fixes Issue Davidyz#307
