Bump chardet from 5.2.0 to 7.4.2 in /docker/google-vision-api #43892

Open
dependabot[bot] wants to merge 1 commit into master from dependabot/pip/docker/google-vision-api/chardet-7.4.2

dependabot bot commented on behalf of github on Apr 13, 2026

Bumps chardet from 5.2.0 to 7.4.2.

Release notes

Sourced from chardet's releases.

7.4.2

Patch release: fixes a crash on short inputs and closes a bunch of WHATWG/IANA alias gaps.

Bug Fixes

  • Fixed RuntimeError: pipeline must always return at least one result on ~2% of all possible two-byte inputs (e.g. b"\xf9\x92"). Multi-byte encodings like CP932 and Johab could score above the structural confidence threshold on very short inputs, but then statistical scoring would return nothing, leaving an empty result list instead of falling through to the fallback. (#367, #368, thanks @jasonwbarnett)
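Callers that may still run against an affected release can guard the call. The following is a minimal sketch, assuming `chardet.detect()` returns a dict with an `encoding` key; the helper name `detect_or_fallback`, the injectable `detect` parameter, and the `"utf-8"` default are illustrative and not part of chardet.

```python
from typing import Callable, Optional


def detect_or_fallback(
    data: bytes,
    detect: Optional[Callable[[bytes], dict]] = None,
    fallback: str = "utf-8",
) -> str:
    """Return a detected encoding name, or `fallback` if detection fails."""
    if detect is None:
        # Imported lazily so the helper can be exercised with a stub detector.
        import chardet

        detect = chardet.detect
    try:
        result = detect(data)
    except RuntimeError:
        # Per the release notes, affected versions could raise here on
        # some two-byte inputs such as b"\xf9\x92".
        return fallback
    # A result without an encoding is treated as a miss as well.
    return result.get("encoding") or fallback
```

With chardet installed, `detect_or_fallback(b"\xf9\x92")` exercises the input quoted in the release note. Whether the fallback is appropriate depends on what the caller does with the decoded text.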

Improvements

  • Added ~90 encoding aliases from the WHATWG Encoding Standard and IANA Character Sets registry so that <meta charset> labels like x-cp1252, x-sjis, dos-874, csUTF8, and the cswindows* family all resolve correctly through the markup detection stage. Every alias was driven by a failing spec-compliance test, not speculative. (#366)
  • Added a spec-compliance test suite covering Python decode round-trips for all 86 registry encodings, WHATWG label resolution, IANA preferred MIME names, and Unicode/RFC conformance (BOM sniffing, UTF-8 boundary cases, UTF-16 surrogate pairs). This is the test suite that would have caught the 7.4.1 BOM bug before release. (#366)
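To illustrate what alias resolution means here, the sketch below maps a few of the labels named above onto Python codec names. The table `LABEL_TO_CODEC` and the function `resolve_label` are hypothetical and cover only four labels; this is not chardet's internal alias table.

```python
import codecs

# Hypothetical subset: charset labels mapped to Python codec names.
LABEL_TO_CODEC = {
    "x-cp1252": "cp1252",
    "x-sjis": "shift_jis",
    "dos-874": "cp874",
    "csutf8": "utf-8",
}


def resolve_label(label: str) -> str:
    """Normalize a charset label and return the canonical Python codec name."""
    # Labels are matched case-insensitively, ignoring surrounding whitespace.
    key = label.strip().lower()
    name = LABEL_TO_CODEC.get(key, key)
    # codecs.lookup raises LookupError for names Python does not know.
    return codecs.lookup(name).name
```

A label outside the table falls through to Python's own codec lookup, so `resolve_label("utf-8")` still resolves while an unknown label raises `LookupError`.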

Full Changelog: chardet/chardet@7.4.1...7.4.2

7.4.1

Bug Fixes

  • BOM-prefixed UTF-16/32 input now returns utf-16/utf-32 instead of utf-16-le/utf-16-be/utf-32-le/utf-32-be. The endian-specific codecs don't strip the BOM on decode, so callers were getting a stray U+FEFF at the start of their text. BOM-less detection is unchanged. (#364, #365)
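The codec behavior behind this fix can be reproduced with the standard library alone. The sketch below builds a BOM-prefixed UTF-16 payload by hand (the sample text is arbitrary) and decodes it with both codec names.

```python
import codecs

text = "hi"
# A little-endian UTF-16 payload with a leading byte order mark.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The endian-specific codec keeps the BOM as U+FEFF in the result.
with_bom = data.decode("utf-16-le")
# The generic codec consumes the BOM and uses it to pick the byte order.
without_bom = data.decode("utf-16")

print(repr(with_bom))     # '\ufeffhi'
print(repr(without_bom))  # 'hi'
```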

Full Changelog: chardet/chardet@7.4.0...7.4.1

7.4.0

chardet 7.4.0 brings accuracy up to 99.3% (from 98.6% in 7.3.0) and significantly faster cold start thanks to a new dense model format.

What's New

Performance:

  • New dense zlib-compressed model format (v2) drops cold start (import + first detect) from ~75ms to ~13ms with mypyc

Accuracy (98.6% → 99.3%):

  • Eliminated train/test data overlap via content fingerprinting
  • Added MADLAD-400 and Wikipedia as supplemental training sources
  • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training and weighted by per-bigram IDF
  • Encoding-aware substitution filtering (substitutions only apply for characters the target encoding can't represent)
  • Increased training samples from 15K to 25K per language/encoding pair
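The release notes do not say how the content fingerprint is computed. As a rough illustration of the general idea, the sketch below hashes whitespace-normalized sample bytes and drops test samples whose fingerprint also appears in the training set. The normalization step, the choice of SHA-256, and the names `fingerprint` and `drop_overlap` are all assumptions, not chardet's training code.

```python
import hashlib
from typing import Iterable, List


def fingerprint(sample: bytes) -> str:
    """Hash a sample after collapsing runs of ASCII whitespace (assumed scheme)."""
    normalized = b" ".join(sample.split())
    return hashlib.sha256(normalized).hexdigest()


def drop_overlap(train: Iterable[bytes], test: Iterable[bytes]) -> List[bytes]:
    """Return the test samples whose fingerprint does not occur in train."""
    seen = {fingerprint(s) for s in train}
    return [s for s in test if fingerprint(s) not in seen]
```

For example, `drop_overlap([b"a  b"], [b"a b", b"c"])` keeps only `b"c"`, because the first test sample matches a training sample once whitespace is collapsed.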

Bug fixes:

  • Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS (these were previously sharing their base encoding's byte-range analyzer, missing extended ranges)

Metrics

| Metric | chardet 7.4.0 (mypyc) | chardet 6.0.0 | charset-normalizer 3.4.6 |
|---|---|---|---|
| Accuracy (2,517 files) | 99.3% | 88.2% | 85.4% |
| Speed | 551 files/s | 12 files/s | 376 files/s |
| Language detection | 95.7% | 40.0% | 59.2% |

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.3.0

... (truncated)

Changelog

Sourced from chardet's changelog.

7.4.2 (2026-04-12)

Bug Fixes:

  • Fixed RuntimeError: pipeline must always return at least one result on ~2% of all possible two-byte inputs (e.g. b"\xf9\x92"). Multi-byte encodings like CP932 and Johab could score above the structural confidence threshold on very short inputs, but then statistical scoring would return nothing, leaving the pipeline with an empty result list instead of falling through to the no_match_encoding fallback. ([Jason Barnett](https://github.com/jasonwbarnett) via Claude, [#367](https://github.com/chardet/chardet/issues/367), [#368](https://github.com/chardet/chardet/pull/368))

Improvements:

  • Added ~90 encoding aliases from the WHATWG Encoding Standard and IANA Character Sets registry so that <meta charset> labels like x-cp1252, x-sjis, dos-874, csUTF8, and the cswindows* family all resolve correctly through the markup detection stage. Every alias was driven by a failing spec-compliance test. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#366](https://github.com/chardet/chardet/pull/366))
  • Added a spec-compliance test suite covering Python decode round-trips for all 86 registry encodings, WHATWG web-platform label resolution, IANA preferred MIME names, and Unicode/RFC conformance (BOM sniffing, UTF-8 boundary cases, UTF-16 surrogate pairs). This is the test suite that would have caught the 7.4.1 BOM bug before release. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#366](https://github.com/chardet/chardet/pull/366))

7.4.1 (2026-04-07)

Bug Fixes:

  • BOM-prefixed UTF-16 and UTF-32 input now reports utf-16 and utf-32 instead of the endian-specific variants. Python's utf-16-le/utf-16-be/utf-32-le/utf-32-be codecs keep the BOM as a U+FEFF in the decoded string, while utf-16/utf-32 strip it, so callers passing the detection result directly to .decode() were getting a stray BOM at the start of their text. BOM-less UTF-16/32 detection (via null-byte patterns) is unchanged and still returns the endian-specific name. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#364](https://github.com/chardet/chardet/issues/364), [#365](https://github.com/chardet/chardet/pull/365))

... (truncated)

Commits
  • 3cc0960 docs: changelog for 7.4.2
  • 9079efc Fix RuntimeError on ~2% of two-byte inputs (#368)
  • ea7e547 Add spec-compliance test suite, close WHATWG/IANA alias gaps (#366)
  • d9ae78d docs: changelog for 7.4.1
  • 2a54c68 Return utf-16/utf-32 (not -le/-be) when a BOM is present (#365)
  • c63c632 Address GitHub code quality findings and add missing test coverage
  • 1ad8e6a Revert "Add PyInstaller hook to collect mypyc shared runtime library (#359)" ...
  • 7fb0563 Add PyInstaller hook to collect mypyc shared runtime library (#359)
  • 2d75e6d Link to blogpost in README
  • e37cf3c fix: prevent dirty-tree version in Windows mypyc wheel builds
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.4.2.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.4.2)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies and python labels on Apr 13, 2026
@xsoar-bot

Docker Image Ready - Dev

Docker automatic build has deployed your docker image: devdemisto/google-vision-api:1.0.0.8223674
It is available now on docker hub at: https://hub.docker.com/r/devdemisto/google-vision-api/tags
Get started by pulling the image:

docker pull devdemisto/google-vision-api:1.0.0.8223674

Docker Metadata

  • Image Size: 116.70 MB
  • Image ID: sha256:b1e8fbe41ba9887b5e6fed583a90d43084b2b5fec7e9299f900e714a9c78f95e
  • Created: 2026-04-13T15:38:07.937155915Z
  • Arch: linux/amd64
  • Command: ["python3"]
  • Environment:
    • PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    • LANG=C.UTF-8
    • GPG_KEY=7169605F62C751356D054A26A821E680E5FA6305
    • PYTHON_VERSION=3.12.12
    • PYTHON_SHA256=fb85a13414b028c49ba18bbd523c2d055a30b56b18b92ce454ea2c51edc656c4
    • DOCKER_IMAGE=devdemisto/google-vision-api:1.0.0.8223674
  • Labels:
    • org.opencontainers.image.authors:Demisto <containers@demisto.com>
    • org.opencontainers.image.revision:2c43e5ec4351ee7befc12f4bd91fdd48516c7b3b
    • org.opencontainers.image.version:1.0.0.8223674
