Changelog for python38-charset-normalizer-2.1.1-1.1.noarch.rpm:

* Sat Sep 17 2022 Dirk Müller - update to 2.1.1:
* Function `normalize` scheduled for removal in 3.0
* Removed useless call to decode in fn is_unprintable (#206)
* Thu Aug 18 2022 Ben Greiner - Clean requirements: We don't need anything
* Tue Jul 19 2022 Dirk Müller - update to 2.1.0:
* Output the Unicode table version when running the CLI with `--version`
* Re-use decoded buffer for single byte character sets
* Fixing some performance bottlenecks
* Work around a potential bug in CPython with Zero Width No-Break Space, located in Arabic Presentation Forms-B (Unicode 1.1), not being acknowledged as a space
* CLI default threshold aligned with the API threshold from
* Removed support for Python 3.5 (PR #192)
* Use of backport unicodedata from `unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0
* Tue Feb 15 2022 Dirk Müller - update to 2.0.12:
* ASCII misdetection in rare cases (PR #170)
* Explicit support for Python 3.11 (PR #164)
* The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels
* Mon Jan 10 2022 Dirk Müller - update to 2.0.10:
* Fallback match entries might lead to UnicodeDecodeError for large byte sequences
* Skipping the language-detection (CD) on ASCII
* Mon Dec 06 2021 Dirk Müller - update to 2.0.9:
* Moderating the logging impact (since 2.0.8) for specific environments
* Wrong logging level applied when setting kwarg `explain` to True
* Mon Nov 29 2021 Dirk Müller - update to 2.0.8:
* Improvement over Vietnamese detection
* MD improvement on trailing data and long foreign (non-pure latin)
* Efficiency improvements in cd/alphabet_languages
* call sum() without an intermediary list following PEP 289 recommendations
* Code style as refactored by Sourcery-AI
* Minor adjustment on the MD around European words
* Remove and replace SRTs from assets / tests
* Initialize the library logger with a `NullHandler` by default
* Setting kwarg `explain` to True will add provisionally
* Fix large (misleading) sequence giving UnicodeDecodeError
* Avoid using chunks that are too insignificant
* Add and expose function `set_logging_handler` to configure a specific StreamHandler
* Fri Nov 26 2021 Dirk Müller - require lower-case name instead of breaking build
* Thu Nov 25 2021 Matej Cepl - Use lower-case name of prettytable package
* Sun Oct 17 2021 Martin Hauke - Update to version 2.0.7
* Addition: Add support for Kazakh (Cyrillic) language detection.
* Improvement: Further improve inferring the language from a given code page (single-byte).
* Removed: Remove redundant logging entry about detected language(s).
* Improvement: Refactoring for potential performance improvements in loops.
* Improvement: Various detection improvements (MD+CD).
* Bugfix: Fix a minor inconsistency between Python 3.5 and other versions regarding language detection.
- Update to version 2.0.6
* Bugfix: Unforeseen regression with the loss of backward compatibility with some older minor versions of Python 3.5.x.
* Bugfix: Fix CLI crash when using --minimal output in certain cases.
* Improvement: Minor improvement to the detection efficiency (less than 1%).
- Update to version 2.0.5
* Improvement: Backward-compatibility support with v1.x was improved; the old staticmethods are restored.
* Remove: The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead.
* Improvement: The Unicode detection is slightly improved, see #93
* Bugfix: In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection.
* Bugfix: Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection.
* Improvement: Add syntax sugar __bool__ for the results CharsetMatches list-container.
- Update to version 2.0.4
* Improvement: Adjust the MD to lower the sensitivity, thus improving the global detection reliability.
* Improvement: Allow fallback on specified encoding if any.
* Bugfix: The CLI no longer raises an unexpected exception when no encoding has been found.
* Bugfix: Fix accessing the 'alphabets' property when the payload contains surrogate characters.
* Bugfix: The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (in #72)
* Bugfix: Submatch factoring could be wrong in rare edge cases (in #72)
* Bugfix: Multiple files given to the CLI were ignored when publishing results to STDOUT (after the first path) (in #72)
* Internal: Fix line endings from CRLF to LF for certain files.
- Update to version 2.0.3
* Improvement: Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results, especially for ASCII. #63 Fix #62
* Improvement: According to the community's wishes, the detection will fall back on ASCII or UTF-8 as a last resort.
- Update to version 2.0.2
* Bugfix: Empty/too small JSON payload misdetection fixed.
* Improvement: Don't inject unicodedata2 into sys.modules
- Update to version 2.0.1
* Bugfix: Make it work where there isn't a filesystem available, dropping assets frequencies.json.
* Improvement: You may now use aliases in cp_isolation and cp_exclusion arguments.
* Bugfix: Using explain=False permanently disabled the verbose output in the current runtime #47
* Bugfix: One log entry (language target preemptive) was not shown in logs when using explain=True #47
* Bugfix: Fix undesired exception (ValueError) on getitem of instance CharsetMatches #52
* Improvement: Public function normalize default args values were not aligned with from_bytes #53
- Update to version 2.0.0
* Performance: 4x to 5x faster than the previous 1.4.0 release.
* Performance: At least 2x faster than Chardet.
* Performance: Emphasis has been put on UTF-8 detection, which should be nearly instantaneous.
* Improvement: The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
* Improvement: The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time).
* Code: The program has been rewritten to ease readability and maintainability. (+Using static typing)
* Tests: New workflows are now in place to verify the following aspects: performance, backward compatibility with Chardet, and detection coverage, in addition to the current tests. (+CodeQL)
* Dependency: This package no longer requires anything when used with Python 3.5 (dropped cached_property).
* Docs: Performance claims have been updated, along with the guide to contributing and the issue template.
* Improvement: Add --version argument to CLI
* Bugfix: The CLI output used the relative path of the file(s). Should be absolute.
* Deprecation: Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0
* Improvement: If no language was detected in content, try to infer it using the encoding name/alphabets used.
* Removal: Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
* Improvement: utf_7 detection has been reinstated.
* Removal: The exception hook on UnicodeDecodeError has been removed.
- Update to version 1.4.1
* Improvement: Logger configuration/usage no longer conflicts with others #44
- Update to version 1.4.0
* Dependency: Using standard logging instead of the package loguru.
* Dependency: Dropping nose test framework in favor of the maintained pytest.
* Dependency: Chose not to use the dragonmapper package to help with gibberish Chinese/CJK text.
* Dependency: Require cached_property only for Python 3.5 due to constraint. Dropping for every other interpreter version.
* Bugfix: BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present, due to the sub-match factoring process.
* Improvement: Return ASCII if given sequences fit.
* Performance: Huge improvement on the largest payloads.
* Change: Stop support for UTF-7 that does not contain a SIG. (Contributions are welcome to improve that point)
* Feature: CLI now produces JSON consumable output.
* Dependency: Dropping PrettyTable, replaced with pure JSON output.
* Bugfix: Not searching properly for the BOM when trying utf32/16 parent codec.
* Other: Improving the package final size by compressing frequencies.json.
* Thu May 20 2021 pgajdos@suse.com - version update to 1.3.9
* Bugfix: In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload #40
* Bugfix: Empty given payload for detection may cause an exception if trying to access the alphabets property. #39
* Bugfix: The legacy detect function should return UTF-8-SIG if sig is present in the payload. #38
* Tue Feb 09 2021 John Vandenberg - Switch to PyPI source
- Add Suggests: python-unicodedata2
- Remove executable bit from charset_normalizer/assets/frequencies.json
- Update to v1.3.6
* Allow prettytable 2.0
- from v1.3.5
* Dependencies refactor and add support for py 3.9 and 3.10
* Fix version parsing
* Mon May 25 2020 Petr Gajdos - %python3_only -> %python_alternative
* Mon Jan 27 2020 Marketa Calabkova - Update to 1.3.4
* Improvement/Bugfix: False positive when searching for successive upper, lower char. (ProbeChaos)
* Improvement: Noticeably better detection for jp
* Bugfix: Passing zero-length bytes to from_bytes
* Improvement: Expose version in package
* Bugfix: Division by zero
* Improvement: Prefers unicode (utf-8) when detected
* Apparently dropped Python2 silently
* Fri Oct 04 2019 Marketa Calabkova - Update to 1.3.0
* Backport unicodedata for v12 impl into python if available
* Add aliases to CharsetNormalizerMatches class
* Add feature preemptive behaviour, looking for encoding declaration
* Add method to determine if specific encoding is multi byte
* Add has_submatch property on a match
* Add percent_chaos and percent_coherence
* Coherence ratio based on mean instead of sum of best results
* Using loguru for trace/debug <3
* from_byte method improved
* Thu Sep 26 2019 Tomáš Chvátal - Update to 1.1.1:
* from_bytes parameters steps and chunk_size were not adapted to sequence len if provided values were not fitted to content
* Sequences with a length below 10 chars were not checked
* Legacy detect method inspired by chardet was not returning
* Various more test updates
* Fri Sep 13 2019 Tomáš Chvátal - Update to 0.3:
* Improvement on detection
* Performance loss to expect
* Added --threshold option to CLI
* Bugfix on UTF 7 support
* Legacy detect(byte_str) method
* BOM support (Unicode mostly)
* Chaos prober improved on small text
* Language detection has been reviewed to give better result
* Bugfix on jp detection, every jp text was considered chaotic
* Fri Aug 30 2019 Tomáš Chvátal - Fix the tarball to really be the one published by upstream
* Wed Aug 28 2019 John Vandenberg - Initial spec for v0.1.8
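
The 2.0.8 entry above mentions initializing the library logger with a `NullHandler` by default and exposing a function to attach a specific StreamHandler. The general pattern can be sketched with the standard `logging` module alone. This is only an illustration under stated assumptions: the logger name `example_library` and the helper `attach_stream_handler` are hypothetical, and the sketch does not reproduce the signature or behavior of charset_normalizer's own `set_logging_handler`.

```python
import io
import logging

# Hypothetical library name, used only for this illustration.
LIBRARY_LOGGER_NAME = "example_library"

# What a library typically does at import time: attach a NullHandler so
# its records are dropped quietly unless the application opts in.
library_logger = logging.getLogger(LIBRARY_LOGGER_NAME)
library_logger.addHandler(logging.NullHandler())


def attach_stream_handler(stream, level=logging.DEBUG):
    """Hypothetical helper: send the library's log records to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s | %(message)s"))
    library_logger.addHandler(handler)
    library_logger.setLevel(level)
    return handler


buffer = io.StringIO()

# Emitted before any StreamHandler is attached, so it never reaches `buffer`.
library_logger.debug("nothing is listening yet")

# The application opts in, after which records are written to the stream.
handler = attach_stream_handler(buffer)
library_logger.debug("probing candidate encodings")
captured = buffer.getvalue()
```

With this arrangement an application that never calls the helper sees no output from the library, which is the effect the `NullHandler` default is meant to achieve.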