Name : perl-IO-HTML
Version : 1.001
Vendor : openSUSE
Release : 1.1
Date : 2021-02-02 11:11:43
Group : Development/Libraries/Perl
Source RPM : perl-IO-HTML-1.001-1.1.src.rpm
Size : 0.04 MB
Packager : https://bugs_opensuse_org
Summary : Open an HTML file with automatic charset detection
Description :
IO::HTML provides an easy way to open a file containing HTML while automatically determining its encoding. It uses the HTML5 encoding sniffing algorithm specified in section 8.2.2.2 of the draft standard.
The algorithm as implemented here is:
1. If the file begins with a byte order mark indicating UTF-16LE, UTF-16BE, or UTF-8, then that is the encoding.
2. If the first 1024 bytes of the file contain a '<meta>' tag that indicates the charset, and Encode recognizes the specified charset name, then that is the encoding. (This portion of the algorithm is implemented by 'find_charset_in'.)
   The '<meta>' tag can be in one of two formats:
     <meta charset="...">
     <meta http-equiv="Content-Type" content="...charset=...">
   The search is case-insensitive, and the order of attributes within the tag is irrelevant. Any additional attributes of the tag are ignored. The first matching tag with a recognized encoding ends the search.
3. If the first 1024 bytes of the file are valid UTF-8 (with at least 1 non-ASCII character), then the encoding is UTF-8.
4. If all else fails, use the default character encoding. The HTML5 standard suggests the default encoding should be locale dependent, but currently it is always 'cp1252' unless you set '$IO::HTML::default_encoding' to a different value. Note: 'sniff_encoding' does not apply this step; only 'html_file' does that.
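The four steps above can be illustrated with a short sketch. This is not IO::HTML's code (the module is written in Perl); it is a simplified Python approximation for readers who want to see the order of the checks. The names `sniff_bytes`, `guess_encoding`, `charset_from_meta`, `looks_like_utf8`, and `DEFAULT_ENCODING` are hypothetical. Python's `codecs.lookup` stands in for Perl's Encode when deciding whether a charset name is recognized, and the '<meta>' matching is a rough regular expression that does not check 'http-equiv', not a full HTML5 parser.

```python
import codecs
import re

# Assumption: mirrors the documented cp1252 default; the name is hypothetical.
DEFAULT_ENCODING = "cp1252"

# Step 1: the byte order marks named in the description.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Rough patterns (a simplification): any <meta ...> tag, then the first
# charset=... inside it, whether as its own attribute or within content="...".
META_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)


def charset_from_meta(head):
    """Step 2: return the first recognized charset declared in a <meta> tag."""
    for tag in META_RE.finditer(head):
        match = CHARSET_RE.search(tag.group(0))
        if not match:
            continue
        name = match.group(1).decode("ascii")
        try:
            # codecs.lookup plays the role of "Encode recognizes the name".
            return codecs.lookup(name).name
        except LookupError:
            continue  # unrecognized name: keep searching later tags
    return None


def looks_like_utf8(head):
    """Step 3: valid UTF-8 containing at least one non-ASCII character."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end
        # of the window (an assumption of this sketch).
        text = decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return any(ord(ch) > 127 for ch in text)


def sniff_bytes(data, window=1024):
    """Steps 1 to 3 on the first `window` bytes; None if nothing matched."""
    head = data[:window]
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    found = charset_from_meta(head)
    if found:
        return found
    if looks_like_utf8(head):
        return "utf-8"
    return None


def guess_encoding(data):
    """Step 4: fall back to the default when sniffing finds nothing."""
    return sniff_bytes(data) or DEFAULT_ENCODING
```

Keeping the fallback in a separate function reflects the note in step 4: the sniffing routine reports "unknown" and only the file-opening wrapper applies the default.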
RPM found in directory: /vol/rzm3/linux-opensuse/ports/armv7hl/distribution/leap/15.4/repo/oss/noarch