Name : perl-IO-HTML
Version : 1.004
Vendor : obs://build_opensuse_org/devel:languages:perl
Release : lp155.1.1
Date : 2023-07-20 15:11:26
Group : Development/Libraries/Perl
Source RPM : perl-IO-HTML-1.004-lp155.1.1.src.rpm
Size : 0.04 MB
Packager : https://www_suse_com/
Summary : Open an HTML file with automatic charset detection
Description :
IO::HTML provides an easy way to open a file containing HTML while automatically determining its encoding. It uses the HTML5 encoding sniffing algorithm specified in section 8.2.2.2 of the draft standard.
The algorithm as implemented here is:
1. If the file begins with a byte order mark indicating UTF-16LE, UTF-16BE, or UTF-8, then that is the encoding.
2. If the first '$bytes_to_check' bytes of the file contain a '<meta>' tag that indicates the charset, and Encode recognizes the specified charset name, then that is the encoding. (This portion of the algorithm is implemented by 'find_charset_in'.)
The '<meta>' tag can be in one of two formats:
<meta charset="...">
<meta http-equiv="Content-Type" content="...charset=...">
The search is case-insensitive, and the order of attributes within the tag is irrelevant. Any additional attributes of the tag are ignored. The first matching tag with a recognized encoding ends the search.
3. If the first '$bytes_to_check' bytes of the file are valid UTF-8 (with at least 1 non-ASCII character), then the encoding is UTF-8.
4. If all else fails, use the default character encoding. The HTML5 standard suggests the default encoding should be locale dependent, but currently it is always 'cp1252' unless you set '$IO::HTML::default_encoding' to a different value. Note: 'sniff_encoding' does not apply this step; only 'html_file' does that.
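The four steps above can be sketched in Python for illustration. This is not the module's Perl code: the function names (meta_charset, looks_like_utf8, sniff, pick_encoding) are hypothetical, the 1024-byte window is an assumed value for '$bytes_to_check', Python's codec registry stands in for Encode, and the '<meta>' matching is simplified (it does not verify the http-equiv attribute).

```python
import codecs
import re

BYTES_TO_CHECK = 1024          # assumed window size, stands in for $bytes_to_check
DEFAULT_ENCODING = "cp1252"    # default named in the description

# Step 1: byte order marks and the encodings they indicate.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified patterns: any <meta ...> tag, then a charset=... inside it.
META_TAG = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
CHARSET = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def meta_charset(head):
    """Step 2: first <meta> tag naming a charset the codec registry knows."""
    for tag in META_TAG.finditer(head):
        match = CHARSET.search(tag.group(0))
        if not match:
            continue
        try:
            return codecs.lookup(match.group(1).decode("ascii")).name
        except LookupError:
            continue  # unrecognized name: keep searching later tags
    return None


def looks_like_utf8(head):
    """Step 3: valid UTF-8 containing at least one non-ASCII character."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a character cut off by the end of the window.
        text = decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return any(ord(ch) > 127 for ch in text)


def sniff(data):
    """Steps 1 to 3; returns None when nothing is detected."""
    head = data[:BYTES_TO_CHECK]
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    found = meta_charset(head)
    if found:
        return found
    return "utf-8" if looks_like_utf8(head) else None


def pick_encoding(data, default=DEFAULT_ENCODING):
    """Step 4: fall back to the default when sniffing finds nothing."""
    return sniff(data) or default
```

The split between sniff and pick_encoding mirrors the note in step 4: detection alone may report no result, and only the higher-level entry point applies the default.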
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl:/CPAN-I/15.5/noarch