Name : python3-html-text
Version : 0.6.2
Release : 1.fc40
Vendor : Fedora Project
Date : 2024-10-25 05:45:29
Group : Unspecified
Source RPM : python-html-text-0.6.2-1.fc40.src.rpm
Size : 0.03 MB
Packager : Fedora Project
Summary : Extract text from HTML
Description :
How is html_text different from .xpath('//text()') in lxml or .get_text() in Beautiful Soup?
- Text extracted with html_text does not contain inline styles, JavaScript, comments, or other text that is not normally visible to users.
- html_text normalizes whitespace, but more intelligently than .xpath('normalize-space()'): it adds spaces around inline elements (which are often used as block elements in HTML markup) and tries to avoid adding extra spaces around punctuation.
- html_text can add newlines (e.g. after headers or paragraphs), so that the output looks more like the text as rendered in a browser.
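
The points above can be illustrated with a minimal standard-library sketch. This is not html_text's implementation (the package itself is used through html_text.extract_text); it only mimics the behaviour described: dropping invisible content, collapsing whitespace, and breaking lines at block-level tags. The names INVISIBLE_TAGS, NEWLINE_TAGS, VisibleTextParser, and visible_text are hypothetical, chosen for this example, and the tag sets are assumptions rather than the library's own lists.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag sets for this sketch; html_text's own lists may differ.
INVISIBLE_TAGS = {"script", "style", "head", "noscript"}
NEWLINE_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}


class VisibleTextParser(HTMLParser):
    """Collect only text that a browser would normally display."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.hidden_depth = 0  # > 0 while inside an invisible element

    def handle_starttag(self, tag, attrs):
        if tag in INVISIBLE_TAGS:
            self.hidden_depth += 1
        elif tag in NEWLINE_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in INVISIBLE_TAGS:
            self.hidden_depth = max(0, self.hidden_depth - 1)
        elif tag in NEWLINE_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        # Comments never reach this method: HTMLParser routes them to
        # handle_comment, which does nothing by default.
        if self.hidden_depth == 0:
            # Collapse source whitespace so only block tags create newlines.
            self.chunks.append(re.sub(r"\s+", " ", data))


def visible_text(html):
    """Return visible text with one line per block-level element."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.chunks).split("\n")
    cleaned = (re.sub(r" +", " ", line).strip() for line in lines)
    return "\n".join(line for line in cleaned if line)
```

For example, calling visible_text on a page with a style block, a comment, a heading, and a paragraph returns only the heading and paragraph text, each on its own line. Unlike html_text, this sketch makes no attempt to guess spacing around inline elements or punctuation.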
RPM found in directory: /vol/rzm3/linux-fedora-buffet/fedora-secondary/updates/40/Everything/ppc64le/Packages/p
Provides :
python-html-text
python3-html-text
python3.12-html-text
python3.12dist(html-text)
python3dist(html-text)
Requires :