Name : perl-Text-Fracture
Version : 1.02
Vendor : obs://build_opensuse_org/devel:languages:perl
Release : lp155.60.1
Date : 2023-07-20 18:16:24
Group : Development/Libraries/Perl
Source RPM : perl-Text-Fracture-1.02-lp155.60.1.src.rpm
Size : 0.03 MB
Packager : https://www_suse_com/
Summary : Text::Fracture Perl module

Description :
This module implements a text subdivision technique. It generates a list of logical fragments (paragraphs/chunks/snippets) from the input text.
The border of a logical fragment is primarily defined by blank lines (e.g. "\n\n"). Adding a few blank lines near the beginning of the input text will likely change the end of the fragment in which the change occurs. Once the end of one fragment moves, one might expect all subsequent fragments to shift as well. The chosen algorithm tries to prevent such effects to a large degree, so a local text change can be expected to affect only one or a few fragments.
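To see why this locality matters, the Python sketch below compares two naive splitters: fixed-size chunking every 3 lines and ending a chunk after every blank line. Both helpers (fixed_chunks, blank_line_chunks, unchanged) are hypothetical illustrations, not Text::Fracture's algorithm, which applies further rules on top of blank lines. Inserting one line at the top shifts every fixed-size chunk, while the blank-line splitter changes only the first fragment.

```python
# Hypothetical helpers; neither is Text::Fracture's algorithm.

def fixed_chunks(lines, size):
    """Cut every `size` lines, ignoring the content."""
    return [tuple(lines[i:i + size]) for i in range(0, len(lines), size)]


def blank_line_chunks(lines):
    """End a chunk after each blank line (the primary border rule only)."""
    chunks, current = [], []
    for line in lines:
        current.append(line)
        if line.strip() == "":
            chunks.append(tuple(current))
            current = []
    if current:
        chunks.append(tuple(current))
    return chunks


def unchanged(before, after):
    """Count chunks of `before` that still occur verbatim in `after`."""
    remaining = set(after)
    return sum(1 for chunk in before if chunk in remaining)


TEXT = [
    "para one a", "para one b", "",
    "para two a", "para two b", "para two c", "",
    "para three a", "",
    "para four a", "para four b",
]
EDITED = ["new first line"] + TEXT  # a local edit near the top

if __name__ == "__main__":
    print(unchanged(fixed_chunks(TEXT, 3), fixed_chunks(EDITED, 3)))  # 0 of 4
    print(unchanged(blank_line_chunks(TEXT), blank_line_chunks(EDITED)))  # 3 of 4
```

Blank lines act as anchors that do not move when text above them changes, which is the property the module builds on.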
Further details on how the algorithm works can be found in the source. A description of an early implementation is given below.
A fragment will have up to 'max_lines' newline characters ("\n") after applying the following rules:
* Carriage-return/newline combinations ("\r\n", "\n\r") and lone carriage-return characters ("\r") are each counted as one newline character. (Motivation: make blank-line recognition independent of the file type.)
* A line longer than 'max_cpl' has its last non-alphanumeric character before the 'max_cpl' position handled as if it were a newline character. (Motivation: handle the absence of newline characters gracefully.)
* [Outdated] In the absence of blank lines, the shortest logical text line between line numbers 'min_lines' and 'max_lines' is counted as if it were followed by a blank line. (Motivation: handle the absence of blank lines gracefully.)
* [Outdated] The last 'readaheadsz' characters of a line may be repeated without increasing the logical length of the line. (Motivation: make ASCII-art rulers very likely to become fragment ends.)
* [Outdated] Lines that only contain characters found in the last 'readaheadsz' characters of a fragment are considered part of the fragment. (Motivation: include all closing braces of a nested code block in the same fragment, up to but not including the next keyword.)
* If the previous line began with whitespace and this line does not, it is a candidate for a new fragment. (Motivation: the end of an indentation indicates something new.)
* If, after skipping any whitespace, the previous line began with a non-alphanumeric character and this line begins (again after skipping whitespace) with an alphanumeric character, or vice versa, it is a candidate for a new fragment. '$' and '_' count as alphanumeric in this context. (Motivation: comment characters '%#//*"' and block-building structures '(){}[]' are thus separated from keywords or names, which often introduce new logical blocks.)
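The rules not marked as outdated can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Text::Fracture's implementation: all function names are hypothetical, a long line is broken just after the chosen non-alphanumeric character (which stays at the end of the piece), a line with no such character is cut hard at 'max_cpl', and the code only reports candidate boundaries. How the module chooses among candidates, blank lines and the 'max_lines' limit is not reproduced.

```python
import re


def normalize_newlines(text):
    # "\r\n", "\n\r" and a lone "\r" each count as one newline.
    return re.sub(r"\r\n|\n\r|\r", "\n", text)


def split_long_line(line, max_cpl):
    # Break an over-long line after its last non-alphanumeric character
    # before position max_cpl. Assumption: that character is kept at the
    # end of the piece; with no such character, cut hard at max_cpl.
    pieces = []
    while len(line) > max_cpl:
        cut = max_cpl
        for i in range(max_cpl - 1, -1, -1):
            if not line[i].isalnum():
                cut = i + 1
                break
        pieces.append(line[:cut])
        line = line[cut:]
    pieces.append(line)
    return pieces


def _starts_alnum(line):
    # Class of the first character after leading whitespace; '$' and '_'
    # count as alphanumeric. Returns None for a blank line.
    stripped = line.lstrip()
    if not stripped:
        return None
    return stripped[0].isalnum() or stripped[0] in "$_"


def is_candidate(prev, cur):
    # The previous line was indented and this one is not.
    if prev[:1].isspace() and cur[:1] and not cur[:1].isspace():
        return True
    # The alphanumeric/non-alphanumeric class of the first
    # non-whitespace character differs between the two lines.
    a, b = _starts_alnum(prev), _starts_alnum(cur)
    return a is not None and b is not None and a != b


def candidate_lines(text):
    # Zero-based indices of lines that may start a new fragment.
    lines = normalize_newlines(text).split("\n")
    return [i for i in range(1, len(lines)) if is_candidate(lines[i - 1], lines[i])]


if __name__ == "__main__":
    src = "int main() {\r\n    return 0;\r\n}\r\n# comment\r\nfoo\r\n"
    print(candidate_lines(src))                   # [2, 4]
    print(split_long_line("alpha beta gamma", 8))  # ['alpha ', 'beta ', 'gamma']
```

In the example, "}" is a candidate because the indented block ends, and "foo" is a candidate because it starts with a letter after a line starting with '#'.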
This ruleset is intended to work equally well with source code, plain text, XML, HTML, PostScript, or other textual file formats.
The return value of fract() is a reference to an array of arrays, each of which has four numeric elements:
* byte_offset of the first byte in the fragment.
* length of the fragment in bytes, including trailing newline characters.
* offset of the fragment in lines.
* number_of_lines in the fragment.
The number_of_lines is normally equal to the number of newline characters in the fragment. In the last fragment, number_of_lines may be one more than the number of newline characters if there is no trailing newline character. For example, the fragments "foo\nbar\n" and "foo\nbar" are both reported as two lines long.
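The four numbers can be illustrated with a hypothetical Python helper, describe_fragments, that takes fragments already cut from the text and reports [byte_offset, length, line_offset, number_of_lines] for each. Assumptions not stated above: offsets are zero-based, the text is UTF-8 encoded, only "\n" counts as a newline here, and the extra line is added only for a final fragment without a trailing newline.

```python
def describe_fragments(fragments):
    # Hypothetical helper mirroring the four values described above.
    result = []
    byte_offset = 0
    line_offset = 0
    for index, frag in enumerate(fragments):
        data = frag.encode("utf-8")        # lengths are in bytes, not characters
        newlines = frag.count("\n")
        lines = newlines
        is_last = index == len(fragments) - 1
        if is_last and frag and not frag.endswith("\n"):
            lines += 1                     # unterminated last line still counts
        result.append([byte_offset, len(data), line_offset, lines])
        byte_offset += len(data)
        line_offset += newlines
    return result


if __name__ == "__main__":
    print(describe_fragments(["foo\nbar\n"]))   # [[0, 8, 0, 2]]
    print(describe_fragments(["foo\nbar"]))     # [[0, 7, 0, 2]]
    print(describe_fragments(["a\n\n", "b\nc"]))  # [[0, 3, 0, 2], [3, 3, 2, 2]]
```

Measuring lengths on the encoded bytes matters for non-ASCII input: "é\n" is two characters but three bytes in UTF-8.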
init() should be called before the first call to fract(). It only needs to be called again to change one of its parameters.
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl:/CPAN-T/15.5/x86_64