Name        : perl-Hadoop-Streaming
Version     : 0.143060
Vendor      : obs://build.opensuse.org/devel:languages:perl
Release     : lp154.1.1
Date        : 2023-01-27 16:56:22
Group       : Development/Libraries/Perl
Source RPM  : perl-Hadoop-Streaming-0.143060-lp154.1.1.src.rpm
Size        : 0.07 MB
Packager    : https://www.suse.com/
Summary     : Contains Mapper, Combiner and Reducer roles to simplify writing Hadoop Streaming jobs
Description :
Hadoop::Streaming::* provides a simple perl interface to the Streaming interface of Hadoop.
Hadoop is a system for "reliable, scalable, distributed computing." Hadoop was developed at Yahoo! and is now maintained by the Apache Software Foundation.
Hadoop provides a distributed map/reduce framework. Mappers take lines of unstructured file data and produce key/value pairs. These key/value pairs are merged and sorted by key and provided to Reducers. Reducers take key/value pairs and produce higher-order data. This works for data where the output key/value pairs can be determined from a single line of input in isolation. The Reducer is provided the key/value pairs sorted by key.
* Hadoop's Streaming Interface
The Streaming interface provides a simple API for writing Hadoop jobs in any language. Jobs are provided input on STDIN and output is expected on STDOUT. Key/value pairs are separated by a TAB character.
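For illustration, a word-count mapper written directly against this raw protocol (a minimal sketch, not part of this package) could be as small as:

    #!/usr/bin/perl
    # Raw Streaming mapper: lines in on STDIN, key<TAB>value out on STDOUT.
    use strict;
    use warnings;

    while ( my $line = <STDIN> ) {
        chomp $line;
        print "$_\t1\n" for split ' ', $line;    # emit each word with count 1
    }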
Streaming map jobs are provided an input of lines instead of key-value pairs. See Hadoop::Streaming::Mapper INTERFACE DETAILS for an explanation.
Reduce jobs are provided a stream of key\tvalue lines. Multivalued keys appear on one input line per value. The stream is guaranteed to be sorted by key. The reduce job must track the key/value pairs and manually detect key changes.
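That manual key-change tracking is the boilerplate the Reducer role removes. A raw word-count reducer over the sorted stream (again a sketch, assuming numeric values) would look like:

    #!/usr/bin/perl
    # Raw Streaming reducer: detect key changes by hand in the sorted stream.
    use strict;
    use warnings;

    my $current_key;
    my $count = 0;
    while ( my $line = <STDIN> ) {
        chomp $line;
        my ( $key, $value ) = split /\t/, $line, 2;
        if ( defined $current_key and $key ne $current_key ) {
            print "$current_key\t$count\n";    # key changed: flush previous key
            $count = 0;
        }
        $current_key = $key;
        $count += $value;
    }
    print "$current_key\t$count\n" if defined $current_key;    # flush final key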
* Hadoop::Streaming::Mapper interface
Hadoop::Streaming::Mapper consumes and chomps lines from STDIN and calls map($line) once per line. This is initiated by the run() method.
example mapper input:
    line1
    line2
    line3
Hadoop::Streaming::Mapper transforms this into three calls to map():
    map(line1)
    map(line2)
    map(line3)
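Putting the role to work, a word-count mapper might look like the following sketch (it assumes the role is consumed with Moose and that the Mapper role, like the Reducer, provides an emit() convenience method):

    package WordCount::Mapper;
    use Moose;
    with 'Hadoop::Streaming::Mapper';

    # map() is called by run() once per chomped line of STDIN.
    sub map {
        my ( $self, $line ) = @_;
        $self->emit( $_ => 1 ) for split ' ', $line;
    }

    package main;
    WordCount::Mapper->run;    # drive the STDIN -> map() loop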
* Hadoop::Streaming::Reducer interface
Hadoop::Streaming::Reducer abstracts this stream into an interface of (key, value-iterator). reduce() is called once per key, instead of once per line. The reduce job pulls values from the iterator and outputs key/value pairs to STDOUT. emit() is provided as a convenience for outputting key/value pairs.
example reducer input:
    key1    value1
    key2    valuea
    key2    valuec
    key2    valueb
    key3    valuefoo
    key3    valuebar
Hadoop::Streaming::Reducer transforms this input into three calls to reduce():
    reduce( key1, iterator_over(qw(value1)) );
    reduce( key2, iterator_over(qw(valuea valuec valueb)) );
    reduce( key3, iterator_over(qw(valuefoo valuebar)) );
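A matching word-count reducer, sketched under the assumption that the value iterator exposes has_next()/next() methods:

    package WordCount::Reducer;
    use Moose;
    with 'Hadoop::Streaming::Reducer';

    # reduce() is called once per key with an iterator over that key's values.
    sub reduce {
        my ( $self, $key, $values ) = @_;
        my $count = 0;
        $count += $values->next while $values->has_next;
        $self->emit( $key => $count );    # one output pair per key
    }

    package main;
    WordCount::Reducer->run;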
* Hadoop::Streaming::Combiner interface
The Hadoop::Streaming::Combiner interface is analogous to the Hadoop::Streaming::Reducer interface. combine() is called instead of reduce() for each key. The above example would produce three calls to combine():
    combine( key1, iterator_over(qw(value1)) );
    combine( key2, iterator_over(qw(valuea valuec valueb)) );
    combine( key3, iterator_over(qw(valuefoo valuebar)) );
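By analogy, a combiner class would differ only in the role consumed and the method name. A hedged sketch, on the same Moose and iterator assumptions as the reducer above:

    package WordCount::Combiner;
    use Moose;
    with 'Hadoop::Streaming::Combiner';

    # combine() pre-aggregates map output before it reaches the reducer.
    sub combine {
        my ( $self, $key, $values ) = @_;
        my $count = 0;
        $count += $values->next while $values->has_next;
        $self->emit( $key => $count );
    }

    package main;
    WordCount::Combiner->run;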
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl:/CPAN-H/15.4/noarch