A distributed grep
, implemented with the MapReduce model.
This repository consists of an exercise made in the context of the Sistemi Distribuiti e Cloud Computing (SDCC) course, taught at Faculty of Engineering, University of Rome Tor Vergata.
Realize a distributed grep
using the MapReduce paradigm: DistGrep returns the lines of text of a large input file given in input that match a specific pattern (i.e.,regular expression) specified. Requirements: use Go and RPC (or gRPC)
The program consists of three entities: client, server, worker. Workers can play the role of mappers and reducers.
- A client entity sends a
distgrep(file, substr)
request to a server entity, with a sourcefile
and asubstr
to search within it - The server splits the
file
inn
blocks, and assigns it ton
workers with amap(block, substr)
request - (Map) The worker entities execute the Map function, and emit a sequence of key-value pairs
(k,v)
where the keyk
is a single line of the received block matching thesubstr
, andv
is"1"
(actual implementation: the nativemap
structure does not allow multiple values, sov
is actually the number of occurrence of the linek
into the block sent to the single worker) - (Shuffle and Sort) The server receives the pairs from the workers and re-arranges them in such a way as to obtain a new sequence of key-value pairs
(k,v')
wherev'
is now a sequence of all the occurrences ofk
into all the blocks processed by the mappers. - (Reduce) The server splits again these pairs among the workers, that now act as reducers, with a
reduce(pairs)
request. They will return a final sequence of key-value pairs(k,v'')
, wherev''
is now the number of occurrences ofk
into the wholefile
. - The server returns back the pairs, re-arranged as a string, to the client, that will print it.
Communication is implemented via gRPC (on-demand, at each map/reduce request), with the interfaces declared with Protocol Buffer as IDL.
- Run
setup.sh
to create the gRPC stubs, arrange all the dependencies and generate executables. - Then you can either run individually the single entities (the executables generated in the
/<entity>/bin
folder orgo run main.go <params>
) or execute a ready-to-use demonstration by runningdemo.sh
.
The input file provided, ILIAD_1STBOOK_IT_ALTERED
, is the italian translation of the first book of the Homer's Iliad, but altered in such a way that some lines are randomly repeated across the whole file.
For each entity, please refer to the -help
flag to see all the options available.
If something goes wrong, please run teardown.sh
to reset the environment.
- Final output: it is slightly different from a typical
grep
output:- Since with MapReduce we work on the occurrences of the lines, they are both printed in output.
- Since the
map
native Go structure is not sorted in any way, the output has a different sorting at each run.
Realized with vim-go, xed, GoClipse.