E.g. filtering data, sampling data, generating the top N from a list
E.g. counting records, finding min/max, computing statistics, creating an index
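
As a rough illustration of the top N case, here is a minimal sketch in plain Python (no Hadoop required). It assumes each record is a hypothetical `(post_id, score)` pair; the function names and sample data are made up for this example. Each mapper keeps only its local top N, and one reducer merges those local winners.

```python
import heapq

def top_n_mapper(records, n):
    """Keep only this split's top N records, so little data is shuffled."""
    return heapq.nlargest(n, records, key=lambda r: r[1])

def top_n_reducer(local_tops, n):
    """Merge the local winners from all mappers into the global top N."""
    merged = [record for top in local_tops for record in top]
    return heapq.nlargest(n, merged, key=lambda r: r[1])

# Hypothetical input splits: (post_id, score)
split_a = [("post1", 12), ("post2", 40), ("post3", 7)]
split_b = [("post4", 55), ("post5", 3), ("post6", 21)]

result = top_n_reducer(
    [top_n_mapper(split_a, 2), top_n_mapper(split_b, 2)], 2
)
print(result)
```

Filtering in the mapper is the point of the pattern: records that cannot be in the final answer never leave the node that read them.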
- Inverted Index Mapper - count how many times a word is used on the forum
- Inverted Index Reducer - count how many times a word is used on the forum
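
One way the inverted index mapper and reducer could look, sketched in plain Python. The `(post_id, body)` input shape, the tokenizer regex, and the `run` helper that stands in for Hadoop's shuffle and sort are all assumptions made for illustration, not the notes' actual code.

```python
import re
from collections import defaultdict

def index_mapper(post_id, body):
    """Emit one (word, post_id) pair per word occurrence in a post."""
    for word in re.findall(r"[a-z0-9']+", body.lower()):
        yield word, post_id

def index_reducer(word, post_ids):
    """Return the word, its total count, and the sorted posts that use it."""
    post_ids = list(post_ids)
    return word, len(post_ids), sorted(set(post_ids))

def run(posts):
    """Simulate shuffle and sort by grouping mapper output by key."""
    groups = defaultdict(list)
    for post_id, body in posts:
        for word, pid in index_mapper(post_id, body):
            groups[word].append(pid)
    return {w: index_reducer(w, ids)[1:] for w, ids in sorted(groups.items())}

# Hypothetical forum posts
posts = [(1, "Hadoop is fun"), (2, "Is Hadoop hard? Hadoop is big")]
index = run(posts)
print(index["hadoop"])
```

The mapper does no counting itself; the count falls out of how many pairs reach the reducer for a given word.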
E.g. word/record count, mean, median, standard deviation
- Mean Mapper - is there any correlation between the day of the week and how much people spent on items
- Mean Reducer - is there any correlation between the day of the week and how much people spent on items
-> Use combiners (local mini-reducers that pre-aggregate mapper output before the shuffle)
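
A hedged sketch of the mean job with a combiner, in plain Python. The tab-separated `date<TAB>amount` input format and all names here are assumptions. Note that the combiner passes along `(total, count)` rather than a partial mean, because averaging partial means gives the wrong answer when splits have different sizes.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def mean_mapper(line):
    """Parse 'YYYY-MM-DD<TAB>amount' and emit (weekday, (amount, 1))."""
    date_str, amount = line.split("\t")
    weekday = DAYS[datetime.strptime(date_str, "%Y-%m-%d").weekday()]
    return weekday, (float(amount), 1)

def combine(pairs):
    """Sum (total, count) per key; works as combiner and inside the reducer."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, (total, count) in pairs:
        acc[key][0] += total
        acc[key][1] += count
    return [(key, (total, count)) for key, (total, count) in acc.items()]

def mean_reducer(pairs):
    """Merge the partial sums and divide once at the very end."""
    return {key: total / count for key, (total, count) in combine(pairs)}

# Hypothetical purchase records on two input splits
split_a = ["2012-01-01\t10.0", "2012-01-08\t20.0", "2012-01-02\t5.0"]
split_b = ["2012-01-01\t60.0", "2012-01-02\t15.0"]

partials = []
for split in (split_a, split_b):
    # The combiner runs on each mapper's output before the shuffle.
    partials.extend(combine(mean_mapper(line) for line in split))

means = mean_reducer(partials)
print(means)
```
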
gedit ~/.bashrc
E.g. combining data sets, migrating from an RDBMS to Hadoop (to take advantage of hierarchical data)
- Data sources linked by foreign keys
- Data must be structured and row based
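
To make the two requirements concrete, here is a small sketch of turning two row-based tables into hierarchical records. The `posts` and `comments` tables, their column names, and the in-memory grouping that stands in for the shuffle are all hypothetical. The mapper keys each row by the foreign key and tags it with its source; the reducer nests child rows under their parent.

```python
from collections import defaultdict

def tag_mapper(source, row):
    """Key every row by the shared id and tag it with its source table."""
    key = row["id"] if source == "posts" else row["post_id"]
    return key, (source, row)

def hierarchy_reducer(key, tagged_rows):
    """Nest all comment rows under the post row that has the same key."""
    post, comments = None, []
    for source, row in tagged_rows:
        if source == "posts":
            post = row
        else:
            comments.append(row["text"])
    title = post["title"] if post else None
    return {"id": key, "title": title, "comments": comments}

# Hypothetical relational tables; comments.post_id references posts.id
posts = [{"id": 1, "title": "Intro"}, {"id": 2, "title": "Combiners"}]
comments = [
    {"post_id": 1, "text": "Nice"},
    {"post_id": 1, "text": "Thanks"},
    {"post_id": 2, "text": "Why?"},
]

# Stand-in for the shuffle: group tagged rows by key.
groups = defaultdict(list)
for source, rows in (("posts", posts), ("comments", comments)):
    for row in rows:
        key, value = tag_mapper(source, row)
        groups[key].append(value)

nested = [hierarchy_reducer(k, v) for k, v in sorted(groups.items())]
print(nested[0])
```
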