Design Patterns

Filtering Patterns

E.g. filtering data, sampling data, generating the top N records from a list
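The three filtering examples can be sketched in plain Python. This is only an illustration, not Hadoop code: the function names (`filter_records`, `sample_records`, `top_n_mapreduce`) and the idea of passing input splits as lists are assumptions made for the sketch. The top N version mimics the usual MapReduce approach, where each mapper keeps a local top N and a single reducer merges them.

```python
import heapq
import random


def filter_records(records, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [r for r in records if predicate(r)]


def sample_records(records, fraction, seed=0):
    """Keep each record independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return [r for r in records if rng.random() < fraction]


def top_n_mapreduce(splits, n, key):
    """Top N over several input splits.

    "Map" step: each split produces its own local top N.
    "Reduce" step: the local lists are merged and cut to N again.
    """
    local_tops = [heapq.nlargest(n, split, key=key) for split in splits]
    merged = [r for top in local_tops for r in top]
    return heapq.nlargest(n, merged, key=key)


if __name__ == "__main__":
    print(filter_records(range(10), lambda x: x % 2 == 0))
    print(sample_records(range(100), 0.1))
    print(top_n_mapreduce([[5, 1, 9], [7, 3], [8, 2]], 2, key=lambda x: x))
```

Keeping only N records per split means the reducer sees at most N times the number of splits, however large the input is.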

Summarization Patterns

E.g. counting records, finding the min/max, computing statistics, creating an index
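As one possible reading of "creating an index", here is a small inverted index in plain Python, written in a map/reduce shape. The names `index_map`, `index_reduce`, and `build_index`, and the whitespace tokenizer, are assumptions for the sketch rather than anything from a specific framework.

```python
from collections import defaultdict


def index_map(doc_id, text):
    """Map step: emit one (word, doc_id) pair per distinct word."""
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reduce(pairs):
    """Reduce step: collect the sorted document ids for each word."""
    index = defaultdict(list)
    for word, doc_id in pairs:
        index[word].append(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def build_index(docs):
    """Run the map step over every document, then reduce the pairs."""
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in index_map(doc_id, text)
    ]
    return index_reduce(pairs)


if __name__ == "__main__":
    print(build_index({1: "big data", 2: "Big cluster"}))
```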

Numerical Summarization

E.g. word/record count, mean, median, standard deviation

-> Use combiners (they act like mini-reducers in MapReduce, pre-aggregating map output locally before it is sent to the reducers)
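A minimal sketch of why combiners need some care for numerical summaries, assuming a per-key mean as the target. A mean of means is wrong in general, so the combiner here passes along partial `(count, sum)` pairs, which can be merged in any order. All function names are hypothetical and the "splits" are plain Python lists standing in for mapper inputs.

```python
def mean_map(split):
    """Map step: emit (key, (count, sum)) for every (key, value) record."""
    for key, value in split:
        yield key, (1, value)


def mean_combine(pairs):
    """Combiner: merge (count, sum) partials per key on the mapper side."""
    partial = {}
    for key, (count, total) in pairs:
        c, t = partial.get(key, (0, 0.0))
        partial[key] = (c + count, t + total)
    return list(partial.items())


def mean_reduce(pairs):
    """Reducer: merge the remaining partials, then divide once at the end."""
    merged = dict(mean_combine(pairs))
    return {key: total / count for key, (count, total) in merged.items()}


def mean_per_key(splits):
    """Simulate the job: map and combine each split, then reduce everything."""
    shuffled = []
    for split in splits:
        shuffled.extend(mean_combine(mean_map(split)))
    return mean_reduce(shuffled)


if __name__ == "__main__":
    splits = [
        [("a", 1.0), ("a", 3.0), ("b", 10.0)],
        [("a", 5.0), ("b", 20.0)],
    ]
    print(mean_per_key(splits))
```

Counts, sums, min, and max combine this way directly. The median does not, because it cannot be rebuilt from per-split partial results of fixed size.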

Structural Patterns

E.g. combining data sets, migrating from an RDBMS to Hadoop (to take advantage of hierarchical data)

  1. Data sources are linked by foreign keys

  2. Data must be structured and row-based
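The two conditions above can be illustrated with a small structured-to-hierarchical sketch: row-based parent and child records, linked by a foreign key, are folded into nested records. The function `to_hierarchy` and the sample field names (`id`, `post_id`, `comments`) are assumptions for the example, not part of any Hadoop API.

```python
from collections import defaultdict


def to_hierarchy(parents, children, parent_key, foreign_key, child_field):
    """Nest each child row under the parent row its foreign key points to."""
    grouped = defaultdict(list)
    for child in children:
        grouped[child[foreign_key]].append(child)
    # Parents without children get an empty list rather than being dropped.
    return [
        {**parent, child_field: grouped.get(parent[parent_key], [])}
        for parent in parents
    ]


if __name__ == "__main__":
    posts = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]
    comments = [
        {"post_id": 1, "text": "x"},
        {"post_id": 1, "text": "y"},
    ]
    for record in to_hierarchy(posts, comments, "id", "post_id", "comments"):
        print(record)
```

Once the records are nested, reading a post together with its comments no longer needs a join.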