3. Presentation ideas

James Arthur Cattell edited this page May 26, 2015 · 1 revision

Anything we could do, say or show at the Bank of England presentation. This is also a place to list or test assumptions, e.g. that there will be a projector.

Do

  • Chris and James to present
  • Hanif to answer questions
  • Print copies of handouts for judges (on recyclable paper)

Say

  • James to explain multidimensional scaling (MDS) and Partitioning Around Medoids (PAM). MDS was actually one of three methods we used in the analysis; the clustering itself was done with PAM. SEE DEFINITIONS BELOW
  • Point out the most important or interesting patterns we found through the analysis/visualization
  • All of our teamwork was done via the Internet, i.e. we never met

Show

  • The original dataset and its journey to visualisation
  • Other examples of where MDS is used, preferably completely different to finance
  • Where the presentation answers the three judging criteria:
  1. Novel or insightful, and relevant to the Bank
  2. Clear and easy to understand
  3. Aesthetically pleasing and original

Presentation structure: draft

  • Opening: (Use submission template) We want to show how the Bank can identify underlying groups of respondents with regard to financial attitudes, separate from their actual financial position. The hope is that this offers an insight into the determinants of household economic behaviour.

  • Method: (Edit/clarify as necessary)

  1. We identified questions specifically related to financial attitude for three years of the household survey data. We then measured how similarly every pair of households answered these questions using a distance measure called Gower's distance.

  2. We then used multidimensional scaling (MDS) to project these distances onto two dimensions to visualize them, similar to projecting distances between cities onto a map. MDS finds a two-dimensional representation of the data that best preserves the distances between the households.

  3. Separately, we used a clustering algorithm called Partitioning Around Medoids to group households together into the observed types. This method assigns households to a cluster in a way that minimizes the distance between households and the center of that cluster.

  • Originality: MDS as a visualization technique has been used in genetics research to group individuals together based on similarity of a panel of genetic markers, and the grouping often reveals novel relationships in human ancestry. Similarly, we reveal novel relationships between households in terms of financial attitudes. {EXAMPLE}

This is an additional layer of insight building on the standard analysis techniques employed by the Bank. For example, the latest Quarterly Bulletin article using the household survey data (http://www.bankofengland.co.uk/publications/Documents/quarterlybulletin/2014/qb14q405.pdf) shows only the aggregate proportion of respondents who are highly or somewhat concerned about their debt. With our method we reveal that concern about debt is also tied to other factors, e.g. certainty of household income.

Definitions

3 Steps to the analysis

  • Calculate the distance between people using the general dissimilarity coefficient of Gower (1971)
  • Visualise those distances in 2 dimensions using MDS
  • Group the people using PAM (Partitioning Around Medoids)

Distance matrix

This may be the hardest part to explain. Scale the observations so that they lie within the range [0, 1], then calculate the dissimilarity coefficient, which is given at:

http://www.clustan.talktalk.net/gower_similarity.html
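For quick reference, Gower's general coefficient can be written as follows. The symbols are notation chosen here for illustration: x_ik is the answer of person i to question k, R_k is the observed range of question k, and w_ijk is 1 when both people answered question k and 0 otherwise. Taking the dissimilarity as one minus the similarity is one common convention, and we have not restated here which convention our own calculation used.

```latex
S_{ij} = \frac{\sum_{k} w_{ijk}\, s_{ijk}}{\sum_{k} w_{ijk}},
\qquad
d_{ij} = 1 - S_{ij},
\qquad
s_{ijk} =
\begin{cases}
1 - \dfrac{\lvert x_{ik} - x_{jk} \rvert}{R_k} & \text{quantitative question } k \\[2ex]
1 \text{ if } x_{ik} = x_{jk},\ 0 \text{ otherwise} & \text{categorical question } k
\end{cases}
```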

Two references:

  • http://www.researchgate.net/profile/John_Gower2/publication/232128574_A_General_Coefficient_of_Similarity_and_Some_of_Its_Properties/links/0c960524e95bedf928000000.pdf
  • http://adn.biol.umontreal.ca/~numericalecology/Reprints/Gower&Leg_JClass86.pdf
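To make the distance step easier to explain, here is a minimal Python sketch of a Gower-style dissimilarity for a mix of numeric and categorical answers. It is not the code we used for the submission: the function name `gower_distance`, the toy answers and the choice to ignore missing values are all assumptions made for illustration.

```python
import numpy as np

def gower_distance(numeric, categorical):
    """Pairwise Gower-style dissimilarity (simplified: no missing values).

    numeric:     (n, p) array of numeric answers
    categorical: (n, q) array of category labels
    Returns an (n, n) symmetric matrix with values in [0, 1].
    """
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical)

    # Numeric questions: absolute difference divided by the observed range,
    # so each question contributes a value between 0 and 1.
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid dividing by zero for constant columns
    num_diff = np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges

    # Categorical questions: 0 if the two answers match, 1 otherwise.
    cat_diff = (categorical[:, None, :] != categorical[None, :, :]).astype(float)

    # Average the per-question dissimilarities with equal weights.
    n_questions = numeric.shape[1] + categorical.shape[1]
    return (num_diff.sum(axis=2) + cat_diff.sum(axis=2)) / n_questions

# Hypothetical toy data: three households, one 0-10 scale and one yes/no answer.
scale_answers = [[0], [5], [10]]
yes_no_answers = [["yes"], ["yes"], ["no"]]
D = gower_distance(scale_answers, yes_no_answers)
print(D)
```

In this toy example households 0 and 1 give the same yes/no answer and sit half the scale apart, so their dissimilarity is (0.5 + 0) / 2 = 0.25, while households 0 and 2 differ as much as possible on both questions and get 1.0.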

MDS

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. The number of dimensions N of an MDS plot can exceed 2 and is specified a priori. Choosing N=2 optimizes the object locations for a two-dimensional scatterplot. Ordination, of which MDS is one form, orders objects that are characterized by values on multiple variables (i.e., multivariate objects) so that similar objects are near each other and dissimilar objects are farther from each other.
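The "cities on a map" idea can be shown with a small Python sketch of classical (Torgerson) MDS. Classical MDS is only one variant and is chosen here as an assumption; this page does not record which variant we ran. The function name `classical_mds` and the four toy points are hypothetical.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical MDS: coordinates whose distances approximate those in D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances to obtain an inner-product matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Keep the eigenvectors belonging to the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical "cities": four points on a 3 x 4 rectangle.
cities = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(cities[:, None, :] - cities[None, :, :], axis=2)

# Recover a 2-D layout from the distances alone.
coords = classical_mds(D, n_components=2)
D_recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.allclose(D, D_recovered))
```

The recovered layout may be rotated or mirrored relative to the original points, because only the distances are used. With real survey dissimilarities the two-dimensional picture is an approximation rather than an exact reconstruction.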

PAM

This is a partitional algorithm which attempts to break the data up into groups. It does this by attempting to minimize the distance between the points assigned to a cluster and a point designated as the center of that cluster (the medoid, which is always one of the observations). It is more robust to noise and outliers than k-means because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances.
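A simplified Python sketch of the idea behind PAM is given below. It works directly on a dissimilarity matrix, builds an initial set of medoids greedily, then keeps swapping a medoid for a non-medoid whenever that lowers the total cost. It accepts the first improving swap rather than searching for the best one, so it is a simplification and not the reference PAM implementation; the names `pam` and `total_cost` and the toy data are hypothetical.

```python
import numpy as np

def total_cost(D, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def pam(D, k):
    """Simplified PAM on an (n, n) dissimilarity matrix D with k clusters."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]

    # BUILD: start from the most central point, then greedily add the
    # point that reduces the total cost the most.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        costs = [total_cost(D, medoids + [c]) for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])

    # SWAP: replace a medoid with a non-medoid whenever the cost drops.
    best = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = cand
                cost = total_cost(D, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True

    # Assign every point to its nearest medoid.
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Hypothetical toy data: two obvious groups on a line.
values = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(values[:, None] - values[None, :])
medoids, labels = pam(D, k=2)
print(medoids, labels)
```

Because the function only needs a dissimilarity matrix, it can be fed Gower distances directly, which is the main practical reason to prefer a medoid-based method over k-means for mixed survey answers.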