3. Presentation ideas
Anything we could do, say or show at the Bank of England presentation. This is a place to list or test assumptions, e.g. that there will be a projector.
- Chris and James to present
- Hanif to answer questions
- Print copies of handouts for judges (on recyclable paper)
- James to explain multidimensional scaling (MDS) and Partitioning Around Medoids (PAM). MDS was actually one of three methods we used in the analysis, alongside Gower's distance and PAM. SEE DEFINITIONS BELOW
- Point out the most important or interesting patterns we found in the analysis/visualization
- All of our team work was done via the Internet, i.e. we never met
- The original dataset and its journey to visualisation
- Other examples of where MDS is used, preferably completely different to finance
- Where the presentation answers the three judging criteria:
- Novel or insightful, and relevant to the Bank
- Clear and easy to understand
- Aesthetically pleasing and original
- Opening: (Use submission template) We want to show how the Bank can identify underlying groups of respondents with regard to financial attitudes, separate from their actual financial position. The hope is that this gives an insight into the determinants of household economic behaviour.
- Method: (Edit/clarify as necessary)
- We identified questions specifically related to financial attitude in three years of the household survey data. We then measured how similarly every pair of households answered these questions using a distance measure called Gower's distance.
- We then used multidimensional scaling (MDS) to project these distances onto two dimensions to visualize them, similar to projecting distances between cities onto a map. MDS finds a 2-dimensional representation of the data that best preserves the distances between the households.
- Separately, we used a clustering algorithm called Partitioning Around Medoids (PAM) to group households together into the observed types. This method assigns each household to a cluster in a way that minimizes the distance between households and the center (medoid) of their cluster, which is itself one of the households.
- Originality: MDS as a visualization technique has been used in genetics research to group individuals together based on similarity of a panel of genetic markers, and the grouping often reveals novel relationships in human ancestry. Similarly, we reveal novel relationships between households in terms of financial attitudes. {EXAMPLE}
This is an additional layer of insight building on the standard analysis techniques employed by the Bank. For example, in the latest Quarterly Bulletin (QB) article using the household survey data (http://www.bankofengland.co.uk/publications/Documents/quarterlybulletin/2014/qb14q405.pdf), only the aggregate proportion of respondents who are highly/somewhat concerned about their debt is shown. With our method we reveal that concern about debt is also tied to other factors, e.g. certainty in household income.
3 Steps to the analysis
- Calculate the distance between people using the general dissimilarity coefficient of Gower (1971)
- Visualise those distances in 2 dimensions using MDS
- Group the people using PAM (Partitioning Around Medoids)
Gower's distance may be the hardest part to explain. Scale the observations so they are within the range [0, 1], then calculate the dissimilarity coefficient. The coefficient is given at
http://www.clustan.talktalk.net/gower_similarity.html
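For reference, Gower's coefficient is commonly written in the dissimilarity form below. The symbols are our own notation rather than anything fixed by the pages linked here: $x_{ik}$ is the answer of person $i$ to question $k$, $R_k$ is the range of question $k$ over all respondents, and $\delta_{ijk}$ is 1 when persons $i$ and $j$ can be compared on question $k$ (neither answer missing) and 0 otherwise. Gower (1971) states the coefficient as a similarity $S_{ij}$; the distance is $d_{ij} = 1 - S_{ij}$.

```latex
d_{ij} = \frac{\sum_{k=1}^{p} \delta_{ijk}\, d_{ijk}}{\sum_{k=1}^{p} \delta_{ijk}},
\qquad
d_{ijk} =
\begin{cases}
\dfrac{|x_{ik} - x_{jk}|}{R_k} & \text{if question } k \text{ is numeric} \\[1ex]
0 \text{ if } x_{ik} = x_{jk}, \text{ otherwise } 1 & \text{if question } k \text{ is categorical}
\end{cases}
```

Dividing by $R_k$ is the same as scaling each numeric question to [0, 1] first, so every question contributes a value between 0 and 1 and $d_{ij}$ is their average.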
Two references:
- http://www.researchgate.net/profile/John_Gower2/publication/232128574_A_General_Coefficient_of_Similarity_and_Some_of_Its_Properties/links/0c960524e95bedf928000000.pdf
- http://adn.biol.umontreal.ca/~numericalecology/Reprints/Gower&Leg_JClass86.pdf
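To make the distance step concrete, here is a minimal Python sketch of Gower's distance for mixed numeric and categorical answers. It is an illustration under our own assumptions, not the code used for the submission: the function names, the `households` records and the `is_numeric` flags are all made up for the example, and missing answers are assumed to be stored as `None`.

```python
def column_ranges(records, numeric):
    """Range (max - min) of each numeric column; None for categorical ones."""
    ranges = []
    for k, is_num in enumerate(numeric):
        if not is_num:
            ranges.append(None)
            continue
        values = [r[k] for r in records if r[k] is not None]
        ranges.append(max(values) - min(values) if values else 0.0)
    return ranges


def gower_distance(a, b, ranges):
    """Average per-question dissimilarity between two records."""
    total, count = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue  # skip questions where the pair cannot be compared
        if r is None:
            total += 0.0 if x == y else 1.0  # categorical: match or mismatch
        else:
            total += abs(x - y) / r if r > 0 else 0.0  # numeric: scaled gap
        count += 1
    if count == 0:
        raise ValueError("records share no comparable questions")
    return total / count


def gower_matrix(records, numeric):
    """Full pairwise distance matrix as a list of lists."""
    ranges = column_ranges(records, numeric)
    n = len(records)
    return [[gower_distance(records[i], records[j], ranges) for j in range(n)]
            for i in range(n)]


# Hypothetical answers: (concern about debt, income certainty 1-5, saves regularly)
households = [("high", 1, "no"), ("high", 2, "no"), ("none", 5, "yes")]
is_numeric = [False, True, False]
D = gower_matrix(households, is_numeric)
for row in D:
    print([round(d, 3) for d in row])
```

The first two made-up households differ only slightly on the numeric question, so their distance is small; the third disagrees on everything and sits at the maximum distance of 1 from the first.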
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. The number of dimensions N of an MDS plot can exceed 2 and is specified a priori. Choosing N=2 optimizes the object locations for a two-dimensional scatterplot. MDS is a form of ordination, which orders objects that are characterized by values on multiple variables (i.e., multivariate objects) so that similar objects are near each other and dissimilar objects are farther from each other.
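The idea can be sketched in plain Python using classical (Torgerson) MDS, which is only one of several MDS variants; these notes do not say which variant was used. The sketch double-centres the squared distances and takes the leading eigenvectors by power iteration. It assumes the largest eigenvalues are positive, which holds when the distances are close to Euclidean. The function name `classical_mds` and the four-point example are hypothetical.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical MDS of a symmetric distance matrix (list of lists).

    Returns one list of `dims` coordinates per object.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetric)
    grand = sum(row_mean) / n
    # Double centring turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration for the leading eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract in this dimension
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Hypothetical example: four points at the corners of a 3 x 4 rectangle.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
coords = classical_mds(dist)
for c in coords:
    print([round(x, 3) for x in c])
```

The recovered coordinates may be rotated or mirrored relative to the original points, because only the distances between objects are preserved. This is the same reason the axes of an MDS plot have no direct interpretation.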
Partitioning Around Medoids (PAM) is a partitional algorithm which attempts to break the data up into groups. It does this by attempting to minimize the distance between the points assigned to a cluster and a point designated as the center (medoid) of that cluster. It is more robust to noise and outliers than k-means because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances.
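A small Python sketch of the idea follows. It is a simplified variant, not a reference implementation: it builds the initial medoids greedily and then accepts any swap that lowers the total cost, whereas textbook PAM picks the best swap at each step. The names `total_cost` and `pam` and the one-dimensional example data are hypothetical. The only input is a distance matrix, so a Gower distance matrix could be passed in directly.

```python
def total_cost(dist, medoids):
    """Sum of each object's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Return (medoid indices, cluster label per object)."""
    n = len(dist)
    # BUILD: greedily add the object that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)
    # SWAP: exchange a medoid for a non-medoid while that lowers the cost.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    # Label each object with the position of its nearest medoid.
    labels = [min(range(k), key=lambda m: dist[i][medoids[m]])
              for i in range(n)]
    return medoids, labels


# Hypothetical example: two obvious groups on a line.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]
medoids, labels = pam(dist, 2)
print(medoids, labels)
```

Each medoid is an actual data point (here, an index into `values`), which is why a cluster can be described by pointing at one representative household rather than at an averaged, artificial one.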