Tristan - Determine which kind of machine learning method to use (supervised, reinforcement, unsupervised, transfer, etc.).
Tristan - Investigate how to partition data into training, validation and test subsets.
Tristan - Run the clustering algorithm using the different data subset partitions.
Tristan - Select a cross-validation method to obtain a more reliable estimate of each model's performance.
Tristan - Interpret the clustering results and make adjustments if needed.
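
The partitioning and cross-validation tasks above could start from something like the following minimal sketch. It assumes Python and uses only the standard library. The function names, the 70/15/15 default split and the fixed seed are hypothetical choices for illustration, not decisions recorded in this plan.

```python
import random


def split_train_val_test(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a copy of `items` and cut it into train/validation/test lists.

    Whatever is left after the train and validation slices becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("need train_frac > 0, val_frac >= 0 and train_frac + val_frac < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n = len(shuffled)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def k_fold_indices(n, k):
    """Yield (train_idx, held_out_idx) pairs for k-fold cross-validation over n items.

    Folds are contiguous, so the data should already be shuffled
    (e.g. by split_train_val_test). Fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        held_out = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, held_out
        start += size
```

In this sketch the test set is kept aside until the end, and k-fold runs over the remaining data: for each fold the clustering algorithm would be fit on the training indices and scored on the held-out indices. The scoring metric (for example a silhouette score) is left open here, since the plan has not chosen one.
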
Ryan - Determine how far the vector space can be reduced while losing no more than a minimal amount of the information needed for classification.
Ryan - Choose a dimensionality-reduction method and document the rationale behind the choice.
Ryan - Implement the dimensionality reduction.
Ryan - Compare results when reducing to different numbers of dimensions.
Ryan - Document whether the optimal number of dimensions is consistent across different document-embedding techniques.
Ryan - Design high-quality, informative visualizations of the vector space reduced to 2D.
Ryan - *Potential: use a different method or tool to reduce dimensions, and report whether the results remain consistent or differ.
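
One candidate for the dimensionality-reduction tasks above is principal component analysis (PCA); the plan has not chosen a method, so treat this as an example, not the decided approach. The minimal sketch below assumes Python with NumPy. It computes PCA through the singular value decomposition of the centred data and picks the smallest number of components that keeps a given share of the variance. The function names and the 95% threshold are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto its top `n_components` principal directions.

    Returns the reduced data and the fraction of variance explained by every
    principal component (sorted from largest to smallest).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # PCA assumes mean-centred data
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    explained = variance / variance.sum()  # assumes X is not constant
    reduced = centered @ vt[:n_components].T
    return reduced, explained


def components_for_variance(explained, threshold=0.95):
    """Smallest number of components whose cumulative explained variance
    reaches `threshold`."""
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(explained))  # guard against floating-point shortfall
```

Calling `pca_reduce(X, 2)` gives the 2D coordinates needed for the visualization task, while `components_for_variance` is one way to make "minimal loss" measurable when comparing different target dimensions and embedding techniques.
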
Yahya - Break each document into word pairs.
Yahya - Break each document into sentences.
Yahya - Filter the data to keep only relevant word tokens.
Yahya - Embed the tokens as high-dimensional semantic vectors.
Yahya - Compare embedding methods and proceed with the best-performing one.
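
The sentence-splitting, word-pair and filtering tasks above might be prototyped with a sketch like the one below. It assumes Python and only the standard library. The regex-based sentence splitter, the tiny stop-word list and all function names are hypothetical simplifications; a real pipeline would likely use a proper tokenizer and a fuller stop-word list.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
              "is", "it", "of", "on", "or", "the", "to", "was", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase and keep alphabetic words (an inner apostrophe is allowed)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def filter_tokens(tokens, stop_words=STOP_WORDS, min_len=2):
    """Drop stop words and very short tokens."""
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


def word_pairs(tokens):
    """Adjacent word pairs (bigrams) within one token list."""
    return list(zip(tokens, tokens[1:]))


def document_pairs(text):
    """Sentences -> filtered tokens -> word pairs, one list per sentence."""
    return [word_pairs(filter_tokens(tokenize(s))) for s in split_sentences(text)]
```

Splitting into sentences before forming pairs keeps a pair from straddling a sentence boundary. Note that pairing after filtering joins words that were not strictly adjacent in the original text; pairing before filtering is the alternative if exact adjacency matters.
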