Tasklist.md

  1. Tristan - Determine what kind of machine learning method to use (supervised, reinforcement, unsupervised, transfer, etc.)
  2. Tristan - Investigate how to partition data into training, validation and test subsets.
  3. Tristan - Run the clustering algorithm using the different data subset partitions.
  4. Tristan - Select a cross-validation method to get a more reliable estimate of each model's performance.
  5. Tristan - Interpret the clustering results and make adjustments if needed.
  6. Ryan - Determine how far the vector space can be reduced while losing no more than a minimal amount of classification-relevant information.
  7. Ryan - Decide on a dimension reduction method and document the rationale behind the choice.
  8. Ryan - Implement the chosen dimension reduction method.
  9. Ryan - Compare application results when reducing to different numbers of dimensions.
  10. Ryan - Document whether the optimal number of dimensions is relatively consistent across different document embedding techniques.
  11. Ryan - Design high-quality, informative visualizations of the reduced 2D vector space.
  12. Ryan - *Potential: Employ a different method or tool to reduce dimensions, and report whether the results remain consistent or differ.
  13. Yahya - Break each document into word pairs.
  14. Yahya - Break each document into sentences.
  15. Yahya - Filter the data to keep only relevant word tokens.
  16. Yahya - Embed tokens into high-dimensional semantic vectors.
  17. Yahya - Compare embedding methods and optimize using the best-performing one.
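
As a starting point for tasks 2, 4, 13 and 14, here is a minimal sketch using only the Python standard library. It is not the project's chosen implementation. All function names (`split_sentences`, `word_pairs`, `partition`, `k_fold`) are hypothetical, the 70/15/15 split is an assumed default, and the regex-based sentence splitter is deliberately naive (it will mis-split abbreviations such as "e.g.").

```python
import random
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def word_pairs(sentence):
    """Return adjacent word pairs (bigrams) from a lowercased sentence."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return list(zip(words, words[1:]))


def partition(items, train=0.7, validation=0.15, seed=0):
    """Shuffle and split into (train, validation, test); the ratios are assumptions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the test set
    )


def k_fold(items, k=5):
    """Yield (training, held_out) pairs; each item is held out exactly once."""
    items = list(items)
    for i in range(k):
        held_out = items[i::k]
        training = [x for j, x in enumerate(items) if j % k != i]
        yield training, held_out


if __name__ == "__main__":
    doc = "The quick brown fox jumps. It lands softly!"
    for sentence in split_sentences(doc):
        print(sentence, word_pairs(sentence))
    train_set, val_set, test_set = partition(range(20))
    print(len(train_set), len(val_set), len(test_set))
```

Running `k_fold` on the training subset only, rather than on the full data set, keeps the test subset untouched until the final evaluation.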