
Trajectories-Clustering

This project was carried out as a computer science project in the first year of the Master's in Data Science at the University of Lille (01/11/2020).

Overview

This repository contains:

  • trajectory_class.py : Python script for the class Trajectory.

  • grid_class.py : Python script for the class Grid.

  • test_trajectory.py : Python script that tests the Trajectory class, written following the TDD (Test-Driven Development) method.

  • clustering trajectories.ipynb : a Jupyter notebook version of the 3 scripts above, containing all the classes (Trajectory, Grid, Test).

Data

The txt files used in this project are located in the folder cabspottingdata (1 file = 1 trajectory), which is available at this link.

Test-Driven Development (TDD) steps

    1. Add a new unit test
    2. Check that it fails
    3. Make it pass (along with the others)
    4. Refactor (if needed)
    5. Commit (& push)
    6. Go back to step 1
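
As an illustration of one pass through this cycle, here is a minimal sketch using Python's built-in unittest module. The Trajectory class shown is a hypothetical stand-in (an ordered list of points with an add_point method), not the one in trajectory_class.py; in a real cycle the tests would be written first and would fail until the class is implemented.

```python
import unittest


class Trajectory:
    """Hypothetical minimal trajectory: an ordered list of (lat, lon) points."""

    def __init__(self, points=None):
        self.points = list(points or [])

    def add_point(self, lat, lon):
        # Append a new position at the end of the trajectory.
        self.points.append((lat, lon))

    def __len__(self):
        return len(self.points)


class TestTrajectory(unittest.TestCase):
    # Steps 1-2: each test is written before the code it exercises
    # and is expected to fail at first.
    def test_new_trajectory_is_empty(self):
        self.assertEqual(len(Trajectory()), 0)

    # Step 3: add_point is then implemented until this test passes too.
    def test_add_point_grows_trajectory(self):
        t = Trajectory()
        t.add_point(37.75, -122.39)
        self.assertEqual(len(t), 1)
        self.assertEqual(t.points[0], (37.75, -122.39))
```

If saved in a file such as test_trajectory.py, the tests can be run with `python -m unittest test_trajectory.py`.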

TDD on Project

Identify input/output structures

  • Input: Set of trajectories

    • Classes: Trajectory, Point
  • Output: Graph of paths

    • Classes: Graph, Node, Edge
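
One possible way to express these input/output structures is with dataclasses. This is only a sketch: the class names come from the list above, but every field name (lat, lon, ident, position, source, target) and the add_edge helper are assumptions made for illustration, not the project's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Point:
    lat: float  # assumed field: latitude in degrees
    lon: float  # assumed field: longitude in degrees


@dataclass
class Trajectory:
    # Input structure: an ordered sequence of Point objects.
    points: list = field(default_factory=list)


@dataclass(frozen=True)
class Node:
    ident: int        # assumed field: unique node identifier
    position: Point   # assumed field: representative location


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node


@dataclass
class Graph:
    # Output structure: sets avoid storing the same node or edge twice.
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source, target):
        # Register both endpoints, then the directed edge between them.
        self.nodes.update((source, target))
        self.edges.add(Edge(source, target))
```

Point, Node and Edge are frozen so that they are hashable and can be stored in sets; Trajectory and Graph stay mutable because they are built up incrementally.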

Apply TDD to develop clustering algorithm

Importing & plotting data

  • Download the mobility dataset from here (SF cabs)

  • Use Python's CSV parser to load each file in the directory extracted from cabspottingdata.tar.gz (1 file = 1 trajectory) as an input trajectory for your algorithm

  • Run clustering algorithm on the loaded dataset by creating a new Application class that combines the parser with the clustering algorithm

  • Plot the resulting graph using matplotlib for basic rendering

  • Explore the effects of changing the clustering “depths” on the resulting graph
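
The loading and clustering steps above could be sketched as follows. Several things here are assumptions rather than the project's actual code: each row is assumed to be space-separated as `latitude longitude occupancy time`; the clustering is replaced by a simple stand-in that snaps points to grid cells and links consecutive cells; and `cell_size`, `parse_trajectory`, `to_cell` and the shape of the `Application` class are hypothetical names chosen for illustration.

```python
import csv
import io
import math


def parse_trajectory(stream):
    """Read one trajectory; each row is assumed to be 'lat lon occupancy time'."""
    reader = csv.reader(stream, delimiter=" ")
    return [(float(row[0]), float(row[1])) for row in reader if row]


def to_cell(lat, lon, cell_size):
    """Snap a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


class Application:
    """Hypothetical glue class: CSV parser plus a grid-based clustering stand-in."""

    def __init__(self, cell_size):
        self.cell_size = cell_size  # assumed parameter: cell width in degrees
        self.edges = set()          # directed edges between grid cells

    def add_trajectory(self, stream):
        points = parse_trajectory(stream)
        cells = [to_cell(lat, lon, self.cell_size) for lat, lon in points]
        # Link each cell to the next one visited, skipping moves within a cell.
        for a, b in zip(cells, cells[1:]):
            if a != b:
                self.edges.add((a, b))


# In-memory sample standing in for one trajectory file.
sample = io.StringIO(
    "37.75134 -122.39488 0 1213084687\n"
    "37.75136 -122.39527 0 1213084659\n"
    "37.76200 -122.41500 1 1213084540\n"
)
app = Application(cell_size=0.01)
app.add_trajectory(sample)
print(app.edges)
```

For real files, the same method can be called with `open(path, newline="")` for each file in the data directory. Each edge is a pair of cell indices, so the resulting graph can be rendered with matplotlib by drawing a line segment between the two cells of every edge.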