Instructor Notes

Readings, Texts, and References

For the overall course, we recommend the following books:

Additionally, we recommend Towards Data Science as a useful resource for this space.

Courses Using OpenDS4All Materials

Background Material

Students may find the following resources useful as background:

Suggested Configuration of Modules

The OpenDS4All modules can be "mixed and matched" at the discretion of the instructor, according to preferences, time constraints, and the target audience. However, certain elements do have dependencies. We suggest a "core" outline as follows:

  1. Overview, 1.5 lecture hours (basic)

    • Optional recitation: review of Python basics, including data structures
  2. Acquiring, wrangling, integrating, and cleaning data, 3-4 lecture hours (basic-intermediate)

    • Optional recitation: basics of HTML and the Document Object Model
    • Optional recitation: basics of regular expressions (often used for pattern matching) and XPath (which builds on some ideas from regular expressions and traverses XML trees)
  3. Modeling data: types, graphs, schemas, 2-4 lecture hours

    • Optional recitation: encoding tree- or graph-structured data in relations, and traversing the data
  4. Performance:

    • Foundations: Computer architecture basics, 1 lecture hour (basic; provides an overview of CPU and memory)

    • Efficient data processing, 3-7 lecture hours (intermediate; appropriate for a more computational and big data audience)

    • Optional recitation: use the merge and merge_map algorithms from the Lecture Notebook to compare the performance of alternative strategies, and use %%time with SQLite to measure the effect of database indices

  5. Building machine learning models

    • Overview and Unsupervised Models, 1 lecture hour (basic)
    • Supervised Models, Decision Trees, Random Forests, 1-1.5 lecture hours (basic)
    • Linear and Logistic Regression, 1-1.5 lecture hours (basic)
    • Neural Networks, 2-4 lecture hours (intermediate; builds upon linear and logistic regression and requires an understanding of calculus)
  6. Validating and tuning models, 1.5-3 lecture hours (basic)
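
As one possible starting point for the optional Performance recitation on database indices, the sketch below times the same SQLite lookup before and after creating an index. It is a minimal illustration, not taken from the OpenDS4All notebooks: the `readings` table, its columns, the index name `idx_readings_sensor`, and the row counts are all hypothetical. It uses `time.perf_counter` so that it runs as a plain script; in a notebook, the `%%time` cell magic could wrap the lookup cells instead.

```python
import random
import sqlite3
import time


def build_table(conn, n_rows=100_000, n_sensors=1_000, seed=0):
    """Create a hypothetical `readings` table filled with random rows."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        ((rng.randrange(n_sensors), rng.random()) for _ in range(n_rows)),
    )
    conn.commit()


def query_plan(conn, sql, params):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


def timed_lookups(conn, sql, sensor_ids):
    """Run the same lookup once per sensor id; return (results, seconds)."""
    start = time.perf_counter()
    results = [conn.execute(sql, (s,)).fetchone()[0] for s in sensor_ids]
    return results, time.perf_counter() - start


conn = sqlite3.connect(":memory:")
build_table(conn)

SQL = "SELECT COUNT(*) FROM readings WHERE sensor_id = ?"
sensor_ids = list(range(20))

# Without an index, each lookup has to scan the whole table.
plan_before = query_plan(conn, SQL, (0,))
counts_before, secs_before = timed_lookups(conn, SQL, sensor_ids)

# With an index on sensor_id, SQLite can search the index instead.
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor_id)")
plan_after = query_plan(conn, SQL, (0,))
counts_after, secs_after = timed_lookups(conn, SQL, sensor_ids)

print(f"no index:   {secs_before:.4f} s  plan: {plan_before}")
print(f"with index: {secs_after:.4f} s  plan: {plan_after}")
```

Printing the query plan next to the timing lets students see why the numbers change, since the plan typically switches from a table scan to an index search. Absolute timings depend on the machine, so the comparison is more meaningful than either number alone.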

Additional and advanced topics: