AS.030.421 Data Science Tools for the Chemical and Materials Sciences

Course Webpage: http://occamy.chemistry.jhu.edu/courses/AS.030.421/fall_2022/index.php

Last Updated: December 8, 2022


FALL 2022


TOPIC

Advances in measurement techniques and simulations have driven an explosion in the variety, quality, and quantity of data collected when investigating chemical and materials processes. Advances in computing have led to the practicality of machine learning (ML) and related analytical methods to explore and extract meaning from this cornucopia of data, and data science has been called the fourth pillar of the scientific method. This course will provide an introduction to modern tools of data science, including the Python programming language, Jupyter notebooks, ML algorithms and their practical implementation, and high-performance computing, with specific emphasis on applying these tools to data of chemical relevance, including UV/Vis, IR, and NMR spectra, 3-D micro computed tomography and hyperspectral imaging, and physical property measurements. Use of data flow languages such as LabVIEW will also be included. Key aspects of data organization and curation will also be covered.


Class Times: TTh 12-1:15 PM Eastern Time
Classroom: In Person Remsen 300 (Class on December 6th in Bloomberg 475). Zoom 918-3920-9447, Passcode: 494368

INSTRUCTORS:
Prof. Tyrel M. McQueen
mcqueen@jhu.edu
Shannon Bernier
sbernier@jhu.edu
Zoom Office: 410-516-6201
Office: New Chemistry Building #312 and Bloomberg #301
Office Hours: by appointment or just stopping by ("open door policy")

Grading: 60% Homework, 5% Class Participation, 15% Midterm, 20% Final Exam Project

Lowest homework score will be dropped.


Required Texts:

"None" (but the supplementary resources will be of value)


Supplementary Resources:
  1. SciServer
  2. Materials, Automated
  3. JHU Data Services
  4. LabVIEW Graphical Programming

Tentative Schedule (can and will change!)
Week 1: The Big Picture, Introduction/Review of Basic Python/Programming/Data Structures [TBD]
Week 2: Introduction/Review Continued, Introduction to Data Flow Languages (LabVIEW) [TBD]
Week 3: Numerical Solutions of Classic Chemical Kinetics Models [TBD]
Week 4: Kinetics Models to Peak Fitting [TBD]
Week 5: Automatic Peak Fitting (e.g. for UV/Vis) [TBD]
Week 6: Interfacing with Hardware, The Science of Color [TBD]
Week 7: Interfacing with Hardware Cont., Midterm Exam (THURSDAY) [TBD]
Week 8: Data Organization, Storage, and Curation, Fall Break Day [No new homework]
Week 9: Strategies for Automating "Unautomatable" Tools and Eliding Proprietary Data Formats (e.g. for IR/NMR) [TBD]
Week 10: Introduction to Machine Learning Methods [TBD]
Week 11: Sample Application of TensorFlow to Lego Sorting [TBD]
Week 12: Effective Data Visualization (e.g. 3D micro-CT, Hyperspectral) and AI/ML over such data [TBD]
Week 13: Bug Hunting and Validation [TBD]
Week 14: LabVIEW and Independent Projects [TBD]
Final December 12th, 9 AM to Noon [Bloomberg 462]

Lectures
  1. Introductions and Conceptual Overview
  2. Digital Precision and Algorithm Performance [Lecture2-notes.py, Lecture2.ipynb, Lecture2.pdf]
  3. Python 101 Part 1 [Lecture3-notes.py, Lecture3.ipynb, Lecture3.pdf]
  4. LabVIEW Basics [Lecture4Examples.zip]
  5. LabVIEW Basics Part 2, Python 101 Part 2
  6. Chemical Kinetics Part 1 [Lecture6-notes.py, Lecture6.ipynb, Lecture6.pdf]
  7. Chemical Kinetics Part 2 [Lecture7-notes.py, Lecture7.ipynb, Lecture7.pdf]
  8. UV/Vis Curve Fitting Part 1 [Lecture8-notes.py, Lecture8.ipynb, Lecture8.pdf]
  9. UV/Vis Curve Fitting Part 2 [Lecture9-notes.py, Lecture9.ipynb, Lecture9.pdf, Spectra.zip]
  10. UV/Vis Curve Fitting Part 3 [Lecture10-notes.py, Lecture10.ipynb, Lecture10.pdf]
  11. LabVIEW Interfacing with Hardware [Lecture11-notes.pdf]
  12. Colorimetry and Good Data Visualization [Lecture12-Waters.pdf]
  13. LabVIEW File Reading and Queuing [Lecture13-notes.pdf, ScanCellLengthsProgrammatically.vi, ScanCellLengthsProgrammatically-withEnum.vi, MultipleLoopsQueues.vi, ScanSingleCellLength.vi]
  14. Data Collection, Storage, and Curation [Lecture14-notes.pdf]
  15. Reversing Proprietary File Formats [Nb3Br8.hs2, Lecture15.ipynb]
  16. Automating Unautomatable Tools [Lecture16-notes.ipynb, DIFFaX.zip, Lecture16.ipynb, See also "Running an External Program: Simulating Diffraction with DIFFaX" at Materials Automated]
  17. Introduction to Machine Learning [Lecture17-notes.pdf]
  18. A TensorFlow Example [Lecture18.ipynb, Lecture18-notes.pdf, The Tutorial]
  19. Lego Brick Sorting Part 1 [Lego-Brick-Sorting.zip, Lecture19.ipynb, Lecture19-notes.pdf]
  20. Lego Brick Sorting Part 2 [Lecture20.ipynb]
  21. Data Visualization Part 1 [Lecture21-data.zip, Lecture21.ipynb, Lecture21-notes.pdf]
  22. Data Visualization Part 2 [Lecture22.ipynb, Lecture22-notes.pdf]
  23. Debugging and Coding Practices Part 1 [Lecture23-notes.pdf, Lecture23-BestPracticesWhiteboard.pdf]
  24. Debugging and Coding Practices Part 2
  25. More Python and LabVIEW Tips 1 [Depixelater, Feynman Result, Lecture25-notes.pdf, Launch Control]
  26. Computing and Speed

These will be posted weekly.


Homework Assignments

These will be posted weekly.


Exam Information
  1. Exam 1: Exam1.pdf [Exam1-Problem1.ipynb, Exam1-Problem3.zip] (Sample Correct Answers). Open notes, open internet, and collaboration with classmates allowed; no time limit other than the due date. Please acknowledge anyone you worked with. Class time is set aside on Thursday, October 13th, for questions and working on the exam.
  2. Final Presentations: December 12th, 9 AM to Noon in Bloomberg 462 (Rubric)

These will be posted as needed.


Handouts

These will be posted as mentioned in the class.


Lecture Notes

As a matter of course policy, lecture notes are not available online. You are welcome to stop by TMM's office to view them anytime.


Audit Policy

Graduate students are allowed to audit the course. It is required that you attend most of the lectures, and strongly recommended that you look at and complete the homework assignments.