AS.030.421 Data Science Tools for the Chemical and Materials Sciences

Course Webpage: http://occamy.chemistry.jhu.edu/courses/AS.030.421/fall_2024/index.php

Last Updated: November 14th, 2024


FALL 2024


TOPIC

Advances in measurement techniques and simulations have driven an explosion in the variety, quality, and quantity of data collected when investigating chemical and materials processes. Advances in computing have made machine learning (ML) and related analytical methods practical tools for exploring and extracting meaning from this cornucopia of data, and data science has been called the fourth pillar of the scientific method. This course will provide an introduction to modern data science tools, including the Python programming language, Jupyter notebooks, ML algorithms and their practical implementation, and high performance computing. It places specific emphasis on applying these tools to data of chemical relevance, including UV/Vis, IR, and NMR spectra, 3D micro-computed tomography (micro-CT) and hyperspectral imaging, and physical property measurements. The use of dataflow languages such as LabVIEW will also be included, and key aspects of data organization and curation will be covered.


Class Times: TTh 12-1:15 PM Eastern Time
Classroom: Remsen 300

INSTRUCTOR:
Prof. Tyrel M. McQueen
mcqueen@jhu.edu
Zoom Office: Log in to view details.
Office: New Chemistry Building #312 and Bloomberg #301
Office Hours: by appointment or just stopping by ("open door policy")

TEACHING ASSISTANT (TA):
To Be Named

Grading: 60% Homework, 5% Class Participation, 15% Midterm, 20% Final Exam Project

Lowest homework score will be dropped.


Required Texts:

"None" (but the supplementary resources below will be valuable)


Supplementary Resources:
  1. SciServer
  2. Materials, Automated
  3. JHU Data Services
  4. What is a Neural Network (Series)
  5. Grace Hopper: Future of Computing (1982)

Tentative Schedule (can and will change!)
Week 1: The Big Picture, Introduction/Review of Basic Python/Programming/Data Structures [Thursday in Remsen 140]
Week 2: Introduction/Review Continued, Introduction to AI/ML Methods [TBD]
Week 3: Sample Application of TensorFlow to Lego Sorting [TBD]
Week 4: Finding useful niches for AI/ML tools [TBD]
Week 5: Numerical Solutions of Classic Chemical Kinetics Models [TBD]
Week 6: Chemical Kinetics Continued, Intro to Peak Fitting [TBD]
Week 7: Manual and Automated Peak Fitting (e.g. for UV/Vis) [Take-home Midterm]
Week 8: Data Organization, Storage, and Curation, Fall Break Day [TBD]
Week 9: Interfacing with Hardware [TBD]
Week 10: Strategies for Automating "Unautomatable" Tools and Eliding Proprietary Data Formats (e.g. for IR/NMR) [TBD]
Week 11: The Science of Color and Effective Data Visualization (e.g. 3D micro-CT, Hyperspectral) [TBD]
Week 12: Bug Hunting and Validation [TBD]
Week 13: Introduction to Supercomputing and Advanced Topics [TBD]
Thanksgiving Break
Week 14: Independent Projects [TBD]
Final: Friday, December 13th, 2-5 PM [TBD]

Lectures
  1. Introductions and Conceptual Overview
  2. Bits and Bytes and Intro to SciServer [Lecture2.ipynb, Lecture2.pdf]
  3. Simple Data Representations and Preparing for AI/ML [Lecture3.ipynb, Lecture3.pdf]
  4. A TensorFlow Example [Lecture4.ipynb, Lecture4.pdf, Tutorial]
  5. Lego Sorting Part I [Lego-Brick-Sorting.zip, Lecture5.ipynb, Lecture5.pdf]
  6. Lego Sorting Part II [Lecture6.ipynb, Lecture6.pdf]
  7. Lego Sorting Part III [Lecture7.ipynb, Lecture7.pdf]
  8. Niche Uses of AI/ML NN
  9. Chemical Kinetics I [Lecture9.ipynb, Lecture9.pdf]
  10. Chemical Kinetics II [Lecture10.ipynb, Lecture10.pdf]
  11. Chemical Kinetics III [Lecture11.ipynb, Lecture11.pdf]
  12. Peak Fitting I
  13. Peak Fitting II [Lecture13.ipynb, Lecture13.pdf, SampleSpectra.zip]
  14. Peak Fitting III [Lecture14.ipynb, Lecture14.pdf]
  15. Data Organization and Curation
  16. Interfacing With Hardware I
  17. Interfacing With Hardware II [Lecture17.ipynb, Lecture17.pdf]
  18. Proprietary File Formats [Lecture18.ipynb, Lecture18.pdf]
  19. Automating Unautomatable Tools [Lecture19.ipynb, Lecture19.pdf, DIFFaX.zip]
  20. Effective Data Visualization [Lecture20.ipynb, Lecture20.pdf, Sample-uCT.h5, Sample-HySpec.zip]
  21. The Science of Colorimetry (Asynchronous) [Slides]
  22. Bug Hunting / Validation I
  23. Bug Hunting / Validation II [Lecture23.ipynb, Lecture23.pdf]

These will be posted approximately weekly.


Homework Assignments

These will be posted weekly.


Exam Information

These will be posted as needed.


Handouts

These will be posted as mentioned in class.


Lecture Notes

As a matter of course policy, lecture notes are not available online. You are welcome to stop by TMM's office to view them anytime.


Use of Web Resources

As described on the first day of class, use of web resources is not only permitted, but might even be encouraged. However, you MUST include URL links to all resources used as part of your inline code documentation. Remember: just because something is the top web search result doesn't mean it is correct...


Use of "AI" Assistants/Generative "AI"

As described on the first day of class, use of such assistive technologies is not only permitted, but might even be encouraged. However, you MUST include all verbatim QUERIES and OUTPUTS from the use of such tools, along with your transmutation of those outputs into functional and understandable solutions. You might find this paper of interest: "ChatGPT is bullshit"


Audit Policy

Graduate students are allowed to audit the course. Auditors are required to attend most of the lectures and are strongly encouraged to review and complete the homework assignments.