This portfolio showcases my work for the Explainable AI (XAI) course, where I explored a spectrum of explainability techniques across different types of machine learning models—from interpretable classical models to complex neural networks.
Through five hands-on projects and a detailed blog post, I investigated:
- Interpretable Machine Learning
- Explainable Techniques for Complex Models
- Explainable Deep Learning
- Mechanistic Interpretability
- XAI for Large Language Models (LLMs)
- SHAP Values for Model-Agnostic Interpretability
Each project focuses on making AI models more transparent, understandable, and accountable—addressing the growing need for explainability in real-world AI applications.
- Techniques: Logistic Regression, Generalized Additive Models (GAM), Linear Regression
- Explainability Tools:
  - Coefficient interpretation
  - Partial Dependence Plots (PDP)
  - Individual Conditional Expectation (ICE) plots
  - Accumulated Local Effects (ALE) plots
- Key Learning: Compared global vs. local feature importance to understand drivers of customer churn in a telecommunications dataset. Provided actionable business recommendations based on interpretable models.
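To give a flavor of the coefficient-interpretation side of this project, here is a minimal sketch using scikit-learn. The feature names and data below are illustrative placeholders, not the actual telecom churn dataset used in the notebook; only the library calls are real.

```python
# Minimal sketch: coefficient interpretation for a churn-style classifier.
# Features and data are hypothetical stand-ins for a telecom dataset.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, 500),
    "monthly_charges": rng.uniform(20, 120, 500),
    "support_calls": rng.integers(0, 10, 500),
})
y = (X["support_calls"] > 5).astype(int)  # toy churn target

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# On standardized features, exp(coef) is the multiplicative change in
# churn odds per one-standard-deviation increase in that feature.
coefs = model.named_steps["logisticregression"].coef_[0]
for name, c in zip(X.columns, coefs):
    print(f"{name}: coef={c:+.3f}, odds ratio={np.exp(c):.2f}")
```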
- Techniques: Comparative model evaluation using interpretability tools
- Explainability Tools:
  - Partial Dependence Plots (PDP)
  - Individual Conditional Expectation (ICE) plots
  - Accumulated Local Effects (ALE) plots
- Key Learning: Used multiple visualization techniques to reveal both global trends and individual variability in feature effects. Highlighted the importance of combining different XAI tools for a holistic understanding of model behavior.
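The sketch below shows how PDP and ICE curves can be overlaid using scikit-learn's `PartialDependenceDisplay`. The bundled diabetes dataset and gradient-boosted model are stand-ins for the models actually evaluated in the project; ALE plots would come from a separate package (e.g., `alibi` or `PyALE`) and are omitted here.

```python
# Minimal sketch: PDP + ICE with scikit-learn on placeholder data.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" overlays ICE curves (per-sample variability) on the PDP
# (average global effect) for the selected features.
PartialDependenceDisplay.from_estimator(
    model, X, features=["bmi", "bp"], kind="both", subsample=50, random_state=0
)
plt.tight_layout()
plt.show()
```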
- Techniques: CNN classification on MNIST, FGSM adversarial attacks
- Explainability Tools:
  - Saliency Maps
- Hypothesis Tested: Do adversarial examples significantly change feature importance?
- Key Learning: Found that, in this case, adversarial perturbations did not significantly alter the model’s focus (feature importance), demonstrating robustness. Used saliency maps to validate model stability under perturbations.
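Below is a minimal PyTorch sketch of the FGSM-plus-saliency workflow, assuming a trained MNIST CNN named `model` that returns class logits, a `(1, 1, 28, 28)` image tensor, and its true label as a length-1 LongTensor. The helper is illustrative rather than the exact notebook code.

```python
# Minimal sketch: FGSM perturbation and saliency maps (before/after).
import torch
import torch.nn.functional as F

def fgsm_saliency(model, image, label, epsilon=0.1):
    model.eval()
    image = image.clone().requires_grad_(True)

    # Backward pass to obtain input gradients
    loss = F.cross_entropy(model(image), label)
    loss.backward()

    # FGSM: step in the direction of the sign of the input gradient
    adv_image = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

    # Saliency map: |d(top class score) / d(pixel)| for each input pixel
    def saliency(x):
        x = x.clone().requires_grad_(True)
        model(x).max(dim=1).values.sum().backward()
        return x.grad.abs().squeeze()

    return adv_image, saliency(image.detach()), saliency(adv_image)

# Usage (hypothetical tensors):
# adv, sal_clean, sal_adv = fgsm_saliency(model, image, label)
```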
- Techniques: Toy models of superposition, Sparse Coding, Dictionary Learning
- Explainability Focus:
  - Visualizing compressed feature representations
  - Measuring reconstruction accuracy of entangled features
- Key Learning: Demonstrated how neural networks can store more features than neurons via superposition. Recovered features using sparse coding, providing insight into how models might efficiently compress and retrieve information.
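The toy setup below illustrates the idea with scikit-learn's `DictionaryLearning`: more sparse ground-truth features than "neurons", compressed by a random projection and then partially recovered by sparse coding. The dimensions and data are arbitrary choices for illustration, not the project's actual configuration.

```python
# Minimal sketch: recovering superposed features with dictionary learning.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_features, n_neurons, n_samples = 20, 8, 1000

# Sparse ground-truth features: each sample activates only a few of them
features_true = rng.random((n_samples, n_features)) * (
    rng.random((n_samples, n_features)) < 0.05
)

# Superposition: project 20 features into only 8 "neurons"
W = rng.standard_normal((n_features, n_neurons))
activations = features_true @ W  # shape (n_samples, n_neurons)

# Look for 20 sparse directions in the 8-dimensional activation space
dl = DictionaryLearning(n_components=n_features, alpha=0.1, random_state=0)
codes = dl.fit_transform(activations)

# Reconstruction error of the compressed activations from the learned codes
recon = codes @ dl.components_
print("reconstruction MSE:", np.mean((activations - recon) ** 2))
```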
- Techniques: Sentence embeddings using `all-MiniLM-L6-v2` from Hugging Face
- Explainability Tools:
  - PCA, t-SNE, UMAP for dimensionality reduction
- Key Learning: Visualized how Large Language Models semantically organize sentences in embedding space. Used dimensionality reduction to make high-dimensional representations interpretable and to uncover latent clustering behavior.
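A minimal sketch of the embedding-and-projection step, assuming the `sentence-transformers` package is installed. The sentences are made up, and PCA stands in here for the fuller PCA/t-SNE/UMAP comparison done in the project.

```python
# Minimal sketch: embed sentences with all-MiniLM-L6-v2 and project to 2D.
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

sentences = [
    "The bank raised interest rates.",
    "Stocks fell after the earnings report.",
    "The striker scored in extra time.",
    "The goalkeeper saved a penalty.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(sentences)  # 384-dimensional vectors

# Reduce to 2D so the semantic clusters can be plotted and inspected
coords = PCA(n_components=2).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), s in zip(coords, sentences):
    plt.annotate(s, (x, y), fontsize=8)
plt.title("Sentence embeddings projected with PCA")
plt.show()
```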
- Concept: Explained how SHAP values provide individualized, mathematically grounded explanations for machine learning predictions.
- Key Learning: Discussed why SHAP is critical for trust, fairness, and accountability in AI. Provided real-world examples from healthcare, finance, and human resources to illustrate its impact.
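For concreteness, here is a small sketch of computing SHAP values with the `shap` library on a placeholder tree model; the dataset and model are not from the blog post, only the API usage pattern is.

```python
# Minimal sketch: per-prediction SHAP attributions for a tree ensemble.
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier(n_estimators=100).fit(X, y)

# TreeExplainer gives exact SHAP values for tree-based models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Additive feature attributions, summarized across the dataset
shap.summary_plot(shap_values, X)
```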
| Concept | Example Project |
|---|---|
| Interpretable ML | Churn Prediction |
| Explainable Techniques | PDP, ICE, ALE |
| Explainable Deep Learning | Saliency Maps on Adversarial Examples |
| Mechanistic Interpretability | Superposition Toy Models |
| XAI in LLMs | Sentence Embedding Visualizations |
| SHAP Values | SHAP Blog Post |
As AI systems become increasingly embedded in decision-making processes, explainability is no longer optional:
- Trust: Users need to understand how predictions are made.
- Fairness: Exposing feature contributions can reveal biases.
- Accountability: Transparent models enable responsible deployment.
- Improvement: Explainability helps developers debug and refine models.
This portfolio demonstrates a comprehensive understanding of XAI, spanning from simple interpretable models to complex deep learning systems, and illustrates how explainability can be achieved across the AI spectrum.
Each project is housed in a separate GitHub repository (links provided above).
Within each repository, you will find:
- Jupyter Notebooks with detailed, step-by-step workflows
- Visualizations (heatmaps, saliency maps, embedding plots)
- Summary reports of findings and key reflections
For questions, feedback, or collaboration, feel free to reach out:
- Divya Sharma
- LinkedIn: DivyaSharma0795
- Email: divya.sharma@duke.edu