This portfolio showcases my work for the Explainable AI (XAI) course, where I explored a spectrum of explainability techniques across different types of machine learning models—from interpretable classical models to complex neural networks.
Through five hands-on projects and a detailed blog post, I investigated:
- Interpretable Machine Learning
- Explainable Techniques for Complex Models
- Explainable Deep Learning
- Mechanistic Interpretability
- XAI for Large Language Models (LLMs)
- SHAP Values for Model-Agnostic Interpretability
Each project focuses on making AI models more transparent, understandable, and accountable—addressing the growing need for explainability in real-world AI applications.
- Techniques: Logistic Regression, Generalized Additive Models (GAM), Linear Regression
- Explainability Tools:
  - Coefficient interpretation
  - Partial Dependence Plots (PDP)
  - Individual Conditional Expectation (ICE) plots
  - Accumulated Local Effects (ALE) plots
- Key Learning: Compared global vs. local feature importance to understand drivers of customer churn in a telecommunications dataset. Provided actionable business recommendations based on interpretable models.
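To give a flavor of the coefficient-interpretation side of this project, here is a minimal sketch using scikit-learn. The feature names and data below are illustrative placeholders, not the actual telecom churn dataset used in the notebook; only the library calls are real.

```python
# Minimal sketch: coefficient interpretation for a churn-style classifier.
# Features and data are hypothetical stand-ins for a telecom dataset.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, 500),
    "monthly_charges": rng.uniform(20, 120, 500),
    "support_calls": rng.integers(0, 10, 500),
})
y = (X["support_calls"] > 5).astype(int)  # toy churn target

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# On standardized features, exp(coef) is the multiplicative change in
# churn odds per one-standard-deviation increase in that feature.
coefs = model.named_steps["logisticregression"].coef_[0]
for name, c in zip(X.columns, coefs):
    print(f"{name}: coef={c:+.3f}, odds ratio={np.exp(c):.2f}")
```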
- Techniques: Comparative model evaluation using interpretability tools
- Explainability Tools:
  - Partial Dependence Plots (PDP)
  - Individual Conditional Expectation (ICE) plots
  - Accumulated Local Effects (ALE) plots
- Key Learning: Used multiple visualization techniques to reveal both global trends and individual variability in feature effects. Highlighted the importance of combining different XAI tools for a holistic understanding of model behavior.
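The sketch below shows how PDP and ICE curves can be overlaid using scikit-learn's `PartialDependenceDisplay`. The bundled diabetes dataset and gradient-boosted model are stand-ins for the models actually evaluated in the project; ALE plots would come from a separate package (e.g., `alibi` or `PyALE`) and are omitted here.

```python
# Minimal sketch: PDP + ICE with scikit-learn on placeholder data.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" overlays ICE curves (per-sample variability) on the PDP
# (average global effect) for the selected features.
PartialDependenceDisplay.from_estimator(
    model, X, features=["bmi", "bp"], kind="both", subsample=50, random_state=0
)
plt.tight_layout()
plt.show()
```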
- Techniques: CNN classification on MNIST, FGSM adversarial attacks
- Explainability Tools:
  - Saliency Maps
- Hypothesis Tested: Do adversarial examples significantly change feature importance?
- Key Learning: Found that, in this case, adversarial perturbations did not significantly alter the model’s focus (feature importance), demonstrating robustness. Used saliency maps to validate model stability under perturbations.
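Below is a minimal PyTorch sketch of the FGSM-plus-saliency workflow, assuming a trained MNIST CNN named `model` that returns class logits, a `(1, 1, 28, 28)` image tensor, and its true label as a length-1 LongTensor. The helper is illustrative rather than the exact notebook code.

```python
# Minimal sketch: FGSM perturbation and saliency maps (before/after).
import torch
import torch.nn.functional as F

def fgsm_saliency(model, image, label, epsilon=0.1):
    model.eval()
    image = image.clone().requires_grad_(True)

    # Backward pass to obtain input gradients
    loss = F.cross_entropy(model(image), label)
    loss.backward()

    # FGSM: step in the direction of the sign of the input gradient
    adv_image = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

    # Saliency map: |d(top class score) / d(pixel)| for each input pixel
    def saliency(x):
        x = x.clone().requires_grad_(True)
        model(x).max(dim=1).values.sum().backward()
        return x.grad.abs().squeeze()

    return adv_image, saliency(image.detach()), saliency(adv_image)

# Usage (hypothetical tensors):
# adv, sal_clean, sal_adv = fgsm_saliency(model, image, label)
```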
- Techniques: Toy models of superposition, Sparse Coding, Dictionary Learning
- Explainability Focus:
  - Visualizing compressed feature representations
  - Measuring reconstruction accuracy of entangled features
- Key Learning: Demonstrated how neural networks can store more features than neurons via superposition. Recovered features using sparse coding, providing insight into how models might efficiently compress and retrieve information.
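The toy setup below illustrates the idea with scikit-learn's `DictionaryLearning`: more sparse ground-truth features than "neurons", compressed by a random projection and then partially recovered by sparse coding. The dimensions and data are arbitrary choices for illustration, not the project's actual configuration.

```python
# Minimal sketch: recovering superposed features with dictionary learning.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_features, n_neurons, n_samples = 20, 8, 1000

# Sparse ground-truth features: each sample activates only a few of them
features_true = rng.random((n_samples, n_features)) * (
    rng.random((n_samples, n_features)) < 0.05
)

# Superposition: project 20 features into only 8 "neurons"
W = rng.standard_normal((n_features, n_neurons))
activations = features_true @ W  # shape (n_samples, n_neurons)

# Look for 20 sparse directions in the 8-dimensional activation space
dl = DictionaryLearning(n_components=n_features, alpha=0.1, random_state=0)
codes = dl.fit_transform(activations)

# Reconstruction error of the compressed activations from the learned codes
recon = codes @ dl.components_
print("reconstruction MSE:", np.mean((activations - recon) ** 2))
```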
- Techniques: Sentence embeddings using `all-MiniLM-L6-v2` from Hugging Face
- Explainability Tools:
  - PCA, t-SNE, UMAP for dimensionality reduction
- Key Learning: Visualized how Large Language Models semantically organize sentences in embedding space. Used dimensionality reduction to make high-dimensional representations interpretable and to uncover latent clustering behavior.
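A minimal sketch of the embedding-and-projection step, assuming the `sentence-transformers` package is installed. The sentences are made up, and PCA stands in here for the fuller PCA/t-SNE/UMAP comparison done in the project.

```python
# Minimal sketch: embed sentences with all-MiniLM-L6-v2 and project to 2D.
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

sentences = [
    "The bank raised interest rates.",
    "Stocks fell after the earnings report.",
    "The striker scored in extra time.",
    "The goalkeeper saved a penalty.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(sentences)  # 384-dimensional vectors

# Reduce to 2D so the semantic clusters can be plotted and inspected
coords = PCA(n_components=2).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), s in zip(coords, sentences):
    plt.annotate(s, (x, y), fontsize=8)
plt.title("Sentence embeddings projected with PCA")
plt.show()
```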
- Concept: Explained how SHAP values provide individualized, mathematically grounded explanations for machine learning predictions.
- Key Learning: Discussed why SHAP is critical for trust, fairness, and accountability in AI. Provided real-world examples from healthcare, finance, and human resources to illustrate its impact.
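For concreteness, here is a small sketch of computing SHAP values with the `shap` library on a placeholder tree model; the dataset and model are not from the blog post, only the API usage pattern is.

```python
# Minimal sketch: per-prediction SHAP attributions for a tree ensemble.
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier(n_estimators=100).fit(X, y)

# TreeExplainer gives exact SHAP values for tree-based models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Additive feature attributions, summarized across the dataset
shap.summary_plot(shap_values, X)
```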
| Concept | Example Project |
|---|---|
| Interpretable ML | Churn Prediction |
| Explainable Techniques | PDP, ICE, ALE |
| Explainable Deep Learning | Saliency Maps on Adversarial Examples |
| Mechanistic Interpretability | Superposition Toy Models |
| XAI in LLMs | Sentence Embedding Visualizations |
| SHAP Values | SHAP Blog Post |
As AI systems become increasingly embedded in decision-making processes, explainability is no longer optional:
- Trust: Users need to understand how predictions are made.
- Fairness: Exposing feature contributions can reveal biases.
- Accountability: Transparent models enable responsible deployment.
- Improvement: Explainability helps developers debug and refine models.
This portfolio demonstrates a comprehensive understanding of XAI, spanning from simple interpretable models to complex deep learning systems, and illustrates how explainability can be achieved across the AI spectrum.
Each project is housed in a separate GitHub repository (links provided above).
Within each repository, you will find:
- Jupyter Notebooks with detailed, step-by-step workflows
- Visualizations (heatmaps, saliency maps, embedding plots)
- Summary reports of findings and key reflections
For questions, feedback, or collaboration, feel free to reach out:
- Divya Sharma
- LinkedIn: DivyaSharma0795
- Email: divya.sharma@duke.edu