Forest-Guided Clustering - Shedding Light into the Random Forest Black Box
Why Use Forest-Guided Clustering?
Forest-Guided Clustering (FGC) is an explainability method for Random Forest models that addresses a key limitation of many standard XAI techniques: they struggle with correlated features and complex decision patterns. Traditional methods such as permutation importance, SHAP, and LIME often assume feature independence and focus on individual feature contributions, which can lead to misleading or incomplete explanations.
As machine learning models are increasingly deployed in sensitive domains such as healthcare, finance, and HR, understanding why a model makes a decision is as important as the decision itself. This is a matter of trust and fairness, and in some jurisdictions also a legal concern: the European Union’s GDPR, for example, is often interpreted as granting a “right to explanation” for automated decisions.
FGC offers a different approach: instead of approximating the model with simpler surrogates, it uses the internal structure of the Random Forest itself. By analyzing the tree traversal patterns of individual samples, FGC clusters data points that follow similar decision paths. This reveals how the forest segments the input space, enabling a human-interpretable view of the model’s internal logic. FGC is particularly useful when features are highly correlated, as it does not rely on assumptions of feature independence. It bridges the gap between model accuracy and model transparency, offering a powerful tool for global, model-specific interpretation of Random Forests.
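To make the idea of "similar decision paths" concrete, here is a minimal, standard-library-only sketch. It is not the fgclustering implementation or its API. It assumes we already know which leaf each sample reaches in each tree (the hard-coded `LEAVES` table is hypothetical; with scikit-learn, such indices could come from `forest.apply(X)`), defines the proximity of two samples as the fraction of trees in which they share a leaf, and then groups samples with a simple threshold rule. The threshold grouping is a stand-in chosen for brevity; the clustering step used by FGC is described in the paper.

```python
from itertools import combinations

# Hypothetical leaf assignments: one row per sample, one column per tree.
# Each entry is the index of the leaf the sample ends up in for that tree.
LEAVES = [
    [1, 4, 2],  # sample 0
    [1, 4, 3],  # sample 1
    [1, 5, 2],  # sample 2
    [7, 9, 8],  # sample 3
    [7, 9, 6],  # sample 4
]


def proximity(a, b):
    """Fraction of trees in which two samples land in the same leaf."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def proximity_matrix(leaves):
    """Pairwise proximities for all samples."""
    n = len(leaves)
    return [[proximity(leaves[i], leaves[j]) for j in range(n)] for i in range(n)]


def group_by_threshold(prox, threshold):
    """Link samples whose proximity is >= threshold and return the
    resulting connected components (illustrative stand-in for clustering)."""
    n = len(prox)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if prox[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


PROX = proximity_matrix(LEAVES)
CLUSTERS = group_by_threshold(PROX, threshold=0.5)
print(CLUSTERS)  # prints [[0, 1, 2], [3, 4]]
```

Samples 0, 1, and 2 end up together because they share leaves in most trees, while samples 3 and 4 never share a leaf with them. Because the grouping depends only on where the forest routes each sample, correlated input features need no special handling.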
📢 New! Forest-Guided Clustering is now on arXiv
Please see our paper Forest-Guided Clustering - Shedding Light into the Random Forest Black Box for a detailed description of the method, its theoretical foundations, and practical applications. Check it out to learn more about how FGC reveals structure in your Random Forest models!
Prefer a visual walkthrough? A short introduction video is also available.
Curious how Forest-Guided Clustering compares to standard methods? See our notebook: Introduction to FGC: Comparison of Forest-Guided Clustering and Feature Importance.
🚀 GETTING STARTED
📚 TUTORIALS
- Introduction to Forest-Guided Clustering (FGC): Simple Use Cases
- Introduction to FGC: Comparing Forest-Guided Clustering and Feature Importance
- Special Case: Inference with Forest-Guided Clustering
- Special Case: Tree Pruning for Explainability
- Special Case: Running Forest-Guided Clustering on Large Datasets
🧩 API DOCUMENTATION
🤝 Contributing
We welcome contributions of all kinds—whether it’s improvements to the code, documentation, tutorials, or examples. Your input helps make Forest-Guided Clustering more robust and useful for the community.
To contribute:
- Fork the repository.
- Make your changes in a feature branch.
- Submit a pull request to the main branch.
We’ll review your submission and work with you to get it merged.
If you have any questions or ideas you’d like to discuss before contributing, feel free to reach out to Lisa Barros de Andrade e Sousa.
📝 How to cite
If you find Forest-Guided Clustering useful in your research or applications, please consider citing it:
@article{barros2025forest,
  title   = {Forest-Guided Clustering -- Shedding Light into the Random Forest Black Box},
  author  = {Barros de Andrade e Sousa, Lisa and
             Miller, Gregor and
             Le Gleut, Ronan and
             Thalmeier, Dominik and
             Pelin, Helena and
             Piraud, Marie},
  journal = {ArXiv},
  year    = {2025},
  url     = {https://doi.org/10.48550/arXiv.2507.19455}
}
🛡️ License
The fgclustering package is released under the MIT License. You are free to use, modify, and distribute it under the terms outlined in the LICENSE file.