Basic Usage
To explain a trained Random Forest model with Forest-Guided Clustering (FGC), run the following commands:
from fgclustering import FgClustering
# initialize and run fgclustering object
fgc = FgClustering(model=rf, data=data, target_column='target')
fgc.run()
# visualize results
fgc.plot_global_feature_importance()
fgc.plot_local_feature_importance()
fgc.plot_decision_paths()
# obtain optimal number of clusters and vector that contains the cluster label of each data point
optimal_number_of_clusters = fgc.k
cluster_labels = fgc.cluster_labels
where
model=rf is a trained Random Forest Classifier or Regressor object,
data=data is a dataset containing the same features as required by the Random Forest model, and
target_column='target' is the name of the target column (i.e. target) in the provided dataset.
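The commands above assume that rf and data already exist. As a minimal sketch of how these two inputs might be prepared, the example below assumes that the model comes from scikit-learn and that the dataset is a pandas DataFrame. The iris dataset and the hyperparameter values are illustrative choices only and are not taken from the FGC documentation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load a small example dataset as a pandas DataFrame.
# The frame holds four feature columns plus a label column named 'target'.
iris = load_iris(as_frame=True)
data = iris.frame

# Separate the features from the target for model training.
X = data.drop(columns="target")
y = data["target"]

# Train the Random Forest model (illustrative hyperparameters).
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rf.fit(X, y)
```

With these names in place, rf and data can be passed to FgClustering together with target_column='target', as in the commands above.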
For detailed instructions, please have a look at Introduction to FGC: Simple Use Cases.
Usage on big datasets
If you are working with a dataset that contains a large number of samples, you can use one of the following strategies:
Use the cores at your disposal to parallelize the optimization of the cluster number by setting the parameter n_jobs to a value > 1 in the run() function.
Use the faster implementation of the PAM method, which the K-Medoids algorithm uses to find the clusters, by setting the parameter method_clustering to fasterpam in the run() function.
Use a subsampling technique.
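The subsampling strategy can be as simple as drawing a random subset of rows before running FGC. The sketch below assumes the dataset is a pandas DataFrame. subsample_rows is a hypothetical helper, not part of fgclustering, and the toy data and sample size are made up for illustration. The commented lines at the end show how the subsample might be combined with the n_jobs and method_clustering parameters mentioned above, assuming that fasterpam is passed as a string.

```python
import pandas as pd


def subsample_rows(data, n_samples, seed=42):
    """Return a random subsample of n_samples rows (hypothetical helper)."""
    # If the dataset is already small enough, return it unchanged.
    if n_samples >= len(data):
        return data
    # Draw rows without replacement; the seed makes the draw reproducible.
    return data.sample(n=n_samples, random_state=seed)


# Toy dataset standing in for a large table with a 'target' column.
data = pd.DataFrame({
    "feature_1": range(1000),
    "feature_2": [i % 7 for i in range(1000)],
    "target": [i % 2 for i in range(1000)],
})

data_small = subsample_rows(data, n_samples=200)

# The subsample would then be passed to FGC in place of the full dataset,
# optionally together with the two run() parameters described above:
# fgc = FgClustering(model=rf, data=data_small, target_column='target')
# fgc.run(n_jobs=4, method_clustering='fasterpam')
```

A plain random draw ignores class balance. For a strongly imbalanced target, sampling within each class may be a better choice.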
For detailed instructions, please have a look at Special Case: FGC for Big Datasets.