Interpretable feature subset selection: A Shapley value based approach

Abstract

While performing Feature Subset Selection (FSS) to identify important features, a weight is assigned to each feature that is not necessarily meaningful or interpretable with respect to the final task, which in turn leads to non-actionable information. To address this problem of interpretable FSS, we introduce a novel notion of a classification game with features as players and a hinge loss based characteristic function. We use the Shapley value of this game to apportion the total training error among the features, explicitly computing each feature's contribution (Shapley Value based Error Apportioning, SVEA). We formalize the notion of interpretability in FSS by identifying three final-task-related conditions. We empirically demonstrate that features with SVEA values less than zero are the dominant ones; this set is unique for a given dataset, since the Shapley value is unique for a game instance. For the datasets with negative apportioning, we observe a high value of the power of classification, P_SV, which compares the performance of a set of linear and non-linear classifiers learned on the Shapley value based important features with that learned on the full feature set, in most of the cases. To avoid the expensive exact Shapley value computation, we customize a known Monte Carlo based approximation algorithm. We demonstrate the sample-bias robustness of the SVEA scheme by providing interval estimates. We illustrate all of the above aspects on both synthetic and real datasets and show that our scheme outperforms many existing approaches, such as recursive feature elimination and ReliefF, in most cases.
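To make the SVEA idea concrete, below is a minimal Python sketch (not the paper's implementation or its customized algorithm) of the generic permutation-sampling Monte Carlo estimate of feature Shapley values under a hinge loss based characteristic function. The choice of LinearSVC as the subset classifier, the value of 1 per example for the empty coalition, labels in {-1, +1}, and the function names total_hinge_loss and svea_monte_carlo are all illustrative assumptions.

# A minimal sketch, not the paper's implementation: permutation-sampling
# Monte Carlo estimate of SVEA scores. Assumes labels y in {-1, +1}.
import numpy as np
from sklearn.svm import LinearSVC

def total_hinge_loss(X, y, subset):
    # Characteristic function (assumed form): total training hinge loss
    # of a linear classifier fit on the given feature subset.
    if not subset:
        return float(len(y))  # empty coalition: hinge loss 1 per example
    clf = LinearSVC(loss="hinge", max_iter=10000).fit(X[:, subset], y)
    margins = y * clf.decision_function(X[:, subset])
    return float(np.maximum(0.0, 1.0 - margins).sum())

def svea_monte_carlo(X, y, n_perms=100, seed=0):
    # Shapley value of each feature, estimated by averaging its marginal
    # contribution to the loss over random feature orderings.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    phi = np.zeros(d)
    for _ in range(n_perms):
        subset, prev = [], total_hinge_loss(X, y, [])
        for j in rng.permutation(d):
            subset.append(j)
            cur = total_hinge_loss(X, y, subset)
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return phi / n_perms

# Features with SVEA below the threshold 0 are the dominant ones:
# scores = svea_monte_carlo(X, y); important = np.flatnonzero(scores < 0)

With enough sampled permutations the estimates concentrate around the exact Shapley values; the sketch retrains one classifier per feature per permutation, which is exactly the cost the paper's customized approximation is meant to tame.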

Publication
2020 IEEE International Conference on Big Data (Big Data)
Sandhya Tripathi