Using classifier cascades for scalable e-mail classification
Title | Using classifier cascades for scalable e-mail classification |
Publication Type | Journal Articles |
Year of Publication | 2011 |
Authors | Pujara J, Daumé H, Getoor L |
Journal | Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, ACM International Conference Proceedings Series |
Date Published | 2011 |
Abstract | In many real-world scenarios, we must make judgments in the presence of computational constraints. One common computational constraint arises when the features used to make a judgment each have differing acquisition costs, but there is a fixed total budget for a set of judgments. Particularly when a large number of classifications must be made in real time, an intelligent strategy for optimizing accuracy versus computational cost is essential. E-mail classification is an area where accurate and timely results require such a trade-off. We identify two scenarios where intelligent feature acquisition can improve classifier performance. In granular classification, we seek to classify e-mails with increasingly specific labels structured in a hierarchy, where each level of the hierarchy requires a different trade-off between cost and accuracy. In load-sensitive classification, we classify a set of instances within an arbitrary total budget for acquiring features. Our method, Adaptive Classifier Cascades (ACC), designs a policy to combine a series of base classifiers with increasing computational costs given a desired trade-off between cost and accuracy. Using this method, we learn a relationship between feature costs and label hierarchies, for granular classification, and between feature costs and cost budgets, for load-sensitive classification. We evaluate our method on real-world e-mail datasets with realistic estimates of feature acquisition cost, and we demonstrate superior results when compared to baseline classifiers that do not have a granular, cost-sensitive feature acquisition policy. |
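
The general idea of a classifier cascade described in the abstract can be illustrated with a minimal Python sketch. This is not the ACC policy from the paper, which is learned from data; it is a simple fixed-threshold cascade, and every name in it (`cascade_classify`, `margin`, `sender_score`, `content_score`) and every cost value is a hypothetical chosen for illustration. Stages run from cheapest to most expensive, and evaluation stops as soon as a stage's spam probability is far enough from 0.5.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from the paper.
Email = Dict[str, float]
# A stage is (feature acquisition cost, function returning an estimated P(spam)).
Stage = Tuple[float, Callable[[Email], float]]


def cascade_classify(email: Email, stages: List[Stage], margin: float) -> Tuple[bool, float]:
    """Run stages in order and stop at the first confident one.

    A stage counts as confident when its probability is at least `margin`
    away from 0.5. Returns (is_spam, total cost spent).
    """
    spent = 0.0
    prob = 0.5
    for cost, predict in stages:
        spent += cost              # pay for this stage's features
        prob = predict(email)
        if abs(prob - 0.5) >= margin:
            break                  # confident enough, skip costlier stages
    return prob >= 0.5, spent


# Assumed toy stages: a cheap sender-based score, then a costly content-based score.
stages: List[Stage] = [
    (1.0, lambda e: e["sender_score"]),
    (10.0, lambda e: e["content_score"]),
]

easy = {"sender_score": 0.95, "content_score": 0.90}  # cheap stage is decisive
hard = {"sender_score": 0.55, "content_score": 0.05}  # needs the costly stage

print(cascade_classify(easy, stages, margin=0.4))
print(cascade_classify(hard, stages, margin=0.4))
```

In this sketch, `margin` plays the role of the cost versus accuracy trade-off: a larger value sends more messages to the expensive stages, while a value of 0 always stops after the cheapest one.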