Variable Selection and Ranking for Analyzing Automobile Traffic Accident Data
- April 1st, 2005
This paper explores a data mining process in which the original dataset is first reduced by a variable subset selection step and then passed to a machine learning algorithm. Variables are ranked with a technique called the Sum of Maximum Gain Ratio (SMGR), which computes a score based on the over-representation of attribute values. Essentially, SMGR is the ratio of the number of cases that could potentially be reduced by an effective countermeasure to the total number of cases associated with the over-represented value. In empirical tests, SMGR produced results comparable to those of alternative techniques while running significantly faster.
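
The ratio described above can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the paper's implementation. The assumptions are that an attribute value is "over-represented" when it occurs more often in a subject subset than its rate in a comparison subset would predict, and that the cases a countermeasure could potentially reduce are the excess over that expected count. The function names `gain_ratios` and `max_gain_ratio` and the sample data are hypothetical, and the abstract does not say how the ratios are combined into the final SMGR score, so that step is left out.

```python
from collections import Counter


def gain_ratios(subject, comparison):
    """Return {value: ratio} for each over-represented attribute value.

    Assumption (not from the paper): the reducible cases for a value are
    the observed subject count minus the count expected if the value
    occurred at the same rate as in the comparison subset.
    """
    if not subject or not comparison:
        raise ValueError("both subsets must be non-empty")
    subject_counts = Counter(subject)
    comparison_counts = Counter(comparison)
    ratios = {}
    for value, observed in subject_counts.items():
        # Expected count in the subject subset at the comparison rate.
        expected = len(subject) * comparison_counts[value] / len(comparison)
        if observed > expected:  # the value is over-represented
            ratios[value] = (observed - expected) / observed
    return ratios


def max_gain_ratio(subject, comparison):
    """Largest ratio among the over-represented values (0.0 if none)."""
    return max(gain_ratios(subject, comparison).values(), default=0.0)


# Hypothetical road-surface attribute: 6 of 10 subject crashes are "wet",
# against 2 of 10 comparison crashes.
subject = ["wet"] * 6 + ["dry"] * 4
comparison = ["wet"] * 2 + ["dry"] * 8
print(round(max_gain_ratio(subject, comparison), 3))  # prints 0.667
```

In this made-up example, 2 wet-surface crashes would be expected at the comparison rate, so 4 of the 6 observed cases count as potentially reducible, giving a ratio of 4/6. Variables whose values show larger ratios would rank higher under this reading.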