Recently, NSF funded a project lead by Amy N.Langville on ranking and clustering by integer and linear optimization.
The research component of this proposal is about ranking and clustering. A given collection of items is to be ordered according to some criterion (e.g., from most to least important). In clustering, the goal is to group items so that similar items appear together. Though well-studied, ranking and clustering research has been dominated by heuristic methods that produce approximate results quickly. Optimization methods that produce exact optimal results have been far less studied, possibly because these methods require longer computational times, and the gains in accuracy do not outweigh the computational costs in many applications. This is unfortunate since optimization methods are based on a beautiful theoretical framework and can lead to novel and interesting results. For example, preliminary work by the PI on a ranking optimization method, indicates that multiple optimal solutions can be linked to ties in the ranked list. On the other hand, heuristic methods are not designed to handle ties in the output ranking. Two main goals of the proposed research are: (1) to reveal interesting theoretical connections, and (2) to increase the size limits of optimally solvable problems using both classical and clever new relaxation techniques.
Ranking, also known as linear ordering (LOR), is close in spirit to the Traveling Salesman Problem (TSP): both are simple to state, yet hard to solve optimally. Over several decades, huge gains have been made on the size of solvable TSPs, that have resulted in new and unforeseen uses. Notable examples occur in the transportation industry (involving thousand-city TSPs) and in the microprocessor industry (with even larger TSPs routing copper wiring on circuit-boards). Similar progress is expected for the LOR, whose current limit is a few hundred to a few thousand items in some cases. Yet applications for much larger LORs abound, e.g. rankings of genes, products, and webpages. Furthermore, breakthroughs for the TSP (and likewise the proposed LOR work) have not solely been related to scale, but theoretical advances and progress in understanding connections to other problems and fields have been equally crucial in the development of general purpose methods for integer programming, such as cutting plane and branch and cut techniques.
The proposed research has great impact. Ranking and clustering have become standard data analysis tools with many uses. For example, Google uses ranking to order webpages resulting from user queries, and also uses clustering for its "Find Similar Pages" function. Amazon uses clustering in its "Customers Who Bought X Also Bought Y" feature. Facebook and Twitter can generate ads and friend recommendations based on ranking and clustering algorithms. To ensure the results of the proposed research reach these consumers, the work will be widely disseminated through journal publications and two books. The project's novel student training plan, which includes peer mentoring and an international exchange program, will broaden participation of underrepresented groups. The educational component and its Calculus Activity Book, being aimed at changing students attitudes towards the sciences and at instilling confidence in scientific ability, will have the stronger impact on those students who more commonly fail to be retained (likely a relatively large number coming from underrepresented groups), and thus will help further diversify and broaden participation in STEM disciplines.
Ranking, also known as linear ordering (LOR), is close in spirit to the Traveling Salesman Problem (TSP): both are simple to state, yet hard to solve optimally. Over several decades, huge gains have been made on the size of solvable TSPs, that have resulted in new and unforeseen uses. Notable examples occur in the transportation industry (involving thousand-city TSPs) and in the microprocessor industry (with even larger TSPs routing copper wiring on circuit-boards). Similar progress is expected for the LOR, whose current limit is a few hundred to a few thousand items in some cases. Yet applications for much larger LORs abound, e.g. rankings of genes, products, and webpages. Furthermore, breakthroughs for the TSP (and likewise the proposed LOR work) have not solely been related to scale, but theoretical advances and progress in understanding connections to other problems and fields have been equally crucial in the development of general purpose methods for integer programming, such as cutting plane and branch and cut techniques.
The proposed research has great impact. Ranking and clustering have become standard data analysis tools with many uses. For example, Google uses ranking to order webpages resulting from user queries, and also uses clustering for its "Find Similar Pages" function. Amazon uses clustering in its "Customers Who Bought X Also Bought Y" feature. Facebook and Twitter can generate ads and friend recommendations based on ranking and clustering algorithms. To ensure the results of the proposed research reach these consumers, the work will be widely disseminated through journal publications and two books. The project's novel student training plan, which includes peer mentoring and an international exchange program, will broaden participation of underrepresented groups. The educational component and its Calculus Activity Book, being aimed at changing students attitudes towards the sciences and at instilling confidence in scientific ability, will have the stronger impact on those students who more commonly fail to be retained (likely a relatively large number coming from underrepresented groups), and thus will help further diversify and broaden participation in STEM disciplines.
No comments:
Post a Comment