Thesis Proposal

Committee Members
Advisor: Javed Aslam – webpage
Internal: Jonathan Ullman – webpage
Internal: Byron Wallace – webpage
External: Emilie Kaufmann – webpage,  CV

Advisor’s Statement on Committee Members
Javed Aslam is Maryam’s PhD thesis advisor.  Jonathan Ullman has expertise in machine learning, and he has studied the multi-arm bandit problem which is the subject of Maryam’s thesis work.  Byron Wallace has expertise in machine learning and its applications.  Emilie Kaufmann is one of the world’s leading experts in multi-arm bandit problems.

Thesis Proposal Abstract

How would one go about choosing a near-best option from an effectively infinite set of options when one has a finite amount of time to make a decision and imperfect knowledge of the quality of the options?  Such problems are well-modeled by a variant of the classical multi-arm bandit problem.

Consider drug testing, for example.  One may have access to more candidate drugs (e.g. chemical compounds found in the environment) than can be tested accurately in the time horizon of interest (e.g. a limit on the total number of tests), and yet one’s goal is to identify a “near-optimal” drug within the budget allowed.  Here the drugs are the “bandit arms”, a single drug test gives imperfect (stochastic) knowledge about the quality of the bandit arm, and one’s goal is to identify a “near-optimal” arm from a range of choices that cannot be fully explored.

We consider the pure exploration version of the multi-arm bandit problem wherein a player is confronted with many options, each of which has an unknown stochastic reward, and the goal is to quickly identify the option with the best (or “near best”) mean reward by repeatedly choosing options in some intelligent manner.  The multi-arm bandit framework is used to model numerous problem in computer science and beyond.

Our specific focus is on the “infinitely” multi-armed bandit problem, wherein the number of choices exceeds the number of possibilities that can be explored within the budget or timeframe of interest, and the “pure exploration” version of this problem, wherein one is concerned with the quality of the “best” arm identified in the end and not the rewards collected along the way.  We examine this problem in both the fixed budget setting, wherein one attempts to maximize performance with limited resources, and the fixed confidence setting, wherein one attempts to minimize budget while meeting a performance constraint.