
Reducing contextual bandits to supervised learning
Daniel Hsu, Columbia University
Based on joint work with A. Agarwal, S. Kale, J. Langford, L. Li, and R. Schapire

Learning to interact: example #1
Practicing physician

Loop:
1. Patient arrives with symptoms, medical history, genome...
2. Prescribe treatment.
3. Observe impact on patient’s health (e.g., improves, worsens).

Goal: prescribe treatments that yield good health outcomes.

Learning to interact: example #2
Website operator

Loop:
1. User visits website with profile, browsing history...
2. Choose content to display on website.
3. Observe user reaction to content (e.g., click, “like”).

Goal: choose content that yields desired user behavior.
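
Both examples follow the same three-step loop: a context arrives, an action is chosen, and feedback is observed for that chosen action only. The following is a minimal sketch of that loop in Python, not code from the talk. Every name in it (run_interaction_loop, draw_context, draw_reward, the two toy policies, and the "young"/"old" contexts with actions "A"/"B") is a hypothetical stand-in invented for illustration, and the toy reward rule is an assumption, not data.

```python
import random

# Hypothetical toy action set; stands in for treatments or pieces of content.
ACTIONS = ["A", "B"]


def draw_context(rng):
    """Step 1: a patient or user arrives (toy context, assumed for illustration)."""
    return rng.choice(["young", "old"])


def draw_reward(context, action, rng):
    """Step 3: outcome of the chosen action (toy rule: "A" suits "young", "B" suits "old")."""
    good = (context == "young" and action == "A") or (context == "old" and action == "B")
    return 1.0 if good else 0.0


def uniform_policy(context, rng):
    """Ignore the context and pick an action uniformly at random."""
    return rng.choice(ACTIONS)


def matched_policy(context, rng):
    """Pick the action that the toy reward rule favors for this context."""
    return "A" if context == "young" else "B"


def run_interaction_loop(policy, draw_context, draw_reward, num_rounds, seed=0):
    """Run the observe / act / feedback loop and return the logged rounds."""
    rng = random.Random(seed)
    log = []
    for _ in range(num_rounds):
        context = draw_context(rng)                  # 1. context arrives
        action = policy(context, rng)                # 2. choose an action
        reward = draw_reward(context, action, rng)   # 3. feedback for that action only
        log.append((context, action, reward))
    return log


def average_reward(log):
    """Mean observed reward over the logged rounds."""
    return sum(reward for _, _, reward in log) / len(log)


if __name__ == "__main__":
    for name, policy in [("uniform", uniform_policy), ("matched", matched_policy)]:
        log = run_interaction_loop(policy, draw_context, draw_reward, num_rounds=1000)
        print(name, average_reward(log))
```

Each logged round records the reward of the single action taken; what the other action would have earned is never observed. This partial feedback is what distinguishes the loop from ordinary supervised learning, where the correct label is available for every example.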