- Thompson Sampling for Contextual Bandits with Linear Payoffs
- Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
- A Survey on Contextual Multi-armed Bandits
- A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit
- Variational inference for the multi-armed contextual bandit
- Medoids in almost linear time via multi-armed bandits
- Learning Structural Weight Uncertainty for Sequential Decision-Making
- Contextual Bandits with Stochastic Experts
- Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
- Semiparametric Contextual Bandits
- Learning Contextual Bandits in a Non-stationary Environment
- Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming
- Greybox fuzzing as a contextual bandits problem
- On-line Adaptative Curriculum Learning for GANs
- Machine Teaching of Active Sequential Learners
- Decentralized Cooperative Stochastic Bandits
- Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling
- Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
- Practical Bayesian Neural Networks via Adaptive Optimization Methods
- Adapting multi-armed bandits policies to contextual bandits scenarios
- Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
- The Assistive Multi-Armed Bandit
- From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization
- Batched Multi-armed Bandits Problem
- Introduction to Multi-Armed Bandits
- Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
- Model selection for contextual bandits
- Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards
- Empirical Likelihood for Contextual Bandits
- Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
- Bayesian Optimisation over Multiple Continuous and Categorical Inputs
- Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback
- Practical Calculation of Gittins Indices for Multi-armed Bandits
- Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
- Thompson Sampling via Local Uncertainty
- Persistency of Excitation for Robustness of Neural Networks
- Neural Contextual Bandits with UCB-based Exploration
- Safe Exploration for Optimizing Contextual Bandits
- Adaptive Estimator Selection for Off-Policy Evaluation
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation
- Thompson Sampling for Linearly Constrained Bandits
- An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning
- Gaussian Gated Linear Networks
- Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
- Finding All ε-Good Arms in Stochastic Bandits
- Recurrent Neural-Linear Posterior Sampling for Non-Stationary Contextual Bandits
- Lenient Regret for Multi-Armed Bandits
- Using Subjective Logic to Estimate Uncertainty in Multi-Armed Bandit Problems
- Carousel Personalization in Music Streaming Apps with Contextual Bandits
- Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
- Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
- Offline Contextual Bandits with High Probability Fairness Guarantees
- Thompson Sampling for Multinomial Logit Contextual Bandits
- Residual Loss Prediction: Reinforcement Learning with no Incremental Feedback
- SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits
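Several of the papers listed above (e.g. the first entry, and the Thompson Sampling entries further down) build on the classic Thompson sampling strategy. As a minimal illustration of the idea, not the method of any single paper, here is a sketch of Bernoulli Thompson sampling with Beta(1, 1) priors; `true_means`, `horizon`, and `seed` are illustrative parameter names:

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors on each arm.

    Returns per-arm (successes, failures) posterior counts after `horizon` pulls.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior and play the argmax.
        samples = [
            rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)
        ]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the (unknown to the learner) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Over time the posterior for the best arm concentrates, so it is sampled increasingly often; the contextual, constrained, and non-stationary variants surveyed above extend this basic loop.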
manjunath5496 / multi-armed-bandits-papers
"At some point we have to give up and say that's just the way it is. Or, not give up and push on." ― Leonard Susskind