Friday, February 24, 2019, 3-4 pm, 3401 Walnut St Room 401B
Optimizing probability distributions for learning: sampling meets optimization
Optimization and sampling are both of central importance in
large-scale machine learning problems, but they are typically viewed as
very different problems. This talk presents recent results that exploit
the interplay between them. Viewing Markov chain Monte Carlo sampling
algorithms as performing an optimization over the space of probability
distributions, we demonstrate analogs of Nesterov’s acceleration
approach in the sampling domain, in the form of a discretization of an
underdamped Langevin diffusion. In the other direction, we view
stochastic gradient optimization methods, such as those that are common
in deep learning, as sampling algorithms, and study the finite-time
convergence of their iterates to an invariant distribution.
Joint work with Xiang Cheng, Niladri S. Chatterji, and Michael Jordan.
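As a rough illustration of the sampling-as-optimization view, the sketch below simulates a simple Euler-type discretization of underdamped Langevin dynamics targeting a density proportional to exp(-U(x)). The friction and mass parameters (gamma, u), step size, and iteration counts are illustrative choices, and the discretization analyzed in the talk is more refined than this naive one.

```python
import numpy as np

def underdamped_langevin(grad_U, x0, gamma=2.0, u=1.0, step=0.01,
                         n_iters=20000, burn_in=2000, rng=None):
    """Naive Euler-type discretization of underdamped Langevin dynamics,
    whose x-marginal stationary density is proportional to exp(-U(x)).
    gamma (friction), u (inverse mass), and step are illustrative."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    samples = []
    for t in range(n_iters):
        # velocity step: friction, gradient force, and injected Gaussian noise
        noise = rng.standard_normal(x.shape)
        v = v - step * (gamma * v + u * grad_U(x)) \
              + np.sqrt(2 * gamma * u * step) * noise
        # position step
        x = x + step * v
        if t >= burn_in:
            samples.append(x.copy())
    return np.array(samples)

# Example: U(x) = x^2 / 2, so grad_U(x) = x and the target is N(0, 1).
samples = underdamped_langevin(lambda x: x, [3.0])
```

For the standard Gaussian target above, the empirical mean and variance of the collected samples should be close to 0 and 1 respectively, up to discretization bias and Monte Carlo error.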
Peter Bartlett is a professor in the Computer Science
Division and Department of Statistics and Associate Director of the
Simons Institute for the Theory of Computing at the University of
California at Berkeley. His research interests include machine learning
and statistical learning theory. He is the co-author, with Martin
Anthony, of the book Neural Network Learning: Theoretical Foundations.
He has served as an associate editor of the journals Bernoulli,
Mathematics of Operations Research, the Journal of Artificial
Intelligence Research, the Journal of Machine Learning Research, and the
IEEE Transactions on Information Theory, and as program committee
co-chair for COLT and NIPS. He was awarded the Malcolm McIntosh Prize
for Physical Scientist of the Year in Australia in 2001, and was chosen
as an Institute of Mathematical Statistics Medallion Lecturer in 2008,
an IMS Fellow and Australian Laureate Fellow in 2011, and a Fellow of
the ACM in 2018. He was elected to the Australian Academy of Science.
Friday, November 16, 3-4 pm, 3401 Walnut St Room 401B
Sample-Efficient Reinforcement Learning with Rich Observations
We study a version of reinforcement learning in which the agent
must learn how to choose actions based on observations so as to maximize
long-term reward. We focus especially on when the observations may be
realistically rich, such as images, text documents, patient records,
etc. We introduce a new algorithm for systematic exploration, in other
words, for discovering through experimentation how best to choose
actions. Along the way, we also propose a new measure called the
“Bellman rank” which we argue captures the degree to which the learning
problem exhibits underlying structure, and which can be favorably
bounded in a number of previously studied cases. We show that the
Bellman rank determines the statistical efficiency of our algorithm,
which, although not computationally efficient, requires a number of
samples that is polynomial in the Bellman rank as well as more standard
parameters, but which is entirely independent of the size of the observation space.
This work is joint with Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal,
and John Langford.
Robert Schapire is a Principal Researcher at Microsoft
Research in New York City. He received his PhD from MIT in 1991. After
a short post-doc at Harvard, he joined the technical staff at AT&T Labs
(formerly AT&T Bell Laboratories) in 1991. In 2002, he became a
Professor of Computer Science at Princeton University. He joined
Microsoft Research in 2014. His awards include the 1991 ACM Doctoral
Dissertation Award, the 2003 Gödel Prize, and the 2004 Kanellakis Theory
and Practice Award (both of the last two with Yoav Freund). He is a
fellow of the AAAI, and a member of both the National Academy of
Engineering and the National Academy of Sciences. His main research
interest is in theoretical and applied machine learning, with particular
focus on boosting, online learning, game theory, and maximum entropy.
Friday, April 27, 3-4 pm, 3401 Walnut St Room 401B
Using interaction for simpler and better learning
In the usual setup of supervised learning, the learner is given a stack of labeled examples and told to fit a classifier to them. It would be quite unnatural for a human to learn in this way, and indeed this model is known to suffer from a variety of fundamental hardness barriers. However, many of these hurdles can be overcome by moving to a setup in which the learner interacts with a human (or other information source) during the learning process.
We will see how interaction makes it possible to:
1. Learn DNF (disjunctive normal form) concepts.
2. Perform machine teaching in situations where the student’s concept class is unknown.
3. Improve the results of unsupervised learning. We will present a generic approach to “interactive structure learning” that, for instance, yields simple interactive algorithms for topic modeling and hierarchical clustering. Along the way, we will present a novel cost function for hierarchical clustering, as well as an efficient algorithm for approximately minimizing this cost.
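The hierarchical-clustering cost mentioned in item 3 charges each pair of points its similarity times the number of leaves under the pair's lowest common ancestor; a cost of this form appears in Dasgupta's work on hierarchical clustering. A minimal sketch, with trees as nested tuples of leaves and an illustrative similarity function w:

```python
import itertools

def tree_cost(tree, w):
    """Hierarchical-clustering cost: each pair (i, j) pays w(i, j) times
    the number of leaves under its lowest common ancestor in `tree`.
    `tree` is nested tuples of leaves, e.g. ((0, 1), (2, 3))."""
    def leaves(t):
        return [t] if not isinstance(t, tuple) else \
               [x for c in t for x in leaves(c)]
    def cost(t):
        if not isinstance(t, tuple):
            return 0.0
        n = len(leaves(t))
        total = sum(cost(c) for c in t)
        # pairs whose LCA is this node: one leaf from each child subtree
        child_leaves = [leaves(c) for c in t]
        for a, b in itertools.combinations(child_leaves, 2):
            for i in a:
                for j in b:
                    total += w(i, j) * n
        return total
    return cost(tree)

# Chain similarities on points 0..3: adjacent points have similarity 1.
w = lambda i, j: 1.0 if abs(i - j) == 1 else 0.0
```

Under this w, the balanced tree ((0, 1), (2, 3)) is cheaper than the caterpillar (((0, 1), 2), 3), since the balanced tree separates only one similar pair at the large root cluster.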
Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He works on algorithms for machine learning, with a focus on unsupervised and interactive learning.
Friday, April 13, 3-4 pm, 3401 Walnut St Room 401B
Stability and Generalization in Adaptive Data Analysis
Datasets are often used multiple times with each successive analysis depending on the outcomes of previous analyses on the same dataset. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. In this talk I will overview a recent line of work on adaptive data analysis that studies the problem of answering a sequence of “queries” about the data distribution where each query may depend arbitrarily on answers to previous queries. I’ll then focus on a new notion of algorithmic stability with the important property that it “composes” well when data is reused. I’ll demonstrate that this notion of stability implies that the algorithm reveals little information about its input and, in particular, cannot lead to overfitting. Finally, I’ll describe simple algorithms based on this notion that can answer statistical queries about the dataset with substantially improved accuracy guarantees for low-variance queries.
Based in part on joint work with Thomas Steinke.
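One simple mechanism in the spirit of the stability ideas above (an illustrative sketch, not the talk's exact algorithm): answer each statistical query, here a function mapping a data point to [0, 1], with its empirical mean plus Gaussian noise, so each answer reveals only a noisy view of the sample.

```python
import numpy as np

def answer_queries(data, queries, sigma=0.05, rng=None):
    """Answer statistical queries with noisy empirical means.
    Each query maps a data point to [0, 1]; adding noise limits how
    much any single answer reveals about the sample, which is the kind
    of stability property studied in adaptive data analysis. The noise
    level sigma is an illustrative constant."""
    rng = np.random.default_rng() if rng is None else rng
    answers = []
    for q in queries:
        empirical = np.mean([q(x) for x in data])
        answers.append(empirical + rng.normal(0.0, sigma))
    return answers
```

For example, on a large uniform sample from [0, 1], the query "is x below 0.5" should receive an answer close to 0.5 despite the added noise.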
Vitaly Feldman is a research scientist at Google Brain working on the design and theoretical analysis of machine learning algorithms. His recent research includes foundations of adaptive data analysis, the complexity of learning with constrained access to data, and privacy-preserving learning. He has also worked on understanding natural learning systems: learning by the brain and evolution as learning. Vitaly holds a PhD from Harvard (2006) and was previously a research scientist at IBM Research – Almaden.
Friday, February 23, 3-4 pm, 3401 Walnut St Room 401B
Curiosity Driven First Person Embodied Cognition
Learning by looking, and imitating by poking around, is a natural way for children to learn. Curiosity often starts with watching and repeating other people's actions. Watching other people's actions, however, gives only a partial understanding: we know what they are doing, but not why. What is missing is embodied cognition: putting ourselves in other people's shoes.
The goal of our work is to learn intelligent, skilled behavior by observing from a first-person vision (FPV) perspective. We address three problems: A) attention: detecting the objects of attention and whether they are important for the task being performed; B) intention: predicting what an agent will do in the next few seconds (5-15 secs) and how the objects of attention will respond to the agent's action; and C) decision: making goal-oriented decisions and plans when multiple plausible choices exist.
We use basketball (5-on-5, or 1-on-1) as a testbed for our research. Some of our results include methods for a) assessing the skill of a basketball player by watching their game; b) generating goal-oriented plays in a one-on-one basketball game; and c) future path prediction and visual planning. Our approaches use first-person vision (FPV) input for both training and testing; they are unsupervised and do not require human labeling.
This is a joint work with Gedas Bertasius, HyunSoo Park, and Stella Yu.
Friday, January 26, 3-4 pm, 3401 Walnut St Room 401B
HiGrad: Statistical Inference for Stochastic Approximation and Online Learning
Stochastic gradient descent (SGD) is an immensely popular approach for online learning in settings where data arrives in a stream or data sizes are very large. However, despite an ever-increasing volume of work on SGD, much less is known about the statistical inferential properties of predictions based on SGD solutions. In this talk, we introduce a novel procedure termed HiGrad to conduct statistical inference for online learning, without incurring additional computational cost compared with vanilla SGD. The HiGrad procedure begins by performing SGD iterations for a while and then splits the single thread into several; it continues splitting hierarchically in this fashion along each thread. With predictions provided by multiple threads in place, a t-based confidence interval is constructed by decorrelating the predictions using the covariance structure given by the Ruppert–Polyak averaging scheme. Under certain regularity conditions, the HiGrad confidence interval is shown to attain asymptotically exact coverage probability. Finally, the performance of HiGrad is evaluated through extensive simulation studies and a real-data example.
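A heavily simplified one-level sketch of the thread-splitting idea: the actual HiGrad procedure splits hierarchically and combines threads using the Ruppert–Polyak covariance structure, whereas this toy version runs one shared SGD segment, spawns a few independent threads, and forms a naive t-style interval from the per-thread averaged iterates. The step-size schedule and all constants are illustrative. The example estimates the mean of a distribution by minimizing E[(theta - X)^2 / 2], whose stochastic gradient is theta - X.

```python
import numpy as np

def higrad_mean(sample, theta0=0.0, n0=400, n_threads=4, n1=400,
                gamma=0.5, alpha=0.55, rng=None):
    """Toy one-level HiGrad-style procedure for estimating E[X].
    `sample(rng)` draws one fresh observation; the step size at
    iteration t is gamma * t**(-alpha) (illustrative schedule)."""
    rng = np.random.default_rng() if rng is None else rng
    theta, t = theta0, 0
    for _ in range(n0):                       # shared SGD segment
        t += 1
        theta -= gamma * t ** (-alpha) * (theta - sample(rng))
    estimates = []
    for _ in range(n_threads):                # independent continuation threads
        th, tt, avg = theta, t, 0.0
        for k in range(1, n1 + 1):
            tt += 1
            th -= gamma * tt ** (-alpha) * (th - sample(rng))
            avg += (th - avg) / k             # running average of iterates
        estimates.append(avg)
    estimates = np.array(estimates)
    center = estimates.mean()
    se = estimates.std(ddof=1) / np.sqrt(n_threads)
    return center, se                         # naive interval: center +/- t_crit * se
```

On N(1, 1) data, the center should land near the true mean 1, with se quantifying the spread across threads.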
Weijie Su is an Assistant Professor of Statistics in the Department of Statistics at the Wharton School, University of Pennsylvania. Prior to joining Penn in Summer 2016, Su obtained his Ph.D. in Statistics from Stanford University in 2016, under the supervision of Emmanuel Candès. He received his bachelor’s degree in Mathematics from Peking University in 2011. Su’s research interests are in statistical machine learning, high-dimensional inference, multiple testing, and privacy-preserving data analysis.
Friday, November 17, 3-4 pm, 3401 Walnut St Room 401B
K-Means: A Nonconvex Problem with Fast and Provable Algorithms
Non-convex optimization problems are ubiquitous and arise in many modern applications of statistics and machine learning. It is well known that finding a local optimum of a non-convex problem is NP-hard in general, and hence most non-convex optimization procedures can only ensure convergence to stationary points, without guarantees on solution quality. In some exceptional cases, such as K-means clustering, the geometric structure of the problem can be exploited to design efficient algorithms with guarantees on the quality of the solution. Such geometric structure is usually captured in the “seeding” step of K-means algorithms. Seeding – the task of finding initial cluster centers – is critical to obtaining high-quality clusterings for K-means.
In this talk, I will discuss fast and scalable seeding methods that produce provably good clusterings without any assumptions on the data. I will then show analytically that such methods allow a favorable trade-off between solution quality and computational cost, speeding up state-of-the-art algorithms by up to several orders of magnitude. I will conclude by discussing experimental results and future directions.
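For context, the classic D^2 ("k-means++") seeding rule samples each new center with probability proportional to its squared distance from the nearest center chosen so far; the fast methods in the talk can be viewed as scalable approximations of this kind of step. Below is a sketch of the exact, slower baseline, not the talk's algorithm:

```python
import numpy as np

def dsquared_seeding(X, k, rng=None):
    """D^2 (k-means++-style) seeding: sample each new center with
    probability proportional to squared distance to the nearest
    already-chosen center. X is an (n, d) array; assumes the points
    do not all coincide."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    first = rng.integers(n)
    centers = [X[first]]
    # squared distance of every point to its nearest chosen center
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(k - 1):
        idx = rng.choice(n, p=d2 / d2.sum())
        centers.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centers)
```

On two well-separated point clouds with k = 2, the rule places the second center in the cluster not containing the first, since points in the first center's cluster have zero squared distance.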
Hamed Hassani is currently an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania. Prior to that, he was a research fellow at the Simons Institute for the Theory of Computing (UC Berkeley) and a post-doctoral researcher in the Computer Science Department at ETH Zurich. He received a Ph.D. in Computer and Communication Sciences from EPFL, Lausanne. Hamed's fields of interest include coding and information theory, machine learning, and the theory and applications of graphical models. He is the recipient of the 2014 IEEE Information Theory Society Thomas M. Cover Dissertation Award. His co-authored paper at ISIT 2015 received the IEEE Jack Keil Wolf ISIT Student Paper Award.