Center for Algorithms and Machine Learning (CAML) is housed at the School of Informatics and Computing at Indiana University, Bloomington. In the recent years machine learning has flourished with the availability of data and computational resources leading to unprecedented successes in prediction and control.
Our mission is to bring together researchers across different departments at the school in order to foster excellence in algorithm development for machine learning. Our center is at the forefront of theoretical foundations of algorithm development as well as large scale applications to computer vision and health. The core set of members of CAML also includes faculty from the Department of Statistics which plays a crucial role in applications of machine learning to data analysis.
In high dimensional problems, the usual two groups problem of model selection is impossible due to the combinatorial complexity of the model space. In recent years, a set of one group models that approximates the two groups problem have been developed. Of these, the Horseshoe prior is is perhaps the most famous and places a Beta(1/2,1/2) prior on the local shrinkage parameters. There are many modifications and extensions of this framework, and we propose a new modification. Specifically, we model the local shrinkage parameter as a Beta(p,1-p) for each parameter under consideration in order to mimic the model selection problem. Placing priors on the p produces a prior distribution with extremely heavy tails that produces both very strong shrinkage of small signals and unbiased estimation of large signals, producing overall better risk behavior.
Andrew Womack is an Assistant Professor and Zikun Yang is a PhD candidate in the Department of Statistics. Their joint research efforts focus on theoretical and algorithmic aspects of model selection from a Bayesian perspective.
Scene understanding from images opens the doorway for solving various complex tasks such as visual navigation, object manipulation and reasoning about the world in 3D. These challenging tasks can be better tackled with the availability of depth information. This talk covers techniques that effectively exploit both the image data and the depth data for scene understanding. First, a semantic label propagation framework is presented that utilizes a few annotated keyframes and propagates the annotated labels to a large number of unlabeled video frames. The proposed technique alleviates the need for a labor-intensive manual annotation effort, and the propagated labels have been utilized to improve a deep neural network model for semantic segmentation - the task of simultaneously segmenting and labeling an image. This talk concludes with an approach to generate training data for distant objects in outdoor scenes and proposes two deep learning solutions for depth recovery of those objects.
Reza is a Postdoctoral Associate in the Department of Informatics, Computer Science, and Engineering (SICE) at Indiana University Bloomington. He received his Ph.D. in Computer Science from George Mason University in 2018 and M.S. in Computer Science from Drexel University in 2011. His research focuses on developing novel algorithms for perception tasks such as semantic segmentation, object detection exploiting the 3D information.
I will describe Badger Rampage, a simple general framework for designing practical algorithms for d-dimensional balanced partitioning of graphs with several billions of vertices and up to 10^12 edges. Badger Rampage is based on applying noisy projected gradient descent to a non-convex relaxation of the problem. For an appropriately chosen relaxation, the gradient descent step can be implemented simply as local averaging of the neighboring values. The technical core of our work is optimizing the computationally expensive projection step for which we give an algorithm with complexity O(|V| log^d |V|) for d <= 2. Experiments on Apache Giraph using subsets of the Facebook graph show comparable or superior performance against existing literature on one-dimensional graph partitioning with the added benefits of simplicity and generalizability to the multi-dimensional case. Analysis of convergence properties of Badger Rampage is left as an open problem.
Joint work with Dmitrii Avdiukhin (Indiana University) and Sergey Pupyrev (Facebook).
In this talk, sparse representation based and manifold optimization based image signal processing techniques are mainly discussed. These two techniques are ones of the dimension reduction methods which reduce the high dimensionality of system matrix to reduce the system complexity. Especially for deep learning process, the back propagation process for training is a time consuming process because of its iteration based optimization of nonlinear/nonconvex high dimensional cost function. While recent technologies such as deep learning make statistical decision more accurate, computational time to handle high dimensional cost function for big data is getting slower. For above reasons, dimension reduction is a key technology to achieve real-time process. Among lots of dimension reduction related techniques, sparse representation is one of the matrix factorization techniques to factor to low dimension matrices using sparse properties, and manifold optimization is a kind of piece-wise linear approximation of nonconvex/nonlinear space cost function which represents on non-Euclidean space. Additionally, random projection and tensor factorization techniques will be discussed for another possibilities to speed up the optimization processes.
Standard approaches to probabilistic reasoning require that one possesses an explicit model of the distribution in question. But, the empirical learning of models of probability distributions from partial observations is a problem for which efficient algorithms are generally not known. In this work we consider the use of bounded-degree fragments of the "sum-of-squares" logic as a probability logic. Prior work has shown that we can decide refutability for such fragments in polynomial-time. We propose to use such fragments to answer queries about whether a given probability distribution satisfies a given system of constraints and bounds on expected values. We show that in answering such queries, such constraints and bounds can be implicitly learned from partial observations in polynomial-time as well. It is known that this logic is capable of deriving many bounds that are useful in probabilistic analysis. We observe that it furthermore captures key polynomial-time fragments of resolution. Thus, these fragments are also quite expressive.
David Crandall is an Associate Professor in the School of Informatics and Computing at Indiana University, Bloomington, where he is a member of the programs in Computer Science, Informatics, Cognitive Science, and Data Science. He received the Ph.D. in computer science from Cornell University in 2008 and the M.S. and B.S. degrees in computer science and engineering from the Pennsylvania State University in 2001. He was a Postdoctoral Research Associate at Cornell from 2008-2010, and a Senior Research Scientist with Eastman Kodak Company from 2001-2003. His research in computer vision and applied machine learning has been funded by the National Science Foundation (including a CAREER award in 2013), IARPA, U.S. Navy, the Air Force Office of Scientific Research, Google, the Lilly Endowment, and Eastman Kodak Company.
David Leake is Executive Associate Dean and Professor of Computer Science in the School of Informatics and Computing at Indiana University Bloomington, a member of the Indiana University Cognitive Science faculty and data science program, and affiliate member of the Data to Insight Center. He received his Ph.D. in Computer Science from Yale University in 1990. His research interests include case-based reasoning, context, explanation, human-centered computing, intelligent user interfaces, introspective reasoning, and knowledge capture and management. He has over 150 publications in these areas, with over 7000 Google Scholar citations. He chairs the AAAI Publications Committee and served as Editor in Chief of AI Magazine, the official magazine of AAAI, for 17 years.
Chris Raphael heads the Music Informatics program in the School of Informatics, Computing, and Engineering at Indiana University, as well as holding adjunct appointments in the Jacobs School of Music, Cognitive Science, and Statistics. After receiving his PhD in Applied Mathematics from Brown University in 1991, he worked on a wide range of problems in both industry and academia including Arabic character recognition, magnetic resonance spectroscopy and mine detection before coming to focus on music. His musical research includes accompaniment systems, computer generated musical analysis, musical signal processing, and modeling of musical interpretation.
Michael S. Ryoo is an Assistant Professor of the School of Informatics and Computing at Indiana University. His research interest is within the areas of Computer Vision and Robotics, with a particular emphasis on human activity recognition/learning, first-person computer vision, and human-robot interaction. Before joining IU, Dr. Ryoo was a staff researcher within the Robotics Section of the NASA's Jet Propulsion Laboratory (JPL) from 2011 to 2015. Dr. Ryoo received the Ph.D. degree from the University of Texas at Austin in 2008, and the B.S. degree from the Korea Advanced Institute of Science and Technology (KAIST) in 2004. He has been organizing a number of tutorials and workshops at major Computer Vision conferences including CVPR 2011/2014/2016, and is the winner of the best vision paper award at ICRA 2016.
Chung-chieh Shan is an assistant professor at the School of Informatics and Computing at Indiana University, Bloomington. He studies what things mean that matter. He works to tap into and enhance the amazing human ability to create concepts, combine concepts, and share concepts, by lining up formal representations and what they represent. To this end, in the short term, he develops programming languages that divide what to do and how to do it into modules that can be built and reused separately. In particular, he develops so-called probabilistic programming languages, which divide stochastic models and inference algorithms into modules that can be built and reused separately. In the long term, he hopes to supplant first-order logic by something that does not presuppose a fact of the matter what things there are, though there may be a fact of the matter what stuff there is.
Donald Williamson is an assistant professor in the School of Informatics and Computing at Indiana University. His research broadly addresses ways that enable computers to process, understand, and response to sound information. He has specific interests in the areas of speech separation, speech recognition, speaker identification, and music processing, to name a few, where he is interested in using these methods in real-world devices, such as cell phones, hearing aids, and robots. A combination of machine learning, signal processing, and statistical-based techniques are used. Donald completed his Ph.D. in the Computer Science and Engineering department at The Ohio State University, under the supervision of Prof. DeLiang Wang. Prior to that, he was fortunate to be a Member of the Engineering Staff at Lockheed Martin for a few years. He received a Masters of Science degree from the Department of Electrical and Computer Engineering at Drexel University and a Bachelor's degree from the Department of Electrical and Computer Engineering at the University of Delaware.
Grigory Yaroslavtsev is an assistant professor of Computer Science at Indiana University. Prior to that he was a postdoctoral fellow at the Warren Center for Network and Data Sciences at the University of Pennsylvania. He was previously a Postdoctoral Fellow in Mathematics at Brown University, ICERM. He received his Ph.D. in Theoretical Computer Science in 2014 from Pennsylvania State University and an M.Sc. in Applied Mathematics and Physics from the Academic University of the Russian Academy of Sciences in 2010. Grigory works on efficient algorithms for sparsification, summarization and testing properties of large data, including approximation, parallel and online algorithms, learning theory and property testing, communication and information complexity and private data release.
Yuan Zhou is an assistant professor of Computer Science at Indiana University. Prior to that he was an instructor in applied mathematics at MIT. Yuan received his Ph.D. from Carnegie Mellon University in 2014. His research interests span theoretical computer science and operations research with emphasis on linear programming and semidefinite programming relaxations, discrete optimization, approximation algorithms and hardness of approximation, harmonic analysis of discrete functions, process flexibility and decision under uncertainty with applications to crowdsourcing.
Minje Kim received his PhD degree in Computer Science from the University of Illinois at Urbana-Champaign (2016). Before joining UIUC, he worked as a researcher in ETRI, a national lab in Korea, from 2006 to 2011. He did his Bachelor’s and Master’s studies in the Division of Information and Computer Engineering at Ajou University and in the Department of Computer Science and Engineering at POSTECH in 2004 and 2006, respectively. His research focuses on designing machine learning models applied to signal processing (e.g. audio and the other time series data), stressing their computational efficiency in the resource-constrained environments or in the implementations involving large unorganized datasets. He received Richard T. Cheng Endowed Fellowship from UIUC in 2011. Google and Starkey grants also honored his ICASSP papers as the outstanding student papers in 2013 and 2014, respectively.
Daniel McDonald is an Assistant Professor of Statistics and Adjunct Assistant Professor of Computer Science at Indiana University, Bloomington. He was an undergraduate at IU, receiving a B.S.O.F. in cello performance from the IU Jacobs School of Music and a B.A. in economics and mathematics. After college, he worked at the Federal Reserve Bank of St. Louis before enrolling at Carnegie Mellon to pursue a doctorate in statistics. Daniel's main research interests involve the estimation and quantification of prediction risk. He is especially interested in developing methods for evaluating the predictions made using complex dependent data. This includes the application of statistical learning techniques to time-series prediction problems in the context of economic forecasting, as well as investigations of cross-validation and the bootstrap for risk estimation, and examining the trade-offs between prediction risk and computational costs.
Michael W. Trosset is Professor and Chair of IU’s Department of Statistics. As an undergraduate, he studied mathematics at Rice University; as a graduate student, he studied statistics at the University of California at Berkeley. Before joining IU in 2006, he taught at the University of Arizona and the College of William & Mary. His primary research interests lie in the general areas of high-dimensional multivariate data analysis and computational statistics. Much of his work is concerned with techniques for constructing low-dimensional Euclidean representations of data, as in multidimensional scaling, graph embedding, and manifold learning.
Andrew Womack is an assistant professor in the Department of Statistics and Indiana University. Before joining IU, he was a postdoctoral researcher at the University of Florida and received his doctorate from Washington University in St. Louis. Andrew's research focuses on "objective" Bayesian methodologies in complex problems and considerations of model space uncertainty in inference. This includes the development of model space priors for low dimensional representations of high dimensional objects and the efficient computation thereof.
Russell Lyons is James H. Rudy Professor of Mathematics and Adjunct Professor of Statistics. He obtained his Ph.D. at the University of Michigan in 1983, then enjoyed a postdoc in Paris and a job at Stanford University. He frequently visits Microsoft Research. His primary area of research is discrete probability and its connections to other areas of mathematics, including ergodic theory, geometric group theory, and combinatorics. He is also very interested in the teaching of statistics and has done some research in statistics. Lyons was a Sloan Foundation Fellow, a Visiting Miller Research Professor, an Institute of Mathematical Statistics Medallion Lecturer, an Invited Speaker at the International Congress of Mathematicians, and gave an Hour Address at the Joint Mathematics Meetings. He is a Fellow of the American Mathematical Society.
John Langford studied Physics and Computer Science at the California Institute of Technology, earning a double bachelor's degree in 1997, and received his Ph.D. from Carnegie Mellon University in 2002. Since then, he has worked at Yahoo!, Toyota Technological Institute, and IBM's Watson Research Center. He is also the primary author of the popular Machine Learning weblog, hunch.net and the principle developer of Vowpal Wabbit. Previous research projects include Isomap, Captcha, Learning Reductions, Cover Trees, and Contextual Bandit learning. For more information visit his homepage.
Edo Liberty is a Principal Scientist at Amazon's AWS Machine Learning group. Previously he was a Research Director and Yahoo and the head of Yahoo Research in New York. He received his BSc in CS and Physics from Tel Aviv University and his PhD in CS from Yale. After his postdoctoral position at Yale in the Applied Math department he co-founded a New York based startup. In 2009 he joined Yahoo Research. His research focuses on the theory and practice of large scale data algorithms.
Vahab Mirrokni is a Principal Research Scientist, heading the algorithms research group at Google Research, New York. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University of Technology in 1999. He joined Google Research in New York in 2008, after spending a couple of years at Microsoft Research, MIT and Amazon.com. He is the co-winner of a SODA'05 best student paper award and ACM EC'08 best paper award. His research areas include algorithms, algorithmic game theory, combinatorial optimization, and social networks analysis. At Google, he is mainly working on algorithmic and economic problems related to search and online advertising. Recently he is working on online ad allocation problems, distributed algorithms for large-scale graph mining, and mechanism design for advertising exchanges.
Maxim Sviridenko is a Principal Research Scientist at Yahoo! Research, NYC where he manages the Scalable Algorithms and Machine Learning group. The group focuses on solving Machine Learning, Mathematical Optimization and Computational Economics problems on Web-scale data using latest techniques and algorithms from these fields. Previously Maxim was a staff member at the IBM T.J. Watson Research Center and a professor at the University of Warwick. His research interests span optimization, scheduling, distributed and approximation algorithms among other things.
Dirk van Gucht is a professor of Computer Science at Indiana University, Bloomington. His general areas of research are database theory and data mining.
Richard Shiffrin is a Distinguished Professor and Luther Dana Waterman Professor of Psychological and Brain Sciences. His research interest include short and long term memory, attention stages of information processing, retrieval and forgetting, mathematical computer simulation, neural network models.
Haixu Tang is a professor of Informatics and Computing at Indiana University, Bloomington. He is working on algorithmic and statistical problems in bioinformatics including computational mass spectrometry, mobile genetic elements, genome privacy, bacterial genomics and metagenomics.
Stanley Wasserman is the James H. Rudy Professor of Statistics, Psychology, and Sociology, and is a member of the Department of Psychological and Brain Sciences and the Department of Statistics. His research interests are network science and, specifically, statistical models for social networks.
Computer Science and Informatics