Learning Deep Architectures for AI

Yoshua Bengio
Dept. IRO, Université de Montréal
C.P. 6128, Montreal, Qc, H3C 3J7, Canada
Yoshua.Bengio@umontreal.ca
http://www.iro.umontreal.ca/~bengioy

To appear in Foundations and Trends in Machine Learning

Abstract

Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.

1 Introduction

Allowing computers to model our world well enough to exhibit what we call intelligence has been the focus of more than half a century of research. To achieve this, it is clear that a large quantity of information about our world should somehow be stored, explicitly or implicitly, in the computer. Because it seems daunting to formalize manually all that information in a form that computers can use to answer questions and generalize to new contexts, many researchers have turned to learning algorithms to capture a large fraction of that information. Much progress has been made to understand and improve learning algorithms, but the challenge of artificial intelligence (AI) remains.
Do we have algorithms that can understand scenes and describe them in natural language? Not really, except in very limited settings. Do we have algorithms that can infer enough semantic concepts to be able to interact with most humans using these concepts? No. If we consider image understanding, one of the best specified of the AI tasks, we realize that we do not yet have learning algorithms that can discover the many visual and semantic concepts that would seem to be necessary to interpret most images on the web. The situation is similar for other AI tasks.

Consider for example the task of interpreting an input image such as the one in Figure 1. When humans try to solve a particular AI task (such as machine vision or natural language processing), they often exploit their intuition about how to decompose the problem into sub-problems and multiple levels of representation, e.g., in object parts and constellation models (Weber, Welling, & Perona, 2000; Niebles & Fei-Fei, 2007; Sudderth, Torralba, Freeman, & Willsky, 2007) where models for parts can be re-used in different object in-...