Online Conference Program and PDF Conference Program are now up.
Instructions for paper presentation are now up.
Instructions for camera ready submission and visa letters are now up.
Hotel and Conference registration are now open.
Deep network models of the deep network mechanisms of (part of) human visual intelligence
Session Chair: Yingli Tian
The human species is embarked on a bold scientific quest — to understand the mechanisms of human intelligence in engineering terms. Recent progress in multiple subfields of brain research suggests that key next steps in this quest will result from building real-world capable, systems-level network models that aim to abstract, emulate and explain the mechanisms underlying natural intelligent behavior. Over the past decade, neuroscience, cognitive science and computer science converged to create specific, image-computable, deep neural network models intended to appropriately abstract, emulate and explain the mechanisms of primate core visual object and face recognition. Based on a large body of primate neurophysiological and behavioral data, some of these network models are currently the leading (i.e. most accurate) scientific theories of the internal mechanisms of the primate ventral visual stream and how those mechanisms support the ability of humans and other primates to rapidly and accurately infer latent world content (e.g. object and face identity, position, pose, etc.) from the set of pixels in most natural images. While still far from complete, these leading scientific models already have many uses in brain science and beyond. In this talk, I will highlight one particular use: the design of patterns of light energy on the retina (i.e. new images) that neuroscientists can use to precisely modulate neuronal activity deep in the brain. Our most recent experimental work suggests that, when targeted in this new way, the responses of individual high-level primate neurons are exquisitely sensitive to barely perceptible image modifications. While surprising to many neuroscientists — ourselves included — this result is in line with the predictions of the current leading scientific models (above), it offers guidance to contemporary computer vision research, and it suggests a currently untapped non-pharmacological avenue to approach clinical interventions.
Jim DiCarlo is a Professor of Systems and Computational Neuroscience, and Director of the MIT Quest for Intelligence at the Massachusetts Institute of Technology. He trained in engineering, medicine, systems neurophysiology, and computing at Northwestern (BSE), Johns Hopkins (MD/Ph.D.), and Baylor College of Medicine (Postdoc). His overall research goal is to discover and artificially emulate the brain mechanisms that underlie human visual intelligence. He has been awarded the Alfred P. Sloan Research Fellowship (2002), the Pew Scholar Award in Biomedical Sciences (2002-2006), and the McKnight Scholar Award in Neuroscience (2006-2009). Over the past 20 years, using the non-human primate animal model, DiCarlo and his collaborators have helped develop our contemporary, engineering-level understanding of the neural mechanisms that underlie visual information processing in the ventral stream (a deep stack of cortical neural processing layers) and how that processing supports core cognitive abilities such as object and face recognition. His group’s most recent and ongoing work is in: building and testing new computational models that are the current leading scientific hypotheses of the neural mechanisms operating along the ventral stream, utilizing those models to non-invasively modulate neural activity patterns deep in the brain, testing those models using direct neural perturbations (e.g. optogenetics, chemogenetics), and exploring how the mechanisms contained in those models might develop from unsupervised biological visual experience. Based on that work and other results in the field, his group is seeking an engineering-level understanding of the neural mechanisms of human visual processing, from time-varying images on the eyes, to multi-level patterns of neuronal activity, to perceptual reports about the world.
They aim to use this understanding to guide the development of more robust artificial vision systems ("AI"), to reveal new ways to beneficially modulate brain activity via patterns of light striking the eyes, to expose new methods of accelerating visual learning, to provide a basis for new neural prosthetics (brain-machine interfaces) to restore lost senses, and to provide a scientific foundation to understand how sensory processing is altered in conditions such as agnosia, autism, and dyslexia.