One paradigm in psychology for modelling perception is cue integration: a few sensory cues are presented to a subject, and researchers elicit judgements of some parameter value in the subject's model of the data-generating process. This method is a bit primitive, making no use of modern neural imaging technology or statistical learning theory, so we expect the cue integration paradigm to fail to shed light on many aspects of perception and reasoning, such as the formation of structured conceptual categories and intuitive theories of, say, classical dynamics, natural language grammar, or social interactions. Still, we have learned some things from cue integration studies, and this post will engage with some of the ideas of the sub-field.
Some signals (or our distributions for them) are mutually informative: learning testable information about one signal reduces the entropy of our distribution on the other. We know intuitively to look for relations and correspondences between cues that are coincident (co-occurring, or co-located at a source). When percepts are higher dimensional, they can often be summarized with spatiotemporal coordinates, and correspondences between the coordinates often take the form of alignments, which might be produced by transforming signals into a common frame of reference. For example, neuronal columns originating in our retinas keep their initial retinal organization (their eccentricity, i.e. central vs. peripheral location, and their polar angle) surprisingly far into the brain, but their data are eventually transformed into a common coordinate frame through depth estimation, which is largely done by inverting the parallax disparity between the two visual streams. More abstract percepts with coordinates, like scene maps, are also fit objects for alignment.
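As a concrete illustration of that inversion, here is a minimal sketch in Python assuming an idealized pinhole stereo model; the focal length, baseline, and disparity values are made-up numbers (the baseline is roughly an interocular distance), not parameters from any cited study:

```python
# Toy depth-from-disparity calculation under a pinhole stereo model.
# focal_length_px and baseline_m are illustrative constants.

def depth_from_disparity(disparity_px, focal_length_px=600.0, baseline_m=0.063):
    """Depth Z = f * B / d for a horizontal disparity d between the two views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# Nearer points produce larger disparities:
print(depth_from_disparity(20.0))  # ~1.9 m away
print(depth_from_disparity(5.0))   # ~7.6 m away
```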
The correspondences mentioned so far (signal co-occurrence supporting an inference to a common source, and alignment of spatio-temporal coordinates when high-dimensional percepts are gained through different perspectives on a common referent) are, roughly, workable inferences from immediately present data. If we leverage prior information, then we might get more "semantic" data correspondences, like the inference that small people have higher voices (or that deeper heard voices probably originate in larger vocal tracts). That inference is loosely supported by an intuitive theory of acoustic resonance, and other semantic correspondences can often be interpreted as relying on intuitive modelling of domain structure. For example, cultures intuitively cluster animals into a hierarchy of types, with degrees of similarity described through familial relations ("chimps are cousins to humans"), which resembles an intuitive theory of the phylogenetic origin of species through natural selection on individuals with mutated phenotypes in a breeding population.
Whatever the source of structure, perceptual cues are often mutually informative, and brains leverage this when estimating world states. One simple way to combine cues is to take a weighted average of measured values (for example, estimates of a thing's position from vision vs. from touch). If our knowledge of a signal amounts to its first two moments (we know the mean and covariance of past observations and expect those statistics to hold in the future), then we are licensed by our information to model the source with a normal distribution (the maximum entropy distribution under those constraints). If we then set each signal's averaging weight by the reliability of its distribution (the inverse of its covariance), we are doing something like maximum likelihood estimation, or a static Kalman-style update, and the end result is an improved Gaussian estimate of the thing's position, because the reliabilities of independent measurements add.
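To make the weighted-average recipe concrete, here is a minimal sketch of precision-weighted (inverse-variance) fusion of two Gaussian cues, say a visual and a haptic position estimate; the function name and the numbers are illustrative:

```python
import numpy as np

def fuse_gaussian_cues(means, variances):
    """Combine independent Gaussian cues by inverse-variance (reliability) weighting.

    The fused mean is the reliability-weighted average of the cue means, and the
    fused variance is the inverse of the summed reliabilities, so the combined
    estimate is at least as reliable as the best single cue.
    """
    means = np.asarray(means, dtype=float)
    reliabilities = 1.0 / np.asarray(variances, dtype=float)
    fused_variance = 1.0 / reliabilities.sum()
    fused_mean = fused_variance * (reliabilities * means).sum()
    return fused_mean, fused_variance

# Vision puts the object at 10.0 cm (variance 1.0); touch says 12.0 cm (variance 4.0).
mean, var = fuse_gaussian_cues([10.0, 12.0], [1.0, 4.0])
print(mean, var)  # 10.4, 0.8 -- pulled toward the more reliable visual cue
```

This is just the maximum-likelihood estimate for independent Gaussian measurements of a common quantity, which is the standard story behind the weighted average above.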
That kind of combination of signal evidence in estimation is what is most often meant by "cue integration". Another thing you could call cue integration, or better "sensory fusion", concerns how things get efficiently represented in the brain; it is more closely related to learning domain structure (concepts and theories), which in turn supports those simpler cue-integration parameter estimates.
Short of Bayesian nonparametric structure learning, or whatever the hell you're doing when you throw a neural net at a problem, we could just model the relation between two recurrent sensory cues in isolation. One way to examine how the brain relates two mutually informative sensory cues is to experimentally introduce a systematic bias into one of the cues and observe how long it takes subjects to recalibrate their estimates. For example, psychologists going back to Helmholtz have enjoyed placing prismatic goggles on subjects and watching them stumble around. In addition to manipulating the mean values of sensory cues, you could probably introduce changes in signal covariance, with interesting results (I guess it would look like gain modulation, if averaging weights are based on reliabilities).
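Here is a toy sketch, not any particular model from the literature, of that recalibration experiment: a fixed "prism" offset is added to the visual cue on every trial, and a simple delta rule learns a corrective term from the reaching error; the shift size and learning rate are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
prism_shift = 5.0       # degrees of induced visual displacement (illustrative)
learning_rate = 0.1     # plasticity of the recalibration (illustrative)
correction = 0.0

for trial in range(50):
    target = rng.uniform(-10, 10)                      # true target direction
    seen = target + prism_shift + rng.normal(0, 0.5)   # biased, noisy visual cue
    reach = seen - correction                          # act on the corrected percept
    error = reach - target                             # feedback, e.g. proprioceptive
    correction += learning_rate * error                # recalibrate toward zero error

print(round(correction, 2))  # converges toward the 5-degree prism shift
```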
I feel like I should have more to say confidently about sensory fusion, but I'm mostly drawing a blank for anything motivated by even a hint of mathematical understanding. How about some wild speculation?
Speculation on reliable coupling vs. efficient coding: the more reliable the relation between two recurrent sensory cues, the more likely the brain is to produce a sensory fusion, i.e. to consider or remember only the final combined estimate. This reduction of redundant information to sparse signals is why we intuitively think of vision and taste-smell as roughly two and a half sense modalities, rather than because of the differential presence of receptor organs at the periphery of those senses. Also, something something short codes for things of common importance, and degree of neural representation as a determinant of estimate dominance in addition to distribution reliability.
Speculation on reliable coupling vs. learning plasticity: if the relation between two recurrent sensory cues is important but unreliable (where "importance" as an experimental construct might come from introducing a loss function in a decision problem), then recalibration occurs faster than if the relation were reliable. If a relation is expected to change only slowly, then beliefs about it will be revised only slowly (or in small increments).
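One way to cash that out, purely as a sketch, is a one-dimensional Kalman filter tracking the offset between two cues, where the assumed process noise q encodes how quickly the relation is expected to drift; q, r, and the offset values below are illustrative:

```python
import random

def track_offset(true_offset, q, r=1.0, n_steps=10, p0=0.01):
    """1-D Kalman filter on a cue offset modelled as a random walk.

    q: assumed process noise (how much the offset is expected to drift per step)
    r: observation noise variance
    p0: initial uncertainty -- small, i.e. the offset starts confidently believed to be 0
    """
    random.seed(0)                 # same noise sequence on every call, for comparison
    x_hat, p = 0.0, p0
    for _ in range(n_steps):
        p += q                                     # predict: uncertainty grows by q
        obs = true_offset + random.gauss(0.0, r ** 0.5)
        k = p / (p + r)                            # Kalman gain
        x_hat += k * (obs - x_hat)                 # update toward the observation
        p *= 1.0 - k
    return x_hat

# After 10 exposures to a new 5-unit offset:
print(track_offset(5.0, q=0.5))    # relation assumed volatile: recalibrates quickly
print(track_offset(5.0, q=0.001))  # relation assumed stable: revised only slowly
```

A larger q means a larger steady-state gain, so the same prediction errors move the estimate further per trial.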
Further, if cue coupling reliably occurs in one of a few parameter regimes, then learning will be fast when switching between those already-represented regimes, but slow when calibrating to a new one. For example, your eyes sit a basically fixed distance apart, but there's a little variation in the disparity geometry because they can swivel in their orbits; thus wild speculation predicts that the brain will have dedicated machinery for quickly adapting its interpretation of binocular signals over a small range of eye rotations, but that perception will be screwed up if the eyes are swivelled a lot, or if the parallax disparity between the monocular signals doesn't match what you would see from eye sockets situated a fixed distance apart within your skull. Or, instead of continuous variation within a parameter regime, maybe you're just used to having psychologists put prismatic goggles on you, and so you can switch your motor coordination pretty quickly when the context changes.
The end?