1. Introduction
Simulation theory (ST) has emerged as an increasingly popular account of folk
psychological (FP) talents at mindreading: predicting and explaining human
mental states. Where its rival, the theory-theory (TT), postulates that these
abilities are explained by mastery of laws describing the connections between
beliefs, desires, and action, simulation theory proposes that we mindread by
"putting ourselves in another's shoes". We pretend to be in the other's situation
and then co-opt the very same processes that control our own thoughts and
actions to determine what we would think and do under those circumstances.
By running these processes "off-line" we are able to imaginatively
assess the mental states to be projected onto the other person. Simulation
theory appears especially economical because it requires no separate mindreading
machinery. Instead, it reemploys cognitive capacities already known to
exist on independent grounds, such as being able to reason, to imagine
a case different from our own and to appreciate what is relevantly different
about it.
There are two different ways of understanding the goals of ST (Heal 1998;
Stich and Ravenscroft 1996). On the externalist reading, ST hopes to analyse
and explain the conceptual structures found in folk psychology, while on
the internalist reading, ST is to provide an account of the mechanisms
in the brain that implement FP abilities. This paper concerns the
internalist reading. Given that ST means to describe the nature of the
brain processing that supports mindreading talents, how can support for
ST be based on evidence concerning the brain's computational architecture?
Theories of cognitive architecture have polarized between classicists who
advocate symbolic processing, and connectionists who prefer varieties of
non-symbolic representation. Cruz (1998) has argued that PDP architecture
(connectionist processing over distributed non-symbolic representations)
is especially well suited to ST. This paper explores the connections between
PDP architecture and the TT-ST debate. Problems with the linkage between
PDP architecture and support for ST will be uncovered. Some reasons will
be given for thinking that PDP architecture is the enemy of both
TT and ST. PDP architecture suggests mechanisms for mindreading that may
defy easy classification under TT and ST rubrics.
2. Representation in Simulation Theory
To prepare the ground, a simple-minded attempt to link PDP models with
ST will be presented and rejected. The purpose will be to lay bare issues
about representation in ST that will be central to what follows. The simple-minded
attempt goes like this. ST is "representation poor", while
TT is essentially "representation rich". If the brain makes use
of FP laws as the TT presumes, then those laws must be somehow explicitly
represented. However, ST presumes that the brain does mindreading by running
"off-line" procedures already needed for basic cognition. Therefore
ST makes do without explicitly representing FP principles. This difference
is an excellent match with a salient difference between classical and PDP
architectures. In classical architectures, data is explicitly and symbolically
represented in memory. On the other hand, PDP models exhibit competence
at cognitive tasks in their dispositions to behave, but that competence
is nowhere explicitly represented. If the brain uses PDP architecture,
then there is no symbolic representation of FP, hence ST should be preferred
to TT.
However, this purported linkage between ST and PDP architecture does not
hold up to more careful scrutiny. There are problems with both the
thesis that TT is representation rich and the thesis that ST is representation
poor. Although TT requires that the brain make use of laws of FP,
it is not clear that it requires that those laws be explicitly
represented in symbolic form.1 So at best, PDP architecture would rule out
only those versions of TT that explicitly represent FP laws.
Of course TT supposes that something is represented, for TT still
needs to represent the beliefs, desires and other mental states of the
person one hopes to predict or explain. However, ST postulates representations
of exactly the same kind. ST contends that in attributing (for example)
beliefs to others, I use exactly the same mechanism (processor) that is
responsible for fixing my own beliefs. This means that the processor has
access to representations consisting of my own mental states on one side
and imaginary mental states on the other, along with machinery to ensure
that the outputs for imaginary mental states control my reasoning about
you rather than my own actions. The simulation story explicitly mentions
mental states, such as beliefs, that are inputs to, and outputs from, a processor.
If ST is to be taken at its word, then representations for propositional
attitudes exist in the brain.
It follows that those who hope to find connectionist support for ST literally
understood must seek it in models that preserve some notion of propositional
attitude representation. Luckily, the representations needed can be located
in the activation patterns of PDP models or in the weights between the
units which create dispositions to form such patterns. However, once these
representational notions are secured in PDP models, they can be used to
support PDP models of TT as well. Even for those versions of TT that require
the representation of laws, the devices in the PDP architecture
that subserve the formation of representations of beliefs and desires will
presumably also be adequate for representing laws. Therefore, representational
considerations in PDP architecture are so far irrelevant to the TT-ST
debate.
But why should ST be seriously committed to propositional attitude
representations? Perhaps a less literal reading of ST would allow a better
accommodation with PDP architecture. For example, defenders of a PDP-ST
linkage might insist that ST is compatible with purely procedural brain
processing, so that talk of propositional attitudes and processors going
"off line" is merely metaphorical. The danger here is that as the demands ST places
on the brain's implementation of FP abilities are relaxed, so the linkage
between ST and its evidential support in brain architecture is weakened.
If talk of propositional attitude representations and their interaction
with a processor is stripped away, then what exactly are the implications
of ST for the nature of brain processing? This issue will be revisited
once a few more ideas are put in place.
3. A Sketch of an Argument Linking PDP Architecture to ST
Joe Cruz (1998) has proposed a more sophisticated way of forging the link
between PDP architecture and ST. Nevertheless, parallel issues concerning
representation in ST will arise. According to ST, the same processor used
to control the formation of my own mental states is coopted to process
attribution of mental states to others. So my first person (1P) and third
person (3P) processing are very similar. However, according to the TT,
my own mental states are formed by one mechanism while the attribution
of mental states to others is accomplished by applying a folk psychological
theory to information about their case. So 1P and 3P processing
are very different in TT. Cruz' strategy is to explain why PDP architectures
must support processing for 1P and 3P cases that is nearly the same. Cruz'
reasoning revolves around two claims. The first is that PDP models display
a brand of processing homogeneity. Homogeneity means that when a
single network accomplishes two similar tasks, it uses similar processing
to get those jobs done. The second is that homogeneity entails the similarity
of 1P and 3P processing.
To demonstrate the second claim, Cruz notes the strong similarities in
1P and 3P inference. Reasoning about my own case and about the cases of others
follows strikingly similar basic principles, which supports the idea that the
corresponding tasks are nearly the same. Presuming that the cognitive processing for
the two kinds of case is carried out in a single PDP network, homogeneity
guarantees that processing for 1P and 3P cases is similar, and so the ST
architecture is preferred.
This reasoning is sound only if it can be established that 1P and 3P processing
is carried out in a single network. Cruz attempts to eliminate belief in
a radically separate 1P and 3P networks with empirical evidence. One of
the most famous lines of experiment in developmental literature on FP (Perner
et. al. 1987) concerns false belief tasks where 3 year-old children consistently
attribute their own beliefs to others who are not in a position
to know what they know. According to Cruz, such errors in 3P processing
can only be explained by having information from the 1P net (what the child
believes) communicate with the 3P net. But this is incompatible with the
separation of the two nets.2
4. Why Homogeneity Might be Bad for ST
Since he is arguing that classical models select TT in preference to ST,
Cruz needs to explain how classical architecture escapes his argument for
similarity of 1P and 3P processing. Along the way, he inadvertently opens
the door for an argument that connectionist models for ST need not
be homogeneous as he claims. Classical models, Cruz notes (1998, p. 333),
make the data/procedure distinction. This means that two very different
processes can communicate with each other by sharing the very same representations.
So a single classical network consisting of two sub-modules that share
representations can explain how 1P and 3P information can be made available
to two very different sub-processors in the same mechanism. Classical architectures
can be inhomogeneous and still share information between modules because
representations and the procedures that operate on them are separable.
On the other hand, Cruz contends that PDP models cannot make the data/procedure
distinction, so there is no room for a classical explanation of this kind.
If 1P "information" is available to 3P processing, it must be
because the 3P processing mimics 1P processing in relevant respects, that
is, the two kinds of processing must be similar. Note then that the purported
absence of shareable representations in PDP architectures is crucial for
establishing Cruz' homogeneity result.
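The classical escape route can be made concrete with a toy Python sketch of the data/procedure distinction. It is an illustration under stated assumptions, not Cruz' model: the names `Belief`, `SharedStore`, `fix_belief`, `attribute_belief` and the `tracks_access` flag are hypothetical. Two quite different procedures, a 1P belief-fixation routine and a rule-based 3P attribution routine, read the same symbolic belief records, so 1P information can reach 3P output without the two procedures being similar.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Belief:
    # A symbolic record: the "data" half of the data/procedure distinction.
    holder: str
    content: str


@dataclass
class SharedStore:
    # One memory that two very different procedures can both read.
    beliefs: List[Belief] = field(default_factory=list)

    def own_beliefs(self) -> List[Belief]:
        return [b for b in self.beliefs if b.holder == "self"]


def fix_belief(store: SharedStore, observation: str) -> None:
    # Hypothetical 1P procedure: what I observe is stored as my own belief.
    store.beliefs.append(Belief("self", observation))


def attribute_belief(store: SharedStore, other: str,
                     what_other_saw: Optional[str],
                     tracks_access: bool) -> Optional[Belief]:
    # Hypothetical 3P procedure: a different routine that applies a rule of
    # thumb ("people believe what they last saw") to the other's case.
    if tracks_access and what_other_saw is not None:
        return Belief(other, what_other_saw)
    # Without access tracking it falls back on the shared store, so my most
    # recent belief is attributed to the other: 1P data leaks into 3P output.
    mine = store.own_beliefs()
    return Belief(other, mine[-1].content) if mine else None
```

For example, if I have seen a ball moved from the basket to the box and the other has not, `attribute_belief(store, "other", "the ball is in the basket", tracks_access=False)` returns my belief about the box: a false-belief-style error produced by shared data rather than by similar processing.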
However, this very conception of the nature of PDP architecture threatens
rather than supports a robustly interpreted ST. The ST story claims that
the very same mechanism that outputs representations of what I come to
believe in 1P processing outputs representations of what another will believe
in the 3P case. But these representations go on to play very different
roles: belief fixation in one case and belief attribution in the other.
So depending on whether I am forming my own beliefs or simulating beliefs
of another, propositional attitude representations are made available to
either belief fixation or belief attribution processors. It follows that
connectionist models that support ST must make room for the idea that representations
of beliefs and desires are shared between different processors.3
Defenders of a PDP-ST link may complain that ST does not require
that the brain contain shareable representations. ST is compatible with
a procedural account that does not advert to representations at all. ST
has genuine implications for brain architecture nonetheless, because if
ST is true, the processing for 1P and 3P tasks will be expected to be very
similar.
However, such a purely procedural criterion is too weak. For example, imagine
one were to note that bouts of brain processing are very similar on occasions
when 1P and 3P tasks are accomplished. This would provide no support for
ST unless one could establish that the function of the similar processing
episodes was to determine mental states for the self and the other. Otherwise,
the similarity in processing could be attributed to (say) a mechanism needed
to focus attention, or to solve the frame problem, etc. To identify the
function of a process, some account of the kind of information being processed
is required. It is impossible to identify processes as 1P or 3P determination
of mental states until one is able to identify brain states that embody
information about mental states. A functional decomposition of cognition
into computational sub-units presupposes an account of what those units
process.
So ST needs some coherent account of representations in the brain if it
is to begin a functional analysis of FP abilities. This requirement should
not be confused with the hypothesis that the data/procedure distinction
applies to the brain. On that hypothesis, the brain contains explicit symbolic
representations that are written to and read from memory. A connectionist
theory of vector representation does not entail the presence of data of
this kind. However, ST does need a meaningful account of representations
and of their interactions with procedures designed to compute over them.
Anything less guts the empirical interest of ST as a hypothesis about brain
implementation.
The upshot is that Cruz faces a dilemma in providing connectionist support
for ST models of (say) belief acquisition and attribution. If he is right
that PDP models cannot provide any account of procedures that share representations,
then ST cannot be taken seriously as an account of the nature of cognitive
processing. On the other hand, if he provides an account of PDP mechanisms
that support the relevant notions of shareable representations needed to
tell the ST story, then the theory-theorist threatens to employ exactly
the same connectionist mechanisms to secure representation sharing needed
to explain data on the false belief task. The result will be a PDP model
for TT that escapes homogeneity by deploying the tactic Cruz outlines within
classical architecture. So connectionism does not sway us away from TT.
5. Why Homogeneity May Fail in PDP Architecture
It is fortunate for a potential alliance between connectionism and ST that
Cruz' claim that adequate PDP models of cognition are homogeneous can be
questioned. A single PDP network containing two communicating sub-modules
that do similar tasks in very different ways can be easily constructed.
For example, it is well known that two nets trained to solve the same task
by back propagation typically find very different solutions to the problem,
especially if one net has few hidden units while the other has many. Two
such sub-nets accomplishing the same task in different ways can be hooked
together to form a single net that can mimic the kind of "data interference"
found in the false belief task. When "cross talk" connections
from hidden units of one sub-net to the output units of the other are wired
in, data from one net can influence the output of the other. If the brain's
architecture for managing 1P and 3P processing were something like this,
then PDP networks would be compatible with TT rather than ST.
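The counter model can be sketched in a few dozen lines of pure Python. This is a minimal illustration under assumptions chosen only for brevity, not a model taken from the literature: the toy task (logical OR), the hidden-layer sizes, the learning rate, and the names `SubNet`, `coupled_output` and `cross` are all hypothetical. Two one-hidden-layer sub-nets with different numbers of hidden units are trained separately by backpropagation on the same task; `cross` holds cross-talk weights running from the hidden units of sub-net A (playing the 1P role) to the output unit of sub-net B (the 3P role).

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class SubNet:
    """A one-hidden-layer sigmoid network trained by plain backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, seed: int) -> None:
        rng = random.Random(seed)
        # Each hidden unit: n_in input weights followed by a bias (last entry).
        self.w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        # Single output unit: n_hidden weights followed by a bias.
        self.w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def hidden(self, x: Sequence[float]) -> Vec:
        # zip stops at len(x), so the bias ws[-1] is added separately.
        return [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + ws[-1])
                for ws in self.w_h]

    def net_input(self, h: Sequence[float]) -> float:
        return sum(w * hj for w, hj in zip(self.w_o, h)) + self.w_o[-1]

    def output(self, x: Sequence[float]) -> float:
        return sigmoid(self.net_input(self.hidden(x)))

    def train(self, data: Sequence[Tuple[Sequence[float], float]],
              epochs: int = 3000, lr: float = 0.5) -> None:
        # Online gradient descent on squared error.
        for _ in range(epochs):
            for x, target in data:
                h = self.hidden(x)
                y = sigmoid(self.net_input(h))
                d_out = (y - target) * y * (1 - y)
                # Hidden deltas use the output weights before they are updated.
                d_hid = [d_out * self.w_o[j] * hj * (1 - hj)
                         for j, hj in enumerate(h)]
                for j, hj in enumerate(h):
                    self.w_o[j] -= lr * d_out * hj
                self.w_o[-1] -= lr * d_out
                for ws, d in zip(self.w_h, d_hid):
                    for i, xi in enumerate(x):
                        ws[i] -= lr * d * xi
                    ws[-1] -= lr * d


def coupled_output(a: SubNet, b: SubNet, cross: Sequence[float],
                   x_a: Sequence[float], x_b: Sequence[float]) -> float:
    """Output of sub-net B when A's hidden units also feed B's output unit."""
    talk = sum(w * hj for w, hj in zip(cross, a.hidden(x_a)))
    return sigmoid(b.net_input(b.hidden(x_b)) + talk)


if __name__ == "__main__":
    OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
    a = SubNet(2, 2, seed=1)   # "1P" sub-net: few hidden units
    b = SubNet(2, 8, seed=2)   # "3P" sub-net: many hidden units
    a.train(OR)
    b.train(OR)
    for cross in ([0.0, 0.0], [4.0, 4.0]):
        for x_a in ([0, 0], [1, 1]):
            print(cross, x_a, round(coupled_output(a, b, cross, x_a, [0, 0]), 3))
```

With all cross-talk weights at zero, B computes exactly what it computes alone; with nonzero weights, B's output for a fixed input shifts with what A is processing. That is the kind of interference the false belief data call for, obtained without requiring that A and B solve the task in similar ways.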
One might object that this countermodel is not in the spirit of PDP architecture,
since it contains modules whose representations are not fully distributed.
But it is dangerous to rule out such semi-modular connectionist architectures
by fiat, since the force of Cruz' conclusion will be thereby weakened.
If the argument applies only to those models of the brain that disallow
any connections between modules, then any evidence for modularity in FP
processing (for example research suggesting autism is a dissociation between
1P and 3P processing) would undercut support for ST.
Even if we restrict attention to fully distributed connectionist models,
Cruz' contention that PDP models are homogeneous can be challenged. His
case for homogeneity rests on the idea that PDP networks can process similar
tasks only by processing them in similar ways. Evidence from connectionist
research questions this assumption. For example, Servan-Schreiber et al.
(1991) show that a single network trained on a symbolic parsing task processes
examples of the task which have the same syntactic structure in very different
ways. Especially in networks with larger numbers of hidden units, the clean
mapping between the similarities in the task and the similarities in the
processing that Cruz predicts is not found.
It is ironic that Cruz should rest his case on the homogeneity of PDP models,
for the fact that PDP models are inhomogeneous plays a role in connectionist
arguments against the language of thought hypothesis. Fodor's intuition
(expressed as Principle P (1987, pp. 141-143)) was that structure we find
in reasoning and language understanding can only be explained by corresponding
structure in the brain's processing, thus establishing the language of
thought. However, connectionist research discredits the idea that connectionist
models must cleanly mirror similarity structures that we find in tasks
in order to process those tasks (Garson, 1997, p. 349 ff.).
6. Are PDP Models for TT Ad Hoc?
In the end, Cruz admits that PDP architectures can be made compatible with
TT, but he defends himself by claiming such models are ad hoc,
and so fail to provide genuine explanations. A connectionist network that
implements TT would require a big difference between 1P and 3P processing
and evidence that the 3P processing counts as a genuine theory.
PDP models can be artificially constructed that meet these criteria, but
they do not count as explanations of data on folk psychological abilities,
unless we have some independent evidence that those models should be employed
by the brain. For example, in light of the similarities found in 1P and
3P inferences, the assumption that 1P and 3P processing operate in very
different ways seems gratuitous. An explanation based on a homogeneous
mechanism would seem more principled.
However, there are good reasons for expecting that processing of 1P and
3P cases should not be similar in PDP models, despite similarities in the
1P and 3P "inferential economies". During connectionist training,
hidden unit representations are sensitively tuned to facilitate the processing
that accomplishes the task to be learned.4 When tasks are different,
very different representations and processing typically develop. What should
we expect in the case of 1P and 3P processing? The answer depends on how
much of the process one considers relevant. From a global point of view,
1P and 3P tasks are quite different, despite similarity in inferential
relationships. 3P processing requires a complex assessment of the other's
situation to adjust for relevant differences between his case and mine.
(I might eat that ice cream but I know he has fairly good discipline about
his diet.) Furthermore, determinations of mental states of others are used
for managing social interaction, while 1P processing outputs beliefs and
desires directly for one's own use. Since the processes that precede and
follow 3P mental state determination are so different from those that precede
and follow 1P determination, it can be expected that connectionist systems
crafted to do a good job in these two global processes should use different
styles of representation and processing. It is only when one arbitrarily
limits consideration to the inferential parts of the two tasks that one
expects processing similarities. This makes sense only if there is independent
evidence for a separable connectionist module devoted only to this part
of the task. Assuming the brain is not modular in this way, there is every
reason for expecting large differences in 1P and 3P processing (which
may have produced selection pressures for the growth of a separate 3P module).
Even if it is granted that 1P and 3P processing can be expected to differ
in PDP models, it does not follow that those models are compatible with
the TT. As Cruz suggests (1998, p. 331) one still needs a principled reason
for calling the network's 3P processing a theory. One answer could
be that the 3P processing operates on activation vectors which amount to
explicit representation of FP laws. Although this would clearly count as
the implementation of a theory, one worries that the solution is ad hoc.
Why should the brain go to all the trouble of explicitly representing
laws in the 3P case, and not in the 1P case?5
I have already argued that TT does not require the representation
of laws; if so, some other reason must be given for considering 3P processing
in a PDP network to be a theory. One possible strategy is to base the
attribution on the fact that the 3P processing embodies general knowledge that can
be applied to all people, while 1P processing is specialized for the self.
(This difference might provide further reason for thinking 1P and 3P processing
should be different.) One complication in the ST-TT debate is that
theory-theorists disagree on the requirements for the presence of a theory.6
Some may object that a theory is more than any general body of knowledge,
and so reject the idea that the distinct 3P processing described above
really counts as a theory. My own intuitions are that stronger criteria
for being a theory are needed and that 3P processing in the human brain
probably does not meet them. If I am right, then connectionist models support
neither TT nor ST accounts of FP abilities. ST would require that 1P and
3P mental state determination be very similar, and we have reasons for
thinking they would be quite different. TT would require that we dignify
3P processing as a theory, which we may be unable to do. So the most valuable
contribution of connectionism may be that it suggests accounts of FP processing
that lie outside the ST-TT debate.