Simulation and Connectionism: What is the Connection?

by James W. Garson
JGarson@uh.edu
Paper presented at the 26th Annual meeting of the Society for Philosophy and Psychology,
Barnard College, New York,
June 17th 2000.

1. Introduction

Simulation has emerged as an increasingly popular account of folk psychological (FP) talents at mindreading: predicting and explaining human mental states. Where its rival, the theory-theory (TT), postulates that these abilities are explained by mastery of laws describing the connections between beliefs, desires, and action, simulation theory (ST) proposes that we mindread by "putting ourselves in another's shoes". We pretend to be in the other's situation and then coopt the very same processes that control our own thoughts and actions to determine what we would think and do under those circumstances. By running these processes "off-line" we are able to imaginatively assess the mental states to be projected onto the other person. Simulation theory appears especially economical because it requires no separate mindreading machinery. Instead, it reemploys cognitive capacities already known to exist on independent grounds, such as being able to reason, to imagine a case different from our own and to appreciate what is relevantly different about it.

There are two different ways of understanding the goals of ST (Heal 1998; Stich and Ravenscroft 1996). On the externalist reading, ST hopes to analyse and explain the conceptual structures found in folk psychology, while on the internalist reading, ST is to provide an account of the mechanisms in the brain that implement FP abilities. This paper concerns the internalist reading. Given that ST means to describe the nature of the brain processing that supports mindreading talents, how can support for ST be based on evidence concerning the brain's computational architecture? Theories of cognitive architecture have polarized between classicists who advocate symbolic processing, and connectionists who prefer varieties of non-symbolic representation. Cruz (1998) has argued that PDP architecture (connectionist processing over distributed non-symbolic representations) is especially well suited to ST. This paper explores the connections between PDP architecture and the TT-ST debate. Problems with the linkage between PDP architecture and support for ST will be uncovered. Some reasons will be given for thinking that PDP architecture is the enemy of both TT and ST. PDP architecture suggests mechanisms for mindreading that may defy easy classification under TT and ST rubrics.

2. Representation in Simulation Theory

To prepare the ground, a simple-minded attempt to link PDP models with ST will be presented and rejected. The purpose will be to lay bare issues about representation in ST which will be central to what follows. The simple-minded attempt goes like this. ST is "representation poor", while TT is essentially "representation rich". If the brain makes use of FP laws as the TT presumes, then those laws must be somehow explicitly represented. However, ST presumes that the brain does mindreading by running "off-line" procedures already needed for basic cognition. Therefore ST makes do without explicitly representing FP principles. This difference is an excellent match with a salient difference between classical and PDP architectures. In classical architectures, data is explicitly and symbolically represented in memory. On the other hand, PDP models exhibit competence at cognitive tasks in their dispositions to behave, but that competence is nowhere explicitly represented. If the brain uses PDP architecture, then there is no symbolic representation of FP, hence ST should be preferred to TT.

However, this purported linkage between ST and PDP architecture does not hold up to more careful scrutiny. There are problems with both the thesis that TT is representation rich and the thesis that ST is representation poor. Although TT requires that the brain make use of laws of FP, it is not clear that it requires that those laws be explicitly represented in symbolic form.1 So at best, PDP architecture would rule out only those versions of TT that explicitly represent FP.

Of course TT supposes that something is represented, for TT still needs to represent the beliefs, desires and other mental states of the person one hopes to predict or explain. However, ST postulates representations of exactly the same kind. ST contends that in attributing (for example) beliefs to others, I use exactly the same mechanism (processor) that is responsible for fixing my own beliefs. This means that the processor has access to representations consisting of my own mental states on one side and imaginary mental states on the other, along with machinery to ensure that the outputs for imaginary mental states control my reasoning about you rather than my own actions. The simulation story explicitly mentions mental states such as beliefs which are inputs to, and outputs from, a processor. If ST is to be taken at its word, then representations for propositional attitudes exist in the brain.

It follows that those who hope to find connectionist support for an ST literally understood must seek it in models that preserve some notion of propositional attitude representation. Luckily, the representations needed can be located in the activation patterns of PDP models or in the weights between the units which create dispositions to form such patterns. However, once these representational notions are secured in PDP models, they can be used to support PDP models of TT as well. Even for those versions of TT that require the representation of laws, the devices in the PDP architecture that subserve the formation of representations of beliefs and desires will presumably also be adequate for representing laws. Therefore, representational considerations in PDP architecture are so far irrelevant to the TT-ST debate.

But why should ST be seriously committed to propositional attitude representations? Perhaps a less literal reading of ST would allow a better accommodation with PDP architecture. For example, defenders of a PDP-ST linkage might insist that ST is compatible with purely procedural brain processing, so that talk of propositional attitudes and processors going "off line" is merely metaphorical. The danger here is that as the demands ST places on the brain's implementation of FP abilities are relaxed, so the linkage between ST and its evidential support in brain architecture is weakened. If talk of propositional attitude representations and their interaction with a processor is stripped away, then what exactly are the implications of ST for the nature of brain processing? This issue will be revisited once a few more ideas are put in place.

3. A Sketch of an Argument Linking PDP Architecture to ST

Joe Cruz (1998) has proposed a more sophisticated way of forging the link between PDP architecture and ST. Nevertheless, parallel issues concerning representation in ST will arise. According to ST, the same processor used to control the formation of my own mental states is coopted to process attribution of mental states to others. So my first person (1P) and third person (3P) processing are very similar. However, according to the TT, my own mental states are formed by one mechanism while the attribution of mental states to others is accomplished by applying a folk psychological theory to information about their case. So 1P and 3P processing are very different in TT. Cruz' strategy is to explain why PDP architectures must support processing for 1P and 3P cases that is nearly the same. Cruz' reasoning revolves around two claims. The first is that PDP models display a brand of processing homogeneity. Homogeneity means that when a single network accomplishes two similar tasks, it uses similar processing to get those jobs done. The second is that homogeneity entails the similarity of 1P and 3P processing.

To demonstrate the second claim, Cruz notes the strong similarities in 1P and 3P inference. Reasoning about my own case and about the cases of others follows strikingly similar basic principles, which supports the idea that the corresponding tasks are nearly the same. Presuming that the cognitive processing for the two kinds of case is carried out in a single PDP network, homogeneity guarantees that processing for 1P and 3P cases is similar, and so the ST architecture is preferred.

This reasoning is sound only if it can be established that 1P and 3P processing is carried out in a single network. Cruz attempts to eliminate belief in radically separate 1P and 3P networks with empirical evidence. One of the most famous lines of experiment in the developmental literature on FP (Perner et al. 1987) concerns false belief tasks where 3 year-old children consistently attribute their own beliefs to others who are not in a position to know what they know. According to Cruz, such errors in 3P processing can only be explained by having information from the 1P net (what the child believes) communicate with the 3P net. But this is incompatible with the separation of the two nets.2

4. Why Homogeneity Might be Bad for ST

Since he is arguing that classical models select TT in preference to ST, Cruz needs to explain how classical architecture escapes his argument for similarity of 1P and 3P processing. Along the way, he inadvertently opens the door for an argument that connectionist models for ST need not be homogeneous as he claims. Classical models, Cruz notes (1998, p. 333), make the data/procedure distinction. This means that two very different processes can communicate with each other by sharing the very same representations. So a single classical network consisting of two sub-modules that share representations can explain how 1P and 3P information can be made available to two very different sub-processors in the same mechanism. Classical architectures can be inhomogeneous and still share information between modules because representations and the procedures that operate on them are separable. On the other hand, Cruz contends that PDP models cannot make the data/procedure distinction, so there is no room for a classical explanation of this kind. If 1P "information" is available to 3P processing, it must be because the 3P processing mimics 1P processing in relevant respects, that is, the two kinds of processing must be similar. Note then that the purported absence of shareable representations in PDP architectures is crucial for establishing Cruz' homogeneity result.

However, this very conception of the nature of PDP architecture threatens rather than supports a robustly interpreted ST. The ST story claims that the very same mechanism that outputs representations of what I come to believe in 1P processing outputs representations of what another will believe in the 3P case. But these representations go on to play very different roles: belief fixation in one case and belief attribution in the other. So depending on whether I am forming my own beliefs or simulating beliefs of another, propositional attitude representations are made available to either belief fixation or belief attribution processors. It follows that connectionist models that support ST must make room for the idea that representations of beliefs and desires are shared between different processors.3

Defenders of a PDP-ST link may complain that ST does not require that the brain contain shareable representations. ST is compatible with a procedural account that does not advert to representations at all. ST has genuine implications for brain architecture nonetheless, because if ST is true, the processing for 1P and 3P tasks will be expected to be very similar.

However, such a purely procedural criterion is too weak. For example, imagine one were to note that bouts of brain processing are very similar on occasions when 1P and 3P tasks are accomplished. This would provide no support for ST unless one could establish that the function of the similar processing episodes was to determine mental states for the self and the other. Otherwise, the similar processing could be attributed to (say) a mechanism needed to focus attention, or to solve the frame problem, etc. To identify the function of a process, some account of the kind of information being processed is required. It is impossible to identify processes as 1P or 3P determination of mental states until one is able to identify brain states that embody information about mental states. A functional decomposition of cognition into computational sub-units presupposes an account of what those units process.

So ST needs some coherent account of representations in the brain if it is to begin a functional analysis of FP abilities. This requirement should not be confused with the hypothesis that the data/procedure distinction applies to the brain. On that hypothesis, the brain holds explicit symbolic representations which are stored to and accessed from memory. A connectionist theory of vector representation does not entail the presence of data of this kind. However, ST does need a meaningful account of representations and of their interactions with procedures designed to compute over them. Anything less guts the empirical interest of ST as a hypothesis about brain implementation.

The upshot is that Cruz faces a dilemma in providing a connectionist support for ST models of (say) belief acquisition and attribution. If he is right that PDP models cannot provide any account of procedures that share representations, then ST cannot be taken seriously as an account of the nature of cognitive processing. On the other hand, if he provides an account of PDP mechanisms that support the relevant notions of shareable representations needed to tell the ST story, then the theory-theorist threatens to employ exactly the same connectionist mechanisms to secure representation sharing needed to explain data on the false belief task. The result will be a PDP model for TT that escapes homogeneity by deploying the tactic Cruz outlines within classical architecture. So connectionism does not sway us away from TT.

5. Why Homogeneity May Fail in PDP Architecture

It is fortunate for a potential alliance between connectionism and ST that Cruz' claim that adequate PDP models of cognition are homogeneous can be questioned. A single PDP network containing two communicating sub-modules that do similar tasks in very different ways can be easily constructed. For example, it is well known that two nets trained to solve the same task by back propagation typically find very different solutions to the problem, especially if one net has few while the other has many hidden units. Two such sub-nets accomplishing the same task in different ways can be hooked together to form a single net that can mimic the kind of "data interference" found in the false belief task. When "cross talk" connections from hidden units of one sub-net to the output units of the other are wired in, data from one net can influence the output of the other. If the brain's architecture for managing 1P and 3P processing were something like this, then PDP networks would be compatible with TT rather than ST.
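
The wiring just described can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the weights are set by hand rather than learned by back propagation, the two sub-nets are not shown to solve a common task, and every name in it (the weight tables, forward, cross_talk) is hypothetical and drawn from no published model. It joins a two-hidden-unit "1P" sub-net and a four-hidden-unit "3P" sub-net into one net, with cross talk connections running from the 1P hidden units to the 3P output unit.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def net_input(weights, activations):
    """Weighted sum arriving at each receiving unit (one weight row per unit)."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]

def squash(values):
    return [sigmoid(v) for v in values]

# Hand-set weights (assumption: stand-ins for what training might produce).
HIDDEN_1P = [[2.0, -1.0], [-1.0, 2.0]]                          # 2 hidden units
OUTPUT_1P = [[1.0, 1.0]]
HIDDEN_3P = [[1.0, 0.5], [-0.5, 1.0], [0.5, -1.0], [1.0, 1.0]]  # 4 hidden units
OUTPUT_3P = [[0.5, -0.5, 0.5, 0.5]]
CROSS_TALK = [[1.5, -1.5]]  # from 1P hidden units to the 3P output unit

def forward(own_situation, other_situation, cross_talk=1.0):
    """Return (1P output, 3P output) for one pass through the joined net."""
    h1 = squash(net_input(HIDDEN_1P, own_situation))
    h3 = squash(net_input(HIDDEN_3P, other_situation))
    out_1p = squash(net_input(OUTPUT_1P, h1))
    own_drive = net_input(OUTPUT_3P, h3)   # what the 3P sub-net computes alone
    leak = net_input(CROSS_TALK, h1)       # what arrives from the 1P sub-net
    out_3p = squash([d + cross_talk * k for d, k in zip(own_drive, leak)])
    return out_1p[0], out_3p[0]

# Hold the other's situation fixed and vary only one's own situation.
other = [1.0, 0.0]
_, isolated_a = forward([1.0, 0.0], other, cross_talk=0.0)
_, isolated_b = forward([0.0, 1.0], other, cross_talk=0.0)
_, coupled_a = forward([1.0, 0.0], other, cross_talk=1.0)
_, coupled_b = forward([0.0, 1.0], other, cross_talk=1.0)

print(isolated_a == isolated_b)  # no cross talk: 3P output ignores 1P input
print(coupled_a != coupled_b)    # cross talk: 1P input shifts the 3P output
```

With the cross talk switched off, the 3P output depends only on the other's situation. With it switched on, a change in one's own situation alters the attribution made to the other, although the two sub-nets differ in size and in weights.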

One might object that this counter model is not in the spirit of PDP architecture, since it contains modules whose representations are not fully distributed. But it is dangerous to rule out such semi-modular connectionist architectures by fiat, since the force of Cruz' conclusion will be thereby weakened. If the argument applies only to those models of the brain that disallow any connections between modules, then any evidence for modularity in FP processing (for example research suggesting autism is a dissociation between 1P and 3P processing) would undercut support for ST.

Even if we restrict attention to fully distributed connectionist models, Cruz' contention that PDP models are homogeneous can be challenged. His case for homogeneity rests on the idea that PDP networks can process similar tasks only by processing them in similar ways. Evidence from connectionist research questions this assumption. For example, Servan-Schreiber et al. (1991) show that a single network trained on a symbolic parsing task processes examples of the task which have the same syntactic structure in very different ways. Especially in networks with larger numbers of hidden units, the clean mapping between the similarities in the task and the similarities in the processing that Cruz predicts is not found.

It is ironic that Cruz should rest his case on homogeneity of PDP models, for the fact that PDP models are inhomogeneous plays a role in connectionist arguments against the language of thought hypothesis. Fodor's intuition (expressed as Principle P (1987, pp. 141-143)) was that the structure we find in reasoning and language understanding can only be explained by corresponding structure in the brain's processing, thus establishing the language of thought. However, connectionist research discredits the idea that connectionist models must cleanly mirror similarity structures that we find in tasks in order to process those tasks (Garson, 1997, p. 349 ff.).

6. Are PDP Models for TT Ad Hoc?

In the end, Cruz admits that PDP architectures can be made compatible with TT, but he defends himself by claiming such models are ad hoc, and so fail to provide genuine explanations. A connectionist network that implements TT would require a big difference between 1P and 3P processing and evidence that the 3P processing counts as a genuine theory. PDP models can be artificially constructed that meet these criteria, but they do not count as explanations of data on folk psychological abilities, unless we have some independent evidence that those models should be employed by the brain. For example, in light of the similarities found in 1P and 3P inferences, the assumption that 1P and 3P processing operate in very different ways seems gratuitous. An explanation based on a homogeneous mechanism would seem more principled.

However, there are good reasons for expecting that processing of 1P and 3P cases should not be similar in PDP models, despite similarities in the 1P and 3P "inferential economies". During connectionist training, hidden unit representations are sensitively tuned to facilitate the processing that accomplishes the task to be learned.4 When tasks are different, very different representations and processing typically develop. What should we expect in the case of 1P and 3P processing? The answer depends on how much of the process one considers relevant. From a global point of view, 1P and 3P tasks are quite different, despite similarity in inferential relationships. 3P processing requires a complex assessment of the other's situation to adjust for relevant differences between his case and mine. (I might eat that ice cream but I know he has fairly good discipline about his diet.) Furthermore, determinations of mental states of others are used for managing social interaction, while 1P processing outputs beliefs and desires directly for one's own use. Since the processes that precede and follow 3P mental state determination are so different from those that precede and follow 1P determination, it can be expected that connectionist systems crafted to do a good job in these two global processes should use different styles of representation and processing. It is only when one arbitrarily limits consideration to the inferential parts of the two tasks that one expects processing similarities. This makes sense only if there is independent evidence for a separable connectionist module devoted only to this part of the task. Assuming the brain is not modular in this way, there is every reason for expecting large differences in 1P and 3P processing (which may have produced selection pressures for the growth of a separate 3P module).

Even if it is granted that 1P and 3P processing can be expected to differ in PDP models, it does not follow that those models are compatible with the TT. As Cruz suggests (1998, p. 331), one still needs a principled reason for calling the network's 3P processing a theory. One answer could be that the 3P processing operates on activation vectors which amount to explicit representation of FP laws. Although this would clearly count as the implementation of a theory, one worries that the solution is ad hoc. Why should the brain go to all the trouble of explicitly representing laws in the 3P case, and not in the 1P case?5

I have already argued that TT does not require the representation of laws; if so, some other reason must be given for considering 3P processing in a PDP network to be a theory. One possible strategy is to base the attribution on the fact that the 3P processing embodies general knowledge that can be applied to all people, while 1P processing is specialized for the self. (This difference might provide further reason for thinking 1P and 3P processing should be different.) One complication in the ST-TT debate is that theory theorists disagree on the requirements for the presence of a theory.6 Some may object that a theory is more than any general body of knowledge, and so reject the idea that the distinct 3P processing described above really counts as a theory. My own intuitions are that stronger criteria for being a theory are needed and that 3P processing in the human brain probably does not meet them. If I am right, then connectionist models support neither TT nor ST accounts of FP abilities. ST would require that 1P and 3P mental state determination be very similar, and we have reasons for thinking they would be quite different. TT would require that we dignify 3P processing as a theory, which we may be unable to do. So the most valuable contribution of connectionism may be that it suggests accounts of FP processing that lie outside the ST-TT debate.




Acknowledgement: I owe a deep debt of gratitude to Bob Gordon and the National Endowment for the Humanities for an excellent seminar on simulation theory, during which a predecessor of this paper was drafted. The paper could not have been written without the help of inspiring conversations with Joe Cruz, who also attended that seminar. Comments from Bob Gordon also helped improve this work.




Notes

1. See Stich and Nichols (1992) who contend that there is room for versions of TT that are neutral on how information on laws is embodied in the brain. Consider the FP law (L).
(L) If person p desires x and p believes y results in x, then p generally does y.
Instead of explicitly representing (L), the brain might make do with a computational device which outputs the mental state with content |p generally does y|, when states with contents |p desires x| and |p believes y results in x| are both input. This mechanism would embody information found in the law in its inferential economy without explicitly representing the law.
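
The contrast drawn in this note can be sketched in code. The Python below is an illustrative assumption, not a model taken from Stich and Nichols: EXPLICIT_LAW, apply_explicit and implicit_device are hypothetical names, and mental states are encoded as simple (attitude, person, content) tuples. The first mechanism stores (L) as an item of data that a general interpreter consults. The second yields the same input-output pattern from its own construction, with no stored item standing for the law. In a PDP device that pattern would reside in connection weights rather than in a conditional statement.

```python
# Explicit version: the law (L) is itself a piece of data.
EXPLICIT_LAW = {
    "premises": [("desires", "{x}"), ("believes", "{y} results in {x}")],
    "conclusion": ("generally does", "{y}"),
}

def apply_explicit(law, person, x, y, states):
    """General interpreter: reads the law as data and checks its premises."""
    needed = {
        (attitude, person, content.format(x=x, y=y))
        for attitude, content in law["premises"]
    }
    if needed <= states:  # every premise is among the input states
        attitude, content = law["conclusion"]
        return (attitude, person, content.format(x=x, y=y))
    return None

def implicit_device(person, x, y, states):
    """Special-purpose device: same input-output profile, no stored law."""
    wants = ("desires", person, x) in states
    expects = ("believes", person, f"{y} results in {x}") in states
    if wants and expects:
        return ("generally does", person, y)
    return None

states = {
    ("desires", "Ann", "coffee"),
    ("believes", "Ann", "walking to the cafe results in coffee"),
}
by_law = apply_explicit(EXPLICIT_LAW, "Ann", "coffee", "walking to the cafe", states)
by_device = implicit_device("Ann", "coffee", "walking to the cafe", states)
print(by_law)
print(by_device)
```

Both mechanisms deliver the same output state from the same inputs, so behaviour alone does not reveal whether (L) is explicitly represented.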

2. We should note here that Cruz' argument that the false belief task data requires that 1P and 3P nets communicate may be questioned. ST is supposed to explain fully developed 3P processing - exactly the processing three-year-olds are bad at. If his argument supports ST at all, Cruz needs to show that mature folk psychological abilities require communication between 1P and 3P processors.

3. It is not easy to characterize exactly the architectural and processing implications of the presence of shareable representations in the brain. Some may feel that this requires anatomically separate brain modules joined by neural connections over which data is sent. However, Cruz' explanation of the data on the false belief task requires no more than a functional characterization of how representations are shared between 1P and 3P processors. Any mechanism that allows representations produced by one processor to interact with another processor in any way will do the trick.

4. The DISCERN model of language processing by Miikkulainen (1993) is especially interesting in this regard. The network uses a semi-modular architecture where representations are shunted between several different processors, and are changed during the learning process to facilitate all the processes which use them. If a model of this kind were applied to FP processing, one would expect that 1P and 3P representations (and processing) would be quite different, since they interact with different modules.

5. Perhaps answers can be given to this question. For example, the brain might need explicit representation to keep track of which individuals have been assigned which mental states. However, my own view is that PDP architectures can accomplish such tracking without explicit representation.

6. For example, not all theory theorists accept Stich and Nichols' claim (1992) that FP theories need not represent laws explicitly.





References

Cruz, J. (1998) "Mindreading: Mental State Ascription and Cognitive Architecture," Mind and Language, 13, #3, 323-340.

Fodor, J. (1987) Psychosemantics, MIT Press.

Garson, J. (1997) "Syntax in a Dynamic Brain," Synthese, 110, 343-355.

Heal, J. (1998) "Co-Cognition and Off-Line Simulation: Two Ways of Understanding the Simulation Approach," Mind and Language, 13, #4, 477-498.

Miikkulainen, R. (1993) Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon and Memory, MIT Press.

Perner, J., Leekam, S. and Wimmer, H. (1987) "Three-Year-Olds' Difficulty with False Belief: The Case for a Conceptual Deficit," British Journal of Developmental Psychology, 5, 125-137.

Servan-Schreiber, D., Cleeremans, A. and McClelland, J. (1991) "Graded State Machines: The Representation of Temporal Contingencies in Simple Recurrent Networks," in Touretzky, D. (ed.) Connectionist Approaches to Language Learning, Kluwer, Dordrecht, 57-89.

Stich, S. and Nichols, S. (1992) "Folk Psychology: Simulation or Tacit Theory?" Mind and Language, 7, 35-71.

Stich, S. and Ravenscroft, I. (1996) "What is Folk Psychology," Deconstructing the Mind, Oxford University Press, New York, Ch. 3, 115-135.

