Reference: Biol. Bull. 215: 216–242. (December 2008)
©2008 Marine Biological Laboratory

Consciousness as Integrated Information: a Provisional Manifesto

GIULIO TONONI

Department of Psychiatry, University of Wisconsin, Madison, Wisconsin

Received 20 August 2008; accepted 10 October 2008.
* To whom correspondence should be addressed. E-mail: gtononi@wisc.edu
Abbreviations: Φ, integrated information; IIT, integrated information theory; MIP, minimum information partition.

This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM. All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c).

Abstract. The integrated information theory (IIT) starts from phenomenology and makes use of thought experiments to claim that consciousness is integrated information. Specifically: (i) the quantity of consciousness corresponds to the amount of integrated information generated by a complex of elements; (ii) the quality of experience is specified by the set of informational relationships generated within that complex. Integrated information (Φ) is defined as the amount of information generated by a complex of elements, above and beyond the information generated by its parts. Qualia space (Q) is a space where each axis represents a possible state of the complex, each point is a probability distribution of its states, and arrows between points represent the informational relationships among its elements generated by causal mechanisms (connections). Together, the set of informational relationships within a complex constitutes a shape in Q that completely and univocally specifies a particular experience. Several observations concerning the neural substrate of consciousness fall naturally into place within the IIT framework. Among them are the association of consciousness with certain neural systems rather than with others; the fact that neural processes underlying consciousness can influence or be influenced by neural processes that remain unconscious; the reduction of consciousness during dreamless sleep and generalized seizures; and the distinct role of different cortical architectures in affecting the quality of experience. Equating consciousness with integrated information carries several implications for our view of nature.

Introduction

Everybody knows what consciousness is: it is what vanishes every night when we fall into dreamless sleep and reappears when we wake up or when we dream. It is also all we are and all we have: lose consciousness and, as far as you are concerned, your own self and the entire world dissolve into nothingness.

Yet almost everybody thinks that understanding consciousness at the fundamental level is currently beyond the reach of science. The best we can do, it is often argued, is gather more and more facts about the neural correlates of consciousness (those aspects of brain function that change when some aspects of consciousness change) and hope that one day we will come up with an explanation. Others are more pessimistic: we may learn all about the neural correlates of consciousness and still not understand why certain physical processes seem to generate experience while others do not.

It is not that we do not know relevant facts about consciousness. For example, we know that the widespread destruction of the cerebral cortex leaves people permanently unconscious (vegetative), whereas the complete removal of the cerebellum, even richer in neurons, hardly affects consciousness. We also know that neurons in the cerebral cortex remain active throughout sleep, yet at certain times during sleep consciousness fades, while at other times we dream. Finally, we know that different parts of the cortex influence different qualitative aspects of consciousness: damage to certain parts of the cortex can impair the experience of color, whereas other lesions may interfere with the perception of shapes. In fact, increasingly refined neuroscientific tools are uncovering increasingly precise aspects of the neural correlates of consciousness (Koch, 2004). And yet, when it comes to explaining why experience blossoms in the cortex and not in the cerebellum, why certain stages of sleep are experientially underprivileged, or why some cortical areas endow our experience with colors and others with sound, we are still at a loss.

Our lack of understanding is manifested most clearly when scientists are asked questions about consciousness in “difficult” cases. For example, is a person with akinetic mutism (awake with eyes open, but mute, immobile, and nearly unresponsive) conscious or not? How much consciousness is there during sleepwalking or psychomotor seizures? Are newborn babies conscious, and to what extent? Are animals conscious? If so, are some animals more conscious than others? Can they feel pain? Does a bat feel space the same way we do? Can bees experience colors, or merely react to them? Can a conscious artifact be constructed with non-neural ingredients? I believe it is fair to say that no consciousness expert, if there is such a job description, can be confident about the correct answer to such questions. This is a remarkable state of affairs. Just consider comparable questions in physics: Do stars have mass? Do atoms? How many different kinds of atoms and elementary particles are there, and of what are they made? Is energy conserved? And how can it be measured? Or consider biology: What are species, and how do they evolve? How are traits inherited? How do organisms develop? How is energy produced from nutrients? How does echolocation work in bats? How do bees distinguish among colors? And so on. Obviously, we expect satisfactory answers from any competent physicist or biologist.

What’s the matter with consciousness, then, and how should we proceed? Early on, I came to the conclusion that a genuine understanding of consciousness is possible only if empirical studies are complemented by a theoretical analysis. Indeed, neurobiological facts constitute both challenging paradoxes and precious clues to the enigma of consciousness. This state of affairs is not unlike the one faced by biologists when, knowing a great deal about similarities and differences between species, fossil remains, and breeding practices, they still lacked a theory of how evolution might occur. What was needed, then as now, were not just more facts, but a theoretical framework that could make sense of them.

In what follows, I discuss the integrated information theory of consciousness (IIT; Tononi, 2004), an attempt to understand consciousness at the fundamental level. To present the theory, I first consider phenomenological thought experiments indicating that subjective experience has to do with the generation of integrated information. Next, I consider how integrated information can be defined mathematically. I then show how basic facts about consciousness and the brain can be accounted for in terms of integrated information. Finally, I discuss how the quality of consciousness can be captured geometrically by the shape of informational relationships within an abstract space called qualia space. I conclude by examining some implications of the theory concerning the place of experience in our view of the world.

A Phenomenological Analysis: Consciousness as Integrated Information

The integrated information theory (IIT) of consciousness claims that, at the fundamental level, consciousness is integrated information, and that its quality is given by the informational relationships generated by a complex of elements (Tononi, 2004). These claims stem from realizing that information and integration are the essential properties of our own experience. This may not be immediately evident, perhaps because, being endowed with consciousness most of the time, we tend to take its gifts for granted. To regain some perspective, it is useful to resort to two thought experiments, one involving a photodiode and the other a digital camera.

Information: the photodiode thought experiment

Consider the following: You are facing a blank screen that is alternately on and off, and you have been instructed to say “light” when the screen turns on and “dark” when it turns off. A photodiode, a simple light-sensitive device, has also been placed in front of the screen. It contains a sensor that responds to light with an increase in current and a detector connected to the sensor that says “light” if the current is above a certain threshold and “dark” otherwise. The first problem of consciousness reduces to this: when you distinguish between the screen being on or off, you have the subjective experience of seeing light or dark. The photodiode can also distinguish between the screen being on or off, but presumably it does not have a subjective experience of light and dark. What is the key difference between you and the photodiode?

According to the IIT, the difference has to do with how much information is generated when that distinction is made. Information is classically defined as reduction of uncertainty: the more numerous the alternatives that are ruled out, the greater the reduction of uncertainty, and thus the greater the information. It is usually measured using the entropy function, which is the logarithm of the number of alternatives (assuming they are equally likely). For example, tossing a fair coin and obtaining heads corresponds to $\log_2(2) = 1$ bit of information, because there are just two alternatives; throwing a fair die yields $\log_2(6) \approx 2.58$ bits of information, because there are six.

Let us now compare the photodiode with you. When the blank screen turns on, the mechanism in the photodiode tells the detector that the current from the sensor is above rather than below the threshold, so it reports “light.” In performing this discrimination between two alternatives, the detector in the photodiode generates $\log_2(2) = 1$ bit of information. When you see the blank screen turn on, on the other hand, the situation is quite different. Though you may think you are performing the same discrimination between light and dark as the photodiode, you are in fact discriminating among a much larger number of alternatives, thereby generating many more bits of information.

This is easy to see. Just imagine that, instead of turning light and dark, the screen were to turn red, then green, then blue, and then display, one after the other, every frame from every movie that was ever produced. The photodiode, inevitably, would go on signaling whether the amount of light for each frame is above or below its threshold: to a photodiode, things can only be one of two ways, so when it reports “light,” it really means just “this way” versus “that way.” For you, however, a light screen is different not only from a dark screen, but from a multitude of other images, so when you say “light,” it really means this specific way versus countless other ways, such as a red screen, a green screen, a blue screen, this movie frame, that movie frame, and so on for every movie frame (not to mention for a sound, smell, thought, or any combination of the above). Clearly, each frame looks different to you, implying that some mechanism in your brain must be able to tell it apart from all the others. So when you say “light,” whether you think about it or not (and you typically won’t), you have just made a discrimination among a very large number of alternatives, and thereby generated many bits of information.

This point is so deceptively simple that it is useful to elaborate a bit on why, although a photodiode may be as good as we are in detecting light, it cannot possibly see light the way we do; in fact, it cannot possibly “see” anything at all. Hopefully, by realizing what the photodiode lacks, we may appreciate what allows us to consciously “see” the light.

The key is to realize how the many discriminations we can do, and the photodiode cannot, affect the meaning of the discrimination at hand, the one between light and dark. For example, the photodiode has no mechanism to discriminate colored from achromatic light, even less to tell which particular color the light might be. As a consequence, all light is the same to it, as long as it exceeds a certain threshold. So for the photodiode, “light” cannot possibly mean achromatic as opposed to colored, not to mention of which particular color. Also, the photodiode has no mechanism to distinguish between a homogeneous light and a bright shape (any bright shape) on a darker background. So for the photodiode, light cannot possibly mean full field as opposed to a shape, any of countless particular shapes. Worse, the photodiode does not even know that it is detecting a visual attribute (the “visualness” of light), as it has no mechanism to tell visual attributes, such as light or dark, from non-visual ones, such as hot and cold, light or heavy, loud or soft, and so on. As far as it knows, the photodiode might just as well be a thermistor: it has no way of knowing whether it is sensing light versus dark or hot versus cold.

In short, the only specification a photodiode can make is whether things are this or that way: any further specification is impossible because it does not have mechanisms for it. Therefore, when the photodiode detects “light,” such “light” cannot possibly mean what it means for us; it does not even mean that it is a visual attribute. By contrast, when we see “light” in full consciousness, we are implicitly being much more specific: we simultaneously specify that things are this way rather than that way (light as opposed to dark), that whatever we are discriminating is not colored (in any particular color), does not have a shape (any particular one), is visual as opposed to auditory or olfactory, sensory as opposed to thought-like, and so on. To us, then, light is much more meaningful precisely because we have mechanisms that can discriminate this particular state of affairs we call “light” against a large number of alternatives.

According to the IIT, it is all this added meaning, provided implicitly by how we discriminate pure light from all these alternatives, that increases the level of consciousness. This central point may be appreciated either by “subtraction” or by “addition.” By subtraction, one may realize that our being conscious of “light” would degrade more and more (it would lose its non-coloredness, its non-shapedness, even its visualness) as its meaning is progressively stripped down to just “one of two ways,” as with the photodiode. By addition, one may realize that we can only see “light” as we see it, as progressively more and more meaning is added by specifying how it differs from countless alternatives. Either way, the theory says that the more specifically one’s mechanisms discriminate between what pure light is and what it is not (the more they specify what light means), the more one is conscious of it.

Integration: the camera thought experiment

Information, the ability to discriminate among a large number of alternatives, may thus be essential for consciousness. However, information always implies a point of view, and we need to be careful about what that point of view might be. To see why, consider another thought experiment, this time involving a digital camera, say one whose sensor chip is a collection of a million binary photodiodes, each sporting a sensor and a detector. Clearly, taken as a whole, the camera’s detectors could distinguish among $2^{1,000,000}$ alternative states, an immense number, corresponding to 1 million bits of information. Indeed, the camera would easily respond differently to every frame from every movie that was ever produced. Yet few would argue that the camera is conscious. What is the key difference between you and the camera?

According to the IIT, the difference has to do with integrated information. From the point of view of an external observer, the camera may be considered as a single system with a repertoire of $2^{1,000,000}$ states. In reality, however, the chip is not an integrated entity: since its 1 million photodiodes have no way to interact, each photodiode performs its own local discrimination between a low and a high current completely independent of what every other photodiode might be doing. The chip is just a collection of 1 million independent photodiodes, each with a repertoire of two states. In other words, there is no intrinsic point of view associated with the camera chip as a whole. This is easy to see: if the sensor chip were cut into 1 million pieces each holding its individual photodiode, the performance of the camera would not change at all.

By contrast, you discriminate among a vast repertoire of states as an integrated system, one that cannot be broken down into independent components each with its own separate repertoire. Phenomenologically, every experience is an integrated whole, one that means what it means by virtue of being one, and that is experienced from a single point of view. For example, the experience of a red square cannot be decomposed into the separate experience of red and the separate experience of a square. Similarly, experiencing the full visual field cannot be decomposed into experiencing separately the left half and the right half: such a possibility does not even make sense to us, since experience is always whole. Indeed, the only way to split an experience into independent experiences seems to be to split the brain in two, as in patients who underwent the section of the corpus callosum to treat severe epilepsy (Gazzaniga, 2005). Such patients do indeed experience the left half of the visual field independently of the right side, but then the surgery has created two separate consciousnesses instead of one. Mechanistically then, underlying the unity of experience must be causal interactions among certain elements within the brain. This means that these elements work together as an integrated system, which is why their performance, unlike that of the camera, breaks down if they are disconnected.

A Mathematical Analysis: Quantifying Integrated Information

This phenomenological analysis suggests that, to generate consciousness, a physical system must be able to discriminate among a large repertoire of states (information) and it must be unified; that is, it should be doing so as a single system, one that is not decomposable into a collection of causally independent parts (integration). But how can one measure integrated information? As I explain below, the central idea is to quantify the information generated by a system, above and beyond the information generated independently by its parts (Tononi, 2001, 2004; Balduzzi and Tononi, 2008).¹

Information

First, we must evaluate how much information is generated by the system. Consider the system of two binary units in Figure 1, which can be thought of as an idealized version of a photodiode composed of a sensor S and a detector D. The system is characterized by a state it is in, which in this case is 11 (first digit for the sensor, second digit for the detector), and by a mechanism. This is mediated by a connection (arrow) between the sensor and the detector that implements a causal interaction: in this case, the elementary mechanism of the system is that the detector checks the state of the sensor and turns on if the sensor is on, and off otherwise (more generally, the specific causal interaction can be described by an input-output table).

[Figure 1: (A) diagram of a sensor unit (1) connected to a detector unit (2); (B) bar plots of the actual distribution $p(X_0(\mathrm{mech}, x_1))$ and the potential distribution $p(X_0(\mathrm{maxH}))$ over the states 00, 01, 10, 11, with $\mathrm{ei}(X(\mathrm{mech}, x_1)) = H[p(X_0(\mathrm{mech}, x_1)) \,\|\, p(X_0(\mathrm{maxH}))] = 1$ bit.]

Figure 1. Effective information. (A) A “photodiode” consisting of a sensor and detector unit. The photodiode’s mechanism is such that the detector unit turns on if the sensor’s current is above a threshold. Here both units are on (binary 1, indicated in gray). (B) For the entire system (sensor unit, detector unit) there are four possible states: (00,01,10,11). The potential distribution $p(X_0(\mathrm{maxH})) = (1/4, 1/4, 1/4, 1/4)$ is the maximum entropy distribution on the four states. Given the photodiode’s mechanism and the fact that the detector is on, the sensor must have been on. Thus, the photodiode’s mechanism and its current state specify the following distribution: two of the four possible states (00,01) are ruled out; the other two states (10,11) are equally likely since they are indistinguishable to the mechanism (the prior state of the detector makes no difference to the current state of the sensor). The actual distribution is therefore $p(X_0(\mathrm{mech}, x_1)) = (0, 0, 1/2, 1/2)$. Relative entropy (Kullback-Leibler divergence) between two probability distributions p and q is $H[p \,\|\, q] = \sum_i p_i \log_2 (p_i / q_i)$, so the effective information $\mathrm{ei}(X(\mathrm{mech}, x_1))$ associated with output $x_1 = 11$ is 1 bit (effective information is the entropy of the actual relative to the potential distributions).

Potentially, a system of two binary elements could be in any of four possible states (00,01,10,11) with equal probability: $p = (1/4, 1/4, 1/4, 1/4)$. Formally, this potential (a priori) repertoire is represented by the maximum entropy or uniform distribution of possible system states at time $t_0$, which expresses complete uncertainty ($p(X_0(\mathrm{maxH}))$). Considering the potential repertoire as the set of all possible input states, the particular mechanism $X(\mathrm{mech})$ of this system can be thought of as specifying a forward repertoire: the probability distribution of output states produced by the system when perturbed with all possible input states. But the system is actually in a particular output state (in this case, at time $t_1$, $x_1 = 11$). In actuality, a system with this mechanism being in state 11 specifies that the previous system state $x_0$ must have been either 11 or 10, rather than 00 or 01, corresponding to $p = (0, 0, 1/2, 1/2)$ (in this system, there is no mechanism to specify the detector state, which remains uncertain). Formally, then, the mechanism and the state 11 specify an actual (a posteriori) distribution or repertoire of system states $p(X_0(\mathrm{mech}, x_1))$ at time $t_0$ that could have caused (led to) $x_1$ at time $t_1$, while ruling out (giving probability zero to) states that could not. In this way, the system’s mechanism and state constitute information (about the system’s previous state), in the classic sense of reduction of uncertainty or ignorance. More precisely, the system’s mechanism and state generate 1 bit of information by distinguishing between things being one way (11 or 10, which remain indistinguishable to it) rather than another way (00 or 01, which also remain indistinguishable to it).

In general, the information generated when a system characterized by a certain mechanism is in a particular state can be measured by the relative entropy H between the actual and the potential repertoires (“relative to” is indicated by $\|$), captured by the effective information (ei):

$$\mathrm{ei}(X(\mathrm{mech}, x_1)) = H\big[\, p(X_0(\mathrm{mech}, x_1)) \,\big\|\, p(X_0(\mathrm{maxH})) \,\big]$$

Relative entropy, also known as Kullback-Leibler divergence, is a difference between probability distributions (Cover and Thomas, 2006): if the distributions are identical, relative entropy is zero; the more different they are, the higher the relative entropy.² Figuratively, the system’s mechanism and state generate information by sharpening the uniform distribution into a less uniform one; this is how much uncertainty is reduced. Clearly, the amount of effective information generated by a system is high if it has a large potential repertoire and a small actual repertoire, since a large number of initial states are ruled out. By contrast, the information generated is little if the system’s repertoire is small, or if many states could lead to the current outcome, since few states are ruled out. For instance, if noise dominates (any state could have led to the current one), no alternatives are ruled out, and no information is generated.

Since effective information is implicitly specified once a mechanism and state are specified, it can be considered to be an “intrinsic” property of a system. To calculate it explicitly, from an extrinsic perspective, one can perturb the system in all possible ways (i.e., try out all possible input states, corresponding to the maximum entropy distribution or potential repertoire) to obtain the forward repertoire of output states given the system’s mechanism. Finally one can calculate, using Bayes’ rule, the actual repertoire given the system’s state (Balduzzi and Tononi, 2008).³

Integration

Second, we must find out how much of the information generated by a system is integrated information; that is, how much information is generated by a single entity, as opposed to a collection of independent parts. The idea here is to consider the parts of the system independently, ask how much information they generate by themselves, and compare it with the information generated by the system as a whole. This can be done by resorting again to relative entropy to measure the difference between the probability distribution generated by the system as a whole ($p(X_0(\mathrm{mech}, x_1))$, the actual repertoire of the system X) and the probability distribution generated by the parts considered independently ($\prod_k p({}^{k}M_0(\mathrm{mech}, \mu_1))$, the product of the actual repertoires of the parts ${}^{k}M$). Integrated information is indicated with the symbol Φ (the vertical bar “I” stands for information, the circle “O” for integration):

$$\Phi(X(\mathrm{mech}, x_1)) = H\Big[\, p(X_0(\mathrm{mech}, x_1)) \,\Big\|\, \prod_k p({}^{k}M_0(\mathrm{mech}, \mu_1)) \,\Big] \quad \text{for } {}^{k}M_0 \in \mathrm{MIP}$$

That is, the actual repertoire for each part is specified by causal interactions internal to each part, considered as a system in its own right, while external inputs are treated as a source of extrinsic noise. The comparison is made with the particular decomposition of the system into parts that leaves the least information unaccounted for. This minimum information partition (MIP) decomposes the system into its minimal parts.

To see how this works, consider two of the million photodiodes in the digital camera (Fig. 2, left). By turning on or off depending on its input, each photodiode generates 1 bit of information, just as we saw before. Considered independently, then, two photodiodes generate 2 bits of information, and 1 million photodiodes generate 1 million bits of information. However, as shown in the figure, the product of the actual distributions generated independently by the parts is identical to the actual distribution for the system. Therefore, the relative entropy between the two distributions is zero: the system generates no integrated information ($\Phi(X(\mathrm{mech}, x_1)) = 0$) above and beyond what is generated by its parts.

Clearly, for integrated information to be high, a system must be connected in such a way that information is generated by causal interactions among rather than within its parts. Thus, a system can generate integrated information only to the extent that it cannot be decomposed into informationally independent parts. A simple example of such a system is shown in Figure 2 (right). In this case, the interaction between the minimal parts of the system generates information above and beyond what is accounted for by the parts by themselves ($\Phi(X(\mathrm{mech}, x_1)) > 0$).

In short, integrated information captures the information generated by causal interactions in the whole, over and above the information generated by the parts.⁴

Complexes

Finally, by measuring Φ values for all subsets of elements within a system, we can determine which subsets form complexes. Specifically, a complex X is a set of elements that generate integrated information (Φ > 0) that is not fully contained in some larger set of higher Φ (Fig. 3). A complex, then, can be properly considered to form a single entity having its own, intrinsic “point of view” (as opposed to being treated as a single entity from an outside, extrinsic point of view). Since integrated information is generated within a complex and not outside its boundaries, experience is necessarily private and related to a single point of view or perspective (Tononi and Edelman, 1998; Tononi, 2004). A given physical system, such as a brain, is likely to contain more than one complex, many small ones with low Φ values, and perhaps a few large ones (Tononi and Edelman, 1998; Tononi, 2004). In fact, at any given time there may be a single main complex of comparatively much higher Φ that underlies the dominant experience (a main complex is such that its subsets have strictly lower Φ). As shown in Figure 3, a main complex can be embedded into larger complexes of lower Φ. Thus, a complex can be causally connected, through ports-in and ports-out, to elements that are not part of it. According to the IIT, such elements can indirectly influence the state of the main complex without contributing directly to the conscious experience it generates (Tononi and Sporns, 2003).

A Neurobiological Reality Check: Accounting for Empirical Observations

Can this approach account, at least in principle, for some of the basic facts about consciousness that have emerged from decades of clinical and neurobiological observations? Measuring Φ and finding complexes is not easy for realistic systems, but it can be done for simple networks that bear some structural resemblance to different parts of the brain (Tononi, 2004; Balduzzi and Tononi, 2008).

For example, by using computer simulations, it is possible to show that high Φ requires networks that conjoin functional specialization (due to its specialized connectivity, each element has a unique functional role within the network) with functional integration (there are many pathways for interactions among the elements; Fig. 4A). In very rough terms, this kind of architecture is characteristic of the mammalian corticothalamic system: different parts of the cerebral cortex are specialized for different functions, yet a vast network of connections allows these parts to interact profusely. And indeed, as much neurological evidence indicates (Posner and Plum, 2007), the corticothalamic system is precisely the part of the brain that cannot be severely impaired without loss of consciousness.

Conversely, Φ is low for systems that are made up of small, quasi-independent modules (Fig. 4B; Tononi, 2004; Balduzzi and Tononi, 2008). This may be why the cerebellum, despite its large number of neurons, does not contribute much to consciousness: its synaptic organization is such that individual patches of cerebellar cortex tend to be activated independently of one another, with little interaction between distant patches (Bower, 2002).

Computer simulations also show that units along multiple, segregated incoming or outgoing pathways are not incorporated within the repertoire of the main complex (Fig. 4C; Tononi, 2004; Balduzzi and Tononi, 2008). This may be why neural activity in afferent pathways (perhaps as far as V1), though crucial for triggering this or that conscious experience, does not contribute directly to conscious experience; nor does activity in efferent pathways (perhaps starting with primary motor cortex), though it is crucial for reporting each different experience.

The addition of many parallel cycles also generally does not change the composition of the main complex, although Φ values can be altered (Fig. 4D). Instead, cortical and subcortical cycles or loops implement specialized subroutines that are capable of influencing the states of the main corticothalamic complex without joining it. Such informationally insulated cortico-subcortical loops could constitute the neural substrates for many unconscious processes that can affect and be affected by conscious experience (Baars, 1988; Tononi, 2004), such as those that enable object recognition, language parsing, or translating our vague intentions into the right words.

At this stage, it is hard to say precisely which cortical circuits may work as a large complex of high Φ, and which instead may remain informationally insulated. Does the dense mesial connectivity revealed by diffusion spectral imaging (Hagmann et al., 2008) constitute the “backbone” of a corticothalamic main complex? Do parallel loops through basal ganglia implement informationally insulated subroutines? Are primary sensory cortices organized like massive afferent pathways to a main complex higher up in the cortical hierarchy (Koch, 2004)? Is much of prefrontal cortex organized like a massive efferent pathway? Do certain cortical areas, such as those belonging to the dorsal visual stream, remain partly segregated from the main complex? Unfortunately, answering these questions and prop-
TONONI INFORMATION GENERATED BY THE SYSTEM A P A’ P 1 1/4 1 3 actual: p(X (mech,x )) 1 3 0 1 actual: p(X (mech,x )) 0 1 2 4 2 4 1/16 1/16 potential: p(X0(maxH)) potential: p(X (maxH)) ei(X(mech,x1)) = 2 bits ei(X(mech,x )) = 4 bits 0 1 INFORMATION GENERATED BY THE PARTS B aM bM B’ aM bM P P P P 1/2 1/2 2/3 3/8 1 3 a b 1 3 a b p( M (mech,µ )) p( M (mech,µ )) p( M (mech,µ )) p( M (mech,µ )) 0 1 0 1 0 1 0 1 2 4 2 4 1/4 1/4 1/4 1/4 aM bM aM bM a b a b MIP p( M0(maxH)) p( M0(maxH)) MIP p( M (maxH)) p( M0(maxH)) 0 a b a b ei( M(mech,µ ))=1 bit ei( M(mech,µ ))=1 bit ei( M(mech,µ ))=1.1 bits ei( M(mech,µ ))=1 bit 1 1 1 1 INTEGRATED INFORMATION GENERATED BY THE SYSTEM ABOVE AND BEYOND THE PARTS C P C’ P 1 1/4 1 3 p(X (mech,x )) 1 3 0 1 p(X (mech,x )) 0 1 2 4 2 4 1/4 1/4 Πp(kM(mech,µ )) MIP Πp(kM(mech,µ )) MIP K=1,2 0 1 K=1,2 0 1 k k φ(X(mech,x ))=H[p(X (mech,x ))||Πp( M (mech,µ ))]=0 bits φ(X(mech,x ))=H[p(X (mech,x ))||Πp( M (mech,µ ))]=2 bits 1 0 1 K=1,2 0 1 1 0 1 K=1,2 0 1 Figure 2. Integrated information. Left-hand side: two photodiodes in a digital camera. (A) Information generated by the system as a whole. The system as a whole generates 2 bits of effective information by specifying that n and n must have been on. (B) Information generated by the parts. The minimum information 1 3 partition (MIP) is the decomposition of a system into (minimal) parts, that is, the decomposition that leaves the least information unaccounted for. Here the parts are two photodiodes. (C) The information generated by the system as a whole is completely accounted for by the information generated by its parts. In this case, the actual repertoire of the whole is identical to the combined actual repertoires of the parts (the product of their This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). 
CONSCIOUSNESS AS INTEGRATED INFORMATION 223 erly testing the predictions of the theory requires a much The most common example of a marked change in the better understanding of cortical neuroanatomy than is cur- level of experience is the fading of consciousness that rently available. occurs during certain periods of sleep. Subjects awakened in Other simulations show that the effects of cortical dis- deep NREM (non–rapid eye movement) sleep, especially connections are readily captured in terms of integrated early in the night, often report that they were not aware of information (Tononi, 2004): a “callosal” cut produces, out themselves or of anything else, though cortical and thalamic of a large complex corresponding to the connected cortico- neurons remain active. Awakened at other times, mainly thalamic system, two separate complexes, in line with many during REM sleep or during lighter periods of NREM sleep studies of split-brain patients (Gazzaniga, 2005). However, later in the night, they report dreams characterized by vivid because there is great redundancy between the two hemi- images (Hobson et al., 2000). From the perspective of spheres, their  value is not greatly reduced compared to integrated information, a reduction of consciousness during when they form a single complex. Functional disconnec- early sleep would be consistent with the bistability of cor- tions may also lead to a restriction of the neural substrate of tical circuits during deep NREM sleep. Due to changes in consciousness, as is seen in neurological neglect phenom- intrinsic and synaptic conductances triggered by neuro- ena, in psychiatric conversion and dissociative disorders, modulatory changes (e.g., low acetylcholine), cortical neu- and possibly during dreaming and hypnosis. 
It is also likely rons cannot sustain firing for more than a few hundred that certain attentional phenomena may correspond to milliseconds and invariably enter a hyperpolarized down- changes in the composition of the main complex underlying state. Shortly afterward, they inevitably return to a depolar- consciousness (Koch and Tsuchiya, 2007). The attentional ized up-state (Steriade et al., 2001). Indeed, computer sim- 5 blink, where a fixed sensory input may at times make it to ulations show that values of  are low in systems with such consciousness and at times not, may also be due to changes bistable dynamics (Fig. 4F, Balduzzi and Tononi, 2008). in functional connectivity: access to the main corticotha- Consistent with these observations, studies using TMS, a lamic complex may be enabled or not based on dynamics technique for stimulating the brain non-invasively, in con- intrinsic to the complex (Dehaene et al., 2003). Similarly, junction with high-density EEG, show that early NREM binocular rivalry6 - sleep is associated either with a breakdown of the effective may be related, at least in part, to dy namic changes in the composition of the main corticotha- connectivity among cortical areas, and thereby with a loss of lamic complex caused by transient changes in functional integration (Massimini et al., 2005, 2007), or with a stereo- connectivity. Computer simulations confirm that functional typical global response suggestive of a loss of repertoire and disconnection can reduce the size of a complex and reduce thus of information (Massimini et al., 2007). Similar its capacity to integrate information (Tononi, 2004). While changes are seen in animal studies of anesthesia (Alkire et it is not easy to determine, at present, whether a particular al., 2008). 
group of neurons is excluded from the main complex Finally, consciousness not only requires a neural sub- because of hard-wired anatomical constraints or is tran- strate with appropriate anatomical structure and appropriate siently disconnected due to functional changes, the set of physiological parameters, it also needs time (Bachmann, elements underlying consciousness is not static, but form 2000). The theory predicts that the time requirement for the a“dynamic complex”or“dynamic core” (Tononi and generation of conscious experience in the brain emerges Edelman, 1998). directly from the time requirements for the build-up of an Computer simulations also indicate that the capacity to integrated repertoire among the elements of the corticotha- integrate information is reduced if neural activity is ex- lamic main complex so that discriminations can be highly tremely high and near-synchronous, due to a dramatic de- informative (Tononi, 2004; Balduzzi and Tononi, unpubl.). crease in the repertoire of discriminable states (Fig. 4E; To give an obvious example, if one were to perturb half of Balduzzi and Tononi, 2008). This reduction in degrees of the elements of the main complex for less than a millisec- freedom could be the reason that consciousness is reduced ond, no perturbations would produce any effect on the other or eliminated in absence seizure (petit mal) and other con- half within this time window, and  would be zero. After, ditions during which neural activity is both high and syn- say, 100 ms, however, there is enough time for differential chronous (Blumenfeld and Taylor, 2003). effects to be manifested, and  should grow. respective probability distributions), so that relative entropy is zero. The system generates no information above and beyond the parts, so it cannotbe considered a single entity. Right-hand side: an integrated system. Elements in the system are on if they receive two or more spikes. 
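As a rough numerical check of the left-hand side of Figure 2, the sketch below computes effective information and φ as relative entropies. It is only an illustration under stated assumptions: the helper names `relative_entropy` and `product` are hypothetical, the repertoire of each photodiode is taken to be (0, 0, 1/2, 1/2) over the states (00, 01, 10, 11) as described in the text, and the partition into the two photodiodes is assumed rather than searched for.

```python
from math import log2

def relative_entropy(p, q):
    """Relative entropy H[p || q] in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def product(a, b):
    """Joint repertoire of two independent parts (flattened outer product)."""
    return [x * y for x in a for y in b]

# One photodiode (sensor + detector): the detector being on implies the
# sensor was on, so prior states (00, 01, 10, 11) get (0, 0, 1/2, 1/2).
part_actual = [0.0, 0.0, 0.5, 0.5]
part_potential = [0.25] * 4            # maximum entropy repertoire

# Two photodiodes with no connections between them (assumed partition).
whole_actual = product(part_actual, part_actual)
whole_potential = [1 / 16] * 16

ei_part = relative_entropy(part_actual, part_potential)
ei_whole = relative_entropy(whole_actual, whole_potential)
phi = relative_entropy(whole_actual, product(part_actual, part_actual))

# Right-hand side, whole system only: a unique prior state is specified.
unique_prior = [1.0] + [0.0] * 15
ei_integrated_whole = relative_entropy(unique_prior, whole_potential)

print(ei_part, ei_whole, phi, ei_integrated_whole)  # 1.0 2.0 0.0 4.0
```

Under these assumptions the numbers agree with the values reported for Figure 2: 1 bit per photodiode, 2 bits for the pair, and φ = 0 because the repertoire of the whole equals the product of the repertoires of the parts. The φ = 2 bits of the integrated system is not reproduced here, since that would require the part repertoires, which the text does not list.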
The system is in state x1 = 1000. (A′) The mechanism specifies a unique prior state that can cause state x1, so the system generates 4 bits of effective information. All other initial states are ruled out, since they cause different outputs. (B′) Effective information generated by the two minimal parts, considered as systems in their own right. External inputs are treated as extrinsic noise. (C′) Integrated information is information generated by the whole (black arrows) over and above the parts (gray arrows). In this case, the actual repertoire of the whole is different from the combined actual repertoires of the parts, and the relative entropy is 2 bits. The system generates information above and beyond the parts, so it can be considered a single entity (a complex).

[Figure 3 graphic not reproduced. Labels legible in the graphic: Φ(b) = 2, Φ(s) = 1, Φ(x) = 1, Φ(a) = 3.]

Figure 3. Complexes. In this system, the mechanism is that elements fire in response to an odd number of spikes on their afferent connections (links without arrows are bidirectional connections). Analyzing the system in terms of integrated information shows that the system constitutes a complex (x, light gray) that contains three smaller complexes (s, a, b, in different shades of gray). Observe that (i) complexes can overlap; (ii) a complex can interact causally with elements not part of it; (iii) groups of elements with identical architectures (a and b) generate different amounts of integrated information, depending on their ports-in and ports-out.

The Quality of Consciousness: Characterizing Informational Relationships

If the amount of integrated information generated by different brain structures (or by the same structure functioning in different ways) can in principle account for changes in the level of consciousness, what is responsible for the quality of each particular experience? What determines that colors look the way they do and are different from the way music sounds? Once again, empirical evidence indicates that different qualities of consciousness must be contributed by different cortical areas. Thus, damage to certain parts of the cerebral cortex forever eliminates our ability to experience color (whether perceived, imagined, remembered, or dreamt), whereas damage to other parts selectively eliminates our ability to experience visual shapes. There is obviously something about different parts of the cortex that can account for their different contribution to the quality of experience. What is this something?

The IIT claims that, just as the quantity of consciousness generated by a complex of elements is determined by the amount of integrated information it generates above and beyond its parts, the quality of consciousness is determined by the set of all the informational relationships its mechanisms generate. That is, how integrated information is generated within a complex determines not only the amount of consciousness it has, but also what kind of consciousness.

Consider again the photodiode thought experiment. As I discussed before, when the photodiode reacts to light, it can only tell that things are one way rather than another way. On the other hand, when we see “light,” we discriminate against many more states of affairs, and thus generate much more information. In fact, I argued that “light” means what it means and becomes conscious “light” by virtue of being not just the opposite of dark, but also different from any color, any shape, any combination of colors and shapes, any frame of every possible movie, any sound, smell, thought, and so on.

What needs to be emphasized at this point is that discriminating “light” against all these alternatives implies not just picking one thing out of “everything else” (an undifferentiated bunch), but distinguishing at once, in a specific way, between each and every alternative. Consider a very simple example: a binary counter capable of discriminating among the four numbers: 00, 01, 10, 11. When the counter says binary “3,” it is not just discriminating 11 from everything else as an undifferentiated bunch, otherwise it would not be a counter, but a 11 detector. To be a counter, the system must be able to tell 11 apart from 00 as well as from 10 as well as from 01 in different, specific ways. It does so, of course, by making choices through its mechanisms; for example: is this the first or the second digit? Is it a 0 or a 1? Each mechanism adds its specific contribution to the discrimination they perform together. Similarly, when we see light, mechanisms in our brain are not just specifying “light” with respect to a bunch of undifferentiated alternatives. Rather, these mechanisms are specifying that light is what it is by virtue of being different, in this and that specific way, from every other alternative: from dark to any color, to any shape, movie frame, sound or smell, and so on.

In short, generating a large amount of integrated information entails having a highly structured set of mechanisms that allow us to make many nested discriminations (choices) as a single entity. According to the IIT, these mechanisms working together generate integrated information by specifying a set of informational relationships that completely and univocally determine the quality of experience.

Experience as a shape in qualia space

To see how this intuition can be given a mathematical formulation, let us consider again a complex of n binary elements X(mech,x1) having a particular mechanism and being in a particular state. The mechanism of the system is implemented by a set of connections X^conn among its elements. Let us now suppose that each possible state of the system constitutes an axis or dimension of a qualia space (Q) having 2^n dimensions. Each axis is labeled with the probability p for that state, going from 0 to 1, so that a repertoire (i.e., a probability distribution on the possible states of the complex) corresponds to a point in Q (Fig. 5). Let us now examine how the connections among the elements of the complex specify probability distributions; that is, how a set of mechanisms specifies a set of informa-

[Figure 4 graphic not reproduced. Top row, INTEGRATED INFORMATION & NEUROANATOMY: (A) corticothalamic system; (B) cerebellar system; (C) afferent pathways; (D) cortical-subcortical loops. Bottom row, INTEGRATED INFORMATION & NEUROPHYSIOLOGY: (E) balanced and epileptic systems, Φ plotted against the number of elements firing; (F) sleeping system, Φ and % activity plotted against time (ticks).]

Figure 4. Relating integrated information to neuroanatomy and neurophysiology. Elements fire in response to two or more spikes (except elements targeted by a single connection, which copy their input); links without arrows are bidirectional. (A) Computing φ in simple models of neuroanatomy suggests that a functionally integrated and functionally specialized network, like the corticothalamic system, is well suited to generating high values of φ. (B, C, D) Architectures modeled on the cerebellum, afferent pathways, and cortical-subcortical loops give rise to complexes containing more elements, but with reduced φ compared to the main corticothalamic complex. (E) φ peaks in balanced states; if too many or too few elements are active, φ collapses.
(F) In a bistable (“sleeping”) system (same as in (E)), φ collapses when the number of firing elements (dotted line) is too high (high % activity), remains low during the “DOWN” state (zero % activity), and only recovers at the onset of the next “UP” state.

[Figure 5 graphic not reproduced. Panels A and B show the quale of the four-element system, with the 16 system states (0000 to 1111) as axes. q-arrow lengths legible in the graphic: 0.18, 0.45, 0.5, 1, 1.1, 1.4, 2, and 2.1 bits.]

Figure 5. Qualia. (A) The system in the inset is the same as in Fig. 2A′. Qualia (Q)-space for a system of four units is 16-dimensional (one axis per possible state; since axes are displayed flattened onto the page, and points and arrows cannot be properly drawn in 2 dimensions, their position and direction is for illustration only). In state x1 = 1000, the complex generates a quale or shape in Q, as follows. The maximum entropy distribution (the “bottom” of the quale, indicated by a black square) is a point assigning equal probability (p = 1/16 = 0.0625) to all 16 system states, close to the origin of the 16-dimensional space. Engaging a single connection “r” between elements 4 and 3 (c43) specifies that, since element n3 has not fired, the probability of element n4 having fired in the previous time step is reduced to p = 0.25 compared to its maximum entropy value (p = 0.5), while the probability of n4 not having fired is increased to p = 0.75. The actual probability distribution of the 16 system states is modified accordingly. Thus, the connection r “sharpens” the maximum entropy distribution into an actual distribution, which is another point in Q. The q-arrow linking the two distributions geometrically realizes the informational relationship specified by the connection. The length (divergence) of the q-arrow expresses how much the connection specifies the distribution (the effective information it generates or relative entropy between the two distributions); the direction in Q expresses the particular way in which the connection specifies the distribution. (B) Engaging more connections further sharpens the actual repertoire, specifying new points in Q and the corresponding q-arrows. The figure shows 16 out of the 399 points in the quale, generated by combinations of the four sets of connections. The probability distributions depicted around the quale are representative of the repertoires generated by two q-edges formed by q-arrows that engage the four sets of connections in two different orders (the two representative q-edges start at bottom left; one goes clockwise, the other counter-clockwise; black connections represent those whose contribution is being evaluated; gray connections those whose contribution has already been considered and which provides the context on top of which the q-arrow generated by a black connection begins). Repertoires corresponding to certain points of the quale are shown alongside, as in previous figures. Effective information values (in bits) of the q-arrows in the two q-edges are shown alongside. Together, the q-edges enclose a shape, the quale, which completely specifies the quality of the experience.

tional relationships. First, consider the complex with all connections among its elements disengaged, thus discounting any causal interactions (Fig. 5A). In the absence of a mechanism, the state x1 provides no information about the system’s previous state: from the perspective of a system without causal interactions, all previous states are equally likely, corresponding to the maximum entropy or uniform distribution (the potential repertoire). In Q, this probability distribution is a point projecting onto all axes at p = 1/2^n (probabilities must sum to 1).

Next, consider engaging a single connection (Fig. 5A, the other connections are treated as extrinsic noise). As with the photodiode, the mechanism implemented by that connection and the state the system is in rule out states that could not have caused x1 and increase the actual probability of states that could have caused x1, yielding an actual repertoire. In Q, the actual repertoire specified by this connection corresponds to a point projecting onto higher p values on some axes and onto lower p values (or zero) on other axes. Thus, the connection shapes the uniform distribution into a more specific distribution, and thereby generates information (reduces uncertainty). More generally, we can say that the connection specifies an informational relationship, that is, a relationship between two probability distributions. This informational relationship can be represented as an arrow in Q (q-arrow) that goes from the point corresponding to the maximum entropy distribution (p = 1/2^n) to the point corresponding to the actual repertoire specified by that connection. The length (divergence) of the q-arrow expresses how much the connection specifies the distribution (the effective information it generates, i.e., the relative entropy between the two distributions); the direction in Q expresses the particular way in which the connection specifies the distribution, i.e., a change in position in Q. Similarly, if one considers all other connections taken in isolation, each will specify another q-arrow of a certain length, pointing in a different direction.

Next, consider all possible combinations of connections (Fig. 5B). For instance, consider adding the contribution of the second connection to that of the first. Together, the first and second connections specify another actual repertoire (another point in Q-space) and thereby generate more information than either connection alone as they shape the uniform distribution into a more specific distribution. To the tip of the q-arrow specified by the first connection, one can now add a q-arrow bent in the direction contributed by the second connection, forming an “edge” of two q-arrows in Q-space (the same final point is reached by adding the q-arrow due to the first connection on top of the q-arrow specified by the second one). Each combination of connections therefore specifies a q-edge made of concatenated q-arrows (component q-arrows). In general, the more connections one considers together, the more the actual repertoire will take shape and differ from the uniform (potential) distribution.

Finally, consider the joint contribution of all connections of the complex (Fig. 5B). As was discussed above, all connections together specify the actual repertoire of the whole. This is the point where all q-edges converge. Together, these q-edges in Q delimit a quale, that is, a shape in Q, a kind of 2^n-dimensional solid (technically, in more than three dimensions, the “body” of a polytope). The bottom of the quale is the maximum entropy distribution, its edges are q-edges made of concatenated q-arrows, and its top is the actual repertoire of the complex as a whole. The shape of this solid (polytope) is specified by all informational relationships that are generated within the complex by the interactions among its elements (the effective information matrix; Tononi, 2004).⁷ Note that the same complex of elements, endowed with the same mechanism, will typically generate a different quale or shape in Q depending on the particular state it is in.

It is worth considering briefly a few relevant properties of informational relationships or q-arrows. First, informational relationships are context-dependent (Fig. 6), in the following sense. A context can be any point in Q corresponding to the actual repertoire generated by a particular subset of connections. It can be shown that the q-arrow generated by considering the effects of an additional connection (how it further sharpens the actual repertoire) can change in both magnitude and direction depending on the context in which it is considered. In Figure 6, when considered in isolation (null context), the connection “r” between elements 4 and 3 generates a short q-arrow (0.18 bits) pointing in a certain direction. When considered in the full context provided by all other connections (not-r or ¬r), the same connection “r” generates a longer q-arrow (1 bit) pointing in a different direction.

Another property is how removing or adding a set of connections folds or unfolds a quale. The portion of the quale that is generated by a set of connections r (acting in all contexts) is called a q-fold. If we remove connection r from the system, all the q-arrows generated by that connection, in all possible contexts, vanish, so the shape of the quale “folds” along the q-fold specified by that connection. Conversely, when the connection is added to a system, the shape of the quale unfolds.

Another important property of q-arrows is entanglement (γ, Balduzzi and Tononi, unpubl.). A q-arrow is entangled (γ > 0) if the underlying connections considered together generate information above and beyond the information they generate separately (note the analogy with φ). Thus, entanglement characterizes informational relationships (q-arrows) that are more than the sum of their component relationships (component q-arrows, Fig. 6B), just like φ characterizes systems that are more than the sum of their parts. Geometrically, entanglement “warps” the shape of the quale away from a simple hypercube (where q-arrows are orthogonal to each other). Entanglement has several relevant consequences (Balduzzi and Tononi, unpubl.). For example, an entangled q-arrow can be said to specify a concept, in that it groups together certain states of affairs in a way that cannot be decomposed into the mere sum of simpler groupings (see also Feldman, 2003). Moreover, just as φ can be used to identify complexes, entanglement γ can be used to identify modes. By analogy with complexes, modes are sets of q-arrows that are more densely entangled than surrounding q-arrows: they can be considered as clusters of informational relationships constituting distinctive “sub-shapes” in Q (see Fig. 8). By analogy with a main complex, an elementary mode is such that its component q-arrows have strictly lower γ. As will be briefly discussed below, modes play an important role in understanding the structure of experience.

Some properties of qualia space

What is the relevance of these constructs to understanding the quality of consciousness? It is not easy to become familiar with a complicated multidimensional space nearly impossible to draw, so it may be useful to resort to some metaphors. I have argued that the set of informational rela-

[Figure 6 graphic not reproduced. (A) The connection r in the NULL CONTEXT (q-arrow of 0.18 bits) and in the FULL CONTEXT ¬r (q-arrow of 1 bit). (B) Entanglement, γ = 0.42 bits.]

Figure 6. Context and entanglement. (A) Context. The same connection (black arrow between elements 3 and 4) considered in two contexts.
At the bottom of the quale (null context, corresponding to the maximum entropy distribution when no other connections are engaged), the connection r generates a q-arrow (called down-set of r, or ↓r) corresponding to 0.18 bits of information pointing up-left in Q. Near the top of the quale (full context, corresponding to the actual distribution specified by all other connections except for r, indicated as ¬r), r generates a q-arrow (called up-set of non-red, or ↑¬r) corresponding to 1 bit of information pointing up-right in Q. (B) Entanglement. Left: the q-arrow generated by the connection “r” and the q-arrow generated by the complementary connections “¬r” at the bottom of the quale (null context). Right: The product of the two q-arrows (corresponding to independence between the informational relationships specified by the two sets of connections) would be a point corresponding to the vertex of the dotted parallelogram opposite to the bottom. However, “r” and “¬r” jointly specify the actual distribution corresponding to the top of the quale (black triangle). The distance between the probability distribution in Q specified jointly by two sets of connections and their product distribution (zigzag arrow) is the entanglement between the two corresponding q-arrows (how much the composite q-arrow specifies above and beyond its component q-arrows).

tionships in Q generated by the mechanisms of a complex in a given state (q-arrows between repertoires) specify a shape in Q (a quale). Perhaps the most important notion emerging from this approach is that an experience is a shape in Q. According to the IIT, this shape completely and univocally specifies the quality of experience.⁸

It follows that different experiences are, literally, different shapes in Q. For example, when the same system is in a different state (firing pattern), it will typically generate a different shape or quale (even for the same value of Φ). Importantly, if an element turns on, it generates information and meaning not by signifying something (say “red”), which in isolation it cannot, but by changing the shape of the quale. Moreover, experiences are similar if their shape is similar, and different to the extent that their shapes are different. This means that phenomenological similarities and differences can in principle be quantified as similarities and differences between shapes. The set of all shapes generated by the same system in different states provides a geometrical depiction of all its possible experiences.⁹

Note that a quale can only be specified by a mechanism and a particular state: it does not make sense to ask about the quale generated by a mechanism in isolation, or by a state (firing pattern) in isolation. A consequence is that two different systems in the same state can generate two different experiences (i.e., two different shapes). As an extreme example, a system that was to copy one by one the state of the neurons in a human brain, but had no internal connections of its own, would generate no consciousness and no quale (Tononi, 2004; Balduzzi and Tononi, 2008).

By the same token, it is possible that two different systems generate the same experience (i.e., the same shape). For example, consider again the photodiode, whose mechanism determines that if the current in the sensor exceeds a threshold, the detector turns on. This simple causal interaction is all there is, and when the photodiode turns on it merely specifies an actual repertoire where states (00,01,10,11) have, respectively, probability (0,0,1/2,1/2). This corresponds in Q to a single q-arrow, one bit long, going from the potential, maximum entropy repertoire (1/4,1/4,1/4,1/4) to (0,0,1/2,1/2). Now imagine the light sensor is substituted by a temperature sensor with the same threshold and dynamic range: we have a thermistor rather than a photodiode. Although the physical device has changed, according to the IIT the experience, minimal as it is, has to be the same, since the informational relationship that is generated by the two devices is identical. Similarly, an AND gate when silent and an OR gate when firing also generate the same shape in Q, and therefore must generate the same minimal experience (it can be shown that the two shapes are isomorphic, that is, have the same symmetries; Balduzzi and Tononi, unpubl.). In other words, different “physical” systems (possibly in different states) generate the same experience if the shape of the informational relationships they specify is the same. On the other hand, more complex networks of causal interactions are likely to create highly idiosyncratic shapes, so systems of high Φ are unlikely to generate exactly identical experiences.

If experience is integrated information, it follows that only the informational relationships within a complex (those that give the quale its shape) contribute to experience. Conversely, the informational relationships that exist outside the main complex (for example, those involving sensory afferents or cortico-subcortical loops implementing informationally insulated subroutines) do not make it into the quale, and therefore do not contribute either to the quantity or to the quality of consciousness.

Note also that informational relationships, and thus the shape of the quale, are specified both by the elements that are firing and by those that are not. This is natural considering that an element that does not fire will typically rule out some previous states of affairs (those that would have made it fire), and thereby it will contribute to specifying the actual repertoire. Indeed, many silent elements can rule out, in combination, a vast number of previous states and thus be highly informative. From a neurophysiological point of view, such a corollary may lead to counterintuitive predictions. For example, take elements (neurons) within the main complex that happen to be silent when one is having a particular experience. If one were to temporarily disable these neurons (e.g., make them incapable of firing), the prediction is that, though the system state (firing pattern) would remain the same, the quantity and quality of experience would change (Tononi, 2004; Balduzzi and Tononi, 2008).

It is important to see what Φ corresponds to in this representation (Fig. 7A). The minimum information partition (MIP) is just another point in Q: the one specified by the connections within the minimal parts only, leaving out the contribution of the connections among the parts. This point is the actual repertoire corresponding to the product of the actual repertoires of the parts taken independently. Φ corresponds then to an arrow linking this point to the top of the solid. In this view, the q-edges leading to the minimum information bipartition provide the natural “base” upon which the solid rests: the informational relationships generated within the parts, upon which are built the informational relationships among the parts. The Φ-arrow can then be thought of as the height of the solid, or rather, to employ a metaphor, as the highest pole holding up a tent. For example, if Φ is zero (say a system decomposes into two independent complexes as in Fig. 7B), the tent corresponding to the system is flat (it has no shape), since the actual repertoire of the system collapses onto its base (MIP). This is precisely what it means when Φ = 0. Conversely, the higher the Φ value of a complex (the higher the tent or solid), the more “breathing room” there is for the various informational relationships within the complex (the edges of the solid or the seams of the tent) to express themselves.

In summary, and not very rigorously, the generation of an experience can be thought of as the erection of a tent with a very complex structure: the edges are the tension lines generated by each subset of connections (the respective q-arrow or informational relationship). The tent literally takes shape when the connections are engaged and specify actual repertoires. Perhaps an even more daring metaphor would be the following: whenever the mechanisms of a complex unfold and specify informational relationships, the flower of experience blooms.

Figure 7. The tent analogy. (A) The system of Fig. 2A / Fig. 5. (B) The q-edges converging on the minimum information partition of the system (MIP) form the natural base on which the complex rests, depicted as a “tent.” The informational relationships among the parts are built on top of the informational relationships generated independently within the minimal parts. From this perspective the Φ q-arrow (in black) is simply the tent pole holding the quale up above its base; the length (divergence) of the pole expresses the breathing room in the system. The thick gray q-arrow represents the information generated by the entire system. (C) The system of Fig. 2A. The quale (not) generated by the two photodiodes considered as a single system. As shown in Fig. 2A, the system reduces to two independent parts, so it does not exist as a single entity. (D) Note that in this case the quale reduces to the MIP: the “tent” collapses onto its base, so there is no breathing room for informational relationships within the system. The quale generated by each part considered in isolation does exist, corresponding to an identical q-arrow for each couple.

From phenomenology to geometry

The notions just sketched aim at providing a framework for translating the seemingly ineffable qualitative properties of phenomenology into the language of mathematics, specifically, the language of informational relationships (q-arrows) in Q. Ideally, when sufficiently developed, such language should permit the geometric characterization of phenomenological properties generated by the human brain. In principle, it should also allow us to characterize the phenomenology of other systems. After all, in this framework the experience of a bat echo-locating in a cave is just another shape in Q and, at least in principle, shapes can be compared objectively.

At present, due to the combinatorial problems posed by deriving the shape of the quale produced by systems of just a few elements, and to the additional difficulties posed by representing such high-dimensional objects, the best one can hope for is to show that the language of Q can capture, in principle, some of the basic distinctions that can be made in our own phenomenology, as well as some key neuropsychological observations (Balduzzi and Tononi, unpubl.). A short list includes the following:

(i) Experience is divided into modalities, like the classic senses of sight, hearing, touch, smell, and taste (and several others), as well as submodalities, like visual color and visual shape. What do these broad distinctions correspond to in Q? According to the IIT, modalities are sets of densely entangled q-arrows (modes) that form distinct sub-shapes in the quale; submodalities are subsets of even more densely entangled q-arrows (sub-modes) within a larger mode, thus forming distinct sub-sub-shapes (Fig. 8). As a two-dimensional analog, imagine a given multimodal experience as the shape of the three-continent complex constituted by Europe, Asia, and Africa. The three continents are distinct sub-shapes, yet they are all part of the same landmass, just as modalities are parts of the same consciousness. Moreover, within each continent there are peninsulas (sub-sub-shapes), like Italy in Europe, just as there are submodalities within modalities.

(ii) Some experiences appear to be “elementary,” in that they cannot be further decomposed. A typical example is what philosophers call a “quale” in the narrow sense, say a pure color like red, or a pain, or an itch: it is difficult, if not impossible, to identify any further phenomenological structure within the experience of red. According to the IIT, such elementary experiences correspond to sub-modes that do not contain any more densely entangled sub-sub-modes (elementary modes, Fig. 8).

Figure 8. Modes. Schematic depiction of modes and sub-modes. A mode, indicated by a polygon within the quale (light gray with black border), is a set of q-arrows that are more densely entangled than surrounding q-arrows, and can be considered as clusters of informational relationships constituting distinctive “sub-shapes” in Q. Two different modes could correspond, for example, to the modalities of sight and sound. A sub-mode within a mode is a set of q-arrows that is even more densely entangled (a sub-sub-shape in Q). Color and form could correspond to two sub-modes within the visual mode. The thin black polygon represents an elementary mode, which does not contain more densely entangled q-arrows. Elementary modes could correspond to experiential qualities that cannot be further decomposed, such as the color “red” (qualia in the narrow sense).

(iii) Some experiences are homogeneous and others are composite: for example, a full-field experience of blue, as when watching a cloudless sky, compared to that of a busy market street. In Q, homogeneous experiences translate to a single homogeneous shape, and composite ones into a composite shape with many distinguishable sub-shapes (modes and sub-modes).

(iv) Some experiences are hierarchically organized. Take seeing a face: we see at once that as a whole it is somebody’s face, but we also see that it has parts such as hair, eyes, nose, and mouth, and that those are made in turn of specifically oriented segments. The subjective experience is constructed from informational relationships (q-arrows) that are entangled (not reducible to a product of independent components) across hierarchical levels. For example, informational relationships constituting “face” would be more densely tangled than unnatural combinations such as seen in certain Cubist paintings. The sub-shape of the quale corresponding to the experience of seeing a face is then an overlapping hierarchy of tangled q-arrows, embodying relationships within and across levels.

(v) We recognize intuitively that the way we perceive taste, smell, and maybe color, is organized phenomenologically in a “categorical” manner, quite different from, say, the “topographical” manner in which we perceive space in vision, audition, or touch. According to the IIT, these hard-to-articulate phenomenological differences correspond to different basic sub-shapes in Q, such as 2ⁿ-dimensional grid-like structures and pyramid-like structures, which emerge naturally from the underlying neuroanatomy.

(vi) Some experiences are more alike than others. Blue is certainly different from red (and irreducible to red), but clearly it seems even more different from middle C on the oboe. In the IIT framework, in Q colors correspond to different sub-shapes of the same kind (say pyramids pointing in different directions) and sounds to very different sub-shapes (say tetrahedra). In principle, such subjective similarities and differences can be investigated by employing objective measures of similarity between shapes (e.g., considering the number and kinds of symmetries involved in specifying shapes that are generated in Q by different neuroanatomical circuits).

(vii) Experiences can be refined through learning and changes in connectivity. Suppose one learns to distinguish wine from water, then red wines from whites, then different varietals. Presumably, underlying this phenomenological refinement is a neurobiological refinement: neurons that initially were connected indiscriminately to the same afferents become more specialized and split into sub-groups with partially segregated afferents. This process has a straightforward equivalent in Q: the single q-arrow generated initially by those afferents splits into two or more q-arrows pointing in different directions, and the overall sub-shape of the quale is correspondingly refined.

(viii) Qualia in the narrow sense (elementary modes) exist “at the top of experience” and not at its bottom. Consider the experience of seeing a pure color, such as red. The evidence suggests that the “neural correlate” (Crick and Koch, 2003) of color, including red, is probably a set of neurons and connections in the fusiform gyrus, maybe in area V8 (ideally, neurons in this area are activated whenever a subject sees red and not otherwise, if stimulated trigger the experience of red, and if lesioned abolish the capacity to see red). Certain achromatopsic subjects with dysfunctions in this general area seem to lack the feeling of what it is like to see color, its “coloredness,” including the “redness” of red. They cannot experience, imagine, remember, or even dream of color, though they may talk about it, just as we could talk about echolocation, from a third-person perspective (van Zandvoort et al., 2007). Contrast such subjects, who are otherwise perfectly conscious, with vegetative patients, who are for all intents and purposes unconscious. Some of these patients may show behavioral and neurophysiological evidence for residual function in an isolated brain area (Posner and Plum, 2007). Yet it seems highly unlikely that a vegetative patient with residual activity exclusively in V8 should enjoy the vivid perceptions of color just as we do, while being otherwise unconscious.

The IIT provides a straightforward account for this difference. To see how, consider again Figure 6A: call “r” the connections targeting the “red” neurons in V8 that confer them their selectivity, and non-r (¬r) all the other connections within the main corticothalamic complex. Adding r in isolation at the bottom of Q (null context) yields a small q-arrow (called the down-set of red or ↓r) that points in a direction representing how r by itself shapes the maximum entropy distribution into an actual repertoire. Schematically, this situation resembles that of a vegetative patient with V8 and its afferents intact but the rest of the corticothalamic system destroyed. The shape of the experience or quale reduces to this q-arrow, so its quantity is minimal (Φ for this q-arrow is obviously low) and its quality minimally specified: as we have seen with the photodiode, r by itself cannot specify whether the experience is a color rather than something else such as a shape, whether it is visual or not, sensory or not, and so on.

By contrast, subtract r from the set of all connections, so one is left with ¬r. This “lesion” collapses the q-fold specified by r in all contexts, including the q-arrow, called the up-set of non-red (↑¬r), which starts from the full context provided by all other connections ¬r and reaches the top of the quale.¹⁰ This q-arrow will typically be much longer and point in a different direction than the q-arrow generated by r at the bottom of the quale. This is because, the fuller the context, the more r can shape the actual repertoire. Schematically, removing r from the top resembles the situation of an achromatopsic patient with a selective lesion of V8: the bulk of the experience or quale remains intact (Φ remains high), but a noticeable feature of its shape collapses (the upset of non-red). According to the IIT, the feature of the shape of the quale specified by “the upset of non-red” captures the very quality or “redness” of red.¹¹

It is worth remarking that the last example also shows why specific qualities of consciousness, such as the “redness” of red, while generated by a local mechanism, cannot be reduced to it. If an achromatopsic subject without the r connections lacks precisely the “redness” of red, whereas a vegetative patient with just the r connections is essentially unconscious, then the redness of red cannot map directly to the mechanism implemented by the r connections. However, the redness of red can map nicely onto the informational relationships specified by r, as these change dramatically between the null context (vegetative patient) and the full context (achromatopsic subject).

A Provisional Manifesto

To recapitulate, the IIT claims that the quantity of consciousness is given by the integrated information (Φ) generated by a complex of interacting elements, and its quality by the shape in Q specified by their informational relationships. As I have tried to indicate here, this theoretical framework can account for basic neurobiological and neuropsychological observations. Moreover, the same framework can be extended to begin translating phenomenology into the language of mathematics.

At present, the very notion of a theoretical approach to consciousness may appear far-fetched, yet the nature of the problems posed by a science of consciousness requires a combination of experiment and theory: one could say that theories without experiments are lame, but experiments without theories are blind. For instance, only a theoretical framework can go beyond a provisional list of candidate mechanisms or brain areas and provide a principled explanation of why they may be relevant. Also, only a theory can account, in a coherent manner, for key but puzzling facts about consciousness and the brain, such as the association of consciousness with the corticothalamic but not the cerebellar system, the “unconscious” functioning of many cortico-subcortical circuits, or the fading of consciousness during certain stages of sleep or epilepsy.

A theory should also generate relevant corollaries. For example, the IIT predicts that consciousness depends exclusively on the ability of a system to generate integrated information: whether or not the system is interacting with the environment on the sensory and motor side, it deploys language, capacity for reflection, attention, episodic memory, a sense of space, of the body, and of the self. These are obviously important functions of complex brains and help shape its connectivity. Nevertheless, contrary to some common intuitions, but consistent with the overall neurological evidence, none of these functions seems absolutely necessary for the generation of consciousness “here and now” (Tononi and Laureys, 2008).

Finally, a theory should be able to help in “difficult” cases that challenge our intuition or our standard ways to assess consciousness. For instance, the IIT says that the presence and extent of consciousness can be determined, in principle, also in cases in which we have no verbal report, such as infants or animals, or in neurological conditions such as minimally conscious states, akinetic mutism, psychomotor seizures, and sleepwalking. In practice, of course, measuring Φ accurately in such systems will not be easy, but approximations and informed estimates are certainly conceivable. Whether these and other predictions turn out to be compatible with future clinical and experimental evidence, a coherent theoretical framework should at least help to systematize a number of neuropsychological and neurobiological results that might otherwise seem disparate (Albus et al., 2007).

In the remaining part of this article, I briefly consider some implications of the IIT for the place of experience in our view of the world.

Consciousness as a fundamental property

According to the IIT, consciousness is one and the same thing as integrated information. This identity, which is predicated on the phenomenological thought experiments at the origin of the IIT, has ontological consequences. Consciousness exists beyond any doubt (indeed, it is the only thing whose existence is beyond doubt). If consciousness is integrated information, then integrated information exists. Moreover, according to the IIT, it exists as a fundamental quantity, as fundamental as mass, charge, or energy. As long as there is a functional mechanism in a certain state, it must exist ipso facto as integrated information; specifically, it exists as an experience of a certain quality (the shape of the quale it generates) and quantity (its “height”¹²).

If one accepts these premises, a useful way of thinking about consciousness as a fundamental property is as follows. We are by now used to considering the universe as a vast empty space that contains enormous conglomerations of mass, charge, and energy: giant bright entities (where brightness reflects energy or mass) from planets to stars to galaxies. In this view (that is, in terms of mass, charge, or energy), each of us constitutes an extremely small, dim portion of what exists; indeed, hardly more than a speck of dust.

However, if consciousness (i.e., integrated information) exists as a fundamental property, an equally valid view of the universe is this: a vast empty space that contains mostly nothing, and occasionally just specks of integrated information (Φ), mere dust indeed, even there where the mass-charge-energy perspective reveals huge conglomerates. On the other hand, one small corner of the known universe contains a remarkable concentration of extremely bright entities (where brightness reflects high Φ), orders of magnitude brighter than anything around them. Each bright “Φ-star” is the main complex of an individual human being (and most likely, of individual animals).¹³ I argue that such Φ-centric view is at least as valid as that of a universe dominated by mass, charge, and energy. In fact, it may be more valid, since to be highly conscious (to have high Φ) implies that there is something it is like to be you, whereas if you just have high mass, charge, or energy, there may be little or nothing it is like to be you. From this standpoint, it would seem that entities with high Φ exist in a stronger sense than entities of high mass.

Intriguingly, it has been suggested, from a different perspective, that information may be, in an ontological sense, prior to conventional physical properties (the it from bit perspective; Wheeler and Ford, 1998). This may well be true but, according to the IIT, only if one substitutes “integrated information” for information.¹⁴ Information that is not integrated, I have argued, is not associated with experience, and thus does not really exist as such: it can only be given a vicarious existence by a conscious observer who exploits it to achieve certain discriminations within his main complex. Indeed, the same “information” may produce very different consequences in different observers, so it only exists through them but not in and of itself.

Consciousness as an intrinsic property

Consciousness, as a fundamental property, is also an intrinsic property. This simply means that a complex generating integrated information is conscious in a certain way regardless of any extrinsic perspective. This point is especially relevant if we consider how difficult it is to measure the quantity of integrated information, not to mention the shape of a quale, for any realistic system. If we want to know what are the borders of a certain complex, the amount of integrated information it generates, the set of informational relationships it specifies, and the spatio-temporal grain at which Φ is highest (see below), we need to perform a prohibitively large set of computations. One would need to perturb a system in all possible ways and use Bayes’ rule to keep track of the probabilities of the previous states given the current output, and then calculate the relative entropy between the potential and the actual distributions. Moreover, this must be done for all possible subsets of a system (to find complexes) and for all combinations of connections (to obtain the shape of each quale). Finally, the calculations must be repeated at multiple spatial and temporal scales to determine what is the optimal grain size, in space and time, for generating integrated information (see below). It goes without saying that these calculations are presently unfeasible for anything but the smallest systems. It also goes without saying that a complex itself cannot and need not go through such calculations: it is intrinsically conscious in this or that way. In fact, it needs as little to “calculate” all the relevant probability distributions to generate consciousness and specify its quality, as a body of a certain mass needs to “calculate” how much gravitational mass it has in order to attract other bodies.

Another way to express this aspect of integrated information is to say that consciousness can be characterized extrinsically as a disposition or potentiality, in this case as the potential discriminations that a complex can do on its possible states, through all combinations of its mechanisms, yet from an intrinsic perspective it is undeniably actual. While this may sound strange, fundamental quantities associated with physical systems can also be characterized as dispositions or potentialities, yet have actual effects. For example, mass can be characterized as a potentiality (say the resistance that a body would offer to acceleration by a force), yet it exerts undeniably actual effects, such as actually attracting other masses if these turn out to be there. Similarly, a mechanism’s potential for integrated information becomes actual by virtue of the fact that the mechanism is actually in a particular state. Paraphrasing E. M. Forster, one could express this fact as follows: How do I know what I am till I see what I do?

Being and describing

According to the IIT, a full description of the set of informational relationships generated by a complex at a given time should say all there is to say about the experience it is having at that time: nothing else needs to be added.¹⁷ Nevertheless, the IIT also implies that to be conscious (say, to have a vivid experience of pure red) one needs to be a complex of high Φ; there is no other way. Obviously, although a full description can provide understanding of what experience is and how it can be generated, it cannot substitute for it: being is not describing. This point should be uncontroversial, but it is worth mentioning because of a well-known argument against a scientific explanation of consciousness, best exemplified by a thought experiment involving Mary, a neuroscientist in the 23rd century (Jackson, 1986). Mary knows everything about the brain processes responsible for color vision, but has lived her whole life in a black-and-white room and has never seen any color.¹⁸ The argument goes that, despite her complete knowledge of color vision, Mary does not know what it is like to experience a color: it follows that there is some knowledge about conscious experience that cannot be deduced from knowledge about brain processes. The argument loses its strength the moment one realizes that consciousness is a way of being rather than a way of knowing. According to the IIT, being implies “knowing” from the inside, in the sense of generating information about one’s previous state. Describing, instead, implies “knowing” from the outside. This conclusion is in no way surprising: just consider that though we understand quite well how energy is generated by atomic fission, unless atomic fission occurs, no energy is generated: no amount of description will substitute.

Observer pitfalls: minimal elements and minimal interactions

Because integrated information is an intrinsic property, it is especially important that one avoid the observer fallacy in estimating how much of it is generated by a system. Consider the system in Figure 9A (top). An observer might assume that the system is made up of two units, each with a repertoire of 2ⁿ states. If the lower unit copies the output of the upper unit, then this two-unit system generates n bits of integrated information; it would seem trivial to implement systems with arbitrarily large values of Φ. But how is the system really built? Figure 9A (bottom) shows a possible architecture: each “unit” is actually not a unit at all, but it contains n binary elements. Each upper element is then connected to the corresponding lower element. Seen this way, it becomes obvious that the system is not a complex generating n bits of integrated information, but rather a collection of independent couples (or photodiodes) each generating 1 bit of integrated information, just as in Figure 2. Note that, if we try to “integrate” the couples by adding horizontal connections between elements, we reduce the available information. Thus, integrated information has to be evaluated from the perspective of the system itself, starting from its elementary, indivisible components (see also the next point), and not by arbitrarily imposing “units” from the perspective of an observer.

Figure 9B (top) illustrates a similar problem with respect to elementary operations. The system contains n + 1 binary components, with a single component receiving inputs from the other n; the component fires if all n inputs are active. The minimum information partition is the total partition P = {X} and Φ = n bits when the top component is firing, since it uniquely specifies the prior state of the other n components. Increasing the number of inputs feeding into the top component while maintaining the same rule (fire if and only if all inputs are active) seems to provide a method for constructing systems with high Φ¹⁵ using binary components and a basic architecture that is certainly easy to describe. The difficulty once again lies in physically implementing a component that processes n inputs at a single point in space and at a single instant in time for large n. Figure 9B (bottom) shows a possible internal architecture of the component, constructed using a hierarchy of logical AND-gates. When analyzed at this level, it is apparent that the system generates 1 bit of integrated information regardless of the number of inputs that feed into the top component, since the bipartition framed by the dashed cut forms a bottleneck. As in the previous example, integrated information has to be evaluated from the perspective of the system itself, based on the elementary causal interactions its elements can perform, and not by arbitrarily imposing “rules” from the perspective of an observer with no regard to their actual implementation. It is well known that all computations (or Boolean functions) can be performed by elementary logical gates such as NOR or NAND gates acting on elementary binary elements. In principle, then, a system should be decomposed into minimal elements and minimal interactions (as elementary as they come in terms of physical implementation) before any pronouncement is made on its capacity to generate integrated information and thereby consciousness.¹⁶

Consciousness and the spatiotemporal grain of reality

An outstanding issue is finding a principled way to determine the proper spatial and temporal scale to measure informational relationships and integrated information. What are the elements upon which probability distributions of states are to be evaluated? For example, are they minicolumns or neurons? And what about molecules, atoms, or subatomic particles? Similarly, what is the “clock” to use to identify system states? Does it run in seconds, hundreds of milliseconds, milliseconds, or microseconds?

Figure 9. Analyzing systems in terms of elementary components and operations. (A) and (B) show systems that on the surface appear to generate a large amount of integrated information. The units in (A) have a repertoire of 2ⁿ outputs, with the bottom unit copying the top. Integrated information is n bits.
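As a quick numerical check of the photodiode example, the length of its single q-arrow can be computed as the relative entropy between the actual repertoire (0, 0, 1/2, 1/2) and the maximum entropy repertoire (1/4, 1/4, 1/4, 1/4). The sketch below is illustrative only: the helper name `relative_entropy_bits` is an assumption introduced here, not notation from the article.

```python
import math

def relative_entropy_bits(actual, potential):
    """Relative entropy D(actual || potential) in bits.

    States with zero probability in `actual` contribute nothing.
    """
    return sum(p * math.log2(p / q) for p, q in zip(actual, potential) if p > 0)

# States ordered (00, 01, 10, 11), as in the text.
potential = [0.25, 0.25, 0.25, 0.25]  # potential (maximum entropy) repertoire
actual = [0.0, 0.0, 0.5, 0.5]         # repertoire specified when the detector is on

q_arrow_length = relative_entropy_bits(actual, potential)
print(q_arrow_length)  # 1.0
```

Only the two repertoires enter the calculation, so a thermistor with the same threshold and dynamic range would give the same value.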
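The perturb-and-infer procedure described in this passage can be sketched for a single deterministic logic gate. The code below assumes a uniform (maximum entropy) prior over the gate's input states, applies Bayes' rule by keeping only the inputs consistent with the observed output, and then measures the relative entropy between the resulting actual repertoire and the potential one. The function names and the choice of gates are hypothetical, and this covers one mechanism only, not the search over all subsets, partitions, and grain sizes that the text says a full analysis requires.

```python
import itertools
import math

def actual_repertoire(n_inputs, mechanism, observed_output):
    """Posterior over prior input states, given the output, under a uniform prior."""
    states = list(itertools.product((0, 1), repeat=n_inputs))
    consistent = [s for s in states if mechanism(s) == observed_output]
    return {s: (1.0 / len(consistent) if s in consistent else 0.0) for s in states}

def effective_information_bits(n_inputs, mechanism, observed_output):
    """Relative entropy (bits) between the actual and the potential repertoire."""
    actual = actual_repertoire(n_inputs, mechanism, observed_output)
    potential = 1.0 / 2 ** n_inputs  # every prior state equally likely
    return sum(p * math.log2(p / potential) for p in actual.values() if p > 0)

def and_gate(inputs):
    return int(all(inputs))

def or_gate(inputs):
    return int(any(inputs))

ei_and_firing = effective_information_bits(3, and_gate, 1)  # one consistent state out of 8
ei_and_silent = effective_information_bits(2, and_gate, 0)  # rules out only (1, 1)
ei_or_firing = effective_information_bits(2, or_gate, 1)    # rules out only (0, 0)
print(ei_and_firing, ei_and_silent, ei_or_firing)
```

Under these assumptions a firing three-input AND gate yields 3 bits, while a silent two-input AND gate and a firing two-input OR gate yield the same value, log2(4/3) bits. Equal length is only a necessary condition for two shapes in Q to coincide; it does not by itself establish that they are isomorphic.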
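The observer pitfall of Figure 9A can be illustrated numerically. The sketch below is a simplified model under stated assumptions: only the prior states of the n upper elements are considered, each lower element is a perfect copy of its upper partner, and "integration across the couples" is measured as the relative entropy between the repertoire of the whole and the product of the repertoires of the individual couples. All names, the value n = 4, and the observed pattern are hypothetical.

```python
import itertools
import math

def actual_repertoire(prior_states, mechanism, observed_output):
    """Bayes' rule with a uniform prior: keep the prior states consistent with the output."""
    consistent = [s for s in prior_states if mechanism(s) == observed_output]
    return {s: 1.0 / len(consistent) for s in consistent}

def relative_entropy_bits(p, q):
    """D(p || q) in bits; p and q map states to probabilities."""
    return sum(pv * math.log2(pv / q[s]) for s, pv in p.items() if pv > 0)

def copy(state):
    return state  # each lower element copies its upper partner

n = 4
upper_states = list(itertools.product((0, 1), repeat=n))
observed = (1, 0, 1, 1)  # hypothetical output pattern of the lower elements

# Observer's view: one "unit" with 2**n states copying another.
whole = actual_repertoire(upper_states, copy, observed)
uniform = {s: 1.0 / len(upper_states) for s in upper_states}
apparent_bits = relative_entropy_bits(whole, uniform)

# Element-level view: n separate couples, each a one-bit copy.
couples = [actual_repertoire([(0,), (1,)], copy, (bit,)) for bit in observed]
couple_bits = [relative_entropy_bits(c, {(0,): 0.5, (1,): 0.5}) for c in couples]

# Product of the couples' repertoires, compared with the repertoire of the whole.
product = {
    s: math.prod(couples[i].get((s[i],), 0.0) for i in range(n))
    for s in upper_states
}
across_couples_bits = relative_entropy_bits(whole, product)

print(apparent_bits, couple_bits, across_couples_bits)  # 4.0 [1.0, 1.0, 1.0, 1.0] 0.0
```

In this toy model the whole specifies n bits, but the product of the couples' repertoires already equals the repertoire of the whole, so nothing is generated above and beyond the parts.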
By analyzing the internal structure of the system in (A )wefindn disjoint couples, each integrating 1 bit of information; the entire system, however, is not integrated. (B) shows a system of binary units. The top unit receives inputs from eight other units and performs an AND-gate like operation, firing if and only if all eight inputs are spikes. Increasing the number of inputs appears to easily increase  without limit. (B ) examines a possible imple- mentation of the internal architecture of the top unit using binary AND-gates. The architecture has a bottleneck, shown as the MIP line, so that 1 bit regardless of the number of input units. Properly addressing this issue requires a comprehensive Tononi, unpubl.). The working hypothesis is as follows theoretical approach to the relationship between integrated (Tononi, 2004): In general, for any system, integrated in- information, emergence, and memory (Balduzzi and formation is generated at multiple spatiotemporal scales. In This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). 236 G. TONONI particular, however, there will often be a privileged spatio- sible quality—that is captured by a single q-arrow of length temporal “grain size” at which a given system forms a 1 bit.19 complex of highest —the spatiotemporal scale at which it How close is this position to panpsychism, which holds “exists” the most in terms of integrated information, and that everything in the universe has some kind of conscious- therefore of consciousness. ness? Certainly, the IIT implies that many entities, as long For example, while in the brain there are many more as they include some functional mechanisms that can make atoms than neurons, it is likely that complexes at the spatial choices between alternatives, have some degree of con- scale of atoms are exceedingly small, or at any rate that they sciousness. 
Unlike traditional panpsychism, however, the cannot maintain both functional specialization and long- IIT does not attribute consciousness indiscriminately to all range integration, thus yielding low values of .Atthe things. For example, if there are no interactions, there is no other extreme, the spatial scale of cortical areas is almost consciousness whatsoever. For the IIT, a camera sensor as certainly too coarse for yielding high values of . Some- such is completely unconscious (in fact, it does not exist as where in between, most naturally at the grain size of neu- an entity). Moreover, panpsychism hardly has a solid con- rons or minicolumns, the neuroanatomical arrangement en- ceptual foundation. The attribution of consciousness to all sures an ideal mix of functional specialization and kinds of things is based more on an attempt to avoid dualism integration, leading to the formation of a large complex of than on a principled analysis of what consciousness is. high . Similarly, panpsychism offers hardly any guidance as to Similarly, with respect to time, neurons would yield zero what would determine the amount of consciousness associ-  at the scale of microseconds, since there is simply not enough time for engaging their mechanisms. At long time ated with different things (such as humans, animals, plants, scales, say hours,  would also be low, as output states or rocks), or with the same thing at different times (say would bear little relationship to input states. Somewhere in wakefulness and sleep), not to mention that it says nothing between, at a time scale of tens to hundreds of milliseconds, about what would determine the quality of experience. the firing pattern of a large complex of neurons should be A more relevant issue is the following: How can the maximally predictive of its previous state, thus yielding theory attribute consciousness (albeit minimal) to a photo- high . 
It is not by chance, according to the IIT, that this is diode, while acknowledging that we “lose” consciousness both the time scale at which experience seems to flow every night when falling into dreamless sleep? After all, the (Bachmann, 2000) and that at which long-range neuronal sleeping brain likely generates more integrated information 21 interactions occur (Dehaene et al., 2003; Koch, 2004). than a photodiode. Two considerations are in order. First, This working hypothesis also suggests that the generation we have first-hand “experience” that consciousness can be of integrated information may set an intrinsic framework for graded: falling asleep is often a rapid process but, before we both space and time. With respect to time, for example, are “gone” altogether, we occasionally do go through some consider a complex generating a certain shape in Q through degree of restriction in the field of consciousness, where we a fast mechanism, and another complex that generates ex- are progressively less aware of ourselves and the environ- actly the same shape, but through a slower mechanism. It ment. Something similar also happens at certain stages of would seem that these two complexes should generate ex- alcohol intoxication. So the level of consciousness can actly the same experience, except that time would flow indeed change around our typical waking baseline, allowing faster in one case and slower in the other. Similar consid- for some gradation. erations may apply to space. Also, according to the IIT, Below a certain level of consciousness, however, it truly what constitutes a “state” of the system is not an arbitrary feels as if we fade away completely. But is consciousness choice from an extrinsic perspective, but rather the spatio- really annihilated? 
Is it likely that when we “lose” con- temporal grain size at which the system can best generate sciousness the amount of integrated information generated information about its past: what is, is what can make a by the corticothalamic main complex decreases nonlin- difference. early? Computer simulations indicate that when the overall Consciousness as a graded quantity activation of corticothalamic networks goes below a certain level, there is a sudden drop in the average effective infor- The IIT claims that consciousness is not an all-or-none mation between distant parts of the cortex (Tononi, unpubl. property, but is graded: specifically, it increases in propor- obs.). In other words, below a certain threshold of activation tion to a system’s repertoire of discriminable states. Strictly the corticothalamic system breaks down into nearly inde- speaking, then, the IIT implies that even a binary photo- pendent pieces and cannot sustain integrated patterns of diode is not completely unconscious, but rather enjoys ex- firing. This could explain why it feels as if consciousness is actly 1 bit of consciousness. Moreover, the photodiode’s vanishing in an almost all-or-none manner rather than di- 20 consciousness has a certain quality to it—the simplest pos- minishing progressively. This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). CONSCIOUSNESS AS INTEGRATED INFORMATION 237 The limited capacity of consciousness such as fruit flies, or even more when one considers man- It is often stated that the brain discards most of the made artifacts, arguments from analogy lose their strength, incoming information, and that only a very small portion and it is hard to know what to think. The IIT has a straight- trickles into consciousness. 
Thus, though the retina can forward position on this issue: to the extent that a mecha- transmit millions of bits per second, some estimates suggest nism is capable of generating integrated information, no that just a few bits per second make it to consciousness matter whether it is organic or not, whether it is built of (Nørretranders, 1998), which is abysmally little by engi- neurons or of silicon chips, and independent of its ability to neering standards. Indeed, as shown by classic experiments, report, it will have consciousness. Thus, the theory implies we cannot keep in mind more than a few things at a time. that it should be possible to construct highly conscious For the IIT, however, the informativeness of conscious- artifacts by endowing them with a complex of high  (Koch ness is not related to how many chunks of information a and Tononi, 2008). Moreover, it should be possible to single experience might contain. Instead, it relates to how design the quality of their conscious experience by appro- many different states are ruled out. Since we can easily priately structuring their effective information matrix. discriminate among trillions of conscious states within a Such a position should not be read as implying that fraction of a second, the informativeness of conscious ex- building conscious artifacts may be easy, or that many perience must be considerable. Presumably, the so-called existing man-made products, especially “complicated” capacity limitation of consciousness reflects an upper bound ones, should be expected to have high values of . The on how many partially independent subprocesses can be conditions needed to build complexes of high , such as a sustained within the main complex without compromising combination of functional specialization and integration, are its integration. apparently not easy to achieve. 
Moreover, computer simu- Another consequence of the need for integration is the lations suggest that seemingly “complicated” networks with seemingly serial nature of consciousness. Since a complex many nodes and connections, whose connection diagram constitutes a single entity, it must move from one global superficially suggests a high level of “integration,” usually state to another, and its temporal evolution must follow a turn out to break down into small local complexes of low , single trajectory. Indeed, dual-task paradigms and the psy- or to form a single entity with a small repertoire of states chological refractory period show that decisions or choices and therefore also of low : a paradigmatic example is a can only occur one at a time (Pashler, 1998). Such choices network with full connectivity, which can be shown to take around 150 milliseconds, a figure remarkably close to generate at most 1 bit of integrated information (Balduzzi the lower limit of the time typically needed for conscious and Tononi, 2008). Though we do not know how to calcu- integration. late the amount of integrated information, not to mention the More generally, although transmitting and storing infor- shape of the qualia, generated by structures such as a mation is relatively cheap and easy, generating integrated computer chip, the World Wide Web, or the proverbial information would seem to be more expensive and difficult. network of Chinese talking on the phone (Block, 1978), it is Ensuring that a system forms a complex (integration) re- likely that the same principles apply: high  requires a very quires many connections per element, and connections are special kind of complexity, not just having many elements usually expensive. At the same time, ensuring that the intricately linked. Just think of something as complex as the complex can discriminate among a large number of states cerebellum and its negligible contribution to consciousness. 
(information) requires that connections are patterned so that Whether certain kinds of random networks (Tononi and elements are both functionally specialized and capable of Sporns, 2003), or even periodic network such as grids acting as a single entity, which is usually difficult. Thus, it (Balduzzi and Tononi, 2008), could achieve high values of may be more fitting to say that the brain, rather than dis- (albeit inefficiently) by simply increasing the number of carding information, sifts through the chaff to extract pre- elements remains to be determined. The brain certainly cious kernels of integrated information. To use another exploits grid-like arrangements (as in early sensory areas) metaphor, if information were like carbon, mere informa- and certain kinds of near-random connectivity (as in pre- tion would be like a heap of coal, and integrated information frontal areas and perhaps, at a finer scale, everywhere else). like a precious diamond. Moreover, the small world architecture of the cerebral cor- Conscious artifacts? tex and its hub-like backbone may be especially well-suited to integrating information (Sporns et al., 2000; Hagmann et Many scientists think that other species beyond humans al., 2008). At present, even for very small networks of just are likely to be conscious (Koch, 2004) based on common- a dozen elements, the only way to increase  is by brute- alities of behavior and on the overall similarity between force optimization, which is clearly unfeasible for more their corticothalamic system and ours. But when it comes to realistic networks, or through adaptation to a rich environ- species that have radically different neural organization, ment (Tononi et al., 1996). This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). 238 G. 
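One simple way in which a seemingly large system breaks down into small pieces is the copy system of Figure 9A. The sketch below is an illustration only, not the procedure used in the cited simulations. It builds the element-level wiring described for Figure 9A (bottom), in which upper element i connects only to lower element i, and lists the groups of elements that are linked at all. The function names and the value n = 8 are assumptions made for the example. Finding disconnected groups is only a coarse, necessary check on integration; it does not compute Φ.

```python
from collections import defaultdict


def copy_system_edges(n):
    """Element-level wiring assumed for Figure 9A (bottom):
    upper element i feeds lower element i, and nothing else."""
    return [(("upper", i), ("lower", i)) for i in range(n)]


def connected_groups(edges):
    """Group elements linked by any chain of connections,
    ignoring direction (weakly connected components)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


n = 8  # hypothetical number of binary elements per apparent "unit"
groups = connected_groups(copy_system_edges(n))
print(len(groups), "groups with sizes", sorted(len(g) for g in groups))
```

With n = 8 the sketch reports eight groups of two elements each, in line with the description of n disjoint couples rather than a single complex. A network that passes this check (one connected group) may still have low Φ, as the fully connected example in the text shows.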
Consciousness and meaning

The notion of integrated information and, more generally, the set of informational relationships that constitute a quale, are closely related to the notion of meaning and, more generally, semantics. Here I briefly discuss how meaning requires a system capable of integrating information and, more specifically, how meaning is captured by concepts.

For the IIT, mechanisms generate meanings. Moreover, only the mechanisms within a single complex do so. A mechanism modifies a probability distribution (the context to which it is applied) into another distribution, thereby specifying an informational relationship. In essence, then, a mechanism rules out certain states and rules in others. Note the parallel with semantics, where a sentence’s meaning is specified by the possible worlds in which it is true and false. Also, as in semantics, the meaning changes depending on the context in which the mechanism acts. For the IIT, however, meaning is only meaningful within a complex: mechanisms belonging to disjoint complexes do not generate meaning. In fact, what is meaningful is each individual experience, and its meaning is completely and univocally specified by the shape of its quale. For example, a photodiode generating a single q-arrow22 means (i.e., specifies) very little, whereas a large and complex quale means (i.e., specifies) much more. The IIT is also precise about the possible worlds that need be considered: they are the states encompassed by the maximum entropy distribution of a complex. How meanings “in the head” of different subjects refer to the external world is a different matter, which requires considering the matching between internal and external relationships (see below).

Recall that concepts are entangled q-arrows that group together certain states of affairs in a way that cannot be decomposed into the mere sum of simpler groupings (see also Feldman, 2003). Figure 10 shows two systems comprising four input elements (sensors) and four output elements (detectors). The “copy” system (Fig. 10A, similar to the camera example in Fig. 2, left side) is such that each output element is connected to a different input element, implementing for each sensor-detector couple the function “D = S.” The copy system relays all 4 bits in the input but, since it decomposes into four separate complexes, it generates no integrated information. Each sensor-detector couple generates 1 bit of integrated information and a single informational relationship (q-arrow), corresponding to the simplest possible concept: that things are one way rather than another way (just like the photodiode in Fig. 1).

Consider now the “conceptual” system (Fig. 10B). In this case, each output element receives connections from all four input elements, and performs a more complex Boolean function on the input.23 For example, output element 5 could be implementing a “parity” function on the four input elements (it is on if an odd number of inputs are on, and off otherwise); element 6 a “symmetry” function (on if the arrangement of on-and-off inputs is symmetric); element 7 a “contiguity” function (on if on-or-off input elements are not separated by an element of the other sign); and element 8 a “balance” function (on if there are an equal number of on and off input elements).24 In this case, the q-arrow generated by each output element (i.e., by its afferent connections) is entangled: the information generated jointly by its four afferent connections is higher than the sum of the information generated by each connection independently (for example, the parity function can only be computed when all inputs are considered together). As I mentioned above, an entangled q-arrow constitutes a concept in Q, here embodied in single output elements integrating globally over all four input elements. Moreover, in this case the four output elements specify different concepts, and thus generate information about different aspects of the input string.25 Thus, the first element being off means “even” input, the second on means “symmetrical,” the third off “non-contiguous,” the fourth on “balanced.” The q-arrow generated by all afferents to the output elements taken together is also entangled: the information generated jointly by all afferent connections is higher than the sum of the information generated independently by the afferents to each output element,26 meaning something like this: things are this particular way (an even, symmetrical, non-contiguous, balanced input) rather than many different ways. The conceptual system has literally added meaning to the input string. Moreover, the conceptual system realizes this concept as a single entity (a complex having high integrated information) rather than as a collection of smaller entities, each of which realizes only a partial concept.

Figure 10. Meaning. (A) The “copy system.” Each output element is connected to a different input element, implementing for each sensor-detector couple the function “D = S.” The copy system relays all four bits in the input but, since it decomposes into four separate complexes, it generates no integrated information. Each sensor-detector couple generates 1 bit of integrated information and a single informational relationship (q-arrow), corresponding to the simplest possible concept: that things are one way rather than another way (just like the photodiode in Fig. 1). (B) The “conceptual” system. Each output element receives connections from all four input elements, and performs a more complex Boolean function on the input. The q-arrow generated by each output element (i.e., by its afferent connections) is entangled (the information generated jointly by its four afferent connections is higher than the sum of the information generated by each connection independently). An entangled q-arrow constitutes a concept. In this case, the first element being off means “even” input, the second on means “symmetrical,” the third off “non-contiguous,” the fourth on “balanced.” The q-arrow generated by all afferents to output elements considered together is also entangled, and means something like this: things are this particular way (an even, symmetrical, non-contiguous, balanced input) rather than many different ways. The conceptual system has literally added meaning to the input string. Moreover, the conceptual system realizes this concept as a single entity (a complex having high integrated information) rather than as a collection of smaller entities, each of which realizes only a partial concept.

Indeed, meaning is truly in the eye of the beholder: an input string as such is meaningless, but becomes meaningful the moment it is “read” by a complex with a rich conceptual structure (corresponding to high Φ). Moreover, a complex with many different concepts will “read” meaning into anything, whether the meaning is there or not. It goes without saying that it is a good idea to build such complexes in such a way that their concepts are meaningful for interpreting the environment (for example, because they help predict future inputs). Finally, the more a system is able to conceptualize, the more it “understands”; or, if it was built to predict an environment, the more it “knows.” Imagine that you do not know Chinese and are presented with a large number of Chinese characters. By and large, you will group them into the category (concept) of “must be something in Chinese,” since they are all equivalent to you. After you have learned Chinese, however, each of the characters acquires a new, individual meaning (this one is a this, and that one is a that): the input is the same, but the meaning has grown.27

The richness of qualia space

People often marvel at the immensity of the known universe, and wonder about other possible universes that we may never know. But perhaps even more awe-inspiring is the variety and complexity of nature around us. Just think of the number of different shapes that surround us, and their remarkable internal organization (see cover). This is certainly true of nonliving things, at multiple scales: think of crystals or, at a much grander scale, of mountains. But it is spectacularly true of living organisms, also at multiple scales: from the vast catalog of proteins and protein complexes (all of different shapes) to the inventory of cells, to that of organs, to the ramified tree of species, and within each species, to the panoply of different individuals. One could go on, and note how much of our own creations in engineering, science, and art also represent the generation of novel shapes, never seen before, again in astonishing variety. Perhaps most relevant in this context is to consider how even more extraordinary shapes would appear if we could look at them in more than just three dimensions and at the most appropriate level of organization. Take the brain at the synaptic level, and disentangle its connectional organization in all its complexity: if one could visualize the intricacy of the “connectome” (Sporns et al., 2005) in a space of appropriate dimensionality, it would make for a remarkable shape indeed.

I mention all of this to come to a key aspect of the IIT: that experiences (i.e., qualia) are shapes too. As remarkable as the “enchanted loom” of anatomical connectivity and firing patterns is, it pales compared to the shape of an experience in qualia space. For example, the complex generating the quale in Figure 5 has four elements (one of them firing) and nine connections among them. This simple system specifies a quale or shape that is described by 399 points in a 16-dimensional qualia space. It is hard to imagine what may be the complexity of the quale generated by a sizable portion of our brain. Add to this that the main complex within our brain, whatever its precise makeup in terms of neurons and connections, is presumably generating a different shape, just as remarkable, every few hundred milliseconds, often morphing smoothly into another shape as new informational relationships are specified through its mechanisms entering new states. Of course, we cannot dream of visualizing such shapes as qualia diagrams (we have a hard time with shapes generated by three elements). And yet, from a different perspective, we see and hear such shapes all the time, from the inside, as it were, since such shapes are actually the stuff our dreams are made of, indeed the stuff all experience is made of.

Consciousness and the world: matching informational relationships

Consciousness qua integrated information is intrinsic and thus solipsistic. In principle, it could exist in and of itself, without requiring anything extrinsic to it, not even a function or purpose. For the IIT, as long as a system has the right internal architecture and forms a complex capable of discriminating a large number of internal states, it would be highly conscious. Such a system would not even need any contact with the external world, and it could be completely
passive, watching its own states change without having to act.28 Depending on the informational relationships generated by its architecture, its qualia could be just as interesting as ours, whether or not they have anything to do with the causal architecture of the external world. Strange as this may sound, the theory says that it may be possible one day to construct a highly conscious, solipsistic entity.

Nevertheless, it is unlikely that a system having high Φ and interesting qualia would come to be by chance, but only by design or selection. Brain mechanisms, including those inside the main complex, are what they are by virtue of a long evolutionary history, individual development, and learning. Evolutionary history leads to the establishment of certain species-specific traits encoded in the genome, including brains and means to interact with the environment. Development and epigenetic processes lead to an appropriate scaffold of anatomical connections. Experience then refines neural connectivity in an ongoing manner through plastic processes, leading to the idiosyncrasies of the individual “connectome” and the memories it embeds.

Since for the IIT, experiences are informational relationships generated by mechanisms, what is the relationship between the structure of experience and the structure of the world? Again, this issue requires a comprehensive theoretical approach (Tononi et al., 1996; Balduzzi and Tononi, unpubl.), but the main idea is simple enough. Through natural selection, epigenesis, and learning, informational relationships in the world mold informational relationships within the main complex that “resonate” best on a commensurate spatial and temporal scale. Moreover, over time these relationships will be shaped by an organism’s values, to reflect relevance for survival. This process can be envisioned as the experiential analog of natural selection. As is well known, selective processes act on organisms through differential survival to modify gene frequencies (genotype), which in turn leads to the evolution of certain body forms and behaviors (extrinsic phenotype). Similarly, selective processes (Edelman, 1987) acting on synaptic connections through plastic changes modify brain mechanisms (neurotype), which in turn modifies informational relationships inside the main complex (intrinsic phenotype29) and thereby consciousness itself. In this way, qualia (the shapes of experience) come to be molded, sculpted, and refined by the informational structure of events in the world.

A working hypothesis is that the quantity of “matching” between the informational relationships inside a complex and the informational structure of the world can be evaluated, at least in principle, by comparing the value of Φ when a complex is exposed to the environment, to the value of Φ when the complex is isolated or “dreaming” (Tononi et al., 1996). Similarly, the quality of matching can be evaluated by how the shapes of qualia “resonate” with the environment: for example, certain sub-shapes within a quale should “inflate” along certain dimensions when the complex is presented with appropriate stimuli.

This working hypothesis also suggests that morphogenesis and natural selection may be responsible for a progressive increase in the amount of integrated information generated by biological brains, and thus for the evolution of consciousness. This is because, in organisms exposed to a rich environment, plastic processes tend to increase functional specialization, while the brain’s massive interconnectivity ensures neural and behavioral integration. In fact, it appears that as a system incorporates statistical regularities from its environment and learns to predict it, its capacity for integrated information may grow (Tononi et al., 1996). It remains to be seen whether, based on the same principles, the construction of shapes even more extensive and complex may be achieved through nonbiological means.

Finally, the integrated information approach offers a straightforward perspective on why consciousness would be useful (Dennett, 1991). By definition, a highly conscious experience is a discrimination among trillions of alternatives: it specifies that what is the case is this particular state of affairs, which differs from a trillion other states of affairs in its own peculiar way, and in a way that is imbued with evolutionary value. Equivalently, one can say that a quale of high Φ represents a discrimination that is extremely context-sensitive, and thus likely to be useful. Experience is choice, and a highly conscious choice is a choice that is both highly informed and highly integrated.

Recall the photodiode. For it, turning on specifies that things are one way rather than another. What things might be like, it has 1 bit of a notion. For each of us, when the screen light turns on, the movie is about to begin.

Acknowledgments

I thank David Balduzzi, Chiara Cirelli, and Lice Ghilardi for their help, and the McDonnell Foundation for support.

Notes

1 One could say that the theory starts from two basic phenomenological postulates, (i) experience is informative and (ii) experience is integrated, which are assumed to be immediately evident (or at least should be after going through the two thought experiments). In principle, the theory, including the mathematical formulation and its corollaries, should be derivable from these postulates.

2 Note that two different distributions over the same states have relative entropy > 0 even if they have the same entropy.

3 One could paraphrase a classic definition of information (Bateson, 1972) and say that information is a difference that made a difference (the actual repertoire that can be discriminated by a given mechanism in a given state).

4 In other words, integrated information is a difference that made a difference to a system, to the extent that the system constitutes a single entity.

5 A phenomenon in which an observer may fail to perceive an image that is presented after a rapid succession of other images.

6 A condition in which, when different images are presented to each eye, instead of seeing them superimposed, one perceives one image at a time, and which image one perceives switches every 2 seconds.

7 The set of all subsets of connections forms a lattice (or more precisely a logic, characterized by an ordering relationship, join and meet operators, and a complement operator).

8 Univocally implies, for example, that the “inverted spectrum” is impossible: a given shape (quale) specifies red and only red, another one green and only green. In turn, this implies that the neural mechanisms underlying the perception of red and green cannot be completely symmetric (Palmer, 1999).

9 The set of all possible shapes generated by all possible systems corresponds to the set of all possible experiences.

10 More precisely, the lesion collapses all q-arrows generated by r starting from any context; that is, it folds the quale along the q-fold specified by r.

11 In lattices there is often a duality between elements (extensions) and attributes (intensions). Going up the lattice we move from elementary connections taken in isolation to all connections taken together. Going down the lattice, or up its dual, we move from the elementary attributes of a fully specified experience (the redness of red) to an undifferentiated experience, all of whose attributes are unspecified.

12 In essence, the very existence of a functional mechanism in a given state is saying something like this: Given that I am a certain mechanism in good order, and that I am in a certain state, things must have been this way, rather than other ways. In this sense, the information the mechanism generates is a statement about the universe made from its own intrinsic perspective; indeed, the only statement it can possibly make. Another way of saying this is that the mechanism is generating information by making an observation or measurement, where the mechanism is both the observer and the observed. In short, every (integrated) mechanism is an observer (of itself), and the state it is in is the result of that observation.

13 There may be concentrations of such bright objects elsewhere in the universe, but at present we have no positive evidence.

14 The notion of integrated information can in principle be extended to encompass quantum information. There are intriguing parallels between integrated information and quantum notions. Consider for example: (i) quantum superposition and the potential repertoire of a mechanism (in a

useful to consider some of the paradoxes of information in physics from the intrinsic perspective, that is, as integrated information, where the observer is one and the same as the observed.

15 Φ would be high for one specific firing pattern; for all other ones it would be very low.

16 Here I ignore the issue of whether serial and parallel mechanisms are equivalent from the perspective of integrated information, as well as the issue of analog and digital computation (or quantum computation). In general, it must be asked to what extent two systems that are implemented differently actually specify the same complex and qualia when analyzed at the proper spatio-temporal grain.

17 It is worth reiterating that a full description is practically out of the question for any realistic system.

18 More appropriately, Mary should be like the achromatopsic patient mentioned above, since otherwise she might be able to dream in color.

19 Although the quality of the photodiode’s consciousness is the same quality generated by a binary thermistor, and many other simple mechanisms.

20 Our ability to judge gradations in the level of consciousness when absolute levels are low may also be poor. As a loose metaphor, consider temperature. We are good at judging temperature as long as it fluctuates around the usual range, say between −50 and 100 °C. However, when temperature falls below that range, we become much less precise: both −200 and −273 °C are inconceivably cold to us, and we certainly would not judge −200 to be much warmer than absolute zero. Similarly, a complex generating 1 or 10 bits of integrated information may feel a bit different (or rather 9 bits different), but it may feel like so little that, compared to our usual levels of consciousness, it essentially feels like nothing. Which is why, of course, it is good to have a thermometer or a Φ-meter.

21 An optical metaphor can again be useful: things come crisply into existence at a certain focal distance, and with a certain exposure time. At shorter or longer focal distances things vanish out of focus: if exposure time is too short, they do not register; if it is too long, they blur.

22 A photodiode or any other complex generating a quale consisting of just a single q-arrow.

23 Here I ignore the issue of decomposing complex Boolean functions into elementary mechanisms.
sense, before it is engaged, a mechanisms exists in a superposition of all its possible output states); (ii) decoherence and the actual repertoire of a 24 Note that each of these functions should be thought of as implemented mechanism (when the mechanism is engaged and enters a certain state, it according to its minimal formula (of shortest description length, i.e., of collapses the potential repertoire into the actual repertoire); (iii) quantum minimal complexity). Clearly, minimal formulas that involve four inputs entanglement and integrated information (to the extent that one cannot are more complex than formulas involving just one input (the parity perturb two elements independently, they are informationally one). function, for instance, is notoriously incompressible). There are also some points of contact between the notion of integrated 25 While the particular combination of concepts described here was chosen information and the approach advocated by relational quantum mechanics for its familiarity (parity, symmetry, contiguousness, balance) rather than (Rovelli, 1996). The relational approach claims that system states exist for informational efficiency, one can envision Boolean functions that only in relation to an observer, where an observer is another system (or a realize “optimal” sets of concepts from the point of view of integrated part of the same system). By contrast, the IIT says that a system can information. For example, the four functions may be chosen so that, on observe itself, though it can only do so by “measuring” its previous state. average, the set of four output units jointly generate as much integrated More generally, for the IIT, only complexes, and not arbitrary collections information as possible, up to the theoretical maximum of 4 bits of  for of elements, are real observers, whereas physics is usually indifferent to every input string (by contrast, the “copy system,” while transmitting all 4 whether information is integrated or not. 
bits in the input, would generate 4 times 1 bit of integrated information). Other interesting issues concern the relation between the conservation of Obviously, building a system that could respond optimally to a large set of information and the apparent increase in integrated information, and the input strings is exceedingly difficult (if at all possible), especially consid- finiteness of information (even in terms of qubits, the amount of informa- ering the need to build such a system using simple Boolean functions as tion available to a physical system is finite). More generally, it seems building blocks. This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c). 242 G. TONONI 26 Again, it is difficult to build an optimal conceptual system that can Gazzaniga, M. S. 2005. Forty-five years of split-brain research and still preserve all the information in the input, corresponding in this case to 4 bits going strong. Nat. Rev. Neurosci. 6: 653–659. of integrated information for every input string. Hagmann, P., L. Cammoun, X. Gigandet, R. Meuli, C. J. Honey, V. J. 27 The extreme case is watching noisy “snow” patterns flickering on a TV Wedeen, et al. 2008. Mapping the structural core of human cerebral screen. We treat the overwhelming majority of TV frames as equivalent, cortex. PLoS Biol. 6: e159. under the concept of “TV snow.” If one were an optimal conceptual Hobson, J. A., E. F. Pace-Schott, and R. Stickgold. 2000. Dreaming system, however, each frame would be conceptualized as its own very and the brain: toward a cognitive neuroscience of conscious states. particular kind of pattern (say exhibiting a certain amount of 17th order Behav. Brain Sci. 23: 793–842. symmetries, another amount of 11th order symmetries, belonging to the 6th Jackson, F. 1986. What Mary didn’t know. J. Philos. 83: 291–295. class of contiguousness, etc.). 
In a sense, every noisy frame would be read Koch, C. 2004. The Quest for Consciousness: A Neurobiological Ap- as an astonishingly deep, rich, meaningful and unique pattern, perhaps as proach. Roberts, Denver, CO. a work of art. Koch, C., and G. Tononi. 2008. Can machines be conscious? Spectrum IEEE 45: 55–59. 28 Dreams prove that an adult brain does not need the outside world to Koch, C., and N. Tsuchiya. 2007. Attention and consciousness: two generate experience “here and now”: the mechanisms of the main complex distinct brain processes. Trends Cogn. Sci. 11: 16–22. within the brain are sufficient, all by themselves, to generate the informa- Massimini, M., F. Ferrarelli, R. Huber, S. K. Esser, H. Singh, and G. tional relationships that constitute experience. Not to mention that in Tononi. 2005. Breakdown of cortical effective connectivity during dreams we tend to be remarkably passive. sleep. Science 309: 2228–2232. 29 Indeed, the shape of experience can be said to be the quintessential Massimini, M., F. Ferrarelli, S. K. Esser, B. A. Riedner, R. Huber, M. “phenotype.” Murphy, et al. 2007. Triggering sleep slow waves by transcranial magnetic stimulation. Proc. Natl. Acad. Sci. USA 104: 8496–8501. Literature Cited Nørretranders, T. 1998. The User Illusion: Cutting Consciousness Down to Size. Viking, New York. Albus, J. S., G. A. Bekey, J. H. Holland, N. G. Kanwisher, J. L. Palmer, S. E. 1999. Color, consciousness, and the isomorphism con- Krichmar, M. Mishkin, et al. 2007. Aproposal for a Decade of the straint. Behav. Brain Sci. 22: 923–943; discussion 944–989. Mind initiative. Science 317: 1321. Pashler, H. E. 1998. The Psychology of Attention. MIT Press, Cam- Alkire, M. T., A. G. Hudetz, and G. Tononi. 2008. Consciousness and bridge, MA. anesthesia. Science 322: 876–880. Posner, J. B., and F. Plum. 2007. Plum and Posner’s Diagnosis of Baars, B. J. 1988. A Cognitive Theory of Consciousness. Cambridge Stupor and Coma, 4th ed. Oxford University Press, New York. 
University Press, New York. Rovelli, C. 1996. Relational quantum mechanics. Int. J. Theor. Phys. 35: Bachmann, T. 2000. Microgenetic Approach to the Conscious Mind. 1637–1678. John Benjamins, Philadelphia. Sporns, O., G. Tononi, and G. M. Edelman. 2000. Theoretical neuro- Balduzzi, D., and G. Tononi. 2008. Integrated information in discrete anatomy: relating anatomical and functional connectivity in graphs and dynamical systems: motivation and theoretical framework. PLoS Com- cortical connection matrices. Cereb. Cortex 10: 127–141. put. Biol. 4: e1000091. Sporns, O., G. Tononi, and R. Kotter. 2005. The human connectome: Bateson, G. 1972. Steps to an Ecology of Mind: Collected Essays in a structural description of the human brain. PLoS Comput. Biol. 1: e42. Anthropology, Psychiatry, Evolution, and Epistemology. Chandler, San Steriade, M., I. Timofeev, and F. Grenier. 2001. Natural waking and Francisco. sleep states: a view from inside neocortical neurons. J. Neurophysiol. Block, N., ed. 1978. Trouble with Functionalism, Vol. 9. Minnesota 85: 1969–1985. University Press, Minneapolis. Tononi, G. 2001. Information measures for conscious experience. Arch. Blumenfeld, H., and J. Taylor. 2003. Why do seizures cause loss of Ital. Biol. 139: 367–371. consciousness? Neuroscientist 9: 301–310. Tononi, G. 2004. An information integration theory of consciousness. Bower, J. M. 2002. The organization of cerebellar cortical circuitry BMCNeurosci. 5: 42. revisited: implications for function. Ann. N.Y. Acad. Sci. 978: 135–155. Tononi, G., and G. M. Edelman. 1998. Consciousness and complexity. Cover,T.M.,andJ.A.Thomas.2006. ElementsofInformationTheory, Science 282: 1846–1851. 2nd ed. Wiley-Interscience, Hoboken, NJ. Tononi, G., and S. Laureys. 2008. The neurology of consciousness: an Crick, F., and C. Koch. 2003. A framework for consciousness. Nat. Neurosci. 6: 119–126. overview. Pp. 375–412 in The Neurology of Consciousness, S. Laureys Dehaene, S., C. Sergent, and J. P. Changeux. 
2003. A neuronal net- and G. Tononi, eds. Elsevier, Oxford. work model linking subjective reports and objective physiological data Tononi, G., and O. Sporns. 2003. Measuring information integration. during conscious perception. Proc. Natl. Acad. Sci. USA 100: 8520– BMCNeurosci. 4: 31. 8525. Tononi, G., O. Sporns, and G. M. Edelman. 1996. A complexity Dennett, D. C. 1991. Consciousness Explained. Little, Brown, Boston, measureforselectivematchingofsignalsbythebrain.Proc.Natl.Acad MA. Sci. USA 93: 3422–3427. Edelman, G. M. 1987. Neural Darwinism: The Theory of Neuronal van Zandvoort, M. J., T. C. Nijboer, and E. de Haan. 2007. Devel- Group Selection. BasicBooks, New York. opmental colour agnosia. Cortex 43: 750–757. Feldman, J. 2003. Acatalog of Boolean concepts. J. Math. Psychol. 47: Wheeler, J. A., and K. W. Ford. 1998. Geons, Black Holes, and 75–89. Quantum Foam: A Life in Physics, 1st ed. Norton, New York. This content downloaded from 076.103.189.006 on July 02, 2017 05:41:07 AM All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c).