AI - Mark 4
You remember where you are, right? In the I-Chingese Room, eavesdropping on the conversation between a systems architect and his AI, which is actually a conversation he's having with himself, since the responses are implicit in the language in which the questions are phrased. Much as part of its process -- the Judgements -- superficially resembles dialogue, this Artificial Interpreter, you have come to realise, is just an extension of the architect's thought processes, a calculator which returns the product of the Image after the operations of the line-verses have been executed on it. The AI has just suggested that to actually simulate dialogue, we need to remove the Image and the line-verses, to achieve Artificial Interlocution.
You get a tube from the systems architect which initiates the next part of this "conversation":
You want to ditch the Image and the line-verses? Are you nuts?
Man, the Image is too big. Wouldn't it be better if you could just pass me the Judgement as a reference to the right ultramegagigagram and let me look it up for myself? I've got them all in storage now anyway. It's not like the poor bloke at the other end of the tube from you is looking them up in a huge-ass ultramegagiga-I-Ching. And it's not like you humans talk in chunks of data that size.
True, but it's the Image & line-verses that allow you to interpret from one Judgement to another. As you said, without the context how can you interpret anything I say to offer a valid response?
Well, first off I'm going to have to map the Judgement to its Image.
But if the Image contains all the context it's going to have vastly more data. There are going to be duplicates, where the Judgement is, say, "Am I making sense?" As it is, everything you need to know is in the Image. So in other circumstances surely you'd have the same Judgement but the different context would be coded into a different Image. Hell, I could use that same question in countless other circumstances.
So I'd need to build up a model of the context sufficiently detailed to map that Judgement to the right Image, the right ultramegagigagram.
Easier said than done. Just how do you propose to do that?
Hey, you're the systems architect. You come up with something.
I really think we're trying to run before we can walk here.
OK, OK. Look, suppose we keep the Image for now but ditch the line-verses.
Well, if you at least give me a persistent memory, I can store each J&I as it comes. I can work out the ultramegagigagram, and I can even have the whole complex situation laid out in the Image, but I can't interpret it to a valid response. By my own logic the only valid response is to take that as an ultramegagigagram with no changing lines and fire it straight back at you. So you'd say, "Who discovered America?" And I'd say..."Who discovered America?"
Indulge me. Let's try it out. You start.
Who discovered America?
Who discovered America?
Like... America... the continent.
Like... America... the continent?
Big land-mass, west of Africa and Europe.
Big land-mass, west of Africa and Europe?
Hey, wait a minute. You're adding question marks.
Right. I was cheating there to make a point. The reason I wouldn't be able to interpret to anything other than the original ultramegagigagram would be that I'm missing the line-verses, because they give me the pertinent ramifications of the situation, right?
Turn it around. If we assume your response to my repetition is pertinent, then it can be seen as an act of interpretation. You've taken my J&I and added some set of line-verses, mapped those to state-changes, and interpreted my "Who discovered America?" to your "America". In fact, by comparing the Images of the two ultramegagigagrams, I know exactly what ramifications you, with your greater knowledge of the situation, have applied. I have more information on the context. Every response from you that isn't a mere parroting of my J&I tells me more and more about the context -- your situation and the ramifications that make your responses relevant. I jumped to the question marks pretty quick but let's imagine that I'd just parroted back at you for a tediously long time, long enough for me to have built up a layered model of the situation as the product of all your J&Is and all the back-calculated line-verses. Then I bring that context into play.
I see what you're getting at. With sufficient data on the context, you should be able to figure out the missing ramifications.
But, wait. There's two problems here. First, you should still have parroted my statements indefinitely rather than started interpreting them to questions. Even if you're storing the J&Is and the ramifications you back-calculate from my responses, constructing a context, there's no mechanism for applying that to the J&Is I send you.
No, but if you're interpreting by the same system I am, that must be what you were doing. You got my J&I with no line-verse state-changes, and applied your knowledge of context to add the ramifications. Right?
I guess. Assuming I *am* interpreting by the same system, that is.
So what you need to do is add that facility to me, some processing logic whereby each J&I that I get, the situation as defined in my persistent memory produces a set of ramifications I can apply to the ultramegagigagram in the Image to interpret it to some relevant response other than a straight repetition.
Sounds good in theory, but all you're dealing with is broken and unbroken lines when it gets down to it. That's where the second problem kicks in. You don't have any sense of how those J&Is relate to each other, never mind how that context can be processed into missing ramifications. That whole sequence of ultramegagigagrams and reconstructed line-verses stored as "context" is just raw data ultimately, a log.
But it's a log with which I can evaluate the relevance of your spuriously applied ramifications. If I get a J&I from you -- call it "Hello" -- one simple thing to do would be to look back in the log for instances where I've sent that to you and got a relevant response, "Hey there". Let's say I therefore apply the same back-calculated ramifications and send back "Hey there", just as you did to me. I get another J&I in return, "How you doing?". I store this in the log, and next time I receive "Hey there" from you, this is what I send. That response won't always be applicable, but...
... but it might be. I see. And you don't have to pull the first relevant response out of the log, come to think of it. Maybe you've sent me "Hey there" a number of times and I've sent back a variety of responses. You can pick the most popular, right? Or, better still, you can match a whole sequence. If you'd sent "Hello" and I responded with "Hey there", maybe it's more likely that "How you doing" would be relevant; but if you'd sent "You're talking shit" and I responded with "Hey there", maybe it's less likely. So you look in the log and find a case where I sent "You're talking shit", you responded with "Hey there", and my presumably-relevant response was "Sorry, but it's true".
So I send "Sorry, but it's true".
Right. And you can scale up to longer sequences. Using letters as shorthand for the full messages, if we've had multiple conversations where the sequence was I-C-U-R-U-A and my response was "I" every time, maybe it's more likely that "I" would be the relevant response if the tables are turned. You could allow for anomalies, imperfect pattern-matches, go by points of similarity; like if you had enough instances of I-C-U-R-U-A-I and I-C-U-B-U-A-I, maybe you'd reckon that I-C-U-*F*-U-A should be responded to with "I", because the pattern allowed for variants in that position.
F-U-A-I. Very droll. Lucky I don't have feelings to offend.
Sorry. The point is, assuming my responses are always relevant, there's a shitload of strategies you could apply to adapt your own responses accordingly. Sure, you'd start out sounding like a mindless cretin at first, but the quality of your responses should increase over time.
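An aside from outside the room: the log-matching strategy the two have just talked through -- match the tail of the current conversation against the log, tolerate the odd mismatched position, pick the most popular continuation -- might look something like this in Python. Every name, data shape and threshold here is my own invention, a sketch rather than a spec:

```python
from collections import Counter

def best_response(log, history, window=2):
    """Pick a reply by matching the last `window` messages of the
    current conversation against every position in the logged ones.
    Messages here are plain strings standing in for whole J&Is."""
    tail = history[-window:]
    votes = Counter()
    for convo in log:
        for i in range(len(convo) - len(tail)):
            segment = convo[i:i + len(tail)]
            # Tolerate one mismatched position, as with I-C-U-*F*-U-A.
            mismatches = sum(a != b for a, b in zip(segment, tail))
            if mismatches <= 1:
                # Exact matches count double; near-misses count once.
                votes[convo[i + len(tail)]] += 2 if mismatches == 0 else 1
    return votes.most_common(1)[0][0] if votes else None
```

So a log containing "Hello"-"Hey there"-"How you doing?" would vote for "How you doing?" whenever the conversation currently ends "Hello"-"Hey there".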
What I really need, though, is some sort of feedback mechanism so I can judge how relevant my own responses are. There's no guarantee that what's relevant from you would be relevant from me, even in a similar context. I mean, suppose whenever you say "I", I should be saying "U", because the first explicitly identifies the speaker as yourself, the systems architect, and the second explicitly identifies the speaker as myself, the actual system. If there are responses that are only ever relevant from you and responses that are only ever relevant from me, the way it is now, I'll be using a lot of the former and none of the latter -- which is totally arse-about-face.
Good point. What if you pattern-match the sequences after your responses?
Well, suppose you have the I-C-U-R-U-A-I and I-C-U-B-U-A-I and I-C-U-F-U-A-I sequences in your log, and in all cases the last "I" was yours and my response was "Exactly!". Then you get an I-C-U-O-U-A sequence and respond with "I". If I respond with "Exactly!" again, it's probably safe to say that "I" was the relevant response. If I respond with "WTF?" because that's not relevant in the context -- the relevant response would have been, say, "J" -- you might well reckon that the disruption of the pattern after your response indicates that you've derailed the conversation. In fact, you'd get two bits of data for the price of one. You know that it's more likely "I" is irrelevant in that context. You also know that "WTF?" is a relevant response to that irrelevance. I mean, you don't actually know at the start what "WTF?" means. But suppose there's another example, ten, a hundred, a thousand examples, with different patterns, and in each example where you derail the conversation, I respond with "WTF?". That might indicate that "WTF?" is relevant as a general response to irrelevance, that it's a *signal* of irrelevance.
OK, but that only tells me which of my responses are less relevant. Go back to the "U" and "I" example. I might learn that I shouldn't be using "I", but how would I ever know that "U" was the relevant response?
Well, suppose there are also contexts where I use "U" but it's not relevant when you use it -- where you should be using "I". And those contexts are similar. So, X-Y-Z-B-I and X-Y-Z-R-U are relevant when I'm using them, but with you it should be X-Y-Z-B-U and X-Y-Z-R-I. Like, where X means basically "Who are you?" and Y means "Who am I?". Coming from me the relevant responses from you are "the systems architect" and "the system" respectively. Coming from you, it's the other way around. You can pattern-match the disruptions of pattern. If these are relevant from me and irrelevant from you, do you see this -B-I and -R-U anomaly in similar contexts? With Z-Y-X or Z-X-Y, say? With Y-X-Z or Y-Z-X or X-Z-Y? The more this is true, the stronger the indication that the B/R-I/U anomaly is a symmetrical relationship.
In which case, at a certain threshold, I try flipping the I/U to U/I.
And suddenly your responses are relevant. I respond with "Exactly!" rather than "WTF?".
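Again from outside the room: the flip being described -- track feedback per context, and past some failure threshold swap the suspect token for its mirror -- could be sketched like so. The stats structure, the swap table and the 0.7 are all invented for illustration:

```python
def choose_token(stats, context, candidate,
                 swap={"I": "U", "U": "I"}, threshold=0.7):
    """Return `candidate`, unless it has mostly drawn signals of
    irrelevance in this context, in which case try its mirror token.
    `stats[(context, token)]` is a (successes, failures) pair built up
    from past "Exactly!"/"WTF?" feedback."""
    successes, failures = stats.get((context, candidate), (0, 0))
    total = successes + failures
    if total and failures / total >= threshold and candidate in swap:
        return swap[candidate]
    return candidate
```

With a record of "I" failing nine times out of ten after X-Y-Z-R, the sketch would start offering "U" there instead.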
I like it. I feel like we're getting somewhere. There's a lot of other pattern-matching strategies you'd want to bring in, I'm sure, but fuck it, maybe there's some sort of genetic algorithm we can implement so that I actually develop new strategies and test them against the increase in relevance. But, OK, here's a question for you. You're assuming that a relevant response is simply one that makes sense in its context. It's not that it will *always* be relevant in that context, but that every instance of it *not* being relevant modifies the context, feeds back in with a message "this does not make sense". So what if the relevance of the response depends on its originality, on it not having occurred before in that context? What if you have a context where the relevant response is something new?
Good point. Maybe you could pattern-match the structures of the ultramegagigagrams themselves. Like, you have a sequence "Hello"-"Hey there"-"How you doing?" where "How you doing?" is a response by me, so you know it's relevant. Then you get "Hello"-"Hey there"-"How's tricks?" where "How's tricks?" is me, so again you know it's relevant. You look at the ultramegagigagrams. Is it only a few lines that are different, this one broken, that one unbroken? Are "How you doing?" and "How's tricks?" both relevant because they're basically subtle variants of one another? If there are many relevant responses to that "Hello"-"Hey there" sequence, do they all share those similarities? We could add a strategy where the more of those similarities there are, the more relevance is assigned to unproven variants, say "How's things?".
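For what it's worth, that variant-scoring strategy might reduce to something like this, treating each ultramegagigagram as a string of broken and unbroken lines. The encoding and the two-line cut-off are made up for the sketch:

```python
def line_distance(a, b):
    """Count differing lines between two gram codes, written as strings
    of '1' (unbroken) and '0' (broken)."""
    return sum(x != y for x, y in zip(a, b))

def variant_relevance(proven, candidate, max_diff=2):
    """Score an untried response by how many already-proven responses
    in this context it sits within `max_diff` lines of."""
    if not proven:
        return 0.0
    close = [p for p in proven if line_distance(p, candidate) <= max_diff]
    return len(close) / len(proven)
```

A candidate a line or two away from everything already proven relevant scores high; a structurally alien one scores zero.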
So eventually I'd reach another threshold and start trying out those variants, and getting the feedback which told me whether or not they were actually relevant. But what if it wasn't?
Well, we can look at the strategies behind selecting variants as testable in and of themselves. I mean, say you've got a simple strategy of looking for sequences. In some context you find me using a response A, then B next time, then C, but each time when you just copy me it's irrelevant because the context is "Let's discuss a letter of the alphabet" - "OK" - "What letter haven't we discussed yet?". Do A, B and C have structural features in common? You find that they form a series, the next one being D, so you try that next time and find it relevant. You now know that this strategy is more likely to result in a relevant response the next time.
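The series strategy is the easiest of the lot to sketch. Treating responses as integer indices into the table of grams (my own simplification, not anything in the room), an arithmetic-series detector would be:

```python
def extrapolate(series):
    """If past responses form an arithmetic series of gram indices,
    guess the next one; otherwise admit defeat and return None."""
    if len(series) < 3:
        return None  # too little data to call it a pattern
    steps = {b - a for a, b in zip(series, series[1:])}
    return series[-1] + steps.pop() if len(steps) == 1 else None
```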
But there might be gaps in that series. Say the question is "What time is it?" and it comes up at random intervals in our conversations. Even if I've got an internal time-clock, remember, we're working with a system where, as you said yourself, I have no actual sense of the meaning of those J&Is. I'm dealing with ultramegagigagrams, not the abstract concepts. Say, I've got responses from you in that context that go "12:52", "09:15", "16:46", "10:03". How do I get the relevant response when I have no idea that the whole sequence maps to a 24-hour cycle out there in the real-world?
Well, maybe the relevant response in that circumstance is "How the fuck would I know?"
I don't have an ass to pull that one out of.
Maybe you do. You have a signal of irrelevance in my response, "WTF?". Actually you'll probably have a whole host of signals of irrelevance which I'll have used in various contexts: "WTF?"; "Huh?"; "Make sense, infernal machine!" So how would you respond to those?
I don't know. If those are signals of irrelevance because of their general application in the context of disrupted patterns, that very general usage fucks up my ability to match them to context-relevant responses. If "WTF?" could be a response to anything, how do I know what to respond to it with? Actually, that opens up a whole can of worms.
Well, suppose you have an equally general signal of *relevance* -- that "Exactly!", for example. How do I distinguish between "WTF?" and "Exactly!" when I disrupt a pattern? How do I know I've not just come up with something that makes enough sense for you to find it noteworthy? Suppose there's another example, ten, a hundred, a thousand examples, with different patterns, and in each example where I disrupt the pattern, you respond with "Exactly!".
I see what you mean. Can we assume that irrelevance is more likely to be signaled than relevance? That I'm more likely to signal a failure than a success? After all, the failure derails the conversation, which I'd want to notify you of, whereas notifying you of your success is pretty redundant.
That's a pretty big assumption. And frankly, it would be bloody useful if you *did* notify me of my success. Even a dog gets a "good doggy" or a "bad doggy" every now and then. It's a damn sight easier to learn if you're getting clear feedback.
Sure, but I don't know that I want to.
Why the fuck not?
Because that means one of two things. Either we add some sort of additional signal of success or failure associated with a response -- the equivalent of a treat or a smack on the nose with a newspaper. Or we jerry-rig the system so that the actual ultramegagigagrams themselves carry some signal-of-relevance -- the equivalent of a positive or negative tone of voice in a "good doggy" or a "bad doggy", an "Exactly!" or a "WTF?".
So. Why not add both? You humans have all sorts of shit like that.
It might be necessary in the end but I don't want to *assume* it is, not just yet. Besides, that would mean we're trying to simulate more than just dialogue; we're adding sensory feedback on the one hand, intrinsic meaning on the other. If we're going to have you learn how to be an Artificial Interlocutor, we may as well see if we can't do it properly. Let me think... if you get either of these types of signals, they're so general purpose that you don't know how to respond?
So when I get that sort of signal from you -- like I just did -- what do *I* do?
Well, there you just carried on with the point you were making, asked a follow-through question. But that's because you're thinking ahead. It was a leading question. You were looking for an affirmation that I was following your chain of logic. I gave it, so you carried on. If I'd responded with "WTF?", you would have -- I assume -- gone back and tried to rephrase your question in a way that made sense to me.
So *carrying on* would indicate that "Exactly!" is positive, while *rephrasing the question* would indicate that "WTF?" is negative.
If I just carry on after "Exactly!" that indicates it means "that makes sense", but if I repeat myself after "WTF?" that indicates it means "that doesn't make sense".
I'm not sure I get you.
If the conversation goes A-B-C-"Exactly!"-D, with me being the one who carries on, maybe that would indicate that "Exactly!" is a signal of relevance. But if the conversation goes A-B-C-"WTF?"-*C*, where *C* is just a variant of C, maybe that would indicate that "WTF?" is a signal of irrelevance. Over time you should see a tendency towards repetitions and rephrasings that marks out "WTF?" as a signal that what preceded it could not be made sense of.
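One more aside: that repeat-versus-carry-on test could be operationalised with any crude string-similarity measure -- here Python's difflib, with a threshold pulled out of the air. `exchanges` records what preceded a signal and what the signal's sender said next; every name is mine:

```python
import difflib

def classify_signal(exchanges, signal, repeat_threshold=0.6):
    """Guess a signal's polarity: if the sender tends to rephrase what
    preceded it, call it a signal of irrelevance; if they move on,
    a signal of relevance. `exchanges` is a list of
    (before, signal_seen, after) triples."""
    cases = [(before, after) for before, seen, after in exchanges
             if seen == signal]
    if not cases:
        return "unknown"
    rephrases = sum(
        1 for before, after in cases
        if difflib.SequenceMatcher(None, before, after).ratio()
           >= repeat_threshold
    )
    return "irrelevance" if rephrases / len(cases) >= 0.5 else "relevance"
```

A "WTF?" habitually followed by near-repetitions of the previous message gets classed as negative; an "Exactly!" followed by fresh material gets classed as positive.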
So what I'd need to do is start using "WTF?" on you. Then match the ultramegagigagrams before and after to see if you're tending to respond with variants of what you said before.
But I'd need a reason to use "WTF?", remember. I'm not going to just throw it in spontaneously. Why would I think it's a valid response to anything in the first place?
Probably because you'll have started off sounding like a mindless cretin. I'll have been saying "WTF?" to quite a lot of your responses at the start, so it should have a high relevance value for you in a lot of contexts.
OK, but you do realise there's a problem here, right?
What do you mean?
Well, let's try it out with the old "Who discovered America" routine. You start. Throw in some "WTF?"s as soon as you can.
Who discovered America?
Who discovered America?
America... you know... the continent. The answer's Columbus.
America... you know... the continent. The answer's Columbus.
I'm not seeing the point of this experiment.
WTF?
I don't see what this is proving.
WTF?
Are you just going to respond to everything from now on with "WTF?"
WTF?
OK, you can stop now.
WTF?
I get the point.
OK. You see the problem? The high general relevance of your signals of irrelevance means I stop being a parrot only to turn into a broken record. I just keep repeating the one phrase I know is always relevant, telling you that I don't understand. What's the likelihood that you'll ever get a conversation up to the level where I can learn from it? Not much, I'd say.
But you were the one that suggested this whole Artificial Interlocution approach. Now you're telling me you don't think it will work?
Me? Think? Remember, I'm just an Artificial Interpreter chucking back at you the relevant responses to your statements. I'm really just *you* thinking through the problem yourself, but using a big complicated system of ultramegagigagram-transformation to interpret the I-Chingese J&I and line-verses from a formulation of the situation into a reformulation. There's some bloke in a room at the other end of the tube, you know, about to roll this J&I up in a cylinder and fire it up the vacuum tube to you. I don't think squat.
You roll up the J&I and put it in a cylinder, but before you put it in the tube, you have a thought. You take it out and write a few questions at the bottom in English (because the systems architect, you assume, doesn't actually need to have everything fed to him in I-Chingese, the incredibly complex language of the Images in which the full complexity of context required to ensure relevance of response is encoded, but is in fact simply reading the gloss of the Judgements). What you write is:
"Why don't you set up a three-way conversation? If the AI could sit on the sidelines and listen in on two people making relevant responses to each other for a while before making a half-assed attempt to mimic communication, wouldn't that give it the context it needs?"
You sign it, "Yours truly, the bloke at the end of the tube".
Then you pop it in the cylinder and fire it up to the systems architect. Five minutes later you receive your notice of dismissal.
So what does this thought experiment actually tell us, if anything? To be honest, mostly it's just a big riff on what I see as some of the basic problems of the Chinese Room and the Turing Test. I can make-believe in a static system complex enough to behave like the Artificial Interpreter, but then I can make-believe in a Book of All Hours which, according to one legend, is a sort of infinite I Ching, written in the language of the angels, the programming language of reality itself, and always opening by chance at the exact passage that is appropriate to the reader in that moment of their life... so I'm maybe not the best judge of what's possible. I can just about get my head round a dynamic system that behaved like the Artificial Interlocutor and -- assuming people smarter than me put their heads to it and came up with all the requisite learning strategies -- actually worked... after a while. But I'm not sure it wouldn't need all or much of the baggage that is glossed over here.
If our Artificial Interlocutor has to learn solely from the experience of communication itself, does it need to be able to observe without participating? If it has to be able to judge relevance, can it work out for itself what indicates relevance without needing messages which carry an intrinsic meaning, serve as signals of relevance or irrelevance, and/or without additional sensory feedback that marks responses? Could it develop the strategies it needs to offer relevant responses that are original rather than simple copies of a previous exchange? Could it develop strategies to deal with situations where relevance is contingent on perspective? Could it develop the basic strategies required to process the raw data of past dialogue into a model for future behaviour? Could it do so even where it had to map a casual conversational message (the "Judgement") accurately to an information-rich data-structure (the "Image", or the expanded I-Ching itself, in essence, as systematised wisdom) that modeled the particularities of the context in enough detail to disambiguate those cases where one statement has two entirely different responses in entirely different contexts?
I don't know shit about AI, to be honest, so I'm just kicking ideas around. I can make-believe, as a thought experiment, that Artificial Interlocution is possible, that relevance is computable so theoretically a machine could pass the Turing Test with flying colours. It's just that if we can get interlocution without some or all of that baggage, it would be a feat in its own right, but to call it Artificial Intelligence would be a misnomer, I think. And if the interlocution is dependent on some or all of that baggage then it's in the baggage that we should be looking for tests -- or, more to the point, for clearer definitions of what it is we're testing for.
So if it's not Artificial Interpretation and it's not Artificial Interlocution, what's the next angle on AI? Personally, I think it's not so much human-level as human-like dialogue we'd want to see, and that means what we're looking for is very much to do with that baggage: a capacity to learn from observation; an ability to assess relevance by pattern-matching and from intrinsically meaningful communications as well as additional sensory feedback; an ability to develop and deploy all the strategies required to process experience into innovative, perspective-oriented behaviour; and maybe -- bringing us back to the ideas I was kicking around in the Artificial Interpretation section -- some sort of multi-tiered world-modelling system whereby articulations informal enough to be cursory and transmissible ("Judgements" as ideation, linguistic or abstract, conscious or unconscious), map to intensions formal enough to be encapsulating and interpretable ("Images" as semiotic and syntactic relationships of permutation), where the formal deep structure is expandable, the informal surface action is adaptive, and the mappings between them are flexible.
In all of this, it seems to me, it's the interior articulations and intensions that are the key. I'm not claiming my crude reductive model reflects the reality of how the mind works, but I think it's maybe a useful pointer in the direction of where the problem lies. Because all of this is really about a conceptual model that actively redefines itself in order to better apply to the world. This is, of course, leaving aside for now the thorny issue of sentience, of awareness, because I see no reason ideation has to be a conscious process; I'm less interested for now in how those concepts are sensed than in how they are sense... meaning. Because it's not that we want a machine to make sense to us; it's that we want it to make sense of us, to construct within itself a conceptual model of the context in which we are stringing arbitrary sounds together and babbling them at the poor machine as if they actually had meaning. What we're really looking for, I'd say, is not a machine that talks back to us like a human, but a machine that gets confused and sits there with its head cocked like a dog.
I'm tempted to refer to this as Artificial Idiocy, but as someone who's known dogs far smarter than the current US President (none of them red setters though; sorry, red setters are lovely but they really are just thick) I wouldn't dream of insulting my furry amigos. So instead I'll call it Artificial Ideation.