Commonsensical/kid-centric properties of cognition

Schank, Roger, Dynamic Memory Revisited (1999), Preface, Pg. viii. Schank says that “someone has to get a computer to know what a human knows about using a toaster or playing baseball”.


Comments : 

Children understand things at just the right level – not absolutely shallow, as in rote memory, and obviously not too deep. How do children know how to do things? Take this example. A child knows how to call someone on a cellphone. How? First go to the phone icon. Then dial the number of the person you want to reach. Then press Connect/Dial. They don’t hold these steps in their minds as structures that only emerge through analysis, nor as abstract generalisations of the steps. For example, they cannot “(analytically) classify-cum-(abstractedly) generalise” these 3 steps into steps like : Access – Identify – Execute/Instruct. That is a product of an adult mind. But they do know that if you want to call someone you should go into ‘Phone’, and not into the ‘Messages’ or ‘Email’ icons on the screen. They know that the number you dial should be that of the person you want to reach, and that if even a single digit is wrong, the phone will connect to whoever owns the wrongly dialed number. So they know things at just the right level.

I think whatever kids cognize as steps in doing things, or as components of anything they grasp, meets/satisfies certain “kid-centric properties”. Otherwise it cannot enter their heads. Here, let’s look at these properties in the 3 steps of calling someone that they understand :

Step 1 – Go to ‘Phone’. Property : SAMENESS – want to phone, so go in ‘Phone’!

Step 2 – Dial the number of the person you want to connect to. Property : It answers one of the elementary questions that would arise in anyone’s mind about the process of “calling” (in general, in any sense or context), namely – whom do I want to call? / whom is this person calling/talking to?

Step 3 – Press Dial. Property : Children are used to one-step ‘Action->Result’ (Do this -> you get that) or ‘Goal -> Action’ (Want that ->  do that) pairs. E.g. – Kill -> shoot. Send the ball -> kick. Want the door to open -> Ring the bell. The ‘press (the Dial button) and the cell-phone will connect you (to the person)’ format fits into this structure. This step is like the “punch” of instruction/signal for them, to get what they want. 

So, I think we need to identify these kid-centric properties of procedures, to understand and represent how they know how to do what they do. 
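A minimal sketch of the claim above, assuming each step is hand-annotated with what the child wants, what the step acts on, and which elementary question (if any) it answers. The class `Step`, the dictionary `PROPERTIES` and the functions `kid_properties` and `can_enter_kids_head` are hypothetical names for illustration, and the three properties are reduced to crude checks. It only shows the shape of the idea: a step is treated as graspable if at least one kid-centric property holds for it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step of a procedure, annotated by hand (hypothetical encoding)."""
    goal: str                      # what the child wants, e.g. "phone"
    acts_on: str                   # what the step acts on, e.g. the icon "Phone"
    answers: str = ""              # elementary question the step answers, if any
    one_step_action: bool = False  # does doing it directly give the result?


# Hypothetical set of elementary questions about "calling".
ELEMENTARY_QUESTIONS = {"whom do I want to call?"}

# One crude check per kid-centric property named in the text.
PROPERTIES = {
    "SAMENESS": lambda s: s.goal.lower() == s.acts_on.lower(),
    "ELEMENTARY QUESTION": lambda s: s.answers in ELEMENTARY_QUESTIONS,
    "ACTION -> RESULT": lambda s: s.one_step_action,
}


def kid_properties(step):
    """Names of the kid-centric properties that hold for this step."""
    return [name for name, holds in PROPERTIES.items() if holds(step)]


def can_enter_kids_head(step):
    """The claim as a filter: a step is graspable only if some property holds."""
    return bool(kid_properties(step))


calling = [
    Step(goal="phone", acts_on="Phone"),
    Step(goal="phone", acts_on="the number", answers="whom do I want to call?"),
    Step(goal="phone", acts_on="Dial button", one_step_action=True),
]
wrong = Step(goal="phone", acts_on="Messages")

for step in calling + [wrong]:
    print(step.acts_on, kid_properties(step))
```

The wrong step (going into ‘Messages’ to phone someone) satisfies none of the properties, which mirrors the point that a child knows not to go there.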

Commonsense and Logic : a concise theory

Here is a long-standing phenomenon (a problem, perhaps) in Commonsense Reasoning – commonsense v/s logic. Any piece of data can be interpreted in many possible ways. These possibilities are logical possibilities. But in everyday life, commonsense takes over and makes us choose the “commonsensical” possibility. For example, if someone says to you (say, over the phone) “…the pages of the book were moving to and fro…”, you would immediately take the meaning to be that the pages were being automatically flipped (within the framework of the book). There is another logical possibility: the whole book was moving to and fro along the desk, which would also technically amount to “the pages moving to and fro (along the desk)”. Now the question is – does this possibility occur to the mind and get rejected/discarded, or does it not cross your mind even for a second? About a sentence like ‘Mary and Sue are mothers’, Douglas Lenat once remarked in a talk at Google that “…it wouldn’t cross your mind even for a second that they are each other’s mothers” (compare ‘Mary and Sue are sisters’, where they are each other’s sisters). I agree with that view. But we are logical beings too. So why don’t we consider the logical possibilities of this utterance, first list all the possible cases, and then decide which one to choose? What explains this “not even crossing the mind for a second”? Below, I provide a concise theoretical explanation for it.

Given any data, there are 2 ways in which we try to “consume” it, i.e. understand it. One is a memory-based approach and the other is a logical approach. The first is predominant in everyday situations (and corresponds to our commonsense). In a memory-based approach you relate whatever is coming at you to your own past experiences – simple – and understand it accordingly. In a logical approach you take all the possible logical cases and deal with the situation. So when you hear about the pages of the book over the phone, you immediately resort to memory to cognize it. You recall situations from your past experience in which the pages of a book have “moved to and fro” and/or have been described that way for a certain reality, and immediately arrive at the commonsensical possibility explained above. This explains how you interpret the meaning without the other logical possibilities crossing your mind even for a second. It explains the immediacy.

To see this distinction between the two approaches broadly and in general, consider this example. Suppose someone tells you, in everyday life, “the bell rang when we were having dinner”. One immediate response would be “damn frustrating / disturbing!”. Here you have related the incoming data to your personal memory and given a response that clearly reflects it. A scientist might instead have said something overly technical like “was it in the midst of the dinner, or had you just begun, or were you about to finish?”
In everyday life we resort to memory-match, as against logical formulation. 
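The two approaches can be contrasted in a small sketch. `MEMORY`, `LOGICAL_READINGS`, `memory_based`, `logical` and `understand` are hypothetical names; memory is reduced to counts of past readings keyed by the exact utterance, and falling back to logical enumeration when memory has no match is an assumption, not something stated above. What it shows is that the memory-based path never generates the other reading at all, so there is nothing to discard.

```python
from collections import Counter

# Hypothetical episodic memory: how often each reading of an utterance has
# been met in past experience, keyed by the exact utterance.
MEMORY = {
    "the pages of the book were moving to and fro": Counter(
        {"the pages were flipping within the book": 12}
    ),
}

# Hypothetical list of every logically possible reading of the utterance.
LOGICAL_READINGS = {
    "the pages of the book were moving to and fro": [
        "the pages were flipping within the book",
        "the whole book was moving to and fro along the desk",
    ],
}


def memory_based(utterance):
    """Return the most familiar remembered reading, or None on no match.

    Only remembered readings are ever looked at, so the other logical
    readings are never generated, let alone rejected.
    """
    past = MEMORY.get(utterance)
    if not past:
        return None
    reading, _count = past.most_common(1)[0]
    return reading


def logical(utterance):
    """Enumerate all logically possible readings before any choice is made."""
    return list(LOGICAL_READINGS.get(utterance, []))


def understand(utterance):
    """Memory first (commonsense); logical enumeration only as an assumed fallback."""
    reading = memory_based(utterance)
    return [reading] if reading is not None else logical(utterance)


u = "the pages of the book were moving to and fro"
print(understand(u))    # ['the pages were flipping within the book']
print(len(logical(u)))  # 2
```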

Reverse Cognition

When we consume data, the amount we have taken in increases with time. We hear one word, then a second word, which leads to some partial cognition, then a third word, and so on. Then we cognize one sentence, then 2 sentences, and so on. As more and more information comes in, the meaning unravels more and more.

But meaning can also unravel when we delete (or discard as wrong) information from our minds. So why don’t we study reverse cognition, that is, the unraveling of meaning as data is removed from the system (mind)? This happens especially as we grow up: we learn that so much of what we thought about the world as children was plain false/wrong.


(Maybe that can help in the debugging of programs.)
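One way to make reverse cognition concrete is to treat beliefs as facts plus rules, and to recompute everything derivable after a belief is deleted. The sketch below swaps in a standard technique, forward chaining with default rules that another belief can block (negation as failure); the rules, the moon example and the names `RULES`, `meanings` and `discard` are hypothetical. Deleting the child’s false belief removes one conclusion and lets a new one unravel.

```python
# Each rule: (premises, blockers, conclusion). The conclusion holds when every
# premise is believed and no blocker is believed (a default rule, i.e.
# negation as failure). Blockers here are only ever base beliefs, so a simple
# forward pass to a fixed point is enough. The rules are hypothetical.
RULES = [
    ({"the moon seems to move with me", "the moon follows me"}, set(),
     "the moon is watching me"),
    ({"the moon seems to move with me"}, {"the moon follows me"},
     "the moon is very far away"),
]


def meanings(beliefs):
    """Everything derivable from the base beliefs (forward chaining)."""
    known = set(beliefs)
    changed = True
    while changed:
        changed = False
        for premises, blockers, conclusion in RULES:
            if premises <= known and not (blockers & known) and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def discard(base_beliefs, false_belief):
    """Reverse cognition: delete a base belief and recompute from what is left."""
    return meanings(set(base_beliefs) - {false_belief})


child = {"the moon seems to move with me", "the moon follows me"}
adult = discard(child, "the moon follows me")

print("the moon is watching me" in meanings(child))  # True
print("the moon is very far away" in adult)          # True
print("the moon is watching me" in adult)            # False
```

Note that `discard` must be given the base beliefs, not an already-derived set, otherwise stale conclusions would survive the deletion.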

Sensory and Mental data

Is it a good idea, or of any use, to divide the parts of data into sensory and mental, on the basis of how they are consumed (their consumption-type)?


E.g. – 
1) There are 3 types of pens available in this shop. 

Senses – pen, shop

Mind – types, 3, available. (My mind has processed sensory data and arrived at these inferences about it. These descriptions of reality are purely mental products.)


2) John gave a ball to Jack.

Sensory – John, Jack, ball, gave

Mental – (none)


3) John is the smartest man in the world.

Sensory – John, world

Mental – smartest, man


4) Giving gifts is a good tradition.

Sensory – gifts, tradition, giving 

Mental – good

Linguistic Completeness of Information

I propose a method for determining the “Linguistic completeness of information” in a given piece of text.
Meanings of words either depend upon other entities and concepts or they don’t. For example, take the word ‘gave’. ‘Gave’ means the transfer of something from one place to another. This invokes the dependence or attachment of this word to 3 other entities – ‘something’, ‘one’ and ‘another’. So ‘gave’ is 3x. What does that mean? It means that whenever the word ‘gave’ is present in a piece of text, these 3 other entities have to be somewhere around – ‘what was given?’, ‘the initial place’ and ‘the final place’. Take another word, say ‘to’. What is always attached to a ‘to’ is the destination, as in ‘to the station’, ‘to the house’ or ‘gave a ball to John’. So ‘to’ is 1x. A word like ‘ball’ is self-sufficient – its definition doesn’t “depend” upon the presence of other entities. So it is 0x.


Method :

1) Look up the definition of each word in the text.
2) Identify the variables its definition depends upon.
3) Check whether each of those variables exists in the surrounding text, i.e. whether the question it raises is answered there.

Once all these questions have answers, the text can be said to be sufficiently Linguistically complete.

(All these steps are programmable.)

Note : 1) This obviously doesn’t include the reasons for the phenomena in the text, since that is an infinite chain.
2) If there is the word ‘ball’, we aren’t looking around in the text for information like the radius or the colour of the ball. That isn’t Linguistic information. This ‘completeness’ does not include pieces of information (variables) that are dependent logically, mathematically etc.
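The method above can be sketched in a few lines, assuming a hand-built valency lexicon (`VALENCY`) that lists, for each word, the entities its definition depends on, and assuming some earlier step (a parser, or hand annotation as here) has already found which of those entities appear in the text. `Occurrence`, `arity`, `unanswered` and `is_linguistically_complete` are hypothetical names, and words missing from the lexicon are treated as 0x.

```python
from dataclasses import dataclass, field

# Hypothetical valency lexicon: the slots each word's definition depends on.
# The counts follow the text: 'gave' is 3x, 'to' is 1x, 'ball' is 0x.
VALENCY: dict[str, tuple[str, ...]] = {
    "gave": ("what was given", "initial place", "final place"),
    "to": ("destination",),
    "ball": (),
}


@dataclass
class Occurrence:
    """One word in the text plus the slot fillers found around it.

    Finding the fillers is assumed to happen elsewhere; here they are
    supplied by hand.
    """
    word: str
    fillers: dict[str, str] = field(default_factory=dict)


def arity(word: str) -> int:
    """Number of entities the word's definition depends on (its 'x' value)."""
    return len(VALENCY.get(word, ()))


def unanswered(text: list[Occurrence]) -> list[tuple[str, str]]:
    """Every (word, slot) whose question has no answer in the text."""
    return [
        (occ.word, slot)
        for occ in text
        for slot in VALENCY.get(occ.word, ())
        if slot not in occ.fillers
    ]


def is_linguistically_complete(text: list[Occurrence]) -> bool:
    return not unanswered(text)


complete = [
    Occurrence("John"),
    Occurrence("gave", {"what was given": "ball",
                        "initial place": "John", "final place": "Jack"}),
    Occurrence("ball"),
    Occurrence("to", {"destination": "Jack"}),
    Occurrence("Jack"),
]
partial = [Occurrence("gave", {"what was given": "ball"}), Occurrence("ball")]

print(is_linguistically_complete(complete))  # True
print(unanswered(partial))  # [('gave', 'initial place'), ('gave', 'final place')]
```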

A way to represent words

This is a way to represent words in language, on a computer. 

Take a number, say 100. I can do logical / computational operations on it, like taking its half, its factorial, its square root etc. But take a word, say ‘boy’. Can I do operations on it? Can I take (boy)! or (boy)^(1/2) or (boy)/2 and get various properties or information about a boy? No, because a word CANNOT BE LOGICALLY / COMPUTATIONALLY OPERATED UPON.

——————-

Inspiration : If just the word “radio” is written in a sentence, it tells me nothing about a radio. I cannot do imaginary things like, say, ‘strip off the first and last characters’ and get what material the radio is made of, OR ‘take the central letter and follow it up with so-and-so letters’ and get the shape of the radio etc.

This write-up is also inspired by QR-codes, or bar-codes on books.
——————

If the following were the codes –
10 = male

00 = female

age range : 0-5 = 1, 5-15 = 2, 16-25 = 3, 26-30 = 4, …

then a ‘boy’ would be (10)(2), i.e. male, in the age range 5-15.

So decide on some root words (like “co-ordinates”) to represent all the other words in the language, and express every word in terms of those root-word numbers (the values of the root words). This will make words operable, so that operations on them reveal SEMANTIC INFORMATION about them. Every word will have a numerical code. All dictionary words will have to be converted to this code, and the whole dictionary would have to be re-organised in terms of these root codes.

Consider this example. For an instrument, the root words (left) and their respective instantial codes (right) are as below :

living/non-living : 100 / -100
shape : 50 (box)
made of : 500 (plastic)
requires : 1000 (power)
takes in / gives out : -2 (gives out)
……… : ………

Legend related to the above instance :
1) If a living thing, then 100; if non-living, then (-100).
2) If 50, it’s a box; if, say, 60, it’s a sphere etc.
3) If 500, it’s plastic; if 600, it’s metal etc.
4) 1000 – power, 1001 – chemical energy, 1002 – light energy, 1003 – mechanical energy, …
5) +2 – takes in something; -2 – gives out something

So, a radio = (-100)(50)(500)(1000)(-2) (i.e. a non-living box made of plastic, requiring power (electrical energy), which gives out something), in the format ‘(l/nl)(shape)(made-of)(requires)(takes in/gives out)’. So, quite simply :

    if substring[2] == 50 :
        then (its shape is a box)

Now, logical operations on this numerical word will give me semantic properties of the word, and such operations can be used extensively throughout the program to generate various kinds of statements.
————

The above description is vague and there would obviously be better choices of the root-words, and numerous operations one can think of on the code-words. The point is to convey the essence of the idea. 

—–

When the word ‘radio’ is entered into the machine, the machine automatically knows that the 3rd element of the numerical word tells what the entity is made of, the 2nd element tells its shape, etc. It doesn’t have to computationally process and understand an additional English sentence saying that a radio is made of so-and-so (or that a radio is shaped so-and-so), which would otherwise need to be added.
Effectively, it is like naming entities with their properties included in their very names.
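A minimal sketch of the coding scheme, using only the codes listed above. `PERSON_SCHEMA`, `INSTRUMENT_SCHEMA`, `parse`, `element` and `decode` are hypothetical names; the code `00` is read as the integer 0, and unlisted codes are reported as unknown rather than guessed. Positions are 1-based in `element` to match the `substring[2]` convention in the text.

```python
import re

# Hypothetical schemas built from the codes given in the text. Each position
# in a code word is one root word; the value there is its instantial code.
PERSON_SCHEMA = [
    ("sex", {10: "male", 0: "female"}),  # the code '00' is read as 0
    ("age range", {1: "0-5", 2: "5-15", 3: "16-25", 4: "26-30"}),
]

INSTRUMENT_SCHEMA = [
    ("living/non-living", {100: "living", -100: "non-living"}),
    ("shape", {50: "box", 60: "sphere"}),
    ("made of", {500: "plastic", 600: "metal"}),
    ("requires", {1000: "power", 1001: "chemical energy",
                  1002: "light energy", 1003: "mechanical energy"}),
    ("takes in / gives out", {2: "takes in something", -2: "gives out something"}),
]


def parse(text):
    """Turn the written form '(-100)(50)...' into a tuple of integers."""
    return tuple(int(n) for n in re.findall(r"\((-?\d+)\)", text))


def element(code, n):
    """1-based access, matching the substring[2] convention in the text."""
    return code[n - 1]


def decode(code, schema):
    """Read off the semantic properties of a code word, position by position."""
    if len(code) != len(schema):
        raise ValueError("code length does not match the schema")
    return {
        root: meanings.get(value, f"unknown ({value})")
        for value, (root, meanings) in zip(code, schema)
    }


RADIO = (-100, 50, 500, 1000, -2)
BOY = parse("(10)(2)")

if element(RADIO, 2) == 50:
    print("its shape is a box")
print(decode(RADIO, INSTRUMENT_SCHEMA)["made of"])  # plastic
print(decode(BOY, PERSON_SCHEMA))  # {'sex': 'male', 'age range': '5-15'}
```

Keeping the schema separate from the codes means the same `decode` works for any category of word, as long as each category fixes the order of its root words.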

A philosophical question

Why does man converse? (I am not asking – how come man converses?)
Is just one of these 2 reasons true? 

1) Is conversation just an “intelligent action”? That is, an intellectual activity; a “subset” of intelligence. Man understands himself (his systemic make-up) and others of his kind, i.e. the relation of something like “speech” to the mind, and merely uses speech to get his goals served with the help of other humans, by talking to them.
2) Actions of the members of the human species, living in co-existence, might clash with one’s intentions or harm him, and hence conversation is a mechanism to “communicate” to the other person and prevent them from doing those actions.
Or is there a bigger scientific reason than these two?

MODIFICATION OF MINSKY’S FRAMES : Mechanism for commonsense

Minsky says that – “When one encounters a new situation (or makes a substantial change in one’s view of the present problem) one selects from memory a structure called a Frame. This is a remembered framework to be adapted to fit reality by changing details as necessary.

A frame is a data-structure for representing a stereotyped situation, like being in a certain kind of living room, or going to a child’s birthday party”. (Minsky, 1974). So what Minsky is essentially saying is that the evoked frame is commonsensical (it comes from experiential commonsense; it is stereotyped), and the details are filled in as per the given data.
My claim, which is a sort of distortion and reversal of the above : the evoked “frame” is something from memory which “matches” the given data. That “frame” is a particular set of data composed of details (made up of elements drawn from various different samples in one’s memory that correspond to the data). Whatever that particular frame brings along with it, i.e. its details, constitutes the mind’s commonsensical assumptions for that given data, which are (obviously) not present in the given data. When some further new situation comes up, this frame is referred to, and answers are given as per these commonsensical contents/details of the evoked/imagined frame.
For example, someone tells you ‘He went for a haircut’. You understand by commonsense, though it is not stated, that he went to a saloon. My claim is that, on hearing this, we evoke from memory ‘some person went for a haircut’ (this is most likely made up of components – a person, going, haircut/hair – each from one’s memories). That image will automatically carry some details along with it, since they are drawn from samples in memory (e.g. he is wearing a particular shirt, he is roughly of a certain size, he is headed to a saloon / is inside a saloon etc.), and these details form part of / accord with our commonsense. Now, when some further situation comes up, this “frame” / image is referred to, and the answers are drawn from whatever the commonsensical details in that frame are. So, for example, if some third person around (say an absent-minded/less alert person, or an idiot) asks ‘where has he gone?’ (after hearing that “he went for a haircut”), you immediately say to him – of course, to the saloon. What kind of a question is that? This answer comes from referring to the commonsensical details of that image or “frame”.
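The mechanism claimed above can be sketched as follows, assuming memory is a dictionary of remembered samples per component concept, each carrying unstated details. `MEMORY`, `evoke_frame` and `ask` are hypothetical names, and taking the first sample for each component is a stand-in for whatever selects the best-matching sample. The frame is assembled from several components’ samples, the explicitly given data overrides them, and a later question is answered by reading the frame.

```python
# Hypothetical memory: remembered samples for each component concept.
# Each sample carries details that were never stated in the new input.
MEMORY = {
    "person": [{"wearing": "a particular shirt", "size": "roughly average"}],
    "going": [{"mode": "on foot"}],
    "haircut": [{"where": "the saloon", "done by": "a barber"}],
}


def evoke_frame(components, given):
    """Assemble a frame from one remembered sample per component concept.

    Taking the first sample stands in for choosing the best-matching one.
    Explicitly given data overrides the remembered (commonsense) details.
    """
    frame = {}
    for concept in components:
        samples = MEMORY.get(concept, [])
        if samples:
            frame.update(samples[0])
    frame.update(given)
    return frame


def ask(frame, slot):
    """Answer a later question by referring back to the evoked frame."""
    return frame.get(slot, "unknown")


# 'He went for a haircut' -> components: a person, going, haircut.
frame = evoke_frame(["person", "going", "haircut"], given={"who": "he"})
print(ask(frame, "where"))  # the saloon
```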

A super-simple NLP idea

Consider these major parts of speech – 

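N V Ad Adv Art Prep Conj (noun, verb, adjective, adverb, article, preposition, conjunction)

Taking them two at a time, in order, gives 7P2 = 7 × 6 = 42 such permutations (ordered pairs).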

E.g. – 

N V

V Adv

Prep Adv

etc.

For each pair, write the general, elaborate meaning. 

E.g. (Proper N) + (V) – John called

Such a meaning of this fragment (pair) is – 

‘A person named N did the action of V’ – ‘A person named John did the action of calling’

Other meanings could be obtained by replacing the words (N and V) with their synonyms or (more importantly) their meanings from the dictionary.

These general meaning elaborations will have to be hand-typed. This generates a database of the pairs and their general, elaborate meanings.

Now, given a sentence, take all the pairs of successive words in it. The program will then generate their meanings from this database. The resultant set of sentences would, to a great extent, be the cumulative meaning of the sentence. The set will also contain the various different interpretations of the sentence (due to the synonyms / (and more importantly) meanings of the words of the sentence, from the dictionary).
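A minimal sketch of the pair database and its use. Only three of the 42 templates are filled in; the wording of the `("V", "N")` and `("V", "Adv")` templates, the tiny tagged `LEXICON`, the `SYNONYMS` table (standing in for synonyms / dictionary meanings) and the function `pair_meanings` are all hypothetical. A real system would use a part-of-speech tagger and a full dictionary instead of the hand-built tables.

```python
from itertools import permutations

POS = ["N", "V", "Ad", "Adv", "Art", "Prep", "Conj"]

# Ordered pairs of distinct parts of speech: 7P2 = 42.
PAIRS = list(permutations(POS, 2))

# Hand-typed general meanings; only three of the 42 are filled in here.
TEMPLATES = {
    ("N", "V"): "A person named {0} did the action of {1}",
    ("V", "N"): "The action of {0} was directed at {1}",
    ("V", "Adv"): "The action of {0} was done {1}",
}

# Hypothetical tagged lexicon: word -> (part of speech, form used in meanings).
LEXICON = {
    "John": ("N", "John"),
    "Mary": ("N", "Mary"),
    "called": ("V", "calling"),
    "loudly": ("Adv", "loudly"),
}

# Hypothetical stand-in for synonyms / dictionary meanings.
SYNONYMS = {"calling": ["phoning", "shouting"]}


def pair_meanings(sentence):
    """Elaborate every pair of successive words using the template database."""
    tagged = [LEXICON.get(word) for word in sentence.split()]
    results = []
    for left, right in zip(tagged, tagged[1:]):
        if left is None or right is None:
            continue  # word not in the lexicon
        template = TEMPLATES.get((left[0], right[0]))
        if template is None:
            continue  # no hand-typed meaning for this pair yet
        for a in [left[1]] + SYNONYMS.get(left[1], []):
            for b in [right[1]] + SYNONYMS.get(right[1], []):
                results.append(template.format(a, b))
    return results


print(len(PAIRS))  # 42
for line in pair_meanings("John called loudly"):
    print(line)
```

The ‘phoning’ and ‘shouting’ variants show how substituting dictionary meanings yields the different interpretations of the same sentence.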

What ‘commonsense’ is this?

Consider this sentence someone tells you – 

The doctors did the surgery quickly.

There are obviously 2 parts to the cognition of this sentence – 

1) In cognition, we first, obviously, consider / understand what is given explicitly (that there was more than one doctor, that they did a surgery, that they did it quickly etc. This obviously involves understanding the meanings of the words making up the sentence).

2) Then, we consider the implicit part – the commonsense. In this case, we cognize (assumedly) that this happened in a hospital, the doctors were adults etc.

I wish to draw your attention to another part of the cognition of the sentence.

Now, when we hear this sentence, we obviously don’t know a lot of things – like, whose surgery was done? what surgery was done? who were those doctors? etc. These questions arise in our mind. The basis for the arising of a blank slot like ‘what surgery was done?’ is the dormant cognition/“registry” of the fact that ‘if it was a surgery => it must be a surgery of some KIND’. It is the cognition of this super-simple rule that leads to the question – what (kind of) surgery was it? This is the commonsense (the rule) I am talking about. It is something more basic than something like ‘the event must have happened in a hospital’ (the example given in part (2)). Also, as a general additional point, what else can explain the arising of a question like ‘what kind of a surgery was it?’ on being told that a surgery was done?

(Other such rules here, corresponding to the questions :

a) If they were doctors => they must be some particular doctors.
b) If the surgery was done => it must have been done on someone (this is a bit more “distant” an implication than the above one).)
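These super-simple rules can be written down as data. In this minimal sketch the rule table `RULES` and the function `open_questions` are hypothetical, and extracting the concepts (‘doctors’, ‘surgery’) from the sentence is assumed to have happened already. Each rule says that a mentioned concept must have some slot, and every slot the sentence leaves unfilled comes out as a question.

```python
# Hypothetical 'super-simple' rules: if a concept is mentioned, some slot of it
# must exist, and an unfilled slot shows up as a question.
RULES = {
    "doctors": [("identity", "Who were those doctors?")],
    "surgery": [
        ("kind", "What (kind of) surgery was it?"),
        ("patient", "Whose surgery was done?"),
    ],
}


def open_questions(concepts, filled):
    """Questions arising from rule slots that the sentence leaves unfilled.

    `filled` holds (concept, slot) pairs that the sentence already answers.
    """
    return [
        question
        for concept in concepts
        for slot, question in RULES.get(concept, [])
        if (concept, slot) not in filled
    ]


# 'The doctors did the surgery quickly.'
for q in open_questions(["doctors", "surgery"], filled=set()):
    print(q)
```

If the sentence had said ‘the heart surgery’, passing `filled={("surgery", "kind")}` would remove the kind question, leaving only the other two.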