Artificial intelligence, or AI for short, has been one of the great goals of computer science for many years. But what lies behind all those attempts to create an intelligence that seems human? How can we get a computer to understand us, recognize our face, or reason?
Today at TechGIndia we will look at exactly that side of artificial intelligence: the techniques and algorithms behind the virtual assistant that answers when you ask it something, the phone that recognizes your face, or the car that knows how to react in a complicated situation.
Although defining exactly what intelligence is is quite complicated, in practice we can say what we want from a system we call “intelligent”. We want it to recognize patterns (images or sounds) so it can take in information from the outside world. We want it to learn, filtering the useful information and saving its new discoveries. And finally, we want it to reason and deduce; in other words, to be able to create knowledge.
How does a system learn?
We can say that the most important part of an artificial intelligence is its ability to learn: to change its behavior based on new information. It is, in a sense, the most spectacular part of all this: watching a system improve as you train it, until it can do things it was never explicitly programmed to do.
Although there are many tools for making a system “learn”, there is a fairly simple one that serves us well here. Antispam systems often use what is called a naive Bayes classifier. This algorithm is an application of a probabilistic result, Bayes' theorem, whose details do not concern us too much for the purposes of this article. The classifier, as its name suggests, classifies; we can use it, for example, to label an email as spam or “not spam”.
And how does it do it? With a fairly intuitive approach. From the email we extract several features, such as how many times each word appears. Then we look at the values those features take in other emails we have already marked as spam. If the word “Viagra” appears in all of them and also appears in the one we are analyzing, then this email is more likely to be spam. Quick and simple. To improve the algorithm, we also consider the probability that a random email is spam: if 99% of the messages we receive are spam, then the one we are analyzing probably is too, even if it does not contain many “suspicious” words. Conversely, if we hardly ever receive spam, we will need stronger evidence before classifying a message as spam.
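The whole idea fits in a few lines of Python. This is a minimal sketch, assuming a simple bag-of-words representation and Laplace smoothing (real filters refine both considerably):

```python
from collections import Counter

class NaiveBayesSpamFilter:
    """Toy naive Bayes classifier: counts words in spam and non-spam mail."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, words, label):
        # Called each time the user marks a message as spam or not spam
        self.message_counts[label] += 1
        self.word_counts[label].update(words)

    def spam_probability(self, words):
        total = sum(self.message_counts.values())
        vocabulary = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        scores = {}
        for label in ("spam", "ham"):
            # Prior: how likely a random email is spam / not spam
            score = self.message_counts[label] / total
            n_words = sum(self.word_counts[label].values())
            for word in words:
                # Laplace smoothing keeps unseen words from zeroing the score
                score *= (self.word_counts[label][word] + 1) / (n_words + vocabulary)
            scores[label] = score
        return scores["spam"] / (scores["spam"] + scores["ham"])

spam_filter = NaiveBayesSpamFilter()
spam_filter.train(["viagra", "cheap", "offer"], "spam")
spam_filter.train(["viagra", "buy", "now"], "spam")
spam_filter.train(["meeting", "tomorrow", "agenda"], "ham")
print(spam_filter.spam_probability(["viagra", "offer"]))   # well above 0.5
print(spam_filter.spam_probability(["meeting", "agenda"]))  # below 0.5
```

Note how the prior (two spam messages out of three seen) and the word counts combine, exactly as described above.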
But the interesting thing about the classifier is how it learns. Each time you press the “this is spam” or “this is not spam” button, those values are updated to improve the classification. With each indication you give about the nature of an email, it learns to tell one from the other a little better.
The point is that, although this seems very simple, it shows how many artificial intelligence systems, mainly classification systems, learn. They are given some data along with its expected classification, and the system's internal values are adjusted to “absorb” that knowledge and apply it to new inputs. Neural networks, which have lately been producing very interesting results, apply the same idea: they adjust the weights of their “neurons” little by little until the output matches what is expected.
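That weight-adjustment loop can be seen in the simplest possible “neuron”, the perceptron. A didactic sketch (real networks stack many such units and train them with gradient descent rather than this simple rule):

```python
def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Adjust two weights and a bias until predictions match the examples."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, expected in examples:
            # Prediction: weighted sum passed through a step "activation"
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            predicted = 1 if activation > 0 else 0
            # Nudge each weight in proportion to the error it caused
            error = expected - predicted
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Learn the logical AND function purely from labelled examples
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)

def predict(inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

print([predict(x) for x, _ in and_examples])  # [0, 0, 0, 1]
```

Nobody told the perceptron what AND is; it absorbed the rule from the examples alone.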
Machine learning imitates our behavior, although without the limitations of imagination and intuition
Although this approach may seem simple, it is actually very powerful. In the end, it is nothing more than an imitation of our own learning process, but without the limitations of our imagination, prejudices and intuition. Machine learning systems are thus able to extract the most important characteristics of the information they receive and synthesize it. As an example, one only has to look at the great advances made in speech recognition with deep neural networks. These systems can be trained: they are fed “pieces” of audio along with their transcriptions so that they learn how speech works. From that they extract rules that, although probably incomprehensible and strange to us, identify words much better than other models built on assumptions about how our language and voice work.
Unfortunately, not everything is perfect, and machine learning is not always as straightforward as “feed it some data and you're done”. It is, in fact, more art than science. The main problem you can run into is overfitting: a system that works perfectly on the example data but fails badly on any other data. For example, our antispam classifier could have trouble working in the real world if we trained it only on the emails of a Viagra salesman. Other cases are even more curious, such as neural networks that classify abstract images as if they were a bikini or a television remote.
The other problem is deciding what kind of classifier to use. It turns out there is no “supreme classifier”. Neural networks get a lot of attention for the results they achieve, but they are neither the only algorithms nor the best ones for every problem. For example, to identify the structure of proteins, support vector machines (SVMs) give better results than neural networks. And for spam classification, Bayesian and statistical learning methods achieve very good results with a very simple implementation.
Creating new knowledge: deduction and reasoning
Learning is fine, but we cannot leave out something very interesting: reasoning and making deductions to add more knowledge to the system. The first option is to use logic. There are even several programming languages dedicated to it; the best known is Prolog, whose paradigm is very different from that of more common languages.
The idea is that instead of giving instructions on how to do something, we simply state some logical rules and facts, and the system deduces the solutions. For example, you can solve the riddle of the farmer, the wolf, the goat and the cabbage crossing the river without explaining how. You simply write the riddle's rules formally: you tell Prolog what it means to “cross the river” and which states are not allowed (that is, that the wolf cannot be left alone with the goat, nor the goat with the cabbage). Then you state what you want: a list of moves that takes everyone from one side of the river to the other. Finally, you ask whether any sequence of moves solves the riddle and, as if by magic, it answers.
It is not, of course, magic. Prolog uses the rules of logic to reason (much faster than you and me, it must be said) and find the solution. This way, you avoid having to tell it how to solve the problem. It may seem a trivial saving, but it is not when you have lots of facts, rules and possible questions: a logical system can face questions its creator would never have imagined.
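Prolog itself is beyond the scope of this article, but the same declarative spirit can be approximated in Python: we only describe which states are forbidden and what counts as a move, and a generic breadth-first search does the “reasoning”. A sketch of the riddle (not how Prolog works internally):

```python
from collections import deque

# Each state records which bank (0 or 1) each traveller is on, in the order
# (farmer, wolf, goat, cabbage). Everyone starts on bank 0.
START, GOAL = (0, 0, 0, 0), (1, 1, 1, 1)

def allowed(state):
    farmer, wolf, goat, cabbage = state
    # Forbidden: wolf alone with goat, or goat alone with cabbage
    if wolf == goat != farmer:
        return False
    if goat == cabbage != farmer:
        return False
    return True

def moves(state):
    farmer = state[0]
    for i in range(4):
        # The farmer crosses alone (i == 0) or with one passenger on his bank
        if i == 0 or state[i] == farmer:
            new = list(state)
            new[0] = 1 - farmer
            if i != 0:
                new[i] = 1 - state[i]
            yield tuple(new)

def solve():
    # Breadth-first search over states: we wrote rules, not the solution
    queue, seen = deque([(START, [START])]), {START}
    while queue:
        state, path = queue.popleft()
        if state == GOAL:
            return path
        for nxt in moves(state):
            if allowed(nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))

solution = solve()
print(len(solution) - 1)  # number of crossings in the shortest solution
```

Notice that nowhere did we encode the famous trick of bringing the goat back; the search deduces it from the rules.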
However, here we run into several problems. The first is easy to see: you need the base knowledge. Reasoning is of no use if we have nothing to reason about. You have to create that foundation, and it may not be easy. On the one hand, it is complicated to list all the rules and relationships of a particular field. On the other, it can be difficult to write them formally, in a form a computer can understand. Think about it: would you be able to express unambiguously everything you know?
There is an additional problem: the real world (especially the world we perceive and that humans talk about) does not quite fit something as strict as logic. It goes without saying that humans are anything but logical, and that not everything is simply true or false.
For the first problem there is little we can do, but the second has a solution: fuzzy logic. Let's say we are part of a team developing a voice assistant in the style of Siri, Cortana or Google Now, and we are told the assistant must answer the user when he asks whether he should put on a coat to go out and/or carry an umbrella.
It seems easy: if it's cold, wear a coat; if it's raining, carry an umbrella. Now, what temperature exactly is “cold”? And are we going to make the poor user carry an umbrella when only four drops are falling and a coat would do? But how do we know whether four drops are falling or it is pouring? Do we make the user stick the phone out the window to check?
Fuzzy logic makes it possible to work with imprecise concepts such as “very cold” or “raining a little”
To solve this, instead of tying ourselves in knots trying to define imprecise concepts that not even humans define precisely, we accept them into our system and work with them. Using fuzzy logic, we write rules in the style of “if it's hot, don't wear a coat”, “if it rains a little, wear a coat” or “if it rains a lot, carry an umbrella”. Underneath, those predicates are not evaluated as a plain “yes or no”; they are assigned degrees in a range between 0 and 1 (or 0-100%, for convenience). For example, to the question “is it hot?” our fuzzy logic system would not answer “yes”, but something like “50%”. Likewise, the action would not be to take the coat or leave it, but to take it to some intermediate degree (of course, we would then transform that into concrete actions, to avoid telling the user “it's 80% coat and 20% umbrella”).
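A toy version of those rules in Python; the thresholds (what counts as “cold” or “heavy rain”) are invented for illustration, and real fuzzy controllers use richer membership functions and defuzzification methods:

```python
def membership(x, zero_at, one_at):
    """Linear fuzzy membership: 0 at `zero_at`, 1 at `one_at` (either direction)."""
    t = (x - zero_at) / (one_at - zero_at)
    return max(0.0, min(1.0, t))

def advice(temp_c, rain_mm_h):
    # Fuzzify the inputs into degrees between 0 and 1
    cold = membership(temp_c, zero_at=20, one_at=5)
    hot = membership(temp_c, zero_at=20, one_at=30)
    light_rain = membership(rain_mm_h, zero_at=0, one_at=1)
    heavy_rain = membership(rain_mm_h, zero_at=1, one_at=5)
    # Rules: cold or light rain call for a coat, but heat vetoes it;
    # only heavy rain justifies the umbrella
    coat = min(max(cold, light_rain), 1 - hot)
    umbrella = heavy_rain
    # Defuzzify: turn the degrees back into a concrete recommendation
    return {"coat": coat > 0.5, "umbrella": umbrella > 0.5}

print(advice(4, 0))     # cold and dry: coat, no umbrella
print(advice(29, 0.8))  # a few drops but very hot: take nothing
```

The second call shows the “loses points” effect described below: the light rain votes for a coat, but the heat cancels it out.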
Although you may think I am kidding, this is actually very useful because it reflects human reasoning well. For example, if it is raining a little more than four drops but it is very hot, our assistant would tell us not to take anything: since it is very hot, the “wear a coat” option loses points, so to speak, and even though it is raining a bit, the recommendation would be not to wear one.
Precisely because it resembles our way of thinking, fuzzy logic has found many uses. One very famous example is the control of certain subway trains in Japan, more efficient and smoother than human drivers. It has also helped bring life to films as famous as The Lord of the Rings trilogy, creating huge crowds of animated 3D characters that move and react realistically depending on what is around them.
When the computer speaks your language: language processing
We are still missing one part of intelligent systems: natural communication. It is perhaps the most striking part, the one that lets a computer communicate with us, and us with it, without having to learn programming languages or thousands of obscure commands. This is the area of AI called natural language processing.
The first part is for the computer to understand a natural-language phrase. There are several ways to achieve this. One of them, the simplest, is to look for matches against a database of predefined actions. In other words, when you tell your voice control program “Play the rock list”, it will detect the words “play”, “list” and “rock” and, using instructions the programmers have included explicitly, look in your music for a list called “rock” and play it.
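A hypothetical sketch of such a rigid keyword matcher in Python (the action table and phrasing are invented for illustration):

```python
import re

def play_list(words):
    # Take the word right before "list" as the playlist name
    return "Playing the " + words[words.index("list") - 1] + " playlist"

# Predefined actions, keyed on the keywords the programmers chose in advance
ACTIONS = {("play", "list"): play_list}

def handle(command):
    words = re.findall(r"[a-z]+", command.lower())
    for keywords, action in ACTIONS.items():
        if all(k in words for k in keywords):
            return action(words)
    return "Sorry, I did not understand that"

print(handle("Play the rock list"))  # Playing the rock playlist
print(handle("What time is it"))     # Sorry, I did not understand that
```

Anything the programmers did not foresee falls straight through to the apology, which is exactly the limitation discussed next.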
The point is that this approach is limited. We cannot say a system is intelligent if all we have done is pre-program the answers explicitly. We want a system that understands even phrases it “hears” or “reads” for the first time.
To tackle this problem, one of the most interesting techniques is something you should remember from high school: syntactic analysis. Knowing the structure of the language, its grammar, a computer can convert a phrase such as “What is the headquarters of the ministry that is responsible for agriculture in India?” into a query that searches, among the ministries of India, for the one whose responsibilities include “agriculture”, and then retrieves its “headquarters” property.
Do you remember syntactic analysis from high school? The same technique can be used so that a computer “understands” what you mean
The system will also need the meaning of the words. Syntactic analysis can tell it that the “ministry” is in “India”, but it needs to know that “India” is a country to understand what exactly “being in India” means.
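To make the idea concrete, here is a toy sketch in Python. A regular expression stands in for a real syntactic parser, and the “knowledge base” is a couple of invented facts:

```python
import re

# A couple of invented facts playing the role of the system's knowledge base
MINISTRIES = [
    {"name": "Ministry of Agriculture", "country": "India",
     "responsibilities": {"agriculture", "farmers"}, "headquarters": "New Delhi"},
    {"name": "Ministry of Railways", "country": "India",
     "responsibilities": {"railways"}, "headquarters": "New Delhi"},
]

# A regular expression standing in for a real syntactic parser
PATTERN = re.compile(
    r"What is the (\w+) of the ministry that is responsible for (\w+) in (\w+)\?")

def answer(question):
    match = PATTERN.match(question)
    if not match:
        return None
    prop, topic, country = match.groups()
    # The parse gives the structure; the knowledge base supplies the meaning
    for ministry in MINISTRIES:
        if ministry["country"] == country and topic in ministry["responsibilities"]:
            return ministry[prop]

print(answer("What is the headquarters of the ministry "
             "that is responsible for agriculture in India?"))
```

The same pattern already answers related questions (“What is the name of the ministry that is responsible for railways in India?”) without any extra programming, which is the step up from pure keyword matching.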
Something like this is what the SHRDLU system, created in 1971, did. Using English grammar, it was able to understand questions and commands about a virtual world containing various objects, such as cubes and pyramids.
It is a complex task, as you can imagine. You need the words and all their possible meanings well organized and classified (what is called the lexicon). You must take into account ambiguities, what the language leaves implicit, and the context. For example, what would a computer do with the phrase “The first plant of the house is a disaster”? We could be talking about a potted plant or, in languages like Spanish where “planta” also means “floor”, the first floor; it depends on the context. In addition, the computer should know that “is a disaster” is a figure of speech: we do not mean that the potted plant has literally turned into a hurricane or anything of the sort.
To get around these problems, techniques are often used that do not require so much “understanding” of the language. Keyword or name identification, summarization, sentence classification or sentiment analysis are some of the tactics used to let a computer “understand” what a human is trying to say.
An example is IBM's Watson system, capable of playing against humans and beating them at a quiz show called Jeopardy!, which consists of answering trivia questions on all sorts of topics as quickly as possible. To do this, Watson uses several techniques to understand the question and many others to find the answer in its database. If several techniques produce the same answer, the system considers it good. It is a pragmatic approach: since simulating the human mind is difficult, it is better to use several different techniques that, although not “intelligent” and without really understanding what is being asked, are capable of giving answers and are not so complex to program.
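That “agreement as confidence” idea can be sketched like this (a hypothetical miniature, in no way Watson's actual architecture; the three “techniques” are invented stand-ins):

```python
from collections import Counter

# Three invented "techniques", each imperfect on its own
def keyword_lookup(question):
    return "Paris" if "France" in question else None

def pattern_lookup(question):
    return "Paris" if "capital" in question else None

def noisy_lookup(question):
    return "Lyon"  # a technique that happens to be wrong here

def best_answer(question, techniques):
    votes = Counter(t(question) for t in techniques if t(question) is not None)
    if not votes:
        return None
    answer, count = votes.most_common(1)[0]
    # Agreement between independent techniques is treated as confidence
    return answer if count >= 2 else None

question = "What is the capital of France?"
print(best_answer(question, [keyword_lookup, pattern_lookup, noisy_lookup]))  # Paris
```

No single technique is trusted; the wrong answer from the noisy one is simply outvoted.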
The other part of language processing is going the other way: generating natural text that represents the knowledge the system wants to transmit. Here, too, there are several approaches: from the use of predefined templates (for example, a voice assistant, when asked what the weather will be like, answers “Today / tomorrow / on day X it will rain / be sunny / snow / pour”) to systems that, once again, “understand” the grammar, translate their formal representation into a syntactic structure, and then look up in the lexicon the words they need to build the phrase.
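Template-based generation is the simplest of those approaches; a minimal sketch, with the template and word table invented for illustration:

```python
# A predefined template and the word variants that can fill it
TEMPLATE = "{day} it will {weather} in {city}"
WEATHER_WORDS = {"rain": "rain", "sun": "be sunny", "snow": "snow"}

def forecast_sentence(day, condition, city):
    # Fast and reliable, but limited to the sentences we wrote in advance
    return TEMPLATE.format(day=day, weather=WEATHER_WORDS[condition], city=city)

print(forecast_sentence("Tomorrow", "rain", "Mumbai"))  # Tomorrow it will rain in Mumbai
```

Grammar-based generators replace the fixed template with a syntactic structure built on the fly, at the cost of much more machinery.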
Joining the blocks and creating “intelligence”
Throughout this article we have discussed some of the techniques (not all of them, and certainly not in full detail) for achieving certain aspects of an “intelligent” system. What remains is to combine them into complete systems.
For example, how do voice assistants like Cortana, Siri or Google Now work? On the one hand they have to convert your voice to text, and then understand that text and carry out an action. For the former, they use rather complex neural networks (Apple is in the process of doing this with Siri), trained on many sounds so they can transcribe voice to text efficiently. In fact, some of the commands you dictate to your phone are sent to humans, who interpret them and feed the result back to the neural network to improve it even more.
The other part of this type of assistant is language processing, using the techniques we saw to try to “understand” what you want. What they are not capable of, however, is holding a conversation and reasoning. That is a higher level, one that requires the logical techniques of deduction.
The SHRDLU system we mentioned before did reach that level. Once the language processing part had translated the sentences into formal representations, it added the new facts to its knowledge base and was able to “learn”, for example, that you cannot put a cube on top of a pyramid. Based on those facts and properties, SHRDLU could answer questions like “What can I put on top of a cube?” using its knowledge base and logic to make deductions.
The autonomous car, very fashionable right now, can also use artificial intelligence to function. On the one hand, it can use machine learning systems to detect cars and pedestrians from sensor and camera data: instead of directly programming the patterns that say those pixels in the image are a pedestrian, you show the system images that contain pedestrians, and where they are, so that it “learns” to detect them. On the other, it can use fuzzy logic to control navigation and drive smoothly, much as a person would.
Of course, this is not everything that exists or is needed to create “intelligence”, but we have seen, in broad strokes, what is being done to advance along the path towards imitating human intelligence.