The Cassandra Algorithm - RUSSOFT

By Elena Rytsareva, Gateway2Russia
Aug 03, 2003
Forecsys Co., one of the prizewinners in the most recent Russian Innovations Competition, could be called the electronic Cassandra. Forecsys' software easily determines which bank is reliable and which isn't, which cellular customers will jump to another service, which supermarket shoppers will like a new product, which candidate voters in a certain region will vote for, and which traders at the stock exchange will turn bearish. However, unlike the mythological Greek seer, the software sees into the future not through mystical illumination but with a mathematical tool devised by academician Yury Ivanovich Zhuravlyov and his students.

Oscillations of the infinite string

Forecsys' technology is based on a mathematical theory that was published in full in Soviet and foreign scientific journals about 15-20 years ago. Yet since then, nobody in Russia or anywhere else in the world except a few dozen of Zhuravlyov's disciples has succeeded in mastering this math. The method is extremely difficult, yet it works easily and quickly, leaving its competitors in the dust. "It's like two cars, one automatic and one stick shift," explains Konstantin Rudakov, one of Zhuravlyov's followers and a corresponding member of the Russian Academy of Sciences. "The automatic is much more complicated but is much easier to use."

The history of the mathematical tool used by Forecsys dates back to the 1950s. At that time, classical mathematical models prevailed. They described any phenomenon by means of a set of equations. Any college freshman is familiar with the simplest examples of these models. In a university course of mathematical physics, the problems to be solved are mostly "ideal" ones. One of them, the so-called "oscillations of the infinite string," should put even those ignorant of science on their guard. Naturally, no physicist has ever seen an infinite string in real life. To understand how an ordinary string oscillates, a multitude of constraints have to be introduced into the problem, and as a result a single equation turns into a complex system. The same thing happens whenever any phenomenon, be it physical, biological, economic, or historical, is described by means of mathematics. "The cases when scientists succeeded in building a good mathematical model for some data domain can be counted on the fingers of one hand," Rudakov notes. This is why, to determine the reliability of, say, a new car part, engineers still hit it, throw it, and run many other tests, despite the well-developed theory of elasticity. This is why insulation manufacturers test their products for rupture, despite the theoretical fundamentals of electrical engineering. And this is why, despite the 100-year existence of radio, the quality of any radio network's coverage is determined in field tests, not in the silence of laboratories.

But even if a good mathematical model does exist, it doesn't guarantee success. Most of the resulting systems of equations are almost impossible to solve on paper. When the computer came into being, special numerical methods began to develop, and the range of classical problems that could be solved grew. But this applies only to simple problems. "With more than 5 or 6 variables, traditional numerical methods don't work," Rudakov asserts. And to describe, say, a human being's blood circulation, hundreds and hundreds of variables are needed.

Gold from the lab

In the mid-1950s, an alternative to classical modeling emerged. "If, while sitting behind your car's steering wheel, you begin to think about what is happening inside the engine, you'll never move an inch. Despite the car's extremely complex design, at any moment drivers make very simple decisions: to hit the accelerator or the brake, or to turn the steering wheel. That's all," explains Rudakov.

With this new approach, a phenomenon could be described not as a multitude of complex processes and long equations but as a set of precedents: a limited amount of data describing the phenomenon under specific circumstances. This was the idea Yury Zhuravlyov, a candidate of science at that time, used to develop his mathematical methods.

Until 1964, the future academician had been mainly occupied with theory and wrote his doctoral thesis on algebra and logic. Had the Communist Party and the government not given him an urgent assignment, he would have remained a theorist. The assignment involved gold exploration, which, on the face of it, has little to do with algebra. Russia was rich in gold, but each confirmed deposit, taken separately, was minor by world standards. "The premier at the time, Alexei Kosygin, issued an order to find a gold deposit of the so-called South-African type in the USSR (there were only 8 fields of this type in the world: in South Africa, Brazil, Ghana, etc.)," Zhuravlyov recollects. "This type of field is extremely specific. It's like a thin pancake stretched over a vast territory. The pancake's thickness can be 10-15 cm and its depth can be 2-3 km. Just try and find something like that!"

Geologist Fyodor Krendelev and geophysicist Alexei Dmitriyev came to the Institute of Mathematics at the famous Akademgorodok, or Academic Compound, not far from the city of Novosibirsk. They wanted to know whether there was a mathematical method that could meet the government's demand, and they turned to Zhuravlyov. "I had a look and realized that the task was absolutely hopeless: its functions had 150 variables, which had to be extrapolated to 15 points to boot," Zhuravlyov recollects. "Perhaps I would have turned them down if US professor Edward Feigenbaum hadn't visited Akademgorodok the year before. He made an unforgettable impression on all of us, in part due to his scarlet-lined jacket." Feigenbaum gave a talk and made some extravagant claims. He argued that not a single really difficult problem can be solved by purely mathematical means. To solve a difficult problem for which there is no direct method, one should use human expertise and spy on the specialists who deal with the problem. Zhuravlyov decided to take his advice. "After that, we got together with the geologists and began to reflect on it," Zhuravlyov says. "I kept trying to find out how exploration geologists normally looked for gold. Our meditations lasted a long time, about a year." There were only 7 fields, but more than a hundred characteristics and tons of geological prospecting data on each field. The geologists explained which information they paid attention to when determining how "golden" a field was. Zhuravlyov described all this in mathematical language (later, this was called the principle of using specific descriptions).

The ultimate goal of the analysis was to classify the fields. Zhuravlyov had to draw a line through the "description space" of specific fields, so that the fields of the South-African type would lie on one side of the line and all the rest on the other.

In the end, Zhuravlyov and the geologists devised a field classification algorithm. "From a mathematician's point of view, it was incorrect. A normal mathematician would have thrown up his hands and left. But we decided to give it a try," Zhuravlyov relates. "First, we verified the algorithms against the available material. We would remove one of the known fields, apply the method to the remaining ones, and compare them to the discarded field. The comparison was carried out using a very complex procedure involving several tens of thousands of elementary comparisons with different parts of the description. The result turned out to be 100% correct: the method clearly identified whether each field's attributes matched the South-African type. Alexei Nikolayevich Kosygin summoned me personally, and for an hour and a half I told the premier about our methods. Kosygin was a smart guy. He understood everything, and our method was given the green light."
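The leave-one-out check described above can be illustrated with a toy Python sketch. The article does not spell out the comparison procedure, so the classifier here is an assumption: a simplified vote that counts, for every pair of features, whether a held-out field agrees with each known field on that whole pair, loosely in the spirit of the voting algorithms associated with Zhuravlyov's school. The features, `PRECEDENTS`, and all function names are hypothetical.

```python
from itertools import combinations

# Toy precedents: (binary feature tuple, is_south_african_type).
# The four features and their values are invented for illustration.
PRECEDENTS = [
    ((1, 1, 1, 0), True),
    ((1, 1, 0, 0), True),
    ((1, 0, 1, 0), True),
    ((0, 0, 0, 1), False),
    ((0, 1, 0, 1), False),
    ((0, 0, 1, 1), False),
]

def votes(query, reference, subsets):
    """Count the feature subsets on which query and reference fully agree."""
    return sum(all(query[i] == reference[i] for i in sub) for sub in subsets)

def classify(query, precedents, subset_size=2):
    """Average the votes per class over all feature subsets; pick the best class."""
    subsets = list(combinations(range(len(query)), subset_size))
    totals, counts = {}, {}
    for features, label in precedents:
        totals[label] = totals.get(label, 0) + votes(query, features, subsets)
        counts[label] = counts.get(label, 0) + 1
    return max(totals, key=lambda label: totals[label] / counts[label])

def leave_one_out_accuracy(precedents, subset_size=2):
    """Hold out each precedent in turn and classify it from the rest."""
    hits = 0
    for i, (features, label) in enumerate(precedents):
        rest = precedents[:i] + precedents[i + 1:]
        hits += classify(features, rest, subset_size) == label
    return hits / len(precedents)

print(leave_one_out_accuracy(PRECEDENTS))  # 1.0
```

With more features, the number of feature subsets grows combinatorially, which hints at why a real procedure of this kind could involve tens of thousands of elementary comparisons.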

Reconciling the algorithms

After this golden success, Zhuravlyov increasingly came to realize that "there is some math that is still unknown and is capable of solving problems similar to the geological one, whereas classical mathematics can't." By that time, several hundred algorithms similar to the "geological" one had already been described. All of them handled large amounts of data in complex environments and gave answers to very simple questions from various spheres of human life. In medical diagnostics, they determined whether an individual had tuberculosis or lung cancer; in politics, they tried to predict victory or defeat for a certain party; in the military, they predicted whether a nuclear site existed in a specific area of enemy territory.

Zhuravlyov began to study these new algorithms: "Like any conscientious student, I simply combed through the literature available to me in all possible languages. I collected everything that was known in the world about the subject. At first glance, all the methods looked different. Some algorithms were based on statistics, while others were based on logic or algebra. So, I decided to establish some order by building a general-purpose system that describes all these algorithms."

And Zhuravlyov succeeded. In the 1970s, he fully systematized these algorithms. It was no easy thing to do: the formalization of a single algorithm took a dozen pages of mathematical text. In the end, despite their outward dissimilarity, all the algorithms turned out to be constructed in the same way.

Zhuravlyov invented the largest family of recognition algorithms in the world. However, even the most powerful computer was unable to pick out the optimal algorithm for solving a specific problem. In order to deal with this problem, Zhuravlyov introduced extremely simple operations on these algorithms such as addition, multiplication, and multiplication by a number. "Well, true, it's not your usual addition. As a matter of fact, God knows what should be done with these algorithms. No one knew or did this before, and most people thought such things couldn't exist in nature," Zhuravlyov comments. Basically, he created an "algorithm of algorithms."

Using these operations on algorithms, the mathematicians managed to improve recognition accuracy and quality dramatically: the shortcomings of some algorithms could be compensated for by the merits of others. "We crossed the abyss on little twigs and stones rather than jumping over it," is how Rudakov figuratively described the heart of the algebraic approach. Zhuravlyov still holds that he was simply lucky with the fundamental algorithms. "They were written by smart people and therefore proved to be so good," he says.
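One way to picture "addition" and "multiplication by a number" of algorithms is a minimal Python sketch under a strong simplifying assumption: each algorithm returns a list of per-class scores, and the operations act elementwise on those scores. Zhuravlyov's actual algebraic constructions are considerably more involved; `alg_a`, `alg_b`, and the other names are hypothetical.

```python
from typing import Callable, List, Sequence

# Simplifying assumption: an "algorithm" maps an object (feature vector)
# to one score per class; the class with the highest score wins.
Algorithm = Callable[[Sequence[float]], List[float]]

def add(a: Algorithm, b: Algorithm) -> Algorithm:
    """Sum of two algorithms: add their class scores elementwise."""
    return lambda x: [p + q for p, q in zip(a(x), b(x))]

def scale(c: float, a: Algorithm) -> Algorithm:
    """Multiplication of an algorithm by a number: scale every score."""
    return lambda x: [c * p for p in a(x)]

def multiply(a: Algorithm, b: Algorithm) -> Algorithm:
    """Product of two algorithms: multiply their class scores elementwise."""
    return lambda x: [p * q for p, q in zip(a(x), b(x))]

def decide(a: Algorithm, x: Sequence[float]) -> int:
    """Return the index of the class with the highest score."""
    scores = a(x)
    return max(range(len(scores)), key=scores.__getitem__)

# Two hypothetical weak algorithms, each looking at a single feature.
alg_a: Algorithm = lambda x: [x[0], 1 - x[0]]
alg_b: Algorithm = lambda x: [x[1], 1 - x[1]]

x = (0.3, 0.6)
print(decide(alg_a, x), decide(alg_b, x))         # 1 0: the two disagree
print(decide(add(alg_a, alg_b), x))               # 1
print(decide(add(scale(0.25, alg_a), alg_b), x))  # 0: weights change the vote
```

Neither weak algorithm is trusted on its own here; choosing the weights of the combination is what lets one algorithm's merits offset another's shortcomings.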

The mathematics of porn sites

Zhuravlyov's methods were first applied back in the 1970s. Based on descriptions of the situation in Bangladesh, he predicted a crisis there in 1973. Shortly afterwards, a coup occurred. He also predicted the crisis in Cyprus between Turkey and Greece. Zhuravlyov cooperated with physicians as well, trying to diagnose mental disorders, and a team he led created a system for diagnosing the so-called vibration disease common among construction workers who use jackhammers. In chemistry, they predicted the properties of diverse compounds.

By the early 1990s, the algebraic approach had been well developed. At that moment, however, the state stopped supporting the sciences, and Zhuravlyov, Rudakov, and their disciples had to take up applied research in earnest. Fortunately, Zhuravlyov's mathematical construction works well in extremely diverse situations. One application, called "client environment analysis," has proven especially interesting and promising. The analysis is based on descriptions of client behavior in certain situations, be the clients traders at a stock exchange, cellular subscribers, or Internet users. It identifies and interprets types of client behavior and searches for interrelationships and regularities in a client environment.

The system for the Moscow Interbank Currency Exchange (MICEX) was the first successful project of the Zhuravlyov-Rudakov team. In 1996, when the mathematicians arrived, trading at MICEX took place over a computer network. The exchange was a reliable information system uniting over 3,000 computers across the country. However, its organizers could only observe quotes. They could not monitor traders' actions (to make sure that no one was trading on insider information or manipulating prices) or establish who and what influenced prices. In more scientific language, they needed an objective picture of the client environment of trade participants.

When the Forecsys team started working with MICEX, they had to come up with features describing the behavior of trade participants. At first there were 150 features, and their number is still growing. The basic data provided by MICEX were transaction records: who sold what, when, and for how much. "We made something that works like an X-ray," says Rudakov. The system calculates the "closeness" of traders and divides them into groups by behavior. The set of features used to classify traders can be changed. One can see how a single trader affected price movements and who made prices fall. "A single player can make the market fall dramatically with only a small deal, or you can put a large amount of stock on the market so that no one will notice," says Yury Chekhovich, one of the system's authors and Forecsys Deputy General Director. "No one but the computer."
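The article does not describe how Forecsys measures the "closeness" of traders. As an assumption, the Python sketch below standardizes a few invented behavior features, treats Euclidean distance as closeness, and links traders closer than a threshold into groups (single-linkage clustering with a small union-find). The trader names, features, and threshold are hypothetical.

```python
import math

def standardize(rows):
    """Rescale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def group_traders(names, rows, threshold):
    """Link traders whose standardized feature vectors are within `threshold`
    (Euclidean distance) and return the connected groups (single linkage)."""
    z = standardize(rows)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if math.dist(z[i], z[j]) <= threshold:
                parent[find(names[i])] = find(names[j])
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical features: deals per day, average deal size (mln), share of sells.
traders = ["A", "B", "C", "D"]
features = [[10, 1.0, 0.5], [12, 1.1, 0.5], [2, 20.0, 0.9], [3, 22.0, 0.9]]
print(group_traders(traders, features, threshold=1.0))  # [['A', 'B'], ['C', 'D']]
```

Changing the feature list changes the grouping, which mirrors the point that the set of features used to classify traders can be altered.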

The company's experience with a large Russian Internet portal was no less interesting. The so-called log files (records stating that visitor number N visited a certain page at a certain moment) served as the basis for the analysis. After Forecsys' X-ray unit processed the information, the sites fell into several distinct groups. The first group consisted of RBK and other news portals (utro.ru, Lenta.ru, etc.); its users very rarely visited sites of other groups. The second group included sites about theaters and entertainment, and the third consisted of porn sites. Their users also fell into very distinct preference groups.
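A simple way to turn log files into site groups, assumed here rather than taken from Forecsys, is to collect each site's set of visitors and compare sites by the Jaccard index, the share of visitors they have in common. The log records and site names below are invented.

```python
from collections import defaultdict
from itertools import combinations

def visitors_by_site(log):
    """Collect the set of visitor IDs seen on each site."""
    sites = defaultdict(set)
    for visitor, site, _timestamp in log:
        sites[site].add(visitor)
    return sites

def jaccard(a, b):
    """Share of visitors two sites have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def site_similarities(log):
    """Jaccard similarity for every pair of sites found in the log."""
    sites = visitors_by_site(log)
    return {(s1, s2): jaccard(sites[s1], sites[s2])
            for s1, s2 in combinations(sorted(sites), 2)}

# Invented log records: (visitor number, site, time of visit).
log = [
    (1, "news_a", "10:00"), (1, "news_b", "10:05"),
    (2, "news_a", "11:00"), (2, "news_b", "11:02"),
    (3, "theatre", "12:00"), (3, "cinema", "12:10"),
    (4, "theatre", "13:00"), (4, "cinema", "13:30"),
    (2, "theatre", "14:00"),
]
for pair, sim in site_similarities(log).items():
    print(pair, round(sim, 2))
```

In this toy log, news_a and news_b share all their visitors (1.0), cinema and theatre share two of three (about 0.67), and the news sites overlap with theatre only through visitor 2 (0.25).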

The client environment analysis technique works well with supermarket shoppers, too. Here, stores can process the log-files of regular customers' behavior. Those with discount cards are identifiable and their purchases traceable. When are goods "close"? When the same customers buy them. When are customers "close"? When they buy similar goods. Therefore, stores can identify groups of customers and groups of goods and display goods in a new way. Furthermore, within the first few days of sales, a new product can be easily assigned to a certain group of consumer preferences and put on the "right" shelf.
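The definition of "close" goods in the paragraph above (goods bought by the same customers) can be sketched in Python. The choice of cosine similarity over purchase counts and the average-similarity rule for placing a new product are assumptions for illustration; the products, card IDs, and group names are hypothetical.

```python
import math
from collections import Counter

def buyers_of(receipts):
    """Map each product to a Counter of discount-card IDs that bought it."""
    buyers = {}
    for card, product in receipts:
        buyers.setdefault(product, Counter())[card] += 1
    return buyers

def cosine(a, b):
    """Cosine similarity of two purchase-count vectors keyed by card ID."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def assign_to_group(product, groups, buyers):
    """Pick the group whose members share the most buyers with `product` on average."""
    def avg_similarity(members):
        return sum(cosine(buyers[product], buyers[m]) for m in members) / len(members)
    return max(groups, key=lambda g: avg_similarity(groups[g]))

# Invented first-days receipts: (discount card ID, product).
receipts = [
    (1, "kefir"), (2, "kefir"), (1, "rye bread"), (2, "rye bread"),
    (3, "cola"), (4, "cola"), (3, "chips"), (4, "chips"),
    (1, "new yogurt"), (2, "new yogurt"), (3, "new yogurt"),
]
groups = {"dairy and bakery": ["kefir", "rye bread"], "snacks": ["cola", "chips"]}
print(assign_to_group("new yogurt", groups, buyers_of(receipts)))  # dairy and bakery
```

The same matrix read the other way (customers as rows, goods as columns) gives the customer-side notion of closeness.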

From data cemeteries to databases

However, the mathematical technique developed by Zhuravlyov and his disciples has far broader applications. It can be applied to the notorious average scores on school certificates, which say nothing about a student's abilities, or to the Dow Jones index, a price-weighted average of 30 major companies' stock quotes.

But it is the bank reliability ratings, much discussed in Russia, that exasperate Konstantin Rudakov most of all: "They take some figures off the balance sheet, pull others out of thin air, multiply, add up the results, and then call this mess a reliability index! Besides, they only look at a bank's current state. What about a bank like SBS-Agro? Just a few months before its bankruptcy, it was rated second after Sberbank! If we look at bank ratings several months before the 1998 crisis and then see which banks went bust, we find an equal percentage biting the dust in each reliability group. What does Forecsys propose? To analyze bank balance sheets for the last few years and find out which banks lost their licenses. Then the dead banks' figures should be analyzed for the few months before their bankruptcies. There are your precedents." Based on these precedents, Zhuravlyov and his team could construct a decision rule in mathematical language: a surface that divides the multitude of banks into surviving and bankrupt ones. Once such a surface has been constructed, an analyst can accurately determine a bank's actual state at any time. All you have to do is figure out on which side of the surface the bank falls.
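The article does not say what kind of surface Forecsys fits. As a plainly swapped-in classic technique, the Python sketch below trains a perceptron, which finds a separating hyperplane (a straight line for two features) when the precedents are linearly separable, and then checks on which side a new bank falls. The two features, `BANKS`, and all numbers are invented.

```python
def score(w, b, x):
    """Signed value of the linear decision function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(precedents, epochs=1000, lr=1.0):
    """Classic perceptron: nudge the hyperplane toward each misclassified precedent."""
    w, b = [0.0] * len(precedents[0][0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in precedents:
            if y * score(w, b, x) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every precedent is on the correct side
            break
    return w, b

def side(w, b, x):
    """+1 if x falls on the 'surviving' side of the hyperplane, else -1."""
    return 1 if score(w, b, x) > 0 else -1

# Hypothetical precedents: (capital-to-assets ratio, share of overdue loans),
# label +1 = kept its license, -1 = lost it.
BANKS = [
    ((0.15, 0.02), 1), ((0.12, 0.03), 1), ((0.18, 0.01), 1),
    ((0.05, 0.20), -1), ((0.04, 0.15), -1), ((0.07, 0.25), -1),
]
w, b = train_perceptron(BANKS)
print(side(w, b, (0.165, 0.015)), side(w, b, (0.045, 0.175)))  # 1 -1
```

A real rule would need many more balance-sheet indicators and possibly a curved surface; the perceptron only illustrates the "which side does the bank fall on" idea.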

The Forecsys technology promises great benefits to cell phone and insurance companies as well. As competition grows increasingly severe, these companies are taking desperate steps to attract new customers. This is especially clear among cell phone companies. Cellular operators introduce new rate deals almost every month and have already come to a dead end. When inventing new calling plans, Russian managers look at the strategies of companies in other countries, estimate competitors' responses, and consider how ARPU (average revenue per user) may change. But in many respects they rely on intuition, even though they have a far more powerful weapon in their arsenal. Their billing system records the time and duration of each call and whether it is incoming or outgoing. This is an excellent database for analysis. Forecsys' specialists are ready to divide all mobile users into groups, with the criteria set in different ways. It is possible to draw a line between loyal customers and those who might leave, to introduce a new plan aimed at targeted groups of users who might want it, and to analyze the price elasticity of demand for a relatively new service. Forecsys technology also enables companies to introduce a marker in the users' description space that identifies calls made from stolen phones, since such calls always differ from the previous owner's. "Whether we are identifying unreliable traders at MICEX or unauthorized access to a communications network, our techniques work on the same principle: they spot a deviation from the normal pattern," Chekhovich explains. "Of course, the statistical systems now widely used by communication operators work on the same principle. But our products are more flexible: they enable a user to establish a multistage gradation of the situation and to respond more promptly to changes, for example, in the general direction of calls."
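The "deviation from the normal pattern" with a "multistage gradation" can be illustrated by a deliberately simple Python sketch, not Forecsys' product: build a per-subscriber profile (mean and standard deviation of each feature) from past calls, then grade a new call by its largest z-score against two thresholds. The features, thresholds, and data are assumptions.

```python
import statistics

def build_profile(history):
    """Per-feature (mean, std) of one subscriber's past calls."""
    return [(statistics.mean(col), statistics.pstdev(col) or 1.0)
            for col in zip(*history)]

def alert_level(call, profile, thresholds=(2.0, 4.0)):
    """0 = normal, 1 = suspicious, 2 = alarm, by the largest z-score of the call."""
    worst = max(abs(v - m) / s for v, (m, s) in zip(call, profile))
    return sum(worst > t for t in thresholds)

# Invented billing history: (call duration in minutes, hour of day).
history = [(2, 9), (3, 10), (2, 18), (4, 19), (3, 9), (2, 17)]
profile = build_profile(history)
for call in [(3, 10), (5, 9), (45, 3)]:
    print(call, alert_level(call, profile))
```

Here a typical call gets level 0, a somewhat long morning call level 1, and a 45-minute call at 3 a.m. level 2.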

Generally speaking, the techniques based on Zhuravlyov's mathematics can work in any environment with a lot of data and customers. They are well suited to banks forecasting depositor activity and developing new types of accounts, or to the Tax Ministry analyzing taxpayer behavior. But at present, as Forecsys' General Director Alexander Cherepnin put it in a nutshell, most Russian companies and agencies have "data cemeteries, not databases. We need to change the way people think."