Tomsk software recognizes emotions by analyzing voice
Scientists in the Siberian city of Tomsk are developing software that recognizes human emotions by analyzing a person’s voice. The solution could be used in testing a new employee’s professional fitness
Oct 08, 2013
The developers work at the department of complex computing systems security of Tomsk State University of Control Systems and Radio Electronics (TUSUR).
The voice recognition solution is believed to be able to identify not only a person’s emotional state but also how he or she feels physically. This is an extremely important factor in jobs that require utmost concentration, especially when human lives depend on how a specific employee performs.
According to Anton Konev, an associate professor at the TUSUR department, to demonstrate emotional and physical fitness for certain occupations, all a person needs to do is pronounce a few phrases. If the speaker is sick or in a state of hyperarousal, the voice will betray it, and the software will detect the anomalies.
The approach is reportedly based on the assumption that the way a person pronounces sounds and words depends on his or her psychophysiological state: someone who is overly tense or angry pronounces sounds faster than in a state of contentment, when speech tends to be drawn out.
In addition to the duration of speech and of individual sounds, the software is said to measure a range of other parameters, including, for example, the pitch of the voice, known in physiology as the fundamental tone, which varies with the vibration of the vocal folds. A physically and emotionally fit person produces smooth speech in which the fundamental tone barely changes; in a state of overexcitation, the fundamental tone shows sharp rises and drops.
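To illustrate the idea of tracking the fundamental tone, here is a minimal sketch, not the TUSUR team’s actual algorithm, that estimates the fundamental frequency (F0) frame by frame with a plain autocorrelation peak search and measures how steady the resulting contour is. All function names, frame sizes, and thresholds here are this sketch’s own assumptions; production pitch trackers are considerably more robust.

```python
import math

def estimate_f0(frame, sample_rate, fmin=75.0, fmax=400.0):
    # Estimate the fundamental frequency of one frame by finding the
    # autocorrelation peak within the plausible human pitch range.
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, min(lag_max, len(frame) - 1)):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

def pitch_contour(signal, sample_rate, frame_len=400, hop=200):
    # Frame-by-frame F0 track; large swings in this contour correspond
    # to the "sharp rises and drops" the article associates with arousal.
    return [estimate_f0(signal[i:i + frame_len], sample_rate)
            for i in range(0, len(signal) - frame_len, hop)]

# Demo on a synthetic 150 Hz tone, a stand-in for a steady, calm voice.
sr = 8000
tone = [math.sin(2 * math.pi * 150.0 * t / sr) for t in range(sr)]
f0s = pitch_contour(tone, sr)
mean_f0 = sum(f0s) / len(f0s)
spread = max(f0s) - min(f0s)
print(f"mean F0 ~ {mean_f0:.0f} Hz, spread {spread:.1f} Hz")
```

For the steady tone the estimated F0 stays near 150 Hz with essentially zero spread; a real "overexcited" voice would show a much larger spread across frames.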
Besides assessing a person’s state before hiring for certain work, the new TUSUR solutions can also be used in forensic inquiries, when a court needs to identify the speaker in a voice recording.
According to the project developers, the weak points of many prior voice recognition solutions stem from their disregard of the speaker’s emotional state: tiredness, sorrow, joy and other emotions all affect the way a person speaks.
"Our software algorithms are based on intrinsic biological features that our auditory system has in perceiving the speech signals," the scientists said.
The previous generation of similar Tomsk solutions has been successfully used for the past ten years in the rehabilitation of oncology patients after surgeries on their phonation organs.