Expanding the dictionary base for ABBYY Lingvo 8.0. Best Practices of Russian offshore IT outsourcing companies. RUSSOFT

Project Title: Expanding the dictionary base for ABBYY Lingvo 8.0

Company: ATAPY Software
Client (Country): ABBYY Software House, Russia
Duration, months: 5
Total Effort, person-months: 10

The Challenge

ABBYY Software House, a world leader in OCR/ICR and linguistic technologies, develops and markets the Lingvo line of electronic dictionaries. Immediately after the release of Lingvo 7.0, the ABBYY Linguistic Department started work on Lingvo 8.0, the newest version of the product, with an extended language base. This work involved digitising the world's latest best-of-breed dictionaries, which reflect the modern state of the languages, for inclusion in the new version.

Needless to say, high recognition accuracy for the dictionary text was a must. A mistake in a single character could break a word's alphabetical order or tear the word away from its word-building paradigm, and a number of such mistakes would make the dictionary unsearchable. Correct interpretation of the special dictionary marks was no less vital. These marks were to be used as field delimiters during the automatic database conversion and therefore had to be recognized with 100% accuracy. Special marks appeared in the text as text attributes (bold, italics), as special symbols (brackets, asterisks), or as a combination of the two (italics plus brackets indicating a dictionary comment). Omitting a single bracket or mistaking an italicized word for plain text would therefore misdirect valuable information. For all these reasons the project required both intelligent programming and qualified manual effort to support machine methods: a true challenge for any contractor in the media service area.
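
The role of special marks as field delimiters can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the recognized text carries bold and italics as <b>...</b> and <i>...</i> tags; the source does not describe the actual intermediate format. The sample entry, the field labels, and the function names split_fields and brackets_balanced are all hypothetical.

```python
import re

# Hypothetical markup (an assumption, not the format used in the project):
#   <b>...</b>    bold text, e.g. a headword
#   <i>(...)</i>  italics plus brackets, i.e. a dictionary comment
#   <i>...</i>    other italic text, e.g. a grammatical label
TOKEN = re.compile(
    r"<b>(?P<bold>[^<]*)</b>"
    r"|<i>\((?P<comment>[^<]*)\)</i>"
    r"|<i>(?P<italic>[^<]*)</i>"
    r"|(?P<plain>[^<]+)"
)


def split_fields(entry):
    """Split one marked-up entry into (field_kind, text) pairs."""
    fields = []
    for match in TOKEN.finditer(entry):
        kind = match.lastgroup          # name of the alternative that matched
        text = match.group(kind).strip()
        if text:                        # skip whitespace between fields
            fields.append((kind, text))
    return fields


def brackets_balanced(entry):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in entry:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Made-up sample entry, for illustration only.
good = "<b>Haus</b> <i>n</i> дом <i>(здание)</i>"
bad = "<b>Haus</b> <i>n</i> дом <i>(здание</i>"   # closing bracket lost

print(split_fields(good))
print(split_fields(bad), brackets_balanced(bad))
```

In this sketch, losing the closing bracket turns the comment into an ordinary italic field, which is the kind of misdirection the paragraph above describes; a simple bracket-balance check flags the damaged entry for manual review.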

The Solution

ABBYY turned to ATAPY Software, its outsourcing partner in Novosibirsk, Russia, to digitise two dictionaries from the list selected by the ABBYY Linguistics Department. The three-volume, 1,750-page Leping's German-Russian Dictionary and the 830-page Narumov's Spanish-Russian Dictionary were to be recognized and proofread for subsequent automatic conversion to a complex database.

"ATAPY has reached 99.992% text accuracy in the German-Russian Dictionary, which corresponds to 1 mistake per 8,760 symbols, and 99.997% quality for the Spanish-Russian Dictionary project - equals to 1 mistake per 31,500 symbols" - noted Anna Zhavoronkova, Lingvo Project Manager at ABBYY Software House. "Besides, they have done some additional work on correction of mistakes in the source dictionary text, including typographical misprints and even mistakes in special dictionary marks, almost impossible to detect without tailored programming tools and profound knowledge of linguistics."

Tools and Technologies

  • ABBYY FineReader
The dictionaries were scanned and recognized using the ABBYY FineReader OCR system, which was specially tuned for processing this material. A team of qualified multilingual operators then proofread the results and cross-checked them using a double verification technique to ensure recognition accuracy. Double verification also made it possible to detect certain unexpected cases, such as typos in the source dictionary text, which were partially corrected according to ABBYY guidelines. To automate the proofreading work as far as possible, ATAPY Software developed and customized a number of in-house scripts, including Italics Helper, an automatic italics detection tool, and Glyphica, a tool for quick and easy element substitution. For Leping's Dictionary, ATAPY developed a custom converter with built-in spell-checking and punctuation-checking utilities, which made it possible to weed out the mistakes missed at the previous stages and finally convert the material into the Lingvo vocabulary database.
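
The source does not detail how the double verification was carried out. One common reading, assumed here, is that two independently proofread copies of the same text are compared and every disagreement is sent back for review. A minimal Python sketch of that comparison, using the standard difflib module, might look as follows; the function name disagreements and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def disagreements(pass_a, pass_b):
    """List the spans where two proofread copies of a text differ.

    Each item is (operation, text_in_a, text_in_b), where operation is
    'replace', 'delete' or 'insert' as reported by SequenceMatcher.
    """
    # autojunk=False keeps the heuristic from ignoring frequent characters
    # in long texts.
    matcher = SequenceMatcher(None, pass_a, pass_b, autojunk=False)
    return [
        (tag, pass_a[i1:i2], pass_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Made-up sample: the second operator read "u" as "n".
print(disagreements("Haus", "Hans"))   # [('replace', 'u', 'n')]
print(disagreements("Haus", "Haus"))   # []
```

A span on which both copies agree can still be wrong if the printed source itself contains a typo, which is why such a comparison complements, rather than replaces, the spell-checking and punctuation-checking stage described above.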

Related links

http://www.abbyy.com/lingvo.asp?param=3587