Project Title: Expanding the dictionary base for ABBYY Lingvo 8.0
The Challenge
ABBYY Software House, a world leader in OCR/ICR and linguistic technologies, develops and markets the Lingvo line of electronic dictionaries. Immediately after the release of Lingvo 7.0, the ABBYY Linguistic Department started working on Lingvo 8.0, the newest version of the product, with an extended language base. This work involved digitising the world's latest best-of-breed dictionaries, which reflect the modern state of the languages to be included in the new version.
Needless to say, high recognition accuracy for the dictionary text was a must. A single mistake in a single character could break the alphabetical order of the words or tear a word away from its word-building paradigm, and a number of such mistakes would make the dictionary unsearchable. Accurate interpretation of the special dictionary marks was no less vital for the project. These marks were to be used as field delimiters during the automatic database conversion process and therefore had to be recognized with 100% accuracy. Special marks appeared in the text as text attributes (bold or italic type), as special symbols (brackets, asterisks), or as a combination of the two (italics plus brackets indicating a dictionary comment). In this situation, omitting a single bracket or mistaking an italicized word for a plain-text word would misdirect valuable information. For all these reasons, the project required both intelligent programming and qualified manual effort to support the machine methods: a true challenge for any contractor in the media service area.
The Solution
ABBYY turned to ATAPY Software, its outsourcing partner in Novosibirsk, Russia, to digitise two dictionaries from the list selected by the ABBYY Linguistic Department. The 3-volume, 1,750-page Leping's German-Russian Dictionary and the 830-page Narumov's Spanish-Russian Dictionary were to be recognized and proofread for subsequent automatic conversion to a complex database.
"ATAPY has reached 99.992% text accuracy in the German-Russian Dictionary, which corresponds to 1 mistake per 8,760 symbols, and 99.997% quality for the Spanish-Russian Dictionary project, equal to 1 mistake per 31,500 symbols," noted Anna Zhavoronkova, Lingvo Project Manager at ABBYY Software House. "Besides, they have done some additional work on correcting mistakes in the source dictionary text, including typographical misprints and even mistakes in special dictionary marks, which are almost impossible to detect without tailored programming tools and profound knowledge of linguistics."
Tools and Technologies
- ABBYY FineReader
Related links
http://www.abbyy.com/lingvo.asp?param=3587