Sayre's paradox

Sayre's paradox is a dilemma encountered in the design of automated handwriting recognition systems. A standard statement of the paradox is that a cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized.^[1]^[2] The paradox was first articulated in a 1973 publication by Kenneth M. Sayre, after whom it was named.^[3]

It is relatively easy to design automated systems capable of recognizing words inscribed in a printed format. Such words are segmented into letters by the very act of writing them on the page. Given templates matching typical letter shapes in a given language, individual letters can be identified with a high degree of probability. In cases of ambiguity, probable letter sequences can be compared with a selection of properly spelled words in that language (called a lexicon).^[4] If necessary, syntactic features of the language can be applied to render a generally accurate identification of the words in question.^[5] Printed-character recognition systems of this sort are commonly used in processing standardized government forms, in sorting mail by zip code, and so forth.

In cursive writing, however, letters comprising a given word typically flow sequentially without gaps between them. Unlike a sequence of printed letters, cursively connected letters are not segmented in advance. Here is where Sayre's Paradox comes into play. Unless the word is already segmented into letters, template-matching techniques like those described above cannot be applied. That is, segmentation is a prerequisite for word recognition. But there are no reliable techniques for segmenting a word into letters unless the word itself has been identified. Word recognition requires letter segmentation, and letter segmentation requires word recognition. There is no way a cursive writing recognition system employing standard template-matching techniques can do both simultaneously.

Advantages to be gained by use of automated cursive writing recognition systems include routing mail with handwritten addresses, reading handwritten bank checks, and automated digitalization of hand-written documents.^[1] These are practical incentives for finding ways of circumventing Sayre's Paradox.

Avoiding the paradox

One way of ameliorating the adverse effects of the paradox is to normalize the word inscriptions to be recognized. Normalization amounts to eliminating idiosyncrasies in the penmanship of the writer, such as unusual slope of the letters and unusual slant of the cursive line.^[4] This procedure can increase the probability of a correct match with a letter template, resulting in an incremental improvement in the success rate of the system. Since improvement of this sort still depends on accurate segmentation, however, it remains subject to the limitations of Sayre's Paradox. Researchers have come to realize that the only way to circumvent the paradox is by use of procedures that do not rely on accurate segmentation.^[1]

Avoiding the paradox

Directions of current research

References

External links

Related Articles

Related Articles

Offline Cursive Handwriting: From Word To Text Recognition

"Chapter 17 - Hidden Markov Models for Off-Line Cursive Handwriting Recognition"

"Machine recognition of handwritten words: A project report"

"A survey on off-line Cursive Word Recognition"