Segmentation of Symbols

A recognizer can view symbols at any granularity. For instance, most handwriting recognizers see individual letters and numerals as symbols. A recognizer for cursive writing, on the other hand, may see a complete word as a single symbol without distinguishing each letter of the word.

No matter how it views symbols, a recognizer must separate them within a stream of written symbols, a process called segmentation. Segmenting letters is much easier if the application provides box guides. In this case, the recognizer can assume that strokes lying within a box constitute a single character. Accurate segmentation is more difficult for unguided text, where the recognizer must infer character boundaries from the strokes themselves.
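The boxed case can be sketched in a few lines. The following is only an illustration, not part of the Pen API: it assumes each stroke is a list of (x, y) points, that the guide boxes form a single row of equal width, and that a stroke belongs to the box containing the horizontal midpoint of its extent. The names segment_by_boxes, box_width, and origin_x are hypothetical.

```python
from collections import defaultdict

def segment_by_boxes(strokes, box_width, origin_x=0.0):
    """Group strokes into characters, one group per occupied guide box.

    Hypothetical helper: a stroke is a list of (x, y) points, and boxes
    are assumed to be laid out in one row starting at origin_x.
    """
    groups = defaultdict(list)
    for stroke in strokes:
        xs = [x for x, _ in stroke]
        # Use the midpoint of the stroke's horizontal extent so that a
        # stroke slightly overshooting a box edge stays in its own box.
        mid = (min(xs) + max(xs)) / 2.0
        index = int((mid - origin_x) // box_width)
        groups[index].append(stroke)
    # Return the groups in left-to-right box order.
    return [groups[i] for i in sorted(groups)]
```

With boxes 10 units wide, two strokes drawn between x = 1 and x = 4 fall into the first box and are treated as one character, while a stroke near x = 12 forms a second character.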

Segmentation is a crucial issue for recognizing different handwriting styles. The following table lists the forms of input in decreasing order of constraint on the user. The information in the table is taken from C. C. Tappert, An Adaptive System for Handwriting Recognition, IBM Research Report RC 11175, No. 50249 (May 21, 1985).

Input form	Definition
Boxed input	Each character appears within its own box.
Discrete spaced	A set of strokes in a given space belongs to the same character. (This is also called external segmentation.)
Discrete run-on	Printed characters can overlap.
Cursive	Letters are connected by ligatures. The recognizer must either identify discrete letters or interpret a whole word at a time.
Mixed	The recognizer can segment discrete, run-on, and cursive writing.

Figure 8.1 illustrates these various styles.
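As a rough illustration of the discrete spaced form, the sketch below groups strokes into characters wherever a horizontal gap separates them. It is an assumption-laden example, not the method of any particular recognizer: strokes are lists of (x, y) points on a single line of writing, and segment_by_gaps and min_gap are hypothetical names.

```python
def segment_by_gaps(strokes, min_gap):
    """Split strokes into characters at horizontal gaps wider than min_gap.

    Hypothetical helper: assumes one line of writing and strokes given
    as lists of (x, y) points.
    """
    # Record each stroke's horizontal extent and order strokes by left edge.
    extents = sorted(
        ((min(x for x, _ in s), max(x for x, _ in s), s) for s in strokes),
        key=lambda e: e[0],
    )
    characters = []
    right_edge = None  # right edge of the character being built
    for left, right, stroke in extents:
        if right_edge is None or left - right_edge > min_gap:
            # The gap is wide enough, so this stroke starts a new character.
            characters.append([stroke])
            right_edge = right
        else:
            # The stroke touches or nearly touches the current character.
            characters[-1].append(stroke)
            right_edge = max(right_edge, right)
    return characters
```

This approach works only while characters are separated by space. For discrete run-on and cursive input, adjacent characters overlap or connect, so a gap test alone would merge them and the recognizer needs other cues.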

The Pen API places few restrictions on the recognizer. At a minimum, however, a default recognizer must be able to recognize discrete characters because many applications do not use boxed input.
