Pull up to Climb: Part-of-speech disambiguation using patterns in text

When: 15th Feb 2018, 17:30 - 19:30
Where: Room 9130, Cantor Building, City Campus, Sheffield Hallam University S1 2ND
Town/City: Sheffield
Organiser: BCS South Yorkshire Branch
Price: Free

“In 1989, a China Airlines flight, flying in zero visibility, crashed into the side of a mountain shortly after takeoff. On the voice recorder, the last words of the Chinese pilot to the co-pilot were, “What does pull up mean?” When I first heard this story, I wondered why a pilot, presumably trained in the international English used for aviation, would not understand a command from the tower. On investigation, I learned that the official term used in “control tower” talk is climb. However, the warning system built in to U.S.-made planes issues the message “Pull up!” when altitude drops or an object looms ahead.” (Emily A Thrush, ‘A Study of Plain English Vocabulary and International Audiences’, 2001.)

The controlled language ASD-STE100 is used for safety-critical maintenance documentation. To make text as clear as possible, each approved term in ASD-STE100 usually has only one part of speech. For example, the word 'oil' is permitted as a noun, but it is not permitted as a verb.

An effective term checker must give a warning only if a term is used incorrectly.

Example:

Noun, correct, no message: The oil was dirty.
Verb, incorrect, message: When you oil the bearing, you must...

Mike shows how sets of patterns in text can be used to identify whether a term is used as a noun or as a verb. For example, in the structure ARTICLE+X+BE, X is a noun. Most text is more complex than this, so the rules can become very complex. Although disambiguation is not always possible, it is accurate enough to make an effective checker for ASD-STE100.
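As a rough illustration of the idea, the pattern approach can be sketched in a few lines of Python. This is not the implementation from the talk (which uses XML rules in LanguageTool). The word lists, the function names, and the second pattern (SUBJECT PRONOUN+X+ARTICLE implies that X is a verb) are assumptions made for this example. Only the ARTICLE+X+BE pattern comes from the description above.

```python
import re

# Small, hypothetical word lists; a real checker would need far larger ones.
ARTICLES = {"a", "an", "the"}
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}
SUBJECT_PRONOUNS = {"i", "you", "we", "they"}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def classify(tokens, i):
    """Return 'noun', 'verb', or None (undecided) for tokens[i]."""
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    # Pattern from the talk description: ARTICLE + X + BE -> X is a noun.
    if prev in ARTICLES and nxt in BE_FORMS:
        return "noun"
    # Assumed pattern for illustration: PRONOUN + X + ARTICLE -> X is a verb.
    if prev in SUBJECT_PRONOUNS and nxt in ARTICLES:
        return "verb"
    # No pattern matched: disambiguation is not always possible.
    return None


def check_term(sentence, term="oil", approved="noun"):
    """Return warning messages for uses of `term` that are not `approved`."""
    tokens = tokenize(sentence)
    messages = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        pos = classify(tokens, i)
        # Warn only when a pattern positively identifies a wrong part of speech.
        if pos is not None and pos != approved:
            messages.append(
                f"'{term}' is used as a {pos}; it is approved only as a {approved}."
            )
    return messages


if __name__ == "__main__":
    for text in ("The oil was dirty.", "When you oil the bearing, you must..."):
        print(text, "->", check_term(text))
```

The sketch stays silent when no pattern matches, which agrees with the requirement that a checker must give a warning only if a term is used incorrectly. The cost is that some incorrect uses are not found.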

Although the presentation is primarily about part-of-speech disambiguation, Mike includes an overview of how the patterns are implemented in XML in a customized version of the open-source software LanguageTool.

 
