Part-of-speech disambiguation using patterns in text

Date/Time: 
Thursday 8 December 2016, 5:30pm - 7:30pm

Venue:
Room 7140, Stoddart Building, City Campus, Sheffield Hallam University, Howard Street, Sheffield, S1 1WB

Speaker:
Mike Unwalla

Abstract

The controlled language ASD-STE100 (www.asd-ste100.org) is used for safety-critical maintenance documentation. To make text as clear as possible, each approved term in ASD-STE100 usually has only one part of speech. For example, the word 'oil' is permitted as a noun, but it is not permitted as a verb.

An effective term checker must give a warning only if a term is used incorrectly. Example:

  • Noun, correct, no message: The oil was dirty.
  • Verb, incorrect, message: When you oil the bearing, you must...

Mike shows how sets of patterns in text can be used to identify the part of speech (noun or verb) that a term has. For example, in the structure ARTICLE+X+BE, X is a noun. Most text is more complex than this example, and the rules that deal with it can be intricate. Although disambiguation is not always possible, it is sufficiently good to make an effective checker for ASD-STE100.
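
As a rough illustration only, the ARTICLE+X+BE structure can be sketched in a few lines of Python. This is not the implementation from the talk, which uses XML rules in LanguageTool. The word lists, the function names (tokenize, guess_pos, check_term) and the second pattern (PRONOUN+X+ARTICLE, X is a verb) are assumptions made for this sketch.

```python
import re

# Small, incomplete word lists (assumptions for this sketch only).
ARTICLES = {"a", "an", "the"}
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}
PRONOUNS = {"i", "you", "we", "they"}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def guess_pos(tokens, i):
    """Guess the part of speech of tokens[i] from its neighbours.

    Returns "noun", "verb", or None if no pattern matches.
    """
    prev_tok = tokens[i - 1] if i > 0 else None
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else None

    # ARTICLE + X + BE: X is a noun (the pattern from the abstract).
    if prev_tok in ARTICLES and next_tok in BE_FORMS:
        return "noun"
    # PRONOUN + X + ARTICLE: X is a verb (hypothetical extra pattern).
    if prev_tok in PRONOUNS and next_tok in ARTICLES:
        return "verb"
    return None


def check_term(sentence, term, approved_pos):
    """Return a message for each use of term with an unapproved part of speech.

    If the part of speech cannot be identified, no message is given.
    """
    tokens = tokenize(sentence)
    messages = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        pos = guess_pos(tokens, i)
        if pos is not None and pos != approved_pos:
            messages.append(
                f"'{term}' is used as a {pos}, "
                f"but it is approved only as a {approved_pos}."
            )
    return messages


print(check_term("The oil was dirty.", "oil", "noun"))
print(check_term("When you oil the bearing, you must...", "oil", "noun"))
```

With these two patterns, the first sentence gives no message and the second sentence gives one message. A term that matches no pattern gives no message, which agrees with the statement that disambiguation is not always possible.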

Although the presentation is primarily about part-of-speech disambiguation, Mike includes an overview of how the patterns are implemented in XML in a customized version of the open-source software LanguageTool (www.languagetool.org).

Speaker

Mike Unwalla is a freelance technical writer with more than 20 years of experience. He helps organizations to supply clear and effective user manuals to their customers. For more information, refer to www.techscribe.co.uk.

Timetable

The event starts with light refreshments at 5:30pm. The talk begins at 6:30pm and ends at 7:30pm.