Sentences, Words, and Syllables
A Thai sentence is written as a single unit: words are not separated from each other by blanks.
Dividing a Thai sentence into words is not fundamentally different from
hyphenating European words into syllables.
An English sentence written as a Thai-like sentence would appear as:
"theflowersofthefinestgreenhousesarenotwasted"
When this sentence is divided into individual words, the context is
the decisive factor.
A "green house" (i.e. a house painted green)
is quite different from a "greenhouse"
(i.e. a glass building for growing vegetables or flowers), but
the context makes clear that this sentence is about greenhouses.
Other sentences are genuinely ambiguous: "Godisnowhere"
could be divided into "God is nowhere"
or "God is now here".
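The ambiguity can be made concrete with a small sketch that lists every way to split an unspaced string into known words. The function name `segmentations` and the five-word lexicon are assumptions made for this illustration; a real word list would be far larger and would produce many more candidate readings.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # exactly one way to split nothing: no words at all
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and split the remainder in every possible way.
            for tail in segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


# Hypothetical mini-lexicon, just large enough for the example.
LEXICON = {"god", "is", "now", "here", "nowhere"}

for words in segmentations("godisnowhere", LEXICON):
    print(" ".join(words))
# god is now here
# god is nowhere
```

Both readings are valid as far as the lexicon is concerned; choosing between them requires the context, which this sketch does not model.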
This phenomenon is not very different from what happens in
European languages such as English: "rec-ord" is the noun (i.e.
a report or document, or a best achievement), while
"re-cord" is the verb (i.e. to register or store);
only the context tells the reader which one is meant.
The Thai people read words in context.
A last example: a transcribed Thai sentence with a word-by-word
translation (in actual Thai writing the sentence has no spaces):
mi:  sa:mi:   phanraja:  ramruaj  khu:    nung  maj  mi:   lu:k
is   husband  wife       rich     couple  one   not  have  child
If the word ramruaj ร่ำรวย were
divided into 'ram' ร่ำ and
'ruaj' รวย, the piece 'ram' could end up at the end of
a text line:
mi: sa:mi: phanraja: ram
However, that would change the meaning of the sentence.
'Ram' means "to scent" (spraying perfume), so the meaning
of this text line would change to:
"is husband wife spraying perfume".
Such a break is not acceptable.
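The reasoning behind this rule can be sketched as a simple check: a break is suspicious when the piece left at the end of the line is itself a standalone word. The function `risky_breaks` and the tiny `STANDALONE` table are assumptions made for this illustration, and the check is a deliberate simplification; it is not presented as the method the Thai Hyphenator actually uses.

```python
# Hypothetical table of transcribed standalone words and their glosses.
STANDALONE = {"ram": "to scent"}


def risky_breaks(pieces, standalone):
    """List the breaks that leave a standalone word at the end of the line.

    `pieces` are the candidate fragments of one word, in order. Each
    result is (break_index, text_left_on_line, gloss_of_that_text).
    """
    risky = []
    for i in range(1, len(pieces)):
        left = "".join(pieces[:i])
        if left in standalone:
            risky.append((i, left, standalone[left]))
    return risky


print(risky_breaks(["ram", "ruaj"], STANDALONE))
# [(1, 'ram', 'to scent')]
```

An empty result means that none of the candidate breaks leaves a known word behind, at least as far as this small table can tell.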
Newspapers require narrow columns. They therefore want to
hyphenate Thai words, but only at
boundaries that cannot be misinterpreted. The Thai Hyphenator consists
of two layers: the first layer divides sentences into words, and the second
layer
divides words into syllables. Word segmentation is kept
distinct from syllable division.
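The two layers can be pictured with a minimal sketch on the transcribed example words. Everything in it is an assumption made for illustration: the `LEXICON` table, the function names, the use of "|" for word boundaries, and the longest-candidate-first search in layer 1 are not taken from the Thai Hyphenator itself. Only the "^" break marker follows the notation used in this text.

```python
# Hypothetical lexicon: each word maps to the pieces it may be broken into.
# A single-piece entry means "never break inside this word".
LEXICON = {
    "mi:": ["mi:"],
    "sa:mi:": ["sa:", "mi:"],
    "phanraja:": ["phan", "ra", "ja:"],
    "ramruaj": ["ramruaj"],  # 'ram' alone would be misread, so no break
}


def segment_words(text, lexicon):
    """Layer 1: split unspaced text into lexicon words.

    Tries longer candidates first and backtracks when the remainder
    cannot be segmented. Returns None if no full segmentation exists.
    """
    if not text:
        return []
    for end in range(len(text), 0, -1):
        head = text[:end]
        if head in lexicon:
            rest = segment_words(text[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def mark_breaks(words, lexicon):
    """Layer 2: mark permitted in-word breaks with '^', word boundaries with '|'."""
    return "|".join("^".join(lexicon[word]) for word in words)


words = segment_words("mi:sa:mi:phanraja:ramruaj", LEXICON)
print(mark_breaks(words, LEXICON))
# mi:|sa:^mi:|phan^ra^ja:|ramruaj
```

Keeping the layers separate means that a break inside a word is only ever considered after the word boundaries are fixed, so a syllable break cannot be mistaken for a word boundary.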
The Thai word for "chairman", a compound with the Sanskrit prefix pra, can be divided as: ประ^ธานี.
The word for "date" (day of the
month or year), วันที่, cannot be split into วัน and ที่, because the
reader would then see "day"
followed by a separate word meaning 1) place, 2) in, or 3) those, that, and would try to combine it with the other words in
the sentence.
Despite this limitation, a high density of hyphenation points can be
achieved thanks to the two-layer technology of *TALŌ's Thai language
model.