One limitation of Hunspell (the program used for the Khmer spelling checker) is that when two words are incorrectly joined together, it suggests splitting them with a visible space between them. This creates extra work for the user, who must then delete the space and insert a zero-width space in its place. We have requested a change in Hunspell so that, for Khmer, it suggests a zero-width space between words rather than a visible space.
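Until such a change lands, the manual fix described above could be automated outside of Hunspell. The following is only a minimal sketch: the function name is hypothetical, it does not call Hunspell itself, and it assumes you already have the list of suggestions as plain strings.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def khmer_friendly_suggestions(suggestions):
    """Swap the visible space in each suggestion for a zero-width space.

    `suggestions` is assumed to be a list of strings as a spelling
    checker might return them, e.g. two words separated by " ".
    """
    return [s.replace(" ", ZWSP) for s in suggestions]


if __name__ == "__main__":
    # Print with repr() so the invisible character shows up as \u200b.
    for fixed in khmer_friendly_suggestions(["ខ្ញុំ ទៅ"]):
        print(ascii(fixed))
```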
The National Council of Khmer Language has a website with some useful information on standardized spellings in Khmer (specifically for modern terms). You can visit their website here: http://www.nckl.gov.kh
and view documentation on chosen spellings here
We’ve just opened up a Khmer translation portal at Crowdin.net
The purpose is to collect as many translated documents as possible, both to import into the translation memory and to build the term glossary, so that current and future translation projects can build on work that has already been done. Software translation can be difficult, but since some software has already been translated into Khmer, there is quite a bit of cross-over, so by using the Crowdin site you can benefit from the collaborative work of others.
The translation memory currently has 27,921 strings, and the term glossary has 4,482 terms with more on the way. All these tools can be edited by users, as well as downloaded for use in other programs.
If you have ever tried viewing a Khmer Unicode document on a PC that was created on a Mac, you might have found that the text fails to render correctly. This is because Mac fonts do not follow the rules for Khmer Unicode in quite the same way that PCs do.
But now Didi from SIL has revised his Mondulkiri font to deal with this issue. Using the Mondulkiri font forces you to type Khmer Unicode in a way that displays correctly on both PC and Mac. This is a great step forward for Khmer Unicode.
We recently ported our SBBIC Khmer keyboard to Mac. We added a colon symbol (“:” with right ALT+L, or OPTION+L on Mac) as well as a dash (“-” with right ALT+D, or OPTION+D on Mac). The keyboard is based on the Khmer OS and NiDA keyboard.
1. Unzip the keyboard layout by either simply double clicking the zipped file or by using other software like StuffIt. Safari unzips automatically.
2. The keyboard will either have the extension .keylayout
3. In the Finder, choose Go > Computer or type Shift-Command-C. This opens the Computer window.
4. Expand the Macintosh HD item, then the Library item, scroll down to find Keyboard layouts.
5. Drag the keyboard layout you saved earlier into the Keyboard layouts list.
6. Log off the computer or restart it.
7. Open System Preferences > Language and Text. Click the Input Sources tab. Scroll down until you find Khmer SBBIC V2. Make sure the checkbox is selected. The layout is now ready to use.
8. To access the keyboard layout, click the flag in the top right-hand corner of your screen, then select the keyboard layout from the list. Or type Command-Space to scroll through your language options.
9. The keyboard will be listed as Khmer SBBIC V2.
10. If you cannot find a letter, click the flag in the top right-hand corner of your screen and select Show Keyboard Viewer.
Want to use Khmer numbering for page numbers in InDesign? Download this script and place it in the InDesign scripts directory. When you run the script you should see a new paragraph style called “Khmer” which will use Khmer numbering if applied as the page numbering paragraph style. Let us know if you have any questions in the comments.
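For readers curious about what “Khmer numbering” means in practice, the idea is simply to replace each Western digit with the corresponding Khmer digit (Unicode U+17E0 to U+17E9). The sketch below illustrates that mapping in Python; it is not the InDesign script itself, and the function name is our own.

```python
# Map "0".."9" onto the Khmer digits U+17E0..U+17E9.
KHMER_DIGITS = str.maketrans(
    "0123456789",
    "".join(chr(0x17E0 + i) for i in range(10)),
)


def to_khmer_numerals(number):
    """Return `number` written with Khmer digits."""
    return str(number).translate(KHMER_DIGITS)


if __name__ == "__main__":
    for page in (1, 12, 305):
        print(page, to_khmer_numerals(page))
```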
There are two bugs that we would like to see fixed in LibreOffice that would benefit Khmer. Would you take the time to comment on each bug, stating why it is important that it be fixed?
Here are the two bugs:
1) While LibreOffice can automatically line-break Khmer, it currently cannot check spelling correctly unless the user manually inputs zero-width spaces. We would like to see this fixed so that users no longer have to type zero-width spaces between Khmer words in order to use the Khmer spelling checker. Update: THANK YOU! This bug has been fixed!
We are pleased to announce that LibreOffice Pre-Release 3.6 (Download: LibO-Dev_220.127.116.11.beta2_Win_x86_install_multi.msi or newer) now incorporates the latest ICU version, which has the ability to automatically line-break Khmer Unicode (which we posted about previously here). This means you no longer have to manually add a zero-width space between words in order to line-break correctly in your documents! The screenshots below show a sample document in LibreOffice 3.5 (which does not automatically line-break Khmer), a document with manual zero-width spaces added, and a document in LibreOffice Dev 3.6 with automatic Khmer line-breaking. As you can see, the results are looking good!
LibreOffice Without the New ICU Automatic Khmer Line-Breaking
LibreOffice with Manual Word-breaks Added
LibreOffice Dev 3.6 With Automatic Khmer Line-Breaking
The automatic word-breaking does not yet work for spell checking, so in order to spell check in Khmer you will still need to manually add zero-width spaces between words. Even so, this is a great step forward for the Khmer language on computers! Hopefully in the near future we will no longer need to manually add spaces between words in Khmer in order to spell check.
Please try out the new LibreOffice pre-release and let us know how it works for you. If you have any issues with line-breaking (if something breaks incorrectly), please let us know in the comments so we can work on debugging and increasing the accuracy of the word-breaker in ICU. Special thanks to George for helping us make this a reality.
If you haven’t yet, make sure to check out www.glosbe.com. Already there are quite a few entries for Khmer to English (including the SBBIC dictionary), and more are on the way! It is also very easy for others to collaborate by editing or adding new translations, so it has great potential!
We’ve been working on getting code into ICU to allow Khmer Unicode to automatically break between words and the newest release of ICU now includes a Khmer word breaker. But access is difficult (unless you are a programmer). So we have made a small program that uses ICU and will allow you to use the Khmer word breaker in Linux (Windows will come soon). We’ve only tested this on Ubuntu 11.x so please test it and let us know if you have any problems. There is still room for improvement, so please let us know how it works for you.
The word-breaker is currently dictionary based, so it will work best on documents that have correct spelling. In the future we hope to add additional programming that will better deal with “unknown” words.
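To give a feel for what “dictionary based” means, here is a deliberately simplified sketch that uses greedy longest-match lookup. It is not the algorithm ICU uses; the function names and the tiny word list are our own, and unknown text is simply passed through as one unbroken chunk. It does show why correct spelling matters: a misspelled word is not in the dictionary, so no break can be found inside it.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def segment(text, dictionary):
    """Split `text` into words by greedy longest-match dictionary lookup.

    Runs of characters that match no dictionary word are kept together
    as a single "unknown" chunk.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words, unknown, i = [], "", 0
    while i < len(text):
        match = None
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                match = text[i:i + length]
                break
        if match is None:
            unknown += text[i]
            i += 1
            continue
        if unknown:
            words.append(unknown)
            unknown = ""
        words.append(match)
        i += len(match)
    if unknown:
        words.append(unknown)
    return words


def insert_zwsp(text, dictionary):
    """Return `text` with a zero-width space between the words found."""
    return ZWSP.join(segment(text, dictionary))


if __name__ == "__main__":
    sample_words = {"ខ្ញុំ", "ទៅ", "សាលា"}  # illustrative word list only
    sentence = "".join(["ខ្ញុំ", "ទៅ", "សាលា"])
    print(segment(sentence, sample_words))
```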
To use the program in Ubuntu, place the Unicode .txt file you want to break in the same directory as sbbic-khmer-breaker.out. Then open the console in that directory and type: ./sbbic-khmer-breaker.out yourinputfile.txt youroutputfile.txt (changing the names of the text files to the names you desire).
Again, if you have any issues, please don’t hesitate to ask in the comments.
This latest release adds a check that all quotes and brackets have been closed, as well as some additional word-coherency checks (to make sure you followed the same style of spelling throughout your document; our list adheres to Chuan Nath’s spelling whenever possible).
This extension can be used with: OpenOffice
which are both free and open-source word processors.
Please let us know in the comments if you have any trouble, or would like any additions to the grammar checker.
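For anyone wondering how a check for unclosed quotes and brackets can work in principle, the usual approach is a simple stack. The sketch below is only an illustration under our own assumptions (the function name and the set of bracket pairs are hypothetical); it is not how the extension itself is implemented.

```python
# Opening character -> closing character. Illustrative set only.
PAIRS = {"(": ")", "[": "]", "{": "}", "«": "»", "“": "”"}


def find_unclosed(text):
    """Return (character, position) for every unmatched quote or bracket."""
    closers = set(PAIRS.values())
    stack = []      # openers still waiting for their closer
    problems = []   # closers that had no matching opener
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, pos))
        elif ch in closers:
            if stack and PAIRS[stack[-1][0]] == ch:
                stack.pop()
            else:
                problems.append((ch, pos))
    problems.extend(stack)  # anything left on the stack was never closed
    return sorted(problems, key=lambda item: item[1])


if __name__ == "__main__":
    print(find_unclosed("(a [b] c"))
```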
We just came across a site called Tatoeba, a community project to create an online sentence dictionary. Khmer has not been officially added, but it is in the works (you can add your own sentences and translations and then add them to the public Khmer list here: http://tatoeba.org/eng/sentences_lists/show/765).
UPDATE: Our solution does not yet work perfectly. Line-breaks do not work with a hair space, so we are still working with Adobe to find a solution that will work without any issues.
With the release of InDesign CS 5.5, Hunspell dictionaries are now supported. This means we can use the SBBIC spelling dictionary with InDesign! There are some issues, because InDesign was not fully tested with Khmer, but we are able to get around them (even though it makes things a bit complex). Right now our solution is MAC ONLY because I don’t have my PC here with me, but we will include PC instructions soon (and they won’t be much different from the Mac instructions).