By: Peter Abrahams, Practice Leader - Accessibility and Usability, Bloor Research. Published: 5th January 2009. Copyright Bloor Research © 2009
A happy New Year to all my readers.
This holiday season was unusual in that the Christian festival Christmas, the Jewish festival Chanukkah (חנכה), and the Islamic New Year Maal Hijra all occurred at the same time.
The previous sentence raises the question of how it should be tagged in HTML. It contains three different languages, with the Hebrew in its native script and in transliteration, and the Arabic in transliteration only. To add to the complication, the Hebrew script should be read from right to left whilst its transliteration should be read from left to right.
Before I try to answer this question, I need to explain briefly why it is important to tag multilanguage documents correctly. The reasons include accessibility needs such as:
Besides the accessibility needs, other systems may be able to benefit from knowing the language of the text:
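Having set myself this holiday question to investigate, I went straight onto the web. I quickly discovered that there are two HTML attributes related to internationalisation (I18n): lang, which identifies the language of the text, and dir, which sets the direction in which it is read.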
My next discovery was that there is an international standard (ISO 639-1) that specifies the two-character abbreviations of languages; so I found out that Arabic is 'ar' and Hebrew is 'he'. That left me with the problem of how to distinguish between Hebrew in native script and Hebrew in transliteration.
This led me into the world of the Requests for Comments (RFCs) of the Internet Engineering Task Force (IETF). Being a world of standards, it is by nature very detailed, precise and pedantic. This is as it has to be, but it does make it difficult for a newcomer to comprehend and to navigate to the relevant area. I found out that a language attribute can be made up of more than one part and found a list of recognized combinations; this included 'az-Latn' for Azerbaijani transliterated into Latin script. Thus it appeared to me that using 'he-Latn' would be a reasonable answer for my Hebrew transliteration. However, the document I was looking at said that I had to register it formally. My attempt to register it failed with a message suggesting that my formatting of the request was incorrect. Luckily I had found the e-mail address of someone who obviously understood the subject, and I decided to use the personal touch rather than talk to a computer again. I am delighted to say that this approach resulted in a very quick response, even though it was during the days between New Year and the return to work the following week.
A few more e-mails from the RFC community explained everything to me. I had been looking at an out-of-date RFC and I should have been looking at RFC 4646. This Best Current Practice (BCP) says that a language attribute can be made up of sections relating to language, script, region and variants. The agreed values of these sections can be found in the IANA Language Subtag Registry and they can be combined in any reasonable way, which includes 'he-Latn' and 'ar-Latn'.
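To make the structure of such an attribute value concrete, here is a minimal Python sketch that splits a tag into its language, script, region and variant sections. It is an illustration only: the function name split_language_tag and the pattern are my own assumptions, it handles just the simple language-script-region-variant form, and it does not check the values against the registry.

```python
import re

# Simplified pattern (an assumption for illustration): it covers only
# language-script-region-variant tags. RFC 4646 also allows extended
# language subtags, extensions, private use and grandfathered tags.
TAG_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def split_language_tag(tag):
    """Split a simple language tag into its sections (hypothetical helper)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    variants = [v for v in match.group("variants").split("-") if v]
    return {
        # Conventional casing: language lower, script Title, region UPPER.
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": [v.lower() for v in variants],
    }


hebrew_latin = split_language_tag("he-Latn")
british_english = split_language_tag("en-GB")
```

With this sketch, 'he-Latn' splits into the language 'he' and the script 'Latn' with no region, whereas 'en-GB' has a region but no script. The sketch checks only the shape of the tag, so each subtag would still need to be looked up in the registry.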
So I now have the answer to my question. If you look at the HTML source of the relevant sentence, you will see that it has been tagged correctly.
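For readers who cannot view the page source, the following Python sketch builds one plausible tagging of that sentence. It is an assumption on my part and not a copy of the actual markup: the helper lang_span, the use of span elements and the enclosing p element are all hypothetical choices, and only the attribute values 'he', 'he-Latn', 'ar-Latn' and the right-to-left direction come from the discussion above.

```python
from html import escape


def lang_span(text, lang, direction=None):
    """Wrap text in a span carrying lang and, optionally, dir (hypothetical helper)."""
    attrs = f' lang="{escape(lang)}"'
    if direction is not None:
        attrs += f' dir="{escape(direction)}"'
    return f"<span{attrs}>{escape(text)}</span>"


# Assumed structure: an English paragraph with three embedded phrases.
sentence = (
    "the Christian festival Christmas, the Jewish festival "
    + lang_span("Chanukkah", "he-Latn")                 # Hebrew in Latin script
    + " ("
    + lang_span("חנכה", "he", direction="rtl")          # Hebrew script, right to left
    + "), and the Islamic New Year "
    + lang_span("Maal Hijra", "ar-Latn")                # Arabic in Latin script
)
paragraph = f'<p lang="en">{sentence}</p>'
```

Only the Hebrew in its native script carries dir="rtl" in this sketch; the two transliterations are left with the left-to-right direction of the surrounding English text.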
It is also relevant to point out that, although this article has concentrated on HTML, the language attribute can be used in other forms of documentation, for example tagged PDF.
I would like to thank all those who have helped me on this journey.
It has raised two new questions for me:
My journey was more complex because Google initially pointed me at the older documents on the subject. I assume that this was because there were more references to the older documents. Is there any way we can ensure that old and obsolete documents drop down the Google search list more quickly?
I also found the standards documents difficult to understand as a newcomer. Is there any way to make them easier to understand for relatively casual users like myself? I am hoping that writing this article may help other people who are trying to solve the same or a similar problem.
Wishing everyone an accessible, usable and well-tagged New Year.
5th January 2009: 'Peter Abrahams' (Author) said:
Having written this article, it has been pointed out to me that the W3C has some useful documents on I18n, http://www.w3.org/International/resource-index.php?topic=lang , and a tutorial, Creating (X)HTML Pages in Arabic & Hebrew, http://www.w3.org/International/tutorials/bidi-xhtml/ , which give more detail than my article can.
Published by: IT Analysis Communications Ltd.