TLex Detailed Overview
TLex in brief
User interface / Real-time article preview
Customising the language of the meta-language
"Smart References" (automatic cross-reference tracking)
Sounds and images
Automatic Media File Linking
Bilingual editing features
Automatic lemma reversal
Translation Equivalent fanouts
Dynamic "Smart Styles" [Advanced]
Multiple dictionaries from one database
Multiple "views" or sets of styles
Customisable dictionary grammar (DTD editor) [Advanced]
Entry locking (check-in/check-out) system
User statistics / monitoring
Integrated corpus query tool
XML Line by Line Importer
Ruler Tool [Advanced]
Advanced Statistics And Statistics Filters
Scripting language and 'formulas' [Advanced]
Online dictionary support
Online dictionary module
Online dictionary logging and usage analysis module
Extendibility: TLex API (Application Programming Interface)
Fully customisable sorting
Fully customisable input/output architecture
TLex can handle virtually all of the world's languages, thanks to full Unicode support throughout. Its features include immediate article preview, an integrated corpus query tool, full customisability, automatic cross-reference tracking, automated lemma reversal, Online and Electronic Dictionary modules (for Web or CD-ROM publishing), export to MS Word format and to typesetting software such as Adobe InDesign, and teamwork (network) support.
TLex can be used to compile monolingual, bilingual and multilingual dictionaries, and supports the production of dictionaries in hardcopy (printed), electronic (CD-ROM), and online (Web) formats. The TLex Suite also includes tlTerm, a similarly sophisticated termbase system for the compilation and management of terminology, that also aids translators.
One of the goals in the development of TLex has been to produce a dictionary compilation tool that is easy to learn and use, based on the principle that users should not need to have an advanced level of computer literacy in order to compile dictionaries.
A freely downloadable reader application is available that can be used to view TLex, tlTerm and tlDatabase files.
The following is a summary of some of the main features of TLex.
At all times, TLex displays a list of all lemmas in the dictionary at the left of the main language window, from which lemmas may be quickly selected for viewing or editing. A WYSIWYG (what you see is what you get) preview of the currently selected article is shown on the right. This preview updates immediately in response to changes.
One useful feature of the article preview area is that related cross-references are also immediately shown, that is, all articles that are cross-referenced from, or have cross-references to, the current article.
Click-to-edit: Another useful feature of the article preview is that clicking on any field within the preview will immediately take you to the appropriate area(s) for editing that field.
Note that fonts, formatting, colours etc. are all customisable via the TLex styles system. Style settings are also preserved when generating formatted output (i.e. to Rich Text Format (RTF) for Microsoft Word, OpenOffice or WordPerfect, or to Adobe InDesign or QuarkXPress).
Article preview area (on right), showing articles that have cross-references to (and articles cross-referenced from) the currently selected article. (Data Copyright Northern Sotho Lexicography Unit, Pan South African Language Board.)
With TLex, it is possible to customise the language used to display information such as part of speech tags, usage labels and cross-references. The language for these tags can be changed throughout the entire dictionary simply by selecting a different set of text labels for these elements, as shown in the screenshots below. This allows different language versions of a bilingual dictionary to be produced, from the same database, depending on the language of the intended target audience, allowing mother-tongue speakers to use the dictionary 'in' their own language.
This functionality carries through to the TLex online dictionary and electronic dictionary modules. In this online Northern Sotho - English linguistics terminology list, for example, the user interface is available in both Northern Sotho and English, and the software has been configured to automatically select the appropriate language used for the various tags in accordance with the language selected by the user for the user interface of the dictionary.
Toggling to English as the language of the meta-language (e.g. "noun", "verb"). (Data from the Oxford Bilingual School Dictionary: Northern Sotho and English.)
Toggling to Northern Sotho as the language of the meta-language (e.g. "leina", "lediri"). (Data from the Oxford Bilingual School Dictionary: Northern Sotho and English.)
Another use for customising the tags is to select abbreviated tags for output formats where space is limited, such as paper or cellphone text message, but to use longer, unabbreviated tags in formats where space is effectively unlimited, such as for electronic dictionaries and online dictionaries.
One of the most powerful, time-saving features of TLex is its ability to keep track of the homonym numbers and sense numbers of cross-reference targets, and automatically update the cross-references whenever the homonym or sense numbers change in a cross-reference target entry. For example, if a user has created a cross-reference from an entry to sense 2 of the target entry, and senses 1 and 2 of the target entry are swapped around, TLex will automatically update the cross-reference to refer to sense 1.
TLex also displays a real-time preview of all cross-referenced entries related to the currently selected entry as you work - both incoming references, and outgoing references. This allows you to immediately see if the cross-references are correct. It also prevents common problems that usually occur in dictionary production, for example the inadvertent creation of dead references when a user deletes an entry that has an incoming cross-reference to it elsewhere in the dictionary.
TLex thus ensures full cross-reference integrity at all times.
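The idea behind this kind of tracking can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how TLex is implemented: the class names (`Sense`, `Entry`, `CrossReference`) and the use of a stable internal identifier per sense are hypothetical. The point is that a reference stored against the sense itself, rather than against its displayed number, stays correct when senses are reordered.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical source of stable internal identifiers


@dataclass
class Sense:
    gloss: str
    uid: int = field(default_factory=lambda: next(_ids))


@dataclass
class Entry:
    headword: str
    senses: list

    def sense_number(self, uid):
        """Displayed (1-based) number of the sense with this stable id."""
        for number, sense in enumerate(self.senses, start=1):
            if sense.uid == uid:
                return number
        raise KeyError(uid)


@dataclass
class CrossReference:
    target: Entry
    sense_uid: int  # points at the sense itself, not at its number

    def label(self):
        number = self.target.sense_number(self.sense_uid)
        return f"{self.target.headword} {number}"


bank = Entry("bank", [Sense("river edge"), Sense("financial institution")])
ref = CrossReference(bank, bank.senses[1].uid)  # refers to sense 2
label_before = ref.label()

bank.senses.reverse()       # swap senses 1 and 2
label_after = ref.label()   # the same reference now resolves to sense 1
```

Because the displayed number is computed on demand, nothing has to be rewritten when the senses are swapped.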
"Smart References" in action in a monolingual dictionary of the Northern Sotho National Lexicography Unit in South Africa.
The search function allows the entire dictionary to be rapidly searched for some given text. Search options such as case-sensitivity and "find whole word only" are available. Regular expressions may also be used in the search function.
The filter function allows you to specify criteria for viewing, and working with, a subset of dictionary articles, for example, "show all plural nouns", or "show all entries that aren't yet marked with a part of speech".
Filter conditions for inclusion (conditions that must be met) and exclusion (conditions that must not be met) may be combined, for example, you may select to view "all the lemmas with definitions that do not have usage examples", or "all the lemmas with translation equivalents that do not have sound recordings attached". In each case, you can also select whether all the specified conditions must be met ("AND"), or any of the specified conditions ("OR").
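As a rough sketch of how inclusion and exclusion conditions might combine, consider the following Python fragment. The function `matches`, the predicates and the entry layout are all hypothetical and serve only to illustrate the AND/OR logic described above.

```python
def matches(entry, include=(), exclude=(), mode="AND"):
    """Return True if the entry passes the filter.

    include: predicates that must hold; exclude: predicates that must not hold.
    mode "AND" requires every condition, "OR" requires at least one.
    """
    checks = [pred(entry) for pred in include]
    checks += [not pred(entry) for pred in exclude]
    if not checks:
        return True
    return all(checks) if mode == "AND" else any(checks)


def has_definition(entry):
    return bool(entry.get("definition"))


def has_example(entry):
    return bool(entry.get("examples"))


# Hypothetical sample data
entries = [
    {"lemma": "dog", "definition": "a domestic animal", "examples": ["The dog barked."]},
    {"lemma": "cat", "definition": "a small feline", "examples": []},
    {"lemma": "emu", "definition": "", "examples": []},
]

# "all the lemmas with definitions that do not have usage examples"
selected = [e["lemma"] for e in entries
            if matches(e, include=[has_definition], exclude=[has_example])]
```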
The filter function is shown in the screenshot below.
TLex screenshot showing the filter function, in this case used to find and display only entries labelled 'ecclésiastique' (ecclesiastical) in a Louisiana French - English dictionary (shown with permission).
There are many different types of filter conditions. Some allow you to do things such as view "all the work of a particular user"; this can be very useful for project management.
The dictionary compare/merge function allows different versions of a dictionary database to be compared with one another. Differences are displayed visually, as can be seen in the screenshot below. Various actions can be taken on differing lemmas: for example, a differing article can be added to the current database, merged with the current article, or used to replace the current article.
The compare/merge function is especially useful in situations where lexicographers are split up geographically, where it may not be possible to have a high-speed network connection to the main database. Changes may then be made "offline", and periodically merged back into the main database.
TLex screenshot showing the dictionary compare/merge dialog, which can be used to visually display the differences between two dictionary databases, and provides functions to add or merge changes back into the main database. Differences within the entry are highlighted.
The batch merge tool allows multiple changes to be quickly merged into the main database - either all changes, or a selected subset of modified entries.
Sound recordings may be linked to any dictionary field. These may then be placed online, or included as part of an electronic dictionary product, allowing users to hear recordings of mother-tongue speakers pronouncing words, usage examples, etc.
You can also add images to articles. Images can be attached anywhere within an article.
Sample entries showing support for images in TLex.
One of the more powerful new features in TLex 2010 is the automatic image/sound/video file linking tool, which, like many of the other features in TLex, can reduce days or weeks of manual work to minutes. This tool uses file naming conventions to automatically map media files to entries (e.g. "headword_homonymnumber.mp3"), so that tens of thousands of media files can be linked to their correct locations within seconds. A log file is also produced, detailing media files that could not be mapped to entries, and entries for which no file was mapped. The naming convention can be changed on a per-project basis, and is configured using regular expressions.
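The mapping step can be pictured with a short Python sketch. It is an assumption-laden illustration rather than the tool's actual behaviour: the regular expression, the `link_media` function and the representation of entries as (headword, homonym number) pairs are all hypothetical.

```python
import re

# Hypothetical convention: "<headword>_<homonym number>.<extension>"
PATTERN = re.compile(r"^(?P<headword>.+)_(?P<homonym>\d+)\.(?:mp3|wav|jpg|png)$")


def link_media(filenames, entries):
    """Map media files to (headword, homonym) keys.

    Returns (links, unmapped_files, entries_without_media).
    """
    links, unmapped = {}, []
    for name in filenames:
        m = PATTERN.match(name)
        key = (m["headword"], int(m["homonym"])) if m else None
        if key in entries:
            links.setdefault(key, []).append(name)
        else:
            unmapped.append(name)  # would be reported in the log
    missing = sorted(entries - set(links))  # entries that received no file
    return links, unmapped, missing


entries = {("bank", 1), ("bank", 2), ("tree", 1)}
files = ["bank_1.mp3", "bank_2.mp3", "river_1.mp3", "notes.txt"]
links, unmapped, missing = link_media(files, entries)
```

The two "leftover" lists correspond to the two kinds of log information mentioned above: files that matched no entry, and entries that received no file.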
The TLex Suite also allows video files to be linked in to the document.
TLex has several features to assist with the compilation of bilingual dictionaries, aimed at speeding up compilation, and helping you to ensure that both sides of the dictionary receive balanced treatment.
When working on a bilingual dictionary, TLex's side-by-side view mode allows you to view or work on both sides of the dictionary simultaneously, as shown in the screenshot below.
Whenever a lemma is selected, a list of "bilingual references" is displayed in the top left of the language window. This is a list of all articles in the other side of the dictionary whose lemma signs appear as a translation equivalent in the currently selected article.
When linked view mode is enabled, this is taken a step further: whenever a lemma is selected on one side of the dictionary, all such related articles on the other side of the dictionary are instantly displayed in the other window, allowing you to immediately see the treatment of corresponding lemmas while you work. This is shown in the screenshot below.
Bilingual "linked view mode" in a French-Dutch/Dutch-French dictionary (Van Dale / Le Robert). The right side shows all Dutch lemmas whose lemma signs appear as a translation equivalent in the selected French article on the left. Data Copyright (shown with permission).
When working on a bilingual dictionary, TLex provides automatic lemma reversal functions to assist with, and speed up, the process of generating the reverse side of your dictionary. Lemmas may be reversed individually, or the entire dictionary may be reversed in one go. When reversing a single lemma, you can select which entries or aspects of entries you want included, as shown in the screenshot below.
TLex screenshot showing the lemma reverse tool. When auto-reversing lemmas, individual word senses and combinations can be easily selected or deselected for reversal.
The Translation Equivalent fanouts feature automatically shows entries related to the current one via a shared Translation Equivalent (i.e. all other entries in the same side that share a Translation Equivalent that appears within the current entry). This is exemplified in the screenshot below.
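Conceptually, a fanout is a lookup through an inverted index of translation equivalents. The Python sketch below illustrates this with a hypothetical `fanouts` function and made-up English - French data; it makes no claim about how TLex computes the result internally.

```python
from collections import defaultdict


def fanouts(dictionary, headword):
    """Other headwords on the same side sharing a translation equivalent."""
    by_equivalent = defaultdict(set)
    for lemma, equivalents in dictionary.items():
        for te in equivalents:
            by_equivalent[te].add(lemma)
    related = set()
    for te in dictionary[headword]:
        related |= by_equivalent[te]
    related.discard(headword)  # an entry is not its own fanout
    return sorted(related)


# Hypothetical English -> French sample data
en_fr = {
    "big": ["grand", "gros"],
    "tall": ["grand"],
    "fat": ["gros"],
    "small": ["petit"],
}
related_to_big = fanouts(en_fr, "big")
```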
Translation Equivalent fanouts.
The Styles system allows you to fully configure the visual appearance of any field, such as colour, font, font size, and common text/punctuation to appear before and after fields (e.g. automatic brackets around a part of speech field). Paragraph style options allow more advanced configuration of indentation, spacing and border properties.
With the TLex styles system, changes to the formatting and 'look' of the dictionary can thus be made centrally, at any time, and the changes will immediately reflect throughout the entire dictionary - one merely specifies the desired appearance of a particular type of information.
Customising the styles of different fields in TLex.
One of the normally tedious tasks that TLex fully automates for you is the entering of sense numbers, homonym numbers, or numbering of any other field. Homonym numbers are always automatically calculated by TLex. Sense numbers are also automatically recalculated whenever you add or remove senses, or change the order of senses within a lemma.
The Styles system allows you to easily configure or change the type of numbering that appears in the output (e.g. Roman numerals, Latin letters, circled digits, and so on), as well as specify the rules that determine when the numbers are visible (e.g. "always", or "only when more than one sense", or "only when there are subsenses").
Since all these aspects are just part of the Styles system, they can be easily changed at any time during the dictionary project - you change the numbering scheme centrally, in one place, and the change immediately reflects throughout your dictionary.
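To make the idea of centrally configured numbering concrete, here is a small Python sketch. The style names ("arabic", "roman", "letter"), the `show` rule values and the function names are hypothetical; the sketch only illustrates that numbers are derived from position and a style setting rather than typed in by hand.

```python
def to_roman(n):
    """Lowercase Roman numeral; this table is sufficient for values below 90."""
    numerals = [(50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out


FORMATS = {
    "arabic": str,
    "roman": to_roman,
    "letter": lambda n: chr(ord("a") + n - 1),  # valid for 1..26 only
}


def number_senses(senses, style="arabic", show="always"):
    """Prefix each sense with a number computed from its position."""
    if show == "multiple" and len(senses) < 2:
        return list(senses)  # rule: hide the number for a single sense
    fmt = FORMATS[style]
    return [f"{fmt(i)} {text}" for i, text in enumerate(senses, start=1)]


roman = number_senses(["first", "second", "third"], style="roman")
single = number_senses(["only sense"], show="multiple")
```

Changing `style` in one place changes the numbering of every entry rendered through `number_senses`.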
Automatic sense numbering in TLex. The numbering style can easily be changed at any time.
[Advanced] Version 4 of TLex allows you to set up rules (of any possible complexity) that allow the appearance of a field to change dynamically in different situations. As a simple example, you might want the automatic punctuation before a usage example to differ depending on which field happens to precede it. Or, you might want to automatically generate a full stop after the example only if it doesn't end in an exclamation or question mark.
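The two example rules can be expressed as small functions. The following Python sketch is illustrative only: the function names and the particular prefixes chosen for each preceding field are hypothetical, and TLex's own rule syntax is not shown here.

```python
def example_suffix(example):
    """Full stop after the example unless it ends in '!' or '?'."""
    return "" if example.rstrip().endswith(("!", "?")) else "."


def example_prefix(preceding_field):
    """Punctuation before the example, chosen by the preceding field."""
    # Hypothetical rule: colon after a definition, semicolon after an example
    return {"definition": ": ", "example": "; "}.get(preceding_field, " ")


def render_example(example, preceding_field):
    return example_prefix(preceding_field) + example + example_suffix(example)


after_definition = render_example("Come here!", "definition")
after_example = render_example("He came home", "example")
```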
Another example is to automatically highlight the most frequent headwords in some way, as shown in the following screenshot.
"Smart Styles": Automatically and fully dynamically highlighting the most frequent words (red, and with a "*") based on the corpus frequency ranking. (Data shown in sample is the Adam Kilgarriff BNC top 6318 wordlist.)
TLex has several sophisticated features that allow multiple dictionaries or "editions" to be compiled in, and generated from, the same database. Apart from the customisable meta-language aspects already discussed, these also include the ability to configure multiple sets of styles, and the "masks" system, discussed below.
Using the Styles system, it is possible to set up multiple sets of styles for a single dictionary database. This allows different "views" to be created for the same data, e.g. "compact" vs. "long" (for hardcopy and electronic output respectively), and can also be used to generate two different dictionaries from the same database, e.g. "pocket edition" vs. "desktop edition".
A real-time preview of the selected article is simultaneously shown for all editions as you work (as shown in the screenshots below), and you can quickly toggle between the different views with a single keypress.
Certain fields can also be hidden in different views, allowing you to, for example, hide monolingual definitions when generating bilingual dictionary output from a semi-bilingual database, or hide translation equivalents when generating the monolingual edition.
"Masks" allow you to select individual elements within an entry to be present or absent in each edition by simply selecting the element and ticking off the desired edition(s) from a list. Automatic numbering is recalculated on-the-fly as necessary for each edition. These aspects are shown in the screenshots below.
Screenshot demonstrating the "multiple editions from one database" features. Here the user is working on the "Pocket" edition, but also simultaneously sees a preview of the "Full" edition (which has its own independent styles). "Sense 2" is selected and has been excluded from the "Pocket" view. Note that the sense numbering is automatically intelligently recalculated for each edition - thus sense 3 displays as sense 2 in the Pocket view.
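The renumbering behaviour described in this caption can be sketched as follows. The data layout (a set of edition names per sense) and the `render_edition` function are hypothetical; the sketch simply shows numbering being computed after the mask is applied.

```python
def render_edition(senses, edition):
    """Return the numbered senses that are visible in the given edition."""
    visible = [s["text"] for s in senses if edition in s["editions"]]
    return [f"{n}. {text}" for n, text in enumerate(visible, start=1)]


# Hypothetical entry: the second sense is masked out of the "Pocket" edition
senses = [
    {"text": "first meaning", "editions": {"Full", "Pocket"}},
    {"text": "second meaning", "editions": {"Full"}},
    {"text": "third meaning", "editions": {"Full", "Pocket"}},
]
full = render_edition(senses, "Full")
pocket = render_edition(senses, "Pocket")
```

In the "Pocket" result the third sense is displayed as number 2, matching the behaviour shown in the screenshot.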
Simultaneously compiling the Full and Pocket editions in an English-Afrikaans/Afrikaans-English bilingual dictionary. Data Copyright (shown with permission).
Full and pocket editions being compiled simultaneously in TLex for a French-Dutch/Dutch-French dictionary. Data Copyright (shown with permission).
Final full and pocket editions of above Le Robert & Van Dale French-Dutch/Dutch-French dictionary. Data Copyright (shown with permission).
TLex includes a built-in DTD (Document Type Definition) editor that allows the user to fully customise the 'dictionary grammar' (basically, the DTD) for each dictionary project. The DTD specifies the types of data fields and the entry structure that comprise the dictionary grammar. The TLex DTD system is based on the industry-standard XML DTD system.
To allow new users to get up and running quickly without the need to worry about the complexities of setting up a DTD, TLex creates a sensible default DTD for all new dictionary projects. Template DTDs may also be created, allowing new dictionary projects to be based on an already configured DTD.
The TLex DTD system further allows the value of any field to be limited to selection from a closed list, for example part of speech types or usage labels. One may also choose between being able to select multiple list items or only one list item for a particular field. Using closed lists saves time, and also prevents mistakes and inconsistencies (such as some lexicographers entering "noun" while others enter "n" in the part of speech field).
The multi-user and network support allow a team of users to work simultaneously on a single dictionary stored on a central database server (e.g. PostgreSQL, Microsoft SQL Server or Oracle). Entry locking ("check-in/check-out") prevents changes by one user overwriting changes by another.
Multiple users may be configured for each project, each with their own logon name and access password. TLex keeps track of which users have modified or created which entries. It is also possible to do this at any element level, for example, you can keep track of which users have modified particular senses within an entry.
The check-in/check-out system provides user locking of entries, preventing one user's changes from being overwritten by another's. A user "checks out" the entry when making changes (thereby locking it), and checks the entry back in to the server when done.
The (semi-)automatic check-out system is user-friendly and unobtrusive, prompting if you want to check an entry out for editing whenever you try to make changes.
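A minimal model of check-in/check-out locking is sketched below in Python. It is a simplified in-memory stand-in with hypothetical names (`EntryStore`, `EntryLockedError`), not the TLex server protocol; it only illustrates why a second user cannot overwrite an entry that is checked out.

```python
class EntryLockedError(Exception):
    """Raised when a user tries to use an entry locked by someone else."""


class EntryStore:
    """In-memory stand-in for a shared dictionary database (hypothetical)."""

    def __init__(self):
        self.entries = {}
        self.locks = {}  # entry id -> user currently holding the entry

    def check_out(self, entry_id, user):
        holder = self.locks.get(entry_id)
        if holder not in (None, user):
            raise EntryLockedError(f"{entry_id} is checked out by {holder}")
        self.locks[entry_id] = user

    def save(self, entry_id, user, text):
        if self.locks.get(entry_id) != user:
            raise EntryLockedError(f"{user} has not checked out {entry_id}")
        self.entries[entry_id] = text

    def check_in(self, entry_id, user):
        if self.locks.get(entry_id) == user:
            del self.locks[entry_id]


store = EntryStore()
store.check_out("bank", "alice")
store.save("bank", "alice", "bank: the edge of a river")
try:
    store.check_out("bank", "bob")  # refused while alice holds the entry
    bob_blocked = False
except EntryLockedError:
    bob_blocked = True
store.check_in("bank", "alice")
store.check_out("bank", "bob")      # succeeds once the entry is checked in
```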
Different icons are displayed to visually indicate the check-out status of entries, as shown in the screenshot below:
Icons showing entry-locking status.
A fine-grained privileges system allows different users to be given different levels of access to the database, for example users can be disallowed from modifying or creating or deleting entries, or prevented from being able to access advanced areas such as the DTD editor.
The user statistics / monitoring tool assists managers in tracking and charting the progress of a dictionary or terminology project as a whole, or of individual members of a team; for example, you can display a graph of the last 14 days showing the number of modifications made, either in total or by particular users. The time range can be selected.
User statistics / monitoring graph showing the number of edits done by "Bob" in the last two weeks.
Other tools allow additional levels of monitoring, such as a "Filter" to show only entries changed or created by a particular user (or users). This can be combined with smart searching in the "modified" timestamp field; for example you can choose to "show all entries modified by Bob during April 2007", or "show all entries changed on 17 May 2007". Finally, these can also be combined with the "Sort By" option that allows you to sort entries by the last-modified or created date and time.
The built-in corpus tool integrates corpus query functionality directly into the editing process.
The "auto-search" option automatically launches a corpus search for the currently selected headword or term. A usage example can also be generated automatically from the currently selected corpus line and attached to the current entry at the click of a button.
Integrated corpus query tool, with auto-extracted usage example.
The Encrypt Corpus Files option allows you to protect your corpus from potential illegal copying and distribution: only the protected, encrypted files need to be distributed to compilers.
TLex 2010 includes numerous improvements to the integrated Corpus tool (F6), including a new "general wordlist" tool, performance and accuracy improvements, better support for a larger variety of languages, and support for very large files (larger than 4GB).
TLex 2010 also includes the fully-fledged standalone tlCorpus application, which also contains numerous additional useful features; see the tlCorpus product page for more details.
Within the TLex 2010 Suite, tlCorpus and TLex can 'talk to one another'; for example, you can directly select and copy corpus line sentences from search results in tlCorpus, and 'paste' them into TLex, and it will automatically place them in the appropriate fields for usage examples.
Inline cross-references allow one to insert cross-references anywhere within the text of a field, as shown in the screenshot below.
Inline cross-references (above: Preview; below: Attribute content).
TLex 2.0 adds the ability to import XML data into TLex, in addition to the existing XML export functionality. Existing dictionaries can thus be imported, or data may be processed/modified externally and re-imported.
The new 'XML Line by Line' importer can be helpful for importing XML or XML-ish data that may be imperfect; the importer will continue to the next line in the file even if a line (each line corresponds to an entry) contains invalid XML. Lines containing invalid XML can be partially correctly processed up until the point where an error appears; these can be easily located directly afterwards using the 'Parse error' filter under 'Filter (F5)' (and thus either corrected after import, or in the original source document and then re-imported). This importer can also perform automatic corrections of some cases of SGML-like incorrectly nested XML tags, e.g. "<i>1<b>2</i></b>". See the relevant sample under the 'Samples' heading on the 'start page' for an example, and more detailed documentation.
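The "continue past bad lines" behaviour can be illustrated with Python's standard `xml.etree.ElementTree` module. This is a deliberately simplified sketch: the function name and sample data are hypothetical, and unlike the importer described above it skips an invalid line entirely rather than partially processing or auto-correcting it.

```python
import xml.etree.ElementTree as ET


def import_line_by_line(lines):
    """Parse one XML entry per line, collecting failures instead of aborting."""
    entries, parse_errors = [], []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(ET.fromstring(line))
        except ET.ParseError as err:
            # Remember where the problem was so it can be reviewed later
            parse_errors.append((line_number, str(err)))
    return entries, parse_errors


data = [
    "<entry><hw>dog</hw></entry>",
    "<entry><hw>cat</hw><i>1<b>2</i></b></entry>",  # incorrectly nested tags
    "<entry><hw>emu</hw></entry>",
]
entries, errors = import_line_by_line(data)
headwords = [e.findtext("hw") for e in entries]
```

The list of recorded line numbers plays the role of the 'Parse error' filter: it tells you which entries need attention after the import.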
One of the goals of modern lexicography is to create so-called "balanced" dictionaries, in which the lemma sign count and page count of each alphabetic section are proportional to the corpus-calculated frequencies of words beginning with the corresponding letter. In theory, such a dictionary is more useful to end-users, and the same proportions can serve as a planning tool for managing a dictionary project (as effort and time should, in general, be spread in more or less the same way). TLex contains a built-in Ruler Tool that allows you to calculate the ruler for your current dictionary, and to compare it dynamically, at any time, to an "ideal" ruler calculated from a corpus.
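The arithmetic behind such a ruler is a percentage distribution per initial letter. The Python sketch below shows one way it might be computed; the function names and the tiny sample data are hypothetical, and a real ruler would be calculated from a full lemma list and a large corpus.

```python
from collections import Counter


def ruler(words, weights=None):
    """Percentage share of each initial letter; each word counts 1 by default."""
    totals = Counter()
    for i, word in enumerate(words):
        totals[word[0].lower()] += weights[i] if weights else 1
    grand_total = sum(totals.values())
    return {letter: 100 * n / grand_total for letter, n in sorted(totals.items())}


def deviation(actual, ideal):
    """Percentage-point difference (actual minus ideal) per letter."""
    letters = sorted(set(actual) | set(ideal))
    return {letter: actual.get(letter, 0) - ideal.get(letter, 0) for letter in letters}


# Hypothetical sample data
lemmas = ["apple", "ant", "bee", "cat"]
corpus_tokens = ["a", "and", "be", "but", "by", "can", "are", "come"]

actual = ruler(lemmas)         # lemma sign count distribution
ideal = ruler(corpus_tokens)   # corpus-derived "ideal" distribution
diff = deviation(actual, ideal)
```

Passing article lengths as `weights` would give the space-allocation distribution instead of the lemma count distribution.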
TLex Ruler Tool: The red line indicates the "ideal" percentage distribution, while the blue and yellow lines indicate the actual distribution of space allocation (article length) and of lemma sign count (number of articles) respectively, for the English side of the Oxford Bilingual School Dictionary: Northern Sotho and English. With all measurements being within one percentage point of the ideal, this dictionary can be seen to be well balanced.
The TLex Ruler Tool also helps prevent some of the problems that traditionally occur during dictionary compilation, such as entries in different parts of the dictionary having different average lengths due to factors such as 'rapidly approaching deadline' or different users with different approaches working in different parts of the dictionary.
TLex includes advanced document statistics functionality. Version 2010 adds numerous additional and highly detailed statistics about structures and field values that occur in a document, as well as style statistics, entity statistics, Lua script statistics, and more. Advanced 'statistics filters' allow you to do things such as 'show all entries where a sense contains 3 or more usage examples'.
TLex 2010 also includes a detailed 'word count' function, showing total word counts, word counts for each field (i.e. each type of information), as well as word counts for both the actual XML data and the 'formatted' output.
TLex 3.0 brings built-in programmability via integration of the Lua scripting language. This allows countless new possibilities.
Lua Scripting Attributes allow attributes to become 'calculation fields' that can be used to dynamically programmatically generate the text displayed for that attribute within an entry - similar to formulas in a spreadsheet. Some example uses of this might be to implement more "intelligent" punctuation, styles or numbering.
External scripts can also be used to perform various kinds of modifications on the database via the TshwaneLua API (Application Programming Interface).
A free reader application, tlReader, is now available that can be used to view TLex dictionaries or tlTerm termbases.
TLex is fully localisable, meaning that the entire interface of TLex itself can be translated into any language. A built-in localisation editor makes it easy to add additional languages.
TLex in Français (French).
TLex in Español (Spanish). Translation (beta) by Dr. Ignacio Navascués.
Localised Cilubà version of TLex. Translation and Data Copyright Prof. N.S. Kabuta et al.
The built-in TLex localisation editor. Changes can be "applied" immediately.
TLex provides two basic methods for placing a dictionary online. The first is to generate "static" output, where the dictionary is placed online as a pre-generated file (e.g. HTML, XML, RTF, PDF, MS Word .doc, etc.). The second method, using the TLex online dictionary publishing system, dynamically generates output, and provides far greater flexibility and functionality.
One can directly create 'static' HTML pages by using the "Export HTML" feature; this simple example bilingual Northern Sotho - Chinese dictionary was exported from TLex in this way. This also demonstrates the Unicode support (the HTML file is encoded using UTF-8). There are various output options, e.g. one may choose whether or not to use stylesheets.
It is also possible to export to XML format from TLex. An XML stylesheet transform may then be applied to generate an HTML page.
A more powerful and flexible method of placing a dictionary online is to use the TLex online dictionary software module (licensed separately). This is a customisable set of PHP scripts that easily allows you to place your dictionary online (or, if you don't have the necessary in-house IT skills, we can manage the site for you). The software allows users to perform searches (with related cross-references also being shown), or additionally may display a list of headwords ("browse mode"), allowing users to browse the full dictionary. All user activity is logged, and the powerful logging and usage analysis module can be used to easily help identify areas of weakness in the dictionary content, e.g. frequent searches for untreated words. Configuration options control various aspects, such as whether or not you want to allow full browsing or allow wildcard searches.
"Browse mode" is demonstrated in this Northern Sotho - English linguistics terminology list and in this Sheng - English lexicon.
A feedback form is also provided, allowing dictionary users to send feedback directly to the dictionary compilation team. (This form may also be disabled via a configuration option.)
The user interface of the online dictionary is fully localisable (i.e. supports multiple languages for the user interface), as can be seen in this online Swahili - English dictionary, which allows users to use the dictionary in either Swahili or English. The user's language preference is remembered using cookies.
Meta-language customisation: The localisation system goes another step further, and even allows customisation of the actual article content, as described in the section customising the language of the meta-language.
The online dictionary software module provides the option to keep extensive, detailed (anonymous) usage logs. An intricate and multifaceted log file tracks every single action of every single user - date and time stamping every lookup, ordering founds and not-founds, monitoring long-term vocabulary retention, etc. - with a multitude of customisable summaries being presented to the lexicographers.
Overall statistics may be retrieved of the average hit/miss rates, the most commonly looked-up words, and the most frequent founds and not-founds. This information allows dictionary compilers to prioritize improvements to the dictionary based on the frequency of searches for different words.
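As an illustration of the kind of summary involved, the following Python sketch computes a hit rate and the most frequent lookups from a list of (query, found) pairs. The log format and the `summarise` function are hypothetical and are not the format used by the online dictionary module.

```python
from collections import Counter


def summarise(log):
    """log: iterable of (query, found) pairs taken from a lookup log."""
    log = list(log)
    hits = sum(1 for _, found in log if found)
    return {
        "hit_rate": hits / len(log) if log else 0.0,
        "top_lookups": Counter(q for q, _ in log).most_common(3),
        "top_not_found": Counter(q for q, found in log if not found).most_common(3),
    }


# Hypothetical sample log
log = [
    ("dog", True), ("selfie", False), ("dog", True),
    ("selfie", False), ("cat", True), ("blog", False),
]
summary = summarise(log)
```

Here the "not found" ranking puts the most frequently missed word first, which is the information a compiler would use to decide what to add next.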
Each dictionary user is assigned a unique ID (using a web browser "cookie") that allows return visits to be (anonymously) tracked, and allows statistics to be shown for particular users.
General logging of all dictionary lookups (date/time range set).
The logging module also displays graphs of the number of unique visitors and the number of searches per day, per week and per month, as shown in the following screenshots.
Detailed view of overall number of lookups and number of visitors (date/time range set).
Overall number of lookups and number of visitors (per week and per month).
Providing a single lexicography solution that suits everyone's needs would be a tall order. In order to allow TLex additional flexibility, an "extendibility API" (Application Programming Interface) will be available, allowing developers to customise and extend TLex in various ways, such as adding new sorting methods or input/output methods, as described below.
There are many different approaches to sorting text for the many different languages of the world; there is no "one-size fits all" approach. In order to be able to handle any possible method for sorting entries, TLex provides an extendibility mechanism whereby TLex "plugins" can be created to add new sorting methods.
The default sorting method supported by TLex is a configurable four-pass table-based system based on the ISO 14651 standard. The four different passes are used for various characteristics that may take precedence over one another, e.g. the so-called "base alphabet", diacritics, uppercase/lowercase differences, and so-called "ignorable" characters (typically non-alphabetic characters such as spaces and punctuation marks).
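The idea of several passes that take precedence over one another can be illustrated with a multi-level sort key. The Python sketch below is a much-simplified stand-in, not ISO 14651 and not TLex's table-based implementation: the `sort_key` function is hypothetical, it derives the levels from Unicode properties rather than from configurable tables, and the choice to place lowercase before uppercase is an arbitrary assumption.

```python
import unicodedata


def sort_key(text):
    """Build a four-level key: base letters, diacritics, case, ignorables."""
    base, marks, case, ignorable = [], [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if marks:
                marks[-1] += ch  # attach the mark to the preceding base letter
        elif ch.isalnum():
            base.append(ch.lower())
            marks.append("")
            case.append(ch.isupper())
        else:
            ignorable.append((len(base), ch))  # position and character
    # Earlier levels take precedence; later levels only break ties
    return (base, marks, case, ignorable)


words = ["co-op", "Coop", "coop", "coöp", "cone"]
ordered = sorted(words, key=sort_key)
```

In this sample, "cone" sorts first on the base alphabet alone, while the remaining four words share the same base letters and are separated only by the later levels.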
Other sorting methods supported include Chinese radical / stroke count ordering.
The TLex API allows for the extension of the input/output system. By means of "plugins", developers can add support to TLex for new or existing file formats or databases. This can also be used to develop importers that can be used to import existing dictionary data from other formats. A plugin for importing dictionaries stored in the Shoebox dictionary format is currently under development.
By default, TLex can work fully with a normal disk file format, with a relational database via ODBC (Open Database Connectivity) and SQL (Structured Query Language), or with XML format. Supported output formats include XML (with optional XSLT), HTML (with or without CSS), RTF (Rich Text Format) and text.