TshwaneLex/TshwaneTerm: FAQ (Frequently Asked Questions); Tips & Tricks; Undocumented
Listed here are answers to common queries, useful "tips & tricks" for TshwaneLex and TshwaneTerm, and some potentially useful features that have not yet reached "full-fledged feature" status or have not yet been documented elsewhere.
F12 - Larger Attribute Editing Window
[TshwaneLex 2] Pressing F12 while in an "Attributes (F1)" window pops up a larger overlay window for editing the current attribute.
(Note that one can also widen the "Attributes (F1)" window with the "View/Wide Tools window layout (Ctrl+Alt+L)" option.)
Useful Shortcut Keys
- Esc - Jump to 'quicksearch' box
- Shift+Esc - Jump to main lemma/entry list
- In 'Attributes (F1)': Tab - Next field
- In 'Attributes (F1)': Shift+Tab - Previous field
- F12 - (From any text editing window) - open a larger popup edit window, useful if the edit box is too small
Do TshwaneLex and TshwaneTerm work on Windows Vista?
Yes.
Remember selected columns for "Import/Wordlist or CSV" (ImportCSVDefaults.txt)
If one regularly imports CSV documents with given columns, defaults for the corresponding attributes can be configured by creating a text file in the TshwaneLex/TshwaneTerm application folder (typically something like "c:\Program Files\TshwaneLex") called "ImportCSVDefaults.txt". The contents of this text file are simply what you would see in the right-side column of the 'Import CSV' dialog box, e.g.:
Lemma::LemmaSign
Lemma::PartOfSpeech
Lemma::Sense::TE::TE
A blank row corresponds to a "Nothing". If this text file is present, then when the 'Import CSV' dialog is opened, it will automatically attempt to fill in the right-side column with the fields read from the text file (provided those fields exist in the DTD; if not a "Nothing" will be added instead).
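If the column layout changes from time to time, the defaults file could also be generated by a small script rather than typed by hand. The following is only a sketch in Python (not part of TshwaneLex/TshwaneTerm); the function name is hypothetical, and the encoding the application expects for this file is not stated here, so treat that as something to verify.

```python
from typing import Optional, Sequence


def build_import_csv_defaults(columns: Sequence[Optional[str]]) -> str:
    """Return the text for ImportCSVDefaults.txt, one field per CSV column.

    A None entry becomes a blank row, which corresponds to "Nothing".
    """
    return "\n".join(field or "" for field in columns) + "\n"


# Example: the third CSV column is skipped (blank row = "Nothing").
example = build_import_csv_defaults(
    ["Lemma::LemmaSign", "Lemma::PartOfSpeech", None, "Lemma::Sense::TE::TE"]
)

# To use it, save `example` as "ImportCSVDefaults.txt" in the application
# folder of your own installation (the folder differs between machines).
```

The field names must match what the 'Import CSV' dialog shows for your DTD; any that do not exist will be replaced by "Nothing" when the dialog opens.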
Use different fonts for certain attributes under 'Attributes (F1)'
The font settings under "Tools/Options" allow you to configure the font that will be used for 'Attributes (F1)' boxes. Sometimes this is not fine-grained enough, e.g. you may want to use different fonts for different fields, or for different sides/sections of the dictionary. You can configure specific fonts to be used for individual attribute types under 'Attributes (F1)' by creating a text file "Fonts.txt" in the TshwaneLex/TshwaneTerm application folder. Each line of the file must have three comma-separated values: The name of the 'language' or 'section', followed by an "Element|Attribute" item, followed by the font name. The following example, for a bilingual Tshivenda/English dictionary, would use "DejaVu Sans" for the headword box on the Tshivenda side of the dictionary and for the Tshivenda 'Translation Equivalent' box on the English side:
Tshivenda,Lemma|LemmaSign,DejaVu Sans
English,TE|TE,DejaVu Sans
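Since each line must have exactly three comma-separated values, a quick check of the file before starting the application can catch typing slips. The sketch below is a hypothetical Python helper (not something shipped with TshwaneLex); skipping blank lines and trimming spaces are assumptions of this sketch, not documented behaviour.

```python
def parse_fonts_txt(text):
    """Parse Fonts.txt content into (section, element, attribute, font) tuples.

    Raises ValueError for a line that does not have the form
    Section,Element|Attribute,Font name
    """
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are simply ignored
        parts = line.split(",")
        if len(parts) != 3 or "|" not in parts[1]:
            raise ValueError(
                f"line {number}: expected 'Section,Element|Attribute,Font name'"
            )
        element, attribute = parts[1].split("|", 1)
        entries.append(
            (parts[0].strip(), element.strip(), attribute.strip(), parts[2].strip())
        )
    return entries


sample = "Tshivenda,Lemma|LemmaSign,DejaVu Sans\nEnglish,TE|TE,DejaVu Sans\n"
fonts = parse_fonts_txt(sample)
```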
Change the width of the entry/lemma list
[New from 2006-11-15] For some projects the lemma list width may seem a bit on the small side. The width can be configured by creating a DWORD registry value called "EntryListWidth" under "HKEY_CURRENT_USER/Software/TshwaneDJe/(ApplicationName)/UserInterface". This specifies the width as a percentage of the width of the application window. The default is around 11%.
[TshwaneLex 4] As of TshwaneLex Suite 4 (future release), one can change the width of the Lemma List or Term List dynamically in TshwaneLex or TshwaneTerm using the keyboard shortcuts Ctrl+Alt+Shift+Right and Ctrl+Alt+Shift+Left.
Disable the splash screen (faster startup)
Under "HKEY_CURRENT_USER/Software/TshwaneDJe/(ApplicationName)/App", create a DWORD registry value called "NoSplash" and set it to 1.
Increase the number of results shown under Search (F3)
Under "HKEY_CURRENT_USER/Software/TshwaneDJe/(ApplicationName)/Settings" (create if necessary), create a DWORD registry value called "MaxSearchResults" and set it to the desired maximum number of search results.
Tilde expansion settings and style
[New from 2006-11-16] Create "HKEY_CURRENT_USER/Software/TshwaneDJe/(ApplicationName)/Settings". Underneath this, there are two possible settings:
TildeStyle (DWORD)
0: Underline (default)
1: Bold
2: Italics
3: No styling
TildeMethod (DWORD)
Set this to 1 to use an alternate tilde expansion strategy for cases where the first letter is uppercase, e.g. for the lemma "university": "U~ of Cape Town"
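Registry settings such as these can be entered by hand in the Windows Registry Editor, or prepared as a ".reg" file that is then double-clicked to import. The sketch below (Python, hypothetical helper name) only generates the text of such a file. The application name is passed in as a parameter because "(ApplicationName)" depends on the product being configured; the "TshwaneLex" used in the example is an assumption, so check the actual key name on your own machine first.

```python
def reg_file_text(application_name, subkey, dwords):
    """Build .reg file text that sets DWORD values under
    HKEY_CURRENT_USER\\Software\\TshwaneDJe\\<application_name>\\<subkey>.
    """
    key = "\\".join(
        ["HKEY_CURRENT_USER", "Software", "TshwaneDJe", application_name, subkey]
    )
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dwords.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {value} does not fit in a DWORD")
        # DWORD values are written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


# Example: bold tilde styling plus the alternate expansion method.
# "TshwaneLex" as the application name is an assumption for illustration.
tilde_settings = reg_file_text(
    "TshwaneLex", "Settings", {"TildeStyle": 1, "TildeMethod": 1}
)
```

The same helper could be used for the other DWORD settings on this page (e.g. "MaxSearchResults") by changing the subkey and the name/value pairs.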
Other Project- and Language/Section-specific Window Settings
Under "HKEY_CURRENT_USER/Software/TshwaneDJe/(ApplicationName)/PerLexicon/(ProjectKey)/SectionID(Num)/", the following settings apply:
ShowHeadwordPreview (DWORD) - Disable large headword in Preview
[New from 2007-06] If 0, the large headword displayed at the top of the Preview is disabled.
MaxPreviewEntries (DWORD) - Limit number of entries shown in Preview
[New from 2007-06] Limits the maximum number of entries following the selected one that are displayed.
[Advanced] Useful Regular Expressions for "Search (F3)"
'Word Boundary' Syntax
Use "\b" to find word boundaries; "\<" and "\>" for left/right word boundaries. (NOTE: For TshwaneLex/TshwaneTerm releases prior to June 2007, use [[:<:]] and [[:>:]] for left/right word boundaries.)
Find all examples (or combinations) in which the "~" was not used
Do a field-specific (e.g. "Example::Example") regular expression search under F3 for "^[^~]+$".
Find all definitions (or examples etc.) which end with a fullstop
Do a field-specific (e.g. "Definition::Definition") regular expression search under F3 for "\.$".
Find all definitions (or examples etc.) which do not end with a fullstop
Do a field-specific (e.g. "Definition::Definition") regular expression search under F3 for "[^\.]$".
Find all definitions (or examples etc.) which begin with an uppercase letter (or with a lowercase letter)
Do a field-specific (e.g. "Definition::Definition") case-sensitive regular expression search under F3 for "^[A-Z]" (or "^[a-z]" for lowercase letters).
Find all single-word (i.e. non-multi-word) headwords/terms which begin with an uppercase letter
Do a case-sensitive regular expression search under F3 for "^[A-Z][^ ]*$".
=> A mistake people sometimes make, particularly inexperienced users, is to enter headwords or terms starting with a capital letter, e.g. "Computer" instead of "computer"; this search can help automatically find such cases, while ignoring multi-word terms that should be uppercase, e.g. "World Wide Web".
Find all entries beginning with a certain letter or range of letters
See the section below titled "Exporting all Entries Beginning with a Certain Letter (e.g. 'A'), or a Range of Letters (e.g. 'Q to Z')"
Our regular expressions use the Perl syntax: Regular Expression Syntax Reference.
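To get a feel for what the patterns above match before running them on real data, they can be tried out on a few sample strings. The sketch below uses Python's "re" module purely as a test bench; it is not the engine used by TshwaneLex/TshwaneTerm, and the two differ in places (for instance, Python does not treat "\<" and "\>" as word boundaries). The patterns shown here fall within what both understand. The sample strings and names are made up for illustration.

```python
import re

# The field-specific search patterns from the sections above.
PATTERNS = {
    "no_tilde": r"^[^~]+$",
    "ends_with_fullstop": r"\.$",
    "no_final_fullstop": r"[^\.]$",
    "starts_uppercase": r"^[A-Z]",
    "single_word_uppercase": r"^[A-Z][^ ]*$",
    "empty_value": r"^$",
}


def matches(pattern_name, value):
    """Return True if the named pattern matches the given field value.

    No re.IGNORECASE flag is used, i.e. this corresponds to a
    case-sensitive search.
    """
    return re.search(PATTERNS[pattern_name], value) is not None


print(matches("no_tilde", "the University of Cape Town"))   # tilde not used
print(matches("single_word_uppercase", "Computer"))         # likely a mistake
print(matches("single_word_uppercase", "World Wide Web"))   # multi-word, ignored
```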
How to find empty attribute values for a given attribute type?
There are at least two methods:
(1) In the DTD editor, configure the attribute value to be 'required'. Then do an "Error check" (or use the "Required attribute" F5 Filter).
(2) Do a field-specific F3 search for the regular expression "^$".
Exporting all Entries Beginning with a Certain Letter (e.g. 'A'), or a Range of Letters (e.g. 'Q to Z')
[TshwaneLex 2] To find or export all entries beginning with a certain letter or letters, one can use a field-specific regular expression search, combined with a search filter, as explained in the following steps:
- Go to "Search (F3)"
- Make sure "regular expression" is ticked
- Make sure "Whole word only" and "Case sensitive" are not ticked
- Click "Fields <" to open the field list
- Click "Clear all" at the bottom of the field list
- Tick "Lemma::LemmaSign" in the field list
- Enter "^a" into the search box (replace "a" with the desired starting letter). For a range of letters such as 'q to z', use the search query "^[q-z]". (NOTE: If you enter "^a" and it changes into a single character, turn off "replace-as-you-type" under the "Tools" menu.)
- Click "Search"
- Click "As Filter" in the search results (this will apply the current search results as a 'filter')
Now export as usual (e.g. to RTF), making sure to select the "Use filters" option.
(For languages with prefixes, e.g. if a "-" precedes roots/stems in the headword, the regular expressions "^-?a" and "^-?[q-z]" can be used respectively.)
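The effect of these queries, including the optional "-" prefix, can be previewed on a short word list. This is a sketch in Python (not the TshwaneLex search engine itself); the function name and sample headwords are hypothetical, and "re.IGNORECASE" stands in for leaving "Case sensitive" unticked.

```python
import re


def headwords_in_range(headwords, first, last, optional_prefix=""):
    """Return the headwords whose first letter lies between first and last.

    If optional_prefix is given (e.g. "-"), one leading occurrence of it is
    skipped, mirroring the "^-?[q-z]" form of the query.
    """
    prefix = f"{re.escape(optional_prefix)}?" if optional_prefix else ""
    pattern = re.compile(f"^{prefix}[{first}-{last}]", re.IGNORECASE)
    return [headword for headword in headwords if pattern.search(headword)]


words = ["apple", "Quiz", "-rand", "zebra", "-able"]
print(headwords_in_range(words, "q", "z"))
print(headwords_in_range(words, "q", "z", optional_prefix="-"))
```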
Exporting a Filtered Subset of Data to a TshwaneLex File
Apply the desired filter(s) under "Filter (F5)", and select "File/Save a copy/TshwaneLex file". Select "Use filters", and if desired, "Include incomplete articles".
This can also be used to clear a side of the dictionary, by applying a filter on that side that leaves no entries remaining (e.g. 'subtract all entries with a lemma sign').
NB: One thing to watch out for when exporting a filtered subset in this way, is that cross-references to articles that are not also included by the filter will be removed from the exported data.
Clearing One Side of a Bilingual Database
See "Exporting a Filtered Subset of Data to a TshwaneLex File" above.
Creating a New Blank Database From an Existing One as Template
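If one wants to retain certain settings from the original database, such as the user logons, see "Exporting a Filtered Subset of Data to a TshwaneLex File" above, which describes how to clear a side of the dictionary with a filter.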
If one only wants to use the core TshwaneLex DTD and styles, but lose e.g. the user logons, one can use "DTD Templates", explained in the User Guide.
Set the Incomplete Flag on All Entries
Open "Edit/Search and replace". Under "Fields", do 'Clear all' and tick only the 'Incomplete' attribute. Do a find of "0" and replace with "1" (enter these without quotes).
Clear the Incomplete Flag on All Entries
Open "Edit/Search and replace". Under "Fields", do 'Clear all' and tick only the 'Incomplete' attribute. Do a find of "1" and replace with "0" (enter these without quotes).
How to View All the Latest Work
The quickest method [TshwaneLex 3] is to go to "Format (F4)" and select the 'modified date' (typically named "Modified") attribute under "Sort by". The most recent work will now be at the bottom of the Lemma List or Term List. Once done, you can select the "-" option from the "Sort by" list to go back to normal.
Note that this can be combined with "Filter (F5)"; for example, you could filter on a particular username to see all the latest work of that user only.
View All Work Done on a Particular Day or Days
Go to "Search (F3)". Click "Fields <" to display the fields list, click 'Clear all' and tick only the 'Modified' attribute on the lemma/entry element (this means "restrict the search to the 'last modified date' field"). Enter a date into the search box in YYYY-MM-DD format (e.g. "2008-03-20", without quotes) and click 'Search'.
You can also easily search for work from an entire month by searching for that month specified in YYYY-MM format, e.g. "2008-03".
To find work from multiple days, you can tick 'Regular expression' and then separate them with the "|" (vertical line, meaning 'or' in regular expression syntax) character, e.g. you could search for "2008-03-20|2008-03-21|2008-03-22". More advanced regular expressions could be used to construct fancier queries.
Note that this can be combined with "Filter (F5)"; for example, you could first filter on a particular username to see the work of that user only.
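For longer runs of days, typing out the "|"-separated list by hand becomes tedious, so the query could be generated instead. The following Python sketch (hypothetical helper, not part of TshwaneLex) builds the alternation for a date range; it assumes only what is described above, namely that dates are searched in YYYY-MM-DD form.

```python
import re
from datetime import date, timedelta


def date_range_pattern(start, end):
    """Return a regular expression matching any day from start to end inclusive."""
    days = (end - start).days
    if days < 0:
        raise ValueError("end date is before start date")
    return "|".join(
        (start + timedelta(days=offset)).isoformat() for offset in range(days + 1)
    )


query = date_range_pattern(date(2008, 3, 20), date(2008, 3, 22))
print(query)  # paste this into the search box with 'Regular expression' ticked
```

For a range like this one, a more compact hand-written equivalent is "2008-03-2[0-2]", which uses a character class for the last digit.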
[Advanced] Inline Elements (PCDATA)
'Inline elements' refers to when an element is used inside the text of a field, i.e. when another field type occurs somewhere within the text of a single "Attributes (F1)" box. Take the following simple example Dutch - French dictionary article:
aanbouwmeubel meuble [m.] assemblé par éléments
The gender "m." of "meuble" is indicated in italics with square brackets within the actual translation "meuble assemblé par éléments". Using an inline element, the gender information may be specially tagged as such, e.g. with its own "gender" element, using XML syntax inside the translation equivalent field, like so:
meuble <gender>m.</gender> assemblé par éléments
Here "gender" is an element defined in the DTD. It thus has its own style in TshwaneLex, allowing it to automatically always appear with its own particular distinctive font/colour/formatting as well as automatic punctuation such as the square brackets around it.
The sample "English - French Inline Element Sample.tldict" that comes with TshwaneLex demonstrates this usage of inline elements.
Inline Elements Background:
In XML, only "PCDATA" can contain inline elements - regular attributes may not. This is because for an inline tag to be understood, the text that it occurs in must be 'parsed' (i.e. basically 'processed'), and ordinary attribute values are not parsed - only the PCDATA section of an element is parsed (PCDATA actually stands for "Parsed Character Data"). The PCDATA of an element is basically everything that falls between the opening tag of an element and its corresponding closing tag:
<Element ...>THIS IS PCDATA</Element>
Ordinarily, by default, a translation equivalent in TshwaneLex is stored as a regular attribute of an element, which will be exported as XML that looks like the following:
<TE TE="meuble assemblé par éléments"></TE>
If the "TE" attribute above is instead marked as being used for the PCDATA of the TE element in the DTD Editor, then it will be exported as XML that looks like the following (note the translation equivalent is now in the PCDATA section of the element):
<TE>meuble assemblé par éléments</TE>
This then allows inline elements to be used, e.g. the following is valid XML:
<TE>meuble <gender>m.</gender> assemblé par éléments</TE>
Using inline elements within a regular attribute, however, is invalid XML:
<TE TE="meuble <gender>m.</gender> assemblé par éléments"></TE>
Inline elements may also be used for basic formatting, such as bold and italics, instead of the TshwaneLex "%" markup characters:
<TE>here is some <b>bold</b> text</TE>
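The difference between the two forms can be confirmed with any XML parser. The sketch below uses Python's standard "xml.etree.ElementTree" module (chosen here only for illustration; it is unrelated to TshwaneLex itself) to show that the PCDATA form parses and exposes the inline element, while the attribute form is rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


pcdata_form = "<TE>meuble <gender>m.</gender> assemblé par éléments</TE>"
attribute_form = '<TE TE="meuble <gender>m.</gender> assemblé par éléments"></TE>'

print(is_well_formed(pcdata_form))     # inline element inside PCDATA
print(is_well_formed(attribute_form))  # "<" is not allowed in an attribute value

# With the PCDATA form, the tagged part can be picked out separately,
# and the plain text can be recovered by joining all text pieces.
te = ET.fromstring(pcdata_form)
gender = te.find("gender").text
full_text = "".join(te.itertext())
```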
Advantages of Inline Elements:
In the above example of the "gender" element, using an inline element effectively allows "knowledge" of what the "m." means to be encoded into the data, rather than just "dumb" text. This has advantages, e.g. in the TshwaneLex Electronic Dictionary module, if the end-user clicks on a gender label, a grammar window explaining the gender system could be displayed. A smarter search index could also be generated, allowing the end-user to search for "meuble assemblé", since the search system could 'know' that the gender label is not strictly part of the translation.
Inline Cross-references [TshwaneLex 3]
Another advantage of inline elements in TshwaneLex is that they allow for the creation of inline cross-references - i.e. cross-references somewhere within the text of a field. This can be configured in the DTD Editor, in the "Attributes" section (using the 'XRef target' and 'XRef display' checkboxes). The 'XRef target' attribute is used to specify the actual cross-reference, while the optional 'XRef display' attribute, if filled in, indicates some substitution text to appear in place of the cross-referenced headword (if you need to display something different). (If both of these two settings are used, they must be ticked for different attributes (or PCDATA) within the same element.)
The 'XRef target' attribute type may be used on either PCDATA (for inline cross-references - this will likely usually be the case) or normal attributes. In the following example it is used in PCDATA:
<reftype>See</reftype> <ref>bunny</ref>
If "ref" has a PCDATA with 'XRef target' selected, then TshwaneLex will try to resolve "bunny" as a cross-reference and make a hyperlink. It doesn't have to be PCDATA, it could be a regular attribute; you could thus also do something like:
<reftype>See</reftype> <ref target="bunny" />
The 'XRef display' is for cases where the text that is displayed in the Preview should be different from the actual cross-reference target. For example, if we create a "ref::display" attribute and check the 'XRef display' option for it, we could do the following:
Check <ref display="Google">http://google.com/</ref> for more info.
or non-PCDATA equivalent:
Check <ref display="Google" target="http://google.com/" /> for more info.
The output will then display "Check Google for more info", but if you click on "Google" the actual link will be "http://google.com/". Apart from hyperlinks to websites, this also has lexicographic applications (the inline cross-references sample in TshwaneLex demonstrates this).
'XRef target' is required to be ticked if you want to use inline cross-references, but 'XRef display' is optional. If you use it though, it must be on the same element as the 'XRef target' it corresponds to. This can also be used on either PCDATA or normal attributes.
Note that inline cross-references are not "smart cross-references", although they will display as a hyperlink if the cross-reference target is found. It is suggested to use normal smart cross-references instead of inline cross-references unless you have a definite need to e.g. have complex sentences with cross-references appearing anywhere inside a sentence.
Changing the style of the cross-reference type
By default the cross-reference type is displayed with the same style as the cross-reference headword, e.g. (with both "SEE" and "dog" in bold):
hound SEE dog
It is sometimes desirable to use a different style for the cross-reference type, e.g. (with "SEE" in regular weight and only "dog" in bold):
hound SEE dog
This can be achieved to some extent by using markup tags in the 'display labels' under "Dictionary/Edit cross-reference types", e.g.:
%bSEE%b
In this example, the bold tags 'cancel out' the surrounding bold from the overall style of the entire cross-reference. In future, we intend to add more advanced controls for the appearance of cross-references.
[Advanced] Inline Elements: Validating PCDATA or Inline Attributes Against a Closed List
Releases of TshwaneLex later than July 2007 include the ability to do closed list item validation for inline elements. A PCDATA section can be specified to be of 'closed list' type in the DTD editor, and if its value isn't in the selected list, it will be highlighted as red in the Preview (likewise for attributes written inline). For example, if you have in an etymology field:
From <lang>Old French</lang> <originword>entreprendre</originword>
then the part inside the "lang" tags may be checked against values in a closed list, such as a list of valid origin languages; thus if you make a typo:
From <lang>Old Frenhc</lang> <originword>entreprendre</originword>
the language name will be highlighted in red, immediately tipping the user off that there is a mistake.
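The idea behind this check can be illustrated outside TshwaneLex as well. The Python sketch below is only an approximation of the concept, not the program's own implementation: it wraps the field text in a temporary element, then reports any "lang" values not found in a closed list. The list contents and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical closed list of valid origin languages.
VALID_LANGUAGES = {"Old French", "Latin", "Middle English"}


def invalid_closed_list_values(field_text, tag, allowed):
    """Return the values of inline <tag> elements that are not in the closed list."""
    # Wrap the field text so that it has a single root element for parsing.
    root = ET.fromstring(f"<field>{field_text}</field>")
    return [
        element.text or ""
        for element in root.iter(tag)
        if (element.text or "") not in allowed
    ]


etymology = "From <lang>Old Frenhc</lang> <originword>entreprendre</originword>"
print(invalid_closed_list_values(etymology, "lang", VALID_LANGUAGES))
```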
[Advanced] Inline Elements: Special 'Tagging' Shortcut Keys
[New from 2007-03-22; TshwaneLex 2] Special "tag" shortcut keys can be created under "Tools/Options/Keyboard shortcuts (macros)" that make tagging data with inline elements under "Attributes (F1)" far more convenient and user-friendly than typing out tags manually. This involves creating a shortcut key with the following format for "Text to insert when shortcut is pressed":
$TAG$:tagname
For example:
$TAG$:gender
Pressing this shortcut key in an "Attributes (F1)" box will then automatically 'intelligently' output either an opening "<gender>" or closing "</gender>" tag as appropriate, or if some text is selected, surround the selected text with a pair of opening and closing tags.
Re-sorting Entries or Fixing Homonym Numbers
Under normal circumstances these should not go wrong, but should circumstances outside the norm result in the ordering of entries or of homonym numbers being incorrect, you can use "Re-sort lemmas" under "Dictionary/Configure sorting" (use for each section of the dictionary). The database analyse/repair options under "Tools/Database administration" will also repair homonym numbers.
Viewing an Entry While Working on Another
It may be useful sometimes to refer to a certain entry while working on another, so that both are on the screen simultaneously. One 'trick' here is to use "Search (F3)" to get the first entry into the search results window, then select and work on the other entry.
Create A Windows Shortcut to Directly Open an ODBC Database
As of 2007-06, TshwaneLex/TshwaneTerm can be launched with an ODBC database specified as a command-line parameter (e.g. "odbc|datasourcename|tl_"), allowing e.g. Desktop or Quicklaunch shortcuts to be created that directly open a particular ODBC database.
To do the same but using the "cached ODBC" interface, use "cached" instead of "odbc", e.g. "cached|datasourcename|tl_".
Getting Started with Corpus (F6) [TshwaneLex Suite 3]
1. Configuring the Corpus
Step-by-step:
- Prepare your corpus files as text files. (It may be a good idea to save them all to a specific dedicated folder, but that is not necessary.)
- Under "Corpus (F6)", click on "Configure" and select "Texts/Add multiple".
- Click "Browse" to select the folder containing the text files (the "recurse" option will specify whether or not TshwaneLex/TshwaneTerm will also auto-add text files from subfolders within the selected folder).
- Click "OK".
- The desired corpus files should now appear in the list.
The configuration will be saved along with the particular database.
More files can be added at any time later, or files may be removed from the list.
2. Doing a Corpus Search
Once the corpus files are configured, you can perform search queries on the corpus. Either a query can be entered manually under "Corpus (F6)", or you can tick the "Auto-search" option, and TshwaneLex/TshwaneTerm will then automatically launch a corpus search for the current headword/term each time you select an entry.
The most recent results are kept in memory, thus if you select another entry and then go back again to the first entry, the search results should re-appear immediately. If the search had not yet completed, it will automatically continue on its way again.
3. Sorting the Results
The ordering of corpus search results can be configured by clicking on "Configure" under "Corpus (F6)", and under "Sort", using "Move up" and "Move down" to change the order of sort items. For example, by moving "Word Before Search Term" to the first position, the entries will first be sorted on the word to the left of the search term within a results line. (If the word to the left is the same for two lines, the next item in the list will decide how they are further sorted, and so on.)
4. Auto-grabbing Usage Examples
One of the powerful time-saving features of TshwaneLex/TshwaneTerm 3 is the ability to automatically 'grab' a sentence from a line in the corpus results and attach it as a usage example in the current entry. To do this, use the following procedure:
- Select the desired 'Sense' element in the Tree View to which you wish to attach the example
- Select the desired line in the corpus results by clicking on its number in the left column
- Press the shortcut key Ctrl+F7
Note: This relies on the default "Sense" and "Example" elements from the default TshwaneLex DTD being present.
5. Copying Selected Examples (Corpus Lines) to Clipboard
You can use the shortcut "Ctrl+C" to copy the currently selected corpus line to the clipboard.
6. Corpus Encryption
The TshwaneLex/TshwaneTerm corpus tool includes a facility to 'encrypt' corpus files and protect them with a password. The resultant encrypted files can be used within the "Corpus (F6)" tool provided one has the password, but outside of TshwaneLex/TshwaneTerm the files will be unreadable. This allows you to protect your corpus from possible theft by members of the team or anyone else with access to the computers.
To apply encryption to all or part of your corpus, click on "Configure" under "Corpus (F6)", click on the "Texts" tab, then select one or more files that you would like to encrypt from the list. Multiple files can be selected by holding down "Ctrl" on the keyboard while clicking with the mouse. Alternatively, if you wish to encrypt all files, click on "Encrypt all". You will be prompted for the password that will be used to protect the files. Enter the password carefully and click "OK". New copies of each chosen file will be saved (to the same folder) with an extension ".tecrypt". These files may then be distributed to the compilers instead of the original text files.
IMPORTANT:
- The encryption password is case-sensitive, meaning "a" is considered different from "A".
- Make sure to keep a backup copy of the original corpus files in a safe place. Do not lose the originals. If you forget the password, the original files cannot be recovered from the encrypted copies.
Corpus (F6): Increase the maximum number of corpus lines returned
By default only the first 1000 results for a search are returned; this can be changed. Under "HKEY_CURRENT_USER/Software/TshwaneDJe/(ApplicationName)/Settings" (create if necessary), create a DWORD registry value called "MaxCorpusResults" and set it to the desired maximum number of corpus lines.
Corpus (F6): Increase the maximum number of cached sets of search results
The most recent sets of corpus search results are cached, so that if you return to a recent search, the results can be displayed immediately. The number of result sets to cache can be changed. Under "HKEY_CURRENT_USER/Software/TshwaneDJe/(ApplicationName)/Settings" (create if necessary), create a DWORD registry value called "MaxCorpusCachedResults" and set it to the desired maximum number of cached sets of search results. The default is 20. A note of caution: setting this to a very high number may impact performance.
Error message "There was an error saving the database. Return code: -1" when saving
The most common cause of this is having the same dictionary file open more than once simultaneously (e.g. in two separate TshwaneLex windows). Failing that, it may be that the file is marked as 'read-only'.
[Advanced - Lua] Getting Started with Lua Scripting [TshwaneLex Suite 3]
=> The 'Getting Started with TshwaneLua' guide is being moved to the TshwaneLua page.
[Advanced - Lua] Sample TshwaneLua Scripts? [TshwaneLex Suite 3]
=> Sample Lua scripts are available here.
[Advanced - Lua] I get "C stack overflow" message [TshwaneLex Suite 3]
Try declaring variables and/or functions as "local" (i.e. with the "local" keyword in front).
[Advanced - Lua] How to typecast an object? [TshwaneLex Suite 3]
If you have (for example) a variable of type tcNode called NODE which (NB) you know is of type tcReference, you can typecast it using tolua.cast as follows:
local Reference = tolua.cast(NODE, "tcReference");
Windows Vista Makes 'Ding' Noises While Working in TshwaneLex When Clicking on List Controls
This is the only known issue with TshwaneLex on Vista, and is a bug in Windows Vista that affects many applications. Until Microsoft fixes this, you should be able to work around it by turning off the 'Default Beep' system sound via the Control Panel, or disabling system sounds altogether. Alternatively there is a registry-based workaround described here.
Templates
The idea behind 'templates' is to pre-create entire element tree sub-structures and then enter these 'in one go' in the Tree View. Templates are not yet directly supported in TshwaneLex/TshwaneTerm, but are planned for a future release. In the meantime, for many applications, there are 'work-arounds' - tips that allow one to achieve a similar effect in certain cases. One, for 'lemma templates' as a whole, is to create a few 'dummy' entries, perhaps always sorted to the top of the dictionary, and use the "Lemma/Duplicate article" (Ctrl+Shift+U) menu command in TshwaneLex (or "Entry/Duplicate entry" in TshwaneTerm) when you wish to create an instance of the 'template'. A similar technique can be used for sub-element trees within an entry, by using the Tree View 'copy' and 'paste' commands (Ctrl+C and Ctrl+V respectively) to duplicate Tree View structures from a 'template' source. Another possibility in some cases is to use the DTD child relation constraints, e.g. specifying a "one or more" relation will cause the child element to be created automatically when its parent element is created. Finally, for advanced users, the built-in Lua scripting could be used.
XML Importer - Tips
Data in XML form can be imported into TshwaneLex or TshwaneTerm via the "File/Import/XML" menu option.
It is best to import data into a 'clean/empty' document, i.e. to select the XML import command when no database is open in TshwaneLex/TshwaneTerm.
Note that after importing XML, there would usually be no TshwaneLex/TshwaneTerm styles, thus all imported entries will usually be displayed in a default text style in black on a white background. You can use the "Format/Styles" menu option as usual to add styles once you are satisfied with the import.
TshwaneLex has one or two basic 'expectations' of how the data should be structured in order to import the data in a meaningful way (i.e. in a way that allows TshwaneLex to 'understand' what some of the key fields are, such as the headword). The following is an example of roughly the simplest XML document that can be thrown at the importer:
<Dictionary>
  <Language>
    <Entry LemmaSign="cow">
    </Entry>
  </Language>
</Dictionary>
Note that the element for a 'dictionary entry' appears at the third depth level in the document, and should contain an attribute called "LemmaSign" that contains the headword; this allows TshwaneLex to recognise which attribute it should use as the headword for purposes of sorting, indexing in the Lemma List, and so on. (If the headword is in a different element or attribute, it will still be imported - TshwaneLex will just not 'know' to use that field for the Lemma List and so on.)
The names of the elements above ("Dictionary", "Language" and "Entry") can be anything, although their structure is important (i.e. second-level element represents each 'section' or 'side' of a dictionary within TshwaneLex, and third level represents the list of entries within that section).
Note that you do not necessarily need a DTD attached to the data - if importing XML data with no DTD, TshwaneLex will attempt to construct a DTD based on the elements/attributes it encounters. For well-structured data, this can work well.
Here is a slightly more complex example (note "TE" stands for "Translation Equivalent" for, in this case, bilingual English - Afrikaans data):
<Dictionary>
  <Language>
    <Entry LemmaSign="dog">
      <Plural>dogs</Plural>
      <Sense>
        <TE TE="hond" />
        <Definition Definition="A domestic mammal that barks" />
      </Sense>
    </Entry>
    <Entry LemmaSign="cat">
      <Sense>
        <TE TE="kat" />
      </Sense>
    </Entry>
  </Language>
</Dictionary>
The entries do not need to be correctly sorted within the XML (e.g. "dog" then "cat" above) - TshwaneLex will automatically re-sort them according to the default configured 'sort method' (which can also be changed at any time later on).
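If the source data is in some other form (a spreadsheet export, a word list), XML with the structure described above can be produced by a short script. The following Python sketch is one possible way to do so, using the same element and attribute names as the examples above; the function name is hypothetical, and placing each translation in its own "Sense" is a choice made for this illustration only.

```python
import xml.etree.ElementTree as ET


def build_import_xml(entries):
    """Build importable XML from (headword, [translation equivalents]) pairs.

    The entry element sits at the third depth level and carries the
    headword in a "LemmaSign" attribute, as the importer expects.
    """
    dictionary = ET.Element("Dictionary")
    language = ET.SubElement(dictionary, "Language")
    for headword, translations in entries:
        entry = ET.SubElement(language, "Entry", LemmaSign=headword)
        for translation in translations:
            sense = ET.SubElement(entry, "Sense")
            ET.SubElement(sense, "TE", TE=translation)
    return ET.tostring(dictionary, encoding="unicode")


xml_text = build_import_xml([("dog", ["hond"]), ("cat", ["kat"])])
print(xml_text)
```

The resulting text can be saved to a file and opened via "File/Import/XML".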
'Merge' XML Import
If you want to import entries into an existing database, the most important thing is to 'tell' the importer which 'side' (section/language) of a dictionary to import sets of entries into. This is done by filling in the language "Name" attribute with the exact same name configured for a language side/section under "Dictionary/Properties". This is shown in the following example:
<Dictionary>
  <Language Name="English">
    <Entry LemmaSign="dog">
      <Sense><TE TE="inja" /></Sense>
    </Entry>
  </Language>
  <Language Name="Zulu">
    <Entry LemmaSign="inja">
      <Sense><TE TE="dog" /></Sense>
    </Entry>
  </Language>
</Dictionary>
To do the 'merge' import, one then just selects "File/Import/XML" while the desired database is open.
NB: It is a good idea to always do a 'File/Create a backup' before doing a 'merge import'.
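Because a mistyped language "Name" is the most likely thing to go wrong in a merge import, it may be worth checking the XML before importing it. The Python sketch below is a hypothetical pre-check (not a TshwaneLex feature): it verifies that each second-level element has a "Name" matching one of the section names you pass in, and that each third-level entry has a "LemmaSign" attribute. The section names in the example are taken from the sample above; use the names configured under "Dictionary/Properties" for your own database.

```python
import xml.etree.ElementTree as ET


def check_merge_xml(xml_text, section_names):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    root = ET.fromstring(xml_text)
    for language in root:
        name = language.get("Name")
        if name not in section_names:
            problems.append(f"Language Name {name!r} does not match any section")
        for entry in language:
            if not entry.get("LemmaSign"):
                problems.append(f"<{entry.tag}> in {name!r} has no LemmaSign")
    return problems


sample = """<Dictionary>
  <Language Name="English">
    <Entry LemmaSign="dog"><Sense><TE TE="inja" /></Sense></Entry>
  </Language>
  <Language Name="Zulu">
    <Entry LemmaSign="inja"><Sense><TE TE="dog" /></Sense></Entry>
  </Language>
</Dictionary>"""

print(check_merge_xml(sample, {"English", "Zulu"}))
```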
How do you pronounce 'TshwaneLex'?
Click on the "Pronunciation" link near the top of the TshwaneLex page to hear an audio recording. (Note how the "sh" is not pronounced as in English, but more resembling an ordinary English "s".)
Where does the name 'TshwaneLex' come from?
'Tshwane' is the African name for the city in which TshwaneLex is produced and where the company is headquartered, namely Pretoria, while 'Lex' refers to 'lexicon' (or alternatively 'lexicography'). For further background, see the Wikipedia entries on Tshwane and Pretoria.