TshwaneDJe - Dictionary Software, Terminology Software, Translation Software and Consulting

Professional Data Conversion

Contact us to discuss your data conversion requirements or obtain a quote.


Confidentiality Assured

We take customer confidentiality very seriously, and have strict policies and measures in place to prevent either your data or confidential information about your project from being divulged to third parties.

If you have dictionary, terminology or other data in formats such as Microsoft Word, QuarkXPress, InDesign, FLEX, Multiterm, Folio, Toolbox, Excel or MS Access, and would like to convert it to a properly structured database (e.g. XML / TLex / tlTerm), we can help; we have a great deal of experience doing such data conversions, from large projects for major commercial or government organisations, down to small projects for individuals.

Overview

A data convertor (or “importer”) involves developing a customised ‘parsing’ system based on the structure, appearance and storage format of a given dictionary or other document. It usually converts from an “unstructured” or weakly structured format (such as Microsoft Word or Adobe InDesign) to a “structured” format in which data fields are semantically marked up (such as TLex). A custom importer can convert a large amount of dictionary data into a structured format far more rapidly and accurately than manually re-entering the data (typically within around one month). The task generally requires a skilled programmer.
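As a rough illustration of what such a parser does, the Python sketch below converts lines in a made-up plain-text layout (`headword (pos) sense; sense`) into semantically marked-up XML, and sets aside any line that does not fit the expected pattern so that it can be logged rather than lost. The layout, the element names (`entry`, `headword`, `pos`, `sense`) and the function `import_entries` are all hypothetical. A real importer works from the actual structure and formatting of the source (e.g. Word or InDesign styles), is far more elaborate, and targets TLex rather than this ad-hoc XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical source layout, assumed for illustration only:
#   headword (pos) first sense; second sense
ENTRY_PATTERN = re.compile(r"^(?P<hw>[^()]+?)\s*\((?P<pos>[^)]+)\)\s*(?P<senses>.+)$")


def import_entries(lines):
    """Parse flat entry lines into a structured XML tree.

    Returns the <dictionary> root element and a list of lines that did not
    match the expected layout (to be logged for manual entry).
    """
    root = ET.Element("dictionary")
    failed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = ENTRY_PATTERN.match(line)
        if match is None:
            failed.append(line)  # keep the unparsed line; never drop data
            continue
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "headword").text = match["hw"].strip()
        ET.SubElement(entry, "pos").text = match["pos"].strip()
        # Each semicolon-separated chunk becomes its own <sense> field.
        for sense in match["senses"].split(";"):
            if sense.strip():
                ET.SubElement(entry, "sense").text = sense.strip()
    return root, failed


sample = [
    "apple (n) a round fruit; the tree bearing it",
    "run (v) to move quickly on foot",
    "garbled line without a part of speech",
]
root, failed = import_entries(sample)
print(ET.tostring(root, encoding="unicode"))
print("Failed:", failed)
```

Regular expressions only suit very regular layouts; the point of the sketch is the split between entries that map cleanly onto semantic fields and entries that are set aside for manual handling.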

We can convert from:

  • MS Word
  • Excel
  • MS Access
  • InDesign
  • PageMaker
  • QuarkXPress
  • Multiterm
  • Toolbox
  • FLEX
  • XML
  • Folio Views (Folio Flat File)
  • WordPerfect
  • OpenOffice
  • SQL varieties

... and just about anything else.

Typical Deliverables

  • A TLex dictionary database file (or tlTerm termbase, or tlDatabase file) containing the final imported data, covering at least the guaranteed percentage of entries (see the following section). If preferred, this can also be supplied in XML format.
  • A full record of any entries that could not be converted automatically (see the following section).
  • Basic styling of the data, implemented in the TLex Styles system, including e.g. automatic punctuation.
  • [Dictionaries] As one of the final steps in a dictionary data conversion, we usually also convert all cross-references automatically into TLex “smart cross-references”. Any cross-reference errors encountered are logged, for example a cross-reference to an entry that does not exist in the imported data. These mostly reflect errors in the original dictionary data and must ultimately be corrected by the customer. Failed cross-references are still imported, but are stored in a distinct field so that the customer can easily search for them in TLex.
  • Basic customisation of the DTD (Document Type Definition), including e.g. automatic numbering fields.
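The cross-reference step described in the list above can be sketched roughly as follows: build an index of headwords, link each cross-reference to its target, and log (but keep) any reference whose target does not exist, in a separate field. This is a minimal Python sketch, not the TLex “smart cross-reference” mechanism itself; the `Entry` class, its field names and `resolve_cross_references` are hypothetical, and it assumes headwords are unique (real dictionaries also need homograph numbers).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    headword: str
    raw_xrefs: List[str] = field(default_factory=list)     # cross-reference targets as found in the source
    xref_ids: List[int] = field(default_factory=list)      # resolved links (hypothetical stand-in for smart cross-references)
    failed_xrefs: List[str] = field(default_factory=list)  # distinct field for targets that could not be resolved


def resolve_cross_references(entries: List[Entry]) -> List[str]:
    """Link each cross-reference to its target entry; log and set aside the rest."""
    # Assumes unique headwords; a real importer would also match homograph numbers.
    index = {entry.headword: i for i, entry in enumerate(entries)}
    log = []
    for entry in entries:
        for target in entry.raw_xrefs:
            if target in index:
                entry.xref_ids.append(index[target])
            else:
                # Keep the unresolved reference (searchable later) rather than dropping it.
                entry.failed_xrefs.append(target)
                log.append(f"{entry.headword}: cross-reference to missing entry '{target}'")
    return log


entries = [
    Entry("big", raw_xrefs=["large"]),
    Entry("large", raw_xrefs=["huge"]),  # "huge" does not exist in the data
]
log = resolve_cross_references(entries)
for message in log:
    print(message)
```

Storing failed targets in their own field, instead of discarding them, is what allows them to be searched for and corrected after the import.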

Accuracy and Data Conversion Rate

Depending on the data's size and complexity, we usually guarantee a 95% conversion rate, i.e. that 95% of the entries will be fully converted. Because automatically converting large amounts of complex data is difficult, the remaining 5% of entries usually need to be entered manually by the customer (e.g. by copying and pasting, field by field). If desired, we can also quote for entering this last 5% as a separate, additional job; otherwise, the list of “failed entries” will be provided to the customer.
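The arithmetic behind the guarantee is simple: the conversion rate is the number of fully converted entries divided by the total number of entries. The Python sketch below is a hypothetical check of an import run against a 95% guarantee; the function name `conversion_report` and the figures (10,000 entries, 420 failures) are made up for illustration.

```python
from typing import Dict, List


def conversion_report(total_entries: int, failed_entries: List[str],
                      guaranteed_rate: float = 0.95) -> Dict[str, object]:
    """Summarise an import run against a guaranteed conversion rate."""
    converted = total_entries - len(failed_entries)
    rate = converted / total_entries
    return {
        "converted": converted,
        "rate": rate,
        "meets_guarantee": rate >= guaranteed_rate,
        # Failed entries are listed for manual entry, never discarded.
        "failed_entries": list(failed_entries),
    }


# Made-up figures: 10,000 entries, of which 420 could not be converted automatically.
report = conversion_report(10_000, [f"entry {n}" for n in range(420)])
print(f"{report['converted']} converted ({report['rate']:.1%}); "
      f"guarantee met: {report['meets_guarantee']}")
```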

We usually make a ‘best effort’ attempt to parse more than 95% of the data, depending on the available time, budget and other resources; however, this cannot be guaranteed.

For data that is already reasonably well structured, we may be able to guarantee a 100% import. We determine this when analysing the data, before quoting.

Representative Sample

To estimate the cost of a data conversion, we need a sizeable sample of the data to study. It is crucial that the sample be representative of the full data set in terms of structure, appearance and consistency. Otherwise, we cannot prepare an accurate estimate, and cannot guarantee a success rate for any entries beyond those in the sample used to prepare it. We can therefore only guarantee a particular percentage success rate if we receive all of the data to be converted, in its final form, in advance; ideally, the entire data set should be provided to us so that we can prepare an accurate quotation.

We do not mind signing an NDA (non-disclosure agreement) or confidentiality agreement. (We do, however, need to keep a copy of the data on our systems temporarily, for as long as the conversion requires.)

Delivery Timeline

The time needed to complete an importer depends on the complexity, size and consistency of the data. It is difficult to guarantee a particular delivery time, but the average is around two to three weeks. If a particular deadline must be guaranteed, we can do so, but the quote will be higher because programmer time has to be prioritised.

Cost

The cost is generally based on an estimate of the number of hours required to convert the data, multiplied by our rate of 40 Euro per hour. Discounts may be available for large jobs. Rates may be subject to change.

What is not typically involved

Note that by default, our data conversion quotations do not include making further manual corrections to the data itself, as one might do when, for example, typesetting a dictionary. Nor do they generally include "cleaning up" the data beyond what can reasonably be considered part of the conversion itself; clean-up that amounts to new improvements to the content, for example, is not included, although we usually do some clean-up within reason. Such tasks should generally be quoted for separately. The same applies to post-import data remodelling, for example restructuring the DTD (Document Type Definition) after the data has been converted, to make improvements not directly related to the conversion.

Advantages of Data Conversion

  • Data in XML is "future-proof" and prevents software lock-in
  • Data better structured and more meaningfully "semantically marked up"
  • Allows you to do more with the data
  • Can be exported more easily to typesetting software
  • Can be more easily and readily published in other media, e.g. cellphone/mobile/smartphone

Data Conversions FAQ (Frequently Asked Questions)

Q: Does the resulting data have to be used in TLex/tlTerm/tlDatabase?

A: No. We can provide your output as pure, structured XML, in a relational database form (e.g. SQL Server / Oracle / MySQL / PostgreSQL), or even in MS Word or InDesign format (with correct Word or InDesign styles).

Q: Can any data get lost in the process?

A: No. We are extremely careful and follow strict procedures at all stages to ensure that absolutely no data gets lost. If there is anything that cannot be automatically converted, it will be logged for manual entry.