Professional Data Conversion
If you have dictionary, terminology or other data in formats such as Microsoft Word, QuarkXPress, InDesign, FLEX, Multiterm, Folio, Toolbox, Excel or MS Access, and would like to convert it to a properly structured database (e.g. XML / TLex / tlTerm), we can help. We have a great deal of experience with such data conversions, from large projects for major commercial or government organisations down to small projects for individuals.
A data converter, or “importer”, is a customised ‘parsing’ system developed around the structure, appearance and storage format of a given dictionary or other document. It usually converts an “unstructured” or weakly structured format (such as Microsoft Word or Adobe InDesign) into a “structured” format in which data fields are semantically marked up (such as TLex). A custom importer allows a large amount of dictionary data to be converted automatically into a structured format far more rapidly and accurately than re-entering the data manually (typically within around one month). Developing one generally requires a skilled programmer.
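As a rough illustration of the idea, the sketch below maps appearance to meaning: each run of text carries a style name, much as a word processor would store it, and a lookup table assigns that style to a semantic field. The style names, the field names and the parse_entry helper are hypothetical, chosen only for this example. A real importer is tailored to each data set and has to cope with far more irregularity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from visual style to semantic field.
STYLE_TO_FIELD = {
    "bold": "headword",
    "italic": "part_of_speech",
    "plain": "definition",
}


def parse_entry(runs):
    """Convert a list of (style, text) runs into an <entry> element.

    Raises ValueError if a run has an unknown style or the entry has no
    headword, so the caller can set the entry aside for manual handling.
    """
    entry = ET.Element("entry")
    for style, text in runs:
        field = STYLE_TO_FIELD.get(style)
        if field is None:
            raise ValueError(f"unrecognised style: {style!r}")
        ET.SubElement(entry, field).text = text.strip()
    if entry.find("headword") is None:
        raise ValueError("entry has no headword")
    return entry


# One dictionary entry as it might come out of a styled document.
runs = [("bold", "cat "), ("italic", "n. "), ("plain", "a small domesticated feline")]
xml_text = ET.tostring(parse_entry(runs), encoding="unicode")
print(xml_text)
```

Raising an error on anything unrecognised, rather than guessing, keeps doubtful entries out of the structured output.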
Depending on the data's size and complexity, we usually guarantee a 95% conversion rate, i.e. that 95% of the entries will be fully converted. Because automatically converting large amounts of complex data is difficult, the remaining 5% of entries usually need to be entered manually by the customer (e.g. by copy-and-paste, field by field). If desired, we can also quote for entering this last 5% manually as a separate, additional job; otherwise, we provide the list of “failed entries” to the customer.
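The conversion rate and the list of failed entries can be pictured with a minimal sketch like the one below. It assumes a hypothetical converter function that raises ValueError for any entry it cannot fully parse. The sample entries and the "headword: definition" input form are invented for illustration and do not reflect any particular customer format.

```python
def convert_all(raw_entries, convert):
    """Run `convert` over every entry and return (converted, failed).

    Entries for which `convert` raises ValueError are collected unchanged
    in `failed`, so that they can be entered manually later.
    """
    converted, failed = [], []
    for raw in raw_entries:
        try:
            converted.append(convert(raw))
        except ValueError:
            failed.append(raw)
    return converted, failed


def conversion_rate(converted, failed):
    """Percentage of entries that were fully converted."""
    total = len(converted) + len(failed)
    return 100.0 * len(converted) / total if total else 0.0


def split_entry(raw):
    """Hypothetical converter: expects text of the form 'headword: definition'."""
    headword, sep, definition = raw.partition(":")
    if not sep or not headword.strip() or not definition.strip():
        raise ValueError("not in 'headword: definition' form")
    return {"headword": headword.strip(), "definition": definition.strip()}


raw_entries = ["cat: a feline", "dog: a canine", "???", "emu: a bird"]
converted, failed = convert_all(raw_entries, split_entry)
rate = conversion_rate(converted, failed)
print(f"{rate:.0f}% converted; failed entries: {failed}")
```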
We usually make a ‘best effort’ attempt to parse more than 95% of the data, depending on the available time, budget and other resources; however, this cannot be guaranteed.
For data that is already reasonably well-structured, we may be able to guarantee a 100% import. We can determine this prior to quoting when analysing the data.
To estimate the cost of a data conversion, we need a sizeable sample of the data to study. It is crucial that the sample be representative (in terms of structure, appearance and consistency) of the full data set; otherwise we cannot prepare an accurate estimate, and cannot guarantee the percentage of successfully imported entries beyond those in the sample used to prepare the estimate. We can only guarantee a particular success rate if we receive all of the data to be converted, in its final form, in advance. Ideally, therefore, the entire data set should be provided to us so that we can prepare an accurate quotation.
We do not mind signing an NDA (non-disclosure agreement) or confidentiality agreement. (We do require that we be able to have a copy of the data on our systems temporarily, for as long as is required to do the conversion.)
The amount of time it will take for us to complete an importer will depend on the complexity, size and consistency of the data. Unfortunately it is difficult to guarantee a particular delivery time; however, the average time is around two to three weeks. If a particular deadline must be guaranteed, we can do this, but would have to quote higher in order to be able to prioritise programmer time.
The cost will generally be based on an estimate of the number of hours required to convert the data, multiplied by our hourly rate of 40 euros per hour. Discounts may be available for large jobs. This rate is valid for 2010 and may be subject to change.
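The arithmetic behind such an estimate can be sketched as follows. The 60-hour figure is invented purely for illustration and says nothing about the size of a typical job. Discounts are not modelled, since their terms are not specified here.

```python
HOURLY_RATE_EUR = 40  # hourly rate quoted in the text (valid for 2010)


def estimate_cost(estimated_hours, hourly_rate=HOURLY_RATE_EUR):
    """Return the estimated cost in euros: hours multiplied by the hourly rate."""
    if estimated_hours < 0:
        raise ValueError("estimated_hours must be non-negative")
    return estimated_hours * hourly_rate


# Hypothetical job estimated at 60 hours of programmer time.
cost = estimate_cost(60)
print(f"Estimated cost: {cost} EUR")
```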
Note that by default, our data conversion quotations do not include further manual corrections to the data itself, as one might make when, for example, typesetting a dictionary. They also do not generally include "cleaning up" tasks other than those that can reasonably be considered part of the data conversion itself (clean-up that amounts to new improvements to the content, for example, is excluded), although we usually do some clean-up, within reason. Generally, however, such tasks should be quoted for separately. The same applies to post-import data re-modelling (for example, restructuring the DTD (Document Type Definition) after the data has been converted, to make improvements not directly related to the conversion).
Q: Does the resulting data have to be used in TLex/tlTerm/tlDatabase?
A: No. We can provide your output as pure, structured XML, in relational database form (e.g. SQL Server / Oracle / MySQL / PostgreSQL), or even in MS Word or InDesign format (with correct Word or InDesign styles).
Q: Can any data get lost in the process?
A: No. We are extremely careful and follow strict procedures at all stages to ensure that absolutely no data gets lost. If there is anything that cannot be automatically converted, it will be logged for manual entry.
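One simple way to picture a "nothing gets lost" check is an accounting step: every entry in the source must end up either converted or logged for manual entry. The sketch below is only an assumption about how such a check might look, not a description of the actual procedures, and the numeric entry IDs are hypothetical.

```python
def unaccounted_entries(source_ids, converted_ids, logged_ids):
    """Return the set of source entry IDs that were neither converted nor logged.

    An empty result means every source entry is accounted for.
    """
    return set(source_ids) - set(converted_ids) - set(logged_ids)


source_ids = [1, 2, 3, 4, 5]

# Entry 3 could not be converted automatically but was logged for manual entry.
missing_ok = unaccounted_entries(source_ids, converted_ids=[1, 2, 4, 5], logged_ids=[3])

# Entry 5 appears in neither list, so the check flags it.
missing_bad = unaccounted_entries(source_ids, converted_ids=[1, 2, 4], logged_ids=[3])

print(missing_ok, missing_bad)
```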