When Kurzweil 1000 is asked to open a file, it often reads and converts the file into a temporary file in its own format - KES. If the format of the original file is text, RTF, Braille, HTML, XML, or DAISY, it does that conversion using techniques that we have written here at Kurzweil Educational Systems. If the format of the original file is an image, or is PDF, then the conversion is actually a recognition, and an OCR engine is used. If the format of the original file is something else - Microsoft Word, for example - then a third party conversion program is used. It is told to convert the file into a temporary RTF file. That RTF file is then converted again by Kurzweil 1000 into KES. And you wondered why it took a long time to open some files?
When you save a file in any format other than KES, a conversion takes place. Again, if the output format is Text, RTF, Braille, HTML, XML, or DAISY, the conversion is done using code that was written (and is controlled) by Kurzweil Educational Systems. If the format is something else, then a two-step process is used - K1000 will convert the file to RTF, and then will invoke a third party conversion program to create the final file in the requested format.
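The routing described in the two paragraphs above can be summarized in a few lines of Python. This is only an illustrative sketch: the function names, the step labels, and the assumption that a KES file needs no conversion when opened are ours for the purpose of the example, and none of it is taken from the actual Kurzweil 1000 code.

```python
# Hypothetical model of the open/save routing described above.
# Names and step labels are illustrative, not Kurzweil 1000 internals.
NATIVE_FORMATS = {"text", "rtf", "braille", "html", "xml", "daisy"}
RECOGNIZED_FORMATS = {"image", "pdf"}


def open_steps(source_format: str) -> list[str]:
    """Return the conversion steps used to open a file of the given format."""
    fmt = source_format.lower()
    if fmt == "kes":
        return []  # assumed: a KES file needs no conversion
    if fmt in NATIVE_FORMATS:
        return [f"built-in: {fmt} -> kes"]
    if fmt in RECOGNIZED_FORMATS:
        return [f"ocr: {fmt} -> kes"]
    # Anything else goes through a temporary RTF file first.
    return [f"third-party: {fmt} -> rtf", "built-in: rtf -> kes"]


def save_steps(target_format: str) -> list[str]:
    """Return the conversion steps used to save into the given format."""
    fmt = target_format.lower()
    if fmt == "kes":
        return []  # saving as KES involves no conversion
    if fmt in NATIVE_FORMATS:
        return [f"built-in: kes -> {fmt}"]
    # Anything else is written as RTF, then handed to a third party converter.
    return ["built-in: kes -> rtf", f"third-party: rtf -> {fmt}"]


if __name__ == "__main__":
    print(open_steps("Word"))
    print(save_steps("Word"))
```

The two-step routes are the ones most likely to feel slow, since the document is converted twice.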
Beginning with this release, we have provided a dialog that lets you control some of the details of file conversions. You can access this dialog using the Conversion menu item in the Settings menu. It's just below the Verbosity menu item. It will open the Conversion Settings dialog. This dialog has the usual OK and Cancel buttons at the bottom, and two important list controls at the top. The first is labeled "Action", and allows you to choose between those settings that affect the Opening of a document, and those that affect the Saving of a document. The second list is labeled "Format", and lets you choose among a selection of document formats. Below those two controls, there are a variable number of other dialog box controls. Exactly what they are and what they do is determined by the settings of the first two controls. I'll go through each of the possibilities here.
Action = Opening, Format = Text.
Split Long Pages - a list box, whose possible settings are Enabled and Disabled. The default is Enabled. K1000 looks for form feed characters when it opens a text file. If the amount of text between form feeds exceeds some amount, a page break is forced. This setting allows you to disable that action, so that the resulting KES file has no more pages than indicated by the form feeds in the text file. The mnemonic for this control is ALT+"P".
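To make the effect of Split Long Pages concrete, here is a minimal Python sketch. The threshold is an assumption: the text above says only that a break is forced when the amount of text exceeds "some amount", so the max_lines parameter, its value of 60, and the choice to measure in lines are all hypothetical, as is the function name.

```python
def split_pages(text: str, split_long_pages: bool = True, max_lines: int = 60) -> list[str]:
    """Split text into pages at form feeds, optionally forcing extra breaks.

    max_lines is a hypothetical threshold; the real limit used by
    Kurzweil 1000 is not documented here.
    """
    pages = []
    for page in text.split("\f"):
        lines = page.splitlines(keepends=True)
        if not split_long_pages or len(lines) <= max_lines:
            pages.append(page)  # keep the page exactly as the form feeds define it
            continue
        # Force page breaks so that no page has more than max_lines lines.
        for start in range(0, len(lines), max_lines):
            pages.append("".join(lines[start:start + max_lines]))
    return pages
```

With the setting disabled, the number of pages is always one more than the number of form feeds in the file.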
Paragraph Analysis - a list box, whose possible settings are Enabled and Disabled. The default is Enabled. K1000 uses a fairly sophisticated analysis to try to figure out where end of paragraph marks should be placed. The analysis is sensitive to attributes such as first line indent, average text length, the presence of blank lines, and even tries to make sense out of tables, block indents, and hanging indents. If you disable it, you will end up with each line in the original text file being treated as an end of paragraph. This preserves the look of the original text file, at the expense, often, of its editability. The mnemonic for this control is ALT+"A".
Action = Opening, Format = Braille.
Language - a list box, whose possible settings are Default, Danish, Dutch, English, German, Icelandic, Italian, Norwegian, Russian, Spanish, and Swedish. The default is, well, Default. Default behavior is to look at the language supported by the current reading voice, and use it whenever a Braille document is being opened. This setting won't do much if you aren't back translating, but it can be pretty useful if, for example, you know you are opening a Spanish Braille document. The mnemonic for this control is ALT+"L".
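The Default behavior amounts to a simple fallback, sketched below in Python. The function name and the idea of passing the voice language in as a string are assumptions made for the example; only the list of languages comes from the dialog described above.

```python
# Languages offered by the dialog, other than "Default".
LANGUAGES = (
    "Danish", "Dutch", "English", "German", "Icelandic",
    "Italian", "Norwegian", "Russian", "Spanish", "Swedish",
)


def resolve_braille_language(setting: str, voice_language: str) -> str:
    """Pick the Braille language: the explicit setting, or the voice's language."""
    if setting == "Default":
        # Fall back to the language of the current reading voice.
        return voice_language
    if setting not in LANGUAGES:
        raise ValueError(f"unsupported Braille language: {setting!r}")
    return setting
```

So with an English reading voice, choosing Spanish here is what lets a Spanish Braille document be back translated as Spanish.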
Action = Opening, Format = PDF.
Emphasis - a list box, whose possible settings are "Recognition of Images" and "Extraction of Text". The default is "Recognition of Images". The mnemonic for this control is ALT+"E". PDF files are unusual in that they can contain both images and text. Unfortunately, they don't always contain text, and even when they do, that text may not include everything that a sighted person would see when looking at the image of a page in the PDF file. When you open a PDF file, the recognition engine extracts the text and, potentially, recognizes the images for each page in the file. If you choose to emphasize the recognition of images, the text will be used to correct minor OCR mistakes, but the bulk of the results will come from the images. Its primary advantage is that you are pretty much guaranteed to get access to all of the text that is represented in the PDF file - regardless of whether it is available as text from that file. There are, however, a few disadvantages. It is usually slower, and, if all of the text was there, it is likely to be less accurate. The alternative setting is "Extraction of Text". If text data is available for a page in a PDF file, that data will be trusted. Recognition will be done only to associate the text data with the image data on the page. Note that if no text data is available for a page, the image will be recognized and the results of that recognition will be made available to you. The advantages of this approach include both speed and accuracy. However, if portions of the page contain text represented only as an image, those portions will be ignored. It may be difficult for you to tell, when you read the page, that portions of it are missing. Note also that this setting interacts with your choice of recognition engines, so the results will differ somewhat depending on which engine you choose, and which treatment you choose to emphasize.
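The per-page decision described above can be laid out as a small Python sketch. It is a model of the behavior as documented, not of the recognition engine itself: the PagePlan type, its field names, and the plan_page function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagePlan:
    """Hypothetical summary of how one PDF page would be handled."""
    primary: str                    # "ocr" or "text": where most results come from
    secondary_role: str             # what the other source is used for
    may_miss_image_only_text: bool  # True if text present only as an image is ignored


def plan_page(emphasis: str, has_text_layer: bool) -> PagePlan:
    """Model the documented Emphasis behavior for a single page."""
    if emphasis == "Recognition of Images":
        role = "text layer corrects minor OCR mistakes" if has_text_layer else "none"
        return PagePlan("ocr", role, False)
    if emphasis == "Extraction of Text":
        if has_text_layer:
            # The text data is trusted; recognition only ties it to the page image.
            return PagePlan("text", "OCR associates text with the page image", True)
        # No text data for this page, so the image is recognized instead.
        return PagePlan("ocr", "none", False)
    raise ValueError(f"unknown emphasis setting: {emphasis!r}")
```

The only case that can silently drop content is "Extraction of Text" on a page that has a text layer, which is the trade-off for its speed and accuracy.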
Action = Opening, Format = RTF.
Split Long Pages - a list box, whose possible settings are Enabled and Disabled. The default is Enabled. RTF files may already contain page breaks, but K1000 will insert additional ones if the text of a page, in its assigned font, wouldn't fit on a 14 inch printed page. By disabling this setting, you can make sure that the number of pages in the opened file matches those that exist in the original RTF file. The mnemonic for this control is ALT+"P".
Action = Opening, Format = Other.
Use Microsoft Office for Conversions - a list box, whose possible settings are Enabled and Disabled. The default is Enabled. Microsoft Office comes with a conversion service that can convert documents in a number of different formats to RTF. From there, Kurzweil 1000 can convert the file from RTF to KES. This conversion package is usually a better choice than its alternative - a conversion service from another vendor that comes with Kurzweil 1000. However, if the conversion service from Microsoft has not been completely installed, our attempt to use it will bring up an unvoiced dialog from Office, asking you to complete the installation. If you do not have a screen reader running, this may look as though K1000 has hung. In this circumstance, it might be better to disable this setting. The mnemonic for this control is ALT+"M".
Action = Saving, Format = Text.
Add a Blank Line after each Paragraph - a list box, whose possible settings are Enabled and Disabled. The default is Disabled. One of the problems with plain text as a format is that it does not have a specific character used to mark an end of paragraph. Most of the settings for saving text have to do with trying to overcome, in one way or another, that limitation. If you enable this setting, each paragraph ending will always be followed by a blank line. If you do not enable it, paragraph endings will be followed by blank lines only if that is the case in the original file. The mnemonic for this control is ALT+"B".
Indent the First Line of each Paragraph - a list box, whose possible settings are Enabled and Disabled. The default is Disabled. If you enable this setting, the first line of each paragraph will begin with a tab, or with a certain number of spaces. If you do not enable it, paragraphs will have first line indentations only if the first line in the original paragraph begins with a tab. The mnemonic for this control is ALT+"I".
Spaces used for a First Line Indent - a text box. Possible values are the numbers 0 through 10. The default is 0. When zero, this setting indicates that a first line indent should be created with a tab character. Otherwise, the setting indicates the number of spaces to be used. This setting has no effect if the first line indent setting above it is disabled. The mnemonic for this control is ALT+"S".
Line Endings - a list box, whose possible settings are Preserve, Remove, and Wrap to Fit. The default is Preserve. When set in this manner, each line in the text file will have the same length as the original scanned lines - assuming that they were scanned by Kurzweil 1000. When set to Remove, each text line will be equal to a paragraph. Needless to say, this can create rather long text lines, but most text processors can automatically wrap long lines to fit within the width of the display window. Finally, the Wrap to Fit setting, which interacts with the maximum width setting that follows, will pretty much ignore the original line endings, but will introduce line endings as necessary to keep each line within a paragraph under a particular maximum limit. The mnemonic for this control is ALT+"L".
Maximum Width of each Text Line - a text box. Possible values are the numeric range 30 through 250. The default is 80. This setting is important only if Line Endings is set to "Wrap to Fit". It establishes what can be considered a margin for the document. Line endings will be added to keep lines under the number specified here. They can exceed that number only if a word has a length larger than the length specified here. The mnemonic for this control is ALT+"M".
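Here is a rough Python sketch of how the five text saving settings above combine when a single paragraph is written out. It uses the standard textwrap module as a stand-in for whatever K1000 does internally, treats the maximum width as "at most this many characters", and handles only one paragraph at a time (blank lines that were already in the original file are left to the caller). The function and parameter names are invented for the example.

```python
import textwrap


def format_paragraph(lines, *, blank_line_after=False, indent_first_line=False,
                     indent_spaces=0, line_endings="Preserve", max_width=80):
    """Format one paragraph, given its original lines without newlines."""
    if not lines:
        return ""
    # Decide the first line indent: zero spaces means "use a tab".
    if indent_first_line:
        indent = "\t" if indent_spaces == 0 else " " * indent_spaces
    elif lines[0].startswith("\t"):
        indent = "\t"  # keep an indent that the original paragraph already had
    else:
        indent = ""
    first = lines[0].lstrip()
    if line_endings == "Preserve":
        out = [indent + first, *lines[1:]]
    elif line_endings == "Remove":
        out = [indent + " ".join([first, *lines[1:]])]
    elif line_endings == "Wrap to Fit":
        # A line exceeds max_width only when a single word is longer than it.
        out = textwrap.wrap(" ".join([first, *lines[1:]]), width=max_width,
                            initial_indent=indent, break_long_words=False,
                            break_on_hyphens=False)
    else:
        raise ValueError(f"unknown line ending setting: {line_endings!r}")
    text = "\n".join(out) + "\n"
    if blank_line_after:
        text += "\n"  # mark the end of the paragraph with a blank line
    return text
```

For example, format_paragraph(lines, line_endings="Wrap to Fit", max_width=40) re-flows the paragraph so that its lines are no longer than 40 characters, while the default call returns the lines just as they were.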
Action = Saving, Format = Braille.
Type of Braille - a list box, whose possible settings are Grade 1 and Grade 2. The default is Grade 2. This setting takes effect whenever a Braille document is saved. The mnemonic for this control is ALT+"T".
Language - a list box, whose possible settings are Default, Danish, Dutch, English, German, Icelandic, Italian, Norwegian, Russian, Spanish, and Swedish. The default is, well, Default. Default behavior is to look at the language supported by the current reading voice, and use it whenever a Braille document is being written. The mnemonic for this control is ALT+"L".
Action = Saving, Format = Other.
Use Microsoft Office for Conversions - a list box, whose possible settings are Enabled and Disabled. The default is Enabled. Microsoft Office comes with a conversion service that can convert RTF documents to a number of different formats. This conversion package is usually a better choice than its alternative - a conversion service from another vendor that comes with Kurzweil 1000. However, if the conversion service from Microsoft has not been completely installed, our attempt to use it will bring up an unvoiced dialog from Office, asking you to complete the installation. If you do not have a screen reader running, this may look as though K1000 has hung. In this circumstance, it might be better to disable this setting. The mnemonic for this control is ALT+"M".
This new dialog caused a few other changes. First, we have removed the maximum text length setting from the General Settings dialog, since we've replaced it with a few different settings in the text saving section of the Conversion Settings dialog. Second, the dialog for saving partial changes has a new value in the list of possible settings categories: Conversion.