OCR (Optical Character Recognition) from SmartLF scans

March 26th, 2008

Digital scan information derived from a text document is still just a series of dots, and although people can read it, the computer cannot interpret it - it’s termed ‘unintelligent’.  The operation that converts these meaningless dots into a form that can be reprinted and used properly by a computer is called optical character recognition (OCR).  OCR is often needed to complete the conversion of a scanned mechanical or architectural drawing fully into vector form once the arcs, lines and circles have been changed using raster-to-vector conversion.  CAD system files in their simplest form consist only of co-ordinate data, which is then displayed to the operator onscreen as vector information.

OCR is not part of the software supplied by Colortrac and has to be provided by additional software.  Sometimes it is part of a larger CAD or indexing application that can itself also control the SmartLF scanner.

Character recognition software generally takes one of two forms: recognition of machine-printed text (computer-generated fonts) and recognition of handwritten characters, which of course can be almost infinitely variable in shape and form.  OCR for handwriting is generally more complicated.

Many small format scanners come supplied with the first type of OCR, but the technique for recognising handwritten characters is slightly different.  Traditionally more costly, handwriting OCR has recently become more widely available and more efficient, mainly due to the emergence of PDAs (Personal Digital Assistants), Palm PCs and similar devices that use touch-sensitive screens with wands or pens.  A few specialist companies have become world leaders in handwritten character recognition technology, and software houses that have developed their own systems as part of their CAD-based raster-to-vector products often offer these specialist systems as upgrade options.

OCR embedded in traditional CAD systems generally requires the user to complete a series of handwriting training sessions before the recognition software is properly usable.  Bought-in (externally licensed) software is often more efficient and will usually be good enough to use straight away.

Both types of system need to be directed to the areas of the scan that require converting.  The recognition software then analyses each character against its character shape library.  If the software is fairly certain of a shape’s identity and the match is good, it converts the shape and replaces the raster with the ASCII (computer text) character.  If the software is less sure of the match, it stops and requests assistance from the operator.  The success of this type of operation is directly linked to the amount of training and direction provided by the operator.
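The accept-or-ask decision described above can be sketched in a few lines of Python.  This is only an illustration and not the method of any particular OCR product: the function name, the 0.90 threshold and the (character, score) candidate format are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold: matches scoring below this are referred to the operator.
CONFIDENCE_THRESHOLD = 0.90


def classify_shape(candidates: List[Tuple[str, float]],
                   threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the best-matching character, or None if the operator must decide.

    `candidates` holds (character, match score) pairs produced by comparing
    one raster shape against the shape library; scores run from 0.0 to 1.0.
    """
    if not candidates:
        return None
    best_char, best_score = max(candidates, key=lambda pair: pair[1])
    return best_char if best_score >= threshold else None


print(classify_shape([("8", 0.97), ("B", 0.41)]))  # confident match: converted
print(classify_shape([("O", 0.55), ("0", 0.52)]))  # ambiguous: ask the operator
```

A lower threshold means fewer interruptions but more wrong characters; a higher one means more requests for operator assistance.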

Many old pre-CAD drawings can be successfully modified and updated by adding vector information alongside or over the top of the original raster drawing - the so-called ‘hybrid’ drawing.  The benefit of this approach is that old drawings do not need to be completely re-drawn.  Old scans can even be re-scaled to view alongside the real-world sized drawings of regular CAD.

For more information on the providers of raster-to-vector and OCR software see the CLASP pages at

http://www.colortrac.com/scanning_software/software_clasp.htm 

SMARTLF SCAN SPEED - WHAT AFFECTS IT?

February 6th, 2008

Colortrac dynamically control the speed of the SmartLF scanner based on the computer in use and the scan parameters set by the operator.  The scanner data rate is linked to 3 main parameters:

the width of the document being scanned, the resolution used and the colour mode, i.e. black & white or 24-bit colour.

However, it also depends on the performance of the electronics in the computer.  Where possible, SmartLF will start to reduce its mechanical speed once the maximum data rate through the USB2 interface of the connected computer has been reached.  Generally an average computer (2.4 GHz Pentium) should be able to maintain a constant speed for a 200 dpi scan in full colour mode.  Even a really fast computer will stop perhaps every 400 mm when scanning an A0 at 400 dpi in full colour at the scanner’s maximum speed, because USB2 just cannot keep up with the scanner.
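To see how width, resolution and colour mode combine, the raw data volume of a scan can be estimated as follows.  This is a rough sketch under stated assumptions, not a description of how SmartLF works internally: it assumes uncompressed data, an A0 sheet of 841 mm x 1189 mm, 1 bit per pixel for black & white, and the nominal USB 2.0 signalling rate of 480 Mbit/s (real throughput is lower).  The function names are made up for the example.

```python
MM_PER_INCH = 25.4
# Assumption: USB 2.0 nominal signalling rate of 480 Mbit/s; real throughput is lower.
USB2_NOMINAL_BYTES_PER_SEC = 480_000_000 / 8


def raw_scan_bytes(width_mm, length_mm, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan of the given document."""
    pixels_across = width_mm / MM_PER_INCH * dpi
    pixels_along = length_mm / MM_PER_INCH * dpi
    return pixels_across * pixels_along * bits_per_pixel / 8


def min_transfer_seconds(num_bytes, bytes_per_sec=USB2_NOMINAL_BYTES_PER_SEC):
    """Lower bound on the transfer time at the given link rate."""
    return num_bytes / bytes_per_sec


# Assumption: A0 is 841 mm x 1189 mm (ISO 216).
a0_colour = raw_scan_bytes(841, 1189, 400, 24)  # 24-bit colour
a0_mono = raw_scan_bytes(841, 1189, 400, 1)     # 1-bit black & white
print(f"A0 @ 400 dpi colour: {a0_colour / 1e6:.0f} MB, "
      f"at least {min_transfer_seconds(a0_colour):.1f} s over USB2")
print(f"A0 @ 400 dpi black & white: {a0_mono / 1e6:.0f} MB")
```

On these assumptions an A0 colour scan at 400 dpi is roughly 0.7 GB of raw data, 24 times the volume of the same scan in black & white, which is why the interface rather than the scanner mechanism sets the pace.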

To keep the scanner running continuously, use the software to slow the scanner slightly.  One of the biggest causes of intermittent scanning is a badly fragmented hard disc.  Running the Windows defragmenting tool (System Tools) helps your computer store large files in contiguous (non-fragmented) sectors on disc, so that it does not spend large amounts of time moving around the disc looking for the next available area to store the data.  Well-maintained storage also speeds up recall time, of course.

But here’s another big difference between what Colortrac do and the way other scanners work. . .

SmartLF does not buffer scan data, so the moment the document stops moving in the scanner just about coincides with the scan data being ready to write to the computer’s hard disc.  Many other scanner manufacturers buffer the scan data in the scanner, which gives an impression of overall speed and scan completion, but in reality the operator still needs to wait until all the data has been accepted by the computer before the file can be opened or the next scan started.  There is no way around the fact that USB2 can only transmit a certain amount of data per second.  Buffering has most value as a marketing trick while adding unnecessary cost to the product.

FAQ 29 How do I prevent documents being scanned skewed?

January 25th, 2008

The SmartLF All-In-One software (SmartLF’s included software) is effectively a batch program.  After the first scan or copy, the program will continue to scan or copy every time media is detected in the scanner.  When a document is crumpled or its edges are damaged, batch scanning can get a little tricky.  If you can see that the next document might present a problem you might like to try the following:

Before inserting the next document, press the SCAN button on the scanner.  This will temporarily toggle the full automatic action of the system to off.  To start the next scan first load the document into the scanner, wait for it to stage and then press SCAN again.  This will re-engage automatic batch processing until the next press of the SCAN button.

Alternatively, mouse click the FILE icon onscreen before the next document is inserted into the scanner.  This has the same effect as above and will stop the scanner automatically taking the document but still allow the document to be loaded.  The media can be reloaded as many times as required by using the BACK arrow on the scanner panel to rewind the document.

ScanWorks is not a batch program by default and requires the operator to initiate each scan by pressing the SCAN button on the scanner, clicking the scan icon in ScanWorks or pressing F9 on the keyboard.  ScanWorks does have a batch mode, which like SmartLF All-In-One will automatically begin scanning after the document has staged in the scanner.  ScanWorks Batch, like most batch processes, is designed for minimum user intervention, and there is no opportunity to correct a skewed load once the document has loaded, except by cancelling the batch.

To control ScanWorks and SmartLF All-In-One properly from the scanner panel, the Events logic for the SmartLF scanner device under Windows Scanners & Cameras must be set to ‘Take no action’.  Go to the Control Panel to check these settings (see p.11 of the manual).  Without these changes Windows may attempt to start other programs when the scanner buttons are used, and remote control of the software from the scanner will not be possible.

The interval between the scanner sensing the document and the scanner staging the document is called the media load delay.  This feature is internal to the scanner and at the moment there is no provision in SmartLF for altering this feature. The setting is fixed currently at 1.3 seconds and is reckoned to be the best compromise between speed and efficiency for the SmartLF.

Finally, experiment with the way the original is inserted into the scanner.  For a document that has a strong upward curl it is often useful to hold the document by the side edges and twist the leading edge down into the roller area, rather than placing the document face down on the platen and trying to keep it flat while moving it towards the rollers.

FAQ 28/2007 Does SmartLF do Linux?

December 12th, 2007

Linux is an Open Source operating system and many versions are in circulation at any given time.  Support for Linux is mostly forum-based, and this can make driver maintenance quite a difficult and time-consuming task.  Also, because of the radically different technology used in SmartLF Windows drivers, it is not a trivial exercise to reproduce its operation in a Linux environment.  These factors, coupled with the relatively small amount of interest from the Linux community, have always deterred Colortrac from investing in a SmartLF driver for this operating system.

A possible way around the problem for some users may be to try an emulation running on the Linux host, and a couple of products are now available on the market.  VMware Workstation 3.0 (http://www.vmware.com/products/ws) is a product that creates a complete virtual machine on the Linux host and should be able to handle all the hardware interfaces, including the USB2 ports, although we haven’t tried it ourselves.  A second product that does something similar is Win4Lin (http://www.netraverse.com).  This works through the existing Linux hardware and should appeal more to the newer user because it leaves files on a shared hard drive - ideal for the scanner user.

Although we have reports that the VMware product for Macintosh provides a working Windows environment for Mac users of SmartLF, we have no evidence that their Linux product works with Colortrac scanners.  Note: A Windows licence may be required.

Disclaimer: Colortrac only supports SmartLF on true Windows computer hardware.  This information is provided freely, at the user’s own risk and does not constitute any form of guarantee of operation. 

FAQ 27/2007 Does Fusion emulation software work with Colortrac SmartLF scanners?

December 11th, 2007

Yes it does!  VMware Fusion v1.1 by VMware is a software emulation product that runs on the Macintosh and lets it run applications written for Windows.  It also supports PC software and hardware that uses USB2 ports - like a large format scanner.  You need an Intel-based Mac with a Core 2 Duo or Xeon processor that is capable of running 64-bit operating systems, but apart from that you need nothing else.  Fusion will use the Mac’s existing Boot Camp partition to host the Windows virtual machine, or you can use its Easy Install utility to set up a new virtual machine.  Either way this appears to be a perfectly satisfactory way to run SmartLF from Mac hardware.  For more information go to:

http://www.vmware.com/products/fusion/overview.html

Like other similar solutions on the market, it would appear that you don’t even need to reboot the Mac - a feature that makes it particularly attractive to scanner operators who’d rather not spend too much time in the Windows world.  Note: A Windows licence may be required.

Thanks go to Allied Images Ltd, the UK distributor for Colortrac, who told us about this product.  Apparently a UK customer has used Fusion to run a SmartLF Gx 42 scanner from a MacBook when the PC they were going to use wouldn’t work.

Disclaimer: Colortrac only supports SmartLF on true Windows computer hardware.  This information is provided freely, at the user’s own risk and does not constitute any form of guarantee of operation. 

FAQ 26/2007 Does SmartLF have a scan statistics option?

December 10th, 2007

ScanWorks does have a facility for outputting scan statistics - on a ‘per scan’ and ‘all scans since last reset’ basis.  When ScanWorks scan logging (or statistics) is enabled, ScanWorks puts a file called ScanLog.txt in the following locations:

Windows XP - Documents and Settings\User\Application Data\ScanWorksData folder

Windows Vista - Users\User\AppData\Roaming\ScanWorksData folder
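Both locations are the ScanWorksData folder beneath the user’s application-data directory, which Windows normally exposes through the APPDATA environment variable.  A minimal sketch for building the expected path, assuming APPDATA is set and that the folder and file names are exactly as given above (the function name is made up for the example):

```python
import ntpath
import os
from typing import Mapping, Optional


def scanlog_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the expected ScanLog.txt path from the APPDATA variable."""
    env = os.environ if env is None else env
    appdata = env["APPDATA"]  # raises KeyError if the variable is not set
    # ntpath joins with backslashes regardless of the platform running this code.
    return ntpath.join(appdata, "ScanWorksData", "ScanLog.txt")


# Example with a Vista-style profile directory passed in explicitly.
print(scanlog_path({"APPDATA": r"C:\Users\User\AppData\Roaming"}))
```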

Once logging is enabled, ScanLog.txt lists the area scanned etc. for each colour mode as each scan is made.  When the summation button is pressed on the ScanWorks GUI, the individual scan statistics are summed and added as a totals list at the end of the file.  Pressing the summation button has the secondary effect of resetting the statistics, and the contents of ScanLog.txt will be overwritten without warning at the end of the next scan.  To suspend statistics, uncheck the enable button in ScanWorks Preferences.


The contents of a running ScanWorks Statistics scan logging session

**********************************************************************************
Monday, October 29, 2007 11:19:27
 C:\Documents and Settings\Operator\My Documents\Scanworks\TEST-01.tif
  8.24″ x 11.69″ @ 200dpi, RGB
 Area =  0.67 square feet, Scantime = 00:00:07, Status = Saved

Monday, October 29, 2007 11:19:49
 C:\Documents and Settings\Operator\My Documents\Scanworks\TEST-02.tif
  8.24″ x 11.69″ @ 200dpi, RGB
 Area =  0.67 square feet, Scantime = 00:00:07, Status = Saved

Monday, October 29, 2007 11:20:01
 C:\Documents and Settings\Operator\My Documents\Scanworks\TEST-03.tif
  8.24″ x 11.69″ @ 200dpi, Black and white
 Area =  0.67 square feet, Scantime = 00:00:06, Status = Saved

Pressing the summation button adds the following information onto the end of the file.

Totals for Monday, October 29, 2007 11:19:27 through
           Monday, October 29, 2007 11:20:32 (0 days, 0 hours, 1 minutes elapsed)
               Total square feet:   2.01
               Total saved scans: 3
               Total rejected scans: 0
               Total scans: 3
               Total RGB scans: 2
               Total 256 color scans: 0
               Total 16 color scans: 0
               Total Grayscale scans: 0
               Total Black and white scans: 1
               Total edited post scan: 0
****************************************************************************************
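Because the totals are overwritten after a reset, it can be useful to total a saved copy of the log independently.  The sketch below does this in Python.  It assumes that every entry has an "Area = ..., Scantime = ..., Status = ..." line in the format shown in the sample above; the function name is made up for the example, and the only Status value seen in the sample is "Saved", so other values are simply counted as found.

```python
import re
from collections import Counter

# Matches the per-scan summary line as it appears in the sample log above.
AREA_LINE = re.compile(
    r"Area\s*=\s*(?P<area>[\d.]+) square feet,\s*"
    r"Scantime\s*=\s*(?P<time>\d+:\d{2}:\d{2}),\s*"
    r"Status\s*=\s*(?P<status>\w+)"
)


def summarise(log_text):
    """Total the area, scan time and status counts of all entries in a log."""
    total_area = 0.0
    total_seconds = 0
    statuses = Counter()
    for match in AREA_LINE.finditer(log_text):
        hours, minutes, seconds = (int(p) for p in match.group("time").split(":"))
        total_seconds += hours * 3600 + minutes * 60 + seconds
        total_area += float(match.group("area"))
        statuses[match.group("status")] += 1
    return {
        "scans": sum(statuses.values()),
        "area_sq_ft": round(total_area, 2),
        "scan_seconds": total_seconds,
        "by_status": dict(statuses),
    }


# The three summary lines from the sample session above.
SAMPLE = """
 Area =  0.67 square feet, Scantime = 00:00:07, Status = Saved
 Area =  0.67 square feet, Scantime = 00:00:07, Status = Saved
 Area =  0.67 square feet, Scantime = 00:00:06, Status = Saved
"""
print(summarise(SAMPLE))
```

For the sample session this gives three scans and 2.01 square feet, in agreement with the totals that ScanWorks wrote.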