docWorks is the first software that converts scanned pages to searchable, metadata-enriched digital objects in one seamless workflow. From the import of the scans to the export in standardized formats, libraries and archives stay in full control of their files and do not have to deal with incompatible modules or lost data shipments.
Categorize. Organize. Archive.
How does docWorks operate?
docWorks “converts” scanned images. More specifically, docWorks identifies the information contained in scanned pages (such as text and structure), saves this information in an XML file, and adds this file to the image. The two essential conversion steps are OCR and segmentation of the document by logical units (articles, chapters, etc.). Only through OCR does a scanned page become searchable, and zoning and structure recognition ensure that only relevant search results are displayed. For instance, if no zoning or structure is applied, a multiple-word search within newspapers might return thousands of results, because each search word only has to occur somewhere on the same newspaper page. Segmenting the page into its individual articles ensures that all search words are found in the same article.
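The effect of segmentation on search can be illustrated with a minimal Python sketch. It is not docWorks code: the sample page, the article titles, and the helper names `matches`, `search_page`, and `search_articles` are all hypothetical, and the matching rule (every query word must occur) is a simplifying assumption.

```python
import re

# Hypothetical sample data: one newspaper page that has been segmented
# into two articles (title -> OCR text).
page = {
    "Harbor expansion approved": "The city council approved the harbor expansion on Monday.",
    "Weather outlook": "Heavy rain is expected along the coast this weekend.",
}


def matches(text, query):
    """Return True when every word of the query occurs in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term in words for term in query.lower().split())


def search_page(page, query):
    """Search without segmentation: the whole page is one block of text."""
    return matches(" ".join(page.values()), query)


def search_articles(page, query):
    """Search with segmentation: all query words must share one article."""
    return [title for title, text in page.items() if matches(text, query)]


# "council" and "rain" appear in different articles on the same page.
print(search_page(page, "council rain"))        # page-level search reports a hit
print(search_articles(page, "council rain"))    # article-level search finds none
print(search_articles(page, "harbor council"))  # both words share one article
```

Without segmentation, the query "council rain" hits the page even though the two words belong to unrelated articles. With segmentation, only queries whose words share an article produce a result.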
The conversion process runs through several steps. Following import, the scanned images are “cropped,” meaning they are cut to a consistent size. Next comes zoning (segmentation of the page into classified blocks and columns), followed by the editing of structure (paragraph, chapter, article), text correction, and metadata. The data is then output in standardized METS/ALTO files, which are stored in the archive and fed into the presentation system. Every workflow step consists of an automatic analysis executed by docWorks and a manual correction. This correction can be done by the docWorks user, or it can be outsourced to specialized service partners.
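To give an idea of what the ALTO part of such an output contains, the following Python sketch reads the recognized text from a small ALTO file. The sample XML is a hand-written, simplified fragment (not actual docWorks output, and not validated against the schema), the ALTO v3 namespace is an assumption, and the function name `text_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the ALTO v3 namespace.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Hand-written, simplified sample; real files carry more attributes.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Harbor" HPOS="210" VPOS="300" WIDTH="180" HEIGHT="40"/>
            <SP/>
            <String CONTENT="expansion" HPOS="400" VPOS="300" WIDTH="260" HEIGHT="40"/>
          </TextLine>
          <TextLine>
            <String CONTENT="approved" HPOS="210" VPOS="350" WIDTH="230" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def text_lines(alto_xml):
    """Return (block ID, line text) pairs in document order."""
    root = ET.fromstring(alto_xml)
    lines = []
    for block in root.iterfind(".//alto:TextBlock", ALTO_NS):
        for line in block.iterfind("alto:TextLine", ALTO_NS):
            # Each String element holds one recognized word in CONTENT.
            words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS)]
            lines.append((block.get("ID"), " ".join(words)))
    return lines


for block_id, text in text_lines(SAMPLE_ALTO):
    print(block_id, text)
```

Each `String` element also carries the position of the word on the page (`HPOS`, `VPOS`, `WIDTH`, `HEIGHT`), which is what allows a presentation system to highlight search hits on the scanned image.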
Introducing docWorks Starter
Driven by the passion to make knowledge searchable and accessible for everyone, Content Conversion Specialists (CCS) has been developing digitization software technology for over 35 years. Its renowned metadata creation software, docWorks, is used by prestigious libraries worldwide. To help institutions take the first step in the OCR of their holdings, CCS has just released a new standalone version, docWorks Starter. This introductory version is user-friendly and easy to learn, requires no IT effort, and scales up to every level of quality and quantity. Features include:
- Save resources through a high degree of automation
- Process different types of material (newspapers, magazines, books, manuscripts, pictures, cards, etc.)
- Benefit from an all-in-one solution which includes OCR
- Create structural metadata for chapters, articles, contributions, etc. in an automated document analysis
- Define jobs and projects flexibly with the help of a wizard
- Easily connect it with external systems, e.g. a catalogue via Z39.50
- Save time and human resources with integrated background processing
- Make use of various standardized output formats, including METS/ALTO, PDF, and ePub