Posted on Friday, Apr 9, 2021 - 16:17
A preview version of X-Ways Forensics 20.3 is now available. The URL of the download directory for all recent versions can be retrieved by querying one's license status as always.
What's new in v20.3 Preview 1?
* The OCR capabilities of the software package Tesseract can now be utilized from within X-Ways Forensics and X-Ways Investigator. The package can be downloaded from our web server, and updated download instructions are available from the same place as always. If, when v20.3 is first run, Tesseract is found in the \Tesseract subdirectory of the installation directory, it will be activated automatically. Otherwise please specify its path in Options | Viewer Programs.
* OCR can be applied as part of logical searches or indexing to suitable files, such as document scans, digitally stored faxes in TIFF format, or PDF documents that contain only graphic content. The default OCR file mask even includes *.jpg. However, whether applying OCR to every JPEG file in a case is necessary or a little excessive is for you to decide, and you have full control over the scope of the search by various means anyway. Please be aware that checking high-resolution photos for text takes a lot of time. Digital photos in JPEG and HEIC format will be rotated according to the instructions in their Exif metadata to restore the correct orientation, and thus hopefully allow OCR of text that was originally photographed roughly horizontally. If ordinary text decoding already succeeds for a file whose type is covered by both file masks (*.pdf), OCR will not additionally be applied. The option "Store decoded text for context preview and future searches" also keeps OCR-derived text stored in the volume snapshot.
* Search hits that the logical search finds in OCR-derived text are identified as such in the Descr. column and highlighted in a different color. The Descr. filter allows you to list only OCR search hits or only non-OCR search hits. Older versions of X-Ways Forensics can see OCR search hits from v20.3 when opening the same case, but won't know that they are OCR search hits.
* You can select up to two languages for text recognition at the same time, after clicking the ... button for this in Options | Viewer Programs. However, there is a trade-off if you select Chinese/Japanese and a Western language at the same time: recognition of the Asian characters deteriorates. You may want to select *only* Chinese/Japanese for much better recognition of that language. English (more precisely, Latin) letters can still be recognized in that case, at reduced quality, even though English is not expressly selected. Select both Chinese/Japanese and a Western language only if correct recognition of the Western language is more important to you.
* Preview mode now has a new submode called Text, in addition to the Raw submode. In Text submode, plain text is extracted from non-picture files, just as in a logical search with the decode option. This submode can also help you better understand how text is extracted from various document types, in particular from spreadsheets, for which different extraction options exist whose output may differ, especially in formatting.
* If the ordinary text extraction/decoding in Text submode does not return any result or if the previewed file is a picture file, and if Tesseract is available and active, OCR will be applied. This allows you to better understand how well OCR will work in searches for the kind of files that you are dealing with. You can also experiment with different languages selected and compare the quality of the results. The submode button is named "Text" by default, but will change its label to "OCR" to make you aware that OCR is or was employed to retrieve the text. OCR can be time-consuming for multi-page TIFF and PDF files, but can be interrupted by the user if necessary. If a logical search or indexing has applied OCR to a file before and the result was stored in the volume snapshot, then the OCR-based preview will be available instantly and OCR will not be re-applied from scratch.
* Both submodes Raw and Text in Preview mode remain active until you leave Preview mode or select a file of a different type. If you prefer to make either of these submodes more persistent, so that it remains active even when previewing files of different types, you can hold the Shift key while clicking the respective submode button.
* The Tesseract package that is downloadable from our web server already has support for the following languages integrated, in alphabetic order:
chi_sim: simplified Chinese (horizontal writing only)
chi_tra: traditional Chinese (horizontal writing only)
jpn: Japanese (horizontal writing only)
kor: Korean (horizontal writing only)
Other languages can be added if you can find .traineddata files for them at https://github.com/tesseract-ocr/tessdata_fast. Such files simply need to be put into the \tessdata subdirectory of Tesseract. Alternatively, you can download higher-quality .traineddata files for any of the supported languages from https://github.com/tesseract-ocr/tessdata_best. (Please note that OCR takes considerably more time with them.)
* Supported file types are generally the following: PDF, PostScript (PS), TIFF, JPEG, HEIC, PNG, GIF, BMP, WEBP, AutoCAD DXF, Photoshop PSP, and maybe more.
* Ability to use the Descr. filter to focus on search hits in misaligned UTF-16 text.
* Ability to highlight search hits in alternative e-mail previews.
* Cyclic tab key order in the main window is now also defined in search hit list mode.
* Some minor improvements.
* Same fix level as v20.2 SR-1.
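To make the OCR selection rules above easier to follow, here is a minimal Python sketch. It is not how X-Ways Forensics implements them: the mask lists and all function names are hypothetical, and only the rules themselves come from the notes (OCR only with Tesseract active and a matching OCR mask, no additional OCR when ordinary decoding of a file covered by both masks already yields text, Exif-based rotation of photos, at most two languages). The language helper relies on the fact that Tesseract's command line accepts several languages joined with "+", e.g. `-l chi_sim+eng`.

```python
from fnmatch import fnmatchcase

# Hypothetical mask lists. The notes only state that *.jpg is in the default
# OCR mask and that *.pdf is covered by both masks.
DECODE_MASKS = ["*.pdf", "*.docx", "*.txt"]
OCR_MASKS = ["*.pdf", "*.tif", "*.tiff", "*.jpg", "*.png"]

# Exif Orientation values that need a pure rotation (degrees clockwise).
EXIF_ROTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def matches(name: str, masks: list[str]) -> bool:
    """Case-insensitive wildcard match of a file name against a mask list."""
    lowered = name.lower()
    return any(fnmatchcase(lowered, mask.lower()) for mask in masks)


def should_ocr(name: str, decoded_text: str, tesseract_active: bool) -> bool:
    """Decide whether OCR would be applied to one file in this sketch."""
    if not tesseract_active or not matches(name, OCR_MASKS):
        return False
    # A file covered by both masks (e.g. a PDF with a real text layer) is
    # not OCR'd if ordinary decoding already produced text.
    if matches(name, DECODE_MASKS) and decoded_text.strip():
        return False
    return True


def exif_rotation(orientation: int) -> int:
    """Clockwise rotation needed to show a photo upright before OCR.
    Mirrored orientations (2, 4, 5, 7) are not handled and return 0."""
    return EXIF_ROTATION.get(orientation, 0)


def tesseract_lang_arg(languages: list[str]) -> str:
    """Join one or two traineddata codes into a Tesseract -l value."""
    if not 1 <= len(languages) <= 2:
        raise ValueError("select one or two languages")
    return "+".join(languages)


print(should_ocr("fax.tif", "", True))                     # True
print(should_ocr("report.pdf", "Quarterly figures", True))  # False
print(tesseract_lang_arg(["chi_sim", "eng"]))             # chi_sim+eng
```

A PDF that only contains scanned pages decodes to no text, so in this sketch it falls through to OCR, while a PDF with a genuine text layer does not.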
Posted on Friday, Apr 9, 2021 - 20:15
Thank you... this is a great update.
Posted on Tuesday, Apr 13, 2021 - 20:14
* Slightly more complete OCR output for certain PDF documents.
* Some minor improvements.
Posted on Thursday, Apr 15, 2021 - 11:30
Very nice Stefan. I will try and take a look as soon as I can and explore this great addition to XWF. I know OCR has been hoped for by many of us for some time.
Posted on Friday, Apr 16, 2021 - 11:51
* New X-Tension API functions XWF_PrepareTextAccess() and XWF_GetText().
* Some fixes and minor improvements.
Posted on Monday, Apr 19, 2021 - 7:19
* One fix and one minor improvement.
Posted on Tuesday, May 4, 2021 - 19:43
* Compressed data chunks in NTFS-deduplicated files are now decompressed, i.e. such files can now be opened. Requires access to Windows 8 or later.
* New option to name carved files after the number of their respective first sector, either with or without leading zeroes.
* Ability to apply the Flex Filters to the additional columns of event lists, such as event timestamp and event description.
* In Ext file systems, a new volume snapshot option allows running a more in-depth parsing of deleted directory entries during the initial creation of the volume snapshot, even if they are misaligned in relation to the current directory entries. This might find additional previously existing files in Ext, at a likely manageable risk of finding some garbage entries as well. The checkbox for this is labeled "Ext: Try misaligned deleted dir entries".
* The file "GREP Expressions.txt", in which X-Ways Forensics recalls friendly names of your favorite regular expressions, is now named "Regular Expressions.txt". Please rename your existing file, if you have one.
* Several of the changes and fixes of v20.2 SR-3.
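The new naming option for carved files can be pictured with a short Python sketch. The function name is hypothetical, and so is the padding rule: the notes do not say how many leading zeroes X-Ways Forensics uses, so this sketch assumes the width of the highest sector number on the disk.

```python
from typing import Optional


def carved_name(first_sector: int, extension: str,
                total_sectors: Optional[int] = None) -> str:
    """Name a carved file after the number of its first sector.

    With total_sectors given, the number is padded with leading zeroes to
    the width of the highest sector number on the disk (an assumed rule).
    """
    if first_sector < 0:
        raise ValueError("sector numbers are non-negative")
    number = str(first_sector)
    if total_sectors is not None:
        width = len(str(max(total_sectors - 1, 0)))
        number = number.zfill(width)
    return f"{number}.{extension}"


print(carved_name(2048, "jpg"))             # 2048.jpg
print(carved_name(2048, "jpg", 500118192))  # 000002048.jpg
```

One practical effect of the padded form is that sorting the names alphabetically also sorts them by sector number (009, 010, 100 rather than 10, 100, 9).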
Posted on Friday, May 28, 2021 - 15:37
* Ability to interpret backup bundles created by Apple Time Machine as disks, by opening and interpreting the file "com.apple.TimeMachine.MachineID.plist". Requires WinHex Lab Edition or higher. Once the bundle is interpreted, in X-Ways Forensics you can add the simulated disk to the case as an evidence object if considered relevant, as usual, for example by right-clicking the tab and invoking the corresponding menu command.
* Faster performance when dealing with undefined/sparse areas in backup bundles, VDI, VHD, VHDX, and VMDK disk images, including differencing images, i.e. virtual disk images with "parents".
* There is now an option to omit internal information such as the examiner name, case path, and image paths from the case report, useful if the report is generated for people outside of your organization.
* There is also an option to not show the technical description of evidence objects. That could be useful to avoid unnecessary discussions with computer laypersons in court or elsewhere about what a "sector size" is etc.
* The number of blockwise hashing matches that are output as search hits is now reported in the Messages window, to inform the user of the result and to remind them that the matches can be found as search hits.
* The Recover/Copy command, when applied to the Case Root window and run with the option to recreate partial paths, now gathers child objects of files in subdirectories just as it does when applied to one particular evidence object.
* Several minor improvements.
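For readers unfamiliar with blockwise hashing, the general technique is to hash fixed-size blocks of known files, then hash each block of the evidence and report every match, so that fragments of a known file can be found even when only parts of it survive. The Python sketch below shows that idea and produces the kind of match count that is now reported. The 4096-byte block size, MD5, the skipping of uniform blocks, and all names are assumptions for illustration, not a description of the X-Ways Forensics implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def _is_uniform(block: bytes) -> bool:
    """Blocks made of one repeated byte (e.g. all zeroes) match far too often."""
    return len(set(block)) <= 1


def block_hashes(reference: bytes, block_size: int = BLOCK_SIZE) -> set[bytes]:
    """MD5 of every complete, aligned, non-uniform block of a known file."""
    hashes = set()
    for off in range(0, len(reference) - block_size + 1, block_size):
        block = reference[off:off + block_size]
        if not _is_uniform(block):
            hashes.add(hashlib.md5(block).digest())
    return hashes


def find_block_matches(volume: bytes, known: set[bytes],
                       block_size: int = BLOCK_SIZE) -> list[int]:
    """Offsets of volume blocks whose hash is in the known set."""
    return [
        off
        for off in range(0, len(volume) - block_size + 1, block_size)
        if hashlib.md5(volume[off:off + block_size]).digest() in known
    ]


# 8 KiB of non-repeating test data standing in for a known file.
reference = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(256))
# A "volume" holding the file's two blocks out of order, separated by zeroes.
volume = bytes(4096) + reference[4096:] + bytes(4096) + reference[:4096]
hits = find_block_matches(volume, block_hashes(reference))
print(f"{len(hits)} blockwise hash matches at offsets {hits}")
# 2 blockwise hash matches at offsets [4096, 12288]
```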
Posted on Thursday, Jun 10, 2021 - 8:53
* Improved detection of generating device class and processing state of pictures based on dimensions.
* Various improvements.
* Fix level above v20.2 SR-4.
Posted on Monday, Jun 14, 2021 - 20:30
* The new X-Tension API functions XWF_GetColumnTitle and XWF_GetCellText allow you to retrieve the contents of all directory browser cells as text.
* The command "Import evidence objects" now imports all evidence objects of a case by default, and only the evidence objects marked as important if you hold the Shift key at the moment the import starts.
* If a parameter on the command line is the path or name of an .xfc file, and a case is already open when that parameter is processed, then the evidence objects of that .xfc file will be imported into the already active case. (In previous versions this would have closed the active case and opened the other case.)
* Some minor improvements.
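The two behavior changes in this update (what "Import evidence objects" imports, and what happens to an .xfc parameter on the command line while a case is open) can be summarized in a small Python sketch. All class and function names are hypothetical, and the case model is reduced to a list of evidence objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class EvidenceObject:
    name: str
    important: bool = False


@dataclass
class Case:
    path: str
    evidence: list[EvidenceObject] = field(default_factory=list)


def import_evidence(target: Case, source: Case, shift_held: bool = False) -> int:
    """Import all evidence objects by default, only 'important' ones with Shift."""
    chosen = [e for e in source.evidence if e.important or not shift_held]
    target.evidence.extend(chosen)
    return len(chosen)


def handle_param(active: Optional[Case], param: str,
                 load_case: Callable[[str], Case]) -> Optional[Case]:
    """Process one command-line parameter and return the case active afterwards."""
    if not param.lower().endswith(".xfc"):
        return active  # other parameter types are out of scope for this sketch
    other = load_case(param)
    if active is None:
        return other  # no case open yet: simply open the named case
    import_evidence(active, other)  # new behavior: import into the open case
    return active


cases = {"b.xfc": Case("b.xfc", [EvidenceObject("laptop"),
                                 EvidenceObject("phone", important=True)])}
current = handle_param(Case("a.xfc"), "b.xfc", cases.__getitem__)
print(current.path, [e.name for e in current.evidence])  # a.xfc ['laptop', 'phone']
```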