
X-Ways Support Forum » Advanced Features » Extracting jpg files with certain text string


Duncan Clarke
Username: dac_retrodata

Registered: N/A
Posted on Saturday, Jul 18, 2009 - 21:02:   

I have two CCTV disks, each containing about 12 million jpg files.
I am attempting to extract only those files with the text string CASH.
Is there a simple way to do this?
Thanks

Craig Ball
Username: craigball

Registered: 4-2006
Posted on Saturday, Jul 18, 2009 - 22:12:   

Where in the file would the text string occur? If it's sufficiently early in the file, you could perhaps create a custom header to carve JPGs that also includes the string.

Stefan Fleischmann
Username: admin

Registered: 1-2001
Posted on Sunday, Jul 19, 2009 - 0:45:   

I would simply run a logical search for that search term in all these JPEG files, excluding slack, and in the search hit list you can do whatever you like with those that contain the search term, e.g. select, tag, copy, ...
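Outside X-Ways, the same idea (a case-sensitive search of file contents only, with no slack) can be sketched in a few lines of Python once the JPEGs are available as ordinary files. This is only an illustration and not how X-Ways implements its logical search; the function names `file_contains` and `find_matching_jpegs` are made up for this example.

```python
import os

def file_contains(path, needle=b"CASH", chunk_size=1 << 20):
    """Return True if the raw bytes of the file contain needle (case-sensitive)."""
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            buf = tail + chunk
            if needle in buf:
                return True
            # Keep the last few bytes so a match spanning two chunks is not missed.
            tail = buf[-overlap:] if overlap else b""

def find_matching_jpegs(root, needle=b"CASH"):
    """Yield paths of .jpg/.jpeg files under root whose content contains needle."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(dirpath, name)
                if file_contains(path, needle):
                    yield path
```

Reading in chunks keeps memory use flat regardless of file size; with 12 million files the run time will be dominated by disk I/O rather than by the search itself.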

Craig Ball
Username: craigball

Registered: 4-2006
Posted on Sunday, Jul 19, 2009 - 1:29:   

I was (perhaps mistakenly) assuming the JPGs are in free space and being identified by carving. If they are active JPG files tracked by the file system, my way is stupid.

Stefan: If the JPGs were in free space and the string CASH were just a few bytes beyond the JPG header, would doing it the way I propose not be faster, or would you still approach the effort by first carving JPGs and then searching the carved files for the string?

Stefan Fleischmann
Username: admin

Registered: 1-2001
Posted on Sunday, Jul 19, 2009 - 2:08:   

If so, then yes. It would also save memory by not unnecessarily including many millions of files in the volume snapshot in the first place. For 12 million files I would now recommend v15.4 because of its considerably reduced memory requirements.

Duncan Clarke
Username: dac_retrodata

Registered: N/A
Posted on Sunday, Jul 19, 2009 - 9:53:   

I have tried running a logical search for the text string (one hit per file), but after a quick start it has run overnight and is still showing 75 hours to go.

This, with a deadline of Monday afternoon GMT, is not looking particularly healthy - especially since those CASH files need to be examined for evidence.

The installation of Windows XP 64-bit on the Mac Pro (8 cores) is significantly quicker than my top-range "Windows PC" - but still nowhere near quick enough.

Anyone know where I can rent a CRAY from for a couple of days?!

Stefan Fleischmann
Username: admin

Registered: 1-2001
Posted on Sunday, Jul 19, 2009 - 10:13:   

1) If the relative offset of "CASH" in these files is fixed, use a custom signature definition and run the file type verification instead to identify the files. Faster.

2) If you use the logical search and you require "CASH" in upper case characters, then use the [x] "Match case" option. Faster.

3) Don't search for other search terms at the same time. Faster.

4) If you are running the search in the files on a logical drive letter (having added the logical drive letter to the case), stop the search and run it on the partition opened from within the physical disk instead. Faster.

5) If you have a choice of running the search on the physical disk or a raw image of it, search in the image instead. Faster.
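To illustrate why point 1 is so much faster than a full-content search: a signature check only has to read the first few bytes of each file. The Python sketch below shows the idea for files that are already exported. It is not X-Ways' signature definition format, and the offset of 24 is a purely hypothetical value; the real offset of "CASH" would have to be determined from sample files.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker followed by the next marker's 0xFF

def matches_custom_signature(path, marker=b"CASH", marker_offset=24):
    """Return True if the file starts like a JPEG and has marker at marker_offset.

    marker_offset is an assumed, illustrative value, not one taken from the thread.
    """
    with open(path, "rb") as f:
        # Only the header region is read, never the whole file.
        head = f.read(marker_offset + len(marker))
    return (
        head.startswith(JPEG_SOI)
        and head[marker_offset:marker_offset + len(marker)] == marker
    )
```

Because the comparison is on raw bytes, it is case-sensitive by construction, which matches the "Match case" advice in point 2.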

Duncan Clarke
Username: dac_retrodata

Registered: N/A
Posted on Sunday, Jul 19, 2009 - 10:19:   

Stefan,
Thanks for your help.
One last question if I may; once the search has completed, how do I select *only* those files containing the CASH string, and copy them to a separate folder?

Stefan Fleischmann
Username: admin

Registered: 1-2001
Posted on Sunday, Jul 19, 2009 - 10:30:   

Ctrl+A in the search hit list for the search term "CASH", right click, Recover/Copy

Duncan Clarke
Username: dac_retrodata

Registered: N/A
Posted on Sunday, Jul 19, 2009 - 21:13:   

Thanks - that worked. However, despite having worked on this for ten days, the client has now decided we were looking for the incorrect data....
Now that we have the correct search string, is there any way to copy the files into separate folders of, say, 50,000 files per folder? Windows (and Mac OS) have problems dealing with a folder containing over a million files.
Thanks for all help.

Stefan Fleischmann
Username: admin

Registered: 1-2001
Posted on Sunday, Jul 19, 2009 - 21:37:   

Only if you select and copy 50,000 files at a time.
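If the files have already been copied out into one large folder, the split can also be done afterwards with a small script instead of by hand. The following Python sketch is one possible approach under that assumption; the function name `split_into_folders` and the five-digit folder naming are illustrative choices, not anything provided by X-Ways.

```python
import os
import shutil

def split_into_folders(src_dir, dest_root, per_folder=50_000):
    """Move the files in src_dir into numbered subfolders of dest_root.

    Folders are named 00001, 00002, ... and each receives at most
    per_folder files. Returns the number of folders used.
    """
    # os.scandir avoids a separate stat call per entry on most platforms.
    names = sorted(e.name for e in os.scandir(src_dir) if e.is_file())
    for index, name in enumerate(names):
        folder = os.path.join(dest_root, f"{index // per_folder + 1:05d}")
        if index % per_folder == 0:
            os.makedirs(folder, exist_ok=True)
        shutil.move(os.path.join(src_dir, name), os.path.join(folder, name))
    return (len(names) + per_folder - 1) // per_folder
```

The script never needs the folder to be opened in a file manager, which is usually where very large folders cause trouble.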

Pánczél, Levente
Username: panczel_levente

Registered: N/A
Posted on Monday, Jul 20, 2009 - 11:26:   

If I have to work with that many exported files, I do the following:
I export all files to one folder with Bates numbering, and use Excel to create .bat files that move them to separate folders, like this (sorry if I don't remember precisely):
md 00001
md 00002
...
and
move F00001???.* 00001\
move F00002???.* 00002\
...
For me this was not much slower than any common file operation, if set up correctly:
Some NTFS parameters have to be adjusted for this to go fast.
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_behavior.mspx?mfr=true documents these. I suggest these settings in general, but in this case there is a notable benefit in setting disable8dot3 to 1, disablelastaccess to 1 and mftzone to 4. Set up this way, NTFS had no problems with folders holding 30M+ files (until I tried to open one in Explorer, but that is unnecessary for the above process).
Hope this gives some ideas!
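The batch lines described above do not have to come from a spreadsheet; any small script can generate them. The Python sketch below assumes Bates-style names made of a prefix, a five-digit group number and three further digits (as in the F00001??? pattern), so each folder receives up to 1,000 files. It emits `move` rather than `ren`, because `ren` cannot place a file in a different folder. The function name `build_batch_lines` is made up for this example.

```python
def build_batch_lines(first_group, last_group, prefix="F"):
    """Return the lines of a .bat file that sorts Bates-numbered files into folders.

    For each group number N (zero-padded to five digits) the script creates
    folder N and moves every file named <prefix>N???.* into it.
    """
    groups = range(first_group, last_group + 1)
    lines = [f"md {g:05d}" for g in groups]
    lines += [f"move {prefix}{g:05d}???.* {g:05d}\\" for g in groups]
    return lines

def write_batch_file(path, first_group, last_group, prefix="F"):
    """Write the generated lines to path using Windows line endings."""
    with open(path, "w", newline="\r\n") as f:
        f.write("\n".join(build_batch_lines(first_group, last_group, prefix)) + "\n")
```

For example, `write_batch_file("sort.bat", 1, 50)` would produce a script covering groups 00001 through 00050, to be run from inside the export folder.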

Forum operated by X-Ways Software Technology AG.