| Author |
Message |
   
Duncan Clarke
Username: dac_retrodata
Registered: N/A
| | Posted on Saturday, Jul 18, 2009 - 21:02: | |
I have two CCTV disks, each containing about 12 million jpg files. I am attempting to extract only those files with the text string CASH. Is there a simple way to do this? Thanks |
   
Craig Ball
Username: craigball
Registered: 4-2006
| | Posted on Saturday, Jul 18, 2009 - 22:12: | |
Where in the file would the text string occur? If it's sufficiently early in the file, you could perhaps create a custom header to carve JPGs that also includes the string. |
   
Stefan Fleischmann
Username: admin
Registered: 1-2001
| | Posted on Sunday, Jul 19, 2009 - 0:45: | |
I would simply run a logical search for that search term in all these JPEG files, excluding slack, and in the search hit list you can do whatever you like with those that contain the search term, e.g. select, tag, copy, ... |
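For readers who want to see the idea outside the tool, here is a minimal Python sketch of a case-sensitive search for a byte string in every JPEG under a folder. It is not how the forensic software implements its logical search. The function names `contains_bytes` and `files_containing`, the `*.jpg` pattern and the chunk size are assumptions chosen for illustration.

```python
from pathlib import Path


def contains_bytes(path, needle, chunk_size=1 << 20):
    """Return True if the file at `path` contains `needle` (case-sensitive)."""
    overlap = len(needle) - 1
    tail = b""  # last bytes of the data seen so far, to catch hits across chunk borders
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            tail = window[-overlap:] if overlap else b""


def files_containing(root, needle=b"CASH", pattern="*.jpg"):
    """Yield paths under `root` matching `pattern` whose content contains `needle`."""
    for path in Path(root).rglob(pattern):
        if path.is_file() and contains_bytes(path, needle):
            yield path
```

The generator yields one path per matching file, so the caller can select, tag or copy the hits as needed. Reading in chunks keeps memory use flat regardless of file size.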
   
Craig Ball
Username: craigball
Registered: 4-2006
| | Posted on Sunday, Jul 19, 2009 - 1:29: | |
I was (perhaps mistakenly) assuming the JPGs are in free space and being identified by carving. If they are active JPG files tracked by the file system, my way is stupid. Stefan: If the JPGs were in free space and the string CASH were just a few bytes beyond the JPG header, would doing it the way I propose not be faster, or would you still approach the effort by first carving JPGs and then searching the carved files for the string? |
   
Stefan Fleischmann
Username: admin
Registered: 1-2001
| | Posted on Sunday, Jul 19, 2009 - 2:08: | |
If so, then yes. It would also save memory, because many millions of files would not be included unnecessarily in the volume snapshot in the first place. For 12 million files I would now recommend v15.4 because of its considerably reduced memory requirements. |
   
Duncan Clarke
Username: dac_retrodata
Registered: N/A
| | Posted on Sunday, Jul 19, 2009 - 9:53: | |
I have tried running a logical search for the text string (one hit per file) but after a quick start, overnight it is still showing 75 hours to run. This, with a deadline of Monday afternoon GMT, is not looking particularly healthy - especially since those CASH files need to be examined for evidence. The installation of Windows XP 64-bit on the Mac Pro (8 cores) is significantly quicker than my top-range "Windows PC" - but still nowhere near quick enough. Anyone know where I can rent a CRAY from for a couple of days?! |
   
Stefan Fleischmann
Username: admin
Registered: 1-2001
| | Posted on Sunday, Jul 19, 2009 - 10:13: | |
1) If the relative offset of "CASH" in these files is fixed, use a custom signature definition and run the file type verification instead to identify the files. Faster.
2) If you use the logical search and you require "CASH" in upper case characters, then use the [x] "Match case" option. Faster.
3) Don't search for other search terms at the same time. Faster.
4) If you are running the search in the files on a logical drive letter (having added the logical drive letter to the case), stop the search and run it on the partition opened from within the physical disk instead. Faster.
5) If you have a choice of running the search on the physical disk or a raw image of it, search in the image instead. Faster. |
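Tip 1 can be illustrated with a short Python sketch: if the string always sits at the same offset, only the first few bytes of each file need to be read. This is a sketch under stated assumptions, not the tool's signature mechanism. The thread never states where the string sits, so the offset of 24 and the function name `matches_signature` are purely hypothetical.

```python
JPEG_SIGNATURE = b"\xff\xd8\xff"  # bytes that every JPEG file starts with


def matches_signature(path, marker=b"CASH", offset=24):
    """True if the file starts like a JPEG and has `marker` at byte `offset`.

    `offset` is a hypothetical value; measure the real one in a hex editor.
    """
    with open(path, "rb") as f:
        head = f.read(offset + len(marker))  # read only the start of the file
    return (
        head.startswith(JPEG_SIGNATURE)
        and head[offset:offset + len(marker)] == marker
    )
```

Because each check touches only `offset + len(marker)` bytes, the cost per file no longer depends on the file's size.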
   
Duncan Clarke
Username: dac_retrodata
Registered: N/A
| | Posted on Sunday, Jul 19, 2009 - 10:19: | |
Stefan, Thanks for your help. One last question if I may; once the search has completed, how do I select *only* those files containing the CASH string, and copy them to a separate folder? |
   
Stefan Fleischmann
Username: admin
Registered: 1-2001
| | Posted on Sunday, Jul 19, 2009 - 10:30: | |
Ctrl+A in the search hit list for the search term "CASH", right click, Recover/Copy |
   
Duncan Clarke
Username: dac_retrodata
Registered: N/A
| | Posted on Sunday, Jul 19, 2009 - 21:13: | |
Thanks - that worked. However, despite our having worked on this for ten days, the client has now decided we were looking for the incorrect data... Now that we have the correct search string, is there any way to copy the files into separate folders of, say, 50,000 files per folder? Windows (and Mac OS) have problems dealing with a folder containing over a million files. Thanks for all help. |
   
Stefan Fleischmann
Username: admin
Registered: 1-2001
| | Posted on Sunday, Jul 19, 2009 - 21:37: | |
Only if you select and copy 50,000 files at a time. |
   
Pánczél, Levente
Username: panczel_levente
Registered: N/A
| | Posted on Monday, Jul 20, 2009 - 11:26: | |
If I have to work with that many exported files, I do the following: I export all files to one folder with Bates numbering, and use Excel to create .bat files that move them to separate folders, like this (sorry if I don't remember precisely):
md 00001
md 00002
...
and
move F00001???.* 00001\
move F00002???.* 00002\
...
This was not much slower for me than any common file operation, if set up correctly: some NTFS parameters have to be adjusted for this to go fast. http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_behavior.mspx?mfr=true lists these; I suggest them in general, but in this case there is a notable benefit to setting disable8dot3=1, disablelastaccess=1 and mftzone=4. Set up this way, NTFS had no problems with folders holding 30M+ files (until I tried to open one in Explorer, but that is unnecessary for the above process). Hope this gives some ideas! |
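The same batching can be sketched in Python instead of generated .bat files. This is only an illustration under assumptions: the function name `split_into_folders`, the five-digit folder names and the sort-by-name ordering are choices made here, not something from the posts above.

```python
import shutil
from pathlib import Path


def split_into_folders(source, dest, per_folder=50_000):
    """Move the files in `source` into numbered subfolders of `dest`.

    Each subfolder (00001, 00002, ...) receives at most `per_folder` files.
    Returns the number of files moved.
    """
    source, dest = Path(source), Path(dest)
    # Sort by name so the batches are reproducible between runs.
    files = sorted(p for p in source.iterdir() if p.is_file())
    folder = None
    for index, path in enumerate(files):
        if index % per_folder == 0:
            # Start a new numbered folder for every batch.
            folder = dest / f"{index // per_folder + 1:05d}"
            folder.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(folder / path.name))
    return len(files)
```

A call such as `split_into_folders("export", "export_split")` would leave the folder names as the only trace of the batching; the file names themselves stay unchanged.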