Live Business Chat
Author Topic: my integrity script for large (8GB) files needs testing / proposal for tool  (Read 265 times)
patrick
Dreamer
*

Karma: +0/-0
Offline Offline

Posts: 9


« on: 2012 Feb 14, 04:31:49 am »

I am currently testing my script for creating redundancy files for my movie database.
It recurses through all dirs and puts separate recovery files for each movie file in a destination dir.
These files are so big that you don't want to recreate the recovery files over and over again.
So when you decide to remove a movie from the database, you just delete the movie and its corresponding recovery files while
keeping the other recovery files. Saves a lot of time (my DB is about 1.5 TB ...)
Although it is finished, I would like to do some more testing with it and could really use a tool to simulate data corruption.
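
For readers who want to picture the per-file layout described above, here is a minimal sketch of the directory walk. It only plans where each movie's recovery set would go and does not call any recovery tool. The extension list, the function name plan_recovery_dirs and the ".recovery" suffix are assumptions for illustration, not details of the actual script.

```python
import os

# Assumption: which extensions count as movie files.
MOVIE_EXTENSIONS = {".mkv", ".avi", ".mp4"}


def plan_recovery_dirs(source_root, dest_root):
    """Return (movie_path, recovery_dir) pairs, one recovery dir per movie.

    The destination mirrors the source tree, so deleting a movie and its
    own recovery dir leaves every other recovery set untouched.
    """
    plan = []
    for dirpath, dirnames, filenames in os.walk(source_root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in MOVIE_EXTENSIONS:
                continue
            movie = os.path.join(dirpath, name)
            relative = os.path.relpath(movie, source_root)
            # Hypothetical naming scheme: <dest>/<relative path>.recovery
            plan.append((movie, os.path.join(dest_root, relative + ".recovery")))
    return plan
```

Because each movie maps to its own directory, removing a movie only requires deleting that one pair.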

let's call it thrashit:

thrashit /error [infile] [outfile]
copies a file and fills it with random zeroes/errors while writing it out, where /error = 0-100%
/error=100 would yield a copy completely filled with errors/zeroes

since we are dealing with large files, speed would be important ....
is there already a tool which could do this ? if not, it shouldn't be too hard to do. my C knowledge is kind of rusty and I don't think it would
be a good idea if I made it in Python ... a small standalone tool would be nicer.
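
A rough sketch of the proposed behaviour, written in Python only because that is what the thread ends up using. It streams the input in blocks and replaces roughly the requested percentage of blocks with zeroes. The function name corrupt_copy, the block size and the per-block (rather than per-byte) granularity are assumptions; the proposal above does not specify them.

```python
import os
import random


def corrupt_copy(infile, outfile, error_percent, block_size=64 * 1024, seed=None):
    """Copy infile to outfile, zeroing about error_percent % of the blocks.

    Returns the number of blocks that were zeroed. The input file is only
    ever opened for reading.
    """
    if not 0 <= error_percent <= 100:
        raise ValueError("error_percent must be between 0 and 100")
    if os.path.exists(outfile) and os.path.samefile(infile, outfile):
        raise ValueError("infile and outfile must be different files")

    rng = random.Random(seed)  # pass a seed to make a test run repeatable
    damaged = 0
    with open(infile, "rb") as src, open(outfile, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            # rng.random() is in [0, 1), so 100 always hits and 0 never does.
            if rng.random() * 100 < error_percent:
                block = bytes(len(block))  # same length, all zero bytes
                damaged += 1
            dst.write(block)
    return damaged
```

Reading and writing in fixed-size blocks keeps memory use flat, which matters more for 8GB files than raw interpreter speed.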

so ... ehm .... persicum ? I bet you could do it in 5 min.... :) .... or are there any options in your rsc32 which could do this ? .... I would not be surprised :)

Logged
persicum
Moderator
Serious Business
*****

Karma: +2/-1
Offline Offline

Posts: 122


« Reply #1 on: 2012 Feb 14, 05:24:51 am »

Sorry, I cannot write a recursive or wildcard patcher, because it would be harmful software. One could accidentally destroy or modify many good files on one's HDD.
Logged
patrick
Dreamer
*

Karma: +0/-0
Offline Offline

Posts: 9


« Reply #2 on: 2012 Feb 14, 05:38:46 am »

I know ... I totally agree with you. that's why it would have an [infile] and an [outfile] so that nothing can be overwritten (if infile == outfile then error and exit)... and it does not need to do recursion, just one file at a time. it just creates a corrupt copy of a given file and does not overwrite anything.
just one neat, handy little tool so we could check our assumptions.
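
The "if infile == outfile then error and exit" guard could look roughly like this. Comparing the path strings is not enough, because two different spellings can point at the same file, so this sketch uses os.path.samefile. The function name check_distinct and the extra refusal to overwrite any existing outfile are assumptions that go slightly beyond what is stated above.

```python
import os


def check_distinct(infile, outfile):
    """Exit with an error unless outfile is a new file different from infile."""
    if not os.path.isfile(infile):
        raise SystemExit("error: infile does not exist")
    if os.path.exists(outfile):
        # samefile compares the underlying files, not the path strings.
        if os.path.samefile(infile, outfile):
            raise SystemExit("error: infile and outfile are the same file")
        # Assumption: "nothing can be overwritten" also covers other files.
        raise SystemExit("error: outfile already exists, refusing to overwrite")
```

Calling this before opening outfile for writing means the check runs before anything is truncated.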

maybe the name thrashit would be a little offensive, sounds kind of dangerous ... "createtestfile" sounds less harmful, don't you think ? :)
Logged
persicum
Moderator
Serious Business
*****

Karma: +2/-1
Offline Offline

Posts: 122


« Reply #3 on: 2012 Feb 14, 09:35:15 pm »

Sometimes I simulate a broken file as follows:
1) Split a file into several volumes with the RAR archiver WITHOUT compression (switch -m0)
2) Join these volumes into one big file (copy a+b+c...)
Logged
patrick
Dreamer
*

Karma: +0/-0
Offline Offline

Posts: 9


« Reply #4 on: 2012 Feb 15, 03:13:08 am »

I wrote a little Python script yesterday which can now create broken files. it works like this:

ctest2.py 200 512 test.mkv brokencopy.mkv

it will create a broken/corrupt copy of test.mkv with 200 corrupt blocks of 512 bytes each, spread evenly over the whole file.
Python is not as slow as I thought (and is so much nicer to use ... problem is you don't want to use anything else after using it...)
It only writes "#" as error bytes, no XORing, but I think it will do the job.
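
A sketch of the behaviour just described, not the attached script itself: copy the file, then overwrite a given number of fixed-length blocks at evenly spaced offsets with "#". The function name copy_with_even_bursts and the exact spacing rule (block i starts at i * size // count) are assumptions.

```python
import os
import shutil


def copy_with_even_bursts(infile, outfile, count, length, fill=b"#"):
    """Copy infile to outfile, then stamp `count` bursts of `length` fill bytes.

    Returns the list of offsets that were overwritten in the copy.
    shutil.copyfile raises SameFileError if both names are the same file,
    so the original cannot be damaged by accident.
    """
    shutil.copyfile(infile, outfile)
    size = os.path.getsize(outfile)
    if count <= 0 or length <= 0 or size == 0:
        return []

    offsets = []
    with open(outfile, "r+b") as f:  # r+b: modify in place, no truncation
        for i in range(count):
            offset = (i * size) // count        # evenly spaced start points
            n = min(length, size - offset)      # never grow the file
            f.seek(offset)
            f.write(fill * n)
            offsets.append(offset)
    return offsets
```

Under these assumptions, `copy_with_even_bursts("test.mkv", "brokencopy.mkv", 200, 512)` would correspond to the command line shown above.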

I will post it after I have put in a better copy routine for huge files (still needs progress indication).

While testing with it I started to wonder whether using single recovery files for single big files is a good idea after all. I am not sure if it will work at all.
The idea was to protect movie files (about 8GB) from HD sector faults with single recovery sets. One set for each file, so that I don't have to deal with the whole
collection, which is about 1.5 TB. Now, will rsc32 be able to read/access the source file when it is corrupted by an HD sector error ?
Is redundancy on a per-file basis a good idea after all ? Or do I have to switch to a sector-based RAID instead (which I do want to avoid because I have had more trouble with
corrupt RAID configs than with corrupt files ...)
Logged
persicum
Moderator
Serious Business
*****

Karma: +2/-1
Offline Offline

Posts: 122


« Reply #5 on: 2012 Feb 15, 11:58:55 pm »

RSC32 works with files regardless of whether they are readable or unreadable. -rt and -rrr would work with unreadable files without additional issues.
Besides, you may copy an unreadable file to a broken-but-readable file with the -ac switch.
The only problem is UNICODE names =))) which I never use.
Check the error level in your script after the -wt switch, in case the prog hit a UNICODE name and could not handle a file??? o:
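
Checking the error level from a Python script amounts to reading the child process's exit code. The sketch below is generic: the full RSC32 command line is not shown in this thread, so the command is passed in by the caller, and treating 0 as success is the usual convention rather than something confirmed for RSC32. The function name run_and_check is made up for illustration.

```python
import subprocess


def run_and_check(command):
    """Run a command given as a list of arguments and return its exit code.

    By convention 0 means success; what other values mean depends on the
    program being run.
    """
    result = subprocess.run(command)
    if result.returncode != 0:
        print("command failed with error level", result.returncode)
    return result.returncode
```

A wrapper script could call this once per movie file and collect the names of the files whose exit code was non-zero for a later manual look.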


When people ask which is better: backup, sector RAID or file RAID, the answer should be: they cannot substitute for each other, one needs them all!

Say you already have a RAID array of several HDDs. Why would you nevertheless need a file RAID?

1) Hardware RAID can be raid_1, raid_2 and so on, but you cannot have 5000 or 500000 independent disks. In the case of RSC32 you have 500000 independent virtual devices which form an incredible RAID array.
2) After downloading you should immunize your content immediately to obtain its digital certificate. When you catch glitches, squares or fallings you can check whether the file is still the original. Without MD5 or similar, your broken file can easily propagate from one storage medium to another. If your backup copy was made from a file which was already broken, you can treat it with file RAID.
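
The MD5 check mentioned in point 2 can be done from Python with the standard hashlib module. This is a minimal sketch that reads the file in chunks so that an 8GB movie never has to fit in memory; the function name file_md5 and the 1 MiB chunk size are arbitrary choices.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the digest right after download gives a reference value; if a later digest differs, the file has changed since then and should not be used as the source for a new backup.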


Well, I have written you a proggie which can modify files by writing random bursts into them. But this is NOT a copier, it is really a patcher!!! =)) Keep your originals in a safe place =))
Usage: FileName BurstNumber BurstLength
Say, 1000 512 will write 1000 packets of trash, 512 bytes each, at random positions

Of course, RSC32 cannot repair anything if the number of bursts is greater than the number of recovery blocks.=))
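
For illustration only, here is a Python sketch of a patcher with the same usage shape (file name, burst number, burst length). It is not the attached program, whose internals are not shown here. Like the tool described above, it modifies the file IN PLACE, so it should only ever be pointed at a copy. The function name write_random_bursts and the optional seed are assumptions.

```python
import os
import random


def write_random_bursts(path, burst_number, burst_length, seed=None):
    """Overwrite random positions of `path` IN PLACE with random bytes.

    Offsets are drawn uniformly over the file, so bursts can overlap.
    Returns the list of offsets used. Run this on a copy only.
    """
    rng = random.Random(seed)  # a fixed seed makes a test run repeatable
    size = os.path.getsize(path)
    if size == 0 or burst_number <= 0 or burst_length <= 0:
        return []

    offsets = []
    with open(path, "r+b") as f:  # r+b: patch in place, file size unchanged
        for _ in range(burst_number):
            offset = rng.randrange(size)
            n = min(burst_length, size - offset)  # do not write past the end
            f.seek(offset)
            f.write(bytes(rng.getrandbits(8) for _ in range(n)))
            offsets.append(offset)
    return offsets
```

With these assumptions, `write_random_bursts("copy_of_movie.mkv", 1000, 512)` would write 1000 bursts of up to 512 random bytes each.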

* KILLFILE.rar (22.81 KB - downloaded 26 times.)
Logged
patrick
Dreamer
*

Karma: +0/-0
Offline Offline

Posts: 9


« Reply #6 on: 2012 Feb 16, 02:48:02 am »

Quote from persicum:
RSC32 works with files regardless of whether they are readable or unreadable. -rt and -rrr would work with unreadable files without additional issues.

will it just skip the broken file, or will it try to recover as much of it as possible (in case of an HD sector fault) ?

Quote from persicum:
Say you already have a RAID array of several HDDs. Why would you nevertheless need a file RAID?

I don't have a hardware RAID (yet). I messed around with FreeNAS but found it to be inflexible. Changing the size after setup is a problem, you need many disks, etc. Also, a NAS is slow and not very cost-effective (if it runs 24/7, you could buy at least one 3TB drive every year for the running costs ...), so that's how I came to this forum.

Quote from persicum:
Well, I have written you a proggie which can modify files by writing random bursts into them. But this is NOT a copier, it is really a patcher!!! =)) Keep your originals in a safe place =))
Usage: FileName BurstNumber BurstLength
Say, 1000 512 will write 1000 packets of trash, 512 bytes each, at random positions

Of course, RSC32 cannot repair anything if the number of bursts is greater than the number of recovery blocks.=))

:) so we have 2 proggies now ! I attached mine. it will copy the file and put bursts in the copy.

persicum, I have learned quite a lot about redundancy in the last few days, and the more I dig into it, the more I learn about the powers of rsc32. I still have the feeling that I am missing a lot of information though: I am a little embarrassed, but is there a manual for rsc32 ? I found the options.txt but no explanation for the options ... I read through the forum (not all of it though) and tried
to read up on Reed-Solomon codes at Wikipedia but don't understand a thing... Any chance that there will be a help.txt for rsc32, or a sticky or FAQ ?

* ctest.py (2.08 KB - downloaded 30 times.)
Logged
persicum
Moderator
Serious Business
*****

Karma: +2/-1
Offline Offline

Posts: 122


« Reply #7 on: 2012 Feb 16, 04:48:42 am »

The Python prog is a good job! You can write your own tool, since you are experienced in file access. BTW, mine does not write strictly evenly but randomly, with a uniform distribution...

Currently I have no hardware RAID either. I perform backups and keep 10% recovery data just in case =]
Logged