Live Business Chat
Topic: PAR3 move/rename brainstorming
eMPee584
« on: 2011 Sep 06, 08:05:29 am »

Hi there,
Supporting file moves/renames together with directory hierarchies is obviously quite an issue. Some thoughts on this:

- moved (renamed) files have same CRC - if intact
- if broken, moved files cannot be found easily, and thus cannot be repaired
- chances are that when moving files with complex filenames (e.g. 'our bands special gig at soandso pub - 2011-09-06 - wicked.flac'), the filename can be trusted
- short and less informative filenames (DSC0031.JPG, thumbs.db.....) will clash more often => trusting filename to identify moved files is a problem
- other attributes like size and date could help with file identification
- identifying non-corrupted files can be easy by trying likely PAR entries
- matching corrupted files is the key challenge!
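
To illustrate the "easy" case from the list above (intact files that were only moved or renamed), here is a minimal sketch in Python. It assumes a hypothetical manifest of `{relative_path: md5_hex}` recorded at protection time; this is not the PAR file format, and the function names are made up for the example.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_moved_files(recorded: dict[str, str], root: Path) -> dict[str, Path]:
    """Map each recorded relative path to where its content lives now.

    `recorded` is a hypothetical manifest {relative_path: md5_hex}.
    Only intact files can be located this way; a corrupted file has a
    different digest and simply stays unmatched.
    """
    # Index every file under root by its content digest.
    by_digest: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest.setdefault(file_md5(path), []).append(path)

    located: dict[str, Path] = {}
    for rel_path, digest in recorded.items():
        candidates = by_digest.get(digest, [])
        original = root / rel_path
        if original in candidates:
            located[rel_path] = original       # still in place
        elif candidates:
            located[rel_path] = candidates[0]  # moved or renamed, but intact
    return located
```

Anything missing from the returned mapping is either deleted, changed or corrupted, which is exactly where the harder matching problem starts.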

Maybe PAR3 can use a heuristic to identify files?
- for the common use case, folders are integral entities (you should protect a readily organized folder hierarchy, not a folder you are still busy restructuring)
- so file system objects have a relation to their parent folder, filename and attributes that can be given a weighted score
- most file corruption is a few bad bytes scattered in chunks somewhere in the file - not multiple kilobytes starting from the beginning of the file (after the file header)
- Each PAR entry should match only one file. If there's a near-perfect match and a low-probability match, try the better one first.
- Corrupted file duplicates could be detected by a high match score against another file (which has already been PAR verified OK).
- Asking the user for more information might help a lot, for example:
  "Has the file foo/foo/unsorted/bar (3.5MiB, last changed 23 July 2011) been moved to foo/foo/sorted/bar and then overwritten with a 5KiB file on 24 July 2011? [Yes/No/Maybe]"

- So a file that is in the original folder, with the original file name and identical size and date, would score high on all criteria - a perfect match.
- A file that is in a different folder with a different name, but still has the same file size and date, might be marked as a 'highly likely' match.
- A file with the same name, but a different folder and file size/date, is probably a 'no match'.
- A file with the same name and folder, but a different file size and date, is most likely a changed file.
- Any file can be tested against the best matching PAR blocks - if the file is completely different, it is more likely to be a changed file than a case of file corruption.
- If within a folder tree all files have been matched, but one file is missing in folder A and a file of the same size has been added to folder B, it might be the file from A.
- A file that has been moved to another point in the dir hierarchy, renamed and then overwritten (changed file size and time stamp) cannot be matched in the new location, but can be restored in the original (= at PAR creation time) location.
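
The weighted score and the "one entry, one file, best match first" rule could be sketched roughly like this in Python. The weights, the threshold, the `FileInfo` record and the label for cases the list does not cover are all assumptions made for illustration; nothing here is taken from PAR2 or MultiPar.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileInfo:
    path: str   # path relative to the protected root
    size: int   # bytes
    mtime: int  # modification time, seconds since the epoch


# Hypothetical integer weights (sum is 10); a real tool would have to tune them.
WEIGHTS = {"folder": 2, "name": 3, "size": 3, "mtime": 2}


def match_score(recorded: FileInfo, candidate: FileInfo) -> int:
    """Return 0..10; 10 means folder, name, size and date all agree."""
    rec, cand = PurePosixPath(recorded.path), PurePosixPath(candidate.path)
    checks = {
        "folder": rec.parent == cand.parent,
        "name": rec.name == cand.name,
        "size": recorded.size == candidate.size,
        "mtime": recorded.mtime == candidate.mtime,
    }
    return sum(WEIGHTS[key] for key, same in checks.items() if same)


def classify(recorded: FileInfo, candidate: FileInfo) -> str:
    """Label a pair using the categories from the list above."""
    rec, cand = PurePosixPath(recorded.path), PurePosixPath(candidate.path)
    same_place = rec.parent == cand.parent and rec.name == cand.name
    same_meta = (recorded.size, recorded.mtime) == (candidate.size, candidate.mtime)
    if same_place and same_meta:
        return "perfect match"
    if same_place:
        return "changed file"
    if same_meta:
        return "highly likely"  # moved and/or renamed (assumed for partial moves too)
    return "no match"


def assign(recorded, candidates, threshold=5):
    """Greedy one-to-one assignment: the best-scoring pairs are taken first."""
    pairs = sorted(
        ((match_score(r, c), r, c) for r in recorded for c in candidates),
        key=lambda item: item[0],
        reverse=True,
    )
    result, used = {}, set()
    for score, rec, cand in pairs:
        if score < threshold:
            break  # everything after this scores even lower
        if rec.path in result or cand.path in used:
            continue  # each entry and each file is matched at most once
        result[rec.path] = (cand.path, score)
        used.add(cand.path)
    return result
```

Greedy assignment is the simplest way to "try the better one first"; it is not guaranteed to find the globally best pairing, and the leftover unmatched pairs would be the ones to ask the user about.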

By the way, MultiPar is listed as open source / GPL on Wikipedia. I just downloaded the latest package and there is no source in there. Where's the SOURCE, dude? ;-)
I am on Linux and a heavy CLI user, willing to help with a Linux CLI version.
badon
« Reply #1 on: 2011 Sep 06, 12:04:03 pm »

I split this EXCELLENT post into its own topic so people can find it more easily.
Logged

Do not PM questions. Answers should be publicly available.
Backup is not enough. Protect your data with MultiPar.
Writer of LBC Chinese coin investment articles.
Founder of the Coin Compendium (forum).
I type faster on a TypeMatrix.
Use my work. Give credit.
china-mint.info forum.
LBC makes you rich.
FreeArc is amazing!
Donate.
badon
« Reply #2 on: 2011 Sep 06, 12:10:18 pm »

I agree about MultiPar being open source. Data integrity software is very important to be open source. Imagine finding a disk 50 years from now and having no idea how to handle the data. If we have the MultiPar source code, we can be confident that our data will always be safe, even if we have to update the software to recover it.

I suggest MIT or BSD licenses, for maximum market usage of MultiPar. The more people using MultiPar, the more likely it is to be accepted as a standard.

Also, I agree about the command line version. That is important too, to ensure that MultiPar technology is available everywhere, on all systems, even as a component of other software systems (using scripting like sh, PHP, etc). I would love to be able to use MultiPar in PHP to generate par data sets for images, backups, etc.
persicum
« Reply #3 on: 2011 Sep 07, 04:54:28 am »

Quote
Where's the SOURCE, dude?
Yutaka Sawada wants to know who is interested in the source code.
Yutaka Sawada
« Reply #4 on: 2011 Sep 12, 06:23:14 pm »

Quote
some thoughts on this.

I am not sure what you want to state. Searching for renamed/moved files is already implemented in MultiPar. With my English skill, it is difficult to understand sentences without fixed grammar. You had better write some example cases as a story, or just write "I want that." or "This is not enough." directly.

Quote
Where's the SOURCE, dude?

About the source code, you should read my documents like Help or ReadMe in the archive. Saying you want to read the source code while you have not read my documents is an odd attitude. If someone is just a user, using the software without reading the manual is ok. My MultiPar has almost the same usage as QuickPar and is easy to use. But for a developer, reading the documents is required to understand the mechanism.

Quote
Data integrity software is very important to be open source.

MultiPar is one of several parchive clients. There are other clients like QuickPar, phpar2, par2-tbb and more for many systems. Anyone can use his favorite parchive client. The important matter is the ability to repair in the future. While WinZip and SecureZip are not open source, many people use ZIP archives, because the file format is open to the public.
badon
« Reply #5 on: 2011 Sep 13, 09:27:36 pm »

Quote
MultiPar is one of several parchive clients. There are other clients like QuickPar, phpar2, par2-tbb and more for many systems. Anyone can use his favorite parchive client. The important matter is the ability to repair in the future. While WinZip and SecureZip are not open source, many people use ZIP archives, because the file format is open to the public.

7zip is better than all of them, and it is open source :)
badon
« Reply #6 on: 2011 Sep 26, 04:22:03 pm »

Probably the simplest solution to all the issues surrounding file moving and renaming is to use a file browser built into MultiPar for that purpose. Then MultiPar could keep track of all changes as you make them, and update the par files in the process.
Yutaka Sawada
« Reply #7 on: 2011 Oct 03, 07:11:46 pm »

from badon
Quote
Probably the simplest solution to all the issues surrounding file moving and renaming is to use a file browser built into MultiPar for that purpose.

The easiest solution to all the issues surrounding file moving and renaming is to use a specialized application for that purpose. There must be many file searching programs in the world. (Windows has a built-in searching feature, too.) There may be an application which saves metadata and checksums of files and checks the structure of the directory tree. It would be possible to add PAR2 file support to such an application. Essentially, a PAR2 file is a mass of checksums (CRC-32, MD5, Reed-Solomon). The only difference is that it contains checksums of many fragments and can recover some of them.
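
As a rough illustration of "checksums of many fragments that can recover some of them", here is a minimal Python sketch. It uses a per-block CRC-32 to find a damaged block and a single XOR parity block to rebuild it. XOR parity is a deliberately simplified stand-in: PAR2 uses Reed-Solomon codes, which can rebuild several blocks, and the function names below are invented for the example. The sketch also assumes the file is non-empty and its length is unchanged.

```python
import zlib


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1].ljust(block_size, b"\0")
    return blocks


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def protect(data: bytes, block_size: int) -> tuple[list[int], bytes]:
    """Return (per-block CRC-32 list, one XOR parity block)."""
    blocks = split_blocks(data, block_size)
    return [zlib.crc32(block) for block in blocks], xor_blocks(blocks)


def repair(data: bytes, block_size: int, crcs: list[int], parity: bytes) -> bytes:
    """Rebuild at most one damaged block from the parity block."""
    blocks = split_blocks(data, block_size)
    bad = [i for i, block in enumerate(blocks) if zlib.crc32(block) != crcs[i]]
    if len(bad) > 1:
        raise ValueError("one parity block can rebuild only one damaged block")
    if bad:
        good = [block for i, block in enumerate(blocks) if i != bad[0]]
        # XOR of all intact blocks and the parity yields the missing block.
        blocks[bad[0]] = xor_blocks(good + [parity])
    return b"".join(blocks)[:len(data)]
```

The checksums only tell you which fragments are bad; the recovery data is what brings them back, and more recovery blocks (with a stronger code than XOR) mean more fragments can be repaired.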

The key is this: while it is difficult to create a whole new thing from nothing, it is easy to improve an existing thing step by step. MultiPar is not new software, but is based on QuickPar's features. Even though I selected the easier way, I have still been fixing bugs for 5 years, hehe.

You had better find a good specialized application for searching first. Then I may help its author adapt the application for PAR2 file support. Because the author must be a specialist in file searching, he would understand your aim/idea/thought/etc., and he will be able to improve his product with parchive. In this way, all of us become happy.