Live Business Chat
2015 Sep 06, 01:57:17 pm *
Welcome, Guest. Please login or register.
Did you miss your activation email?

Login with username, password and session length
News: Several registration problems were fixed on 2015-03-17, including one potential account security issue. If you had difficulty registering before that date, please try again, and contact badon in the business chat if it doesn't work for you. Hotmail is having problems, so try another email service if you never receive any activation or notification emails from the forum. If you succeeded in registering before 2015-03-17, it is recommended that you login and change your password immediately on your profile page using the Account Related Settings link.
 
   Home   Help Search Calendar Login Register  
Pages: 1 2 [3]  All   Go Down
  Print  
Author Topic: Finally writing down Design and Engineering Notes  (Read 321 times)
0 Members and 2 Guests are viewing this topic.
JDługosz
Entrepreneur
**
Offline Offline

Posts: 65


« Reply #20 on: 2015 Sep 02, 03:53:55 pm »

MSDN documentation:  As I recall the VS2010 installer for docs was strange, and there are steps to get it to install locally.  I gave up on their online help and find Google easier now, since the later versions of MSDN help have become harder to use.   (Note that Google gives better results than Bing or the on-page search box!)

Installing Win32 API documentation is your critical need; it does not need to match some version of Visual Studio.

re trying MinGW: I'm excited about the development of "Clang".  I don't know the status of a complete Win32 development system, but I think it has a component to drop-in the compiler into Visual Studio.  It can be used with the GCC toolchain and libraries.  But, the new C++ Library made for it is also especially interesting.

Logged
JDługosz
Entrepreneur
**
Offline Offline

Posts: 65


« Reply #21 on: 2015 Sep 02, 04:32:47 pm »

There are some FFT possible primes for 32-bit prime based Reed-Solomon Codes. (Fermat prime is the best for 16-bit prime based Codes.) Form of "constant * (2 ** n) + 1" seems to be good for FFT. (in this text, ** means power.) For example, Fermat prime 32769 = 32768 + 1 = 1 * (2 ** 16) + 1. I tested a maybe biggest 32-bit prime 4293918721 = 4095 * (2 ** 20) + 1 ago, which support max 1048576 blocks (2 ** 20).

 From reading my old (6 years ago !) note, there are some smaller primes for more blocks. A 32-bit prime 2151677953 = 513 * (2 ** 22) + 1 supports max 4194304 blocks (2 ** 22). A 32-bit prime 3221225473 = 3 * (2 ** 30) + 1 supports max 1073741824 blocks (2 ** 30). I'm not sure there are other possible primes. persicum might select something good prime in his RSC32.

A Galois Field can use a prime or a power of a prime.  2**n is handy (2 is prime!) because then a word of memory is a field element.  If I understand you correctly, using a prime number that fits in 32 bits, such as 4293918721 in the same manner has a problem: A 32-bit word may contain values in the range 4293918722 through 4294967295 are not in the field.
Logged
Yutaka Sawada
Moderator
Capitalist Pig
*****
Offline Offline

Posts: 670


« Reply #22 on: 2015 Sep 03, 10:05:54 pm »

Quote from: JDlugosz
I'll read this thoroughly and document what has been learned.

 When I tested my old samples on my new PC, I found that I had mis-understood algorithm's complexity and I might write wrong information somewhere. I correct my statement and put result for others who think about ECC algorithm.

 When data size is same, Encoding/Decoding time of PAR2 depends on the block size and number of blocks. The complexity is "block size" * "source block count" * "parity (or lost) block count". So, if you set two times larger block size, it requires double more encoding time. If you set half block size, the speed becomes double faster. I mis-understood the calculation cost.

 Now, using many "Cohort" improve speed by the number of cohorts. If there are 100 cohorts, the speed will become 100 times faster. Even though the recovering capability may be less, the calculation speed would be a merit in some cases. So, the "Cohort" idea of JDlugosz is reasonable for PAR2 style algorithm.


 I tested 4 ECC algorithms. James Plank style Reed-Solomon Erasure Codes is the algorithm in PAR2. Because I could not test SSE2 on my old PC, I compiled FWT version on new PC. While I cannot understand the FWT so much, I didn't optimize the code. (It should be double faster with AVX.) FFT and LDPC were compiled on my old PC, and don't use SSE. I didn't test 32-bit RS with FFT, as mine is much slower than the one in RSC32. (the speed of RSC32 would be similar level as 16-bit RS with FFT.)

Code:
Calculation time with single core of Celeron G1620 2.7 GHz

Data size = 468.75 MB
 = 16384 * 30000
 = 49152 * 10000
 = 163840 * 3000
 = 491520 * 1000
 = 1638400 * 300
 = 4915200 * 100

Redundancy = 20%

James Plank style 16-bit Reed-Solomon Erasure Codes (used in PAR2)
Data   |4.69MB * 100|1.56MB * 300|480KB * 1000|160KB * 3000|48KB * 10000|16KB * 30000|
no-SIMD|    4.9 sec |   15.1 sec |   50.5 sec |  152.1 sec |  511.1 sec | 1559.2 sec |
MMX    |    4.5 sec |   13.7 sec |   45.9 sec |  138.2 sec |  464.9 sec | 1424.7 sec |
SSSE3  |    2.7 sec |    8.3 sec |   27.6 sec |   81.4 sec |  265.6 sec |  803.5 sec |

Frederic Didier's 16-bit Reed-Solomon Erasure Codes with FWT
Data   |4.69MB * 100|1.56MB * 300|480KB * 1000|160KB * 3000|48KB * 10000|16KB * 30000|
no-SIMD|  170.4 sec |  242.9 sec |  312.8 sec |  222.5 sec |  283.4 sec |  397.9 sec |
SSE2   |   24.9 sec |   35.7 sec |   47.9 sec |   40.1 sec |   51.4 sec |   70.4 sec |

Michael Niedermayer style 16-bit Reed-Solomon Codes with FFT (Error Correction requires double or more time.)
Data   |4.69MB * 100|1.56MB * 300|480KB * 1000|160KB * 3000|48KB * 10000|16KB * 30000|
no-SIMD|    8.7 sec |   15.0 sec |   21.0 sec |   20.2 sec |   23.2 sec |   36.4 sec |

LDPC-Staircase style LDGM (MMX, Require 24% redundancy to recover 20% lost.)
Data   |4.69MB * 100|1.56MB * 300|480KB * 1000|160KB * 3000|48KB * 10000|16KB * 30000|
Encode |    0.4 sec |    0.4 sec |    0.4 sec |    0.3 sec |    0.4 sec |    0.4 sec |
Decode |    0.4 sec |    0.5 sec |    0.5 sec |    0.7 sec |    1.7 sec |    3.9 sec |

 I confirmed the speed merit of "Cohort" system for James Plank style Reed-Solomon Erasure Codes. Putting 3000 blocks in each cohort (total 10 cohorts) is 10 times faster than putting 30000 blocks in single cohort. Putting 300 blocks in each cohort (total 100 cohorts) is 100 times faster than putting 30000 blocks in single cohort. On the other hand, the minimun recoverying capability is 10 (or 100) times less.

 But, "Cohort" system doesn't work well for other faster algorithms. Though the speed of FFT or FWT depends on the number of blocks, the difference is small. Using 10 or 100 cohorts will improve speed by only 2 times. The effect of "Cohort" becomes bad trade off, when the algorithm has new technology. Then, Mr. persicum and Michael Niedermayer might state that interleaving (Cohort) was worthless.

 By the way, the speed of LDPC on new PC is unbelievable quick (almost instant). It seems that LDPC's speed depends on memory speed largely. Basically the LDPC algorithm is similar to "Cohort" idea. In the sample implementation, there are 4 nodes per each source block. 24% redundancy for 30000 source blocks gives 7200 parity blocks (Cohorts), and each parity block has 30000 * 4 / 7200 = 17 nodes (Stripes). The recoverying capability of LDPC depends on the Generator Matrix and possible damage model.

 I'm interested in new Reed-Solomon Codes by Sian-Jheng Lin. If the algorithm perfomes real speed as written in his papar, it will become the fastest (long code size) Reed-Solomon Codes on the world. But, I didn't test it yet, as I could not implement, hehe. I should ask him to give me something sample code...
Logged
Bulat Ziganshin
Almost Nobody

Offline Offline

Posts: 4


« Reply #23 on: 2015 Sep 04, 05:42:31 am »

A Galois Field can use a prime or a power of a prime.  2**n is handy (2 is prime!) because then a word of memory is a field element.  If I understand you correctly, using a prime number that fits in 32 bits, such as 4293918721 in the same manner has a problem: A 32-bit word may contain values in the range 4293918722 through 4294967295 are not in the field.

yes, with non-2**n field you have to recode source data to smaller base or ecc data from larger base. but there are smarter algos to do it - if prime number P is very close to 2**n, then you have very small probability that some n-bit number number is larger than P. so you can just reserve a few more bytes to store words larger than P. it seems the technique used by RSC32

FFT cannot be performed in GF(2**n) since it requires that the field have N-th primitive root of 1. i don't remember details now, but you may to start to look from https://en.wikipedia.org/wiki/Discrete_Fourier_transform_(general)#Finite_fields

PS: do you know about http://www.ffmpeg.org/~michael/git/noe/ and https://github.com/catid/wirehair/ ?
Logged
Bulat Ziganshin
Almost Nobody

Offline Offline

Posts: 4


« Reply #24 on: 2015 Sep 04, 08:39:48 am »

I'm interested in new Reed-Solomon Codes by Sian-Jheng Lin. If the algorithm perfomes real speed as written in his papar, it will become the fastest (long code size) Reed-Solomon Codes on the world. But, I didn't test it yet, as I could not implement, hehe. I should ask him to give me something sample code...

if you mean http://arxiv.org/abs/1503.05761 , its abstract says:

Quote
This paper develops an error correction decoding algorithm for (n = 2m, k) RS codes over GF(2**m), for k/n ≥ 1/2 and (n−k) a power of two.

This means that only encodings like 64+32, 64+48, 64+56... available, so the minimum size of ECC file would be 50% of source data that is rather serious limit for general-purpose ECC utility
Logged
Nyan
Dreamer
*
Offline Offline

Posts: 5


« Reply #25 on: 2015 Sep 04, 11:13:49 pm »

I downloaded Windows SDK for Windows 7. Because I don't use later Windows 8/8.1/10, current one is enough for me. I don't know why it's impossible to download documents of Visual Studio 2010 as DVD format.
I'm not too sure what you mean, but if you're having trouble downloading it, you can try the direct link:
http://download.microsoft.com/download/8/9/2/8928585D-136D-4528-AECC-2F211902A8D7/VS2013Documentation.iso (4GB)

After you have the ISO, you can mount/extract it and point the help viewer in Visual Studio to install from it. It should work even if your version of Visual Studio is not 2013.
It seems to work fine for me, but I can't find the Windows API reference in there.

I believe you can have multiple versions of Visual Studio installed at the same time though.
But if you don't want to change from VS2008 then I don't want to push you to Smiley

I implemeted SSSE3 on Pentium 3 (even SSE2 was un-available, hehe). I used yasm/masm to treat new CPU commands in independent ASM file ago, but it's difficult to write 64-bit ASM code nowadays.
Wow, that must have been difficult to test!

If Visual Studio 2008 becomes obsolate in future, I may use MinGW. (GCC with Windows API)
MSVC has some of its own oddities, so you may have to change a bit of code for it to compile under GCC/Clang. This shouldn't be too hard for command line programs, but I don't know about your GUI.
Logged
Yutaka Sawada
Moderator
Capitalist Pig
*****
Offline Offline

Posts: 670


« Reply #26 on: Yesterday at 12:34:06 am »

Quote from: JDlugosz
using a prime number that fits in 32 bits, such as 4293918721 in the same manner has a problem: A 32-bit word may contain values in the range 4293918722 through 4294967295 are not in the field.

 Your understanding is good. I did the same question ago, and I got good answer. (16-bit method from Michael Niedermayer, and 32-bit method from persicum.) Just replace the over value with non-used value.

 Finite field based on a prime 4293918721 (= 0xFFF00001) can contain values from 0 to 4293918720. (it's same as reminder from modulo by the prime). So, values from 0xFFF00001 to 0xFFFFFFFF are over the field. The over's range is 1/4096 of total range. Then, there must be non-used range between the other 4095/4096 range, when the number of values is equal or less than 4095. After finding the non-used range, you just move the non-used range to the over range. Addition or XOR is available to move all values to fit the finite field.

 For example, there are 4 values; 0x12300000 (ok), 0x45600000 (ok), 0xFFF00001 (over), and 0xFFFFFFFF (over). After you add 0x00100000 to each values, resulting values become all ok like; 0x12400000, 0x45700000, 0x00000001, and 0x000FFFFF. You can return them by subtract anytime.

 Because of this replacing method, prime based Reed-Solomon Codes gives a little larger recovery data. In above example, it requires one 12-bit value per 4095 32-bit values. In 16-bit Fermat Prime's finite field, it requires one 16-bit value per 65535 16-bit values. Anyway the calculation cost of replace is small, as same as calculating checksum for each block.
Logged
Yutaka Sawada
Moderator
Capitalist Pig
*****
Offline Offline

Posts: 670


« Reply #27 on: Today at 12:10:06 am »

Quote from: Bulat Ziganshin
PS: do you know about http://www.ffmpeg.org/~michael/git/noe/ and https://github.com/catid/wirehair/ ?

 Thank you for information. That author of ffmpeg is the Michael Niedermayer, who gave us new technology of FFT for Reed-Solomon Codes. Though RSC32 doesn't use his libnoe, the idea was based on Michael's work. Because I have no skill to write my own FFT code, I just use his library (with some optimization) in my sample applications, hehe. The "wirehair" 's Rateless Codes is new for me. I will check later.


Quote
the minimum size of ECC file would be 50% of source data that is rather serious limit for general-purpose ECC utility

 There will be no problem in practical application. That limit can be hided from users. While the same theoritical limit exists in FFT (or FWT maybe), it can be avoided by filling sapce by zeros or ignoring non-required values. For example; (n, k) = (36000, 30000) for users = (65536, 57344) internally. In the case, it adds 27344 empty blocks to source blocks, and ignores 2192 meaningless parity blocks. (so 30000 source blocks and 50000 source blocks gives same speed !) Even though FFT based RS codes wastes machine power by those extra blocks, it's much faster than PAR2 style RS codes.


Quote from: Nyan
I can't find the Windows API reference in there.

 Windows SDK (library for Windows API) is different product from Visual Studio. While some expensive Visual Studio versions may contain Windows SDK, free/sample versions don't include it. As JDlugosz wrote above, a version of SDK support some versions of Visual Studio. I installed Windows SDK for Windows 7, which supports VS2005 and VS2008. I don't like Windows 8 or 10 's flat design. I do programming on PC with keyboard and mouse, not on smart-phone or tablet...
Logged
Pages: 1 2 [3]  All   Go Up
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!
Page created in 0.024 seconds with 19 queries.