Been looking into error detection for software loads. The usual approach is to run a CRC over the data, in particular to check whether the software load in Flash is valid before using it.
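Here is a minimal C sketch of that Flash check. It makes two assumptions. First, it uses the common reflected CRC-32 with zlib/Ethernet parameters: polynomial 0x04C11DB7 (0xEDB88320 bit-reversed), with the initial value and the final XOR both 0xFFFFFFFF. Second, it assumes a hypothetical image layout in which the payload is followed by its CRC, stored little-endian in the last 4 bytes. `crc32`, `image_is_valid` and `image_is_valid_residue` are illustrative names, not any particular library's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 (assumed zlib/Ethernet flavor), bitwise for brevity. */
static uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;                 /* preset register to all ones */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)             /* one shift per data bit */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                                /* final XOR */
}

/* Hypothetical layout: payload, then its CRC-32 little-endian in the last 4 bytes.
   Recompute over the payload and compare with the stored value. */
static bool image_is_valid(const uint8_t *image, size_t image_len)
{
    if (image_len < 4)
        return false;
    size_t payload_len = image_len - 4;
    uint32_t stored = (uint32_t)image[payload_len]
                    | (uint32_t)image[payload_len + 1] << 8
                    | (uint32_t)image[payload_len + 2] << 16
                    | (uint32_t)image[payload_len + 3] << 24;
    return crc32(image, payload_len) == stored;
}

/* Residue form: run the CRC over payload AND stored CRC. An intact image
   always gives the same constant for this CRC-32 flavor. */
static bool image_is_valid_residue(const uint8_t *image, size_t image_len)
{
    return image_len >= 4 && crc32(image, image_len) == 0x2144DF1Cu;
}
```

The residue variant avoids a separate extract-and-compare step. For this CRC-32 flavor, an intact codeword always leaves 0xDEBB20E3 in the register, which becomes 0x2144DF1C after the final inversion.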
Not surprisingly, engineers, programmers, and mathematicians each have their own mutually inconsistent conventions, so it can take a day or more to come to grips with what's going on. It's really more obfuscated than complicated, and the rather jumbled display order of hex dumps and C strings just adds to the fun. Surprisingly, the standard CRC-32 polynomial is sub-optimal: because it has no (x+1) factor, it doesn't even flag every error with an odd number of flipped bits. Koopman performs exhaustive computational searches for better polynomials and apparently hasn't finished for CRC-32. I find it incredible that the world relies on CRC for so much, yet Koopman is doing this in his spare time. If anything should get funding, it's this kind of stuff.
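The odd-error weakness follows from a short argument over GF(2). An error pattern E(x) with an odd number of flipped bits has E(1) = 1. A generator G(x) that contains the factor (x+1) can therefore never divide E(x), so such errors are always caught. G(1) is just the parity of G's term count, so counting terms settles the question. Below is a small C sketch with hypothetical helper names. It uses the usual hex notation, in which the leading x^width term is implicit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of nonzero terms in a generator written in "normal" hex notation,
   where the leading x^width term is not stored. */
static int crc_poly_term_count(uint32_t poly_without_top)
{
    int count = 1;                      /* implicit x^width term */
    while (poly_without_top) {
        count += (int)(poly_without_top & 1u);
        poly_without_top >>= 1;
    }
    return count;
}

/* Over GF(2), G(1) = term count mod 2, so (x+1) divides G exactly when the
   count is even. Only then is every odd-weight error guaranteed detected. */
static bool crc_catches_all_odd_errors(uint32_t poly_without_top)
{
    return (crc_poly_term_count(poly_without_top) % 2) == 0;
}
```

For CRC-32, 0x04C11DB7 plus the implicit x^32 gives 15 terms. So G(1) = 1 and there is no (x+1) factor. G itself is then an odd-weight (15-bit) error pattern that passes undetected, since G divides G. For contrast, the CRC-16-CCITT polynomial 0x1021 (x^16 + x^12 + x^5 + 1) has 4 terms and does carry the factor.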
The clearest explanation I could find was in Hacker's Delight, 2nd edition, though the simplest implementation isn't shown there in code form. Warren's hardware view sidesteps all the endian nonsense and language ambiguity, though his diagram is of the left-shifting, non-flipped CRC & residue type.
For me (granted, a HW engineer), it helped to first approach CRC implementation as an LFSR-based serial data scrambler rather than as a byte-and-table (or no-table) arrangement. Parallel input such as bytes can be pulled in later. Without an understanding of the underlying serial process, though, all the byte and/or word flipping can be confusing, and that process has nothing to do with bytes, just bits and 32-bit values. The byte-and-table approach is just a bunch of precomputed XORing. In hardware, the table could easily be replaced with a sea of XOR gates, which conveniently factors down to something fairly manageable.
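Here is a sketch of the serial-first view in C. It assumes the reflected (right-shifting, LSB-first) CRC-32 used by zlib and Ethernet, not the left-shifting form in Warren's diagram. `crc32_update_bitwise` clocks the 32-bit register as an LFSR once per input bit. The table is then built by clocking each possible byte through that same serial loop, which makes it plain that the table is only precomputed XORing. All function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32 generator 0x04C11DB7, bit-reversed for the LSB-first (reflected) form. */
#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Serial LFSR view: clock the register once per input bit, LSB of each byte first.
   Bytes only group the bits; nothing here depends on them. */
static uint32_t crc32_update_bitwise(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        for (int b = 0; b < 8; b++) {
            uint32_t in_bit = (data[i] >> b) & 1u;    /* next serial input bit */
            uint32_t feedback = (crc ^ in_bit) & 1u;  /* register output XOR data in */
            crc >>= 1;
            if (feedback)
                crc ^= CRC32_POLY_REFLECTED;           /* LFSR taps */
        }
    }
    return crc;
}

static uint32_t crc32_table[256];
static int crc32_table_ready = 0;

/* Entry n = what 8 serial clocks do to an all-zero register while byte n shifts in. */
static void crc32_build_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint8_t byte = (uint8_t)n;
        crc32_table[n] = crc32_update_bitwise(0, &byte, 1);
    }
    crc32_table_ready = 1;
}

/* Byte-at-a-time update: one lookup replaces 8 serial clocks. */
static uint32_t crc32_update_table(uint32_t crc, const uint8_t *data, size_t len)
{
    if (!crc32_table_ready)
        crc32_build_table();
    for (size_t i = 0; i < len; i++)
        crc = crc32_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc;
}

/* Standard CRC-32 wrappers: preset register to all ones, invert the result. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    return ~crc32_update_bitwise(0xFFFFFFFFu, data, len);
}

static uint32_t crc32_tabled(const uint8_t *data, size_t len)
{
    return ~crc32_update_table(0xFFFFFFFFu, data, len);
}
```

Clocking a single 1 bit into an empty register (entry 128) leaves exactly 0xEDB88320, the polynomial itself. Both paths give the standard check value 0xCBF43926 for the ASCII string "123456789".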
Excel spreadsheet: http://www.mediafire.com/download/yxfyu871wf4yb08/CRC32_2015-11-20.xls
I wrote a Hive subroutine today that does one round on 32-bit input data. Each pass through the loop takes 5 cycles, with one pass per bit. I'll probably use this with the SPI Flash device that will hold the software load and presets.
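The Hive code itself isn't reproduced here. As a stand-in, here is a C sketch of the same kind of per-bit loop over a 32-bit word, written for the reflected CRC-32. It assumes that words are read from the SPI Flash least-significant byte first. It also assumes the load length may not be a multiple of 4, which is why there is a hypothetical byte helper for the tail. It does not model the Hive routine's 5-cycle loop.

```c
#include <stdint.h>

#define CRC32_POLY_REFLECTED 0xEDB88320u

/* One "round": fold a whole 32-bit word into the register, then clock the LFSR
   32 times, one loop iteration per bit (LSB first). This matches feeding the
   word's four bytes least-significant byte first. */
static uint32_t crc32_update_word(uint32_t crc, uint32_t word)
{
    crc ^= word;
    for (int bit = 0; bit < 32; bit++)
        crc = (crc >> 1) ^ (CRC32_POLY_REFLECTED & (0u - (crc & 1u)));
    return crc;
}

/* Same idea for a trailing byte: 8 clocks instead of 32. */
static uint32_t crc32_update_byte(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (CRC32_POLY_REFLECTED & (0u - (crc & 1u)));
    return crc;
}
```

To use it, preset the register to 0xFFFFFFFF, feed the words and any tail bytes, and invert at the end. The words 0x34333231 ("1234") and 0x38373635 ("5678") followed by the byte '9' give the standard check value 0xCBF43926.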