Interesting properties of the entropy function
Update 2017-07-03: Corrected equation for associative definition, thank you /u/Syrak.
This may not be the first time someone recognized this, but I have recently discovered some interesting and useful properties of the entropy function and now share them.
First a definition:
Entropy — H(p1, p2, ..., pn) — is a function that quantifies surprise in selecting an object from a set where the probability of selecting each object is given: {p1, p2, ..., pn}. This has utility in communications, information theory and other fields of math.
H_b(p_1, p_2, \ldots, p_n) = \sum_{i=1}^{n} -p_i \log_b(p_i)

where b is normally 2, to express entropy in bits.
Other definitions of H() use expected values and random variables. As an analog to the definition above, I will discuss the entropy of a set of frequencies, where p_i = f_i / Σ f_i.
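As a quick illustration, here is a minimal Python sketch of the bit-based definition applied to a list of frequencies. The helper name entropy_bits is my own, and the frequency set is the one used in the examples further down.

```python
import math

def entropy_bits(freqs):
    """Entropy in bits: H_2 = sum of -p_i * log2(p_i), with p_i = f_i / total."""
    total = sum(freqs)
    h = 0.0
    for f in freqs:
        if f > 0:              # a zero frequency contributes nothing
            p = f / total
            h -= p * math.log2(p)
    return h

print(entropy_bits([17, 24, 16, 0, 9]))  # ~1.922 bits
```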
Entropy defined without bits:
A definition that doesn't use bits is:
H(p_1, p_2, \ldots, p_n) = \prod_{i=1}^{n} p_i^{-p_i}

And you can convert between the definitions with H_b = \log_b H. H() is also called perplexity.
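A small sketch of the bit-free form and the conversion back to bits (the function name perplexity is mine, not from the post):

```python
import math

def perplexity(probs):
    """Bit-free entropy: H(p) = product of p_i ** (-p_i)."""
    h = 1.0
    for p in probs:
        if p > 0:
            h *= p ** -p
    return h

probs = [0.5, 0.25, 0.25]
H = perplexity(probs)     # 2 ** 1.5, about 2.828
print(math.log2(H))       # converting back: H_b = log_b(H) = 1.5 bits
```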
A recursive (associative) definition of entropy:
The entropy function is associative. Or maybe it should be said that the entropy function can be applied recursively. Consider the function as applying to frequencies (rather than probabilities).
If we have the frequencies f = {17, 24, 16, 0, 9}, we can find the entropy H(f) given any bifurcation of the set, f_a + f_b = f, the entropies of the two parts, H(f_a) and H(f_b), and each part's total frequency, Σ f_a and Σ f_b.
H_b(f_a + f_b) = H_b(f_a) \frac{\sum f_a}{\sum f_a + \sum f_b} + H_b(f_b) \frac{\sum f_b}{\sum f_a + \sum f_b} + H_b(\{\sum f_a, \sum f_b\})
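For example, this sketch (with assumed helper names entropy_bits and merge_entropy) checks that combining the entropies of two parts this way matches a direct calculation over the whole set:

```python
import math

def entropy_bits(freqs):
    total = sum(freqs)
    return -sum(f / total * math.log2(f / total) for f in freqs if f > 0)

def merge_entropy(h_a, t_a, h_b, t_b):
    """H_b(f_a + f_b) from the parts' entropies and their total frequencies."""
    t = t_a + t_b
    return h_a * t_a / t + h_b * t_b / t + entropy_bits([t_a, t_b])

f_a, f_b = [17, 24], [16, 0, 9]
combined = merge_entropy(entropy_bits(f_a), sum(f_a), entropy_bits(f_b), sum(f_b))
print(combined, entropy_bits(f_a + f_b))  # both are ~1.922
```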
One case of the above definition is when f_b contains a single element whose frequency is i. (We use t = Σ f_a hereafter.)
H_b(f_a + i) = H_b(f_a) \frac{t}{t+i} + \frac{t}{t+i} \log_b \frac{t+i}{t} + \frac{i}{t+i} \log_b \frac{t+i}{i}
And we can use this without bits as well:

H(f_a + i) = H(f_a)^{t/(t+i)} \cdot \left(\frac{t+i}{i}\right)^{i/(t+i)} \cdot \left(\frac{t+i}{t}\right)^{t/(t+i)}
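The single-element case supports a streaming update: start from one symbol and fold in each new symbol's frequency. A minimal sketch of the bit-based form (the name add_symbol is mine):

```python
import math

def add_symbol(h, t, i):
    """Entropy in bits after a symbol with frequency i joins a set
    with entropy h and total frequency t."""
    n = t + i
    return h * t / n + (t / n) * math.log2(n / t) + (i / n) * math.log2(n / i)

# Build up H({17, 24, 16, 9}) one symbol at a time (a zero frequency adds nothing).
h, t = 0.0, 17        # a single symbol has zero entropy
for i in [24, 16, 9]:
    h = add_symbol(h, t, i)
    t += i
print(h)              # ~1.922, matching the direct calculation
```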
An iterative definition of entropy:
Entropy can be defined directly from the symbol frequencies f_i, and not the probabilities p_i. This is useful in computations where the entire symbol set is not known in advance and you want to avoid two passes (one to find the total, one to calculate), especially in parallel processing; or where arithmetic is expensive and precalculation will be used.
H_b(f_1, f_2, \ldots, f_n) = \log_b(t) - \frac{1}{t} \sum_{i=1}^{n} f_i \log_b(f_i)

where t = Σ f_i is the total frequency. And, for completeness, the iterative definition of entropy without bits:
H(f_1, f_2, \ldots, f_n) = t \left( \prod_{i=1}^{n} f_i^{f_i} \right)^{-1/t}
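A sketch of the single-pass computation in bits: the total t and the Σ f_i log2(f_i) accumulator are gathered together, so partial (t, s) pairs from parallel workers could simply be summed before the final step (the function name is my own):

```python
import math

def entropy_bits_single_pass(freqs):
    """H_b(f) = log2(t) - (1/t) * sum(f_i * log2(f_i)), accumulated in one pass."""
    t, s = 0, 0.0
    for f in freqs:
        if f > 0:
            t += f
            s += f * math.log2(f)
    return math.log2(t) - s / t

print(entropy_bits_single_pass([17, 24, 16, 0, 9]))  # ~1.922 bits
```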