Information theory this century has clarified the 19th-century work of Gibbs, and has shown that the natural units for temperature kT, defined via 1/T = dS/dE, are energy per nat of information uncertainty. This means that, for any system, the total thermal energy E over kT is the log-log derivative of multiplicity with respect to energy or, equivalently (for any base b), the number of base-b units of information lost about the state of the system per b-fold increase in the amount of thermal energy therein. For ``un-inverted'' (T>0) systems, E/kT may also be viewed as a temperature-averaged heat capacity, which for quadratic systems equals ``degrees of freedom over two''. In natural units, the work-free differential heat capacity C_v/k is a ``local version'' of this log-log derivative, equal to the bits of information lost per 2-fold increase in temperature. This makes C_v/k (unlike E/kT) independent of the energy zero, and excellent for detecting both phase changes and quadratic modes. From UMStL-CME-94a09pf.
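The relations stated above can be written out in a few lines. What follows is a minimal sketch, not the report's own derivation: the symbols \Omega (multiplicity) and \nu (number of quadratic degrees of freedom) are notation assumed here, with S = k ln \Omega, and the averaging identity assumes fixed volume and an energy zero chosen so that E vanishes at T = 0.

```latex
% Assumed notation: \Omega = multiplicity, S = k \ln\Omega, \nu = quadratic degrees of freedom.
\begin{align}
\frac{1}{kT} &= \frac{\partial (S/k)}{\partial E} = \frac{\partial \ln\Omega}{\partial E}
\quad\Longrightarrow\quad
\frac{E}{kT} = \frac{\partial \ln\Omega}{\partial \ln E}
             = \frac{\partial \log_b\Omega}{\partial \log_b E}
             \quad\text{(any base } b\text{)}, \\
\Omega \propto E^{\nu/2} &\quad\Longrightarrow\quad \frac{E}{kT} = \frac{\nu}{2}, \\
\frac{C_v}{k} &= \frac{1}{k}\left(\frac{\partial E}{\partial T}\right)_V
              = \frac{T}{k}\left(\frac{\partial S}{\partial T}\right)_V
              = \frac{\partial (S/k)}{\partial \ln T}
              = \frac{\partial \log_2\Omega}{\partial \log_2 T}, \\
\frac{E}{kT} &= \frac{1}{T}\int_0^T \frac{C_v(T')}{k}\,dT'
\quad\text{(assuming } E = 0 \text{ at } T = 0\text{)}.
\end{align}
```

The factors of ln b cancel between numerator and denominator in the first and third lines, which is why the same number reads as nats per e-fold, bits per 2-fold, or base-b units per b-fold increase.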
These simple connections between rates of information loss and average or instantaneous heat capacity in dimensionless units (per molecule) provide deep insight into the usefulness and meaning of such concepts. However, they are so well hidden by the way we normally teach this subject that drawing out their consequences (e.g. simply by converting ordinary heat capacities into dimensionless form) is likely to yield further insight and simplification. Any volunteers?
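As one small illustration of converting ordinary heat capacities into dimensionless form, the following Python sketch divides a molar C_v by the gas constant R to obtain C_v/k per molecule, then doubles it to read off the implied number of quadratic modes. The function names are hypothetical, and the three inputs are idealized textbook limits (3R/2, 5R/2, 3R) rather than measured data.

```python
# Molar gas constant in J/(mol K) (CODATA value).
R = 8.314462618


def dimensionless_cv(molar_cv):
    """Convert a molar C_v in J/(mol K) to C_v/k per molecule.

    The result is also the number of bits of information lost per
    2-fold increase in temperature (or nats per e-fold increase).
    """
    return molar_cv / R


def quadratic_modes(molar_cv):
    """Quadratic degrees of freedom implied by C_v/k = modes / 2."""
    return 2.0 * dimensionless_cv(molar_cv)


# Idealized textbook limits, used only as illustrations (not measurements).
examples = {
    "monatomic ideal gas": 1.5 * R,
    "rigid diatomic ideal gas": 2.5 * R,
    "Dulong-Petit solid": 3.0 * R,
}

for name, cv in examples.items():
    print(f"{name}: C_v/k = {dimensionless_cv(cv):.2f}, "
          f"quadratic modes = {quadratic_modes(cv):.1f}")
```

Applied to measured heat capacities over a range of temperatures, the same conversion would show plateaus where quadratic modes are fully active and sharp features near phase changes.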
Information physics on the web.