An analysis of disk space usage of media disc images when compressed using 7zip. It was determined that it is probably cost-effective to compress such images, though this does require sacrificing convenience.
When ripping and storing media discs such as DVDs and Blu-rays, it is probably worthwhile to compress and archive the images with 7zip, though this depends somewhat on the CPU used and on local energy costs. The energy cost of running 7zip was less than the value of the disk space saved by the compressed output files relative to the input files. Data discussed in this write-up comes from a selection of 325 media disc images saved as ISO files occupying 7.6 TB uncompressed.
Imaged disc types include DVDs containing MPEG-2 media, Blu-rays containing predominantly h.264 media in addition to some MPEG-2 and VC-1 media, and UHD Blu-rays containing HEVC media. Somewhat better (i.e., lower) compression ratios were achieved while compressing DVD images compared to Blu-ray and UHD Blu-ray, but no particular disc type or codec was found to result in compression ratios significantly different from the average of all disc images. This is likely due to the ISO file format itself, since the compression ratios of the images are noticeably better than the compression ratios of extracted video files stored in a Matroska container format.
7zip was used with the following settings:
Other settings were not relevant. The number of CPU threads varied depending on available system resources, but I found no meaningful difference between using a single thread and throwing all available hardware threads at a job.
The system used to compress these files features an Intel Core i9-10850K with stock voltage and clock settings, and I believe the default voltage settings of the ASRock Z490 Phantom Gaming-ITX/TB3 motherboard follow Intel's specifications properly. However, the short-duration power limit has been set to 300 W and the long-duration power limit has been set to 200 W.
For storage calculations, five hard drives were purchased from ServerPartDeals at a cost of $9.71 per terabyte. They were installed in a server running Unraid, which introduces filesystem overhead, and configured as three data disks and two parity disks for an effective cost of $16.48 per terabyte.
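As a quick check on these figures, the sketch below uses only the numbers quoted above to compute what five drives with three data disks would cost per usable terabyte before any filesystem overhead, then back-calculates how much overhead the $16.48 figure implies. The constant names are illustrative, and the overhead is derived from the two quoted prices rather than measured.

```python
# Effective cost of usable space in an array with dedicated parity disks.
# The overhead is back-calculated from the two quoted prices, not measured.

RAW_COST_PER_TB = 9.71      # $/TB paid for each drive
DATA_DISKS = 3
PARITY_DISKS = 2
REPORTED_EFFECTIVE = 16.48  # $/TB of usable space after configuration


def parity_only_cost(raw_cost: float, data: int, parity: int) -> float:
    """Cost per usable TB if only the parity disks are accounted for."""
    return raw_cost * (data + parity) / data


parity_cost = parity_only_cost(RAW_COST_PER_TB, DATA_DISKS, PARITY_DISKS)
# If usable space = capacity * (1 - overhead), then
# effective cost = parity_cost / (1 - overhead).
implied_overhead = 1 - parity_cost / REPORTED_EFFECTIVE

print(f"parity-only cost: ${parity_cost:.2f}/TB")
print(f"implied filesystem overhead: {implied_overhead:.1%}")
```

With these inputs the parity-only figure comes to about $16.18 per terabyte, so the remaining gap to $16.48 corresponds to roughly 1.8% of capacity.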
The average compression ratio of an ISO file was 92.9% with a standard deviation of 4.1 percentage points. This is the arithmetic mean and arithmetic standard deviation. The geometric equivalents are probably more appropriate here, but I could not be bothered to figure out how to calculate the geometric standard deviation (an online calculator gave me infinity and a formula in LibreOffice Calc gave me 0, so clearly some kind of IEEE 754 shenanigans are occurring), so get off my back. The geomean was off from the arithmetic mean by only 0.1 percentage points. It's FINE. Anyway, this means that the average file occupies 7.1% less disk space and thus costs 7.1% less to store.
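For anyone stuck on the same problem, one common way to get both geometric statistics is to work in log space: the geometric mean is exp(mean(ln x)) and the geometric standard deviation is exp(stdev(ln x)). The sketch below assumes the ratios are stored as fractions (0.929 rather than 92.9) and uses only Python's standard library; the function name and sample values are hypothetical, not the data set behind the figures above. Note that the geometric SD is a multiplicative factor, so the one-standard-deviation band runs from mean ÷ factor to mean × factor rather than mean ± a fixed amount.

```python
import math
import statistics


def geometric_stats(ratios: list[float]) -> tuple[float, float]:
    """Return (geometric mean, geometric standard deviation) of positive ratios.

    Both statistics are computed from natural logarithms, which avoids
    multiplying hundreds of values together (a product that can overflow or
    underflow in floating point, depending on the units used).
    The geometric SD is a dimensionless factor >= 1.
    """
    if len(ratios) < 2:
        raise ValueError("need at least two values for a standard deviation")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    logs = [math.log(r) for r in ratios]
    gmean = math.exp(statistics.fmean(logs))
    gsd = math.exp(statistics.stdev(logs))  # sample SD; statistics.pstdev gives population SD
    return gmean, gsd


# Hypothetical example values, NOT the measured disc images.
sample = [0.95, 0.91, 0.93, 0.89, 0.96]
gmean, gsd = geometric_stats(sample)
print(f"geometric mean: {gmean:.4f}")
print(f"geometric SD factor: {gsd:.4f}")
print(f"one-SD band: {gmean / gsd:.4f} to {gmean * gsd:.4f}")
```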
A chart of compression ratios is presented below. Error bars represent one standard deviation. Statistics were calculated for specific codecs and media in addition to the overall compression ratios, but no significant differences were observed compared to the overall average.
Power costs were estimated by assuming 12 cents per kilowatt-hour and a 250 W average load increase measured from the wall compared to an idle system. This average load is consistent with the CPU's power target, increased load on other components including hard drives and memory, and inefficiency from the power supply at about half load. This translates to a cost of 3 cents per hour to run 7zip.
The CPU was observed to compress ISO files at a rate of approximately 25 MByte/s, which converts to 90 GByte/hour. Since power was determined to cost 3 cents per hour, 7zip costs 3 cents per 90 GB to run, or 33 cents per terabyte. As mentioned in the previous section, individual hard drives were purchased for $9.71 per terabyte and, once configured, usable hard drive space was worth $16.48 per terabyte. Thus, for the saved space to be worth more than the power spent, a compression ratio of at most 96.7% was required for raw hard drive space with no redundancy, and a compression ratio of at most 98.0% was required for the hard drives as configured.
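The chain of arithmetic in this section can be reproduced in a few lines. This sketch uses the figures quoted above (12 cents per kWh, a 250 W increase, 25 MB/s, and the two storage prices) in decimal units throughout; the function names are made up for illustration. The break-even ratio r is where the value of the space freed equals the electricity spent. Counting savings per terabyte of uncompressed input, that is (1 − r) × price = cost; counting them per terabyte of compressed output, it is (1 − r) / r × price = cost. The two bases put the raw-storage cutoff at about 96.6% and 96.7% respectively, and both give 98.0% for the array as configured.

```python
PRICE_PER_KWH = 0.12     # $/kWh assumed above
EXTRA_LOAD_KW = 0.250    # average increase at the wall over idle
THROUGHPUT_MB_S = 25     # observed 7zip throughput on ISO files

cost_per_hour = PRICE_PER_KWH * EXTRA_LOAD_KW          # $0.03 per hour
tb_per_hour = THROUGHPUT_MB_S * 3600 / 1_000_000       # 0.09 TB (90 GB) per hour
cost_per_tb = cost_per_hour / tb_per_hour              # about $0.33 per TB compressed


def break_even_input_basis(storage_price: float, compress_cost: float) -> float:
    """Highest ratio r with (1 - r) * storage_price >= compress_cost."""
    return 1 - compress_cost / storage_price


def break_even_output_basis(storage_price: float, compress_cost: float) -> float:
    """Highest ratio r with (1 - r) / r * storage_price >= compress_cost."""
    return 1 / (1 + compress_cost / storage_price)


for label, price in [("raw drives", 9.71), ("array as configured", 16.48)]:
    low = break_even_input_basis(price, cost_per_tb)
    high = break_even_output_basis(price, cost_per_tb)
    print(f"{label}: ratio must be at most {low:.1%} (input basis) / {high:.1%} (output basis)")
```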
A chart of the power costs per terabyte saved is presented below. All values to the left of the dashed line are less than the cost of raw storage purchased for the server; the cost of configured storage is also presented for reference.
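The values on that chart follow from one division: each terabyte of input frees (1 − ratio) terabytes, so the electricity cost per terabyte freed is the cost per terabyte compressed divided by (1 − ratio). A minimal sketch, assuming the roughly $0.33 per terabyte derived from the power and throughput figures in this write-up; the function name is hypothetical.

```python
# $/kWh * kW / (TB per hour): about $0.33 of electricity per TB compressed
COST_PER_TB_INPUT = 0.12 * 0.250 / 0.09


def power_cost_per_tb_saved(ratio: float, cost_per_tb_input: float = COST_PER_TB_INPUT) -> float:
    """Electricity cost per TB of disk space freed at a given compressed/uncompressed ratio."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    # Each TB of input frees (1 - ratio) TB, so the cost is spread over that much space.
    return cost_per_tb_input / (1 - ratio)


# The overall average ratio, compared against the two storage prices.
avg = power_cost_per_tb_saved(0.929)
print(f"average image: ${avg:.2f} per TB saved (raw drives $9.71, array $16.48)")
```

At the overall average ratio of 92.9%, this works out to roughly $4.69 per terabyte saved, well under either storage price.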
Every combination of disc type and codec saved enough disk space to justify continuing this practice in the future. While the cutoff of 96.7% for raw storage is within one standard deviation of the mean (which, remember, is the somewhat-inappropriate arithmetic standard deviation), only 13 of the 325 disc images had compression ratios worse than this. Meanwhile, none of the 325 disc images had a compression ratio equal to or greater than the cutoff of 98.0% for the array as configured.
Note that the CPU used here is quite inefficient by today's standards. A Ryzen 9 9950X, for example, is approximately 150% faster in 7zip than the i9-10850K while consuming a similar amount of power. Using this CPU would reduce the cost of compression by about 60%, which would tolerate compression ratios so poor that it might even be cost-effective to compress h.264 or HEVC video files outright.
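Scaling the earlier estimate for a faster CPU is straightforward if, as stated above, the newer chip draws about the same power while delivering about 2.5 times the throughput ("150% faster"). This sketch starts from the ~$0.33 per terabyte baseline and computes the break-even ratio as 1 − (compression cost per TB ÷ storage price per TB). It assumes power draw is unchanged, which is an approximation.

```python
BASE_COST_PER_TB = 0.12 * 0.250 / 0.09  # i9-10850K: about $0.33 of electricity per TB
SPEEDUP = 2.5                           # "150% faster" means 2.5x the throughput
# Assumption: power draw stays the same, so cost per TB scales as 1 / SPEEDUP.

new_cost_per_tb = BASE_COST_PER_TB / SPEEDUP
reduction = 1 - new_cost_per_tb / BASE_COST_PER_TB

print(f"cost per TB: ${new_cost_per_tb:.3f} ({reduction:.0%} lower)")
for label, price in [("raw drives", 9.71), ("array as configured", 16.48)]:
    cutoff = 1 - new_cost_per_tb / price  # break-even ratio, savings per TB of input
    print(f"{label}: break-even ratio rises to about {cutoff:.1%}")
```

That moves the cutoffs to roughly 98.6% for raw drives and 99.2% for the array as configured.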
Meanwhile, cheaper disk space is not really available. Even particularly good sales on HDDs rarely drop below $9 per terabyte at this point. The cheapest HDDs I've ever seen were 8 TB external Seagate HDDs on clearance at Costco for $6.63 per terabyte with sales tax. Even that is slightly more than the power cost per terabyte saved by compressing UHD Blu-ray ISOs ($6.40), which were the least compressible images. Any backups, parity, redundancy, etc. will increase the effective cost far beyond $6.63 per terabyte.
The main downside to this process is time. At 25 MByte/s, it took over three days of CPU time to compress the files in question and around two weeks of queuing jobs and managing files (note that a significant portion of this time was spent converting Blu-ray folder structures to ISO images and tagging media). In addition, the ISO files are not available on demand and must be decompressed before use. Even if files are stored at the outer edge of an HDD platter and the CPU, network, etc. are not bottlenecks, it still takes nearly three minutes to access a Blu-ray image. Even DVDs take around thirty seconds.
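The "over three days" figure follows directly from the data set size and throughput. A quick check in decimal units, using the 3 cents per hour estimate from earlier; it covers CPU time only, not the manual work of queuing jobs and managing files.

```python
TOTAL_BYTES = 7.6e12     # 7.6 TB of uncompressed ISO files
THROUGHPUT_B_S = 25e6    # 25 MB/s
COST_PER_HOUR = 0.03     # $ per hour of extra load, from the power estimate

hours = TOTAL_BYTES / THROUGHPUT_B_S / 3600
print(f"{hours:.1f} hours ({hours / 24:.1f} days) of compression")
print(f"about ${hours * COST_PER_HOUR:.2f} of electricity")
```

That is about 84 hours, or roughly 3.5 days, for around $2.53 of electricity.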
The other reason to consider not doing this is simply the lack of granularity in hard drive capacities. Except for the smallest capacities, HDDs are only available in even numbers of terabytes, so each HDD size offers at least 2 TB more capacity than the next-smallest size. A RAID 5 or RAID-Z1 array with the minimum three HDDs therefore increases in steps of 4 TB of usable space, as does a RAID 6 or RAID-Z2 array with the minimum four. Unraid is designed to allow individual HDDs to be added to an existing array, and it is generally best practice to keep drive sizes consistent, so an Unraid array increases in steps equal to the capacity of the HDD chosen. The capacity of my server, for example, increases in steps of 18 TB if I follow best practices and only use HDDs of the same capacity.
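The step sizes above come from simple capacity arithmetic for arrays of equal-size drives, where usable space is (drives − parity disks) × drive size. RAID 5/6 and RAID-Z distribute parity across drives while Unraid uses dedicated parity disks, but for equal-size drives the usable capacity works out the same way. This sketch ignores filesystem overhead, and the function name is illustrative.

```python
def usable_tb(drives: int, parity_disks: int, drive_tb: int) -> int:
    """Usable capacity of an array of equal-size drives, ignoring filesystem overhead."""
    if drives <= parity_disks:
        raise ValueError("need more drives than parity disks")
    return (drives - parity_disks) * drive_tb


# Moving to the next drive size (2 TB larger) in minimum-size RAID 5 / RAID 6 arrays.
raid5_step = usable_tb(3, 1, 20) - usable_tb(3, 1, 18)
raid6_step = usable_tb(4, 2, 20) - usable_tb(4, 2, 18)

# Adding one more 18 TB data disk to a dual-parity Unraid array of 18 TB drives.
unraid_step = usable_tb(6, 2, 18) - usable_tb(5, 2, 18)

print(f"RAID 5 (3 drives): +{raid5_step} TB per size tier")
print(f"RAID 6 (4 drives): +{raid6_step} TB per size tier")
print(f"Unraid: +{unraid_step} TB per added drive")
```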
Although compressing the files discussed here did save about 500 GB, the storage saved was much less than one tier of HDD capacity, i.e., 2 TB. More than half of my media disc archive is omitted from the data discussed here, and it is predominantly sourced from DVDs (i.e., the format with the lowest average compression ratios), but napkin math suggests that a full terabyte of space savings from compressing this portion of the library is very optimistic.
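The same kind of napkin math for the 7.6 TB discussed here is just the total size multiplied by one minus the average ratio. This treats the unweighted per-image mean of 92.9% as if it applied to every byte, so it is only an approximation; a size-weighted ratio could differ slightly.

```python
UNCOMPRESSED_TB = 7.6
MEAN_RATIO = 0.929   # unweighted mean of per-image ratios
TIER_TB = 2          # smallest step between common HDD capacities

saved_tb = UNCOMPRESSED_TB * (1 - MEAN_RATIO)
print(f"about {saved_tb * 1000:.0f} GB saved, {saved_tb / TIER_TB:.0%} of one capacity tier")
```

This lands at roughly 540 GB, in line with the "about 500 GB" above and about a quarter of one 2 TB tier.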
For the time being, I intend to continue compressing ISO images. It is worthwhile for now because my new server currently has 6.6 TB of free space after migrating from the old server, and without file compression, it would probably be sitting between 5.0 and 5.5 TB. Eventually, I will need to expand the array, and when I do, I may decide it is no longer worth the time or—literally—energy. If a new hard drive is dedicated solely to my media library and archive, then I estimate that file compression on an 18 TB drive will allow me to store approximately 250 Blu-ray images and their extracted media files rather than 240.
4% for free isn't bad, but it's not compelling either. That said, this job becomes much easier in the future because I will not have to juggle dozens of terabytes of files on a scratch disk (55 TB of total writes according to S.M.A.R.T. data!) with 1 TB free.