Really good TIFF compression?
I have about half a gig of large compressed TIFF files we'd like to load onto an SD card and view inside a Pocket PC. Good in theory, but the files are so large when uncompressed we can't reliably open them.
An average file is about 10000x7000 pixels, about 8 or 9 MB uncompressed, and somehow they've been compressed to about 300 or 400 KB each.
What I need to do is shrink the file resolution (half should do, to 5000x3500) and then put the smaller files in the Pocket PC. Except even if I shrink the files to 2500x1750, I can't find a way to save them such that the file size is less than about 500 KB. 500 KB is too big - I can no longer fit everything on a single SD card. I've tried GIF, PNG and all the TIFF variants which IrfanView supports. JPEG isn't acceptable; they're line drawings and detail is vital.
Does anyone know of a really good lossless compression algorithm for TIFF files, and a program which implements it? A suitable algorithm obviously exists and was used to create the original files, but I have no idea what it is.
Image resize / LZW compression / GZip / Vectorization
I am a software developer (and slightly knowledgeable in math).
Most lossless compression schemes have a parameter that limits the dictionary size. There are also two or three choices of tree data structure that can be used to organize / optimize things like palettes. It may also be possible to run some kind of multi-pass version of LZW, so that the optimal content of the compression dictionary is known ahead of time, rather than using the usual single-pass version. The compression dictionary would then be stored first, so that the coded file can be expanded properly.
In the very "advanced user"-level interfaces, the user could be offered a choice of how all these parameters are tuned. For example, a huge dictionary slows compression down, but compresses more. I believe that the "Octree" variety of palette often matches the colors more closely to the actual 24-bit true-color image and produces a smaller palette, but it is a more computationally intensive algorithm that will also slow your compression processing down a bit.
Reference: in Paint Shop Pro 7 with an 8-bit image loaded, select Save As, choose the GIF file type, click the Options button, select GIF 87, click the Run Optimizer button, select the Colors tab, then select the radio button for Octree Palette, Median Cut or Standard Web. Google search Octree palette image OR graphics
The open source GZip program, last I checked in 1998, offers integer settings from 1 to 9 for the compression level (the underlying zlib library also accepts 0, meaning no compression). These settings control how much effort the compressor spends searching for matches; they do not set the dictionary size, because the DEFLATE format that GZip uses limits its sliding window to 32 KB. With "modern day" computers eventually having 8 dual-core (16-way) processors, 64 gigabytes of RAM and 64 terabytes or more of disk on-line, a format with a much larger dictionary size limit would be practical. Google search man-page gzip compression-level
If the Adobe or IrfanView software designers (or their managers) chose not to expose all of these parameter options in their user interfaces, so as to simplify the UI and make their software easier to use, then you must live with that design choice (blame those responsible for it).
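A minimal sketch of trying the different compression levels, using Python's standard zlib module (the same DEFLATE compression that GZip uses). The synthetic bitmap and the helper name deflate_sizes are made up for illustration; real drawings will give different numbers. Note that in zlib the level controls how hard the compressor searches for matches, while the DEFLATE window itself stays at 32 KB or less.

```python
import zlib

# Synthetic 1-bit "line drawing": 1000 rows of 125 bytes (1000 pixels each),
# white (0xFF) except for a black horizontal line every 50 rows and a
# one-pixel-wide black vertical line.
WIDTH_BYTES, HEIGHT = 125, 1000
rows = []
for y in range(HEIGHT):
    if y % 50 == 0:
        row = bytearray(b"\x00" * WIDTH_BYTES)
    else:
        row = bytearray(b"\xff" * WIDTH_BYTES)
        row[60] = 0xEF  # one black pixel in this byte: the vertical line
    rows.append(bytes(row))
bitmap = b"".join(rows)

def deflate_sizes(data):
    """Return {level: compressed size in bytes} for zlib levels 0..9."""
    return {level: len(zlib.compress(data, level)) for level in range(10)}

sizes = deflate_sizes(bitmap)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")
```

Level 0 only stores the data, so it comes out slightly larger than the input; every level from 1 to 9 is lossless and decompresses back to the same bytes.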
The same thing goes for the lossy JPEG file type. I believe that there are matrices for quantization. These matrices can be set as the standard ones, or they can be optimized for the given image (on a per image basis). The DCT moire pattern noise and compression level for the same sized JPEG file can be better or worse depending on the methods used by the software vendors.
Google search moire
Google search DCT jpeg
Google Search quantization-matrices jpeg optimization wiki
JBIG2 is a more recent image compression development. Instead of having a dictionary entry for stripes of pixels for compression, I believe that it uses 2-dimensional glyphs / symbols for compression. In this way monochrome images like facsimile or images of text or line art can be extremely compressed without losses. Google search JBIG2 for details.
Now for the image re-dimensioning issue...
If you are going to make the images smaller, the thin lines in the drawings may not survive the resize.
If a simplistic / dumb resize algorithm is used, like a 1/N decimation (keeping every Nth pixel, where N is an integer), then you could lose lines altogether, or make the lines nearly invisible. A smart resampling uses pixel color interpolation / weighting to determine the output image's pixel colors. Interpolation can be done using convolution (some FFTs and a filter kernel), or using bilinear or bicubic spline methods. This is done because you want to sample pixels from image co-ordinates that are fractional in value, like 75% (3/4) of the way between two of the original image pixels. Even with integer co-ordinates, you do not want decimation resampling that might skip over small details that should be kept in the final smaller image. Google search decimation resampling
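To illustrate the difference, here is a small sketch in plain Python with images held as lists of grey values (0 = black, 255 = white). The function names decimate and box_average are made up for this example, and box averaging is only the simplest kind of weighted resampling, not what any particular graphics program does.

```python
def decimate(img, n):
    """Keep every n-th pixel of every n-th row (naive 1/N decimation)."""
    return [row[::n] for row in img[::n]]

def box_average(img, n):
    """Average each n x n block into one output pixel (simple area resampling)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - h % n, n):
        out_row = []
        for x in range(0, w - w % n, n):
            block = [img[y + dy][x + dx] for dy in range(n) for dx in range(n)]
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out

# 8x8 white image with a one-pixel-wide black vertical line at x = 3.
img = [[0 if x == 3 else 255 for x in range(8)] for y in range(8)]

print(decimate(img, 2)[0])     # samples x = 0, 2, 4, 6: the line is missed entirely
print(box_average(img, 2)[0])  # the block covering x = 2..3 comes out grey
```

With decimation the line vanishes; with averaging it survives as a lighter grey pixel, which is why the result then needs more than 1 bit per pixel.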
Even if you use a good smart resize (one where you gradually reduce image size to 80% of original, then that new image by 80% of previous size, and so on) the line thickness may also shrink to nearly invisible.
Because of this you will probably have to
a) convert to 24-bit color, thereby increasing the image pixel's dynamic range, then
b) smart resize by 80% or so,
c) apply an erode filter to darken / thicken the now thinner and lighter colored lines (with the dynamic range increased to 24-bit, you don't lose as much information as you would at the lower 8-bit range),
d) repeat steps b and c until the desired size is achieved, then
e) convert back to a lower palette size such as 8-bit, 6-bit, 5-bit, 4-bit, 3-bit, 1-bit.
Note that you may even have to apply step c, the erode filter, twice or more to darken / thicken the 24-bit color lines before resampling again (depending on the strength of the graphics application's erode filter); apply to suit the image.
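As a rough sketch of what the erode step does, assuming dark lines on a white background and a plain 3x3 minimum filter (graphics applications may implement or name their erode filter differently, and some call this operation dilate depending on which color they treat as foreground):

```python
def erode(img):
    """3x3 minimum filter: each pixel takes the darkest value in its neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            neighbours = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            out_row.append(min(neighbours))
        out.append(out_row)
    return out

# A faint grey vertical line (127) at x = 3 on white (255), as left by a resize.
img = [[127 if x == 3 else 255 for x in range(7)] for y in range(5)]
thick = erode(img)
print(img[2])    # one grey pixel wide
print(thick[2])  # three grey pixels wide
```

The line becomes three pixels wide, so the next reduction has more to work with. This filter only widens the line; it does not by itself make the grey any darker.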
This process of lowering the resolution while maintaining even the thin lines may badly over-thicken the already thick lines, messing up the image a bit, so some experimentation may be needed to find suitable processing on a per-image basis.
Google Search erode filter
Google search smart-resample image-size
Of course, I could be remembering this resize processing wrongly. Someone please try it to see.
In theory, another way would involve a vector graphics conversion algorithm from within a more advanced Drawing application like Corel Draw!, or Adobe Illustrator, or Xara. Once vectorized, page resizing should work fine. The drawing is then exported with suitable dpi and pixel dimensions to suit the PDA application limitations.
Quote:
Originally Posted by Tuttle
I have about half a gig of large compressed TIFF files we'd like to load onto an SD card and view inside a Pocket PC. Good in theory, but the files are so large when uncompressed we can't reliably open them.
An average file is about 10000x7000 pixels, about 8 or 9 MB uncompressed, and somehow they've been compressed to about 300 or 400 KB each.
What I need to do is shrink the file resolution (half should do, to 5000x3500) and then put the smaller files in the Pocket PC. Except even if I shrink the files to 2500x1750, I can't find a way to save them such that the file size is less than about 500 KB. 500 KB is too big - I can no longer fit everything on a single SD card. I've tried GIF, PNG and all the TIFF variants which IrfanView supports. JPEG isn't acceptable; they're line drawings and detail is vital.
Does anyone know of a really good lossless compression algorithm for TIFF files, and a program which implements it? A suitable algorithm obviously exists and was used to create the original files, but I have no idea what it is.
Packbits / RLE is lossless. & Huffman encoding too.
I think that I have also seen something named "packbits" and perhaps even Huffman encoding, and there are several CCITT Group-X facsimile compressions too. Reference: Jasc Paint Shop Pro 7, File - Save As, choose TIF, click the Options button, then select RLE / Packbits, CCITT Group 3 Fax, Huffman Encoding, LZW compression, or Uncompressed.
Google search Packbits+CCITT+3+Fax+Huffman+Encoding+LZW+compression+Uncompressed This search turns up a link to a LEAD Tools imaging tools / software development vendor page, among other things. Keywords that turn up seem to say that Group 4 Fax and arithmetic encoding are also possible. I remember reading that arithmetic encoding is pretty good because it codes the data according to a probability model of the symbols, which can be fitted on a per-file basis, rather than using a dictionary. They don't say whether this compression is applied to any one specific format such as TIF though.
The PDF file fullext.pdf contains the following entry:
TIFF Tagged-Image File. The standard, originally developed
by Microsoft and Aldus, is so flexible there is an infinite
number of ways to store images. (Raster Format) Below
are the main common formats:
TIFF No Compression (1, 4, 8, 24 Bits)
TIFF Huffman (1 Bit)
TIFF Pack Bit (1, 4, 8, 24 Bits)
TIFF LZW compression (1,4,8,24 Bits)
TIFF Fax Group 3 RLE (1 Bit)
TIFF Fax Group 4 RLE (1 Bit)
CCITT RLE
Multi-Image
Tiled
Motorola (MM)
JPEG
ZIP
CMYK
--------------
Of these, Packbits / RLE is certainly lossless. RLE stands for run length encoding, and it is also used in palettized BMP files, which are lossless. Arithmetic encoding is also lossless. Zip is lossless too.
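A short sketch of PackBits-style run length encoding, to show why it is lossless and why it suits images with long runs of one color. The header byte convention used here (0 to 127 means "copy the next n + 1 bytes literally", 129 to 255 means "repeat the next byte 257 - n times", 128 is skipped) is the usual PackBits scheme; the function names and the sample row are made up for illustration, and this simple encoder makes no attempt to produce the smallest possible output.

```python
def packbits_encode(data: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)      # header byte for a run of `run` bytes
            out.append(data[i])
            i += run
        else:
            # Collect literal bytes until a run of 2+ starts (capped at 128).
            start = i
            i += 1
            while i < n and i - start < 128:
                if i + 1 < n and data[i] == data[i + 1]:
                    break
                i += 1
            out.append(i - start - 1)  # header byte: literal count - 1
            out += data[start:i]
    return bytes(out)

def packbits_decode(packed: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        header = packed[i]
        i += 1
        if header < 128:               # literal: copy the next header + 1 bytes
            out += packed[i:i + header + 1]
            i += header + 1
        elif header > 128:             # run: repeat the next byte 257 - header times
            out += packed[i:i + 1] * (257 - header)
            i += 1
        # header == 128 is a no-op
    return bytes(out)

# One 1000-pixel row of a 1-bit image: white, a short black stroke, white again.
row = b"\xff" * 100 + b"\x00" * 3 + b"\xff" * 22
packed = packbits_encode(row)
print(len(row), "->", len(packed), "bytes")
```

Decoding the packed bytes gives back exactly the original row, which is what lossless means here. Data with no repeated bytes grows slightly instead of shrinking.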
Huffman encoding is also lossless. Huffman was a math guy. He found that if a binary tree is built from the dictionary of bit patterns found in a binary stream / file, an optimal variable-length code can be used to address that tree. The codes are just tree traversals: 0 = left sub-tree / node, and 1 = right sub-tree / node. When you reach a leaf node, you have the dictionary entry with the expansion bit pattern stored. The most frequently used bit pattern is placed at the leaf with the shortest traversal, so a highly repeated pattern is replaced by a very short code; the second-most repeated pattern goes to the next closest leaf, and so on. In data compression, Huffman is sometimes applied after RLE to compress the data further.
Some of the other compressions in the above list are not lossless, such as JPEG, which uses DCT and quantization matrices.
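The tree construction can be sketched in a few lines of Python. This is a textbook Huffman code built from symbol counts, not the fixed code tables that the fax-style TIFF compressions use; the function name huffman_codes and the sample data are made up for illustration.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Return {symbol: bit string} for a Huffman code built from symbol counts."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:               # degenerate case: one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, tree);
    # a tree is either a symbol or a (left, right) pair.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent trees into one node.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")   # 0 = left sub-tree
            walk(tree[1], prefix + "1")   # 1 = right sub-tree
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

# 100 symbols with very uneven frequencies: 90, 6, 3 and 1 occurrences.
data = b"\x00" * 90 + b"\x01" * 6 + b"\x02" * 3 + b"\x03" * 1
codes = huffman_codes(data)
total_bits = sum(len(codes[sym]) for sym in data)
print(codes)
print(total_bits, "bits instead of", 2 * len(data))
```

The most common symbol gets a 1-bit code and the rarest ones get 3-bit codes, so the 100 symbols take 114 bits instead of the 200 bits needed at a fixed 2 bits per symbol.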
Quote:
Originally Posted by falcon2000
So they are 1-bit files ... now it makes perfect sense.
In which case don't even bother with Lossless JPEG b/c it requires at least an 8-bit file.
Sorry, LZW is the only lossless compression you can save TIFF as ... that I know of.
If I find out more I'll let you know.