Using FFmpeg to encode videos for the iPod

In the previous post I went through how to rip a DVD and convert it for watching on the iPod touch. This time I wanted to convert a simple AVI video to an MP4 file with H.264 encoding, while keeping the file as small as possible.

Most of the free (and open-source) tools you can find online for converting videos from one format to another are based on FFmpeg, an open-source command-line program that can convert a wide range of video files to a huge set of other formats.

However, command-line tools are not usually known for their user-friendliness. I'd therefore suggest you check out WinFF, a very simple GUI for FFmpeg that lets you pick conversion options from a list of presets (or define your own presets) and batch encode a set of files with them.

My iPod video preset

In order to define your own preset, select "Presets" from the "Edit" menu. You'll see a dialog like this:

The WinFF Edit Presets dialog.

Set a custom name for the preset and then write your own list of command-line options for FFmpeg. In my case, I will encode the video using the x264 encoder, ensuring that the output conforms to the H.264 baseline profile (the only H.264 profile the iPod is able to read correctly). Here is the string I used:

-r 29.97 -vcodec libx264 QUALITY_SETTING -flags +loop -cmp +chroma -deblockalpha 0 -deblockbeta 0 -refs 1 -coder 0 -me_method full -me_range 20 -subq 6 -partitions +parti4x4+parti8x8+partp8x8 -g 250 -keyint_min 30 -level 30 -trellis 2 -i_qfactor 0.71 -s SIZE -aspect ASPECT_RATIO -acodec libfaac -ab 56k -ar 48000 -ac 2
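WinFF supplies the input and output file names itself, so the preset only contains the options in between. As a rough sketch of how the three placeholders might be filled in and what the resulting command would look like if you ran FFmpeg by hand, here is a small Python snippet. The file names, the helper name `build_command` and the chosen values (`-crf 31`, `480x320`, `3:2`) are only illustrative assumptions, and the command presumes an FFmpeg build that includes libx264 and libfaac.

```python
import shlex

# The preset from above, with its three placeholders left in place.
PRESET = (
    "-r 29.97 -vcodec libx264 QUALITY_SETTING -flags +loop -cmp +chroma "
    "-deblockalpha 0 -deblockbeta 0 -refs 1 -coder 0 -me_method full "
    "-me_range 20 -subq 6 -partitions +parti4x4+parti8x8+partp8x8 "
    "-g 250 -keyint_min 30 -level 30 -trellis 2 -i_qfactor 0.71 "
    "-s SIZE -aspect ASPECT_RATIO -acodec libfaac -ab 56k -ar 48000 -ac 2"
)


def build_command(source, target, quality, size, aspect):
    """Return an ffmpeg argument list for one file (hypothetical helper)."""
    options = (
        PRESET.replace("QUALITY_SETTING", quality)
        .replace("ASPECT_RATIO", aspect)
        .replace("SIZE", size)
    )
    # Output options go after the input file and before the output file.
    return ["ffmpeg", "-i", source, *shlex.split(options), target]


# Example values only: single-pass constant quality, iPod touch screen size.
command = build_command("input.avi", "output.mp4", "-crf 31", "480x320", "3:2")
print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the command as a list of arguments (rather than one long string) means it could be handed to `subprocess.run` without worrying about quoting file names that contain spaces.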

Most of these are not general FFmpeg parameters (you won't find them in the documentation) but special parameters that work only with the x264 encoder. There is, however, a website that explains all the x264 parameters. The most important settings are the following:

  • -r <float>: the output video's frame rate.
  • -flags +loop: enables the deblocking filter.
  • -refs <integer>: the number of previous frames that can be used to do motion estimation. Higher values yield better compression, but the target decoder is usually the limiting factor. In this case the iPod can only handle 1 reference frame.
  • -coder {0|1}: disables (0) or enables (1) "CABAC", the default entropy encoder used by x264. This encoder usually improves compression and quality, but is not allowed in the baseline profile.
  • -me_method {epzs|hex|umh|full}: selects the motion estimation technique. The options are listed in order of increasing complexity and estimation quality. If you want to encode at low bitrates, I suggest using one of the more advanced techniques to limit picture degradation somewhat.
  • -subq <integer>: determines the algorithm used for subpixel motion estimation and partitioning of the image. The allowed values span from 1 to 9, where 1 is a very crude and fast algorithm and 9 is the most complex technique. 6 is the default suggested value.
  • -keyint_min <integer>: defines the minimum distance between two I-frames. Higher values reduce the file size, but quality and "seekability" (the ability to quickly seek a position in the video) of the video may suffer.
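To get a feeling for what the I-frame settings mean in practice, the frame counts can be converted to seconds. The small calculation below is only a sketch: it uses the values from the preset above and assumes that -g is the maximum distance between two I-frames (the GOP size), which is how FFmpeg documents that option.

```python
FRAME_RATE = 29.97  # -r: output frames per second
MAX_GOP = 250       # -g: assumed to be the maximum I-frame distance, in frames
MIN_KEYINT = 30     # -keyint_min: minimum I-frame distance, in frames


def frames_to_seconds(frames, frame_rate=FRAME_RATE):
    """Convert a distance in frames to a duration in seconds."""
    return frames / frame_rate


shortest = frames_to_seconds(MIN_KEYINT)
longest = frames_to_seconds(MAX_GOP)
print(f"I-frames are between {shortest:.2f} s and {longest:.2f} s apart")
# prints: I-frames are between 1.00 s and 8.34 s apart
```

In other words, with these settings a seek may have to decode up to roughly eight seconds of video before reaching the requested position.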

And what about audio compression? The best option seems to be AAC audio, which gives you acceptable audio quality even at very low bitrates (56 kbps or so).

Details about video bitrate settings

The x264 encoder has lots of confusing video quality settings. First of all, the fastest settings are those that encode in a single pass:

  • Constant bitrate mode (-b <bitrate in kbps>): the easiest mode. It simply encodes the whole video at a constant bitrate, with no adaptation to scene complexity. It yields the worst results (except at very high bitrates), but lets you define exactly in advance how big the file will be.
  • Constant quantizer mode (-cqp <quantizer value>): similar to constant bitrate, but uses a constant quantizer (the matrix used to compress the video, which also defines the quality of the output video). It should not be used (it doesn't even allow you to compute the video size in advance).
  • Constant quality mode (-crf <quality>): the output will be similar to that of a constant quantizer, but this mode will try to adapt the compression rate to reach a certain minimum quality level. This yields the best results in single-pass mode, but generates files of unpredictable size. '1' is the highest quality value; higher values yield smaller videos. I tend to use a value of '31', which generates "good enough" videos for the iPod screen (in my opinion).
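The reason a fixed bitrate makes the file size predictable is simple arithmetic: size is bitrate multiplied by duration. Here is a minimal sketch of that calculation and of its inverse. The 200 kbps video bitrate, the 45-minute duration and the 100 MB target are made-up example values, the function names are hypothetical, and the small overhead added by the MP4 container is ignored.

```python
def estimated_size_bytes(video_kbps, audio_kbps, duration_seconds):
    """Approximate file size for given bitrates (container overhead ignored)."""
    total_bits_per_second = (video_kbps + audio_kbps) * 1000
    return total_bits_per_second * duration_seconds / 8


def video_kbps_for_target(target_megabytes, audio_kbps, duration_seconds):
    """Video bitrate that would roughly fill a target size (1 MB = 10^6 bytes)."""
    total_kbps = target_megabytes * 1_000_000 * 8 / duration_seconds / 1000
    return total_kbps - audio_kbps


duration = 45 * 60  # example: a 45-minute video, in seconds

size = estimated_size_bytes(video_kbps=200, audio_kbps=56, duration_seconds=duration)
print(f"{size / 1_000_000:.1f} MB")  # prints: 86.4 MB

bitrate = video_kbps_for_target(100, audio_kbps=56, duration_seconds=duration)
print(f"{bitrate:.0f} kbps")  # prints: 240 kbps
```

Constant quantizer and constant quality modes offer no such shortcut, because the bitrate they end up using depends on the content of the video.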

If you have time and want both better compression and a predictable file size, you can use 2-pass encoding. To do this, check the "2-pass" checkbox in the WinFF options and use the "constant bitrate mode" option in your preset. The first pass will analyze the video; the second will take the bitrate you assigned and distribute it across the video according to its content (scene complexity, scene changes, and so on). This will generate a video with the best possible compression for the given bitrate.
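WinFF takes care of the two runs when the checkbox is ticked. If you were calling FFmpeg by hand, a 2-pass encode would look roughly like the sketch below, which relies on FFmpeg's -pass option. The file names, the `200k` bitrate, the shortened option lists and the helper name `two_pass_commands` are illustrative assumptions, and I have not checked that this matches the exact commands WinFF generates.

```python
import os


def two_pass_commands(source, target, video_bitrate, video_options, audio_options):
    """Return the argument lists for pass 1 and pass 2 (hypothetical helper)."""
    common = ["ffmpeg", "-y", "-i", source, "-b", video_bitrate, *video_options]
    # Pass 1 only gathers statistics: audio is dropped (-an) and the encoded
    # video is thrown away by writing to the null device.
    first = [*common, "-pass", "1", "-an", "-f", "mp4", os.devnull]
    # Pass 2 reads those statistics and writes the real file, with audio.
    second = [*common, "-pass", "2", *audio_options, target]
    return first, second


# Shortened option lists for readability; the full preset would go here.
video_options = ["-vcodec", "libx264", "-refs", "1", "-coder", "0", "-s", "480x320"]
audio_options = ["-acodec", "libfaac", "-ab", "56k", "-ar", "48000", "-ac", "2"]

first_pass, second_pass = two_pass_commands(
    "input.avi", "output.mp4", "200k", video_options, audio_options
)
print(" ".join(first_pass))
print(" ".join(second_pass))
```

Both runs must use the same video options, since the statistics written by the first pass are only meaningful for the settings they were collected with.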

Videos resized to the 480x320 format of the iPod touch can usually go as low as 200 kbps (or even lower, depending on the kind of video). You'll have to do some testing to find out which compression mode and bitrate work best for you.
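If the source video does not have the same 3:2 shape as the iPod touch screen, the value for the -s option has to be worked out so that the picture fits inside 480x320 without being stretched. The sketch below shows one way to do it. It assumes square pixels in the source (which is not true for video that comes straight from a DVD), never enlarges a smaller video, and rounds down to even numbers because x264 expects even dimensions. The function name `fit_within` is hypothetical.

```python
def fit_within(width, height, max_width=480, max_height=320):
    """Scale (width, height) down to fit the screen, keeping the aspect ratio."""
    if width * max_height >= height * max_width:
        # The source is at least as wide as the screen: width is the limit.
        new_width = min(width, max_width)
        new_height = height * new_width // width
    else:
        # The source is narrower than the screen: height is the limit.
        new_height = min(height, max_height)
        new_width = width * new_height // height
    # Round both dimensions down to the nearest even number.
    return new_width // 2 * 2, new_height // 2 * 2


for source_size in [(640, 480), (1280, 720), (720, 480), (320, 240)]:
    width, height = fit_within(*source_size)
    print(f"{source_size[0]}x{source_size[1]} -> -s {width}x{height}")
# prints:
# 640x480 -> -s 426x320
# 1280x720 -> -s 480x270
# 720x480 -> -s 480x320
# 320x240 -> -s 320x240
```

The comparison uses integer multiplication instead of dividing the two aspect ratios, which avoids rounding surprises with floating-point numbers.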