Graham Purnell

Last updated on 3 May 2019

Graham Purnell is a former photographer, web content and social media professional who began working as a Digital Preservation Assistant with the National Library of Scotland in early January 2019.


The purpose of these tests: to use FFMPEG to transcode an uncompressed AVI file to FFV1, with identical ‘pixel-for-pixel’ output to the original source video, and to verify the results.

I’ve read a lot of information on the internet about the benefits of using FFV1/MKV as an archive video format/wrapper, but little empirical evidence about why it is suitable. It doesn’t help that, in the minds of many people, the word ‘compression’ is synonymous with ‘lossy’ (“if file sizes are smaller, surely some data must be missing.”) Archivists and digital preservation practitioners avoid much of the technical video information when writing, and video specialists address the subject in terms beyond the scope of many archivists. Thankfully, I have some previous video production experience and I hope I can help make some sense of it all.

Data doesn’t necessarily need to be removed from a video file to compress it. Much of an uncompressed file’s data stream may be zero padding: redundant data that can be removed because it holds no actual information. FFV1 also uses an efficient entropy coding system that decodes to the same raw video output as the original source; essentially, it’s a more efficient way of coding the same information. The name codec (coder/decoder) is a good one, because what codecs do is analogous to enciphering and deciphering a message; different ciphers exist to encrypt information, some longer and more complex than others, but what really matters is that no information is lost on deciphering. Please don’t ask me about FFV1’s entropy encoding of image information; it’s way beyond my ken, but it is in the FFV1 specification for those who are interested (https://www.ffmpeg.org/~michael/ffv1.html), and a concise explanation is also given on the FFV1 Wikipedia page (https://en.wikipedia.org/wiki/FFV1).
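To illustrate the principle only, here is a minimal Python sketch. It uses zlib, a general-purpose lossless compressor that is not what FFV1 uses, and a made-up byte string standing in for a padded, uncompressed stream. The point is simply that the compressed copy is much smaller and yet decodes to exactly the same bytes.

```python
import hashlib
import zlib

# Hypothetical stand-in for an uncompressed stream: a little "picture" data
# followed by a long run of zero padding.
original = bytes(range(256)) * 4 + bytes(100_000)

# Compress at the highest zlib level, then decode again.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)


def md5(data: bytes) -> str:
    """Return the MD5 checksum of a byte string as hex text."""
    return hashlib.md5(data).hexdigest()


print(len(original), "bytes before,", len(compressed), "bytes after")
print("identical after decoding:", md5(original) == md5(restored))
```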

But how do we prove FFV1 is lossless? Conveniently, FFMPEG has a framemd5 function that allows us to run MD5 fixity checks on every single frame of a video. I transcoded a file from uncompressed AVI to FFV1 video plus FLAC audio in an MKV wrapper, and ran framemd5 checks for video and audio on both. The results can be seen in the linked spreadsheet, which has a colour coded explanation of what each tab means http://bit.ly/AVItoFFV1 (please download the file – it has conditional formatting that doesn’t display when viewed in a browser.)
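The exact commands are not reproduced in this post, but the workflow can be sketched roughly as follows. This Python snippet only builds and prints FFMPEG command lines, it does not run them. The file names are hypothetical, and '-level 3' (FFV1 version 3) is an assumption rather than a record of the settings used in the test.

```python
import shlex


def transcode_cmd(source, target):
    """Command line to transcode to FFV1 video plus FLAC audio.

    The container (MKV) is chosen by the target file extension.
    '-level 3' selects FFV1 version 3 and is an assumed setting.
    """
    return ["ffmpeg", "-i", source,
            "-c:v", "ffv1", "-level", "3",
            "-c:a", "flac",
            target]


def framemd5_cmd(source, report, stream):
    """Command line to write per-frame MD5 checksums for one stream type."""
    if stream not in ("video", "audio"):
        raise ValueError("stream must be 'video' or 'audio'")
    # -an drops audio (video-only report); -vn drops video (audio-only report).
    drop_other = "-an" if stream == "video" else "-vn"
    return ["ffmpeg", "-i", source, drop_other, "-f", "framemd5", report]


# Hypothetical file names, for illustration only.
commands = [
    transcode_cmd("source.avi", "transcode.mkv"),
    framemd5_cmd("source.avi", "AVIvideoMD5.txt", "video"),
    framemd5_cmd("transcode.mkv", "FFV1videoMD5.txt", "video"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list could be handed to subprocess.run() on a machine with FFMPEG installed.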

The framemd5 output for video shows that the FFV1 transcoded file is frame-by-frame identical to the AVI source file. If you want to see how conditional formatting in the spreadsheet works, change a digit in one of the fields in the FFV1videoMD5 sheet and tab to the next field. You will see the amended data field change colour. The fact that none of the MD5 checksums in the FFV1videoMD5 sheet are coloured shows that they are identical to the checksums on the sheet it’s being compared to, i.e. AVIvideoMD5, the source video file information.
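The same comparison can be made without a spreadsheet. The sketch below is one way to do it, assuming the usual framemd5 layout in which header lines start with '#' and the checksum is the last comma-separated field of each frame line. Only the checksums are compared, in frame order, because timestamps may be expressed differently in AVI and MKV. The sample reports are generated inside the script for illustration and are not taken from the test files.

```python
import hashlib
from itertools import zip_longest


def read_hashes(report_text):
    """Return the per-frame checksums from framemd5 report text, in order."""
    hashes = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        hashes.append(line.split(",")[-1].strip())
    return hashes


def mismatched_frames(source_hashes, copy_hashes):
    """Return the indexes of frames whose checksums differ or are missing."""
    pairs = zip_longest(source_hashes, copy_hashes)
    return [i for i, (src, dst) in enumerate(pairs) if src != dst]


def fake_report(frames):
    """Build an illustrative framemd5-style report from raw frame bytes."""
    lines = ["#format: frame checksums",
             "#stream#, dts, pts, duration, size, hash"]
    for n, payload in enumerate(frames):
        digest = hashlib.md5(payload).hexdigest()
        lines.append(f"0, {n}, {n}, 1, {len(payload)}, {digest}")
    return "\n".join(lines)


# Made-up frame data standing in for the source and the transcode.
source = read_hashes(fake_report([b"frame0", b"frame1", b"frame2"]))
transcode = read_hashes(fake_report([b"frame0", b"frame1", b"frame2"]))
print("mismatched frames:", mismatched_frames(source, transcode))
```

An empty list means every frame checksum matches, which is the script equivalent of no coloured cells in the spreadsheet.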

I also transcoded back to uncompressed AVI (v210 codec) from the FFV1 file and ran a framemd5 check to see if the reconverted AVI video output was identical to the source AVI. It was, and this is also included on the spreadsheet.

I chose to transcode audio to FLAC for research purposes, but wouldn’t recommend it as a video soundtrack format for preservation masters. Framemd5 decodes to raw video on-the-fly, from which it creates its checksums, but there is no standard ‘raw audio’ equivalent. By default, framemd5 pipes sound as 16 bit little endian PCM (CD quality audio) before creating checksums, which may not be the quality of the original sound. Different audio codecs can be forced in FFMPEG to match the original audio, but this is fairly complex and, since audio is (with the exception of subtitles or metadata) the smallest component of a video file, it is recommended to use the ‘-c:a copy’ command when transcoding; this means c(odec):a(udio) copy and creates a transcoded video file with identical audio to the source, eliminating the possibility of wrongly transcoding sound.
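As a rough sketch of both options, the Python snippet below builds (but does not run) two FFMPEG command lines: one that transcodes the video to FFV1 while copying the audio untouched, and one that forces a particular PCM codec when writing audio framemd5 checksums. The file names are hypothetical, and 'pcm_s24le' (24 bit PCM) is only an example; the right choice depends on the audio in your source file.

```python
import shlex


def transcode_copy_audio_cmd(source, target):
    """Transcode video to FFV1 and copy the audio stream unchanged."""
    return ["ffmpeg", "-i", source, "-c:v", "ffv1", "-c:a", "copy", target]


def audio_framemd5_cmd(source, report, pcm_codec="pcm_s24le"):
    """Write audio-only framemd5 checksums, decoding to a chosen PCM codec.

    pcm_codec is an assumed example value; it should match the bit depth
    of the original audio.
    """
    return ["ffmpeg", "-i", source, "-vn",
            "-c:a", pcm_codec,
            "-f", "framemd5", report]


# Hypothetical file names, for illustration only.
print(shlex.join(transcode_copy_audio_cmd("source.avi", "transcode.mkv")))
print(shlex.join(audio_framemd5_cmd("transcode.mkv", "FFV1audioMD5.txt")))
```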

Something picked up by one of our video specialists (and the eagle eyed among you may spot) is that the FFV1 file is identified by MediaInfo as a ‘progressive’ file format (i.e. not interlaced.) This is literally true because two fields, each making up half the frame information in interlaced video, are combined (or interleaved) into a single frame in the FFV1 file, effectively making the file format progressive*. In interlaced video, odd lines and even lines are displayed consecutively, persistence of vision fooling the brain that a full frame has been displayed at once; this is why a 50Hz, 25 frames-per-second, interlaced video is, more literally, a 50Hz, 50 fields-per-second one. (Edit: the identification of the FFV1 transcode as 'progressive' possibly has more to do with the v210 codec** of the original source file than the ability of FFV1 to store interlacing information. See the Notes section on this page: https://www.loc.gov/preservation/digital/formats/fdd/fdd000353.shtml#notes)
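For anyone who finds fields and frames hard to picture, here is a toy Python sketch. It treats a frame as a simple list of scan lines (a big simplification of real video) and assumes the top field holds the first, third, fifth lines and so on. Splitting a frame into two fields and weaving them back together gives the original frame, which is all that 'interleaved into a single frame' means.

```python
def split_fields(frame_lines):
    """Split a frame into (top field, bottom field) by alternating lines."""
    return frame_lines[0::2], frame_lines[1::2]


def weave(top_field, bottom_field):
    """Interleave two fields back into one full frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


# A made-up eight line "frame".
frame = [f"line {n}" for n in range(8)]
top, bottom = split_fields(frame)

print("top field:   ", top)
print("bottom field:", bottom)
print("woven frame matches original:", weave(top, bottom) == frame)

# Two fields per frame is why 25 frames per second is 50 fields per second.
frames_per_second = 25
print(frames_per_second * 2, "fields per second")
```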

Fortunately, FFMPEG has an interlace detection filter. You can read about it here: http://www.aktau.be/2013/09/22/detecting-interlaced-video-with-ffmpeg/

By running the ‘idet’ filter on the source and transcoded files, I was able to demonstrate that, when output as raw video, the FFV1 output removes the ‘progressive’ flag, correctly delivering the content as interlaced video, identical to the original. The idet filter only needs to be run on part of the video to detect interlacing, making a check very quick; I chose 1800 frames. Interlacing of this FFV1 file will, of course, be disregarded on modern, progressive displays but will display correctly on older, interlaced displays.
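A check along these lines can be sketched as below. This is an approximation based on the approach in the linked article, not a copy of the command used in the test: it runs the 'idet' filter on the first 1800 video frames, drops the audio, and discards the raw video output. The file name is hypothetical, and the snippet only builds and prints the command.

```python
import os
import shlex


def idet_cmd(source, frames=1800):
    """Command line to run FFMPEG's idet filter on the first N frames.

    The decoded raw video is thrown away (os.devnull); the interlace
    statistics are what FFMPEG prints to the console at the end.
    """
    return ["ffmpeg", "-i", source,
            "-vf", "idet",
            "-frames:v", str(frames),
            "-an",
            "-f", "rawvideo",
            "-y", os.devnull]


# Hypothetical file name, for illustration only.
print(shlex.join(idet_cmd("transcode.mkv")))
```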

The FFMPEG ‘idet’ output, from source and transcoded video files, is in this text file: http://bit.ly/interlace_check - this file also has links to the page above, describing ‘idet’ & what its FFMPEG output means, and the FFV1 interlacing ‘bug report’ (that wasn’t really a bug.)

In the ‘idet’ information, you may notice that the FFV1 transcode INPUT video file is identified as yuv422p10le(progressive) i.e. ‘YUV 4:2:2 planar 10bit little endian (progressive)’. In the ‘idet’ raw video OUTPUT, it has been identified as yuv422p10le and the ‘progressive’ flag is no longer there. The interlacing detection output is at the bottom of each test (source and transcode) and is identical for both files (apart from the ‘Parsed_idet_0’ unique file numbers):

[Parsed_idet_0 @ 0602cb80] Repeated Fields: Neither:  1799 Top:     0 Bottom:     2

[Parsed_idet_0 @ 0602cb80] Single frame detection: TFF:  1076 BFF:    26 Progressive:    11 Undetermined:   688

[Parsed_idet_0 @ 0602cb80] Multi frame detection: TFF:  1800 BFF:     0 Progressive:     0 Undetermined:     1

High TFF or BFF figures (Top Field First or Bottom Field First) show that the video is interlaced. The idet filter tries to determine interlacing for individual frames first, then for multiple frames. False positives in the single frame detection have been ‘weeded out’ by multi-frame detection.
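If you have many files to check, the 'idet' lines can be read by a script rather than by eye. The Python sketch below parses the three lines quoted above and applies a simple rule of thumb to the multi-frame figures: if TFF plus BFF outnumber Progressive, call it interlaced. That rule is my own simplification, not something defined by FFMPEG.

```python
import re

# The idet lines quoted above, copied from the FFMPEG output.
IDET_LOG = """\
[Parsed_idet_0 @ 0602cb80] Repeated Fields: Neither:  1799 Top:     0 Bottom:     2
[Parsed_idet_0 @ 0602cb80] Single frame detection: TFF:  1076 BFF:    26 Progressive:    11 Undetermined:   688
[Parsed_idet_0 @ 0602cb80] Multi frame detection: TFF:  1800 BFF:     0 Progressive:     0 Undetermined:     1
"""


def parse_idet(log_text):
    """Return {section name: {label: count}} for each idet summary line."""
    results = {}
    for line in log_text.splitlines():
        match = re.search(r"\] ([A-Za-z ]+?): (.*)$", line)
        if not match:
            continue
        section, rest = match.groups()
        counts = re.findall(r"(\w+):\s*(\d+)", rest)
        results[section] = {label: int(value) for label, value in counts}
    return results


def verdict(counts):
    """Rule of thumb (an assumption): interlaced if TFF + BFF > Progressive."""
    if counts["TFF"] + counts["BFF"] > counts["Progressive"]:
        if counts["TFF"] >= counts["BFF"]:
            return "interlaced, top field first"
        return "interlaced, bottom field first"
    return "progressive"


stats = parse_idet(IDET_LOG)
print(verdict(stats["Multi frame detection"]))
```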

In summary, FFV1 transcoded files losslessly retain all video information and have been proven to retain interlacing. I think these tests show that FFV1 is truly lossless and can confidently be used as an archival preservation master format.

I hope my blog post has been useful and hasn’t confused anyone. If you have any further questions, please email me.

 

* This isn't true of all FFV1 files, but is true of the v210 to FFV1 transcoded file under test. I also didn't force ffmpeg to interlace the transcoded file, because I wanted to simplify the command and see how many of the original file's attributes would be carried over automatically.

** Peter B. (Twitter handle @pjotrek_b) used an identical FFMPEG command to convert a v210 MOV file to FFV1 and MediaInfo recognised it as interlaced. This suggests that the misreporting of my FFV1 file as 'progressive' has more to do with the AVI container than the v210 encoding.

Comments   

#1 Jérôme Martinez 2019-04-06 10:24
Thanks for your feedback about FFV1.
Some notes:

"the FFV1 file is identified by MediaInfo as a ‘progressive’ file format (i.e. not interlaced.)" MediaInfo can read the "picture_structure" metadata from a FFV1 stream, and indicates "Interlaced" for a FFV1 stream if the ... IMO the issue is not in FFV1 but in the input, as FFV1 and MediaInfo both support interlaced flag.
I tried FFmpeg with " -top 1 -flags:v +ilme+ildct" from a progressive content from AVI but it set Matroska field info (not read by MediaInfo, will be fixed) and not the FFV1 picture_structure flag, so something else to do with FFmpeg for forcing the FFV1 picture_structure to interlaced when the input is progressive or unknown.

FLAC and MD5: actually FLAC bitstream itself stores the MD5 of the unencoded audio data, so you'll get a decoding error if there is a mismatch between source PCM and decoded PCM
#2 Peter B. 2019-04-06 19:41
Thanks Graham for independently confirming FFV1's mathematical losslessness!
The more the merrier.

btw, it is perfectly possible to store scantype and field order in FFV1 encoded files.
No need to do interlace detection.
Here's a Mediainfo screenshot: https://twitter.com/pjotrek_b/status/1114612758421618688

Nice greetings!
Peter
#3 Graham Purnell 2019-04-26 12:56
Thanks for the comments. Peter B suggested on Twitter that the incorrect identification of the FFV1 file as progressive probably had more to do with the AVI container than the v210 codec; he posted this along with his FFMPEG command, which was the same as mine (see here - https://twitter.com/pjotrek_b/status/1116353610743537670)

Peter B's MediaInfo screenshot of his FFV1 transcode, from a v210 MOV file, showed interlacing was detected (screen grab link in his comment.)
#4 Jérôme Martinez 2019-04-26 16:21
For reference, when input is configured as progressive (e.g. in AVI which is not intended for interlaced content), so when input metadata is wrong, you can change it before converting to FFV1 with:
- Container side (Matroska): " -top 1 -flags:v +ilme+ildct"
- bitstream side (FFV1): " -vf setfield=tff"

Both of them are detected by MediaInfo 19.04.

In summary: the lack of interlacement info is due to the input, which is lacking interlacement metadata, plus the transcoder settings, which do not force it (so copy from input = progressive), not Matroska or FFV1, which both support interlacement metadata.
#5 Carl Eugen Hoyos 2019-05-11 23:56
@Jérôme: You get a decoding error on decoding flac for non-matching md5 of the decoded audio if (and only if) the decoder supports this feature.
FFmpeg contains: skip_bits(gb, 16); /* data crc */
#6 Ramon Coelho 2019-10-11 12:50
Thank you Graham for sharing your testing results and comments.
We have been testing the FFV1 in relation to a large video art archive which is preserved in the V210 codec.
Our testing was purely visual (on motion and noisy footage) and with making use of a multi burst and a 50x magnifying waveform monitor to see if there are any differences in the high frequencies between the V210 (captured with Media Express) and the FFV1 (captured with Virtual Dub 2).
With previous testing of JPG2000 codecs with this method the loss of higher frequencies was quite obvious and made us decide to stick to the uncompressed 4:2:2 YUV V210 codec as used on Black Magic Design hardware.
So I was quite convinced not to see any changes in the video outputs, hoping the term 'mathematically lossless' was no fairy tale. Your explanation of the MD5 results confirms our less scientific test results!
The only thing I want to know is what happens when some bits fail in a FFV1 stream.
#7 Graham Purnell 2019-10-11 14:17
Quoting Ramon Coelho:
...The only thing I want to know is what happens when some bits fail in a FFV1 stream.


Kieran O'Leary's blog says that FFV1 is good at error correction and problematic files may even play with no visually perceptible anomalies. The QCTools screen grab on the page below demonstrates this:

https://kieranjol.wordpress.com/

The corrupt slice of the video frame on the right has been replaced by the same slice area data from the previous frame, filling in the gap. The repaired frame is on the left. Obviously this will work best with static subjects.
#8 David Rice 2019-10-11 17:40
In addition to Graham's comment on how the ffv1 decoder notes and substitutes invalid frames, the granularity of ffv1's slice CRCs makes it possible to use the CRC as an error correction code, so some limited amounts of damage could be corrected back to validation against the embedded CRC. This is documented in https://mediaarea.net/MediaConch/Documentation/Fixity.