Video Enhancement with Task-Oriented Flow

IJCV 2019 Video Enhancement with Task-Oriented Flow

Tianfan Xue³*

Baian Chen¹

Jiajun Wu¹

Donglai Wei²

William T. Freeman¹,³

¹MIT CSAIL ²Harvard University ³Google Research

* This work was done when Tianfan Xue was a student at MIT.

Many video processing tasks, e.g., temporal frame interpolation (top) and video denoising (bottom), rely on flow estimation. In many cases, however, precise optical flow estimation is intractable, and even a precise flow can be sub-optimal for the task. For example, although EpicFlow predicts the movement of objects precisely (see how well the flow field aligns with object boundaries), small errors in the flow field result in obvious artifacts in the interpolated frames, like the obscured fingers in (I-c). With the task-oriented flow proposed in this work (I-d), those interpolation artifacts disappear, as in (I-e). Similarly, in video denoising, our task-oriented flow (II-d) deviates from EpicFlow (II-b) but leads to a cleaner output frame (II-e).

Result

If you cannot access YouTube, please download our video here in 1080p.

Abstract

Many video processing algorithms rely on optical flow to register different frames within a sequence. However, a precise estimation of optical flow is often neither tractable nor optimal for a particular task. In this paper, we propose task-oriented flow (TOFlow), a flow representation tailored for specific video processing tasks. We design a neural network with a motion estimation component and a video processing component. These two parts can be jointly trained in a self-supervised manner to facilitate learning of the proposed TOFlow. We demonstrate that TOFlow outperforms the traditional optical flow on three different video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution. We also introduce Vimeo-90K, a large-scale, high-quality video dataset for video processing to better evaluate the proposed algorithm.
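To make "using optical flow to register frames" concrete, here is a minimal sketch of backward warping with bilinear sampling. It is not the TOFlow network or the authors' code: the function name `warp_frame`, the nested-list frame format, and the border clamping are assumptions chosen for illustration only.

```python
import math

def warp_frame(frame, flow):
    """Backward-warp a grayscale frame (list of rows) with a per-pixel flow.

    flow[y][x] = (dx, dy) means output pixel (x, y) is sampled from the
    input frame at (x + dx, y + dy). Sampling is bilinear, and source
    coordinates are clamped to the frame border (an illustrative choice).
    """
    height, width = len(frame), len(frame[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Source location, clamped so sampling stays inside the frame.
            sx = min(max(x + dx, 0.0), width - 1.0)
            sy = min(max(y + dy, 0.0), height - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
            wx, wy = sx - x0, sy - y0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = (1 - wx) * frame[y0][x0] + wx * frame[y0][x1]
            bottom = (1 - wx) * frame[y1][x0] + wx * frame[y1][x1]
            warped[y][x] = (1 - wy) * top + wy * bottom
    return warped

# Toy usage: a uniform flow of one pixel to the right.
frame = [[0, 10, 20, 30], [40, 50, 60, 70]]
flow = [[(1.0, 0.0)] * 4 for _ in range(2)]
warped = warp_frame(frame, flow)
```

In a fixed-flow pipeline the flow field comes from a separate estimator; the idea described in the abstract is instead to learn the motion estimation jointly with the processing step, so the flow is whatever serves the final output best.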

@article{xue2019video,
  title={Video Enhancement with Task-Oriented Flow},
  author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
  journal={International Journal of Computer Vision (IJCV)},
  volume={127},
  number={8},
  pages={1106--1125},
  year={2019},
  publisher={Springer}
}

Downloads:

Paper: PDF, arXiv
Code: Github

Vimeo-90K Dataset

We also build a large-scale, high-quality video dataset, Vimeo-90K. This dataset consists of 89,800 video clips downloaded from vimeo.com, covering a large variety of scenes and actions. It is designed for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.

Sampled Frames (Full-resolution samples are here):

The list of original videos

The list of all full-length original videos can be found here, and youtube-dl can be used to batch download them. We reused some utilities from the AoT Dataset for scene detection and camera stabilization to generate these video clips; please refer to this repository for more details.
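As a rough sketch of the batch download, the snippet below builds a youtube-dl command line from Python. It assumes the video list has been saved as a plain text file with one URL per line; the file name `original_video_list.txt`, the output directory, and both helper functions are hypothetical. The options `--batch-file`, `--output`, and `--ignore-errors` are standard youtube-dl flags.

```python
import subprocess

def build_download_command(url_list_path, output_dir="videos"):
    """Return a youtube-dl command that downloads every URL in a text file.

    Assumes url_list_path holds one video URL per line (an assumption
    about the list format, which is not specified on this page).
    """
    return [
        "youtube-dl",
        "--batch-file", url_list_path,                    # read URLs from a file
        "--output", f"{output_dir}/%(id)s.%(ext)s",       # name files by video id
        "--ignore-errors",                                # skip unavailable videos
    ]

def download_all(url_list_path, output_dir="videos"):
    """Run the download; requires youtube-dl to be installed and on PATH."""
    return subprocess.run(build_download_command(url_list_path, output_dir))

# Example (not executed here):
# download_all("original_video_list.txt", "videos")
```

Using `--ignore-errors` is a deliberate choice for a list of this size, since some source videos may no longer be available.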

We further process these 89,800 video clips to generate the following two subsets.

Triplet dataset (for temporal frame interpolation):

The triplet dataset consists of 73,171 3-frame sequences with a fixed resolution of 448 x 256, extracted from 15K selected video clips from Vimeo-90K. This dataset is designed for temporal frame interpolation. Download links:

Testing set only (17GB): zip
Both training and test set (33GB): zip
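One way to use a triplet for evaluation is sketched below. It assumes the usual convention that the first and third frames are the inputs and the middle frame is the ground truth; that convention, the nested-list frame format, and the naive averaging baseline are illustrative assumptions, not the evaluation protocol from the paper. PSNR is the standard definition, 10 log10(MAX² / MSE).

```python
import math

def average_frames(first, third):
    """Naive interpolation baseline: per-pixel mean of the two outer frames."""
    return [[(a + b) / 2.0 for a, b in zip(row1, row3)]
            for row1, row3 in zip(first, third)]

def psnr(prediction, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_errors = [(p - t) ** 2
                      for row_p, row_t in zip(prediction, target)
                      for p, t in zip(row_p, row_t)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 2 x 2 grayscale triplet (made-up values, for illustration only).
first = [[0, 20], [40, 60]]
middle = [[10, 30], [50, 80]]
third = [[20, 40], [60, 80]]

prediction = average_frames(first, third)
score = psnr(prediction, middle)
```

Frame averaging ignores motion entirely, so it serves only as a floor against which a flow-based interpolation method can be compared.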

Septuplet dataset (for video denoising, deblocking, and super-resolution):

Notice: we have recently updated our denoising test set to fix a bug in the denoising test data generation. The new quantitative results of our algorithm are reported in our updated paper.

The septuplet dataset consists of 91,701 7-frame sequences with a fixed resolution of 448 x 256, extracted from 39K selected video clips from Vimeo-90K. This dataset is designed for video denoising, deblocking, and super-resolution.

The test set for video denoising (16GB): zip
The test set for video deblocking (11GB): zip
The test set for video super-resolution (6GB): zip
The original test set (not downsampled or degraded by noise) (15GB): zip
The original training + test set (82GB): zip

Acknowledgement

This work is supported by NSF RI-1212849, NSF BIGDATA-1447476, Facebook, Shell Research, and Toyota Research Institute.