IJCV 2019 Video Enhancement with Task-Oriented Flow
Tianfan Xue3* Baian Chen1 Jiajun Wu1 Donglai Wei2 William T. Freeman1,3
1MIT CSAIL      2Harvard University      3Google Research
* This work was done when Tianfan Xue was a student at MIT.


Many video processing tasks, e.g., temporal frame interpolation (top) and video denoising (bottom), rely on flow estimation. In many cases, however, precise optical flow estimation is intractable, and even a precise flow may be sub-optimal for the task at hand. For example, although EpicFlow predicts the movement of objects precisely (note how well the flow field aligns with object boundaries), small errors in the flow field result in obvious artifacts in the interpolated frames, such as the indistinct fingers in (I-c). With the task-oriented flow proposed in this work (I-d), those interpolation artifacts disappear, as shown in (I-e). Similarly, in video denoising, our task-oriented flow (II-d) deviates from EpicFlow (II-b) but leads to a cleaner output frame (II-e).

Result

If you cannot access YouTube, please download our video here in 1080p.
Abstract

Many video processing algorithms rely on optical flow to register different frames within a sequence. However, a precise estimation of optical flow is often neither tractable nor optimal for a particular task. In this paper, we propose task-oriented flow (TOFlow), a flow representation tailored for specific video processing tasks. We design a neural network with a motion estimation component and a video processing component. These two parts can be jointly trained in a self-supervised manner to facilitate learning of the proposed TOFlow. We demonstrate that TOFlow outperforms the traditional optical flow on three different video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution. We also introduce Vimeo-90K, a large-scale, high-quality video dataset for video processing to better evaluate the proposed algorithm.
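To make the notion of "registering frames with flow" concrete, the following is a minimal sketch of backward warping with bilinear sampling, written in plain Python. It is a generic illustration and not the TOFlow network or the authors' code: the names `bilinear_sample` and `backward_warp`, the grayscale nested-list image format, and the border clamping are all assumptions made for this example.

```python
from typing import List, Tuple

Image = List[List[float]]               # grayscale image, indexed [row][col]
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) displacement


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at a real-valued (x, y); coordinates are clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(src: Image, flow: Flow) -> Image:
    """Register src to a reference frame: out[y][x] = src sampled at (x + dx, y + dy)."""
    return [
        [
            bilinear_sample(src, x + flow[y][x][0], y + flow[y][x][1])
            for x in range(len(src[0]))
        ]
        for y in range(len(src))
    ]
```

In a pipeline of the kind described above, the warped frames would be passed to a task-specific processing step; here the flow is simply given as input, whereas the paper learns it jointly with that step.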

@article{xue2019video,
  title={Video Enhancement with Task-Oriented Flow},
  author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
  journal={International Journal of Computer Vision (IJCV)},
  volume={127},
  number={8},
  pages={1106--1125},
  year={2019},
  publisher={Springer}
}


Downloads:
Vimeo-90k Dataset

We also build a large-scale, high-quality video dataset, Vimeo-90K. This dataset consists of 89,800 video clips downloaded from vimeo.com, which cover a large variety of scenes and actions. It is designed for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.

Sampled Frames (Full-resolution samples are here):



The list of original videos

The list of all full-length original videos can be found here, and youtube-dl can be used to batch download them. We reused some of the utilities from the AoT Dataset for scene detection and camera stabilization to generate these video clips; please refer to this repository for more details.
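As a rough sketch of the batch download, the snippet below assembles a youtube-dl command line from Python. The `--batch-file`, `--ignore-errors`, and `--output` options are standard youtube-dl flags; the list file name, the output directory, and the helper names `build_download_command` and `run_download` are hypothetical and should be adapted to the actual video list.

```python
import subprocess
from typing import List


def build_download_command(url_list: str, out_dir: str) -> List[str]:
    """Build a youtube-dl command that downloads every URL listed in url_list."""
    return [
        "youtube-dl",
        "--batch-file", url_list,  # text file with one video URL per line
        "--ignore-errors",         # keep going if a video is no longer available
        "--output", f"{out_dir}/%(id)s.%(ext)s",  # name each file by its video id
    ]


def run_download(url_list: str, out_dir: str) -> int:
    """Run the download and return youtube-dl's exit code (requires youtube-dl on PATH)."""
    return subprocess.run(build_download_command(url_list, out_dir), check=False).returncode
```

For example, `run_download("original_video_list.txt", "videos")` would attempt to fetch every listed video into `videos/`, where both names are placeholders.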

We further process these 89,800 video clips to generate the following two subsets.

Triplet dataset (for temporal frame interpolation):

The triplet dataset consists of 73,171 3-frame sequences with a fixed resolution of 448 x 256, extracted from 15K selected video clips from Vimeo-90K. This dataset is designed for temporal frame interpolation. Download links:
  • Testing set only (17GB): zip
  • Both training and test set (33GB): zip
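One common way to use a 3-frame sequence for interpolation is to feed the two outer frames to the model and treat the middle frame as the ground truth. The sketch below assumes that convention, which this page does not state explicitly; the helper name `split_triplet` is hypothetical, and a frame can be any object (an array, an image, or a file path).

```python
from typing import Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def split_triplet(frames: Sequence[Frame]) -> Tuple[Tuple[Frame, Frame], Frame]:
    """Split a 3-frame sequence into (input pair, interpolation target).

    Assumption: the first and last frames are the inputs and the middle
    frame is the target to be predicted.
    """
    if len(frames) != 3:
        raise ValueError(f"expected 3 frames, got {len(frames)}")
    first, middle, last = frames
    return (first, last), middle
```

A training loop would then compare the model's prediction from the input pair against the returned middle frame.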
Septuplet dataset (for video denoising, deblocking, and super-resolution):

Notice: We have recently updated our denoising test set to fix a bug in the generation of the denoising test data. The new quantitative results of our algorithm are reported in our updated paper.

The septuplet dataset consists of 91,701 7-frame sequences with a fixed resolution of 448 x 256, extracted from 39K selected video clips from Vimeo-90K. This dataset is designed for video denoising, deblocking, and super-resolution.
  • The test set for video denoising (16GB): zip
  • The test set for video deblocking (11GB): zip
  • The test set for video super-resolution (6GB): zip
  • The original test set (not downsampled or downgraded by noise) (15GB): zip
  • The original training + test set (82GB): zip
Acknowledgement

This work is supported by NSF RI-1212849, NSF BIGDATA-1447476, Facebook, Shell Research, and Toyota Research Institute.