Video Enhancement with Task-Oriented Flow
Tianfan Xue3* Baian Chen1 Jiajun Wu1 Donglai Wei2 William T. Freeman1,3
1MIT CSAIL      2Harvard University      3Google Research
* This work was done when Tianfan Xue was a student at MIT.

Many video processing tasks, e.g., temporal frame interpolation (top) and video denoising (bottom), rely on flow estimation. In many cases, however, precise optical flow estimation is intractable, and even when available it can be sub-optimal for the task. For example, although EpicFlow predicts the movement of objects precisely (see how well the flow field aligns with object boundaries), small errors in the flow field cause obvious artifacts in the interpolated frames, such as the blurred fingers in (I-c). With the task-oriented flow proposed in this work (I-d), those interpolation artifacts disappear, as in (I-e). Similarly, in video denoising, our task-oriented flow (II-d) deviates from EpicFlow (II-b) but leads to a cleaner output frame (II-e).


If you cannot access YouTube, please download our video here in 1080p.

Many video processing algorithms rely on optical flow to register different frames within a sequence. However, a precise estimation of optical flow is often neither tractable nor optimal for a particular task. In this paper, we propose task-oriented flow (TOFlow), a flow representation tailored for specific video processing tasks. We design a neural network with a motion estimation component and a video processing component. These two parts can be jointly trained in a self-supervised manner to facilitate learning of the proposed TOFlow. We demonstrate that TOFlow outperforms the traditional optical flow on three different video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution. We also introduce Vimeo-90K, a large-scale, high-quality video dataset for video processing to better evaluate the proposed algorithm.
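The core operation behind flow-based registration is warping a neighboring frame toward a reference frame using the estimated flow field. As a minimal illustration (not the paper's implementation, which uses a trainable spatial-transformer-style warping layer inside the network), the backward warping step can be sketched in NumPy for a single-channel frame:

```python
import numpy as np

def warp_backward(frame, flow):
    """Backward-warp `frame` toward a reference frame:
    output[y, x] = frame[y + flow_y, x + flow_x], sampled bilinearly.

    frame: (H, W) float array (single channel, for simplicity)
    flow:  (H, W, 2) float array of (dx, dy) displacements
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Source sampling coordinates, clamped to the image border.
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    # Bilinear interpolation between the four neighboring pixels.
    x0 = np.floor(sx).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    y0 = np.floor(sy).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0; wy = sy - y0
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

Because bilinear sampling is differentiable in the flow values, the loss on the final processed frame can be back-propagated through the warp into the motion estimation component, which is what allows the flow to be trained end-to-end for the task rather than for motion accuracy.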

@article{xue17toflow,
  author  = {Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T.},
  title   = {Video Enhancement with Task-Oriented Flow},
  journal = {arXiv},
  year    = {2017}
}

Vimeo-90k Dataset

We also build a large-scale, high-quality video dataset, Vimeo-90K. This dataset consists of 89,800 video clips downloaded from vimeo.com, covering a large variety of scenes and actions. It is designed for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.

Sampled Frames (Full-resolution samples are here):

We further process these 89,800 video clips to generate the following two subsets (the original video clips will be released soon).

Triplet dataset (for temporal frame interpolation):

The triplet dataset consists of 73,171 3-frame sequences with a fixed resolution of 448 x 256, extracted from 15K selected video clips from Vimeo-90K. This dataset is designed for temporal frame interpolation. Download links:
  • Testing set only (17GB): zip
  • Both training and test set (33GB): zip
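A minimal loader sketch for the triplet subset is below. It assumes the released archive layout (an assumption based on the public release, not stated on this page): frames under `sequences/<video>/<clip>/im1.png`..`im3.png`, with a split file such as `tri_trainlist.txt` listing one `<video>/<clip>` entry per line. `im1` and `im3` are the inputs and `im2` is the ground-truth middle frame.

```python
import os

def load_triplet_list(root, split_file="tri_trainlist.txt"):
    """Return a list of (im1, im2, im3) path triplets from the Vimeo-90K
    triplet subset. Layout assumed (hypothetical, verify against the
    downloaded archive):
      root/sequences/<video>/<clip>/im{1,2,3}.png
      root/tri_trainlist.txt   # one "<video>/<clip>" per line
    """
    triplets = []
    with open(os.path.join(root, split_file)) as f:
        for line in f:
            clip = line.strip()
            if not clip:
                continue  # skip blank lines at the end of the file
            seq_dir = os.path.join(root, "sequences", clip)
            triplets.append(tuple(
                os.path.join(seq_dir, "im%d.png" % i) for i in (1, 2, 3)))
    return triplets
```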
Septuplet dataset (for video denoising, deblocking, and super-resolution):

The septuplet dataset consists of 91,701 7-frame sequences with a fixed resolution of 448 x 256, extracted from 39K selected video clips from Vimeo-90K. This dataset is designed for video denoising, deblocking, and super-resolution.
  • The test set for video denoising (16GB): zip
  • The test set for video deblocking (11GB): zip
  • The test set for video super-resolution (6GB): zip
  • The original test set (not downsampled or downgraded by noise) (15GB): zip
  • The original training + test set (82GB): zip
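For evaluation, the pre-degraded test sets above should be used directly. For training, degraded inputs can be generated on the fly from the clean septuplets; as one illustrative possibility (the noise model here is an assumption for illustration, not the paper's exact degradation pipeline), additive Gaussian noise yields a denoising input:

```python
import numpy as np

def make_noisy_input(frames, sigma=0.1, seed=0):
    """Degrade clean frames with additive Gaussian noise to form a
    denoising input. The noise level `sigma` is a hypothetical choice;
    check the paper/release for the actual degradation settings.

    frames: (T, H, W, C) float array with values in [0, 1]
    """
    rng = np.random.default_rng(seed)
    noisy = frames + rng.normal(0.0, sigma, frames.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep values in valid range
```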

This work is supported by NSF RI-1212849, NSF BIGDATA-1447476, Facebook, Shell Research, and Toyota Research Institute.