Ray Conditioning:

Trading Photo-Consistency for Photo-realism in multi-view image generation

Eric Ming Chen 1 Sidhanth Holalkere1 Ruyu Yan1
Kai Zhang2 Abe Davis1

1 Cornell University 2 Adobe Research

ICCV 2023

We propose ray conditioning, a lightweight and geometry-free technique for multi-view image generation.


Ray conditioning enables photo-realistic multi-view image editing on natural photos via GAN inversion. The left half shows headshots of four individuals and their corresponding synthesized results from another viewpoint. The right half shows a portrait of two individuals (top row), the GAN inversion results of their faces (top row corners), and the resulting image (bottom row), in which their faces are replaced with synthesized faces looking in a different direction (bottom row corners).


Multi-view image generation has attracted particular attention recently due to its promising 3D-related applications, e.g., image viewpoint editing. Most existing methods follow a paradigm where a 3D representation is first synthesized and then rendered into 2D images to ensure photo-consistency across viewpoints. However, this explicit bias toward photo-consistency sacrifices photo-realism, causing geometry artifacts and loss of fine-scale detail when these methods are applied to edit real images. To address this issue, we propose ray conditioning, a geometry-free alternative that relaxes the photo-consistency constraint. Our method generates multi-view images by conditioning a 2D GAN on a light field prior. With explicit viewpoint control, state-of-the-art photo-realism, and identity consistency, our method is particularly suited to the viewpoint editing task.


For certain classes of images with shared canonical structure, e.g. faces, we observe that it is possible to achieve viewpoint control without optimizing explicitly for 3D structure. The result is a modified 2D GAN that offers precise control over generated viewpoints without sacrificing photo-realism. Furthermore, we are able to train on data that does not contain multiple viewpoints of any single subject, letting us leverage the same diverse and abundant data used for regular GANs. Our method combines the photo-realism of existing GANs with the control offered by geometric models, outperforming related methods in both generation and inversion quality. This makes it particularly well-suited for viewpoint editing in static images.

Ray Conditioning

Ray conditioning is a technique to condition an image generator on the ray bundle of a camera for explicit viewpoint control. The spatial inductive bias of ray conditioning enables the image synthesizer to learn multi-view consistency from only single-view posed image collections.
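As a concrete illustration, the conditioning signal can be built from each camera's ray bundle. The sketch below computes a per-pixel ray embedding using Plücker coordinates (a normalized direction plus a moment vector); the function name and this exact parameterization are illustrative assumptions for the sketch, not necessarily the paper's implementation.

```python
import numpy as np

def ray_bundle(K, cam2world, H, W):
    """Per-pixel Plücker ray embeddings for a pinhole camera.

    K: 3x3 intrinsics, cam2world: 4x4 camera-to-world pose.
    Returns an (H, W, 6) array of [direction, moment] per pixel,
    which can be fed to the generator as a spatial conditioning map.
    """
    # Pixel grid sampled at pixel centers
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3)
    # Back-project pixels to camera-space ray directions
    dirs_cam = pix @ np.linalg.inv(K).T
    # Rotate into world space and normalize
    R, t = cam2world[:3, :3], cam2world[:3, 3]
    d = dirs_cam @ R.T
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # Plücker moment: camera center crossed with the ray direction
    m = np.cross(np.broadcast_to(t, d.shape), d)
    return np.concatenate([d, m], axis=-1)                # (H, W, 6)
```

Because the embedding varies per pixel, it gives the 2D generator a spatial inductive bias tied to the camera, which is what lets viewpoint be controlled explicitly at synthesis time.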

Photo Collections as Unstructured Light Fields

Imagine we have used a dense camera grid to capture light fields for many different subjects, but then accidentally lost all of the images except one randomly chosen image per light field. We are left with an unstructured collection of light field observations from a diverse set of scenes. The goal of our generative model is then to picture what those missing images might have looked like.


Latent samples of humans and cats. Ray conditioning lets a user synthesize each identity from a set of specified viewpoints.

Compared to methods such as EG3D, which explicitly generate a 3D model, the results from ray conditioning are more photo-realistic and do not suffer from the geometry artifacts that can arise from 3D reconstruction. For the individuals below, we use GAN inversion to create novel views, comparing ray conditioning with PTI against EG3D with HFGI3D. Ray conditioning synthesizes more life-like eyes and a more realistic nose.
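To make the viewpoint-editing pipeline concrete, the sketch below shows the invert-then-resynthesize idea with a toy linear "generator" standing in for the GAN: optimize a latent code so the generator reproduces the input photo under its source ray bundle, then re-render that latent under a new ray bundle. All names and the linear model are illustrative assumptions; PTI additionally fine-tunes the generator weights around the recovered latent, a step omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(16, 8))   # latent -> "image" map (stand-in for the GAN)
B = rng.normal(size=(16, 6))   # ray-embedding -> "image" map

def G(w, c):
    """Toy linear generator conditioned on a ray embedding c."""
    return A @ w + B @ c

def invert(x, c, steps=4000, lr=5e-3):
    """Recover a latent w such that G(w, c) reproduces the target x."""
    w = np.zeros(8)
    for _ in range(steps):
        grad = 2 * A.T @ (G(w, c) - x)   # gradient of ||G(w, c) - x||^2
        w = w - lr * grad
    return w

# A "photograph" of a subject captured under source rays c_src
w_true = rng.normal(size=8)
c_src = rng.normal(size=6)
c_new = rng.normal(size=6)
x = G(w_true, c_src)

w_hat = invert(x, c_src)   # GAN inversion at the source viewpoint
novel = G(w_hat, c_new)    # re-render the same latent under new rays
```

The key point is that the subject's identity lives in the latent `w` while the viewpoint lives entirely in the ray conditioning `c`, so swapping `c` after inversion yields a novel view of the same subject.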


When trained on a multi-view dataset such as SRN Cars, ray conditioning can generate and render view-consistent videos at the computational cost of a 2D GAN. Results are shown with a truncation of 0.6.


We thank Noah Snavely for helpful discussions, and the authors of EG3D and HFGI3D for kindly sharing datasets with us.


@inproceedings{chen2023ray,
  author = {Eric Ming Chen and Sidhanth Holalkere and Ruyu Yan and Kai Zhang and Abe Davis},
  title = {Ray Conditioning: Trading Photo-consistency for Photo-realism in Multi-view Image Generation},
  booktitle = {ICCV},
  year = {2023}
}