[Paper Review] Semantic Pyramid for Image Generation, CVPR 2020

[Problem Definition]

A versatile and flexible framework for image generation tasks

[Previous Method]

- Feature inversion: 1) optimize an image so that its features match the target features, 2) train a CNN to invert the features

- Limitations: blurry outputs and limited output diversity

[Proposed Method]

0. Overall View

- Leveraging features from different semantic levels: use semantic information in a more continuous way

- Flexibility & Controllability: transformations at various levels, applicable to a wide range of images

- Diversity: diverse outputs

1. Architecture

- Encoder

Uses a pretrained VGG-16.

Coarse-to-fine generation (semantic generation pyramid)

- Decoder

Input: 1) Deep Feature 2) Noise vector 3) Mask
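
As a rough illustration of how the three decoder inputs could interact, here is a minimal NumPy sketch. The function name `fuse`, the additive combination, and the toy shapes are all assumptions for illustration; the notes above only say that the decoder receives a deep feature, a noise vector, and a mask. The idea shown is that the mask decides where the encoder feature is injected, and everywhere else the generator relies only on what it produced from the noise.

```python
import numpy as np

def fuse(gen_feat, enc_feat, mask):
    """Inject encoder features into generator activations where mask == 1.

    gen_feat, enc_feat: arrays of shape (C, H, W)
    mask: array of shape (H, W) with values in {0, 1}
    The additive form is an assumption, not taken from the notes.
    """
    return gen_feat + mask[None, :, :] * enc_feat

rng = np.random.default_rng(0)
z = rng.standard_normal(8)                        # noise vector
# Toy stand-in for generator activations driven by z (hypothetical).
gen_feat = np.tile(z[:, None, None], (1, 4, 4))
# Toy stand-in for a VGG-16 feature map at one level (hypothetical).
enc_feat = rng.standard_normal((8, 4, 4))

mask = np.zeros((4, 4))
mask[:, :2] = 1.0                                 # condition on the left half only

fused = fuse(gen_feat, enc_feat, mask)
print(fused.shape)
```

In this sketch the right half of `fused` is untouched by the encoder feature, which is the sense in which a spatial mask allows partial control.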

2. Training

- During training, the layer and the mask are chosen at random

- At generation time, only the desired level(s) of the encoder (classification network) are used.
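
The random layer and random mask selection could be sketched as follows. This is only one plausible reading: the function name `sample_condition`, the choice of a single level per step, and the rectangular mask shape are assumptions, since the notes do not specify the sampling scheme.

```python
import numpy as np

def sample_condition(level_sizes, rng):
    """Pick one feature level at random and a random rectangular mask for it.

    level_sizes: spatial size (H == W) of each encoder level, e.g. [32, 16, 8]
    Returns the chosen level index and one mask per level; masks of
    non-selected levels are all zeros, so those features are ignored.
    """
    level = int(rng.integers(len(level_sizes)))
    masks = [np.zeros((s, s)) for s in level_sizes]

    s = level_sizes[level]
    top, left = (int(v) for v in rng.integers(0, s, size=2))
    bottom = int(rng.integers(top + 1, s + 1))    # exclusive end, at least 1 row
    right = int(rng.integers(left + 1, s + 1))    # exclusive end, at least 1 column
    masks[level][top:bottom, left:right] = 1.0
    return level, masks

rng = np.random.default_rng(0)
level, masks = sample_condition([32, 16, 8], rng)
print(level, [int(m.sum()) for m in masks])
```

At generation time the same interface can be used deterministically: set the mask of the desired level by hand and leave the others at zero.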

3. Loss

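- 1) Adversarial Loss: uses LSGAN.

- 2) Reconstruction Loss: similar to a perceptual loss. Only the feature levels used for generation are compared.

Comparing after max pooling makes the comparison higher-level than a pixel-level one.

- 3) Diversity Loss: a typical diversity loss (the difference seems to be roughly that the numerator and denominator are swapped?)

- 4) Total loss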
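
The three loss terms above can be sketched in NumPy as follows. The LSGAN part uses the standard least-squares form with targets 1 for real and 0 for fake. The other two are assumptions: the reconstruction term is written as a masked L1 distance between max-pooled features (pool size `k` and the L1 norm are my choices), and the diversity term follows the "swapped ratio" reading of the note, i.e. latent distance divided by output distance, to be minimized. Loss weights for the total are not given in the notes, so none are shown.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: real -> 1, fake -> 0."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push D(G(z)) toward 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling on a (C, H, W) array (H, W divisible by k)."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def masked_recon_loss(feat_real, feat_fake, mask, k=2):
    """Masked L1 distance between max-pooled features (assumed form).

    feat_real, feat_fake: (C, H, W) features at the level used for generation
    mask: (H, W) mask that was applied at that level
    """
    diff = np.abs(max_pool(feat_real, k) - max_pool(feat_fake, k))
    m = max_pool(mask[None], k)                   # pooled mask, shape (1, H/k, W/k)
    denom = max(m.sum() * diff.shape[0], 1.0)     # avoid division by zero
    return float((diff * m).sum() / denom)

def diversity_loss(z1, z2, out1, out2, eps=1e-8):
    """Latent distance over output distance (assumed reading); smaller is more diverse."""
    return float(np.linalg.norm(z1 - z2) / (np.linalg.norm(out1 - out2) + eps))

print(lsgan_g_loss(np.zeros(4)))
print(max_pool(np.arange(16.0).reshape(1, 4, 4)))
```

Pooling before comparing means small spatial shifts inside each `k x k` window do not change the loss, which is one way to see why the comparison is less pixel-bound.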

[Experiment]

- Internal Experiment (Ablation + etc)

- External Experiment (Comparison) - not included in the paper

- Application

Re-labeling: since the class label is fed into the generator, changing this label yields a different result.

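[Open Questions]

- The idea of applying a mask to enable partial transformations could probably be used in a way similar to the modulation step in StyleGAN2.

- This is a typical encoder-decoder structure, and using a pretrained network is not a new idea either, so why did a paper like this appear so late? Were there no similar papers?

- What is the image size? Would it still work at larger resolutions?

- Why is there no comparison with other papers?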
