[Paper Review] MonkeyNet (MOviNg KEYpoints Network)

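Problem Definition

Given a Still Image, generate a new video that follows the motion of a Driving Video.

(That is, the task splits into two parts: [(1) comparing the motion between the Still Image and the Driving Video] + [(2) generating the video].)

(The paper's contribution lies in part (1): how it produces the Keypoints and the Optical Flow.)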

- Purpose: extract sparse keypoints

- Property: Unsupervised (How?)

- Flow: (0) image input (1) heatmap extraction (2) Softmax activation (3) keypoint output

- Network structure: [Unet] + [Softmax Activation (turns the heatmaps into confidence maps)]

- (3) Keypoint structure: K heatmap channels -> K keypoints

- (3) Equation: coordinate h_k + covariance (so that orientation is captured)

- Reference paper (on heatmap generation)

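(I have not confirmed how the equations above are used in the code; my guess is that it is the gaussian2kp part.)

(How is the heatmap obtained? Does a higher probability mean a higher heatmap value? If the method is unsupervised or self-supervised, how can it know what should be high or low when there is no ground truth?)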
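A minimal sketch of the flow above (heatmap -> Softmax -> keypoint), written only to make the "coordinate h_k + covariance" step concrete. It assumes h_k is the probability-weighted mean pixel coordinate of the k-th confidence map and the covariance is the matching second central moment. The function names `softmax2d` and `keypoint_from_heatmap` are made up for this note and are not taken from the authors' code. Since every step is a differentiable weighted sum, gradients from a downstream loss can reach the heatmap without a ground-truth heatmap, which may be part of the answer to the question above.

```python
import math

def softmax2d(logits):
    """Spatial softmax over an H x W grid of raw scores (list of lists)."""
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]

def keypoint_from_heatmap(logits):
    """Return ((mean_x, mean_y), 2x2 covariance) of one heatmap channel.

    Assumption: coordinates are plain pixel indices, x = column, y = row.
    """
    prob = softmax2d(logits)
    # Mean: expected pixel coordinate under the confidence map.
    mx = sum(p * x for row in prob for x, p in enumerate(row))
    my = sum(p * y for y, row in enumerate(prob) for p in row)
    # Covariance: spread around the mean; its principal axis gives an orientation.
    sxx = sxy = syy = 0.0
    for y, row in enumerate(prob):
        for x, p in enumerate(row):
            dx, dy = x - mx, y - my
            sxx += p * dx * dx
            sxy += p * dx * dy
            syy += p * dy * dy
    return (mx, my), [[sxx, sxy], [sxy, syy]]

# A horizontal bar of high scores: the mean sits at its centre and the
# variance along x is much larger than along y.
bar = [[0.0] * 5 for _ in range(5)]
for col in (1, 2, 3):
    bar[2][col] = 50.0
mean, cov = keypoint_from_heatmap(bar)
```

With K heatmap channels, the same function would be applied once per channel to give K keypoints.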

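- Purpose: perform alignment when reconstructing through the Generator

- Network structure: [Unet]

- Deformation Module

Problem: a Unet on its own suffers from misalignment between the source features and the target

Solution: use the Optical Flow to align the features with the target

Method: use a Bilinear Sampler (so that everything stays fully differentiable)

Problem: the Bilinear Sampler has a small receptive field

Solution: handled by predicting the difference (?)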
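To make the Deformation Module notes concrete, here is a small sketch of backward warping with a bilinear sampler. It assumes the flow stores a per-pixel offset (dx, dy) into the source feature map and that out-of-range samples are clamped to the border. `bilinear_sample` and `warp` are illustrative names, not the paper's code. It also shows why the receptive field is small: each output value depends on only the 4 pixels around the sampling location.

```python
import math

def bilinear_sample(img, x, y):
    """Sample img (H x W list of lists) at real-valued (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0  # fractional position between the 4 neighbours
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def warp(img, flow):
    """Backward warp: out[y][x] = img sampled at (x + dx, y + dy), flow[y][x] = (dx, dy)."""
    return [[bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(len(img[0]))]
            for y in range(len(img))]

feature = [[0.0, 1.0, 2.0],
           [3.0, 4.0, 5.0],
           [6.0, 7.0, 8.0]]
# Every output pixel reads the value one pixel to its right in the source.
shift_right = [[(1.0, 0.0)] * 3 for _ in range(3)]
warped = warp(feature, shift_right)
```

In a real network the same operation would run on multi-channel feature tensors inside an autodiff framework; the pure-Python version only illustrates the arithmetic.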

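- Purpose: estimate a [dense optical flow] from [sparse keypoints, image]

- Assumption: keypoints sit on rigid points (stiff, not changing shape; with no deformation the problem becomes easier)

- Coarse Estimation

- During training, the keypoint locations are given to the Discriminator (together with the frame), so that it focuses on the moving parts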
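For the Coarse Estimation step of the sparse-to-dense flow notes above, one way to picture it under the rigidity assumption: every keypoint carries a single displacement vector, and a soft mask says which pixels move with which keypoint. The sketch below assumes K+1 masks that sum to 1 per pixel (index 0 acting as a static background) and displacements defined as the keypoint's position difference between the two frames. Both the sign convention and the name `coarse_flow` are assumptions of this note, not details confirmed from the paper's code.

```python
def coarse_flow(masks, displacements):
    """Blend per-keypoint displacements into a dense flow field.

    masks:         K+1 soft masks, each an H x W list of lists, summing to 1 per pixel
                   (index 0 is treated as background here, which is an assumption).
    displacements: K+1 (dx, dy) pairs; the background entry is (0.0, 0.0).
    Returns an H x W grid of (dx, dy) tuples.
    """
    h, w = len(masks[0]), len(masks[0][0])
    flow = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Each pixel's motion is the mask-weighted sum of the keypoint motions.
            dx = sum(m[y][x] * d[0] for m, d in zip(masks, displacements))
            dy = sum(m[y][x] * d[1] for m, d in zip(masks, displacements))
            flow[y][x] = (dx, dy)
    return flow

# One keypoint moving by (2, -1); the right pixel is half background, half keypoint.
masks = [[[1.0, 0.5]],   # background mask
         [[0.0, 0.5]]]   # keypoint mask
displacements = [(0.0, 0.0), (2.0, -1.0)]
flow = coarse_flow(masks, displacements)
```

A finer correction could then be added on top of this coarse field; that part is left out here because the notes above do not describe it.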

Other Questions

- There seems to be no component that explicitly enforces temporal coherency? How is it maintained?
