VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
· 9 min read
Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS).
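A VAE (Variational Auto-Encoder) is difficult to apply to sequences with long-term structure.
To address this, a hierarchical decoder is proposed: it first outputs an embedding for each subsequence, and each subsequence is then generated independently from its own embedding.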
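
To make the hierarchical decoder idea concrete, here is a minimal sketch in plain Python. It is an illustration under several assumptions, not the model from any paper: the class name `HierarchicalDecoder`, the helper functions, the dimensions, and the random untrained weights are all hypothetical, and the top level is simplified to one linear map per subsequence position rather than a recurrent network. What it shows is the two-level flow: the latent vector is first turned into one embedding per subsequence, and each subsequence is then decoded from its embedding alone.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (hypothetical, untrained weights)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear_tanh(matrix, vector):
    """Apply tanh(W @ x) using plain lists."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in matrix]


class HierarchicalDecoder:
    """Two-level decoder sketch: latent -> subsequence embeddings -> outputs."""

    def __init__(self, latent_dim, embed_dim, out_dim,
                 num_subseq, steps_per_subseq, seed=0):
        rng = random.Random(seed)
        self.steps_per_subseq = steps_per_subseq
        # Top level: one latent-to-embedding map per subsequence position.
        # (Assumption: a simplification of a recurrent top-level decoder.)
        self.top_level = [make_matrix(embed_dim, latent_dim, rng)
                          for _ in range(num_subseq)]
        # Bottom level: weights shared by every subsequence.
        self.w_state = make_matrix(embed_dim, embed_dim, rng)
        self.w_out = make_matrix(out_dim, embed_dim, rng)

    def embeddings(self, z):
        """Produce one embedding per subsequence from the latent vector z."""
        return [linear_tanh(m, z) for m in self.top_level]

    def decode_subsequence(self, embedding):
        """Decode one subsequence using only its own embedding."""
        state = embedding
        outputs = []
        for _ in range(self.steps_per_subseq):
            state = linear_tanh(self.w_state, state)
            outputs.append(linear_tanh(self.w_out, state))
        return outputs

    def decode(self, z):
        """Decode the full sequence as a list of independent subsequences."""
        return [self.decode_subsequence(e) for e in self.embeddings(z)]


decoder = HierarchicalDecoder(latent_dim=4, embed_dim=8, out_dim=3,
                              num_subseq=2, steps_per_subseq=5)
z = [0.1, -0.2, 0.3, 0.4]
sequence = decoder.decode(z)
print(len(sequence), len(sequence[0]), len(sequence[0][0]))
```

Because `decode_subsequence` receives only an embedding, no state is carried from one subsequence to the next; the latent vector can influence the output only through the per-subsequence embeddings.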