Machine learning is the science of finding patterns and making predictions from data. However, not all patterns are equally useful or meaningful. Some patterns may reflect spurious correlations, confounding factors, or irrelevant variations that do not capture the underlying structure or causal relationships of the data.

For example, suppose we want to learn a model that can recognize faces from images. A naive model may learn to associate certain low-level features with faces, such as the presence of eyes, a nose, and a mouth. However, these features are not invariant to changes in pose, lighting, expression, or identity. A more robust model would learn to separate these factors of variation and represent them as independent dimensions in a latent space.

This is the idea behind disentanglement in machine learning: learning to break a representation down into narrowly defined factors of variation and encode each one as a separate latent dimension. Ideally, changing a single dimension then changes a single factor (say, pose) while leaving the others (lighting, expression, identity) untouched, mirroring how humans reason about a scene at both high and low levels of abstraction.
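
To make this concrete, here is a minimal latent-traversal sketch. The `decoder` below is a hypothetical stand-in for a trained VAE decoder or GAN generator (here just a fixed random linear map), and the choice of dimension 3 as "pose" is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, img_size = 10, 64

# Stand-in for a trained decoder/generator (hypothetical): a fixed random
# linear map from latent space to a 64x64 "image".
W = rng.normal(size=(img_size * img_size, latent_dim))

def decoder(z):
    return (W @ z).reshape(img_size, img_size)

# Latent traversal: vary one dimension (dim 3, hoped to encode "pose")
# while holding every other dimension fixed. In a disentangled
# representation, only that single factor should change across the images.
z = np.zeros(latent_dim)
direction = np.eye(latent_dim)[3]
images = [decoder(z + t * direction) for t in np.linspace(-3.0, 3.0, 7)]
```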

Disentanglement has many benefits for machine learning applications: commonly cited ones include more interpretable representations, controllable generation (editing one factor while holding the rest fixed), and easier reuse of the learned factors in downstream tasks.

Disentanglement is often pursued with unsupervised or self-supervised learning techniques, such as variational autoencoders (VAEs) [1-3], generative adversarial networks (GANs) [4, 5], contrastive learning methods [6, 7], and bootstrapped self-supervision [8]. These techniques aim to learn a low-dimensional latent representation of the data that captures the most salient and statistically independent factors of variation.
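
As a rough sketch of the VAE route, the β-VAE objective of [1] puts a weight β > 1 on the KL term of the standard VAE loss, which pressures the latent dimensions toward independence. The architecture below is a placeholder (fully connected layers on flattened inputs), not the setup from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    """Minimal beta-VAE sketch; encoder/decoder are placeholder MLPs."""
    def __init__(self, input_dim=784, latent_dim=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)       # posterior mean
        self.logvar = nn.Linear(256, latent_dim)   # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, input_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    # Reconstruction term plus a beta-weighted KL divergence to an isotropic
    # Gaussian prior; beta > 1 trades reconstruction for disentanglement [1].
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

model = BetaVAE()
x = torch.rand(32, 784)                  # stand-in batch of flattened images
x_recon, mu, logvar = model(x)
beta_vae_loss(x, x_recon, mu, logvar).backward()
```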

However, disentanglement is not a precisely defined concept, and there is no consensus on how to measure it. Many metrics have been proposed to quantify it, but they often rely on assumptions or heuristics (for instance, access to ground-truth generative factors, or a one-to-one mapping between factors and latent dimensions) that may not hold in general. Moreover, there is no clear evidence that disentanglement reliably improves performance or generalization on downstream tasks.
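
To illustrate why measurement is tricky, here is one crude heuristic (a hypothetical sketch, not any of the published metrics): correlate each latent dimension with each ground-truth factor and ask whether every factor is dominated by a single dimension. Note how much it already bakes in, namely access to ground-truth factors and linear, one-to-one relationships:

```python
import numpy as np

def correlation_matrix(latents, factors):
    """Absolute Pearson correlation between latent dims and factors.

    latents: (n_samples, latent_dim) encoded representations
    factors: (n_samples, n_factors) ground-truth generative factors
    """
    z = (latents - latents.mean(0)) / (latents.std(0) + 1e-12)
    f = (factors - factors.mean(0)) / (factors.std(0) + 1e-12)
    return np.abs(z.T @ f) / len(z)   # shape: (latent_dim, n_factors)

def dominance_score(corr):
    # For each factor, how dominant is its single best latent dimension?
    # 1.0 means each factor is captured by exactly one dimension; lower
    # values mean the factor is smeared across many dimensions.
    return float((corr.max(0) / (corr.sum(0) + 1e-12)).mean())
```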

Overall, disentanglement remains a challenging and active research area, with many promising directions and applications. By disentangling the relevant factors of variation in data, machine learning models can become more powerful and flexible, and can enable new forms of human-machine interaction and creativity. Next week, I will discuss in detail some papers that have used disentanglement to improve their models' performance.

References

[1] Burgess, Christopher P., et al. "Understanding disentangling in β-VAE." arXiv preprint arXiv:1804.03599 (2018).

[2] Kim, Minyoung, et al. "Bayes-Factor-VAE: Hierarchical Bayesian deep auto-encoder models for factor disentanglement." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

[3] Zhao, Shengjia, Jiaming Song, and Stefano Ermon. "InfoVAE: Balancing learning and inference in variational autoencoders." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019.

[4] Choi, Yunjey, et al. "StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[5] Kim, Taeksoo, et al. "Learning to discover cross-domain relations with generative adversarial networks." International Conference on Machine Learning. PMLR, 2017.

[6] Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International Conference on Machine Learning. PMLR, 2020.

[7] Kahana, Jonathan, and Yedid Hoshen. "A contrastive objective for learning disentangled representations." Computer Vision – ECCV 2022, Proceedings, Part XXVI. Cham: Springer Nature Switzerland, 2022.

[8] Grill, Jean-Bastien, et al. "Bootstrap Your Own Latent: A new approach to self-supervised learning." Advances in Neural Information Processing Systems 33 (2020): 21271-21284.