How to Use t-SNE Effectively

Wattenberg, Martin; Viégas, Fernanda; Johnson, Ian

doi:10.23915/distill.00002

Acknowledgments

We are grateful to Chris Olah and Shan Carter for creating this platform, and for excellent design and editorial help from Shan Carter. Daniel Smilkov, James Wexler, and Chi Zeng provided many helpful comments. We also thank Andrej Karpathy for creating the tsnejs library used in the interactive diagrams.

This work was made possible by the support of the Google Brain team.

Edited on Oct. 18, 2016 to describe and correct issues when perplexity is defined to be larger than the number of points. Thanks to Laurens van der Maaten for pointing this out.

References

Visualizing data using t-SNE [PDF]
Maaten, L.v.d. and Hinton, G., 2008. Journal of Machine Learning Research, Vol 9(Nov), pp. 2579—2605.

Updates and Corrections

View all changes to this article since it was first published. If you see a mistake or want to suggest a change, please create an issue on GitHub.

Citations and Reuse

Diagrams and text are licensed under Creative Commons Attribution CC-BY 2.0, unless noted otherwise, with the source available on GitHub. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: “Figure from …”.

For attribution in academic contexts, please cite this work as

Wattenberg, et al., "How to Use t-SNE Effectively", Distill, 2016. http://doi.org/10.23915/distill.00002

BibTeX citation

@article{wattenberg2016how,
  author = {Wattenberg, Martin and Viégas, Fernanda and Johnson, Ian},
  title = {How to Use t-SNE Effectively},
  journal = {Distill},
  year = {2016},
  url = {http://distill.pub/2016/misread-tsne},
  doi = {10.23915/distill.00002}
}

How to Use t-SNE Effectively

Although extremely useful for visualizing high-dimensional data, t-SNE plots can sometimes be mysterious or misleading. By exploring how it behaves in simple cases, we can learn to use it more effectively.

1. Those hyperparameters really matter

2. Cluster sizes in a t-SNE plot mean nothing

3. Distances between clusters might not mean anything

4. Random noise doesn’t always look random.

5. You can see some shapes, sometimes

6. For topology, you may need more than one plot

Conclusion

Acknowledgments

References

Updates and Corrections

Citations and Reuse