Machine Learning Methods in Visualisation for Big Data
Tutorial co-located with EuroVis 2016, June 2016, Groningen, the Netherlands
Monday June 6, 2016, 14:00-18:00
Machine learning techniques can help address big data challenges by simplifying and summarising large data sets for visualisation. Machine learning provides methods that summarise very large data sets, whereas visualisation leverages the human visual system to help find unanticipated patterns. In this tutorial, we cover machine learning methods relevant to visualisation. In addition to exploring the applicability, strengths, and weaknesses of such approaches, we provide links to available software tools that can help solve machine learning problems.
Registration for this tutorial is handled through the EuroVis 2016
registration system. Mark the checkbox of the tutorial during registration to
indicate you wish to attend.
Participation in this tutorial is limited to 50 attendees, and registration
is on a first-come, first-served basis.
If you have any further queries, please contact the tutorial organisers at: firstname.lastname@example.org
Machine Learning (ML) approaches provide powerful tools for the classification and summarisation of large quantities of data. These automated or semi-automated approaches allow today's systems to scale to large data sets. The methods are critical for the big data problem and can benefit the field of information visualisation through increased scalability. Methods from ML can be used to simplify data so that it becomes accessible to visualisation systems. Parametric models are particularly well suited to big data because they can be trained on a representative subset and afterwards applied to all the data. This makes them much more scalable and also allows the user to test generalisation, which matters because all analytics is fundamentally statistical.
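The idea of training a parametric model on a subset and then applying it to all the data can be sketched as follows. This is only an illustration under stated assumptions: it uses principal component analysis (PCA) as the parametric model, NumPy as the library, and a synthetic data set; the function names are hypothetical and are not taken from the tutorial materials or its software tools.

```python
import numpy as np

def fit_pca(sample, n_components=2):
    """Fit a linear projection (PCA) on a sample; returns (mean, components)."""
    mean = sample.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(data, mean, components):
    """Map data of shape (n, d) to low-dimensional coordinates of shape (n, k)."""
    return (data - mean) @ components.T

def reconstruction_error(data, mean, components):
    """Mean squared error after projecting down and mapping back up."""
    coords = project(data, mean, components)
    restored = coords @ components + mean
    return float(np.mean((data - restored) ** 2))

rng = np.random.default_rng(0)

# Synthetic stand-in for a large data set (an assumption for this sketch):
# 2-D latent structure embedded in 10 dimensions, plus a little noise.
latent = rng.normal(size=(100_000, 2))
mixing = rng.normal(size=(2, 10))
data = latent @ mixing + 0.1 * rng.normal(size=(100_000, 10))

# Train the model on a small random subset only.
subset_idx = rng.choice(len(data), size=1_000, replace=False)
mean, components = fit_pca(data[subset_idx])

# Apply the trained model to all the data; this is a single matrix product.
coords = project(data, mean, components)

# Test generalisation: compare the error on the training subset with the
# error on the points the model never saw.
held_out = np.ones(len(data), dtype=bool)
held_out[subset_idx] = False
train_error = reconstruction_error(data[subset_idx], mean, components)
test_error = reconstruction_error(data[held_out], mean, components)
print(f"train error: {train_error:.4f}, held-out error: {test_error:.4f}")
```

If the held-out error is close to the training error, the subset was representative enough for the projection to be trusted on the full data set.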
Information Visualisation (IV) provides interactive methods for the visual representation of data. The tools and techniques of our field leverage the human perceptual system and the ability of the user to explore and explain patterns in data. Our systems can provide a means to discover unanticipated patterns in data sets that can subsequently be investigated quantitatively. However, the visual system has its limitations. Human vision is intrinsically limited to two or three dimensions, and only a few combined features can be handled in a comprehensible way. For high-dimensional data, exhaustive human analysis of all data features and their combinations can become arduous or infeasible.
Machine learning can help IV by providing methods to summarise and reduce complex data to levels that can be understood by humans; such summarised representations can then be integrated into visualisation systems, complementing their existing capabilities. ML is particularly powerful in this context because ML algorithms are well adapted to extracting relevant information from high-dimensional data sets following mathematical objectives. The challenge of applying machine learning to information visualisation is that it is an unsupervised task: there is no target variable providing a correct answer for the model to aim at. This is because the goal is often to explore data beyond what is known through existing annotation or hypotheses. This challenge has required innovation from the ML community, particularly in devising effective optimisation criteria (the so-called 'cost function'). It also leads to challenges in comparing the results from different methods to determine which are most effective.
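To make the notion of a cost function for an unsupervised projection concrete, the sketch below scores two candidate 2-D layouts of the same data by how well they preserve pairwise distances. The assumptions are ours: a normalised stress measure in the style of multidimensional scaling is only one of many possible criteria, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows, as an (n, n) matrix."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def normalised_stress(high, low):
    """Distance-preservation cost: 0 means all pairwise distances are kept."""
    d_high = pairwise_distances(high)
    d_low = pairwise_distances(low)
    return float(np.sqrt(((d_high - d_low) ** 2).sum() / (d_high ** 2).sum()))

rng = np.random.default_rng(1)

# Synthetic data (an assumption): 200 points in 5 dimensions, where the
# first two dimensions carry most of the variance.
data = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 0.5, 0.5, 0.5])
centred = data - data.mean(axis=0)

# Candidate A: project onto the two leading principal directions.
_, _, vt = np.linalg.svd(centred, full_matrices=False)
pca_layout = centred @ vt[:2].T

# Candidate B: naively keep the last two raw coordinates.
naive_layout = centred[:, 3:5]

stress_pca = normalised_stress(centred, pca_layout)
stress_naive = normalised_stress(centred, naive_layout)
print(f"stress (PCA layout):   {stress_pca:.3f}")
print(f"stress (naive layout): {stress_naive:.3f}")
```

A lower value indicates that the layout distorts distances less under this particular criterion. A different cost function, for example one that emphasises neighbourhood preservation, may rank the same layouts differently, which is part of why comparing methods is difficult.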
In order to tackle big data problems our two communities need to leverage the advantages that
the two fields can provide to each other. However, our fields are only beginning to work
together. This tutorial is designed to cover relevant machine learning methodologies for
visualisation and to provide some practical resources that participants can use for the
visualisation techniques and systems that they design. Our tutorial assumes a visualisation
audience and covers the relevant tools and techniques for machine learning from this
perspective. In addition to the scientific content, we present existing software solutions that
researchers and practitioners can use in order to apply these techniques immediately.
Course notes and materials
The slides and materials will be made available online before the tutorial.
Ian Nabney is the Director of the System Analytics Research Institute and Head of both the Computer Science and Mathematics departments at Aston University. He received his BA in Mathematics from Oxford University and a PhD in Mathematics from Cambridge University. He has over 20 years’ experience in machine learning research and has published more than 80 papers (1900 citations). He is the system architect for the Netlab pattern analysis toolbox, which has been downloaded more than 40,000 times since 1999 (the accompanying book has been through three reprints), and for the Data Visualisation and Modelling System (DVMS), which integrates data projection and information visualisation techniques to provide a rich interactive environment for data exploration and visual analytics. DVMS will be used for the demonstrations of generative models. He has won grants worth more than 3M GBP from EPSRC, the EU, TSB, and industry, and has supervised 11 PhD students to completion. He is the Chair of the Natural Computing Applications Forum, a principal mechanism in the UK for the exchange of ideas between academics and industry on natural computing technology and practical applications.
Jaakko Peltonen is an associate professor of statistics (data analysis) at the School of Information Sciences, University of Tampere. He is also currently an academy research fellow at Aalto University, where he is a PI of the Statistical Machine Learning and Bioinformatics research group. He received his D.Sc. from Helsinki University of Technology in 2004. He is an associate editor of Neural Processing Letters and an editorial board member of Heliyon. He has served on the organising committees of seven international conferences and one international summer school and on the program committees of 24 international conferences/workshops, and has refereed for numerous international journals and conferences. He is an expert in statistical machine learning methods for exploratory data analysis, visualisation of data, and learning from multiple sources.
Daniel Archambault received his PhD in Computer Science from the University of British
Columbia, Canada in 2008. He is currently a Lecturer of Computer Science at Swansea University
in the United Kingdom. During his post-doctoral studies at University College Dublin, he applied his
expertise in information visualisation to help visualise the results of machine learning
approaches, particularly in the area of social media visualisation. This work inspired him to
co-chair the AAAI ICWSM Workshop on Social Media Visualisation (SocMedVis 2012 and 2013). His
other areas of expertise lie primarily in graph visualisation and drawing as well as perceptual
factors in information visualisation.
This tutorial was first discussed at Dagstuhl Seminar 15101, "Bridging Information Visualization with Machine Learning".