Machine Learning with Point Patterns
The fundamental tasks of machine learning include classification, regression, clustering, density estimation, dimensionality reduction, and novelty detection [1]. Classification and regression belong to supervised learning, which aims to find a function mapping given inputs to outputs (called training data and labels, respectively). When the outputs are discrete and finite, the task is called classification; when the outputs are continuous, it is called regression. Clustering, density estimation, and dimensionality reduction are unsupervised learning tasks, which are not provided with training outputs. Clustering aims to find the structure of the input data, density estimation learns the distribution representing the input data, and dimensionality reduction compresses high-dimensional data for visualization or feature extraction. Novelty detection is a semi-supervised learning task, in which only part of the training data is labelled. Specifically, the training data consist only of normal data, and the task is to infer some ‘characterization’ of these data, which is then used to identify novel data.
Point patterns (sets or multi-sets of unordered points) arise in numerous data analysis problems, where they are commonly known as ‘bags’: for example, in multiple instance learning [2], natural language processing and information retrieval (‘bag-of-words’), image and scene categorization (‘bag-of-visual-words’), and sparse data (‘bag-of-features’); see e.g. [3] and references therein.
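To make the notion of a ‘bag’ concrete, the following is a minimal sketch of a bag-of-words treated as a multi-set. The helper name to_point_pattern and the two example sentences are illustrative assumptions, not part of the project description.

```python
from collections import Counter

def to_point_pattern(points):
    """Represent a bag of hashable points as a multi-set (point -> multiplicity)."""
    return Counter(points)

# Two hypothetical documents containing the same words in a different order.
doc_a = "the cat sat on the mat".split()
doc_b = "on the mat the cat sat".split()

pp_a = to_point_pattern(doc_a)
pp_b = to_point_pattern(doc_b)

# Order is irrelevant: both documents give the same point pattern.
same_pattern = pp_a == pp_b

# Cardinality counts repeated points, so "the" contributes twice.
cardinality = sum(pp_a.values())
```

The cardinality of a point pattern is itself informative and varies from one pattern to the next, which is one reason fixed-length feature vectors are an awkward fit for this kind of data.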
This project investigates machine learning for point patterns (PPs). There are two main differences between conventional learning and PP learning. First, the inputs of PP learning are multi-sets of dependent points. Second, the outputs of PP learning are labels of whole PPs, not of individual points as in conventional learning. Learning with sets also raises some interesting theoretical differences. It is fundamental to study methods for learning with PPs via Bayesian methods or Deep Learning.
Generalisations of model-based learning tasks such as classification, novelty detection, and clustering to PPs have been developed in [3]. This methodology can also be used to generalise regression and density estimation to PPs, which makes a good Masters research topic. Such an investigation involves mastering the area of point process modelling and estimation.
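As a rough illustration of what model-based classification of whole PPs can look like, the sketch below scores a pattern by a cardinality term plus a sum of per-point log-densities and picks the highest-scoring class. It assumes a Poisson cardinality and points drawn independently from a diagonal Gaussian, which is a deliberate simplification and not necessarily the formulation in [3]; all function names and the synthetic data are hypothetical.

```python
import math
import numpy as np

def fit_class(patterns):
    """Fit a toy model: Poisson rate for cardinality, diagonal Gaussian for points."""
    counts = [len(p) for p in patterns]
    points = np.vstack([p for p in patterns if len(p) > 0])
    return {
        "rate": max(float(np.mean(counts)), 1e-12),
        "mean": points.mean(axis=0),
        "var": points.var(axis=0) + 1e-6,  # small floor avoids division by zero
    }

def log_likelihood(pattern, model):
    """Log-likelihood of one pattern (array of shape (n, d)) under the toy model."""
    n = len(pattern)
    # Poisson log-pmf of the number of points.
    ll = n * math.log(model["rate"]) - model["rate"] - math.lgamma(n + 1)
    if n > 0:
        diff = pattern - model["mean"]
        # Independent diagonal-Gaussian log-density, summed over points and dimensions.
        ll += float(np.sum(-0.5 * (np.log(2 * np.pi * model["var"])
                                   + diff ** 2 / model["var"])))
    return ll

def classify(pattern, models):
    """Return the label whose model gives the pattern the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(pattern, models[label]))

# Synthetic training data: class "A" has few points near 0, class "B" many near 3.
rng = np.random.default_rng(0)
train_a = [rng.normal(0.0, 1.0, size=(n, 2)) for n in (4, 5, 6, 5)]
train_b = [rng.normal(3.0, 1.0, size=(n, 2)) for n in (14, 15, 16, 15)]
models = {"A": fit_class(train_a), "B": fit_class(train_b)}

label_small = classify(np.zeros((5, 2)), models)
label_large = classify(np.full((15, 2), 3.0), models)
```

The label is assigned to the whole pattern, and both the number of points and their locations contribute to the decision.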
Another research topic is the generalisation of Deep Learning techniques to PP learning. A good Masters topic is to extend the key result that neural networks universally approximate continuous functions on Euclidean spaces to spaces of finite sets. Such a result can be developed by using the Matheron topology on the space of finite sets to define continuity of mappings from finite sets to finite sets.
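For intuition about neural networks that take finite sets as input, here is a minimal sketch of a sum-pooling construction, f(X) = rho(sum of phi(x) over x in X), which is invariant to the order of the points by design. It is only an illustration and not the result proposed above: the weights are random and untrained, the layer sizes are arbitrary assumptions, and the output is a scalar rather than a finite set.

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(2, 8))  # hypothetical per-point encoder weights
W_rho = rng.normal(size=(8, 1))  # hypothetical readout weights

def set_function(points):
    """Evaluate f(X) = rho(sum_x phi(x)) for points of shape (n, 2)."""
    encoded = np.tanh(points @ W_phi)  # phi applied to every point separately
    pooled = encoded.sum(axis=0)       # summation does not depend on point order
    return float((np.tanh(pooled) @ W_rho)[0])

X = rng.normal(size=(5, 2))
shuffled = X[rng.permutation(len(X))]

value = set_function(X)
value_shuffled = set_function(shuffled)
```

Because the points are combined by summation, reordering them leaves the output unchanged up to floating-point rounding, and the same function accepts sets of any cardinality, including the empty set.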
References
[1] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[2] J. Amores, “Multiple instance classification: Review, taxonomy and comparative study,” Artificial Intelligence, 201:81–105, 2013.
[3] B.-N. Vo, N. Dam, D. Phung, Q. Tran, and B.-T. Vo, “Model-Based Learning for Point Pattern Data,” Pattern Recognition, 84(12):136–151, 2018.