Statistics Seminars 2018 - 2019

Statistics Seminars will be held in Room 1.25, O’Brien Centre for Science (North), on Thursdays at 3.00pm.

Speaker: Paul McNicholas - Canada Research Chair in Computational Statistics, Professor, Department of Mathematics and Statistics, McMaster University

Title: Clustering, classification and data science

Date: Thursday 11th October

Abstract: An overview of clustering and classification, and where they fit within data science and statistics, will be presented. A statistical, model-based framework for clustering and classification will be discussed. Several real data examples will be used to illustrate different approaches and/or subtleties, including high-dimensional data, three-way data, and handling outliers. The talk concludes with a discussion of ongoing and future work.

Title:    The Sparse Latent Position Model for nonnegative weighted networks

Speaker:    Riccardo Rastelli (UCD)

Date:    Thursday 18th October 2018

Time:    3pm

Location:     Room 1.25, O’Brien Centre for Science (North)

Abstract:
The Latent Position Model (LPM) is one of the fundamental models used in statistical network analyses. The LPM postulates that the nodes of a graph are characterised by a latent position in a Euclidean space, and that edges are created using the pairwise latent distances. The main difficulty in using these models is scalability, since estimation algorithms generally have quadratic complexity in the number of nodes. In addition to the high computational requirements, model selection has not been addressed so far: in general, the number of latent dimensions is chosen arbitrarily. In this talk, I will introduce a new type of LPM that can be used to analyse bipartite and unipartite networks with nonnegative edge values. The proposed approach combines and adapts a number of ideas from the literature on latent variable network models. The resulting framework is able to capture important features that are generally exhibited by observed networks, such as sparsity and heavy-tailed degree distributions. A fast variational Bayesian algorithm is proposed to estimate the parameters of the model. An advantage of the proposed method is that the number of latent dimensions can be automatically deduced within a single algorithmic framework, hence addressing both main research questions. Finally, applications of the proposed methodology are illustrated on artificial and real datasets, and comparisons with other existing procedures are provided.

Coffee and tea will be available in the School Common Room afterwards

Title:    Exploring Fuzzy clustering of multivariate skew data

Speaker:    Francesca Greselin (University of Milano Bicocca)

Date:    Thursday 25th October 2018

Time:    3pm

Location:     Room 1.25, O’Brien Centre for Science (North)

Abstract:

With the increasing availability of multivariate datasets, asymmetric structures in the data call for more realistic assumptions than the incredibly useful paradigm given by the Gaussian distribution. Moreover, when performing maximum likelihood (ML) estimation, we know that a few outliers in the data can affect the estimates and hence lead to unreliable inference.

To meet these challenges, more flexible and solid tools for modeling heterogeneous skew data are needed. Our fuzzy clustering method is based on mixtures of Skew Gaussian components, combined with the joint use of impartial trimming and constrained estimation of scatter matrices, in a modified maximum likelihood approach. The algorithm generates a set of membership values that are used to fuzzy-partition the data and to contribute to the robust estimates of the mixture parameters.

The new methodology is shown to be resistant to different types of contamination by applying it to artificial data. A brief discussion of the tuning parameters is given, together with some heuristic tools for their choice. Finally, synthetic and real datasets are analyzed to show how intermediate membership values are estimated for observations lying at cluster overlap, while cluster cores are composed of observations that are assigned to a cluster in a crisp way.

We will also show the advantages of the fuzzy approach with respect to classical model-based clustering via finite mixtures. In the former we can set the level of fuzziness and/or the percentage of units to be crisply assigned to groups, as well as the relative entropy of the solution, while in the mixture approach they arise as an uncontrolled byproduct of the estimated model.

(Joint work with Agustin Mayo Iscar and Luis Angel Garcia Escudero)

Coffee and tea will be available in the School Common Room afterwards

Title:    Forward-stagewise clustering: An algorithm for convex clustering

Speaker:    Mimi Zhang (TCD)

Date:    Thursday 22nd November 2018

Time:    3pm

Location:     Room 1.25, O’Brien Centre for Science (North)


Abstract:

This talk presents an exceptionally simple algorithm, called forward-stagewise clustering, for convex clustering. Convex clustering has drawn recent attention since it nicely addresses the instability issue of traditional non-convex clustering methods. While existing algorithms can precisely solve convex clustering problems, they are sophisticated and produce (agglomerative) clustering paths that contain splits. This motivates us to propose an algorithm that only produces no-split clustering paths. The approach undertaken here follows the line of research initiated in the area of regression. Specifically, we apply the forward-stagewise technique to clustering problems and explain both theoretically and practically how the algorithm produces no-split clustering paths. We further suggest rules of thumb for the algorithm to be applicable to cases where clusters are non-convex. The performance of the proposed algorithm is evaluated through simulations and a real data application.

Coffee and tea will be available in the School Common Room afterwards

 

Title:    Getting the message across: natural history films and public engagement

Speaker:    Professor Adam Kane, UCD School of Biology and Environmental Sciences


Date:    Thursday 7th February 2019

Time:    3pm

Location:     Room 1.25, O’Brien Centre for Science (North)

Abstract:

We live in the Anthropocene, a critical time when our actions as a species are having a destructive effect on our planet and its biodiversity. The urgency of this global crisis contrasts with our “extinction of experience” with nature in an increasingly urbanised world. The importance of raising awareness and support from public opinion has long been recognised by scientists and conservationists. Indeed, campaigns aimed at raising awareness of environmental topics and changing public opinion have highlighted the pivotal role of emotional and entertainment values in triggering engagement. However, natural history films are often criticised for portraying an unrealistic view of nature, showing only pristine environments and side-stepping conservation issues for the sake of entertainment.

Using a big data approach, we measure the effect of a hugely popular BBC nature documentary, Planet Earth 2, on public interest and engagement with nature. We analyse tweets with the hashtag #PlanetEarth2 to evaluate audience reaction to the film, and visits to Wikipedia pages to assess audience engagement for information. Finally, we investigate the existence of long-term changes in public awareness and of potential proactive actions which could be attributed to the show.

Our findings suggest that blue-chip natural history documentaries like Planet Earth 2 can influence public interest in nature at both long and short timescales, an effect that is comparable to that generated by dedicated species awareness days. Despite this positive effect of a show that consistently frames nature in spectacular terms, we wonder whether a more realistic portrayal of the plight of the natural world would be even more beneficial to its conservation.

Coffee and tea will be available in the School Common Room afterwards


Title:    Forecasting with FAVAR: Macroeconomic versus financial factors


Speaker:    Alessia Paccagnini (Smurfit Business School, UCD)

Date:    Thursday 21st February 2019

Time:    3pm

Location:     Room 1.25, O’Brien Centre for Science (North)

Abstract:

We assess the predictive power of macroeconomic and financial latent factors on the key variables for the US economy before and after the recent Great Recession. We implement a forecasting horse race among Factor Augmented VAR (FAVAR), Classical, and Bayesian VAR models. Bayesian FAVAR models outperform the others. Focusing separately on macroeconomic and on financial latent factors, we find that the financial factors are the main drivers in the short run, while the macro factors affect production in the medium and long run.


Coffee and tea will be available in the School Common Room afterwards

Speaker: Andrew Smith (UCD)

Title:       "Designing Homework Assignments to Inhibit Plagiarism"

Date:       Monday 4th March 2019

Time:       12 noon

Location: Room 1.25, O’Brien Centre for Science (North)

Abstract: Many of us teach or tutor modules with continuous assessment. Students may be tempted to copy each other's homework scripts, even when this inhibits their own learning experience. Over the last year I have developed homework problems that are specific to each student, depending on their unique student number. Students may still cooperate on problem approaches, but the unique homework approach discourages mindless copying, while originators of copied work can be traced. Setting unique homework poses additional challenges in devising appropriate problems and grading student scripts. I show examples of problems I have used in statistics modules, discussing pitfalls and successes.
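
One way the idea of student-specific problems could be realised is sketched below. This is only an illustration and not the speaker's actual method: the function name `homework_parameters`, the assignment label, and the parameter ranges are all hypothetical. It assumes the student number is available as a string and derives reproducible problem parameters from it, so a grader can regenerate the same values later.

```python
import hashlib
import random


def homework_parameters(student_number: str, assignment: str = "hw1") -> dict:
    """Derive reproducible, student-specific parameters (hypothetical scheme)."""
    # Hash the assignment label together with the student number so that
    # each assignment yields a different, but repeatable, set of values.
    digest = hashlib.sha256(f"{assignment}:{student_number}".encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))  # seed a private generator from the hash
    return {
        "sample_size": rng.randint(20, 40),       # illustrative range
        "mean": round(rng.uniform(50.0, 60.0), 1),  # illustrative range
        "sd": round(rng.uniform(4.0, 8.0), 1),      # illustrative range
    }


if __name__ == "__main__":
    # "12345678" is a made-up student number used only for demonstration.
    print(homework_parameters("12345678"))
```

Because the parameters depend only on the student number and the assignment label, the grader can recompute the expected answer for any script without storing a per-student answer key.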


Title:    Count data: over/under-dispersion, zero-inflation and some extended models


Speaker:    John Hinde - Professor of Statistics, NUI Galway

Date:    Thursday 7th March 2019

Time:    3pm

Location:     Room 1.25, O’Brien Centre for Science (North)

Abstract:

The standard distribution for the analysis of count data is the Poisson. Frequently, in practice, it is too restrictive in that the variability in the data is either significantly greater (overdispersed) or less (underdispersed) than that implied by the model’s variance function. For the analysis of count data, McCullagh and Nelder (1989) say that overdispersion is the norm and not the exception, and this has been well studied; see Hinde and Demétrio (1998) and many subsequent articles presenting a wide range of distributions. An associated phenomenon is zero-inflation, where the data exhibit more zeros than expected under the Poisson model. Models allowing for zero-inflation include the zero-inflated mixture model and the two-stage hurdle model.

In this talk I will discuss these basic ideas and consider some generalisations of the Poisson distribution that provide greater flexibility, including COM-Poisson, discrete Weibull, and Poisson-Tweedie models. These will be illustrated with various datasets and I will attempt to make some general points about modelling and computation.
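
As a rough illustration of the two diagnostics mentioned above, the sketch below compares the sample variance with the sample mean (under a Poisson model these should be roughly equal) and the observed fraction of zeros with the Poisson value exp(-mean). The function name `dispersion_summary` and the example counts are made up for illustration and are not taken from the talk.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Crude checks for over/under-dispersion and excess zeros in count data."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the dispersion index is undefined")
    v = variance(counts)  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        "variance": v,
        # A ratio well above 1 suggests overdispersion, well below 1 underdispersion.
        "dispersion_index": v / m,
        "observed_zero_fraction": sum(1 for c in counts if c == 0) / len(counts),
        # Probability of a zero under a Poisson model with the same mean.
        "poisson_zero_fraction": math.exp(-m),
    }


if __name__ == "__main__":
    example_counts = [0, 0, 0, 0, 0, 1, 2, 3, 6, 8]  # invented data
    for name, value in dispersion_summary(example_counts).items():
        print(f"{name}: {value:.3f}")
```

These are descriptive checks only; they do not replace fitting and comparing the extended models discussed in the talk.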

Coffee and tea will be available in the School Common Room afterwards

 

Title:    Modelling ancestry on genome data with low coverage

Speaker:    Michael Salter-Townshend (UCD)

Date:    Thursday 28th March 2019

Time:    3pm

Location:     Room 1.25, O’Brien Centre for Science (North)

Abstract:

Recent advances have allowed accurate inference of the timings of and contributors to admixture events between human populations (wherein genetically diverged populations come together) from Single Nucleotide Polymorphism data. It has been shown that statistical methods based on Hidden Markov Models can accurately assign chromosomal segments of the admixed individuals to unseen ancestral populations, and simultaneously infer how these ancestral populations relate to observed modern populations. This has been accomplished in the context of high quality genotype data on many individuals, sampled from both the admixed target population and multiple extant reference populations. The most recent methods do not need to assume prior knowledge of the relationship between the unseen ancestral mixing groups and the reference panels. However, such methods have thus far not been extended to non-model organisms, for which high quality data are unavailable. Researchers in this area are increasingly reliant on low-coverage whole-genome sequencing data, for which genotypes are not "called" and a high rate of missingness occurs. We propose and outline an extension to existing models for multiway admixture that accounts for uncertainty in the genotypes and allows for a high rate of missing data. We demonstrate and assess the method by downsampling high coverage data. The approach will be especially useful in conservation genetics studies.

Coffee and tea will be available in the School Common Room afterwards


Title:           How do birds compose their music? Modelling ecological patterns in bird song data

Speaker:     Rafael de Andrade Moral - Lecturer in Statistics, Maynooth University

Date:           Thursday 25th April 2019

Time:           3pm

Location:    Room 1.25, O’Brien Centre for Science (North)

 

Abstract: Biomusicology is the study of animal singing in biological populations. It is a growing interdisciplinary area of science, especially as a new branch of ecological studies. Sound traits such as frequency, amplitude and period (among many others) can be used to study evolutionary and ecological processes related to the emission and reception of acoustic signals. In this work, we model the mean and dispersion of the fundamental frequency of perching bird songs to study how their phylogeny and functional ecology influence their singing. We then propose a joint model for the duration, minimum and maximum frequencies of the bird songs, based on the multivariate covariance generalized linear modelling framework. Our results suggest that modelling the mean alone would not reveal the contribution of musical pitch variability to microevolutionary and ecological processes of Neotropical perching birds. We discuss model implementation problems and present ideas for further studies.

This is work in collaboration with Wagner Bonat (Federal University of Paraná, Brazil), Joe Timoney (Maynooth University), Mateus Mendes (University of Campinas, Brazil), and Luciano Verdade (University of São Paulo, Brazil).

 

Coffee and tea will be available in the School Common Room afterwards