Oct 5  Thu  John Fry (Sheffield)  Statistics Seminar  
14:00  The Mathematics of Financial Crashes  


Oct 5  Thu  Keith Harris (Sheffield)  Statistics Seminar  
14:00  Statistical Modelling and Inference for Radio-Tracking


Nov 2  Thu  Nancy Nicholls (Reading)  Statistics Seminar  
14:00  Getting Started: Data Assimilation for Very Large Inverse Problems in Environmental Science  


Nov 9  Thu  Clive Anderson (Sheffield)  Statistics Seminar  
14:00  Some Extreme Value Problems in Metal Fatigue  


Nov 16  Thu  David Scott (Auckland)  Statistics Seminar  
14:00  The hyperbolic and related distributions: problems of implementation  


Nov 23  Thu  Stuart Barber (Leeds)  Statistics Seminar  
14:00  Signal processing using complex Daubechies wavelets  


Nov 30  Thu  Goran Peskir (Manchester)  Statistics Seminar  
14:00  Optimal stopping  


Dec 7  Thu  Raj Bhansali (Liverpool)  Statistics Seminar  
14:00  Frequency Analysis of Chaotic Intermittency Maps with Slowly Decaying Correlations  


Dec 14  Thu  Stefanie Biedermann (Southampton)  Statistics Seminar  
14:00  Robust optimal designs for dose-response experiments


Feb 8  Thu  Elke Thonnes (University of Warwick)  Statistics Seminar  
14:00  Statistical analysis of pore patterns in fingerprints  


Feb 22  Thu  Ed Cripps (Sheffield)  Statistics Seminar  
14:00  Variable selection and covariance selection in multivariate Gaussian linear regression  


Mar 22  Thu  Søren Asmussen (Aarhus)  Statistics Seminar
14:00  Tail Probabilities for a Computer Reliability Problem  


May 3  Thu  Chris Williams (Edinburgh)  Statistics Seminar  
14:00  Gaussian processes and machine learning  


May 10  Thu  Simon Tavaré (Southern California)  Statistics Seminar
14:00  Stochastic processes in stem cell evolution  


May 31  Thu  Mark Davis (Imperial)  Statistics Seminar  
14:00  


Oct 11  Thu  Richard Jacques (University of Sheffield)  Statistics Seminar  
14:00  Classification Methods for the Analysis of High Content Screening Data  
Hicks Room K14  
Abstract: The current paradigm for the identification of candidate drugs within the pharmaceutical industry typically involves the use of high throughput screens. A high throughput screen allows a large number of compounds to be tested in a biological assay in order to identify any activity inhibiting or activating a biological process. From each of the assays run through a high throughput screen a high content screen image is produced which can be analysed using advanced imaging algorithms to produce a set of variables which reflect the observed activity of the cells within the image. Classification methods have important applications in the analysis of high content screening data where they are used to predict which compounds have the potential to be developed into new drugs. Statistical approaches have been developed that enable classification using a single parameter. However, approaches for multiparametric selection are still in their infancy. Furthermore, proper exploitation of the information contained within each high content screen image will enable more refined compound selection. A new classification technique for the analysis of data from high content screening experiments will be presented and the methodology illustrated on an example data set using a random forest classifier. 



Oct 11  Thu  Michailina Siakalli (University of Sheffield)  Statistics Seminar  
14:00  Stochastic Stabilization  
Hicks Room K14  
Abstract: In simple terms, the stability of a dynamical system concerns its sensitivity to perturbations. Consider a first-order nonlinear differential equation system dx(t)/dt = f(x(t)). When noise is added, it has so far been observed that Brownian motion noise can stabilize an unstable system, or destabilize one that is stable. In my talk I will describe what happens when the given nonlinear system is perturbed by different types of Poisson noise.
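As an illustrative aside (not from the talk): the classical Brownian case can be seen in a few lines. For the linear system dx = a x dt, adding multiplicative noise sigma x dW gives the exact solution x(t) = x0 exp((a - sigma^2/2) t + sigma W(t)), which decays almost surely once sigma^2/2 > a. A minimal sketch, assuming this linear toy model:

```python
import numpy as np

def linear_sde_path(x0=1.0, a=1.0, sigma=3.0, T=10.0, n=10_000, seed=0):
    """Exact solution of dx = a*x dt + sigma*x dW:
    x(t) = x0 * exp((a - sigma**2/2) * t + sigma * W(t))."""
    rng = np.random.default_rng(seed)
    dt = T / n
    w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
    t = np.linspace(0.0, T, n + 1)
    return x0 * np.exp((a - 0.5 * sigma**2) * t + sigma * w)

# Without noise the origin is unstable (x grows like exp(a*t)); with
# sigma**2/2 > a the top Lyapunov exponent a - sigma**2/2 is negative,
# so the noisy path decays towards zero almost surely.
x_free = linear_sde_path(sigma=0.0)
x_noisy = linear_sde_path(sigma=3.0)
```

The Poisson-noise case discussed in the talk requires jump terms and is not reproduced here.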



Nov 8  Thu  Markus Riedle (University of Manchester)  Statistics Seminar  
14:00  Introduction to stochastic delay differential equations  
Hicks Room K14  
Abstract: In recent years, stochastic functional differential equations, or stochastic differential equations with delay, have gained increasing attention in several scientific areas such as economics, biology, physics and medicine. The reason can be found in the observation that in a huge variety of models the evolution of the process describing the dynamics under consideration depends not only on the current state of the process but also on its former states. This effect is due to various causes such as time to maturity, incubation time, time to build, time to transport, hysteresis, delayed feedback and past-dependent volatility. At the beginning of the talk we present some of these applications of stochastic functional differential equations. We introduce the basic ideas of ordinary stochastic differential equations not depending on the past and explain how these equations can be generalised to functional equations covering the examples presented before. The fundamental theory of stochastic functional differential equations is introduced and, in particular, compared with the situation for ordinary stochastic differential equations. In the remaining part of the talk we distinguish several cases according to how the random noise and past dependence enter the equation, focusing on asymptotic aspects of the solution. We present some phenomena known only from delay equations. We also introduce some results which explain the relation between functional and partial stochastic differential equations.



Nov 14  Wed  Alexander J McNeil (Heriot-Watt University)  Statistics Seminar
14:00  A New Perspective on Archimedean Copulas  
Hicks Room K14  
Abstract: The Archimedean copula family is used in a number of actuarial applications, ranging from the construction of multivariate loss distributions to frailty models for dependent lifetimes. We present some new results that contribute to a greater understanding of this family and point the way to improved simulation and estimation procedures. We derive necessary and sufficient conditions for an Archimedean generator function (a continuous, decreasing mapping of the positive half-line to the unit interval) to generate a copula in a given dimension d. We also show how the Archimedean family coincides with the class of survival copulas of L1-norm symmetric distributions. These results allow us to construct a rich variety of new Archimedean copulas in different dimensions and to solve in principle the problem of generating samples from any Archimedean copula. The practical consequences include new models for negatively dependent risks, simple formulas for rank correlation coefficients and diagnostic tests for Archimedean dependence.
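For concreteness, a hedged sketch (not from the talk) of the frailty sampling idea these results generalise, for one classical member of the family: the Clayton generator psi(t) = (1 + t)^(-1/theta) is the Laplace transform of a Gamma(1/theta) frailty V, so setting U_i = psi(E_i / V) with unit exponentials E_i yields a Clayton sample.

```python
import numpy as np

def sample_clayton(n, d=2, theta=2.0, seed=0):
    """Frailty (Marshall-Olkin type) sampler for the Clayton copula.

    The generator psi(t) = (1 + t)**(-1/theta) is the Laplace transform
    of a Gamma(1/theta) frailty V; with E_i iid Exp(1), the vector
    U_i = psi(E_i / V) has the d-dimensional Clayton copula."""
    rng = np.random.default_rng(seed)
    v = rng.gamma(1.0 / theta, size=(n, 1))   # frailty variable
    e = rng.exponential(size=(n, d))          # iid unit exponentials
    return (1.0 + e / v) ** (-1.0 / theta)

u = sample_clayton(20_000, theta=2.0)
```

For Clayton, Kendall's tau equals theta/(theta + 2), so theta = 2 gives strong positive dependence with uniform marginals.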



Nov 22  Thu  Qiwei Yao (London School of Economics)  Statistics Seminar  
14:00  Modelling Multiple Time Series via Common Factors  
Hicks Room K14  
Abstract: We propose a new method for estimating common factors of multiple time series. One distinctive feature of the new approach is that it is applicable to nonstationary time series. The unobservable (nonstationary) factors are identified via expanding the orthogonal complement of the factor loading space step by step, thereby solving a high-dimensional optimization problem by many low-dimensional subproblems. Asymptotic properties of the estimation are investigated, and the proposed methodology is illustrated with both simulated and real data sets.



Nov 29  Thu  Boris Mitavskiy (University of Sheffield)  Statistics Seminar  
14:00  Complexity of Evaluating the Probability Distribution of State Cycles in Finite State Update Networks  
Hicks Room K14  
Abstract: In many situations in biology (gene interactions, metabolic pathways, etc) and communications (mobile phones, WWW) an appropriate model is provided by a digraph in which the nodes (genes, metabolites, phones, computers) are in various states, and these states are updated (at times $t=0, \, 1, \, 2, \ldots$) in response to the states of the ``incoming nodes''. Assuming synchronous updating, the state of the system as a whole, $U(t)$ say, is some function of $U(t-1)$. The dynamics of the system (i.e. the sequence of $U(t)$) can then be described by a directed graph over the possible states, where two states $\mathbf{x}$ and $\mathbf{y}$ are joined if $U(t-1)=\mathbf{x}$ implies $U(t)=\mathbf{y}$. Since the system is finite this directed graph consists of a set of cycles, and a set of trees each rooted (the edges of each tree pointing towards the root) on the cycles. There is much known (but little understood) about these dynamics. In this talk I'll introduce a rigorous simplified model of this scenario and study its basic properties with respect to the distribution of cycle lengths. It turns out that the distribution of fixed points is rather straightforward to compute (and it is the uniform distribution regardless of the network topology!) while the distribution of cycles of length $k$ for any fixed $k \geq 2$ is already an NP-hard question with respect to the size of the underlying digraph. I will provide a brief introduction to the theory of NP-completeness which is sufficient to understand the proofs. If time allows, I will also discuss a constant time algorithm to solve the subproblem where the underlying digraph is an $r$-input regular one.



Feb 7  Thu  John Haslett (Trinity College Dublin)  Statistics Seminar
14:00  Monotone smoothing: application of a compound Poisson-Gamma process to modelling radiocarbon-dated depth chronologies
Hicks Room K14  
Abstract: We propose a new and simple continuous Markov monotone stochastic process and use it for Bayesian monotone smoothing. The process is piecewise linear, based on additive independent Gamma increments arriving in a Poisson fashion. A special case allows very simple conditional simulation of sample paths given known values of the process. We take advantage of a reparameterisation involving the Tweedie distribution to provide efficient MCMC computation. The motivating problem is the establishment of a chronology for samples taken from lake sediment cores; that is, the attribution of a set of dates to samples of the core given their depths, knowing that the age-depth relationship is monotone. The chronological information arises from radiocarbon (14C) dating at a subset of depths. We use the process to model the stochastically varying sedimentation rate.



Feb 14  Thu  Rita Zapata-Vasquez (University of Sheffield)  Statistics Seminar
14:00  Bayesian cost-effectiveness analysis based on a decision-analytic model
Hicks Room K14  
Abstract: The purpose of economic evaluations relating to cost-effectiveness analysis is to provide decision-makers with sufficient evidence to establish the relevance or pertinence of one treatment or strategy over another, or to adjust the results to his/her location of interest. Cost-effectiveness studies based on decision models involve highlighting specific features of previously published studies. However, the lack of evidence, or of consistent reports, is common in many fields. In medicine this is complicated by the fact that it is ethically unacceptable to implement clinical trials that put patients at high risk, or because the cost of such a trial is not affordable. Apart from the specialized literature, another source of information is that which can be obtained from experts through the use of elicitation. Regardless of the origin, from this knowledge judgements are established to represent the uncertainty of the data through the use of probability distributions. A model for assessing the cost-effectiveness of two management strategies for the treatment of intracranial hypertension in children with severe traumatic brain injury is outlined. Some parts of the model structure will be presented, but I will focus on the way that the uncertainty in the parameters (inputs) of the model was formulated as probability distributions, based on the corresponding judgements. Certain dependence relations among inputs will be shown, and how learning about one aspect may change our beliefs. Further, I will comment on how the dependence can be conceived when costs and effects come from different sources.



Feb 14  Thu  Theresa Cain (University of Sheffield)  Statistics Seminar  
14:00  Bayesian Inference for health state utilities using pairwise comparison data  
Hicks Room K14  
Abstract: The National Institute for Health and Clinical Excellence (NICE) makes recommendations about which drugs should be available on the NHS. An important part of this decision is performing a cost-effectiveness analysis. When evaluating the cost-effectiveness of a treatment, it is important to consider the quality of life a patient experiences. The quality of life is described by utility, a measure of preference for a particular health condition. Conventional methods of eliciting utilities such as the Standard Gamble and Time Trade-off involve questions that some respondents might find difficult to answer. An alternative method is to collect discrete choice data, in which respondents simply state which health state they prefer from two alternatives, rather than provide actual utilities. The underlying utilities must be determined given these pairwise choices. We consider Bayesian approaches for inference about population utilities given such pairwise choice data.



Feb 28  Thu  Michael Papathomas (Imperial College London)  Statistics Seminar  
14:00  Obtaining proposal distributions for reversible jump MCMC  
Hicks Room K14  
Abstract: A major difficulty when implementing the reversible jump Markov chain Monte Carlo methodology lies in the choice of good proposals for the parameters of the competing statistical models. We focus on the comparison of non-nested log-linear models and present a novel approach for the construction of proposal distributions.



Mar 13  Thu  Robert Gramacy (University of Cambridge)  Statistics Seminar  
14:00  Importance Tempering  
Hicks Room K14  
Abstract: Simulated tempering (ST) is an established Markov Chain Monte Carlo (MCMC) methodology for sampling from a multimodal density $\pi(\theta)$. The technique involves introducing an auxiliary variable k taking values in a finite subset of [0,1] and indexing a set of tempered distributions, say $\pi_k(\theta) = \pi(\theta)^k$. Small values of k encourage better mixing, but samples from $\pi$ are only obtained when the joint chain for $(\theta,k)$ reaches k=1. However, the entire chain can be used to estimate expectations under $\pi$ of functions of interest, provided that importance sampling (IS) weights are calculated. Unfortunately this method, which we call importance tempering (IT), has tended not to work well in practice. This is partly because the most immediately obvious implementation is naïve and can lead to high variance estimators. We derive a new optimal method for combining multiple IS estimators and prove that this optimal combination has a highly desirable property related to the notion of effective sample size. The methodology is applied in two modelling scenarios requiring reversible-jump MCMC, where the naïve approach to IT fails: model averaging in treed models, and model selection for mark-recapture data.
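An illustrative sketch, not the authors' optimal combination: even a single tempered level pi^k, sampled by random-walk Metropolis and reweighted with importance weights proportional to pi^(1-k), recovers expectations under a bimodal pi.

```python
import numpy as np

def log_pi(x):
    # Unnormalised bimodal target: equal mixture of N(-3, 1) and N(3, 1).
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

def importance_tempered_moments(k=0.3, n=50_000, step=2.5, seed=0):
    """Random-walk Metropolis on the tempered density pi^k, followed by
    self-normalised importance weights proportional to pi^(1-k)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    cur = 0.0
    for i in range(n):
        prop = cur + step * rng.normal()
        if np.log(rng.uniform()) < k * (log_pi(prop) - log_pi(cur)):
            cur = prop
        x[i] = cur
    logw = (1.0 - k) * log_pi(x)       # weights back to the target pi
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return np.sum(w * x), np.sum(w * x**2)

mean_est, second_moment_est = importance_tempered_moments()
# The target has mean 0 and second moment 1 + 3**2 = 10.
```

The tempered chain crosses between the modes easily; the high-variance behaviour the talk addresses shows up when the weights become much more extreme than in this mild example.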



Apr 10  Thu  Oliver Johnson (University of Bristol)  Statistics Seminar  
14:00  Maximum entropy and Poisson approximation  
Hicks Room K14  
Abstract: I will show that the Poisson distribution maximises entropy in the class of ultra log-concave distributions (a class which includes sums of Bernoulli variables). I will also explain how this result relates to bounds in Poisson and compound Poisson approximation.
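The inequality can be checked numerically: a Binomial(n, lambda/n) variable is a sum of n Bernoulli(lambda/n) variables, hence ultra log-concave with mean lambda, and its entropy falls below the Poisson(lambda) entropy. A small sketch (entropies in nats; the parameter values are illustrative):

```python
import numpy as np
from math import lgamma

def poisson_entropy(lam, kmax=200):
    # Truncated sum -sum p_k log p_k for the Poisson(lam) pmf.
    k = np.arange(kmax + 1)
    logp = k * np.log(lam) - lam - np.array([lgamma(i + 1.0) for i in k])
    p = np.exp(logp)
    return -np.sum(p * logp)

def binomial_entropy(n, p):
    # Exact sum -sum q_k log q_k for the Binomial(n, p) pmf.
    k = np.arange(n + 1)
    logc = np.array([lgamma(n + 1.0) - lgamma(i + 1.0) - lgamma(n - i + 1.0) for i in k])
    logq = logc + k * np.log(p) + (n - k) * np.log(1.0 - p)
    q = np.exp(logq)
    return -np.sum(q * logq)

lam = 5.0
h_poisson = poisson_entropy(lam)
# Bin(25, lam/25) is a sum of 25 Bernoulli variables with mean lam,
# hence ultra log-concave; its entropy cannot exceed the Poisson's.
h_binomial = binomial_entropy(25, lam / 25)
```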



Apr 17  Thu  Adam Butler (BioSS Edinburgh)  Statistics Seminar  
14:00  A latent Gaussian model for compositional data with many zeros  
Hicks Room K14  
Abstract: Compositional data record the relative proportions of different components within a mixture, and arise frequently in many fields, including geology, ecology and human health. Standard statistical techniques for the analysis of such data assume the absence of proportions which are genuinely zero, but real data may contain a substantial number of zero values. In this talk I will present a latent Gaussian model for the analysis of compositional data which contain zero values, based on assuming that the data arise from a (deterministic) Euclidean projection of a multivariate Gaussian random variable onto the unit simplex. A simulation study is used to compare three different methods of inference (maximum likelihood estimation, MCMC and approximate Bayesian computation), and the methodology is illustrated using real data on dietary intake.
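The deterministic projection step can be sketched directly, using the standard sort-based algorithm for Euclidean projection onto the simplex (the inference methods compared in the talk are not reproduced here, and the Gaussian parameters are illustrative):

```python
import numpy as np

def project_to_simplex(z):
    """Euclidean projection of z onto the unit simplex
    {x : x_i >= 0, sum_i x_i = 1} via the standard sort-based rule.
    Sufficiently small components of z map exactly to zero, which is
    how the latent Gaussian model produces genuine zero proportions."""
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, z.size + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(z - theta, 0.0)

rng = np.random.default_rng(1)
# Latent multivariate Gaussian draws mapped to 4-part compositions:
comps = np.array([project_to_simplex(rng.normal(0.2, 0.4, 4)) for _ in range(1000)])
```

Each projected vector is a valid composition, and a noticeable fraction of its entries are exactly zero.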



Apr 24  Thu  Leszek Roszkowski (University of Sheffield)  Statistics Seminar  
14:00  Bayesian Statistics in Cosmology and Particle Physics  
Hicks Room K14  
Abstract: I will describe two recent applications of Bayesian statistics. In one, main features of our Universe are extracted from studies of cosmic background radiation. In the other, current data is used to speculate about properties of ``new physics'' models based on supersymmetry that will soon be tested in particle physics experiments at the Large Hadron Collider (LHC) at CERN near Geneva. 



May 8  Thu  Owen Jones (University of Melbourne)  Statistics Seminar  
14:00  Looking for continuous local martingales  
Hicks Room K14  
Abstract: Continuous local martingales, or equivalently time-changed Brownian motion, are a popular class of models in finance. We present a set of statistical tests for whether or not an observed process is a continuous time-changed Brownian motion, based on the concept of the crossing tree. We apply our methodology to five currency exchange rates (AUD-USD, JPY-USD, EUR-USD, GBP-USD and EUR-GBP) and show that in each case, when viewed at a moderately large time scale, the log-transformed series is consistent with a continuous local martingale model.



May 22  Thu  Neil O'Connell (University of Warwick)  Statistics Seminar  
14:00  Exponential functionals of Brownian motion and class one Whittaker functions  
Hicks Room K14  
Abstract: Motivated by a problem concerning scaling limits for directed polymers, and recent extensions of Pitman's `2M-X' theorem including an analogue, due to Matsumoto and Yor, for exponential functionals of Brownian motion, we consider (multidimensional) Brownian motion conditioned on the asymptotic law of a family of exponential functionals and identify which laws give rise to diffusion processes. For particular families (with a lot of symmetry) these conditioned processes are related to class one Whittaker functions associated with semisimple Lie groups. The work of Matsumoto and Yor corresponds to the group GL(2,R) and the class one Whittaker function in this case is essentially the Macdonald function (or modified Bessel function of the second kind). For the group GL(3,R) many explicit formulae are available for understanding the behaviour of these processes. The directed polymer problem should correspond to the group GL(n,R) and the asymptotics of the corresponding Whittaker functions for large n, but there are significant technical hurdles to overcome before this can be made fully rigorous. This is based on joint work with Fabrice Baudoin.



Jun 5  Thu  David Lucy (University of Lancaster)  Statistics Seminar  
14:00  
Hicks Room K14  


Oct 9  Thu  Richard Wilkinson (Sheffield)  Statistics Seminar  
14:00  Estimating Species Divergence Times Using the Fossil Record  
Hicks Room K14  
Abstract: In this talk I will show how to estimate species divergence times using the fossil record. I will describe how branching process models can be conditioned to contain subtrees originating at a given point in time, and how these can be used to model evolution taking some known phylogenetic structure into account. Inference can be performed using Approximate Bayesian Computation (ABC) and I will describe a hybrid ABC-Gibbs algorithm that can improve the efficiency of the basic ABC algorithm.
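As a hedged illustration of the basic ABC rejection algorithm mentioned above, on an invented toy problem (inferring a normal mean; the divergence-time model itself is far richer):

```python
import numpy as np

def abc_rejection(data, n_prop=100_000, eps=0.05, seed=0):
    """Basic ABC rejection: draw theta from the prior, simulate the
    summary statistic, and keep theta when the simulated summary lands
    within eps of the observed one."""
    rng = np.random.default_rng(seed)
    obs = data.mean()                       # observed summary statistic
    theta = rng.normal(0.0, 5.0, n_prop)    # draws from a N(0, 25) prior
    # The mean of len(data) iid N(theta, 1) draws is N(theta, 1/len(data)),
    # so the simulated summary can be drawn directly:
    sim = rng.normal(theta, 1.0 / np.sqrt(len(data)))
    return theta[np.abs(sim - obs) < eps]

rng = np.random.default_rng(42)
data = rng.normal(2.0, 1.0, 50)
post = abc_rejection(data)   # approximate posterior sample for theta
```

The accepted draws concentrate around the observed sample mean; the hybrid ABC-Gibbs idea of the talk improves on the low acceptance rate this basic scheme suffers from in harder models.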



Oct 16  Thu  Leo Bastos (University of Sheffield)  Statistics Seminar  
14:00  Diagnostics for Gaussian Process Emulators  
Hicks Room K14  
Abstract: This work presents some diagnostics to validate and assess the adequacy of a Gaussian process emulator as a surrogate for a computer model. These diagnostics are based on comparisons between simulator outputs and Gaussian process emulator outputs for some test data, known as validation data, defined by a sample of simulator runs not used to build the emulator. Our diagnostics take care to account for correlation between the validation data. To illustrate a validation procedure, these diagnostics are applied to two different data sets.



Oct 16  Thu  Tom Fricker (University of Sheffield)  Statistics Seminar  
14:00  Prior specification in Gaussian process emulators: What do we mean by the mean?  
Hicks Room K14  
Abstract: When building an emulator for a computer model, we treat the model output as an unknown deterministic function of the inputs. The data we have are observations of the computer model output at a number of input points, and our task is to make inference about the function using these noiseless data. We use a semiparametric regression model, a priori describing the function as the sum of a parametric mean function and a zero-mean Gaussian process. Often in the past a very basic regression function has been used for the mean (either constant or linear in the inputs), and most of the effort has been spent in correctly specifying the Gaussian process to model the residuals. However, in some quarters it is believed that we should attempt to build more prior information about the computer model into the emulator via the mean function. But individual realisations of a zero-mean Gaussian process do not necessarily have a mean value of zero, so what exactly is meant when we talk about `the prior mean' of the model? How far should we go in the mean function's complexity? What happens if we overfit it? And does this extra effort actually improve the emulator's predictions of the computer model? In this talk I shall use some very simple toy examples to explore these questions (but without necessarily offering any answers...)



Nov 6  Thu  Mark Steel (University of Warwick)  Statistics Seminar  
14:00  Time-Dependent Stick-Breaking Processes
Hicks Room K14  
Abstract: This paper considers the problem of defining a time-dependent nonparametric prior. A recursive construction allows the definition of priors whose marginals have a stick-breaking form. The processes with Poisson-Dirichlet and Dirichlet process marginals have interesting interpretations that are further investigated. We develop a general conditional MCMC method for inference in a wide subclass of these models. We derive a Pólya urn scheme type representation of the Dirichlet process construction. This allows us to develop a marginal MCMC method for this case. The results section shows the relative performance of the two MCMC schemes for the Dirichlet process case and looks at two data examples.
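A sketch of the (marginal) stick-breaking form referred to above, for a plain Dirichlet process; the time-dependent recursive construction of the paper is not reproduced, and the concentration parameter is an illustrative choice:

```python
import numpy as np

def stick_breaking_weights(alpha=2.0, n_sticks=500, seed=0):
    """Stick-breaking weights of a Dirichlet process: V_i ~ Beta(1, alpha)
    and w_i = V_i * prod_{j<i} (1 - V_j), i.e. each V_i breaks off a
    fraction of the stick that remains."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, n_sticks)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining

w = stick_breaking_weights()
```

With a long enough truncation the weights sum to 1 to numerical precision; pairing them with iid draws from a base measure gives a draw from the Dirichlet process itself.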



Nov 13  Thu  Dan Crisan (Imperial College)  Statistics Seminar  
14:00  Sequential Monte Carlo methods: a theoretical perspective
Hicks Room K14  
Abstract: The aim of the talk is to present a bird's-eye view of sequential Monte Carlo methods (including the SIR algorithm and branching algorithms) with emphasis on classical convergence results. Additionally, some recent uniformly convergent particle filters will be discussed. The second part of the talk is based on joint work with K. Heine (see http://www.ma.ic.ac.uk/~dcrisan/crihei2.pdf for details)



Nov 20  Thu  Martin Hairer (University of Warwick)  Statistics Seminar  
14:00  A weak form of Harris's theorem  
Hicks Room K14  
Abstract: Harris' theorem gives easily verifiable conditions for a Markov operator to have a spectral gap in a weighted supremum norm. We are going to show a new elementary proof of this result. This proof can then be generalised to situations where Harris' theorem fails in order to prove a 'weak' form of it. The range of possible applications includes a number of stochastic PDEs and stochastic delay equations. 



Nov 27  Thu  Jon Pitchford (University of York)  Statistics Seminar  
14:00  Is there something fishy about Lévy processes?  
Hicks Room K14  
Abstract: Lévy flights are loosely defined as random walks in which the step lengths are drawn from some underlying power law distribution. In biology, detecting Lévy-like behaviour is worryingly fashionable and interestingly controversial. Do Lévy flights really occur? If so, then why have they evolved? I will discuss possible answers to these questions, arguing that there may be a role for more general Lévy processes in biology and ecology. I will draw on two examples from my recent research: superspreading in epidemics, and stochastic foraging in patchy environments.
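A minimal sketch of a Lévy flight in the loose sense defined above: power-law step lengths (here a Pareto law, with the tail exponent as an illustrative choice) and uniformly random directions.

```python
import numpy as np

def levy_flight(n_steps=10_000, mu=2.0, seed=0):
    """Planar random walk whose step lengths have power-law density
    proportional to l**(-mu) on [1, inf) (a Pareto law with tail index
    mu - 1), with independent uniformly random directions."""
    rng = np.random.default_rng(seed)
    lengths = rng.pareto(mu - 1.0, n_steps) + 1.0   # Pareto, minimum length 1
    angles = rng.uniform(0.0, 2.0 * np.pi, n_steps)
    steps = lengths[:, None] * np.column_stack([np.cos(angles), np.sin(angles)])
    return np.cumsum(steps, axis=0)

path = levy_flight()
```

With mu = 2 the step-length mean is infinite, so the trajectory is dominated by a few very long jumps between clusters of short ones, the signature that detection studies look for.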



Dec 4  Thu  Mike Campbell (University of Sheffield)  Statistics Seminar  
14:00  A statistician on a NICE appraisals committee  
Hicks Room K14  
Abstract: NICE stands for the National Institute for Health and Clinical Excellence. The speaker has been on a NICE Appraisals committee for 7 years. He will describe what the committee does and how NICE makes decisions. Much of the evidence to NICE is statistical and a statistician is an important member of the committee. A number of roles for a statistician will be described. One role is checking for errors and he will describe some he has come across. 



Dec 18  Thu  George Streftaris (Heriot-Watt University)  Statistics Seminar
14:00  Bayesian inference for stochastic epidemic models with non-exponential tolerance to infection
Hicks Room K14  
Abstract: The transmission dynamics of an infectious disease during the outbreak of an epidemic can be stochastically described through a time-inhomogeneous Poisson process, thus assuming exponentially distributed levels of disease tolerance, following the so-called Sellke (1983) construction. In this talk I will present generalisations of the Sellke structure under the susceptible-exposed-infectious-removed (SEIR) class of epidemic models, and focus on a model with Weibull individual tolerance thresholds. Examples of simulated and real epidemic data are discussed, where inference is carried out using MCMC methods following a Bayesian approach to tackle the issue of the partial observation of the temporal course of the epidemic. The adequacy of the models is assessed using methodology based on the properties of Bayesian latent residuals, demonstrating problems with more commonly used model checking techniques.
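An illustrative sketch of the Sellke construction for the final size of a simple SIR epidemic (the SEIR generalisations and the inference are not reproduced; all parameter values are illustrative). Weibull tolerance thresholds with shape 1 recover the classical exponential case:

```python
import numpy as np

def sellke_final_size(n=200, beta=2.0, tol_shape=1.0, seed=0):
    """Final size of an SIR epidemic via the Sellke construction.

    Each susceptible carries a tolerance threshold; an individual is
    infected once the accumulated infection pressure (beta/n times the
    total infectious time so far) exceeds its threshold.  Weibull shape
    1 gives exponential tolerances, i.e. the Poisson-process case."""
    rng = np.random.default_rng(seed)
    q = np.sort(rng.weibull(tol_shape, n))            # tolerance thresholds
    periods = rng.exponential(1.0, n + 1)             # infectious periods
    pressure = beta / n * periods[0]                  # one initial infective
    infected = 0
    while infected < n and q[infected] <= pressure:
        pressure += beta / n * periods[infected + 1]  # new infective's pressure
        infected += 1
    return infected

size_exp = sellke_final_size(tol_shape=1.0)     # exponential tolerances
size_weib = sellke_final_size(tol_shape=2.0)    # Weibull(2) tolerances
```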



Feb 12  Thu  Lindsay Collins (Sheffield)  Statistics Seminar  
14:00  Climate variability and its effect on atmosphere/terrestrial-biosphere carbon fluxes
Hicks LT7  
Abstract: In my PhD I will study the effect of climate uncertainty and variability on vegetation carbon dynamics. Our interest in the terrestrial biosphere lies in the carbon that is released into the atmosphere or stored in the soil through the land vegetation. The Sheffield Dynamic Global Vegetation Model (SDGVM) simulates the terrestrial vegetation processes (including photosynthesis and respiration) and provides estimates of terrestrial carbon fluxes. The SDGVM is driven by monthly climate data. The monthly data are downscaled to daily data within the SDGVM using a weather generator so that the vegetation processes can be calculated daily. I will show how temporal variability leads to differing carbon flux estimates. We aim to quantify the uncertainty in the carbon flux estimates directly linked to uncertainty and variability in the climate data using probabilistic sensitivity analysis (PSA) methods developed by Oakley and O'Hagan (2004), making use of the GEM-SA software developed by Kennedy (2004) for working with complex models such as the SDGVM. I will show how the form of the climate data makes the use of this software less than straightforward and introduce methodology by which a PSA may be possible. This will involve the characterisation of the uncertainty in the climate in terms of parameters that can be used as input to GEM-SA rather than actual data.



Feb 12  Thu  Lu Zou (Sheffield)  Statistics Seminar  
14:00  Multiple Imputations of Bio-Datasets
Hicks LT7  
Abstract: This presentation will start with a brief introduction to two bio-datasets involved in my study. One inevitable issue is that many values are missing in both sets. Rather than ignoring them, imputation is considered. This talk will focus on the imputation of continuous variables which are to be used as biomarkers in two situations: i) a normal randomly-missing situation and ii) a 'file-matching' situation. Several imputation methods are considered: for single imputation, the K-Nearest Neighbours method (KNN) and the EM algorithm are studied; for multiple imputation, Multiple Imputation using Additive Regression, Bootstrapping and Predictive Mean Matching (PMM), and EM imputation combined with resampling methods, are investigated. Based on the studies so far, the EM algorithm is relatively more suitable in my case.



Feb 19  Thu  Andrew Stuart (University of Warwick)  Statistics Seminar  
14:00  Metropolis-Hastings Methods for Sampling Random Functions
Hicks LT7  
Abstract: Many applied problems require the practitioner to obtain information from a probability measure on functions. Examples include signal processing, weather prediction, oceanography, nuclear waste management and oil recovery. I will show that, despite the wide variety of physical phenomena underlying these examples, there is a common mathematical structure which can be exploited in a number of ways. I will highlight how this structure can be used to design efficient MCMC methods to sample from the desired probability measure, generalizing random walk and other Metropolis-Hastings methods to the function space setting.
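One such function-space generalisation of random-walk Metropolis is the preconditioned Crank-Nicolson (pCN) proposal, whose acceptance rule involves only the likelihood and so remains well defined as the discretisation is refined. A toy discretised sketch (the example problem is invented for illustration):

```python
import numpy as np

def pcn_mcmc(log_like, cov_sqrt, dim, beta=0.2, n_iter=20_000, seed=0):
    """Preconditioned Crank-Nicolson MCMC for a Gaussian prior N(0, C).

    Proposal: v = sqrt(1 - beta**2) * u + beta * xi with xi ~ N(0, C).
    This proposal preserves the prior, so the acceptance ratio involves
    only the likelihood and stays non-degenerate as dim grows."""
    rng = np.random.default_rng(seed)
    u = cov_sqrt @ rng.normal(size=dim)
    ll = log_like(u)
    out = np.empty((n_iter, dim))
    for i in range(n_iter):
        v = np.sqrt(1.0 - beta**2) * u + beta * (cov_sqrt @ rng.normal(size=dim))
        llv = log_like(v)
        if np.log(rng.uniform()) < llv - ll:
            u, ll = v, llv
        out[i] = u
    return out

# Toy problem: N(0, I) prior on R^50, one observation y = 2.0 of the
# first coordinate with unit noise; the posterior for that coordinate
# is N(1, 0.5) while the other coordinates keep the prior.
dim = 50
samples = pcn_mcmc(lambda u: -0.5 * (u[0] - 2.0) ** 2, np.eye(dim), dim)
```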



Feb 26  Thu  Mike Titterington (Glasgow)  Statistics Seminar  
14:00  Approximate inference for latent variable models  
Hicks LT7  
Abstract: Likelihood and Bayesian inference are not straightforward for latent variable models, of which mixture models constitute a special case. For instance, in the context of the latter approach, conjugate priors are not available. The talk will consider some approximate methods that have been developed mainly in the machine-learning literature and will attempt to investigate their statistical credentials. In particular, so-called variational methods and the Expectation-Propagation method will be discussed. It will be explained that, in the Bayesian context, variational methods tend to produce approximate posterior distributions that are located in the right place but are too concentrated, whereas the Expectation-Propagation approach sometimes, but not always, gets the degree of concentration, as measured by posterior variance, right as well.
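The 'right place but too concentrated' behaviour can be seen exactly in a bivariate Gaussian example (an editorial illustration, not from the talk): the optimal mean-field factors for N(0, Sigma) are centred correctly but have variance 1/Lambda_ii, which is smaller than the true marginal variance whenever the coordinates are correlated.

```python
import numpy as np

# Target: a zero-mean bivariate Gaussian with correlation rho.
rho = 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])

# The optimal mean-field (factorised Gaussian) variational approximation
# to N(0, cov) has factors centred at the true mean 0, but with variance
# 1 / Lambda_ii where Lambda is the precision matrix, i.e. 1 - rho**2
# here, instead of the true marginal variance 1.
precision = np.linalg.inv(cov)
vb_marginal_var = 1.0 / np.diag(precision)
true_marginal_var = np.diag(cov)
```

The stronger the correlation, the more severe the underestimation, which is the variational pathology the talk contrasts with Expectation-Propagation.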



Mar 5  Thu  David Leslie (Bristol)  Statistics Seminar  
14:00  Posterior weighted reinforcement learning with state uncertainty  
Hicks LT7  
Abstract: Reinforcement learning models are, in essence, online algorithms to estimate the expected reward in each of a set of states by allocating observed rewards to states and calculating averages. Generally it is assumed that a learner can unambiguously identify the state of nature. However in any natural environment the state information is noisy, so that the learner cannot be certain about the current state of nature. Under state uncertainty it is no longer immediately obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a particular state of the environment. A new technique, posterior weighted reinforcement learning, is introduced. In this process the reinforcement learning updates are weighted according to the posterior state probabilities, calculated after observation of the reward. We show that this modified algorithm can converge to correct reward estimates, and show the procedure to be a variant of an online expectation-maximisation algorithm, allowing further analysis to be carried out.



Mar 12  Thu  Gareth Roberts (Warwick)  Statistics Seminar  
14:00  Retrospective sampling  
Hicks LT7  
Abstract: This talk will discuss a very simple idea for simulation called retrospective sampling. The method can be applied in the context of many widely used simulation methods such as rejection sampling and MCMC. A number of very simple examples will be described to illustrate the ideas. As time permits, I will give some applications, possibly including exact simulation of diffusion paths and posterior distributions for Dirichlet mixture models. 
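The flavour of the retrospective idea can be conveyed by a toy example (not drawn from the talk itself): to realise an event of probability exp(-lam), draw the uniform variate first and then refine alternating-series bounds on exp(-lam) only until the accept/reject decision is determined, so the target probability is never computed exactly. The function name and set-up are illustrative assumptions:

```python
import random

def accept_prob_exp(lam, uniform=random.random):
    """Decide an event of probability exp(-lam), for 0 < lam < 1, without
    ever evaluating exp(-lam): draw U first, then refine the alternating
    partial sums of the exponential series, which bracket exp(-lam), until
    U falls clearly above or below it (a retrospective-style decision;
    illustrative sketch only)."""
    u = uniform()
    s, term, k = 1.0, 1.0, 0
    while True:
        k += 1
        term *= lam / k
        if k % 2 == 1:
            s -= term
            if u < s:        # s is now a lower bound on exp(-lam): accept
                return True
        else:
            s += term
            if u >= s:       # s is now an upper bound on exp(-lam): reject
                return False
```

Because the terms decrease (lam < 1), the bounds tighten at every step and the loop terminates with probability one after only as much computation as the decision requires.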



Mar 26  Thu  Simon Wilson (Trinity College Dublin)  Statistics Seminar  
14:00  Factor Analysis with a Mixture of Gaussian Factors, with Application to Separation of the Cosmic Microwave Background  
Hicks LT7  
Abstract: Blind source separation is a technique in signal processing where the values of 'sources' are inferred from observations that are linear combinations of them. The typical example is separating two voices (the sources) from a stereo audio recording (each microphone picks up a combination of the two speakers' voices). Both the sources and the matrix of linear 'mixing' coefficients may be unknown. In statistical terms, it is an example of factor analysis, the main difference being that the 'factors' here will have some interpretation and there may exist useful prior information on them. Here we describe an approach to factor analysis/source separation where the sources are assumed to be Gaussian mixtures, which may be independent or dependent, e.g. mixtures of multivariate Gaussians. An MCMC procedure has been developed that implements a fully Bayesian analysis, i.e. it computes the posterior distribution of the sources, their Gaussian mixture parameters and the matrix of linear coefficients from the data. The method is applied to recovery of the cosmic microwave background (CMB), an example of source separation applied to image data. The CMB is one of many sources of extraterrestrial microwave radiation and we observe a weighted sum of these sources from the Earth at different frequencies. Its accurate reconstruction is of great interest to astronomers and physicists since knowledge of its properties, and in particular its anisotropies, will place strong restrictions on current cosmological theories. From the perspective of a Bayesian solution, this application is interesting as there is considerable prior information about the linear coefficients and the sources. Results from the analysis of data from the WMAP satellite will be presented, where microwave radiation is observed at 5 frequencies and separated into sources, including the CMB. A discussion of the many outstanding issues in this problem is also presented. 



Mar 26  Thu  Peter Goos (Antwerp)  Statistics Seminar  
16:00  The optimal design of conjoint choice experiments  
Hicks LT5  
Abstract: Stated preference data are commonly collected by means of conjoint choice experiments or discrete choice experiments in marketing, health economics or environmental economics. The optimal design of these experiments is a challenging research area because of the nonlinearity of the statistical models used to analyze the data. These models include the conditional logit model, the mixed logit model and the nested logit model. In this talk, I will discuss recent advances in the optimal design for such models as well as some of the challenging computational aspects of the optimal design search. 



Apr 2  Thu  Philip Jonathan (Shell Technology Centre Thornton)  Statistics Seminar  
14:00  Modelling spatial and directional effects in extreme value analysis  
Hicks LT7  
Abstract: The characteristics of extreme waves in storm-dominated regions vary systematically with a number of covariates, including location and storm direction. Reliable estimation of the magnitude of extreme events associated with a given return period requires incorporation of covariate effects within extreme value models. A spatio-directional extremes model will be outlined, based on a non-homogeneous Poisson model of peaks over threshold. At each location, a non-parametric estimate of the extreme threshold as a function of storm direction is made. The rate of occurrence of threshold exceedances is modelled as a Poisson process. The size of threshold exceedances is modelled using a generalised Pareto form, the parameters of which vary smoothly in space and are estimated using a roughness-penalised likelihood approach with thin plate splines. The approach will be motivated and illustrated in an application to the estimation of structural design criteria for the Gulf of Mexico. 



Apr 23  Thu  Goran Peskir (Manchester)  Statistics Seminar  
14:00  The British Put-Call Symmetry  
Hicks LT7  
Abstract: I will review recent results and problems arising in the British pricing mechanism. This involves optimal stopping with non-monotone free boundaries. 



Apr 23  Thu  Gennady Samorodnitsky (Cornell)  Statistics Seminar  
15:30  The 2009 Applied Probability Trust Lecture: Large deviations for point processes based on stationary sequences with heavy tails  
Hicks LT7  
Abstract: In many applications involving functional large deviations for partial sums of stationary, but not iid, processes with heavy tails, a curious phenomenon arises: large jumps that are closely grouped together coalesce in the limit, leading to a loss of information about the order in which these jumps arrive. In particular, many functionals of interest become discontinuous. To overcome this problem we move from functional large deviations to point-process-level large deviations. We develop the appropriate topological framework and prove large deviations theorems for point processes based on stationary sequences with heavy tails. We show that these results are useful in many situations where functional large deviations are not. 



Apr 30  Thu  Svetlana Tishkovskaya (Sheffield)  Statistics Seminar  
14:00  Optimal Quantisation in Bayesian Estimation  
Hicks LT7  
Abstract: I consider Bayesian estimation of a parameter of a continuous distribution when the observation space is quantised. Quantisation, as a method of approximating a continuous range of values by a discrete set, arises in many practical situations, including modern methods of digital information processing, data compression, and some data-collection procedures. It is well known that quantising observations reduces the values of convex information functionals. This information loss can be diminished by selecting the optimal partition. I consider two criteria of optimal quantisation in Bayesian estimation: minimum Bayes risk, and minimum information loss as measured by Shannon information. As an alternative to optimal partitioning, whose realisation is often computationally demanding, an asymptotically optimal quantisation is also considered. 



May 7  Thu  Kevin Walters (Sheffield)  Statistics Seminar  
14:00  Are colonic stem cell data consistent with the immortal model of stem cell division under non-random strand segregation?  
Hicks LT7  
Abstract: Stem cells have the potential to revolutionise modern medicine through their regenerative capacity; however, little is known about tissue stem cell differentiation in vivo. Technical advances in laboratory methods have started to provide data that allow us to make simple inferences about tissue stem cell behaviour. This talk will focus on a particular model of stem cell differentiation. 



May 14  Thu  Erika Hausenblas (Salzburg)  Statistics Seminar  
14:00  Stochastic Partial Differential Equations driven by Poisson Random Measure  
Hicks LT7  
Abstract: I will start by pointing out some examples from physics to motivate stochastic partial differential equations (SPDEs). Then I will briefly explain the differences in the dynamics between deterministic partial differential equations and SPDEs. After this motivation I will speak about stochastic integration in Banach spaces and point out the differences from the stochastic integral with respect to the Wiener process. Finally, I will give some results concerning SPDEs driven by Poisson random measures. 



Jun 4  Thu  Katy Klauenberg (Sheffield)  Statistics Seminar  
14:00  Statistical Modelling for Dating Ice Cores  
Hicks LT7  
Abstract: Ice cores drilled through ice sheets in polar regions preserve valuable information about past environment and climate. A pivotal part of interpreting the information held within the cores is to build ice core chronologies, i.e. to relate time to depth. Existing dating methods can be categorised as follows: (1) layer counting using the seasonality in signals, (2) glaciological modelling describing processes such as snow accumulation and plastic deformation of ice, (3) comparison with other dated records, or (4) any combination of these. Conventionally, implementation of these approaches does not use statistical methods. We combine glaciological models with a Bayesian framework. For this purpose, the sources of uncertainty in the glaciological model, and the knowledge about these, are formalised. Additionally, we include information from layer counting and other dated records (e.g. traces from volcanic eruptions) to constrain the resulting dating. During the talk the setup of this statistical model will be described, the effect of uncertainty in the glaciological model will be demonstrated, and the interplay with information from other dating methods will be illustrated. This combined statistical dating approach is applied to date Antarctic ice cores. For the first time the effects of uncertainty implied by the dating method are investigated for ice core chronologies, providing valuable insights for the applied community. 



Oct 1  Thu  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Eliciting Probability Distributions  
Hicks LT6  
Abstract: Elicitation is the process of extracting expert knowledge about some unknown quantity of interest and representing that knowledge with a suitable probability distribution. It is an important component of Bayesian inference, risk analysis, and decision-making in the presence of uncertainty. In this talk I will give an introduction to the field and discuss some current research interests, including nonparametric elicitation, the trial roulette method, and SHELF: the Sheffield Elicitation Framework. 



Oct 8  Thu  Nathan Green (Dstl Porton Down)  Statistics Seminar  
14:00  Determining the Source of a Hazardous Atmospheric Release  
Hicks LT6  
Abstract: A methodology is explored for making inference about the parameters of a hazardous atmospheric release from sensor readings. The key difficulty in performing this inference is that the results must be obtained on a very short timescale (5 min) to make use of the inference for protection. The methodology that has been developed uses some of the components of a sequential Monte Carlo algorithm. However, this inference problem differs from many other sequential Monte Carlo problems in that there are no state evolution equations, the forward model is highly nonlinear and the likelihoods are non-Gaussian. Results for inferences made of atmospheric releases (both real and simulated) of material will be presented, demonstrating that the sampling scheme performs adequately despite the constraints of a short time span for calculations. Potential future developments and issues will also be discussed to show areas of future research interest. 



Oct 22  Thu  Tim Heaton (Sheffield)  Statistics Seminar  
14:00  Reconstructing a Wiener process from observations at imprecise times: Bayesian radiocarbon calibration  
Hicks LT6  
Abstract: For accurate radiocarbon dating, it is necessary to identify fluctuations in the level of radioactive carbon (14C) present in the atmosphere through time. The processes underlying these variations are not understood and so a data-based calibration curve is required. In this talk we present a novel MCMC approach to the production of the internationally agreed curve and the individual challenges involved. Our methodology models the calibration data as noisy observations of a Wiener process and updates sample paths through use of a Metropolis-within-Gibbs algorithm. Implementation of this algorithm is complicated by certain specific features of the data used, namely that many data points:
• relate to the mean of the Wiener process over a period of time rather than at a specific point,
• have calendar dates found using methods (e.g. Uranium-Thorium) which are themselves uncertain,
• have ordering constraints and correlations in their calendar date uncertainty - for example, data are sampled along the same core or have floating calendar dates matched to another sample for which the calendar age is more accurately known.
We give an overview of these issues and discuss their implications for the resulting sampler. 



Oct 29  Thu  Jianxin Pan (Manchester)  Statistics Seminar  
14:00  Modelling of MeanCovariance Structures for Longitudinal Data  
Hicks LT6  
Abstract: It is well known that when analysing longitudinal data, misspecification of covariance structures may lead to very inefficient or even biased estimators of parameters in the mean structure. Covariance structures, like the mean, can be modelled using linear or non-linear regression techniques. Various estimation methods have recently been developed for modelling mean and covariance structures simultaneously. In this talk, I will introduce such methods for modelling mean-covariance structures for longitudinal data, including linear and non-linear regression models, variable selection, semiparametric models, etc. Real examples and simulation studies will be presented for illustration. 



Nov 5  Thu  Stanislav Volkov (Bristol)  Statistics Seminar  
14:00  The simple harmonic urn  
Hicks LT6  
Abstract: The simple harmonic urn is a discrete-time stochastic process on Z^2 approximating the phase portrait of the harmonic oscillator using very basic transition probabilities on the lattice, incidentally related to the Eulerian numbers. The urn we consider can be viewed as a two-colour generalised Pólya urn with negative-positive reinforcements, and in a sense as a "marriage" between the Friedman urn and the OK Corral model, where we restart the process each time it hits the horizontal axis by switching the colours of the balls. We show the transience of the process using various couplings with birth-and-death processes and renewal processes. It turns out that the simple harmonic urn is just barely transient, as a minor modification of the model makes it recurrent. We also show links between this model and oriented percolation, as well as some other interesting processes. This is joint work with Edward Crane, Nicholas Georgiou, Rob Waters and Andrew Wade. 



Nov 12  Thu  Vassili Kolokoltsov (Warwick)  Statistics Seminar  
14:00  SDEs driven by non-linear Lévy noise with application to the construction of Markov processes with a given generator  
Hicks LT6  


Nov 26  Thu  David Sexton (The Met Office)  Statistics Seminar  
14:00  Making probabilistic climate projections for the UK  
Hicks LT6  
Abstract: UKCP09, the latest set of climate projections for the UK, was released on June 18th 2009. For the first time the climate projections for the UK are probabilistic, making them an appropriate tool for people taking a risk-based approach to policy and decision making. I will describe how the probabilities were estimated using a) a combination of climate model ensembles which explore parameter uncertainty in different components of the Earth System, b) a set of international climate models other than the Met Office Hadley Centre model, and c) a Bayesian framework which combines this climate model output with observations to provide probabilities that are relevant to the real world and therefore to risk-based decision making. I will also outline the main areas of the production system that could benefit from further research into statistical methods and better experimental design. 



Dec 3  Thu  David Percy (Salford)  Statistics Seminar  
14:00  Predictive elicitation of subjective prior distributions  
Hicks LT6  
Abstract: This seminar tackles the problem of specifying subjective prior distributions for unknown model parameters. We first review strategies for selecting families of priors for common models, including univariate and multivariate probability distributions, generalized linear models and stochastic processes. We then consider methods for evaluating the hyperparameters of these prior distributions. Specifically, we focus on predictive elicitation using quantiles and cumulative probabilities, illustrating the natural beauty and philosophical benefits of this approach. We discuss problems relating to inherent constraints and computational difficulties, and conclude that some compromise is necessary. We illustrate the technique in applications from sport, medicine and industry. 



Dec 10  Thu  Lesley Morrell (Leeds)  Statistics Seminar  
14:00  Modelling the Selfish Herd: Behavioural mechanisms for aggregation in animals  
Hicks LT6  
Abstract: The theory of the selfish herd (WD Hamilton, 1971) has been highly influential to our understanding of animal aggregation. Hamilton proposed that in order to reduce its risk of predation, an individual should approach its nearest neighbour, reducing its risk at the expense of those around it. Despite extensive empirical support, the selfish herd hypothesis has been criticised on theoretical grounds: approaching the nearest neighbour does not result in the observed dense aggregations, and the nearest neighbour in space is not necessarily the one that can be reached fastest. To combat these problems, increasingly complex movement rules have been proposed, successfully producing dense aggregations of individuals, yet various questions remain unanswered. Is one movement rule always the most successful? How do ecological parameters such as the size and density of the group affect rule success? Is the behaviour of the predator important? Should all individuals within a group use the same rule, or should they adjust their behaviour based on where in the group they are, or in response to the behaviour of others? We use simulation models of animal groups to investigate these questions, and demonstrate that there is no rule that performs best under all circumstances: the ecology of the predator and prey are both key in determining how animals should respond to a predation attempt. 



Dec 17  Thu  Ben Youngman (Sheffield)  Statistics Seminar  
14:00  Modelling phenomena using different data sources  
Hicks LT6  
Abstract: Structures must be built strong enough to withstand day-to-day wear and tear but also, ideally, all levels of extreme punishment. In practice, however, economic considerations require some trade-off between strength and susceptibility to damage to avoid costs spiralling. As the largest events are expected to be the most damaging, there is motivation to estimate the distribution of extremes by, for example, estimating the probability of exceeding a certain high level. This is a typical problem in extremal analyses. More recently this problem has been extended by seeking estimates of extremal distributions over space, which is the topic of this talk, though here matters will be further complicated by spatio-temporally sparse data. To combat this, data obtained via different methods, yet in theory quantifying the same phenomenon, will be modelled simultaneously. Extreme value theory will be drawn upon to tackle this problem. This talk begins with an introduction to the topic and progresses by applying some of the ideas discussed. 



Dec 17  Thu  Afzalina Azmee (Sheffield)  Statistics Seminar  
14:00  Two-stage testing in three-arm non-inferiority trials  
Hicks LT6  
Abstract: The aim of a non-inferiority trial is to show that the new experimental treatment is not worse than the reference treatment by more than a certain, predefined margin. We consider the design of a three-arm non-inferiority trial, where the inclusion of a placebo group is permissible. The widely used three-arm non-inferiority procedure was first described authoritatively by Pigeot et al. (2003); it involves establishing superiority of the reference against placebo in the first stage before testing non-inferiority of the experimental treatment against the reference in the second stage. If this preliminary test fails, the second-stage test has to be abandoned. In such an eventuality, we believe the whole study is wasted, as nothing new can be learnt about the new experimental treatment. Therefore, instead of showing superiority in the first stage, we propose that the reference treatment has to be significantly different from placebo as a prerequisite before using Fieller's confidence interval to assess non-inferiority. This procedure leads to no peculiar intervals (i.e. exclusive or imaginary) and offers easy interpretation regarding the efficacy of the experimental and reference treatments. 



Feb 11  Thu  Jonathan Jordan (Sheffield)  Statistics Seminar  
14:00  Geometric preferential attachment graphs  
Hicks K14  
Abstract: Preferential attachment (or "scale-free") random graphs, in which a growing network develops by new vertices attaching preferentially to existing vertices which already have a high degree, were proposed, originally by Barabási and Albert, as models for networks appearing in a wide range of contexts (including biological, technological and social) in which examination of data often reveals an approximately power law distribution of vertex degrees. It was rigorously shown by Bollobás et al. that preferential attachment graphs do indeed have this property. In many of the contexts in which random graph models are used it makes sense for the vertices to have some location in space. The original preferential attachment model has no spatial element, and in this talk I will describe a model which combines a preferential attachment element with a spatial element. I will describe results which show that under certain conditions on the spatial element the power law degree property is retained. I intend that most of the talk should be accessible to an applied audience, though there will be a few slides discussing my proof method. 
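For readers unfamiliar with the non-spatial model the talk builds on, a minimal sketch of degree-proportional attachment (the Barabási-Albert mechanism) is given below; the function and its details are illustrative assumptions, not the spatial model of the talk:

```python
import random

def preferential_attachment(n, m=1, seed=None):
    """Grow a Barabási-Albert-style graph: each new vertex attaches to m
    existing vertices chosen with probability proportional to their degree.
    Minimal illustrative sketch. Returns an edge list; `targets` holds one
    entry per half-edge, so uniform sampling from it is exactly
    degree-proportional sampling."""
    rng = random.Random(seed)
    edges = [(0, 1)]                 # start from a single edge
    targets = [0, 1]                 # one entry per half-edge
    for v in range(2, n):
        # sample m distinct attachment targets, degree-proportionally
        chosen = {rng.choice(targets) for _ in range(m)}
        for w in chosen:
            edges.append((v, w))
            targets.extend([v, w])   # both endpoints gain a half-edge
    return edges

g = preferential_attachment(1000, m=1, seed=42)
```

Plotting the degree distribution of such a graph on log-log axes shows the approximate power law tail mentioned in the abstract; the spatial model discussed in the talk modifies the attachment probabilities with a location-dependent factor.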



Feb 18  Thu  Vincent Macaulay (Glasgow)  Statistics Seminar  
14:00  Inference about past human migration episodes from modern DNA data  
Hicks K14  
Abstract: One view of human prehistory is of a set of punctuated migration events across space and time, associated with settlement, resettlement and discrete phases of immigration. It is pertinent to ask whether the variability that exists in the DNA sequences of samples of people living now, something which can be relatively easily measured, can be used to fit and test such models. Population genetics theory already makes predictions of patterns of genetic variation under certain very simple models of prehistoric demography. In this presentation I will describe an alternative, but still quite simple, model designed to capture more aspects of human prehistory of interest to the archaeologist, show how it can be rephrased as a mixture model, and illustrate the kinds of inferences that can be made on a real data set, taking a Bayesian approach. 



Feb 25  Thu  Mark Broom (Sussex)  Statistics Seminar  
14:00  Models of evolution on structured populations with asymmetry  
Hicks I19  
Abstract: We investigate two examples of models of populations with structure, involving asymmetry. These are different in character, with the common theme that both the structure and the asymmetry have an important influence on population outcomes. The first part of the talk concerns the study of evolutionary dynamics on populations with some non-homogeneous structure, a topic in which there is rapidly growing interest. We investigate the case of undirected, equally weighted graphs and find solutions for the fixation probability of a single mutant in two classes of simple graphs. This process is a Markov chain and we prove several mathematical results. For example, we prove that for all but a restricted set of graphs, (almost) all states are accessible from the possible initial states. To find the fixation probability of a line graph we relate it to a two-dimensional random walk which is not spatially homogeneous. We investigate our solutions numerically and find that for mutants with fitness greater than the resident, the existence of an asymmetric population structure helps the spread of the mutants. Thus it may be that models assuming well-mixed populations consistently underestimate the rate of evolutionary change. In the second part we consider a model of kleptoparasitism, the stealing of food from one animal by another. The handling process of food items can take some time and the value of such items can vary depending upon how much handling an item has received. Furthermore, this information may be known to the handler but not the potential challenger, so there is an asymmetry between the information possessed by the two competitors. We use game-theoretic methods to investigate the consequences of this asymmetry for continuously consumed food items, depending upon various natural parameters. A variety of solutions are found, and there are complex situations where three possible solutions can occur for the same set of parameters. 
It is also possible to have situations in which members of the population exhibit different behaviours from each other. We find that the asymmetry of information often appears to favour the challenger, despite the fact that it possesses less information than the challenged individual. 



Mar 4  Thu  Jonty Rougier (Bristol)  Statistics Seminar  
14:00  Uncertainty and Risk in Natural Hazards  
Hicks K14  
Abstract: In natural hazards (volcanoes, earthquakes, floods, etc.) it is useful for modelling purposes to make a distinction between aleatory and epistemic uncertainty, where the former represents the inherent or natural uncertainty of the hazard, and the latter represents everything else. Natural hazards scientists are often reluctant to quantify epistemic uncertainty with probability, due in large part to its subjective nature. But this challenge should be weighed against the additional problems that non-quantified uncertainty creates for the risk manager and the policymaker. This talk explores these issues in the light of the recent NERC scoping study on natural hazards uncertainty and risk. 



Mar 11  Thu  John Aston (Warwick)  Statistics Seminar  
14:00  Using Functional Principal Component Analysis and Mixed Effect Models to Analyse Spoken Language  
Hicks I19  
Abstract: Fundamental frequency (F0, broadly 'pitch') is an integral part of spoken human language; however, a comprehensive quantitative model for F0 can be a challenge to formulate due to the large number of effects, and interactions between effects, that lie behind the human voice's production of F0, and the very nature of the data being a contour rather than a point. A semiparametric functional response model for F0 will be formulated by incorporating linear mixed effects models through the functional principal component scores. This model is applied to the problem of modelling F0 in tone languages such as Mandarin and Qiang (a dialect from China), languages in which relative pitch information is part of each word's dictionary entry. 



Mar 18  Thu  Norman Fenton (Queen Mary)  Statistics Seminar  
14:00  Uncertainty, Risk and Decision Making  
Hicks K14  
Abstract: Current approaches to uncertain reasoning and risk assessment are often fundamentally flawed. Motivated by real examples from the law and medicine (including a murder trial and a medical negligence trial in which I was an expert witness), I will explain how such flawed reasoning can be avoided by adopting a Bayesian approach. I will introduce the notion of subjective probability and Bayes theorem and argue that this is the only rational approach for handling uncertainty. The problem with this approach is how to scale it up to complex risk assessment problems involving many causally related factors. I will introduce the notion of Bayesian nets and show how they address this problem. I will demonstrate how we have used Bayesian nets in a range of real applications, including legal arguments, medical risk assessment, and software risk assessment. 



Apr 15  Thu  John McColl (Glasgow)  Statistics Seminar  
14:00  Assessment and Feedback in Statistics Courses  
Hicks K14  
Abstract: Giving useful feedback to students about their work ought to be an integral part of the teaching, learning and assessment process, so that learners know where they went wrong and what they can do to improve in the future. In the National Student Survey, student ratings of assessment and feedback are generally less favourable than those for other aspects of their experience, suggesting that this is an area in which UK Higher Education needs to improve. Until now, there has been little discussion about how best to produce effective feedback for the different assessment methods used in modern Statistics courses. This talk will summarise the characteristics of effective feedback, as described in the research literature, and will indicate how these guidelines can be applied to the assessment of data-analysis tasks in Statistics courses. We will then present results from a small study of students in one Statistics course at the University of Glasgow under two conditions: one where feedback was given 'as usual' and the other where feedback was given in accordance with the principles of effective feedback. Finally, we will introduce a freely available, web-based quiz system which has been designed to give tailored feedback to multiple choice questions in a Statistics setting. 



Apr 22  Thu  Piotr Fryzlewicz (London School of Economics)  Statistics Seminar  
14:00  Thick-pen transformation for time series  
Hicks K14  
Abstract: Traditional visualisation of time series data often consists of plotting the time series values against time and "connecting the dots". We propose an alternative, multiscale visualisation technique, motivated by the scale-space approach in computer vision. In brief, our method also "connects the dots", but uses a range of pens of varying thicknesses for this purpose. The resulting multiscale map, termed the Thick-Pen Transform (TPT), corresponds to viewing the time series from a range of distances. We formally prove that the TPT is a discriminatory statistic for two Gaussian time series with distinct correlation structures. Further, we show interesting possible applications of the TPT to measuring cross-dependence in multivariate time series, and to testing for stationarity. In particular, we derive the asymptotic distribution of our test statistic, and argue that the test is applicable to both linear and nonlinear processes under low moment assumptions. Various other aspects of the methodology, including other possible applications, are also discussed. 
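A simplified version of the pen idea can be sketched as running maxima and minima over windows of increasing width: a pen of thickness t drawn along the series traces an upper and a lower boundary at each point. This follows one common formulation of the TPT and may differ in detail from the talk's definition; names are illustrative assumptions:

```python
def thick_pen_transform(x, thicknesses):
    """Simplified Thick-Pen Transform sketch: for each pen thickness t,
    record the upper and lower boundaries of the series drawn with a pen
    covering t+1 consecutive points (a running max/min over a moving
    window). Returns {t: (lower, upper)}. Illustrative only; the talk's
    exact definition may differ."""
    tpt = {}
    n = len(x)
    for t in thicknesses:
        upper = [max(x[i:i + t + 1]) for i in range(n - t)]
        lower = [min(x[i:i + t + 1]) for i in range(n - t)]
        tpt[t] = (lower, upper)
    return tpt

# Usage: a short series viewed at two "distances" (pen thicknesses).
series = [0.0, 1.0, 0.5, 2.0, 1.5]
maps = thick_pen_transform(series, thicknesses=[1, 2])
```

As the thickness grows, the gap between the boundaries widens according to the local variability of the series, which is what makes the collection of maps informative about correlation structure across scales.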



Apr 29  Thu  Andrew Wade (Strathclyde)  Statistics Seminar  
14:00  Non-homogeneous random walks with asymptotically zero drifts  
Hicks K14  
Abstract: For this talk a random walk is a discrete-time, time-homogeneous Markov process on d-dimensional Euclidean space. If such a random walk is spatially homogeneous, its position can be expressed as a sum of independent identically distributed random vectors. Such homogeneous random walks are classical and the literature devoted to their study is extensive, particularly when the state space is the d-dimensional integer lattice. The most subtle case is when the mean drift (i.e., average increment) of the walk is zero. The assumption of spatial homogeneity, while simplifying the mathematical analysis, is not always realistic for applications. Thus it is desirable to study non-homogeneous random walks. As soon as the spatial homogeneity assumption is relaxed, the situation becomes much more complicated. Even in the zero-drift case, a non-homogeneous random walk can behave completely differently to a zero-drift homogeneous random walk, and can be transient in two dimensions, for instance. Such potentially wild behaviour means that results for non-homogeneous random walks often have to be stated under rather restrictive conditions, and techniques from the study of homogeneous random walks are difficult to apply. I will give an introduction to some of the known results on non-homogeneous random walks with asymptotically zero mean drift, that is, where the magnitude of the drift at a point tends to 0 as the distance of that point from the origin tends to infinity. It turns out that this is the natural regime in which to look for important phase transitions in asymptotic behaviour. This includes work by Lamperti in the 1960s on recurrence/transience behaviour. I will also discuss recent joint work with Iain MacPhee and Mikhail Menshikov (Durham) concerned with angular asymptotics, i.e., exit-from-cones problems. 
We show that, in contrast to recurrence/transience behaviour, the angular properties of non-homogeneous random walks are remarkably well-behaved in some sense in the asymptotically zero drift regime. 



May 6  Thu  Andy Wood (Nottingham)  Statistics Seminar  
14:00  Fractals, self-similarity and the estimation of fractal dimension: a statistical perspective.  
Hicks K14  
Abstract: The first part of the talk will give an elementary introduction to fractals, and will include discussion of what they are, some of the various ways in which they can arise and why they are of interest. Relevant concepts such as self-similarity will also be explained. The second part of the talk will briefly discuss statistical estimation of the dimension of a random fractal generated as a realisation of a suitable continuous-time stochastic process, which is observed on a finite grid. The estimation of fractal dimension is of theoretical and practical interest in a number of contexts. The asymptotic framework relevant here is "infill" asymptotics, and the limit theory for fractal dimension estimators in this setting can be quite nonstandard. 
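One concrete example of the kind of estimator discussed is a variogram-based sketch, assuming the observed series behaves locally like fractional Brownian motion with Hurst index H, so that the graph has fractal dimension D = 2 - H. This is a generic illustration of the approach, not a method attributed to the speaker:

```python
import math

def variogram_dimension(x, max_lag=4):
    """Estimate the fractal dimension of the graph of a series by
    regressing log mean (X_{t+h} - X_t)^2 on log h: under an fBm-type
    model the slope is 2H, and the graph dimension is D = 2 - H."""
    logs_h, logs_v = [], []
    for h in range(1, max_lag + 1):
        diffs = [(x[i + h] - x[i]) ** 2 for i in range(len(x) - h)]
        logs_h.append(math.log(h))
        logs_v.append(math.log(sum(diffs) / len(diffs)))
    n = len(logs_h)
    mh, mv = sum(logs_h) / n, sum(logs_v) / n
    slope = (sum((a - mh) * (b - mv) for a, b in zip(logs_h, logs_v))
             / sum((a - mh) ** 2 for a in logs_h))
    return 2 - slope / 2  # D = 2 - H
```

For a perfectly smooth path (e.g. a straight line) the estimate is 1, the dimension of a smooth curve.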



May 13  Thu  Philip O'Neill (Nottingham)  Statistics Seminar  
14:00  Stochastic models and data analysis for healthcare-associated infections  
K14  
Abstract: Antibiotic-resistant pathogens such as MRSA and VRE are of considerable importance in healthcare settings in terms of both clinical and economic impact. In this talk we describe analyses of highly detailed datasets taken from hospital studies looking at, among other things, the effectiveness of control measures and the effect of undetected carriage. The methods involve formulating appropriate stochastic transmission models whose parameters are then estimated using MCMC methods. 



May 20  Thu  Kate Ren (Sheffield)  Statistics Seminar  
14:00  Incorporating Prior Information into Clinical Trial Designs  
Hicks I19  


May 20  Thu  Peter Gregory (Sheffield)  Statistics Seminar  
14:00  Looking for a simple solution to a simple problem: Bayesian modelling of positively skewed data  
Hicks I19  
Abstract: The motivation for this research was a medical cost data set from a clinical trial. If the proposed new intervention were to be accepted by a Regulatory Body, then a Health Care Provider would have to budget for future treatments for some members of the rest of the population. In this Bayesian analysis we want to determine the expected value for one unobserved member of this population from its posterior predictive distribution, by first establishing the parametric data model that best captures the positive-skew characteristics of the costs. We then develop a novel approach to modelling the priors that enables an expert's prior beliefs to be elicited while permitting a limited analytical study of the model. These techniques have been applied to recent medical data sets to assess their efficiency relative to classical estimators. 



May 20  Thu  Jonty Rougier (Bristol)  Statistics Seminar  
15:30  Complex systems: Accounting for model limitations  
Hicks I19  
Abstract: Many complex systems, notably environmental systems like climate, are highly structured, and numerical models, known as simulators, play an important role in prediction and control. It is crucial to account for limitations in simulators, since these can be substantial, and can vary substantially from one simulator to another. These limitations can be categorised in terms of input uncertainty, parametric uncertainty, and structural uncertainty. The talk explains this framework, and the particular challenge of accounting for simulator limitations in dynamical systems, using illustrations from a low-order model for glacial cycles. 



May 27  Thu  Graeme Sarson (Newcastle)  Statistics Seminar  
14:00  Forward models of prehistoric population dynamics  
Hicks K14  


Sep 30  Thu  Kostas Triantafyllopoulos (Sheffield)  Statistics Seminar  
14:00  Multivariate stochastic volatility modelling using Wishart autoregressive processes  
Lecture Theatre 6  
Abstract: This talk will discuss some of the research I conducted while on study leave. In particular, a new multivariate stochastic volatility estimation procedure for financial time series will be developed. A Wishart autoregressive process is considered for the volatility precision covariance matrix, for the estimation of which a two-stage procedure is adopted. In the first stage conditional inference on the autoregressive parameters is developed, and in the second stage unconditional inference is developed, based on a Newton-Raphson iterative algorithm. The proposed methodology, suitable for medium-dimensional data, bridges the gap between closed-form estimation and simulation-based estimation algorithms in stochastic volatility modelling. Two examples, consisting of foreign exchange rates data and of data from the common constituents of the Dow Jones 30 Industrial Average index, illustrate the proposed methodology; for both examples we discuss asset allocation, using mean-variance portfolio optimization as a performance indicator. In this talk we will discuss Wishart processes, which may be of interest in their own right or for applications beyond finance. 



Oct 21  Thu  Lauren Rodgers (Forensic Science Service)  Statistics Seminar  
14:00  A continuous model for deconvoluting DNA mixtures  
Lecture Theatre 6  
Abstract: There are numerous problems encountered in the interpretation and evaluation of DNA profiles, particularly when there is more than one contributor. The current statistical methods are based on binary models and make limited use of the quantitative information contained in the profile. We have developed a continuous model which can probabilistically take account of allelic dropout, allelic stutter and the amplification efficiency of an allele given its molecular weight. This presentation will include: an overview of DNA profiling; a description of our proposed continuous model; and some illustrative calculations with DNA mixtures. 



Oct 28  Thu  Richard Boys (Newcastle)  Statistics Seminar  
14:00  Linking systems biology models to data  
Lecture Theatre 6  
Abstract: This talk considers the assessment and refinement of a dynamic stochastic process model of the cellular response to DNA damage. The proposed model is a complex nonlinear continuous-time latent stochastic process. It is compared to time course data on the levels of two key proteins involved in this response, captured at the level of individual cells in a human cancer cell line. The primary goal is to "calibrate" the model by finding parameters of the model (kinetic rate constants) that are most consistent with the experimental data. Significant amounts of prior information are available for the model parameters. It is therefore most natural to consider a Bayesian analysis of the problem, using sophisticated MCMC methods to overcome the formidable computational challenges. 



Nov 18  Thu  Piotr Fryzlewicz (London School of Economics)  Statistics Seminar  
14:00  Haar-Fisz methodology for interpretable estimation of large, sparse, time-varying volatility matrices  
Lecture Theatre 6  
Abstract: The emergence of the recent financial crisis, during which many markets underwent changes in their statistical structure over a short period of time, illustrates the importance of nonstationary modelling in financial time series. We start this talk by advocating a simple nonstationary multivariate model for financial returns. One task of critical importance to a financial analyst is accurate estimation of the volatility matrix, and in our model this will be a time-varying quantity. Our estimation method is based on Haar wavelet thresholding, supplemented with the essential variance-stabilising Fisz transform (hence the name Haar-Fisz). Thanks to the use of Haar wavelets, our estimator: (a) has a natural in-built sparsity, i.e. local cross-market correlations are naturally estimated as zero wherever possible, which enhances the invertibility of the estimated matrix; (b) adequately captures sudden regime changes; (c) is theoretically tractable, also in the pointwise sense; (d) is rapidly computable, which is important if the matrix is large. In addition, we take advantage of the nonlinearity of wavelet thresholding to propose two distinct versions of the estimator, one of which is based on the polarisation identity. We use real-data examples to illustrate our methodology. 
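For intuition, one level of a Haar transform with a Fisz-type variance-stabilising division can be sketched as below. The denominator shown corresponds to a Poisson-type mean-variance relation; the volatility setting of the talk uses a different stabilising law, so this is background illustration only:

```python
import math

def haar_fisz_level(x):
    """One level of a Haar decomposition with a Fisz-type stabilisation:
    pairwise smooth a = (x1 + x2)/2 and detail d = (x1 - x2)/2, then the
    stabilised coefficient f = d / sqrt(a) (Poisson-type variance law).
    Sketch only, for a nonnegative series of even length."""
    smooth, fisz = [], []
    for x1, x2 in zip(x[0::2], x[1::2]):
        a, d = (x1 + x2) / 2, (x1 - x2) / 2
        smooth.append(a)
        fisz.append(d / math.sqrt(a) if a > 0 else 0.0)
    return smooth, fisz
```

Iterating on the smooth coefficients gives the full multiscale decomposition, on which thresholding is then applied.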



Nov 25  Thu  Samuel Touchard (Sheffield)  Statistics Seminar  
14:00  Forecasting pollution levels using Dynamic Linear Models  
Lecture Theatre 6  
Abstract: In this talk, I will try to forecast the pollution levels of five pollutants from 8 years of data. The model I used is a Dynamic Linear Model (DLM), a regression model in which the parameter vector is no longer assumed constant over time. Also, 3 covariates (humidity, temperature, wind speed) will be used to obtain a better estimate. After introducing the issue of pollution, I will describe the model, first in the univariate case and then in the multivariate case. Then I will apply the model to the data, comment on the results, discuss how they could be improved, and give some ideas for further work. 
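The univariate idea can be illustrated with the simplest DLM, the local level model, filtered by the standard Kalman recursions. This is a minimal sketch; the model in the talk additionally carries regression covariates:

```python
def dlm_local_level_filter(y, V, W, m0=0.0, C0=1e6):
    """Kalman filter for the local level DLM:
    y_t = theta_t + v_t, v_t ~ N(0, V);  theta_t = theta_{t-1} + w_t, w_t ~ N(0, W).
    Returns the one-step-ahead forecasts f_t."""
    m, C = m0, C0
    forecasts = []
    for yt in y:
        a, R = m, C + W          # prior for theta_t
        f, Q = a, R + V          # one-step forecast and its variance
        forecasts.append(f)
        K = R / Q                # Kalman gain
        m = a + K * (yt - f)     # posterior mean update
        C = R - K * R            # posterior variance update
    return forecasts
```

With a diffuse initial variance, the first forecast is just the prior mean, and subsequent forecasts track the data adaptively.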



Dec 2  Thu  Andrew Parnell (University College Dublin)  Statistics Seminar  
14:00  Faster joint posterior modelling through marginal posterior mixtures  
Lecture Theatre 6  
Abstract: We discuss the issue of creating a joint posterior distribution for a set of parameters when only marginal posteriors are available (or are reasonable to compute). More specifically, for data $x$ and parameters $\theta$ in $R^n$, we require $\pi(\theta|x)$ from the marginal data posteriors $\pi(\theta_i|x_i)$. Through a simple adjustment of Bayes' theorem we can use $\pi(\theta_i|x_i)$ to inform the joint posterior, provided $\pi(\theta_i)$ and $\pi(\theta)$ (the marginal and joint priors, respectively) are, in some sense, compatible. 
The technique can be further enhanced by treating $\pi(\theta_i|x_i)$ as a mixture of distributions conjugate to the joint prior. In most cases, it is trivial to approximate any marginal posterior distribution as such a mixture. When the joint prior is Gaussian, the resulting posterior can then be obtained extremely quickly via any one of a number of standard Bayesian computational techniques. 
We apply this technique to two problems in palaeoclimatology (both described in Haslett et al 2006). The first involves long-tailed random walk smoothing of temporal climate histories ($c(t)$) created from pollen sediment cores where pollen is sampled at $n$ layers $y_i$, $i=1,\dots,n$. The marginal posteriors $\pi(c_i|y_i)$ are easily obtained by other means, whereas the random walk gives flat marginal prior distributions $\pi(c_i)$. We obtain the joint posterior $\pi(c|y)$ in a two-stage process without resorting to more burdensome computational methods. The second problem involves spatial forward modelling of pollen changes given modern climate data (also known as response surface modelling; Huntley et al 1993). Here, the marginal posteriors are Gaussian surfaces with few hyperparameters; they are relatively quick to create. The joint posterior surface then becomes a mixture of Gaussian processes. Again, the two-stage process dramatically decreases the computational burden, and allows for parallelisation. 
The models we propose have much in common with Rue et al (2009) and Holmstrom and Erasto (2002). The technique seems widely applicable across the field of statistical modelling. We explore some of the extensions which may allow for higher-dimensional models or more complex prior distributions. 
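One way to write the "simple adjustment of Bayes' theorem" mentioned above is the following; this is my reading of the abstract, assuming the data split as $x = (x_1,\dots,x_n)$ with each $x_i$ informative about $\theta_i$ only:

```latex
\pi(\theta \mid x) \;\propto\; \pi(\theta)\,
  \prod_{i=1}^{n} \frac{\pi(\theta_i \mid x_i)}{\pi(\theta_i)}
```

Each marginal posterior enters as a likelihood-like factor once its own marginal prior has been divided out, which is where the compatibility of $\pi(\theta_i)$ and $\pi(\theta)$ matters.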



Dec 9  Thu  Alison Etheridge (Oxford)  Statistics Seminar  
14:00  Modelling evolution in a spatial continuum: the spatial $\Lambda$-Fleming-Viot process  
Lecture Theatre 6  
Abstract: One of the outstanding successes of mathematical population genetics is Kingman's coalescent. This provides a simple and elegant description of the genealogical trees relating individuals in a sample of neutral genes from a panmictic population, that is, one in which every individual is equally likely to mate with every other and all individuals experience the same conditions. But real populations are not like this. Spurred on by the recent flood of DNA sequence data, an enormous industry has developed that seeks to extend Kingman's coalescent to incorporate things like variable population size, natural selection and spatial and genetic structure. But a satisfactory approach to populations evolving in a spatial continuum has proved elusive. In recent joint work with Nick Barton, IST Austria, we introduced a framework for modelling the evolution of populations distributed in a spatial continuum. This leads to a new class of measure-valued processes which we will describe and, as time permits, explore in this talk. 



Dec 16  Thu  Grant Bigg (Sheffield)  Statistics Seminar  
14:00  Using icebergs as a tool in geoscience: how did the needle get into the haystack?  
Lecture Theatre 6  
Abstract: Since the sinking of the Titanic in 1912, icebergs have possessed a powerful aura for polar navigation. However, they are not only a threat to shipping but tell us about climate change, and the sediments dropped from them are key indicators of past climate fluctuations around the globe. In this talk the science of icebergs is explored, paying particular attention to where it intersects with sometimes difficult statistical issues. The power of statistical-dynamical modelling of icebergs to reveal new and interesting facts about past and present climate change is shown. The statistical analysis of remote sensing images is seen to be a powerful tool for aiding navigation as the Arctic sea routes are opened up. And finally, the use of systems control theory will be seen to offer the possibility of a new view of the evolution of the Greenland ice sheet over the last century. 



Feb 17  Thu  Mark Strong (University of Sheffield)  Statistics Seminar  
14:00  Managing Structural Uncertainty in Health Economic Decision Models  
Lecture Theatre 6  
Abstract: It was George Box who famously wrote 'Essentially, all models are wrong'. Given our limited understanding of the highly complex world in which we live this statement seems entirely reasonable. Why then, in the context of health economic decision modelling, do we often act as if our models are right even if we know that they are wrong? Imagine we have built a deterministic mathematical model to predict the costs and health effects of a new treatment, in comparison with an existing treatment. The model will be used by NICE to inform the decision as to whether to recommend the new treatment for use in the NHS. The inputs to the model are uncertain, and we quantify the effect of this input uncertainty on the model output using Monte Carlo methods. We may even quantify the value of obtaining more information. We present our results to NICE as a fait accompli. But, if we believe George Box then surely we should consider that our model output, and our uncertainty analysis, and our estimates of the value of information are all 'wrong' because they are generated by a model that is 'wrong'! The challenge is to quantify how wrong. This seminar will explore the problem of structural uncertainty in health economic decision models, along with some suggested approaches to managing this uncertainty. 
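The Monte Carlo step described above can be sketched generically as follows; the model and input distributions here are hypothetical placeholders for illustration, not taken from the talk:

```python
import numpy as np

def probabilistic_sensitivity_analysis(model, samplers, n=10_000, seed=0):
    """Monte Carlo propagation of input uncertainty through a
    deterministic decision model: draw each uncertain input from its
    distribution, evaluate the model, and summarise the output."""
    rng = np.random.default_rng(seed)
    outputs = np.array([model(*(s(rng) for s in samplers)) for _ in range(n)])
    return outputs.mean(), outputs.std()

# hypothetical model: incremental net benefit = effect * threshold - cost
mean_inb, sd_inb = probabilistic_sensitivity_analysis(
    lambda effect, cost: effect * 20_000 - cost,
    [lambda r: r.normal(0.1, 0.02),    # uncertain health effect (QALYs)
     lambda r: r.normal(1000, 200)])   # uncertain incremental cost
```

The point of the talk is precisely that this summary captures only input uncertainty: structural error in `model` itself is not represented.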



Feb 17  Thu  Siti Rahayu (University of Sheffield)  Statistics Seminar  
14:30  Interpretation Methods for Multivariate Control Chart Signals  
Lecture Theatre 6  
Abstract: Multivariate control charts have been the most popular tool among quality control/process control researchers when it comes to monitoring multivariate processes. The impact of correlation among process variables on multivariate process performance, the problem of multiplicity in hypothesis testing, and the difficulty of monitoring a large number of univariate control charts simultaneously can all be addressed readily by implementing a multivariate control chart. The only drawback of using a multivariate control chart is that once the out-of-control signal is triggered, interpretation of the signal is potentially difficult. A number of interpretation methods have been proposed by researchers, but so far the methods give inconsistent results. Some of the interpretation methods will be introduced and their strengths and weaknesses discussed. A new approach will be introduced as another option for interpreting a multivariate control chart signal. 
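For reference, the most common multivariate chart statistic is Hotelling's $T^2$; a minimal sketch (in practice the in-control mean and covariance would be estimated from a reference sample, and the chart signals when $T^2$ exceeds its control limit):

```python
import numpy as np

def hotelling_t2(x, mean, cov):
    """Hotelling's T^2 for one multivariate observation:
    T^2 = (x - mu)' S^{-1} (x - mu)."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(d @ np.linalg.solve(np.asarray(cov, dtype=float), d))
```

The interpretation problem discussed in the talk is exactly that a large $T^2$ does not by itself say which variable, or which combination of variables, triggered the signal.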



Mar 3  Thu  Ajay Jasra (Imperial College)  Statistics Seminar  
14:00  On the stability of a class of sequential Monte Carlo methods in High Dimensions  
Lecture Theatre 6  
Abstract: We investigate the stability of a Sequential Monte Carlo (SMC) method applied to the problem of sampling from a single target density on $R^d$ for large $d$. It is well known that, using a single importance sampling step, one produces an approximation for the target distribution that deteriorates as the dimension $d$ increases, unless the number of Monte Carlo samples $N$ increases at an exponential rate in $d$. This degeneracy can be avoided by introducing a sequence of artificial targets, starting from a `simple' target density and moving to the one of interest, and using an SMC method to sample from the sequence. Using this class of SMC methods with a fixed number of samples, one can produce an approximation for which the effective sample size (ESS) converges to a random variable $\varepsilon_N$ as $d \to \infty$, such that $1<\varepsilon_{N}$  
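A generic tempered SMC sampler of the kind described can be sketched as follows; the bridging schedule, move kernel and step size are illustrative choices, not those analysed in the talk:

```python
import numpy as np

def tempered_smc(logprior, loglik, particles, betas, n_moves=5, step=0.5, seed=0):
    """Sketch of an SMC sampler through the bridging densities
    pi_beta(x) ∝ prior(x) * exp(beta * loglik(x)), for an increasing
    ladder 0 < beta_1 < ... < beta_K = 1, with multinomial resampling
    and a few random-walk Metropolis moves per stage."""
    rng = np.random.default_rng(seed)
    x = np.asarray(particles, dtype=float)
    n = len(x)
    beta_prev, ess = 0.0, float(n)
    for beta in betas:
        logw = (beta - beta_prev) * loglik(x)        # incremental weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)                   # effective sample size
        x = x[rng.choice(n, n, p=w)]                 # multinomial resampling
        for _ in range(n_moves):                     # rejuvenation moves
            prop = x + step * rng.standard_normal(n)
            logacc = (logprior(prop) + beta * loglik(prop)
                      - logprior(x) - beta * loglik(x))
            x = np.where(np.log(rng.random(n)) < logacc, prop, x)
        beta_prev = beta
    return x, ess
```

Starting the particles from the `simple' density and tracking the ESS across stages is exactly the quantity whose large-d limit the talk studies.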


Mar 17  Thu  Martijn Pistorius (Imperial College)  Statistics Seminar  
14:00  Maximal increments of random walks and Lévy processes  
Lecture Theatre 6  
Abstract: A random walk reflected at its minimum is equal to the random walk minus its running minimum. The reflected process plays a role in various applications. It is related to the method of cumulative sums (CUSUM) used in mathematical statistics, and has been employed in various areas of applied probability, such as queueing theory, mathematical finance and mathematical genetics. For a random walk whose step-size distribution has finite negative mean and satisfies Cramér's condition, we show that the current value, the rescaled maximum and the overshoot are asymptotically independent, and identify explicitly the limit distribution of the overshoot. We obtain analogous results for the corresponding statistics of a Lévy process. As a corollary we obtain a factorization of the exponential distribution. This is joint work with A. Mijatović. 
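The reflected process defined in the first sentence is straightforward to compute; a minimal sketch:

```python
def reflected_walk(increments):
    """Random walk reflected at its running minimum:
    R_n = S_n - min_{0<=k<=n} S_k, with S_0 = 0.
    This is the quantity underlying one-sided CUSUM charts."""
    s, running_min, r = 0.0, 0.0, [0.0]
    for inc in increments:
        s += inc
        running_min = min(running_min, s)
        r.append(s - running_min)
    return r
```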



Mar 24  Thu  Tusheng Zhang (University of Manchester)  Statistics Seminar  
14:00  
Lecture Theatre 6  


Apr 7  Thu  Adrian Bowman (University of Glasgow)  Statistics Seminar  
13:30  Flexible regression models for environmental applications  
Lecture Theatre 6  
Abstract: Additive, and more general nonparametric, approaches to modelling extend standard regression methods by allowing very flexible, but smooth, relationships between variables of interest. The role of these models in environmental applications, where there is a need to model complex forms of spatial and temporal trends, as well as spatial and temporal correlation, will be discussed. Technical aspects of the talk will include computational strategies for spatio-temporal smoothing and ways of extending standard inferential methods. The data structures considered will include river networks as well as more standard spatial domains. Applications will include the modelling of SO2 pollution over Europe, water quality in the River Tweed and rainfall-flow response in the River Dee. 



Apr 7  Thu  Oztas Ayhan (Middle East Technical University, Turkey)  Statistics Seminar  
15:00  Memory recall errors and their relation to survey response  
Lecture Theatre 5  
Abstract: This talk covers a study which compares self-reports during an interview with staff and students who attended a University health centre with the records of visits to the same health centre over the previous 12 months. The design of the study reflects the effects of the importance of the event, the duration since the event, the frequency of occurrence of the event, the measurement scale of the event, and bounded versus unbounded recalling. In order to assess the extent of recall error, responses to retrospective questions on health centre visits are compared with administrative records. Statistical models are proposed for short- and long-term human memory recall error effects on responses. 



May 5  Thu  Jim Smith (University of Warwick)  Statistics Seminar  
14:00  Controlling a Remote Bayesian from Being Irrational  
Lecture Theatre 6  
Abstract: UK military commanders have a degree of devolved decision authority delegated from command and control (C2) regulators, and they are trained and expected to act rationally and accountably. Therefore, from a Bayesian perspective, they should be subjective expected utility maximizers. In fact they largely appear to be so. However, when current tactical objectives conflict with broader campaign objectives there is a strong risk that fielded commanders will lose rationality and coherence. By systematically analysing the geometry of their expected utilities, arising from a utility function with two attributes, we demonstrate that even when a remote C2 regulator can predict only the likely broad shape of her agents' marginal utility functions it is still often possible for her to identify robustly those settings where the commander is at risk of making inappropriate decisions. 



May 12  Thu  Lee Fawcett (University of Newcastle)  Statistics Seminar  
14:00  
Lecture Theatre 6  


May 19  Thu  Mathew Penrose (University of Bath)  Statistics Seminar  
14:00  Limit Theorems in Stochastic Geometry with Applications  
Lecture Theatre 6  
Abstract: For an empirical point process governed by a probability density function in d-space, consider functionals obtained by summing over each point some function which is locally determined. General laws of large numbers and central limit theorems for such functionals are known. We discuss such results, their extensions to point processes in manifolds, associated local limit theorems, and applications to particular functionals such as multidimensional spacings statistics, dimension estimators and entropy estimators. 



Sep 29  Thu  Sawaporn Siripanthana (Sheffield)  Statistics Seminar  
14:20  Multivariate surveillance for outbreak detection  
LT6  
Abstract: Early detection with a low false alarm rate is the main aim of outbreak detection as used in public health surveillance or in regard to bioterrorism. Several statistical methods have been implemented and used for monitoring the occurrence of outbreaks. For simplicity, univariate surveillance or parallel surveillance (separate monitoring of each continuous series) is usually implemented in practice. However, this has severe limitations, arising from the multiplicity of multiple hypothesis testing and from ignoring correlation between series, which might reduce the detection performance of the system if the data are truly correlated. Additionally, correlation within series is another issue which is often ignored but which should be considered, as health data are normally dependent over time. This talk will summarise existing univariate methods used for outbreak detection, with their strengths and weaknesses, and look at extensions to the multivariate case. For dimensionality reduction in multivariate surveillance, a method based on the sufficiency property will be introduced. 



Oct 12  Wed  Mark Davis (Imperial College)  Statistics Seminar  
14:15  Pathwise stochastic calculus and applications to options on realized variance.  
K14  
Abstract: If $S_t, t\in[0,T]$ is the price of a financial asset, the realized variance is $\mathrm{RV}^d_T=\sum_{i=1}^n(\log(S_{t_i}/S_{t_{i-1}}))^2$ where $t_i$ is a pre-specified increasing sequence of times in $[0,T]$. Most of the literature on this subject studies the continuous-time limit, which is $\mathrm{RV}^c_T=[\log S]_T$, the quadratic variation of the `log-returns' process $X_t=\log S_t$. Questions to be answered are how to price options on realized variance consistently with other options in the market and how to hedge these options. Recent research has focussed on model-free approaches to these questions: we want to say as much as possible without committing ourselves to any particular stochastic process realization of $S_t$. However, this poses an immediate problem of interpretation in the passage from $\mathrm{RV}^d$ to $\mathrm{RV}^c$: we cannot use the standard probabilistic notions of convergence, since we do not have a probability space! An answer to this problem is provided in Hans Föllmer's 1981 paper Calcul d'Itô sans probabilités, where he derives an Itô formula using just real analysis for paths having the `quadratic variation property'. In some cases, we need an Itô formula valid for functions whose second derivatives are not continuous, say $f\in {\cal H}^2$. The standard approach to this in stochastic analysis goes via the Tanaka formula and local time, so the question arises whether we can have a pathwise theory of local time. Föllmer, with a Diploma student, did consider this question, but it seems there may be decisive advantages in considering `Lebesgue' partitions rather than `Riemann' partitions as Föllmer did, thereby getting a direct connection with Lévy's downcrossing representation of local time. This is a preliminary account of work in progress with Jan Obłój (Oxford). 
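The discrete realized variance defined above is a one-liner to compute from observed prices:

```python
import math

def realized_variance(prices):
    """Discrete realized variance RV^d_T = sum_i (log(S_{t_i}/S_{t_{i-1}}))^2
    over a grid of observed prices, as in the definition above."""
    return sum(math.log(b / a) ** 2 for a, b in zip(prices, prices[1:]))
```

The talk concerns what this sum converges to as the grid is refined when no probabilistic model for the prices is assumed.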



Oct 12  Wed  Claudia Klüppelberg (TU Munich)  Statistics Seminar  
15:45  An introduction to COGARCH modelling with financial applications.  
K14  
Abstract: Modelling of stochastic volatility has triggered important research in the theory of stochastic processes. New models have been proposed to capture the ``stylized facts'' of volatility such as jumps, heavy-tailed marginals, long-range dependence, and clusters in extremes. In recent years particular emphasis has been given to continuous-time modelling, since financial time series in liquid markets are high-frequency and irregularly spaced because of random trading times. Natural candidates among continuous-time models with jumps are Lévy or Lévy-driven models, and we shall discuss some of the prominent examples for volatility modelling. Special emphasis is given to COGARCH models, which are continuous-time versions of the very popular GARCH models. 



Oct 20  Thu  Simon Wood (University of Bath)  Statistics Seminar  
14:00  
LT6  


Nov 3  Thu  Nicola Loperfido (Universita degli Studi di Urbino)  Statistics Seminar  
14:00  Kurtosis and the Black Swan: some Fine Financial Findings  
LT6  
Abstract: The Black Swan: The Impact of the Highly Improbable is a book which became a bestseller by pointing out the relevance of extreme (i.e. tail) events in finance. It also depicted statisticians as being totally inept at dealing with such events, but very apt at deceiving themselves and others using the normal distribution and more complicated models. This is unfair, given the vast statistical literature devoted to non-normal models for extreme financial events. However, it is also true that most of it is better suited to professional statisticians than to financial analysts with limited statistical backgrounds and little time to learn advanced statistical techniques. These analysts might find kurtosis a simple and useful tool for dealing with tail events. This seminar examines some properties of kurtosis and applies them to financial decisions. Theoretical results will be illustrated using data collected from several financial markets. 
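The tail-weight summary in question is the ordinary moment-based sample kurtosis; a plain sketch:

```python
def sample_kurtosis(x):
    """Moment-based sample kurtosis b2 = m4 / m2^2, where m_k is the
    k-th central sample moment. For a normal population this is close
    to 3 in large samples; heavy tails push it above 3."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2
```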



Nov 17  Thu  Sofia Dias (University of Bristol)  Statistics Seminar  
14:00  Checking consistency in Mixed Treatment Comparison Meta-analysis  
LT6  
Abstract: Indirect and mixed treatment comparisons (MTC), also known as network meta-analysis, represent an important development in evidence synthesis, particularly in decision-making contexts. Rather than pooling information on trials comparing treatments A and B, A and C, B and C etc. separately, MTC combines data from randomised comparisons, A vs B, A vs C, A vs D, B vs D, and so on, to deliver an internally consistent set of estimates while respecting the randomisation in the evidence. MTC allows coherent judgements on which of several treatments is the most effective and produces estimates of the relative effects of each treatment compared to every other treatment in a network, even though some pairs of treatments may not have been directly compared. However, doubts have been expressed about the validity of MTC, particularly the assumption of consistency between ``direct'' and ``indirect'' evidence. Inconsistency can be thought of as a conflict between ``direct'' evidence on a comparison between treatments B and C, and ``indirect'' evidence gained from A vs C and A vs B trials. Like between-trial heterogeneity, inconsistency is caused by effect modifiers, and specifically by an imbalance in the distribution of effect modifiers in the direct and indirect evidence. I will begin by defining inconsistency as a property of ``loops'' of evidence, and then provide details of the node-split and other, simpler, methods to assess whether there is inconsistency in a network and where it might be located. The merits and drawbacks of each method will be discussed using illustrative examples. 
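The consistency idea for a single loop can be sketched with a classical Bucher-type comparison (illustrative numbers; independence of the direct and indirect sources is assumed):

```python
def inconsistency_z(d_direct, var_direct, d_indirect, var_indirect):
    """Compare direct and indirect estimates of the same treatment
    effect with a z-statistic; a large |z| flags inconsistency in
    the evidence loop."""
    w = d_direct - d_indirect
    return w / (var_direct + var_indirect) ** 0.5

# indirect B-vs-C effect from A-vs-B and A-vs-C trials
d_ab, d_ac = 0.5, 0.8
d_bc_indirect = d_ac - d_ab
z = inconsistency_z(0.9, 0.04, d_bc_indirect, 0.05)
```

The node-split method described in the talk generalises this single-loop check to the whole network within one model.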



Dec 1  Thu  Dario Spano (University of Warwick)  Statistics Seminar  
14:00  Canonical correlation for dependent gamma random measures.  
LT6  
Abstract: We will focus on the construction of dependent completely random measures (CRMs), with fixed margins, motivated by applications to Bayesian inference and Population Genetics. In particular, we will deal with vectors of gamma CRMs and characterize their distribution in terms of their canonical correlations, that is: we characterize the class of all dependent gamma measures whose finite dimensional distributions are given by a transition kernel with orthogonal polynomial eigenfunctions. We thus provide a results that shows that the canonical correlations (i.e. the kernel eigenvalues) are mixed moments of linear functionals of Dirichlet means evaluated at a random function. MarkovKrein and other identities on Dirichlet random means thus allow for several explicit representation for joint and conditional moment measures of our bivariate CRMs. We provide a few illustrations that show how some wellknown dependent vectors are included in our more general framework. Finally, if time allows, we will discuss an extension to measurevalued Markov processes. 



Dec 15  Thu  Peter Craig (University of Durham)  Statistics Seminar  
14:00  Ecotoxicological Risk Assessment: Beyond the Standard Species Sensitivity Distribution Model  Advantages and Benefits of Being Bayesian and Matters Arising.  
LT6  
Abstract: Ecotoxicological risk assessment deals with the potential for unwanted ecological effects of chemicals. A key statistical tool for risk assessors and managers is the use of the species sensitivity distribution (SSD) model as a proxy for the effects of a chemical in real ecosystems; in particular, the "safe concentration" calculation is based on an estimate of the 5th percentile of the SSD, obtained from a relatively small amount of data. The standard procedure (Aldenberg and Jaworska, 2000) is based on a lognormal model assuming exchangeability. Much of this talk will discuss a number of recent developments in the modelling and use of SSDs: drawing strength from other data; use of loss functions; assessing and modelling non-exchangeability and the consequences for decision-making; handling the issue of "measurement error" (inter-test variation); understanding and exploiting inter-species correlation; hierarchical random effects models. In parallel, I will consider the ongoing shift from frequentist to Bayesian methodology/philosophy in ecotoxicology, the advantages for the statistician of the Bayesian approach and the benefits this provides for ecotoxicology. I will finish by discussing some of the problems of being Bayesian, the questions they raise and some of the issues which Bayesians need to address. 
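For orientation, a naive plug-in 5th percentile of a lognormal SSD can be computed as below; this is a point estimate only, whereas the standard procedure cited additionally accounts for sampling uncertainty via extrapolation constants:

```python
import math
import statistics

def hc5_lognormal(log_concentrations):
    """Plug-in 5th percentile of a lognormal species sensitivity
    distribution: exp(mean + z_0.05 * sd) on the log scale."""
    mu = statistics.mean(log_concentrations)
    sigma = statistics.stdev(log_concentrations)
    z05 = -1.6448536269514722  # standard normal 5th percentile
    return math.exp(mu + z05 * sigma)
```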



Feb 9  Thu  Vanessa Didelez (University of Bristol)  Statistics Seminar  
14:00  Mendelian Randomisation as an Instrumental Variable Approach to Causal Inference  
LT6  
Abstract: In epidemiology we often want to estimate the causal effect of an exposure on a health outcome based on observational data, where the possibility of unobserved confounding cannot be excluded. To deal with this problem, it has recently become popular to use a technique called Mendelian randomisation, which exploits the fact that the exposure is associated with a genetic variant; this variant can be assumed to be unaffected by the same confounding factors, making it suitable as a so-called instrumental variable. In my talk, this technique is illustrated with various examples, in particular with the effect of alcohol consumption on blood pressure / hypertension. Different methods of using an instrumental variable to estimate the causal effect on a binary outcome are compared based on their theoretical properties as well as by simulation. Finally, I will discuss whether a Bayesian approach is useful in the context of Mendelian randomisation. References: Didelez and Sheehan (2007). Mendelian randomisation as an instrumental variable approach to causal inference, Statistical Methods in Medical Research, 16, 309-330. Didelez, Meng and Sheehan (2010). Assumptions of IV methods for observational epidemiology, Statistical Science, 25, 22-40. Palmer, Sterne, Harbord, Lawlor, Sheehan, Meng, Granell, Davey Smith, Didelez (2011). Instrumental variable estimation of causal risk ratios and causal odds ratios in Mendelian randomization analyses, The American Journal of Epidemiology, 173 (12). Jones, Thompson, Didelez and Sheehan (2012). On the choice of parameterisation and priors for the Bayesian analyses of Mendelian randomisation studies. To appear in Statistics in Medicine. 



Feb 16  Thu  Emma Jones (University of Sheffield)  Statistics Seminar  
14:00  Using a Bayesian Hierarchical Model for Tree-Ring Dating  
LT6  
Abstract: The widths of tree-rings are determined by several factors, including a local climatic signal apparent in that year and the tree's growth trend. The climatic signal influences growth such that if the summer is warm and wet, the ring tends to be wider than if the summer is cold and dry. The growth trend describes the fast growth of the tree when it is young, producing wide rings, followed by narrower rings as it ages. Other factors such as the soil conditions, the presence of pests and diseases, and competition for light and nutrients can also affect the ring width. The impact of these latter factors is collectively known as noise. It is assumed that trees within the same geographical region are exposed to the same climatic signal in each year, but that this differs from year to year. Tree-ring dating involves matching sequences of tree-ring widths from timbers of unknown age to dated sequences known as 'master' chronologies. Before matching takes place, all data are pre-processed to remove the growth trends. The timbers of unknown age (typically from a single building or woodland) are, firstly, sequentially matched against one another to identify the relative offsets with the 'best' match. The sequence produced is known as a 'site' chronology. The site chronology is then further matched to a local master chronology, to attempt to produce a date estimate for the site chronology. Traditionally the quality of the matches (both within the site chronology and between the site chronology and the master chronology) is assessed via the classical statistical t-test. A match at a particular offset is only considered to be 'best' if it produces the largest t-value of all of the possible offsets and is greater than (an arbitrary value of) 3.5. The success rate of dating varies within sites and across regions; the national average is approximately 60-70%, but in some geographical areas the success rate can be much lower. 
One of the reasons for this is that the t-test does not utilise the wide range of information that a Bayesian model for tree-ring dating could draw upon. A Bayesian model for tree-ring dating allows important prior information on parameters to be drawn into the inference process; this prior information can be taken from trees and can also be elicited from expert dendrochronologists. The model assumes that each ring width is composed of an overall climatic signal and some noise, and can be further extended to include climatic signals at varying geographic scales. Probabilities for a match at each offset can be produced conditional on the data and the prior specifications. The method removes the need to identify a single 'best' match, but it does rely on careful prior specification of parameters. Consequently, we have collated ring width data from trees of known age from several woods in the UK and are using these to provide informative prior knowledge. 
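As a point of reference for the classical procedure described above, the offset-by-offset t-test matching can be sketched as follows. This is an illustrative reconstruction, not the dendrochronologists' production software; the function name and the minimum-overlap parameter are assumptions.

```python
import numpy as np

def match_offsets(site, master, min_overlap=30):
    """Slide a (detrended) site chronology along a master chronology and
    return the t-value of the Pearson correlation at each feasible offset."""
    results = []
    for offset in range(len(master) - min_overlap + 1):
        overlap = min(len(site), len(master) - offset)
        if overlap < min_overlap:
            break
        r = np.corrcoef(site[:overlap], master[offset:offset + overlap])[0, 1]
        # Classical t-statistic for a correlation based on `overlap` points
        t = r * np.sqrt((overlap - 2) / (1.0 - r**2))
        results.append((offset, t))
    return results
```

In the traditional scheme, a match at a particular offset would only be accepted if its t-value is both the maximum over all offsets and greater than the conventional threshold of 3.5.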



Feb 16  Thu  Seungjin Han (University of Sheffield)  Statistics Seminar  
14:30  Adaptive filtering for algorithmic pairs trading  
LT6  
Abstract: Pairs trading as a statistical arbitrage methodology has received considerable attention and popularity since its initial application in the 1980s. It is based on the assumption that a spread of two assets is mean-reverting, and any fluctuations that violate this are exploited in order to realize profits. For real-time detection of mean reversion, we employ a time-varying autoregressive model in state-space form, online estimation of which is achieved by recursions of Kalman filtering and adaptive forgetting. Two novel algorithms for a variable forgetting factor are proposed and compared with a standard recursive least squares algorithm with adaptive memory. 
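A minimal sketch of the kind of online state-space estimation the abstract describes is given below, assuming a time-varying AR(1) spread model with random-walk coefficients. The noise variances `q` and `r` and the function name are illustrative choices, and the speaker's adaptive-forgetting variants are not reproduced here.

```python
import numpy as np

def kalman_tvar(spread, q=1e-5, r=1e-3):
    """Online Kalman estimation of a time-varying AR(1) model
    s_t = a_t + b_t * s_{t-1} + e_t, where the coefficient vector
    (a_t, b_t) follows a random walk with variance q."""
    theta = np.zeros(2)                      # current estimate of (a_t, b_t)
    P = np.eye(2)                            # state covariance
    Q = q * np.eye(2)                        # random-walk (state noise) covariance
    history = []
    for t in range(1, len(spread)):
        H = np.array([1.0, spread[t - 1]])   # regressor for time t
        P = P + Q                            # predict step
        e = spread[t] - H @ theta            # innovation
        S = H @ P @ H + r                    # innovation variance
        K = P @ H / S                        # Kalman gain
        theta = theta + K * e                # update step
        P = P - np.outer(K, H @ P)
        history.append(theta.copy())
    return np.array(history)
```

Mean reversion of the spread at time t corresponds to |b_t| < 1; a variable-forgetting-factor scheme would, roughly speaking, replace the fixed Q with an adaptively tuned effective memory.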



Feb 23  Thu  Jim Griffin (University of Kent)  Statistics Seminar  
14:00  Shrinking to some purpose  
LT6  
Abstract: In Bayesian statistics there has recently been interest in using priors whose density has a spike at zero in regression problems. These priors can lead to adaptive shrinkage of regression effects and so can be used for sparse regression problems where many of the regression coefficients are assumed to be zero (or very close to zero). This talk will consider the Normal-Gamma prior and extensions of it to encourage more general forms of shrinkage. For example, we might want to shrink differences of regression effects, or we might want to allow the ``importance'' of regression effects to change over time. 



Mar 1  Thu  Chris Sherlock (University of Lancaster)  Statistics Seminar  
14:00  A hidden Markov model for disease interactions  
LT6  
Abstract: Interactions between parasite species in a host are of great interest to ecologists but are often too complex to predict a priori. A longitudinal study of a population of field voles was undertaken with presence or absence of six different parasite species measured repeatedly. Although trapping sessions were regular, a different set of voles was caught at each session leading to incomplete profiles for all subjects. A simple analysis, which discards much of the data, has already been carried out; we offer a more powerful alternative. We use a discrete-time hidden Markov model for each disease with transition probabilities dependent on covariates via a set of logistic regressions. For each disease the hidden states for each of the other diseases at a given time point form part of the covariate set for the Markov transition probabilities from that time point to the next. This allows us to gauge the influence of each parasite species on the transition probabilities for each of the other parasite species. Inference is performed via a Gibbs sampler, one iteration of which cycles through each of the diseases, first using an adaptive Metropolis-Hastings step to sample from the conditional posterior of the covariate parameters for that particular disease given the hidden states for all other diseases and then sampling from the hidden states for that disease given the parameters using the Forward-Backward algorithm. 
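For reference, the Forward-Backward recursions used inside such a Gibbs sampler can be sketched for a generic discrete HMM as follows. This version computes smoothed state marginals; the sampler in the talk would instead draw hidden-state trajectories, and the disease-specific covariate structure is omitted.

```python
import numpy as np

def forward_backward(obs, A, emis, pi):
    """Normalised forward-backward recursions for a discrete HMM.
    A: (K, K) transition matrix; emis: (K, M) emission probabilities;
    pi: (K,) initial distribution; obs: sequence of observation indices.
    Returns the smoothed marginals P(state_t = k | all observations)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = pi * emis[:, obs[0]]
    alpha[0] /= alpha[0].sum()           # normalise for numerical stability
    for t in range(1, T):                # forward pass
        alpha[t] = (alpha[t - 1] @ A) * emis[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):       # backward pass
        beta[t] = A @ (emis[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```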



Mar 29  Thu  Eleanor Stillman (University of Sheffield)  Statistics Seminar  
14:00  Optimal design for multi-response experiments  
LT6  
Abstract: Many statistical investigations require data to be collected so that the influence of explanatory variables on responses of interest can be deduced. Once there is more than a single response variable, there are potential conflicts of interest in selecting experiments which are efficient at estimating all responses. In this talk I will begin by introducing the general ideas of optimal experimental design and then focus on extensions to multiple responses. In particular, I will introduce a new composite optimality criterion which seeks to estimate a primary continuous response efficiently particularly when a second, binary, response has a positive outcome. I will also examine the practically important case of simultaneous estimation of both mean and variance of a single response. 



Apr 26  Thu  Ronnie Loeffen (University of Manchester)  Statistics Seminar  
14:00  Spectral representations for affine processes  
LT6  
Abstract: Affine processes are widely used in various areas of mathematical finance, like credit risk modelling, interest rate modelling and stochastic volatility models. One of the advantages of working with affine processes is that one can compute European option prices via Laplace/Fourier inversion after solving a system of nonlinear, first order ODEs. However, an explicit solution to this system exists only in a limited number of cases and numerically solving it seems cumbersome. Based on the work of Ogura (1974/75) on continuous-state branching processes, we discuss an alternative method in which the system of ODEs is replaced by a number of decoupled, linear, first order PDEs. Pros and cons of the method will be indicated and also some examples will be provided. 



May 3  Thu  Simon Wood (University of Bath)  Statistics Seminar  
14:00  Simple statistical models for complex ecological data  
LT6  
Abstract: Much ecological theory is based on models that are relatively simple to write down and simulate from, while at the same time being capable of displaying very complicated dynamics. This talk suggests that such near chaotic dynamics provide a case where it is sensible to abandon conventional likelihood or Bayesian approaches in favour of inference based on carefully chosen statistics of the data. The statistics should be designed to avoid the irregularity produced by highly nonlinear dynamics, while still being informative about the dynamic structure of the system being modelled. A simple approach to inference is proposed, which requires only the ability to simulate from the model. The approach has links to ABC, generalized method of moments, indirect inference and similar approaches, but requires rather little tuning. 



May 9  Wed  Sotiris Bersimis (University of Piraeus)  Statistics Seminar  
14:00  Multivariate SPC with emphasis on multi-attribute processes  
LT10  
Abstract: Initially, the area of multivariate SPC will be briefly overviewed and the basic procedures for implementing multivariate statistical process control via control charting will be reviewed. Specifically, multivariate extensions of all kinds of univariate control charts, such as multivariate Shewhart-type control charts, MCUSUM control charts and MEWMA control charts, will be summarized, and the problem of interpreting an out-of-control signal will be briefly discussed. Additionally, since little work has been done in the literature to deal with multivariate attribute processes, which are very important in practical production processes, the presentation will close with the special case that arises when the quality of the process of interest is not characterized by continuous characteristics. After the key points of multi-attribute processes are presented, some procedures for controlling such processes will be discussed. 
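As a concrete anchor for the multivariate Shewhart-type charts mentioned above, the classical Hotelling T^2 charting statistic for a single observation can be sketched as follows, assuming the in-control parameters are known; the helper name is mine.

```python
import numpy as np

def hotelling_t2(x, mu, Sigma):
    """Hotelling T^2 statistic for a single p-variate observation x,
    given the in-control mean mu and covariance Sigma.  With known
    parameters, T^2 is compared against a chi-square(p) control limit;
    the chart signals out-of-control when the limit is exceeded."""
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))
```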



May 17  Thu  Lee Fawcett (University of Newcastle)  Statistics Seminar  
14:00  Estimating return levels from serially dependent extremes  
LT6  
Abstract: In this talk, we investigate the relationship between return levels of a process and the strength of serial correlation present in the extremes of that process. Estimates of long-period return levels are often used as design requirements, and peaks over thresholds (POT) analyses have, in the past, been used to obtain such estimates. However, analyses based on such declustering schemes are extremely wasteful of data, often resulting in great estimation uncertainty represented by very wide confidence intervals. Using simulated data, we show that, provided the extremal index is estimated appropriately, using all threshold excesses can give more accurate and precise estimates of return levels, allowing us to avoid altogether the sometimes arbitrary process of cluster identification. We then apply our method to two data examples concerning sea-surge and wind speed extremes. 
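The return-level calculation underlying such a POT analysis can be sketched as follows, using the standard GPD return-level formula with an extremal-index adjustment (cf. Coles, 2001). This is the textbook parameterisation, not necessarily the exact estimator used in the talk.

```python
from math import log

def return_level(m, u, sigma, xi, zeta_u, theta=1.0):
    """m-observation return level from a GPD fitted to all excesses of
    a threshold u, adjusted for serial dependence via the extremal
    index theta:
        x_m = u + (sigma / xi) * ((m * zeta_u * theta)**xi - 1)
    where zeta_u is the probability of exceeding u, sigma and xi are
    the GPD scale and shape, and theta = 1 recovers the iid case."""
    if abs(xi) < 1e-9:                   # Gumbel limit as xi -> 0
        return u + sigma * log(m * zeta_u * theta)
    return u + (sigma / xi) * ((m * zeta_u * theta) ** xi - 1.0)
```

Note that a smaller extremal index (stronger clustering of extremes) lowers the return level for the same marginal fit, which is one way of seeing why serial dependence matters for design values.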



Oct 4  Thu  Sigurd Assing (University of Warwick)  Statistics Seminar  
14:00  On the spatial dynamics of the stochastic heat equation  
Abstract: When modeling complex phenomena by random fields $u(x,t)$ depending on a $d$-dimensional space parameter $x$ and time $t$ it is often useful to describe the dynamical behaviour of these fields by stochastic partial differential equations (SPDEs). If a random field $u(x,t)$ is a solution of an SPDE then it is usually understood as a Markov process $u(\cdot,t)$, $t \geq 0$, taking values in a function space. Unfortunately this wipes out any structure of the solutions in the space parameter $x$. In this talk we recover this structure in the case where the SPDE is the so-called stochastic heat equation, which is a simple toy example. The method used is mainly based on the technique of enlargement of filtrations and on Malliavin calculus. There is hope that it can also be applied to other SPDEs. 



Oct 11  Thu  Andrew Beckerman (Sheffield (Animal and Plant Sciences))  Statistics Seminar  
14:00  Graphs and Covariance in Ecology and Evolution  
K14  
Abstract: Here I introduce two major research themes in ecology and evolution: food web networks and quantitative genetics. Food web network theory borrows heavily, if inelegantly, from graph theory, with vertices/nodes typically representing species and edges representing anything from binary connection to process. In this section I introduce two classes of food web models, and issues currently facing their use, centred on observation and process error. Quantitative genetics centres on estimating genetic variation and covariation among traits that are important to the survival and reproduction of organisms. We focus on these, represented as a variance-covariance matrix, because variation is required for evolution to happen, and positive and negative covariation represent constraints on what can happen among traits. In this section I introduce the hierarchical modelling we typically use, important eigensystem properties of the variance-covariance matrix, and recent transitions from parametric to Bayesian MCMC tools. The Bayesian MCMC methods appear to allow several types of comparisons among groups of individuals with strong inference. 



Oct 18  Thu  Markus Riedle (Kings College London)  Statistics Seminar  
14:00  The stochastic heat equation driven by cylindrical Lévy processes  
K14  
Abstract: The heat equation driven by Gaussian noise is the most fundamental and simplest example of a stochastic partial differential equation. Most of its properties and characteristics are well understood. However, given the restriction of Gaussian noise it is important to understand this fundamental equation if driven by a more general noise. In this talk we consider the heat equation driven by cylindrical Lévy processes. These kinds of processes were introduced together with D. Applebaum a few years ago and they are a natural generalisation of the Gaussian noise. We give several examples of cylindrical Lévy processes and introduce a stochastic integral with respect to these processes. In the main part, we explain how the heat equation can be solved and we show some of the phenomena which arise if the heat equation is no longer perturbed by a Gaussian noise but by a cylindrical Lévy process. 



Oct 25  Thu  Charles Taylor (University of Leeds)  Statistics Seminar  
14:00  Regression for circular data  
Abstract: We consider data of the form $(x_i,y_i)$ in which $x$ and/or $y$ is measured as an angle, and we seek to model a relationship in which $y$ can be predicted from $x$. Starting with a review of existing parametric models, we put these into a common framework and discuss problems with estimation. Various nonparametric models, which make use of circular kernels, are described, as well as their asymptotic behaviour and approaches to bandwidth selection. 
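One of the nonparametric estimators alluded to, Nadaraya-Watson regression with a von Mises (circular) kernel for an angular predictor, can be sketched as follows. This covers only the angular-predictor, linear-response case, and the function name is illustrative.

```python
import numpy as np

def nw_circular(x0, x, y, kappa=5.0):
    """Nadaraya-Watson estimate of E[y | angle x0] for an angular
    predictor x, using a von Mises kernel exp(kappa * cos(x0 - x_i)).
    kappa acts as an inverse bandwidth: larger kappa = more local fit."""
    w = np.exp(kappa * np.cos(x0 - x))   # periodic in (x0 - x_i) by construction
    return float(np.sum(w * y) / np.sum(w))
```

Because the kernel is a function of cos(x0 - x_i), the estimator automatically respects the wrap-around of the circle, which an ordinary Gaussian kernel on angles would not.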



Nov 1  Thu  Postgraduate talks  Mike Spence and Steph Llewelyn (Sheffield)  Statistics Seminar  
14:00  Parameter Estimation of Individual-based models (Mike) and Statistical Modelling of Fingerprints (Steph)  
K14  
Abstract: Mike's Talk: Parameter Estimation of Individual-based models. Individual-based models are increasingly used in ecological modelling as a way of trying to understand how individuals' behaviour leads to the emergent behaviour of the system. Generally the behaviour of the individuals is determined through a series of rules or algorithms, rather than described in a formal mathematical way, and this can represent a good way of capturing an ecologist's expertise and intuition. Quantifying uncertainty, estimating parameters and so on for a model of this sort are complicated by the fact that its probabilistic behaviour is implicit in its rules, rather than made explicit as in a more conventional statistical or stochastic model. This means that there is generally no explicit likelihood function available. I will discuss a number of methods of dealing with this and illustrate these methods with Railsback and Grimm's (2012) simplified model of woodhoopoe population dynamics. Stephanie's Talk: Statistical Modelling of Fingerprints. It is believed that fingerprints are determined in embryonic development. Unlike other personal characteristics, the fingerprint appears to be the result of a random process. For example, fingerprints of identical twins (whose DNA is identical) are distinct, and extensive studies have found little evidence of a genetic relationship in terms of types of fingerprint, certainly at the small scale. At a larger scale the pattern of ridges on fingerprints can be categorised as belonging to one of five basic forms: loops (left and right), whorls, arches and tented arches. The population frequencies of these types show little variation with ethnicity, and a list of the types occurring on the ten digits can be used as an initial basis for identification of individuals. However, such a system would not uniquely identify an individual, although the frequency of certain combinations could be extremely small. 
At a smaller scale various minutiae or singularities can be observed in a fingerprint. These include ridge endings and bifurcations, amongst others. Typical fingerprints have several hundred of these, as well as two key points (with the exception of a simple arch) referred to as the core and delta, which are focal points of the overall pattern of ridges. Modern identification systems are based upon ridge endings and bifurcations, not least because they are the easiest to determine automatically from image analysis. The configuration of these minutiae is unique to the individual. The presentation will give an introduction to fingerprints from a forensic context and also outline a method used for matching a finger mark to a fingerprint. 



Nov 15  Thu  Simon Spencer (University of Warwick)  Statistics Seminar  
14:00  Causal inference for biochemical networks  
Abstract: In observational studies it is impossible to distinguish between association and causation. To uncover causal relationships, interventions must be included in the experimental design. In complex systems, such as biochemical networks, there is frequently a high degree of association between interacting parts of the system. The aim of causal network inference is to untangle the causal structure behind these associations. In this study we developed a statistical model that captures the effect of inhibitors (an intervention) in a protein signalling network. We then used this model to perform causal network inference on protein microarray data from breast cancer cell lines. We were able to demonstrate that a causal inference approach increases the accuracy of the inferred networks. 



Nov 22  Thu  Barbel Finkenstadt (University of Warwick)  Statistics Seminar  
14:00  Modeling and inference for gene expression time series data (an overview)  
Abstract: A central challenge in computational modeling of dynamic biological systems is parameter inference from experimental time course measurements of gene expression. We present an overview of modeling approaches based on stochastic population dynamic models and their approximations. On the mesoscopic scale (small populations), we present a two-dimensional continuous-time Bayesian hierarchical model which has the potential to address the different sources of variability that are relevant to the stochastic modelling of transcriptional and translational processes at the molecular level, namely: intrinsic noise due to the stochastic nature of the birth and death processes involved in chemical reactions; extrinsic noise arising from the cell-to-cell variation of kinetic parameters associated with these processes; and noise associated with the measurement process. Inference is complicated by the fact that only the protein, and rarely other molecular species, is observed, which typically entails problems of parameter identification in dynamical systems. On the macroscopic (or large populations) scale, we introduce a mechanistic 'switch' model for encoding a continuous transcriptional profile of genes over time, with the aim of identifying the timing properties of mRNA synthesis, which is assumed to switch between periods of transcriptional activity and inactivity, each time leading to the transition to a new steady state, while mRNA degradation is an ongoing linear process. The model is rich enough to capture a wide variety of expression behaviours, including periodic genes. Finally, I will also give a brief introduction to some recent work on inferring the periodicity of the expression of circadian and other oscillating genes. Joint work with: Maria Costa, Dan Woodcock, Dafyd Jenkins, David Rand, Michal Komorowski (Warwick Systems Biology) 



Nov 29  Thu  Christopher Brignell (Nottingham)  Statistics Seminar  
14:00  Statistical shape analysis, with an application to chemoinformatics  
K14  
Abstract: Statistical methods for evaluating and comparing shapes are necessary in a wide range of disciplines. For example, in biology we may wish to classify an organism based on its shape, or in computer science we may wish to develop methods for automated face or fingerprint recognition. One emerging application is to molecular structures such as proteins and DNA, to investigate properties of chemical bonding. In this talk I will provide an introduction to shape analysis and then apply the results to chemoinformatics. 



Dec 6  Thu  Ian Vernon (Durham)  Statistics Seminar  
14:00  Galaxy Formation: A Bayesian Uncertainty Analysis  
K14  
Abstract: The question of whether large quantities of Dark Matter exist in our Universe is one of the most important problems in modern cosmology. This project deals with a complex model of the Universe known as Galform, developed by the ICC group at Durham University. This model simulates the creation and evolution of approximately 1 million galaxies from the beginning of the Universe until the current day, a process which is very sensitive to the presence of Dark Matter. A major problem that the cosmologists face is that Galform requires the specification of a large number of input parameters in order to run. The outputs of Galform can be compared to available observational data, and the general goal of the project is to identify which input parameter specifications will give rise to acceptable matches between model output and observed data, given the many types of uncertainty present in such a situation. As the model is slow to run and the input space large, this is a very difficult task. We have solved this problem using general techniques related to the Bayesian treatment of uncertainty for computer models. These techniques are centred around the use of emulators: fast stochastic approximations to the full Galform model. These emulators are used to perform an iterative strategy known as history matching, which identifies regions of the input space of interest. Visualising the results of such an analysis is a nontrivial task. The acceptable region of input space is a complex shape in high dimension. Although the emulators are fast to evaluate, they still cannot give detailed coverage of the full volume. We have therefore developed fast emulation techniques specifically targeted at producing lower-dimensional visualisations of higher-dimensional objects, leading to novel, dynamic 2- and 3-dimensional projections of the acceptable input region. 
These visualisation techniques allow full exploitation of the emulators, and provide the cosmologists with vital physical insight into the behaviour of the Galform model. 



Dec 13  Thu  Jenny Barrett (Leeds)  Statistics Seminar  
14:00  Identifying causal genetic variants and other related problems in statistical genetics  
K14  
Abstract: Genome-wide association (GWA) studies have been successful in recent years at finding associations between common genetic variants and disease by careful application of simple statistical methods. For most common diseases, this has led to the identification of a number of genetic regions that clearly harbour a genetic variant or variants that influence risk of disease. However, due to strong and complex patterns of correlation between genetic variants located close together, it is usually still unknown which variant(s), and often even which gene, in the region actually has a causal effect on the trait. We are applying statistical approaches to shed light on what is going on in the genetic regions associated with melanoma. Our primary approach is to select the most parsimonious model(s) that explain the association signal in the region (e.g. using penalized logistic regression of all variants in the region simultaneously), and then as a second step to look at the biological plausibility of the models. There are various outstanding problems in this area. Is there a more effective way of combining statistical and biological information? Regions may be genotyped at several different levels of density, right down to the highest resolution of knowing the entire genetic sequence in the region. If data are available at different densities on different subsets of individuals, how can they best be combined? Can including related individuals in the analysis help in the identification of causal variants, especially if these are rare? These problems will be discussed in further detail, with time for questions, and any suggestions of answers! 



Jan 30  Wed  James Norris and Jean Bertoin (Sheffield Probability Day) (Cambridge and ETH Zurich)  Statistics Seminar  
14:15  James Norris (Cambridge), 2.15 pm: A consistency estimate for Kac's model of elastic collisions in a dilute gas.
Jean Bertoin (ETH Zurich), 3.45 pm, The 2012 Applied Probability Trust Lecture: Almost giant clusters for percolation on large trees with logarithmic heights. 

LT 7  
Abstract: Abstract for James Norris's talk: Kac's process is a natural stochastic particle model, of mean field type, for the evolution of particle velocities under elastic collisions. Formally this should converge to the spatially homogeneous Boltzmann equation in the large particle number limit. In one of the physically interesting cases, namely hard sphere collisions, this was proved by Sznitman. We will discuss a new proof of this result, which leads to some quantitative refinements, based on the simple approach of treating the martingale decomposition for linear functions of Kac's process as a random perturbation of Boltzmann's equation. Abstract for Jean Bertoin's talk: We consider Bernoulli bond percolation on a tree with size $n\gg 1$, with a parameter $p(n)$ that depends on the size of that tree. Our purpose is to investigate the asymptotic behavior of the sizes of the largest clusters for appropriate regimes. We shall first provide a simple characterization of tree families and percolation regimes which yield giant clusters, answering a question raised by David Croydon. In the second part, we will review briefly recent results concerning two natural families of random trees with logarithmic heights, namely recursive trees and scale-free trees. We shall see that the next largest clusters are almost giant, in the sense that their sizes are of order $n/\ln n$, and obtain precise limit theorems in terms of certain Poisson random measures. A common feature in the analysis of percolation for these models is that, even though one addresses a static problem, it is useful to consider dynamical versions in which edges are removed, respectively vertices are inserted, one after the other in a certain order as time passes. 



Feb 7  Thu  Amy Baddeley and Stefan Blackwood  Statistics Seminar  
14:00  Amy Baddeley:
Using Bayes Factors to analyse finemapped genotype data
Stefan Blackwood: Partially observed systems 

K14  
Abstract: Abstract for Amy Baddeley's talk: Recent developments in genetic analysis mean that we have been able to identify many associations between genetic variants and common diseases. However, it is likely that most of the variants identified so far are not actually the causal variants, but are in fact confounders. Now the priority is shifting to identifying the causal variant in a disease association region (fine-mapping). Methods utilised in published studies to identify causal variants include the likelihood ratio (LR) and other frequentist methods. However, high levels of correlation, rare causal variants and those with small effect sizes mean such analyses may not work in all situations. The restrictive effects of these may be partially countered by incorporating functional biological information into an analysis. I will begin by giving a brief introduction to the genetic setting of the problem and the problem itself. I will then outline a general framework of analysis, "filtering", and the main method that will be presented uses the Bayes Factor (BF) in this framework. The BF is the ratio of the probability of the data under the alternative and null hypotheses, with a larger value indicating more evidence in favour of the alternative hypothesis. I will show the results of analyses using realistic simulated datasets and explore using fairly uninformative priors compared to using priors based on functional data. Our results indicate that BFs are a promising tool for incorporating functional information into fine-mapping studies. Abstract for Stefan Blackwood's talk: Suppose you have a random system which is not directly observable; instead you have a sequence of partial observations. Using the information gathered from these observations, what can we infer about the underlying system? Using stochastic models to make these deductions is known as stochastic filtering. 
During this talk I will provide a brief account of linear and nonlinear stochastic filtering in the presence of Lévy noise and their respective cornerstones: the Kalman-Bucy filter and the Zakai equation. 
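The Bayes factor defined in Amy's abstract can be made concrete with a toy conjugate example: testing a binomial success probability p = 0.5 against a uniform prior on p. This illustration is mine, not taken from the talk.

```python
from math import comb

def bayes_factor_binomial(k, n):
    """Bayes factor for H1: p ~ Uniform(0, 1) against H0: p = 0.5,
    given k successes in n binomial trials.  The marginal likelihood
    of the data is 1 / (n + 1) under H1 (beta-binomial with a flat
    prior) and C(n, k) * 0.5**n under H0."""
    m1 = 1.0 / (n + 1)
    m0 = comb(n, k) * 0.5 ** n
    return m1 / m0
```

Values above 1 favour H1: for example, nine successes in ten trials gives a BF well above 1, whereas five successes in ten gives a BF below 1, favouring the null.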



Feb 14  Thu  Elizabeth Boggis and Samuel Touchard  Statistics Seminar  
13:30  Elizabeth Boggis: Exploiting Bayesian Shrinkage within a Linear Model Framework to identify Exome Sequence Variants associated with Gene Expression; Samuel Touchard: MicroRNA predictions using Bayesian graphical models 

K14  
Abstract: Elizabeth's Abstract: Next-generation exome sequencing identifies thousands of DNA sequence variants in each individual. Methods are needed that can effectively identify which of these variants are associated with changes in gene expression. The Normal-Gamma prior has been shown to induce effective and flexible shrinkage in the Bayesian linear model framework (Griffin and Brown 2010). Using simulated data we assess the efficacy and limitations of this Bayesian shrinkage framework in parsimoniously identifying such sequence variants. We further develop a Bayesian linear model to include the uncertainty in gene expression; SNP functional information obtained from online databases; and the uncertainty in the allele calls as quantified by the quality score. Samuel's Abstract: In this presentation we describe miRNA networks for patients suffering from Acute Coronary Syndrome (ACS). miRNAs are non-coding RNAs that regulate gene expression. We are interested in building an association network which will identify (with quantifiable uncertainty) miRNAs that regulate particular genes (or groups of genes), thus providing important information on gene functionality or dysfunctionality. Data were collected consisting of gene expression levels of miRNAs and mRNAs of patients who suffer from ACS. RNA was extracted from blood samples at two time points, and expression levels were quantified with Affymetrix GeneChip arrays and normalised using the puma package for microarray data analysis. The method is broken down into three stages. In the first stage a dimensionality reduction is performed: using TargetScan association scores the miRNA expressions are narrowed down, as are the gene expressions by using distance similarity procedures such as clustering and latent process decomposition. In the second stage a Bayesian graphical model is proposed, according to which associations of gene expressions and miRNA expressions are inferred and an association matrix is extracted. 
The methodology uses simulation-based methods, in particular Markov chain Monte Carlo, and benefits from managing uncertainty across a complex network. Finally, in the third stage, the network is constructed using the association matrix. Some extensions of this model will be discussed. 



Feb 21  Thu  Steven Perkins (Bristol)  Statistics Seminar  
14:00  Stochastic Fictitious Play with Continuous Action Sets  
K14  
Abstract: Stochastic approximation is a widely used tool which allows the limiting behaviour of stochastic, discrete-time learning procedures on $\mathbb{R}^K$ to be studied using an associated continuous-time, deterministic dynamical system. We extend the asymptotic pseudo-trajectory approach to stochastic approximation so that the processes can take place on any Banach space. This allows us to consider an iterative process of probability measures (or probability densities) on a compact subset of $\mathbb{R}$, as opposed to the regular stochastic approximation framework, which is limited to probability mass functions on $\mathbb{R}^K$. A common application of stochastic approximation in game theory is to study the limiting behaviour of a discrete-time learning algorithm, such as stochastic fictitious play, in normal form games. However, whilst learning dynamics in normal form games are now well studied, it is only recently that their continuous-action-space counterparts have been examined. Our Banach space stochastic approximation framework shows that in a continuous action space game the limiting behaviour of stochastic fictitious play can be studied using the associated smooth best response dynamics on the space of finite signed measures. We show that stochastic fictitious play will converge to an equilibrium point in single-population negative definite games, two-player zero-sum games and $N$-player potential games, when they have Lipschitz continuous rewards over a compact subset of $\mathbb{R}$. 
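To give a concrete feel for stochastic fictitious play, here is a finite-action sketch (the classical $\mathbb{R}^K$ setting the talk generalises, not its Banach-space extension): two players in matching pennies, a two-player zero-sum game, each playing a logit-smoothed best response to the opponent's empirical play. The smoothing parameter and run length are invented for the example.

```python
import numpy as np

# Matching pennies: payoff matrix for player 1 in a two-player zero-sum game.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def logit_response(u, eta=0.1):
    """Smoothed (logit) best response to a vector of expected payoffs."""
    z = np.exp(u / eta)
    return z / z.sum()

rng = np.random.default_rng(1)
emp1 = np.array([0.5, 0.5])  # empirical action frequencies of player 1
emp2 = np.array([0.5, 0.5])  # empirical action frequencies of player 2
for n in range(1, 5001):
    # Each player samples a smoothed best response to the opponent's empirical play.
    a1 = rng.choice(2, p=logit_response(A @ emp2))
    a2 = rng.choice(2, p=logit_response(-(A.T @ emp1)))  # player 2's payoffs are -A
    emp1 += (np.eye(2)[a1] - emp1) / (n + 1)
    emp2 += (np.eye(2)[a2] - emp2) / (n + 1)
```

In this game the unique equilibrium mixes 50/50, and the empirical frequencies settle near it, illustrating the zero-sum convergence result in the finite-action case.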



Feb 28  Thu  Marian Farah (MRC Cambridge)  Statistics Seminar  
14:00  Bayesian Emulation and Calibration of a Dynamic Epidemic Model for H1N1 Influenza  
K14  
Abstract: Increasingly, mechanistic epidemic models are playing an important role in strategies for epidemic management. In the attempt to control an epidemic, the goal of model development is to provide efficient estimation of model parameters to allow timely assessment and prediction of the epidemic evolution as new data become available. In this work, we address the problem of efficient parameter estimation in the context of a model for H1N1 influenza, implemented as a dynamic computer simulator. We propose an efficient approximation to the dynamic simulator using an emulator, a statistical model, that combines a Gaussian process prior for the output function of the simulator with a dynamic linear model for its evolution through time. This modelling framework is both flexible and tractable, resulting in efficient posterior inference through Markov Chain Monte Carlo. We illustrate the proposed methodology using simulated H1N1 influenza epidemic data. 
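The core emulation idea above can be sketched in a few lines of numpy: condition a Gaussian process on a handful of simulator runs and predict the output elsewhere. Here `np.sin` stands in for an expensive simulator, and the squared-exponential kernel and its hyperparameters are invented; the talk's dynamic emulator for an epidemic simulator is considerably richer.

```python
import numpy as np

def rbf(x1, x2, ell=0.5, sf=1.0):
    """Squared-exponential covariance between two 1-D input vectors."""
    d = x1[:, None] - x2[None, :]
    return sf ** 2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_emulate(x_train, y_train, x_test, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP given training runs."""
    K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov_term = Ks @ np.linalg.solve(K, Ks.T)
    var = rbf(x_test, x_test).diagonal() - cov_term.diagonal()
    return mean, var

simulator = np.sin                    # stand-in for an expensive simulator
x_train = np.linspace(0.0, np.pi, 8)  # a small design of simulator runs
x_test = np.array([1.0, 2.0])
mean, var = gp_emulate(x_train, simulator(x_train), x_test)
```

With only eight runs the emulator reproduces the smooth "simulator" closely at new inputs, which is exactly why emulation makes calibration affordable.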



Mar 7  Thu  Dennis Prangle (Lancaster)  Statistics Seminar  
14:00  Summary statistics for likelihood-free model choice 
LT C  
Abstract: A central statistical goal is to choose between alternative explanatory models. This work is motivated by population genetic models, which are typically complicated stochastic processes whose likelihoods are numerically intractable. Hence it is not possible to use statistical methods based on evaluating likelihood functions. Approximate Bayesian computation (ABC) is a commonly used likelihood-free method for such situations. ABC simulates data for many parameter values under each model and compares these to the observed data. The comparison is based on vectors of summary statistics of the data. More weight is given to models which produce simulated vectors close to that for the observations. The choice of summaries turns out to be crucial to the efficiency and accuracy of the inference algorithm. This talk presents a method to select good summary statistics for ABC model choice. An application is also presented, choosing between demographic models of Campylobacter jejuni, a bacterial pathogen responsible for a large proportion of gastroenteritis cases. 
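The ABC model-choice recipe described above can be sketched with rejection sampling on an invented toy problem (Poisson versus shifted-geometric data with a common mean, using the sample mean and variance as summaries), rather than the genetic application; the priors, tolerance and run length are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
obs = rng.poisson(3.0, size=100)  # "observed" data, truly from the Poisson model

def summaries(x):
    return np.array([x.mean(), x.var()])

s_obs = summaries(obs)
accepted = []
for _ in range(20000):
    m = rng.integers(2)            # model prior: 0 = Poisson, 1 = shifted geometric
    lam = rng.uniform(0.1, 10.0)   # prior on the common mean
    if m == 0:
        x = rng.poisson(lam, size=100)
    else:
        # shifted geometric on {0, 1, ...} with mean lam (variance lam*(1+lam))
        x = rng.geometric(1.0 / (1.0 + lam), size=100) - 1
    # accept when the simulated summaries land close to the observed ones
    if np.linalg.norm(summaries(x) - s_obs) < 0.5:
        accepted.append(m)

post_poisson = accepted.count(0) / len(accepted)
```

Because the variance summary separates the two models (the geometric is overdispersed), the accepted sample should heavily favour the Poisson model, illustrating why the choice of summaries matters.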



Mar 14  Thu  Keith Worden (Sheffield  Mechanical Engineering)  Statistics Seminar  
14:00  Applications of Probability and Statistics in Structural Dynamics  
LT C  
Abstract: Probability and statistics are vital tools in the modern analysis of structural dynamic systems. This is partly because many of the forces which excite the structures we are interested in are random and partly because many of the measurements and processes we study are (sometimes extremely) uncertain. This talk will present some applications of probability and statistics made in the Dynamics Research Group in Sheffield in recent years. Topics covered may include the design of damage detection systems based on statistical pattern recognition; removal of artefacts from data using concepts from econometric time series analysis; Bayesian sensitivity analysis of large nonlinear models and modelling of nonlinear dynamical systems using Markov Chain Monte Carlo methods. 



Mar 21  Thu  John Stevens (ScHaRR)  Statistics Seminar  
14:00  Health Technology Assessment: A Day in the Life of a HEDS Statistician  
K14  
Abstract: Health technology assessment (HTA) typically involves comparing the population mean costs and benefits of two or more interventions. The assessment is done using a decision analytic model over a lifetime horizon, which gives rise to structural and parameter uncertainty. After introducing the current decision rule based on the incremental cost-effectiveness ratio, we will discuss some of the statistical issues involved in an HTA, such as making comparisons between treatments that have not been compared in randomised controlled trials (RCTs); the extrapolation of evidence beyond the duration of a trial to estimate population mean survival; modelling non-fatal events such as the development of Type 2 diabetes; and modelling bivariate outcomes such as progression-free survival and death. In some cases, methods are available that are not well known in the health economics literature, whilst others depend on the format of the data and the amount of data that is available. 



Apr 11  Thu  Heather Battey (Bristol)  Statistics Seminar  
14:00  Nonparametric estimation of a multidimensional density: some recent theory and methodology.  
K14  
Abstract: Density estimation is one of the most actively studied challenges in statistics. Whilst fully agnostic estimators can be appealing in low dimensions, the performance of such estimators deteriorates rapidly for a fixed sample size as the number of dimensions grows. This provides motivation for estimating within a restricted subset of the set of all p-dimensional Lebesgue densities, thereby reducing estimation error, even if this produces some approximation error when the constraint is not satisfied. In the first half of the talk, I will consider the restriction to the class of p-dimensional elliptic densities and, within this framework, present a two-stage nonparametric estimator for the Lebesgue density based on Gaussian mixture sieves. Under the online Exponentiated Gradient (EG) algorithm of Helmbold et al. (1997), and without restricting the mixing measure to have compact support, the estimator produces estimates converging uniformly in probability to the true elliptic density at a rate that is independent of the dimension of the problem. The rate performance (and optimal tuning parameter) associated with our estimator depends on the tail behaviour of the underlying density rather than on smoothness properties, and we provide a rule of thumb for estimating the relevant quantity based on observables. Although the rule of thumb is based on a particular member of the elliptic class, simulations indicate that the procedure generalises to other members of this class. In the second half of the talk, I will present some ongoing work on multidimensional density estimation. I will introduce a new class of procedures that are attractive in that they offer both flexibility and the possibility of incorporating constraints, whilst possessing a succinct representation which may be stored and evaluated easily. The latter property is of paramount importance when dealing with large datasets, which are now commonplace in many application areas. 
In a simulation study, we show that our approach performs well across a range of data generating mechanisms, and can often outperform popular nonparametric estimators. Moreover, its performance is shown to be robust to the choice of tuning parameters, which is an important practical advantage of our procedure. The estimator is implemented in a binary classification task arising in medical statistics. 



Apr 18  Thu  Jochen Einbeck (Durham)  Statistics Seminar  
14:00  Principal curves and surfaces: Data visualization, compression, and beyond  
K14  
Abstract: Principal curves and surfaces were proposed about two decades ago as a tool for nonlinear dimension reduction. Descriptively, they can be defined as smooth objects (of dimension 1 and 2, respectively) capturing the "middle" of a (potentially high-dimensional) data cloud. Though a relatively large amount of literature has discussed methods and algorithms for the estimation of principal curves and surfaces, most of this research stops here, and does not consider exploiting the fitted curve or surface once it is established. One may find this surprising, as the parametric analogue, linear principal component analysis, is rarely used as an end in itself, but unfolds its power only when used as an integrated data compression step for some high-dimensional, say, regression or classification problem. One reason for this reluctance may be that several rather cumbersome technicalities, such as the computation of distances or projection indexes, need to be solved before a fitted principal curve or surface can be used for further inferential purposes such as regression or classification. In this talk, we describe briefly how such problems can be resolved, and give some examples, stemming from current collaborative work, which illustrate how "local" principal curves and surfaces can be efficiently used as a nonparametric dimension reduction tool, enabling further statistical analysis based on the fitted principal object. We will focus on a case study involving the compression of the thermochemical state space of chemical combustion systems. 



Apr 25  Thu  Alex Mijatovic (Imperial)  Statistics Seminar  
14:00  A new look at short-term implied volatility in asset price models with jumps 
K14  
Abstract: This talk discusses the implied volatility smile for options close to expiry in the exponential Lévy class of asset price models with jumps. We introduce a new renormalisation of the strike variable with the property that the implied volatility converges to a nonconstant limiting shape, which is a function of both the diffusion component of the process and the jump activity (Blumenthal-Getoor) index of the jump component. Our limiting implied volatility formula relates the jump activity of the underlying asset price process to the short end of the implied volatility surface and sheds new light on the difference between finite and infinite variation jumps from the viewpoint of option prices: in the latter case, the wings of the limiting smile are determined by the jump activity indices of the positive and negative jumps, whereas in the former, the wings have a constant model-independent slope. This result gives a theoretical justification for the preference of infinite variation Lévy models over finite variation ones in calibration based on short-maturity option prices. 



May 2  Thu  Christopher Hunter (Sheffield  Chemistry)  Statistics Seminar 
14:00  
K14  


May 9  Thu  Idris Eckley (Lancaster)  Statistics Seminar  
14:00  Coherence analysis of multivariate time series  
K14  
Abstract: Data collection systems are widely used within our everyday lives. For example, within the energy sector they are used to record process activity at energy generation sites. These loggers are capable of sampling data at high rates, at a number of locations, and recording multiple process aspects at each location. Such series are typically non-stationary in nature, with potentially time-varying dependence between the various series components. In this talk we consider the problem of modelling and estimating the coherence structure within such time series. In particular we focus on the challenge of identifying whether the dependence between a pair of components is direct or indirectly driven by other components of the series, illustrating our approach using examples taken from neuroimaging and wind energy. 



May 16  Thu  Peter Moerters (Bath)  Statistics Seminar  
14:00  Clustering in spatial preferential attachment networks  
K14  
Abstract: I define a class of growing networks in which new nodes are given a spatial position and are connected to existing nodes with a probability mechanism favouring short distances and high degrees. The competition of preferential attachment and spatial clustering gives this model a range of interesting properties. Empirical degree distributions converge to a limiting power law, and the average clustering coefficient of the networks converges to a positive limit. A phase transition occurs in the global clustering coefficients and empirical distribution of edge lengths. The talk is based on joint work with Emmanuel Jacob (ENS Lyon). 



Oct 3  Thu  Stephen Connor (York)  Statistics Seminar  
14:00  Mixing time for a random walk on a ring  
Abstract: We consider a variant of a process used in random number generation, and previously studied by Chung, Diaconis and Graham. This is a random walk on the integers mod n (n odd), which at each step either increments by 1 or doubles its value, but where the probability of doubling is a decreasing function of n. We use a mixture of representation theory and probability to show that the total variation distance for this process exhibits a cutoff phenomenon. This is joint work with Michael Bate (York). 
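The walk itself is easy to simulate; the sketch below estimates the total variation distance to the uniform distribution on the integers mod n by Monte Carlo, using a fixed (invented) doubling probability rather than the decreasing-in-n choice analysed in the talk.

```python
import numpy as np

def tv_to_uniform(n, steps, p_double, reps=20000, seed=3):
    """Monte Carlo estimate of the TV distance to uniform on the integers mod n."""
    rng = np.random.default_rng(seed)
    x = np.zeros(reps, dtype=np.int64)  # reps independent copies of the walk
    for _ in range(steps):
        double = rng.random(reps) < p_double
        # either double the value or increment by 1, both mod n
        x = np.where(double, (2 * x) % n, (x + 1) % n)
    freq = np.bincount(x, minlength=n) / reps
    return 0.5 * np.abs(freq - 1.0 / n).sum()

n = 101
early = tv_to_uniform(n, steps=5, p_double=0.2)   # far from uniform
late = tv_to_uniform(n, steps=400, p_double=0.2)  # essentially mixed
```

After a few steps the walk can only have reached a handful of values, so the distance is near 1; after many steps the estimate drops to the Monte Carlo noise floor, consistent with an abrupt cutoff somewhere in between.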



Oct 9  Wed  Andreas Kyprianou (Bath)  Statistics Seminar  
14:15  Censored Stable Processes  
LT6  
Abstract: We look at a general two-sided jumping strictly alpha-stable process where alpha is in (0,2). By censoring its path each time it enters the negative half line we show that the resulting process is a positive self-similar Markov process. Using Lamperti's transformation we uncover an underlying driving Lévy process and, moreover, we are able to describe in surprisingly explicit detail the Wiener-Hopf factorization of the latter. Using this Wiener-Hopf factorization together with a series of spatial path transformations, it is now possible to produce an explicit formula for the law of the original stable process as it first *enters* a finite interval, thereby generalizing a result of Blumenthal, Getoor and Ray for symmetric stable processes from 1961. This is joint work with Juan Carlos Pardo and Alex Watson. 



Oct 9  Wed  Thomas Mikosch (Copenhagen)  Statistics Seminar  
15:45  Power Law Tails in Applied Probability  Some Recent Developments. [The 2013 Applied Probability Trust Lecture]  
Abstract: For many decades, regular variation has been a useful tool in various areas of applied probability theory, including queuing, branching, renewal theory, stochastic networks, time series analysis, extreme value theory and insurance. Regularly varying tails (i.e., distributions with power law tails) naturally appear as limits for normalized and centered maxima and sums of independent and identically distributed random variables, or as a domain of attraction condition for such limit laws. However, models whose components have power law tails are not always motivated by asymptotic theory; regular variation is a convenient way of describing unusually large values, for example, catastrophic claims in an insurance portfolio, large and long transmission times in the Internet, big losses/gains on the stock market, etc. Since the encyclopedia Regular Variation by N. Bingham, C. Goldie and J. Teugels (Cambridge UP) appeared in 1987, various extensions and modifications of regular variation have been successfully developed and applied. In this talk, we consider some newer developments. These include the notion of a regularly varying time series (i.e., the finite-dimensional distributions of such a series have power law tails), functional regular variation of stochastic processes, random fields and random sets, and large deviations of regularly varying structures. 



Oct 10  Thu  Ziyad Alhussain (Sheffield)  Statistics Seminar  
14:00  Eliciting beliefs about a variance parameter  
Hicks Seminar Room J11  
Abstract: In eliciting an expert's opinion, we ask the expert to report judgements about the observable quantity. We then fit those judgements to a probability distribution that best describes the expert's beliefs. One of the challenges in elicitation is making direct judgements about the variance parameter of the normal distribution. Hence, we aim to find an elicitation method that best fits the expert's opinion about the variance to a probability distribution. In this talk, I will present two elicitation methods that attempt to fit the expert's judgements about the variation of normally distributed data to a probability distribution. The first method depends on Bayes' theorem, where the expert is asked to update the initial judgements given hypothetical data. We then illustrate that the expert may find it difficult to update judgements using Bayes' theorem. Therefore, we propose an elicitation method that does not depend on Bayes' theorem, is easier to use, and works under both conjugate and non-conjugate prior distributions. We conclude with an interactive example using a proposed software tool. 
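As a simple illustration of fitting elicited judgements to a probability distribution (not either of the methods presented in the talk), one might match a lognormal prior for a standard deviation to an expert's stated median and 95th percentile; the judgements below are hypothetical.

```python
import numpy as np

def lognormal_from_quantiles(median, p95):
    """Parameters (mu, sigma) of a lognormal matching a median and 95th percentile."""
    z95 = 1.6448536269514722  # standard normal 0.95 quantile
    mu = np.log(median)
    sigma = (np.log(p95) - mu) / z95
    return mu, sigma

# Hypothetical expert judgements about a standard deviation parameter:
# "the median is about 2, and I'd be 95% sure it is below 5".
mu, sigma = lognormal_from_quantiles(median=2.0, p95=5.0)

# Check the fit by sampling from the implied prior.
rng = np.random.default_rng(4)
draws = np.exp(rng.normal(mu, sigma, size=200000))
```

Matching two quantiles pins down the two lognormal parameters exactly, so sampled quantiles reproduce the elicited judgements up to Monte Carlo error.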



Oct 10  Thu  Fatimah Aloef (Sheffield)  Statistics Seminar  
14:00  Bayesian experimental design in health economics  
Hicks Seminar Room J11  
Abstract: In health economics, health care evaluation refers to identifying, measuring, valuing and comparing the costs as well as the benefits of different health care interventions, in order to allocate limited health resources wisely. Cost-Effectiveness Analysis (CEA) has been the most widely used method to derive such allocation decisions, especially those made by the National Institute for Health and Clinical Excellence (NICE) in the UK. This evaluation technique uses Quality Adjusted Life Years (QALYs) as an outcome measure in order to compare different health care interventions directly. There are different techniques to measure the "Q" part of this quantity, which reflects the quality of life for health outcomes, namely the utility. Recently, there has been increased interest in using Discrete Choice Experiments (DCEs) to elicit health state utilities as an alternative to the cardinal methods. Utilities are required for all health states defined by a classification system. However, discrete choice data are collected for only a subset of health states, and a model is then fitted to estimate the utilities for any health state defined by the classification system. Thus, an optimal choice design is required to estimate the utilities within the QALY framework precisely. In this talk I will consider the problems of constructing choice designs for health evaluation purposes: in particular, anchoring the health utility values produced by the DCE onto the 0-1 (dead to full health) scale used within the QALY framework, the dependence of the optimal choice design on the unknown parameters of the choice model, and simplifying the choice task and its effect on design efficiency. The experimental design used in our work is illustrated through a pairwise comparison for a practical health example, the AQL-5D classification system. 



Oct 17  Thu  Dennis Prangle (Lancaster)  Statistics Seminar  
14:00  Summary statistics for likelihood-free model choice 
Abstract: A central statistical goal is to choose between alternative explanatory models. This work is motivated by population genetic models, which are typically complicated stochastic processes whose likelihoods are numerically intractable. Hence it is not possible to use statistical methods based on evaluating likelihood functions. Approximate Bayesian computation (ABC) is a commonly used likelihood-free method for such situations. ABC simulates data for many parameter values under each model and compares these to the observed data. The comparison is based on vectors of summary statistics of the data. More weight is given to models which produce simulated vectors close to that for the observations. The choice of summaries turns out to be crucial to the efficiency and accuracy of the inference algorithm. This talk presents a method to select good summary statistics for ABC model choice. An application is also presented, choosing between demographic models of Campylobacter jejuni, a bacterial pathogen responsible for a large proportion of gastroenteritis cases. 



Oct 24  Thu  Lindsey Lee (School of Earth and Environment  Leeds University)  Statistics Seminar 
14:00  Statistical Methods for Understanding Uncertainty in a Global Aerosol Model  
Hicks Seminar Room J11  
Abstract: Uncertainty is inherent in the modelling of complex processes associated with climate science. Model uncertainty arises in any computer model that is restricted in terms of computational power and current knowledge but can broadly be defined in terms of input, parametric and structural uncertainty. Structural uncertainty can be considered by comparing outputs from different computer models. A lot of progress has been made in quantifying the effect of structural uncertainty on aerosol model predictions through the AEROCOM project. We have made progress in the quantification and understanding of parametric and input uncertainty by application of statistical methods in the NERC AEROS project. In this talk I will explain the statistical methods that have been applied in the AEROS project to help us understand and quantify parametric uncertainty in the GLOMAP aerosol model. These methods include expert elicitation, experimental design, emulation and sensitivity analysis. I will then show some of the results we have from applying these methods to study 28 uncertain parameters (and emissions) and their effects on GLOMAP model predictions. 



Oct 31  Thu  Michael SalterTownshend (University College Dublin)  Statistics Seminar  
14:00  Modelling Multiple Social Relations  
Hicks Seminar Room J11  
Abstract: Social network analysis is the rapidly expanding field that deals with interactions between individuals or groups. The literature has tended to focus on single network views, i.e. networks comprised of a group of nodes with a single type of link between node pairs. However, nodes may interact in different ways with the same alters. For example, on Twitter one user may retweet, follow, list or message another user. There are thus four separate networks to consider. Current approaches include examining all network views independently or aggregating the different views into a single super network. Neither of these approaches is satisfying, as the interaction between relationship types across network views is not explored. We are motivated by an example consisting of a census of 75 villages in the Karnataka province in India. The data were collated for use by a microfinance company and 12 different link types are recorded. We develop a novel method for joint modelling of multi-view networks as follows: we begin with the popular latent space model for social networks and then extend the model to multi-view networks through the addition of a matrix of interaction terms. The theory behind this extension is due to emerging work on Multivariate Bernoulli models. We first present the theory behind our new model. We then explore the relationship between the interaction terms and the correlation of the links across network views, and finally we present results for the Karnataka dataset. Inference is a challenge, and we adopt the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo, for Bayesian inference. 



Nov 7  Thu  Peter Neal (Lancaster)  Statistics Seminar  
14:00  MCMC for a birth-death-mutation (BDM) model 
Hicks Seminar Room J11  
Abstract: A birth-death-mutation (BDM) model has been used by a number of authors to model the evolution of a tuberculosis epidemic in San Francisco in the early 1990s. The observed data are assumed to be a cross-sectional study of the tuberculosis outbreak. It is impossible to write down the likelihood for the model without substantial, non-trivial data augmentation, which prohibits the use of standard MCMC algorithms. However, it is trivial to simulate a realisation of the BDM model, and ABC algorithms have been used to estimate the parameters of the BDM model. Starting from the ABC perspective that simulation is straightforward, we construct an MCMC algorithm which uses simulation. Specifically, we use a non-centered parameterisation which enables us to treat the simulation process as a data augmentation problem and takes similar amounts of time per iteration as the ABC algorithms. The MCMC algorithm is successfully applied to the San Francisco tuberculosis data. 
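Forward simulation of a BDM model really is straightforward, as the abstract notes. Below is a minimal Gillespie-style sketch in which each individual gives birth, dies, or mutates to a brand-new type at constant per-capita rates; the rates and horizon are invented and the sketch is not the specific parameterisation used in the tuberculosis analyses.

```python
import numpy as np

def simulate_bdm(lam=1.0, mu=0.3, theta=0.2, t_max=5.0, cap=5000, seed=5):
    """Forward-simulate a birth-death-mutation process; returns a list of types."""
    rng = np.random.default_rng(seed)
    types = [0]      # one founding individual, of type 0
    next_type = 1
    t = 0.0
    while types:
        n = len(types)
        # total event rate is n * (lam + mu + theta); wait an exponential time
        t += rng.exponential(1.0 / (n * (lam + mu + theta)))
        if t > t_max or n >= cap:
            break
        i = rng.integers(n)                    # a uniformly chosen individual...
        u = rng.random() * (lam + mu + theta)  # ...experiences one of three events
        if u < lam:
            types.append(types[i])  # birth: offspring inherits the parent's type
        elif u < lam + mu:
            types.pop(i)            # death
        else:
            types[i] = next_type    # mutation to a brand-new type
            next_type += 1
    return types

pop = simulate_bdm()
pop_no_death = simulate_bdm(mu=0.0, t_max=3.0)  # with no deaths the line survives
```

A cross-sectional sample of `pop` (the type counts at the end time) is the kind of data the likelihood-free and simulation-based MCMC approaches both condition on.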



Nov 21  Thu  Axel Finke (Warwick)  Statistics Seminar  
14:00  Static-parameter estimation in piecewise deterministic processes using particle Gibbs samplers 
Hicks Seminar Room J11  
Abstract: We give a brief introduction to recent advances in sequential Monte Carlo and pseudo-marginal MCMC methods, as well as to piecewise deterministic processes (PDPs). The latter form a class of stochastic processes that jump randomly at a countable number of stopping times but otherwise evolve deterministically in continuous time. We then develop a particle Gibbs sampler for static-parameter estimation in PDPs that are observed only partially, noisily and in discrete time. We present a reformulation of the original particle filter for PDPs. This permits the use of a variance-reduction technique known as ancestor sampling that greatly improves mixing of the particle Gibbs chain. We compare our method with a particle Gibbs sampler based on the variable rate particle filter. Our approach is further illustrated on a shot-noise Cox process model that has applications in finance. This is joint work with Adam Johansen and Dario Spanò. 



Nov 28  Thu  Marton Balazs (Bristol)  Statistics Seminar  
14:00  Anomalous fluctuations in one-dimensional interacting systems 
Hicks Seminar Room J11  
Abstract: I will describe a family of one-dimensional interacting particle systems that contains the simple exclusion and the zero range processes, and many more. In the stationary distribution the current fluctuations show anomalous scalings; I will sketch parts of the proof of this phenomenon for some of our models. Along the way I will try to make it clear how convexity of a function of central importance leads to such unusual behaviour. The technical point that prevents us from proving anomalous scaling in great generality will also be pointed out. Our methods work with probabilistic arguments and couplings, and hence might give more intuition than alternative existing techniques of heavy combinatorics and analysis. 



Dec 5  Thu  Keith Harris (Sheffield)  Statistics Seminar  
14:00  Bayesian hierarchical models for microbial metagenomics  
Hicks Seminar Room J11  
Abstract: In this talk, we will introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples have different sizes, and the matrix is sparse, as communities are diverse and skewed towards rare taxa. Most methods used previously to classify or cluster samples have ignored all these features. The Dirichlet mixture components cluster communities into distinct ‘metacommunities’ and, hence, determine envirotypes or enterotypes: groups of communities with a similar composition. We applied the DMM model to human gut microbe genera frequencies from obese and lean twins. Our results suggested that obesity is not associated with a distinct microbiota but instead increases the chance that an individual derives from a disturbed enterotype. We will also show how the Dirichlet multinomial framework for defining enterotypes can be adapted to develop a Bayesian approximation to the Unified Neutral Theory of Biodiversity (UNTB) in ecology, which has been proposed as a null model for the structure of microbial communities. The approximation was developed because the existing maximum likelihood based genealogical approach for fitting the multi-site UNTB is too computationally demanding for the large datasets typically encountered in microbiomics. The key to our strategy is the observation that the UNTB is, in the limit of large population sizes, equivalent to the hierarchical Dirichlet process (HDP) in statistics, which can be exploited to derive an efficient Gibbs sampler for the neutral model. We first validated this method by applying it to synthetic data and twenty-nine tropical tree plots from Panama that had already been shown to satisfy the neutral model. We then used it to determine the extent to which gut microbial communities are neutrally assembled. 
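The generative side of a Dirichlet multinomial mixture is easy to sketch: pick a metacommunity, draw a community composition from its Dirichlet component, then draw taxa counts from a multinomial. The concentration parameters, mixture weights and sequencing depth below are invented for illustration; inference (clustering real samples) is the harder direction addressed in the talk.

```python
import numpy as np

rng = np.random.default_rng(6)

# Two metacommunities: Dirichlet concentration vectors over five taxa.
alphas = np.array([[10.0, 5.0, 1.0, 0.5, 0.5],
                   [0.5, 0.5, 1.0, 5.0, 10.0]])
weights = np.array([0.6, 0.4])  # mixture weights over metacommunities

def sample_dmm(n_samples, depth=1000):
    """Draw (metacommunity label, taxa count vector) pairs from the mixture."""
    labels, counts = [], []
    for _ in range(n_samples):
        k = rng.choice(len(weights), p=weights)   # which metacommunity
        p = rng.dirichlet(alphas[k])              # community composition
        counts.append(rng.multinomial(depth, p))  # observed taxa counts
        labels.append(k)
    return np.array(labels), np.array(counts)

labels, counts = sample_dmm(200)
```

Samples from the first metacommunity are dominated by the first taxa and those from the second by the last, which is exactly the structure a DMM clustering would recover from a sparse, uneven count matrix.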



Dec 12  Thu  Rocio Campos (Sheffield)  Statistics Seminar  
14:00  Statistical approach to systems biology and human nutrition: building a novel biological network around metabolic programming of health outcomes influenced by nutrients during lactation  
Hicks Seminar Room J11  
Abstract: Human milk contains a host of bioactive factors including hormones, growth factors, neuropeptides, anti-inflammatory and immunomodulatory components, as well as multiple nutrients such as minerals, vitamins, amino acids and fatty acids. In addition, milk contains known and unknown molecules with important metabolic regulatory functions. Basic milk composition was established in the 1960s, but this knowledge can be improved thanks to novel analytical techniques and systems biology approaches. We now propose a nutrigenomic-based characterization of milk composition in order to obtain a comprehensive view of milk characteristics and its role in infant growth. Moreover, the recent finding of microRNAs, with gene regulatory functions, in human milk is one of the key points that will be studied in this project. This proposal therefore intends to define relationships between molecular milk components and the potential influence of maternal diet on both milk composition and infant growth. Specifically, we will focus on the first two years of life and try to define (according to experimental models already developed in our research groups) potential adulthood predisposition to metabolic diseases, in particular obesity. 



Dec 12  Thu  Martin Legarreta  Statistics Seminar  
14:00  Mapping of badger territories from field data  
Hicks Seminar Room J11  
Abstract: European badgers are animals that defend their territories not only with direct aggression but also through the use of detectable signs such as latrines. The aim of the research is to reconstruct maps of badger territories from data collected through bait-marking, where plastic markers placed in bait have been recovered after excretion and the spatial locations of latrines recorded. Latrines can be classified into three types: hinterland, boundary and outliers, i.e. those resulting from extraterritorial excursions. We have developed a Conditional Outlier Prediction Model which uses logistic regression to estimate the probability that a latrine is an outlier, based on its location, the types of other latrines in the same direction and other covariate information. This research extends previous work by estimating joint probabilities that multiple latrines are outliers and, combined with the Minimum Convex Polygon method, allows the reconstruction of boundaries and quantifies the uncertainty in the reconstruction of a territory. 



Feb 13  Thu  Partha Dey (Warwick)  Statistics Seminar  
14:00  Multiple phase transitions in long-range first-passage percolation on square lattices.  
Hicks Seminar Room J11  
Abstract: We consider a model of long-range first-passage percolation on the $d$-dimensional square lattice in which any two distinct vertices $x, y$ are connected by an edge having exponentially distributed passage time with mean $\|x-y\|^s$, where $s>0$ is a fixed parameter and $\|\cdot\|$ is the $l_1$-norm on $Z^d$. We analyze the asymptotic growth rate of the set $B_t$, which consists of all $x \in Z^d$ such that the first-passage time between the origin $0$ and $x$ is at most $t$, as $t\to\infty$. We show that, depending on the value of $s$, there are four growth regimes:
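The model is straightforward to simulate on a small patch of $Z^2$. The sketch below (grid size and the value of $s$ are illustrative) draws the exponential passage times and computes first-passage times from the origin with Dijkstra's algorithm, giving the growing ball $B_t$.

```python
import heapq
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Small 2D lattice patch; every pair of distinct vertices is joined by an
# edge with exponential passage time of mean ||x - y||_1 ** s.
L, s = 7, 1.5
V = list(itertools.product(range(L), range(L)))

W = {}
for u, v in itertools.combinations(V, 2):
    d = abs(u[0] - v[0]) + abs(u[1] - v[1])      # l1 distance
    W[(u, v)] = W[(v, u)] = rng.exponential(d ** s)

# First-passage times from the origin via Dijkstra on the complete graph.
dist = {v: np.inf for v in V}
dist[(0, 0)] = 0.0
heap = [(0.0, (0, 0))]
done = set()
while heap:
    t, u = heapq.heappop(heap)
    if u in done:
        continue
    done.add(u)
    for v in V:
        if v != u and v not in done and t + W[(u, v)] < dist[v]:
            dist[v] = t + W[(u, v)]
            heapq.heappush(heap, (dist[v], v))

# B_t: the set of vertices reached by time t.
t = 1.0
B_t = [v for v in V if dist[v] <= t]
```

Repeating this over a range of $s$ values and lattice sizes gives a feel for how the growth rate of $B_t$ changes between regimes.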




Feb 20  Thu  Mohamed Shakandli (Sheffield)  Statistics Seminar  
14:00  Particle filtering applied to medical time series  
Hicks Seminar Room J11  
Abstract: This talk concerns the setup and application of particle filtering to medical time series. Considering count time series (such as the number of asthma patients recorded over time), we discuss and propose non-linear and non-Gaussian state space models, in particular dynamic generalized linear models (DGLMs). Inference and forecasting are achieved by employing sequential Monte Carlo methods, also known as particle filters. These are simulation-based methods that can be used for tracking and forecasting dynamical systems subject to both process and observation noise in non-linear and non-Gaussian models. 
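As a minimal sketch of the idea (not the models from the talk), the following bootstrap particle filter tracks the latent log-intensity of a toy Poisson DGLM with an AR(1) state; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy DGLM: latent AR(1) log-intensity x_t, counts y_t ~ Poisson(exp(x_t + 1)).
T, N = 100, 500                   # time steps, particles
phi, sigma = 0.9, 0.3             # illustrative state dynamics

# Simulate synthetic count data.
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + sigma * rng.normal()
y = rng.poisson(np.exp(x + 1.0))

# Bootstrap particle filter.
particles = rng.normal(0.0, 1.0, N)
est = np.zeros(T)
for t in range(T):
    particles = phi * particles + sigma * rng.normal(size=N)   # propagate
    lam = np.exp(particles + 1.0)
    logw = y[t] * (particles + 1.0) - lam      # Poisson log-likelihood, up to a constant
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.dot(w, particles)              # filtered mean of the state
    particles = rng.choice(particles, size=N, p=w)   # multinomial resampling
```

The filtered means `est` track the simulated states `x`; one-step forecasts of the counts follow by propagating the particles a further step and averaging the implied Poisson rates.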



Mar 6  Thu  Penny Watson (ScHaRR, University of Sheffield)  Statistics Seminar  
14:00  The Use of Health Economic Methods in the Development of New Interventions for Systemic Lupus Erythematosus  
Hicks Seminar Room J11  
Abstract: I aim to evaluate alternative trial designs for a new intervention for systemic lupus erythematosus (SLE) from the perspective of a pharmaceutical company. The cost-effectiveness of new treatments for SLE can be evaluated in a cost-effectiveness (CE) simulation describing individual patient disease pathways and the costs and health outcomes associated with them. The CE model for SLE used SLE registry data to describe long-term outcomes, and simulated Phase II trial outcomes to describe treatment efficacy. I developed a Bayesian Clinical Trial Simulation (BCTS) for a Phase III SLE trial to evaluate the value of trials with alternative design characteristics. I describe an analytic method to compare SLE Phase III RCTs with variable sample size and duration of follow-up. The BCTS was used to simulate trial datasets given a particular design specification. The trial data were combined with prior parameters of the CE model to estimate posterior densities for the CE model inputs and update the outcomes of the CE model. Initially, Bayesian updating was performed using Markov chain Monte Carlo (MCMC) simulation in WinBUGS. However, this method would take years to generate results, so an approximation method was used to speed up the analysis. I will present the outcomes of the analysis from 1,600 BCTS iterations and discuss the limitations of value of information analyses for complex diseases. 



Mar 13  Thu  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Bayesian calibration for computer models using likelihood emulation  
Hicks Seminar Room J11  
Abstract: I will start by giving a short overview of the field of "Uncertainty Quantification": a variety of problems related to uncertainty in mathematical models of physical systems. I will then present some recent work (in collaboration with Ben Youngman) on calibration: finding model inputs such that the model outputs fit physical observations. Our approach is motivated by a case study involving a natural history model for colorectal cancer patients. The model is stochastic and computationally expensive, which inhibits evaluation of the likelihood function. We use a history matching approach, where we first exclude regions of input space where we can easily identify poor fits. We then construct an "emulator" (a fast statistical approximation) of the likelihood, which is used within importance sampling to sample from the posterior distribution of the computer model inputs. 



Mar 20  Thu  Marina Knight (York)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Apr 3  Thu  Tom Stafford (Psychology, University of Sheffield)  Statistics Seminar  
14:00  Measuring the learning curve (n= 854,064)  
Hicks Seminar Room J11  
Abstract: I will present the results of a study of learning in players of a simple online game. In contrast to the high-precision experimental tasks common in experimental psychology, this study leverages the statistical power gained by having a study population of 854,064 people. Use of game data allowed us to connect, for the first time, rich details of training history with measures of performance from participants engaged for a sustained amount of time in effortful practice. We showed that lawful relations exist between practice amount and subsequent performance, and between practice spacing and subsequent performance. Our methodology allowed an in situ confirmation of results long established in the experimental literature on skill acquisition. Additionally, we showed that greater initial variation in performance is linked to higher subsequent performance, a result we link to the exploration/exploitation trade-off from the computational framework of reinforcement learning. All the raw data and analysis code are available online, an example of "open science". Stafford, T. & Dewar, M. (2014). Tracing the trajectory of skill learning with a very large sample of online game players. Psychological Science, 25(2), 511–518. http://pss.sagepub.com/content/25/2/511 Data and analysis code: https://github.com/tomstafford/axongame 



May 8  Thu  Chris Jackson (MRC Cambridge)  Statistics Seminar  
14:00  Comparing structures of state-transition models for disease progression  
Hicks Seminar Room J11  
Abstract: Stochastic processes representing transitions between discrete states are often used to represent disease progression. Markov models are typical, and they may evolve in either discrete or continuous time. I will discuss the choice between models with different state-transition structures. The models will have some features in common, so that they can be used for the same purpose, such as estimating expected survival. For example, two adjacent states of disease severity could either be merged or separated, and we want to know which gives better estimates of survival. However, if the models are estimated from data at different levels of aggregation, standard likelihood-based model comparison methods do not apply, since the likelihoods are on different scales. In one common situation, the transition probabilities or rates are estimated from a single longitudinal dataset consisting of observations of the states of a number of individuals over time. In this case, a modification of AIC or cross-validation can be used to compare the predictive ability of different models assessed on the data which they have in common. In the models used in health economic evaluations, however, the transition probabilities can typically only be estimated from data aggregated over individuals, or from indirect data. In this case, models with split and merged states can often be compared by defining constraints on the parameters in the larger model. This produces a proxy for the merged model that can be compared against the larger one using standard methods. I will give examples from estimating the progression of health-related quality of life in psoriatic arthritis, and a health economic model for diagnostic tests for coronary artery disease. 



May 15  Thu  Student seminar  Sujunya and Joe (Sheffield)  Statistics Seminar  
14:00  Joe: Reconstructing the timescale of an ice-core
Sujunya: Bayesian Semi-supervised Classification for Satellite Imagery 

Hicks Seminar Room J11  
Abstract: Joe's abstract: The concentrations of various chemicals, particles and gases in ice-cores hold a continuous record of climatic and environmental information dating back hundreds of thousands of years. These data are recorded as a depth series, and in order to interpret them meaningfully we must first learn about their underlying, unobserved timescale. We present a fully Bayesian bivariate approach to obtaining a marginal posterior distribution for the time of year, as well as the date, at any given depth. Sujunya's abstract: The aim of our research is to develop a Bayesian classification model for combining two data sources: multispectral satellite images and field survey data. It is motivated by a practical problem in remote sensing studies where we have a very small labelled sample from a ground survey and a substantial number of unlabelled pixels from satellite images. This problem can be addressed within a semi-supervised framework. We construct a semi-supervised model with mixture distributions as an incomplete-data problem, in which the unlabelled data have unknown classes. We then derive a Bayesian semi-supervised procedure using two-step Gibbs sampling. To evaluate the proposed model, experimental results on real satellite images and simulated data were compared with existing techniques: the maximum likelihood supervised decision rule and semi-supervised classification based on the EM algorithm. The numerical investigation has shown the benefits and limitations of using unlabelled data. In conclusion, I will discuss the strengths and weaknesses of semi-supervised techniques. 



May 22  Thu  Enrico Scalas, Tusheng Zhang (Sussex)  Statistics Seminar  
14:00  Enrico Scalas: On the compound fractional Poisson process
Tusheng Zhang: Strong Convergence of Wong-Zakai Approximations of Reflected SDEs in a Multidimensional General Domain 

LT4  
Abstract: Enrico Scalas: The compound fractional Poisson process (CFPP) is a random walk subordinated to a fractional Poisson process (FPP). The latter is a simple generalisation of the Poisson process in which waiting times between events follow a Mittag–Leffler distribution. Several results on both the CFPP and the FPP will be presented, related to applications in different fields of science. Tusheng Zhang: In this paper, we obtain the strong convergence of Wong-Zakai approximations of reflected stochastic differential equations in a general multidimensional domain, giving an affirmative answer to a question posed by Evans and Stroock in their recent paper. 



Oct 2  Thu  Chris Farmer (Oxford)  Statistics Seminar  
14:00  Ensemble Variational Filters for Sequential Inverse Problems  
LT7  
Abstract: Given a model dynamical system, a model of any measuring instrument relating states to measurements, and a prior assessment of uncertainty, the probability density of subsequent system states, conditioned upon the history of the measurements, is of some practical interest. When measurements are made at discrete times, it is known that the evolving probability density is a solution of the discrete Bayesian filtering equations. This talk describes the difficulties in approximating the evolving probability density using a Gaussian mixture (i.e. a sum of Gaussian densities). In general this leads to a sequence of optimisation problems and high-dimensional integrals. Attention is given to the necessity of using a small number of densities in the mixture, the requirement to maintain sparsity of any matrices and the need to compute first and second derivatives of the misfit between predictions and measurements. Adjoint methods, Taylor expansions, Gaussian random fields and Newton's method can be combined to, possibly, provide a solution. 



Oct 9  Thu  Peter Young (Lancaster)  Statistics Seminar  
14:00  Refined Instrumental Variable Estimation: Maximum Likelihood Optimization of a Unified Box-Jenkins Model  
LT7  
Abstract: For many years, various methods for the identification and estimation of parameters in linear, discrete-time transfer function models have been available and implemented in widely used software environments, such as Matlab. This seminar considers a unified Refined Instrumental Variable (RIV) approach to the estimation of discrete- and continuous-time transfer functions characterized by a unified operator that can be interpreted in terms of backward shift, derivative or delta operators. The paper shows that the resulting iterative RIV algorithm provides a reliable solution to the maximum likelihood optimization equations for an appropriately unified Box-Jenkins transfer function model, and so its en bloc or recursive parameter estimates are optimal in maximum likelihood, prediction error minimization and instrumental variable terms. The backward shift and derivative operator versions of the algorithm are available as the RIVBJ and RIVCBJ routines in the freely available CAPTAIN Toolbox for Matlab, and these have been used for Data-Based Mechanistic (DBM) modelling (see e.g. Young, 2011) in areas ranging from engineering through economics and ecology to the environment. The seminar will describe a recent application where the RIVCBJ routine is used to identify and estimate a differential equation model of the latest globally averaged climate data. P. C. Young (2011). Recursive Estimation and Time-Series Analysis: An Introduction for the Student and Practitioner, Springer-Verlag, Berlin. 



Oct 23  Thu  Claudie Beaulieu (National Oceanography Centre Southampton)  Statistics Seminar  
14:00  Detecting abrupt changes in the Earth’s climate system  
LTD  
Abstract: The Earth’s climate system and ecosystems exhibit abrupt changes and thresholds, which are especially challenging socioeconomically due to the rapidity with which society has to adapt. Changepoint detection techniques provide a valuable tool for the detection of abrupt changes in the climate and ecosystems. In this talk, the usefulness of changepoint detection will be demonstrated through a range of applications. The possibility of anticipating abrupt changes will also be discussed. 
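As a minimal illustration of the kind of technique involved (not a method from the talk), a single abrupt mean shift in a synthetic record can be located with a standardised CUSUM-type statistic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic record with an abrupt mean shift at time 120 (illustrative).
y = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(2.0, 1.0, 80)])

# Estimate the changepoint by maximising the standardised difference in
# means before and after each candidate split k.
n = len(y)
ks = np.arange(1, n)
stats = np.array([
    np.sqrt(k * (n - k) / n) * abs(y[:k].mean() - y[k:].mean())
    for k in ks
])
tau = int(ks[stats.argmax()])      # estimated changepoint location
```

Comparing `stats.max()` against a threshold (from asymptotic theory or a permutation test) decides whether a change is present at all, before trusting `tau`.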



Nov 13  Thu  Joao Domingos Scalon (Department of Exact Sciences – Federal University of Lavras – Brazil)  Statistics Seminar  
14:00  Gibbs Point Processes for modelling spatial distribution of second-phase particles in composite materials  
LTD  
Abstract: Silicon carbide reinforced aluminium alloy composites are typical candidates for engineering applications due to their enhanced mechanical properties over the corresponding aluminium alloys, such as high strength and fatigue resistance. However, these mechanical properties can be highly sensitive to local variations in the spatial distribution of reinforcement particles and, consequently, the analysis of such distributions is of prime importance in materials science. The aim of this seminar is to present Gibbs point processes as an intuitively appealing way of characterizing spatial patterns formed by the locations of second-phase particles in composite materials. 



Nov 20  Thu  Kamila Zychaluk (Liverpool)  Statistics Seminar  
14:00  Semiparametric models for coral reef dynamics  
LTD  
Abstract: There are many mathematical models for the dynamics of coral reefs. Typically, these models assume the functional relationships that are responsible for changes in the reef community, but there is often little evidence on which to choose these relationships. Furthermore, the parameters of such models are difficult to estimate. Instead, we propose a statistical model based on a large amount of data but relatively few assumptions. We use a large database of repeated observations of the composition of coral communities to make predictions about the dynamics of reef composition, and use our model to estimate a regional dynamic equilibrium in reef composition. We have observations of the proportion of space occupied by three components (hard corals, macroalgae, and others), made in consecutive years at Caribbean, Kenyan and Great Barrier Reef sites. We assume that the state of the reef after one year follows a Dirichlet distribution with parameters dependent on the current state of the reef. These parameters are estimated using a local linear estimator with cross-validation bandwidth, and the estimates are then used in a transition equation to obtain the stationary distribution of reef composition. The stationary distributions for the Caribbean and the Great Barrier Reef appear very different, in accordance with biological knowledge. These stationary distributions correspond to the dynamic equilibria for the two regions, if conditions remain as they are now. In addition to making predictions, our semiparametric models provide a summary of the major features of reef dynamics, which more mechanistic models should be able to reproduce. Joint work with Matthew Spencer, Damian Clancy, John F. Bruno and Tim McClanahan. 
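A toy version of the transition step can be sketched as follows. The linear map from the current state to the Dirichlet parameters is purely hypothetical, standing in for the local linear estimator fitted to data in the talk.

```python
import numpy as np

rng = np.random.default_rng(4)

# Next year's composition of (coral, macroalgae, other) ~ Dirichlet with
# parameters depending on the current state, via an illustrative linear map.
def alpha(state, precision=50.0):
    A = np.array([[0.90, 0.10, 0.10],
                  [0.05, 0.80, 0.10],
                  [0.05, 0.10, 0.80]])   # hypothetical transition weights
    return precision * (A @ state)       # columns sum to 1, so alpha sums to `precision`

state = np.array([1/3, 1/3, 1/3])
traj = [state]
for _ in range(500):
    state = rng.dirichlet(alpha(state))
    traj.append(state)

# Averaging late iterations approximates the stationary composition.
stationary = np.mean(traj[100:], axis=0)
```

Running the same chain from different starting compositions and comparing the long-run averages is a quick check that the chain has a single stationary regime.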



Nov 27  Thu  Duncan Lee (Glasgow)  Statistics Seminar  
14:00  Cluster detection and risk estimation for spatiotemporal health data  
LTD  
Abstract: In epidemiological disease mapping one aims to estimate the spatiotemporal pattern in disease risk and identify high-risk clusters, allowing health interventions to be appropriately targeted. Bayesian spatiotemporal models are used to estimate smoothed risk surfaces, but this is contrary to the aim of identifying groups of areal units that exhibit elevated risks compared with their neighbours. Therefore, in this paper we propose a new Bayesian hierarchical modelling approach for simultaneously estimating disease risk and identifying high-risk clusters in space and time. Inference for this model is based on Markov chain Monte Carlo simulation, using the freely available R package CARBayesST that has been developed in conjunction with this paper. Our methodology is motivated by two case studies, the first of which assesses whether there is a relationship between Public Health Districts and colon cancer clusters in Georgia, while the second looks at the impact of the smoking ban in public places in England on cardiovascular disease clusters. 



Dec 4  Thu  John Moriarty (Manchester)  Statistics Seminar  
14:00  A solvable two-dimensional degenerate singular stochastic control problem with non-convex costs  
LTD  
Abstract: This optimisation problem is motivated by a storage-consumption model in an electricity market, and features a stochastic real-valued spot price modelled by Brownian motion. Although the possibility of negative prices makes the cost function neither convex nor concave, we show that the problem is nevertheless solvable and find analytical expressions for the value function, the optimal control and the boundaries of the action and inaction regions. Both boundaries may be interpreted as repelling, although interestingly the well-known smooth fit condition holds at one boundary but not the other. 



Dec 11  Thu  Marina Knight (York)  Statistics Seminar  
14:00  Hurst exponent estimation for long-memory processes using wavelet lifting.  
LTD  
Abstract: Reliable estimation of long-range dependence (LRD) parameters, such as the Hurst exponent, is a well-studied problem in the statistical literature. However, when the observed time series contains missing values or is naturally irregularly sampled, the current literature is sparse, with most approaches requiring heavy modifications. In this talk I shall present a technique for estimating the Hurst exponent of an LRD time series that naturally deals with irregularity in the time domain. The method is based on a flexible wavelet transform built by means of the lifting scheme, and we shall demonstrate its performance. 
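For intuition, a classical regular-sampling baseline is the aggregated-variance estimator, which reads the Hurst exponent off the scaling Var(block mean over m) ∝ m^(2H-2). The sketch below checks it on white noise (H = 0.5); it is not the lifting-based method of the talk, which is what handles irregular sampling.

```python
import numpy as np

rng = np.random.default_rng(5)

# White noise has Hurst exponent H = 0.5; use it as a sanity check.
x = rng.normal(size=2**14)

# Aggregated-variance method: variance of block means at several block sizes m,
# followed by a log-log regression; Var ~ m^(2H - 2).
sizes = 2 ** np.arange(2, 9)
var = [np.var(x[: len(x) // m * m].reshape(-1, m).mean(axis=1)) for m in sizes]
slope = np.polyfit(np.log(sizes), np.log(var), 1)[0]
H = 1 + slope / 2
```

For a persistent LRD series (H > 0.5) the block-mean variances decay more slowly than 1/m, pushing the fitted slope above -1 and the estimate above 0.5.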



Dec 18  Thu  Vincent Bonhomme (Sheffield)  Statistics Seminar  
14:00  
LTD  


Feb 5  Thu  Matt Nunes (Lancaster)  Statistics Seminar  
14:00  Analysis of time series observed on networks  
Hicks Seminar Room J11  
Abstract: In this talk we consider analysis problems for time series that are observed at nodes of a large network structure. Such problems commonly appear in a vast array of fields, such as environmental time series observed at different spatial locations or measurements from computer system monitoring. The time series observed on the network might exhibit different characteristics such as non-stationary behaviour or strong correlation, and the nodal series evolve according to the inherent spatial structure. The new methodology we develop hinges on reducing dimensionality of the original data through a change of basis. The basis we propose is a second generation wavelet basis which operates on spatial structures. As such, the (large) observed data is distilled down to key information on a reduced network topology. We discuss the potential of this dimension reduction method for time series analysis tasks. This is joint work with Marina Knight (University of York) and Guy Nason (University of Bristol). 



Feb 26  Thu  Sayan Banerjee (University of Warwick)  Statistics Seminar  
14:00  Maximal couplings and geometry  
Hicks Seminar Room J11  
Abstract: Maximal couplings are couplings of Markov processes where the tail probabilities of the coupling time attain the total variation lower bound (Aldous bound) uniformly for all time. Markovian couplings are coupling strategies where neither process is allowed to look into the future of the other before making the next transition. These are easier to describe and play a fundamental role in many branches of probability and analysis. Hsu and Sturm proved that the reflection coupling of Brownian motion is the unique Markovian maximal coupling (MMC) of Brownian motions starting from two different points. Later, Kuwada proved that to have a MMC for Brownian motions on a Riemannian manifold, the manifold should have a reflection structure, and thus proved the first result connecting this purely probabilistic phenomenon (MMC) to the geometry of the underlying space. In this work, we investigate general elliptic diffusions on Riemannian manifolds, and show how the geometry (dimension of the isometry group and flows of isometries) plays a fundamental role in classifying the space and the generator of the diffusion for which an MMC exists. We also describe these diffusions in terms of Killing vector fields (generators of rigid motions on manifolds) and dilation vector fields around a point. This is joint work with W.S. Kendall. 



Mar 12  Thu  Kevin Wilson (Strathclyde)  Statistics Seminar  
14:00  Expert judgement informed reliability growth models and the allocation of reliability tasks  
Hicks Seminar Room J11  
Abstract: There are many mathematical models in the literature for how a system’s reliability grows during development as a result of the Test, Analyse and Fix (TAAF) cycle. Most are based on convenient parametric forms and are extensions of simple models such as Poisson Processes. Often we can find one of these parametric models which fits our data well. However, parameters in such models are typically not observable and so eliciting a subjective prior distribution, which is often desirable due to a lack of observed data, is a challenging task. Further, engineers can be rightly sceptical of models based on parameters with no physical interpretation. In this talk we present a model for a reliability growth programme developed with engineering experts in the aerospace industry. All of the model parameters can be elicited from observable quantities and so priors can be specified directly. The model is used to identify an optimal subset of reliability tasks from a large number based on targets for cost, time on test and system reliability. The optimal subset is identified by maximising the prior expectation of a multiattribute utility function. 



Mar 19  Thu  Gwilym Pryce (Sheffield Methods Institute)  Statistics Seminar  
14:00  Urban Inequalities in Exposure to Crime and the Impact on Education  
Hicks Seminar Room J11  
Abstract: This seminar will set out two statistical problems. First, how to measure crime exposure for each residential address in a city and, in particular, how to ascertain the optimal distance decay function for the crime exposure measure. Second, how to estimate the impact of crime exposure on school performance, controlling for other factors. Both questions have important applications. Being able to measure crime exposure for an individual address potentially overcomes the modifiable areal unit problem associated with using averages for administrative areas, and allows us to better understand nuances in the spatial variation in crime and how these change over time. Developing robust measures of crime exposure is also the first step in enabling researchers to better understand the true cost of crime in terms of its impact on a variety of social factors, including educational performance, health, wellbeing, house prices and other life outcomes. The seminar will set out the main methodological challenges as the basis for discussion on how best to design an appropriate research strategy. 



Apr 16  Thu  Nicos Georgiou (Sussex)  Statistics Seminar  
14:00  Geometric aspects of directed last passage percolation on the plane  
Hicks Seminar Room J11  
Abstract: In this talk we present the corner growth model, an infection spreading in an orderly way through the sites of the first quadrant, and explain certain geometric aspects of the infection spread. In particular, we are concerned with understanding the law of large numbers for the infection surface and the microscopic random infinite geodesics associated with the model. This talk is intended for a diverse audience. 



Apr 23  Thu  Student seminar: Christian Fonseca Mora and Jian Wang (Sheffield)  Statistics Seminar  
14:00  Christian: Stochastic partial differential equations with Lévy noise in some
infinite dimensional spaces
Jian: Multivariate Stochastic Volatility Estimation using Particle Filters 

Hicks Seminar Room J11  
Abstract: Christian: In this talk we consider stochastic evolution equations driven by Lévy noise in some infinite dimensional spaces. Such equations are important from a theoretical point of view and also because they have a wide range of applications. The spaces in which these equations take values are called duals of nuclear spaces, and they play an important role in different areas of mathematics, such as partial differential equations, harmonic analysis and probability in infinite dimensional spaces. The talk is intended to be an introduction to the subject and to the main results that we have obtained so far. Jian: This presentation considers a modelling framework for multivariate volatility in financial time series. The talk will briefly review particle filtering, or sequential Monte Carlo, methods. An overview of the multivariate volatility modelling literature will be given. As most financial returns exhibit heavy tails and skewness, we consider a model for the returns based on the skew-t distribution, while the volatility is assumed to follow a Wishart autoregressive process. We define a new type of Wishart autoregressive process and highlight some of its properties and advantages. Particle filter based inference for this model is discussed and a novel approach to estimating static parameters is provided. Furthermore, an alternative for estimating higher-dimensional data will be given. The proposed methodology is illustrated with two data sets consisting of asset returns from the FTSE100 stock exchange and currency exchange rates. 



Apr 30  Thu  Janine Illian (University of St Andrews, St Andrews, UK and NTNU Trondheim, Norway)  Statistics Seminar  
14:00  Developing complex spatial models for the real world – a multidisciplinary symbiosis  
Hicks Seminar Room J11  
Abstract: Strongly motivated by interdisciplinary research, substantial advances have been made in the development of practically relevant spatial statistical methodology. In the context of spatial point process models, this has been the case in particular for log-Gaussian Cox processes. Facilitated by the recent development of efficient and very accurate approximation methods for fitting models based on spatial random fields, it has become possible to develop and apply flexible and realistically complex spatial models without prohibitive computational cost (Rue et al. 2009; Lindgren et al. 2011; Illian et al. 2012a and b). The R library R-INLA has been instrumental in making these methods available to non-specialist users and in promoting their usage in practice. This talk outlines the mutual benefits of developing both methodology and software as part of a continuing dialogue between method developers and ecologists. Highlights of this symbiosis and recent developments resulting from it are presented. We illustrate these with a number of applications from ecology and beyond. 



May 7  Thu  Daniel Williamson (Exeter)  Statistics Seminar  
14:00  Posterior belief assessment: extracting meaningful subjective judgements from Bayesian analyses with complex statistical models  
Hicks Seminar Room J11  
Abstract: In a Bayesian analysis of any reasonable complexity, many, if not all, of the prior and likelihood judgements we specify in order to make progress are not believed (or owned) by either the analyst or the subject expert. In what sense, then, should we be able to attribute meaning to a large sample from the posterior distribution? Foundationally, is the posterior distribution a probability distribution at all and, if not, what is it and what can it be used for? In this talk I will present a methodology for extracting judgements for key quantities from a large Bayesian analysis. We call this posterior belief assessment, and it is based on the idea that there are many other Bayesian analyses that you might have performed (where, for example, you used different prior/model forms for sub-components of the statistical model). We impose forms of exchangeability and co-exchangeability over key derived posterior quantities under each of these theoretical Bayesian analyses and use these, a handful of alternative analyses and temporal sure preference to derive posterior judgements that we show are closer to what de Finetti termed prevision than the corresponding judgements from your original analysis. We argue that posterior belief assessment is a tractable and powerful alternative to robust Bayesian analysis, and illustrate with an example of calibrating an expensive ocean model in order to quantify uncertainty about global mean temperature in the real ocean. 



May 14  Thu  Zdzislaw Brzezniak (York)  Statistics Seminar  
14:00  Strong and weak solutions to stochastic Landau–Lifshitz equations  
Hicks Seminar Room J11  
Abstract: I will speak about the existence of weak solutions (and the existence and uniqueness of strong solutions) to the stochastic Landau–Lifshitz equations for multi- (and one-) dimensional spatial domains. I will also describe the corresponding Large Deviations principle and its applications to a ferromagnetic wire. The talk is based on joint works with B. Goldys and T. Jegaraj. 



Sep 24  Thu  Nic Freeman (Sheffield)  Statistics Seminar  
14:15  Cluster growth in a forest fire model.  
LT7  
Abstract: I will discuss the limiting behaviour of a mean field forest fire model as the size of the model tends to infinity. The model is closely related to the dynamical Erdős–Rényi random graph. We study a particular regime in which the model displays self-organized criticality and produces clusters of heavy-tailed size. 



Sep 24  Thu  Remco van der Hofstad (Eindhoven)  Statistics Seminar  
15:45  Competition and diffusion in random graphs (The 2015 Applied Probability Trust Lecture)  
LT7  
Abstract: Empirical findings have shown that many real-world networks share fascinating features. Indeed, many real-world networks are small worlds, in the sense that typical distances are much smaller than the size of the network. Further, many real-world networks are scale-free, in the sense that there is a high variability in the number of connections of the elements of the networks, making these networks highly inhomogeneous. Such networks are typically modeled using random graphs with power-law degree sequences. In this lecture, we will investigate the behavior of competition processes on scale-free random graphs with finite-mean, but infinite-variance, degrees. Take two vertices uniformly at random, or at either side of an edge chosen uniformly at random, and place an individual of two distinct types at these two vertices. Equip the edges with traversal times, which could be different for the two types. Then let each of the two types invade the graph, such that any other vertex can only be occupied by the type that gets there first. Let the speed of a type be the inverse of the expected traversal time of an edge by that type. We distinguish two cases. When the traversal times are exponential, we see that one (not necessarily the faster) type will occupy almost all vertices, while the losing type only occupies a bounded number of vertices. This is reflected in the ABBA lyrics ``The winner takes it all, the loser's standing small''. In particular, no asymptotic coexistence can occur. Work in progress investigates whether this occurs more generally. On the other hand, for deterministic traversal times, the faster type always gets the majority of the vertices, while the other occupies a subpolynomial number. When the speeds are the same, asymptotic coexistence (in the sense that both types occupy a positive proportion of the vertices) occurs with positive probability. 
This lecture is based on joint work with Mia Deijfen, Julia Komjathy and Enrico Baroni, and builds on earlier work with Gerard Hooghiemstra, Shankar Bhamidi and Dmitri Znamenski. 



Nov 26  Thu  Francisco Alejandro Díaz De la O (Liverpool)  Statistics Seminar  
14:00  Subset Simulation for Bayesian Updating and Model Selection  
Lecture Theatre B  
Abstract: On the one hand, the problems of model updating and model selection can be tackled using a Bayesian approach: the model parameters to be identified are treated as uncertain and the inference is done in terms of their posterior distribution. On the other hand, the engineering structural reliability problem can be solved by advanced Monte Carlo simulation techniques such as Subset Simulation. Recently, a formulation that connects the Bayesian updating problem and the structural reliability problem has been established. This opens up the possibility of efficient model calibration and model selection using Subset Simulation. The formulation, called BUS (Bayesian Updating with Structural reliability methods), is based on a rejection principle. Its theoretical correctness and efficiency require the prudent choice of a multiplier, which has remained an open question. Motivated by this problem, this talk presents a study of BUS. The discussion will lead to a revised formulation that allows Subset Simulation to be used for Bayesian updating and model selection without having to choose a multiplier in advance. 



Feb 12  Fri  Simon Tavaré (Cambridge)  Statistics Seminar  
14:00  How often does a random mapping have distinct component sizes?  
Hicks Seminar Room J11  
Abstract: One of the classical results about a random permutation of $[n] = \{1,2, \ldots,n\}$ is that the probability it has distinct cycle lengths is asymptotically $\exp(-\gamma) \approx 0.561$; here $\gamma$ is Euler’s constant. In this talk I will discuss the analogous problem for a broad class of random decomposable combinatorial structures that includes random mappings. I will illustrate how discrete process approximations can be used to answer the question in the title, and many related problems, in a very simple way. As a byproduct I will describe some interesting methods for simulating the component count process of these structures. 
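The limiting value quoted above is easy to probe empirically. Below is a minimal simulation sketch (the choices of $n$ and the number of trials are arbitrary illustrative values, not from the talk): draw uniform random permutations, extract their cycle lengths, and count how often all lengths are distinct.

```python
import random

def cycle_lengths(perm):
    """Return the cycle lengths of a permutation given as a list perm, with perm[i] = image of i."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths

def distinct_cycle_fraction(n, trials, seed=0):
    """Estimate P(all cycle lengths distinct) for a uniform random permutation of [n]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        lengths = cycle_lengths(perm)
        if len(lengths) == len(set(lengths)):
            hits += 1
    return hits / trials

# For large n the estimate should sit near exp(-gamma) ~= 0.561.
print(distinct_cycle_fraction(n=500, trials=2000))
```

Even modest values of n give an estimate in the right vicinity, though convergence to the limit is slow.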



Feb 18  Thu  Joakim Beck (UCL)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Mar 3  Thu  Andrew Golightly (Newcastle)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Mar 17  Thu  Jim Griffin (Kent)  Statistics Seminar  
14:00  Adaptive MCMC schemes for variable selection problems (co-authors: Krys Latuszynski and Mark Steel)  
Hicks, Lecture theatre C  
Abstract: Data sets with many variables (often in the hundreds, thousands, or more) are routinely collected in many disciplines. This has led to interest in variable selection in regression models with a large number of variables. A standard Bayesian approach defines a prior on the model space and uses Markov chain Monte Carlo methods to sample the posterior. Unfortunately, the size of the space (2^p if there are p potential variables) and the use of simple proposals in Metropolis–Hastings steps have led to samplers that mix poorly over models. In this talk, I will describe two adaptive Metropolis–Hastings schemes which adapt an independence proposal to the posterior distribution. This leads to substantial improvements in mixing over standard algorithms in large data sets. The methods will be illustrated on simulated and real data with hundreds or thousands of possible variables. 
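To give a flavour of the kind of sampler involved (a generic sketch, not the authors' scheme): Metropolis–Hastings over binary inclusion vectors, with an independence proposal whose inclusion probabilities adapt towards the running marginal inclusion frequencies. The BIC-style model score, the clipping constants and the per-iteration adaptation are all illustrative simplifications; in particular, valid adaptive MCMC needs conditions (e.g. diminishing adaptation) that this toy ignores.

```python
import numpy as np

def log_score(gamma, X, y):
    """Toy log model score: BIC-style penalised Gaussian log-likelihood
    for the regression of y on the columns of X selected by gamma."""
    n = len(y)
    k = int(gamma.sum())
    if k == 0:
        rss = float(np.sum((y - y.mean()) ** 2))
    else:
        Xg = X[:, gamma.astype(bool)]
        beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        rss = float(np.sum((y - Xg @ beta) ** 2))
    return -0.5 * n * np.log(rss / n) - 0.5 * k * np.log(n)

def adaptive_mh(X, y, iters=2000, seed=0):
    """Independence-proposal MH over inclusion vectors; the proposal's
    inclusion probabilities track the running marginal means."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma = np.zeros(p)
    cur = log_score(gamma, X, y)
    q = np.full(p, 0.5)            # adaptive proposal probabilities
    running = np.zeros(p)
    for t in range(1, iters + 1):
        prop = (rng.random(p) < q).astype(float)
        new = log_score(prop, X, y)
        # Independence proposal: correct for q in the acceptance ratio.
        log_q_prop = np.sum(np.where(prop == 1, np.log(q), np.log1p(-q)))
        log_q_cur = np.sum(np.where(gamma == 1, np.log(q), np.log1p(-q)))
        if np.log(rng.random()) < new - cur + log_q_cur - log_q_prop:
            gamma, cur = prop, new
        running += gamma
        q = np.clip(running / t, 0.05, 0.95)   # adapt, kept away from 0/1
    return running / iters

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, 0] * 2.0 + rng.normal(size=200)
probs = adaptive_mh(X, y)
print(probs)   # inclusion frequency of variable 0 should dominate
```

The adaptation means good variables are proposed for inclusion more and more often, which is the intuition behind the improved mixing described in the abstract.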



Apr 21  Thu  Ruth King (Edinburgh)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


May 5  Thu  Pete Dodd (Sheffield)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


May 19  Thu  Dler Kadir and Abdulaziz Alenazi (Sheffield)  Statistics Seminar  
14:00  Dler: Markov chain Monte Carlo estimation for autoregressive time series. Abdulaziz: A fully Bayesian differential-shrinkage approach to incorporating functional genomic information into case-control fine mapping studies  
Hicks, Lecture Theatre C  
Abstract: Dler: The purpose of this talk is to discuss Markov chain Monte Carlo (MCMC) estimation for stationary autoregressive time series. In order to do this, we need to derive the stationarity conditions used to place priors on the parameters of autoregressive models. We therefore first study the stationarity conditions, because stationarity affects which priors we set up in a Bayesian setting. Next, we apply MCMC in order to estimate the parameters based on these priors. Our interest is focused on the autoregressive model of order p (AR(p)) and the development and utility of Bayesian inference. One of the major obstacles in setting up a Bayesian estimation procedure for autoregressive models is the assumption of stationarity. In our view this is the reason why Bayesian estimation for such models is relatively limited. In this talk the stationarity conditions of AR(2) and AR(3) are revisited. We show that for the most general model AR(p) one can achieve sufficient stationarity conditions consisting of a set of linear inequalities. This can then be exploited to set up a Metropolis-within-Gibbs simulation scheme. We discuss the problem in some detail in the case of AR(3) and propose a second MCMC scheme for the AR(3) model. Throughout, we use simulated data to illustrate the proposed methodology. Abdulaziz: Bayesian approaches are particularly useful in fine mapping case-control studies as they naturally allow the inclusion of prior information relating to functional significance. We use the normal-gamma (NG) prior proposed by Griffin and Brown and modify it to allow the inclusion of functional information in the form of published functional significance scores. These scores assimilate functional information from many online sources and combine them into a single score. 
Rather than use the correct logistic likelihood for the response, which is computationally more demanding, we use the asymptotic Gaussian distribution of the maximum likelihood estimates of the model coefficients (log odds ratios). This enables us to speed up our MCMC analysis by using the Gaussian linear model framework. The NG prior assumes a hierarchical form for the coefficients which is similar to the normal-exponential-gamma prior used in HyperLASSO, but allows more flexibility in the shrinkage imposed by the prior. We calibrate the NG hyperparameters using published top hits from large breast cancer genome-wide association studies. We allow the functional significance scores to alter the prior probability density function of the log odds ratio on a SNP-by-SNP basis and show how this can be used to improve the detection of causal variants. We show, using simulated case-control data, that our modified NG prior can give higher true positive rates at relevant low false positive rates compared to logistic regression, piMASS, HyperLASSO and the standard NG prior. 
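The AR(p) stationarity condition underlying the first talk can be checked numerically via the characteristic roots: the process X_t = phi_1 X_{t-1} + … + phi_p X_{t-p} + e_t is stationary iff all roots of z^p − phi_1 z^{p-1} − … − phi_p lie strictly inside the unit circle. A minimal sketch (the example coefficients are arbitrary, not from the talk):

```python
import numpy as np

def is_stationary(phi):
    """Check stationarity of an AR(p) model
    X_t = phi[0]*X_{t-1} + ... + phi[p-1]*X_{t-p} + e_t.
    Stationary iff every root of z^p - phi_1 z^{p-1} - ... - phi_p
    has modulus strictly less than 1."""
    coeffs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) < 1.0))

# AR(2) examples: the stationarity region is the triangle
# phi_1 + phi_2 < 1,  phi_2 - phi_1 < 1,  |phi_2| < 1.
print(is_stationary([0.5, 0.3]))   # inside the triangle -> True
print(is_stationary([0.5, 0.6]))   # phi_1 + phi_2 > 1 -> False
```

For AR(2) these root conditions reduce exactly to the triangle of linear inequalities shown in the comments, which is the kind of linear-inequality description of stationarity the abstract refers to.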



Oct 26  Wed  Professor Stephen Senn (Competence Center for Methodology and Statistics, CRP-Santé)  Statistics Seminar  
17:15  Numbers needed to mislead, meta-analysis and muddled thinking  
Lecture Theatre 4, The Diamond  
Abstract: The ardent espousal by the evidence-based medicine movement of numbers needed to treat (NNTs) as a way of making difficult statistical concepts simple and concrete has had the unintended consequence of sowing confusion. Many users, including many in the evidence-based movement themselves, have interpreted these statistics as indicating what proportion of patients benefit from treatment. However, they cannot deliver this information. I shall explain this with the example of a recent Cochrane Collaboration meta-analysis of paracetamol against placebo in trials of tension headache, for which the plain language summary claimed: "The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo (high quality evidence), meaning that only 10 in 100 people benefited because of paracetamol 1000 mg." With the aid of a simple model, also illustrated (just for fun) by a simulation, I shall show that the plain language conclusion is plain wrong. The observed facts do not necessarily mean that only 10 in 100 people benefited. The combination of arbitrary dichotomies and NNTs has a dangerous ability to deceive and may be leading us to expect much more of personalised medicine than it can deliver. All welcome. Admission to the lecture is free, but registration is required. 



Oct 27  Thu  Alison Parton (Sheffield, SoMaS)  Statistics Seminar  
14:00  A hybrid MCMC sampler for inferring animal movements and behaviours from GPS observations  
F20  
Abstract: Although animal locations gained via GPS etc. are typically observed on a discrete time scale, movement models formulated in continuous time are preferable, avoiding the struggles experienced in discrete time when faced with irregular observations or the prospect of comparing analyses on different time scales. A class of models able to emulate a range of movement ideas is defined by representing movement as a combination of stochastic processes describing both speed and bearing. This framework can then be extended to allow multiple behavioural modes through a continuous-time Markov process. Bayesian inference for such models is described through the use of a hybrid MCMC approach. Such inference relies on an augmentation of the animal’s locations in discrete time with a more detailed movement path gained via simulation techniques. Simulated and real data on an individual reindeer (Rangifer tarandus) will illustrate the presented methods. 



Nov 3  Thu  Heiko Strathmann (Gatsby, UCL)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Nov 10  Thu  Dr David Wyncoll (HR Wallingford)  Statistics Seminar  
14:00  National-scale multivariate extreme value analysis for coastal flood risk analysis  
Hicks Seminar Room J11  
Abstract: Coastal flooding in the UK is driven by the joint occurrence of large waves, winds and sea levels. In order to quantify the flood risk at a single site it is important to study the dependence between these variables at extreme levels. The spatial dependence between coastal locations is also important for quantifying the likelihood of single large-scale coastal flooding events. We present a national-scale multivariate extreme value analysis of offshore drivers of coastal flooding in England and Wales. This appropriately captures dependences between both extreme and non-extreme driving variables at and between multiple coastal locations. The output of this analysis is a large Monte Carlo sample of plausible joint events that may be propagated through a chain of emulated numerical models to estimate the risk of large-scale coastal flooding. 



Nov 10  Thu  Sajni Malde (HR Wallingford)  Statistics Seminar  
15:30  
Hicks Seminar Room J11  


Dec 8  Thu  Stefano Castruccio (Newcastle)  Statistics Seminar  
14:00  Global Space-Time Emulators for Ensemble of Opportunities: Assessing Scenario Uncertainty for CMIP5  
Hicks Seminar Room J11  
Abstract: Simulating Earth System Models (ESMs) is among the most challenging exercises of contemporary science. ESMs require an extremely high-dimensional input, comprising a value of the forcing scenario for each year, and produce an even higher-dimensional output in space, time and variables. Given the considerable computational and logistic challenges of performing even a small set of simulations, an ensemble comprises a very limited number of runs. In the case of the CMIP5 ensemble, the reference for the latest IPCC assessment report, each modelling group submitted long-term simulations under at most four scenarios, thus providing very limited information for policy making. An emulator in scenario space can be developed to overcome these limitations. However, the modest number of runs, paired with the extremely large dimensionality of the input and output space, poses significant challenges for the development of the statistical methodology. In this talk, I will present a scenario emulator for ESMs that leverages the temporal structure of the input/output space, the causality principle and the gridded geometry of the output. I will present an application of this methodology to temperature and wind data in the case of two ensembles, and I will show how the emulator provides accurate results for a dataset of tens of millions of data points. 



Dec 8  Thu  Finn Lindgren (Edinburgh)  Statistics Seminar  
15:30  EUSTACE: Latent Gaussian process models for weather and climate reconstruction  
Hicks Seminar Room J11  
Abstract: The EUSTACE project will give publicly available daily estimates of surface air temperature since 1850 across the globe for the first time by combining surface and satellite data using novel statistical techniques. To this end, a spatiotemporal multiscale statistical Gaussian random field model is constructed, using connections between SPDEs and Markov random fields to obtain sparse matrices for the practical computations. The extreme size of the problem necessitates the use of iterative solvers, making use of the multiscale structure of the model to design an effective preconditioner. 



Oct 12  Thu  Dino Sejdinovic (Oxford)  Statistics Seminar  
14:00  Approximate Kernel Embeddings and Symmetric Noise Invariance  
LT 9  
Abstract: Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric hypothesis testing and for learning on distributional inputs. I will give an overview of this framework and present some of the applications of the approximate kernel embeddings to Bayesian computation. Further, I will discuss a recent modification of MMD which aims to encode invariance to additive symmetric noise and leads to learning on distributions robust to the distributional covariate shift, e.g. where measurement noise on the training data differs from that on the testing data. https://arxiv.org/abs/1703.07596 
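As background for the framework above, the MMD itself takes only a few lines to estimate. A minimal numpy sketch (the biased V-statistic estimator with a Gaussian RBF kernel; the fixed bandwidth and the Gaussian test data are arbitrary illustrative choices):

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between 1-D samples x and y."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y:
    mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
same = mmd2_biased(rng.normal(size=500), rng.normal(size=500))
diff = mmd2_biased(rng.normal(size=500), rng.normal(loc=2.0, size=500))
print(same, diff)   # the shifted sample gives a much larger MMD
```

Samples from the same distribution give an estimate near zero, while a mean shift produces a clearly larger value, which is what makes the MMD usable as a nonparametric two-sample test statistic.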



Oct 19  Thu  Mauricio Alvarez (Sheffield)  Statistics Seminar  
14:00  


Nov 9  Thu  Arthur Gretton (UCL)  Statistics Seminar  
14:00  


Nov 16  Thu  Timothy Waite (Manchester)  Statistics Seminar  
14:00  


Dec 7  Thu  Maria Kalli (Kent)  Statistics Seminar  
14:00  


Feb 15  Thu  Jeremy Colman (Sheffield)  Statistics Seminar  
15:00  Stan: better, faster MCMC – a user review  
F41  


Apr 19  Thu  Martine Barons (Warwick)  Statistics Seminar  
14:00  
LT3  


Feb 7  Thu  Jeremy Colman (Sheffield)  Statistics Seminar  
14:00  Accounting for Uncertainty in Estimates of Extremes  
LT E  
Abstract: Devastating consequences can flow from the failure of certain structures, such as coastal flood defences, nuclear installations, and oil rigs. Their design needs to be robust under rare (p < 0.0001) extreme conditions, but how can the designers use data typically from only a few decades to predict the size of an event that might occur once in 10,000 years? Extreme Value Theory claims to provide a sound basis for such far-out-of-sample prediction, and using Bayesian methods a full posterior distribution can be obtained. If the past data are supplemented by priors that take into account expert opinion, seemingly tight estimates result. Are such claims justified? Has all uncertainty been taken into account? My research is addressing these questions. 



Feb 21  Thu  Sophia Wright (Warwick)  Statistics Seminar  
14:00  Bayesian Networks, Total Variation and Robustness  
LT E  
Abstract: This talk explores the robustness of large Bayesian Networks when applied in decision support systems which have a pre-specified subset of target variables. We develop new methodology, underpinned by the total variation distance, to determine whether simplifications which are currently employed in the practical implementation of such graphical systems are theoretically valid. This same process can identify areas of the system which should be prioritised if elicitation is required. This versatile framework enables us to study the effects of misspecification within a Bayesian network (BN), and also extend the methodology to quantify temporal effects within Dynamic BNs. Unlike current robustness analyses, our new technology can be applied throughout the construction of the BN model, enabling us to create tailored, bespoke models. For illustrative purposes we shall explore the field of Food Security within the UK. 



Feb 28  Thu  Wil Ward (Sheffield)  Statistics Seminar  
14:00  A Variational Approach to Approximating State Space Gaussian Processes  
LT E  
Abstract: The state space representation of a Gaussian process (GP) models the dynamics of an unknown (nonlinear) function as a white-noise driven Itô differential equation. Representation in this form allows for the construction of joint models that mix known dynamics (e.g. population) with latent unknown input. Where these interactions are nonlinear, or observed through non-Gaussian likelihoods, there is no exact solution and approximation techniques are required. This talk introduces an approach using black-box variational inference to model surrogate samples and estimate the underlying parameters. The approximations are compared with full batch solutions and demonstrated to be indistinguishable in two-sample tests. Software and implementation challenges will also be addressed. 



Mar 7  Thu  Christian Fonseca Mora (Costa Rica)  Statistics Seminar  
14:00  Stochastic PDEs in Infinite Dimensional Spaces  
LT E  
Abstract: In this talk we will give an introduction to SPDEs in spaces of distributions. In the first part of the talk we consider a model of environmental pollution with Poisson deposits that will help to introduce the basic concepts for the study of SPDEs on infinite dimensional spaces. In the second part of the talk, we introduce a generalized form of SPDEs in spaces of distributions and explain conditions for the existence and uniqueness of its solutions. For this talk we will not assume any previous knowledge on SPDEs. 



Mar 14  Thu  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Variational inference reading group  
LT E  
Abstract: We will be spending two seminar slots on the following: Variational Inference: A Review for Statisticians https://arxiv.org/abs/1601.00670 David M. Blei, Alp Kucukelbir, Jon D. McAuliffe 



Mar 21  Thu  Theo Kypraios (Nottingham)  Statistics Seminar  
14:00  Recent Advances in Identifying Transmission Routes of Healthcare Associated Infections using Whole Genome Sequence Data  
LT E  
Abstract: Healthcare-associated infections (HCAIs) remain a problem worldwide, and can cause severe illness and death. It is estimated that 5–10% of acute-care patients are affected by nosocomial infections in developed countries, with higher levels in developing countries. Statistical modelling has played a significant role in increasing understanding of HCAI transmission dynamics. For instance, many studies have investigated the dynamics of MRSA transmission in hospitals, estimating transmission rates and the effectiveness of various infection control measures. However, uncertainty about the true routes of transmission remains, and this is reflected in the uncertainty of the parameters governing transmission. Until recently, the collection of whole genome sequence (WGS) data for bacterial organisms has been prohibitively complex and expensive. However, technological advances and falling costs mean that DNA sequencing is becoming feasible on a larger scale. In this talk we first describe how to construct statistical models which incorporate WGS data with regular HCAI surveillance data (admission/discharge dates etc.) to describe the pathogen's transmission dynamics in a hospital ward. Then, we show how one can fit such models to data within a Bayesian framework, accounting for unobserved colonisation times and imperfect screening sensitivity, using efficient Markov chain Monte Carlo algorithms. Finally, we illustrate the proposed methodology using MRSA surveillance data collected from a hospital in north-east Thailand. 



Mar 28  Thu  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Variational inference reading group  
LT E  
Abstract: We will be spending two seminar slots on the following: Variational Inference: A Review for Statisticians https://arxiv.org/abs/1601.00670 David M. Blei, Alp Kucukelbir, Jon D. McAuliffe 



Apr 2  Tue  Arne Grauer, Lukas Lüchtrath (Cologne)  Statistics Seminar  
16:00  The age-dependent random connection model  
F28  
Abstract: We consider a class of growing graphs embedded into the $d$-dimensional torus where new vertices arrive according to a Poisson process in time, are randomly placed in space and connect to existing vertices with a probability depending on time, their spatial distance and their relative ages. This simple model for a scale-free network is called the age-based spatial preferential attachment network and is based on the idea of preferential attachment with spatially induced clustering. The graphs converge weakly locally to a variant of the random connection model, which we call the age-dependent random connection model. This is a natural infinite graph on a Poisson point process where points are marked by a uniformly distributed age and connected with a probability depending on their spatial distance and both ages. We use the limiting structure to investigate the asymptotic degree distribution, clustering coefficients and typical edge lengths in the age-based spatial preferential attachment network. 



May 9  Thu  Rebecca Killick (Lancaster)  Statistics Seminar  
14:00  Computationally Efficient Multivariate Changepoint Detection with Subsets  
LT E  
Abstract: Historically, much of the research on changepoint analysis has focused on the univariate setting. Due to the growing number of high-dimensional datasets there is an increasing need for methods that can detect changepoints in multivariate time series. In this talk we focus on the problem of detecting changepoints where only a subset of the variables under observation undergo a change, so-called subset multivariate changepoints. One approach to locating changepoints is to choose the segmentation that minimises a penalised cost function via a dynamic program. The work in this presentation is the first to create a dynamic program specifically for detecting changes in subset-multivariate time series. The computational complexity of the dynamic program means it is infeasible even for medium-sized datasets. Thus we propose a computationally efficient approximate dynamic program, SPOT. We demonstrate that SPOT always recovers a better segmentation, in terms of penalised cost, than other approaches which assume every variable changes. Furthermore, under mild assumptions the computational cost of SPOT is linear in the number of data points. In small simulation studies we demonstrate that SPOT provides a good approximation to exact methods but is feasible for datasets that contain thousands of variables observed at millions of time points. Furthermore we demonstrate that our method compares favourably with other commonly used multivariate changepoint methods and achieves a substantial improvement in performance when compared with fully multivariate methods. 
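The penalised-cost dynamic program referred to above can be illustrated in its simplest form: exact optimal partitioning of a single univariate series with a squared-error segment cost. This is the generic O(n^2) recursion, not SPOT itself, and the penalty value below is an arbitrary illustrative choice.

```python
import numpy as np

def optimal_partitioning(data, penalty):
    """Exact penalised-cost segmentation by dynamic programming (O(n^2)).
    Segment cost = residual sum of squares about the segment mean."""
    n = len(data)
    cum = np.concatenate(([0.0], np.cumsum(data)))
    cum2 = np.concatenate(([0.0], np.cumsum(np.square(data))))

    def seg_cost(i, j):   # cost of data[i:j], j > i, via cumulative sums
        s, s2, m = cum[j] - cum[i], cum2[j] - cum2[i], j - i
        return s2 - s * s / m

    best = np.full(n + 1, np.inf)
    best[0] = -penalty    # so each segment contributes cost + penalty
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + seg_cost(i, j) + penalty
            if cand < best[j]:
                best[j], last[j] = cand, i
    # Backtrack to recover the changepoint locations (segment starts).
    cps, j = [], n
    while j > 0:
        j = last[j]
        if j > 0:
            cps.append(j)
    return sorted(cps)

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
print(optimal_partitioning(y, penalty=3 * np.log(len(y))))
```

The quadratic cost of this exact recursion is precisely what motivates faster (pruned or approximate) dynamic programs such as the one described in the talk.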



May 16  Thu  Christopher Fallaize (Nottingham)  Statistics Seminar  
14:00  Unlabelled Shape Analysis with Applications in Bioinformatics  
LT E  
Abstract: In shape analysis, objects are often represented as configurations of points, known as landmarks. The case where the correspondence between landmarks on different objects is unknown is called unlabelled shape analysis. The alignment task is then to simultaneously identify the correspondence between landmarks and the transformation aligning the objects. In this talk, I will discuss the alignment of unlabelled shapes, and discuss two applications to problems in structural bioinformatics. The first is a problem in drug discovery, where the main objective is to find the shape information common to all, or subsets of, a set of active compounds. The approach taken resembles a form of clustering, which also gives estimates of the mean shapes of each cluster. The second application is the alignment of protein structures, which will also serve to illustrate how the modelling framework can incorporate very general information regarding the properties we would like alignments to have; in this case, expressed through the sequence order of the points (amino acids) of the proteins. 



Oct 10  Thu  Richard Glennie (St Andrews)  Statistics Seminar  
14:00  Modelling latent processes in population abundance surveys using hidden Markov models  
K14  
Abstract: Distance sampling and spatial capture-recapture are statistical methods to estimate the number of animals in a wild population based on encounters between these animals and scientific detectors. Both methods estimate the probability an animal is detected during a survey, but do not explicitly model animal movement and behaviour. The primary challenge is that animal movement in these surveys is unobserved; one must average over all possible histories of each individual. In this talk, a general statistical model, with distance sampling and spatial capture-recapture as special cases, is presented that explicitly incorporates animal movement. An algorithm to integrate over all possible movement paths, based on quadrature and hidden Markov modelling, is given to overcome common computational obstacles. For distance sampling, simulation studies and case studies show that incorporating animal movement can reduce the bias in estimated abundance found in conventional models and expand the application of distance sampling to surveys that violate the assumption of no animal movement. For spatial capture-recapture, continuous-time encounter records are used to make detailed inference on where animals spend their time during the survey. For surveys conducted over discrete occasions, maximum likelihood models that allow for mobile activity centres are presented to account for transience, dispersal, and heterogeneous space use. These methods provide an alternative when animal movement causes bias in standard methods, and the opportunity to gain richer inference on how animals move, where they spend their time, and how they interact. 



Oct 14  Mon  Jeremy Oakley (Sheffield)  Statistics Seminar  
13:00  Deep Learning reading group: Chapter 6 from Goodfellow et al. (2016)  
LT 6  
Abstract: Discussion of Chapter 6 from "Deep Learning", by Goodfellow, Bengio and Courville https://www.deeplearningbook.org/ 



Oct 15  Tue  Emma Gordon (Director of Administrative Data Research UK)  Statistics Seminar  
16:00  Royal Statistical Society (RSS) Sheffield Local group seminar.
The potential and pitfalls of linked administrative data 

LT B  
Abstract: Administrative databases that are linked with each other or with survey data can allow deeper insights into the population’s life trajectories and needs, and signal opportunities for improved and ultimately more personalised service delivery. Yet government agencies have to meet several prerequisites to realise these benefits. First among them is a stable legal basis. Appropriate laws and regulations have to exist to allow data merging within the limits of existing privacy protection. When different institutions are involved, these regulations have to clearly define each agency's responsibilities in collecting, safeguarding and analysing data. Second are technical requirements. This includes creating a safe infrastructure for data storage and analysis and developing algorithms to match individuals when databases do not share common unique personal identifiers. Third is the buy-in of the population. Public communication can highlight the value added of linked databases and outline the steps taken to ensure data security and privacy. Involving citizens in dialogues about what data uses they are and are not comfortable with can help build public trust that appropriate limits are set and respected. 



Oct 24  Thu  Lyudmila Mihaylova (Sheffield)  Statistics Seminar  
14:00  Nonparametric Methods and Models with Uncertainty Propagation  
LT E  
Abstract: We are experiencing an enormous growth and expansion of data provided by multiple sensors. Current monitoring and control systems face challenges both in processing big data and in making decisions on the phenomena of interest at the same time. Urban systems are hugely affected. Hence, intelligent transport and surveillance systems need efficient methods for data fusion, tracking and prediction of individual vehicular traffic and aggregated flows. This talk will focus on two main methods able to solve such monitoring problems by fusing multiple types of data while dealing with nonlinear phenomena – sequential Markov chain Monte Carlo (SMCMC) methods with adaptive subsampling and Gaussian process regression methods. The first part of this talk will present an SMCMC approach able to deal with massive data by adaptively subsampling the sensor measurements. The main idea of the method is to approximate the logarithm of the likelihood ratio by performing a trade-off between complexity and accuracy. The approach's efficiency will be demonstrated on object tracking tasks. Next, Gaussian process methods will be presented for point and extended object tracking, i.e. both in space and in time. Using the derivatives of the Gaussian process leads to an efficient replacement of the multiple models that are usually necessary to represent the whole range of behaviour of a dynamic system. These methods give the opportunity to assess the impact of uncertainties, e.g. from the sensor data, on the developed solutions. 



Oct 28  Mon  Jeremy Oakley (Sheffield)  Statistics Seminar  
13:00  Deep Learning reading group: 6.5–7.2 from Goodfellow et al. (2016)  
LT 6  


Oct 31  Thu  Tom Hutchcroft (Cambridge)  Statistics Seminar  
14:00  Phase transitions in hyperbolic spaces  
LT E  
Abstract: Many questions in probability theory concern the way the geometry of a space influences the behaviour of random processes on that space, and in particular how the geometry of a space is affected by random perturbations. One of the simplest models of such a random perturbation is percolation, in which the edges of a graph are either deleted or retained independently at random with retention probability p. We are particularly interested in phase transitions, in which the geometry of the percolated subgraph undergoes a qualitative change as p is varied through some special value. Although percolation has traditionally been studied primarily in the context of Euclidean lattices, the behaviour of percolation in more exotic settings has recently attracted a great deal of attention. In this talk, I will discuss conjectures and results concerning percolation on the Cayley graphs of non-amenable groups and hyperbolic spaces, and give the main ideas behind our recent result that percolation in any transitive hyperbolic graph has a non-trivial phase in which there are infinitely many infinite clusters. The talk is intended to be accessible to a broad audience. 
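The phase transition is easy to see numerically even in the simplest Euclidean setting mentioned in the abstract; an illustrative sketch (not from the talk) of site percolation on a square lattice:

```python
import numpy as np
from scipy.ndimage import label

rng = np.random.default_rng(1)

def largest_cluster_fraction(p, n=200):
    """Site percolation on an n x n square lattice: each site is retained
    independently with probability p; return the fraction of all sites
    lying in the largest connected open cluster."""
    open_sites = rng.random((n, n)) < p
    labels, num = label(open_sites)          # 4-connectivity by default
    if num == 0:
        return 0.0
    sizes = np.bincount(labels.ravel())[1:]  # drop background label 0
    return sizes.max() / (n * n)

# Below the critical value (p_c ≈ 0.593 for this lattice) the largest
# cluster is a vanishing fraction of the lattice; above it, a giant
# cluster emerges.
low = largest_cluster_fraction(0.3)
high = largest_cluster_fraction(0.8)
```

On non-amenable graphs, the subject of the talk, there is additionally an intermediate phase with infinitely many infinite clusters, which this Euclidean toy model does not exhibit.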



Nov 7  Thu  Deborah Ashby (Imperial College London, President of the Royal Statistical Society)  Statistics Seminar  
14:15  Royal Statistical Society (RSS) Sheffield Local group seminar.
Pigeonholes and mustard seeds: Growing capacity to use data for society 

Hicks Seminar Room J11  
Abstract: The Royal Statistical Society was founded to address social problems ‘through the collection and classification of facts’, leading to many developments in the collection of data, the development of methods for analysing them, and the development of statistics as a profession. Nearly 200 years later, an explosion in computational power has led, in turn, to an explosion in data. We outline the challenges and the actions needed to exploit that data for the public good, and to address the step change in statistical skills and capacity development necessary to enable our vision of a world where data are at the heart of understanding and decision-making. 



Nov 11  Mon  CANCELLED  Statistics Seminar  
13:00  Deep Learning reading group  
LT 6  


Nov 21  Thu  Leo Bastos (LSHTM)  Statistics Seminar  
14:00  Modelling reporting delays for disease surveillance data  
LT E  
Abstract: One difficulty for real-time tracking of epidemics is the reporting delay. The reporting delay may be due to laboratory confirmation, logistic problems, infrastructure difficulties and so on. The ability to correct the available information as quickly as possible is crucial for decision making, such as issuing warnings to the public and local authorities. A Bayesian hierarchical modelling approach is proposed as a flexible way of correcting the reporting delays and quantifying the associated uncertainty. Implementation of the model is fast, due to the use of the integrated nested Laplace approximation (INLA). The approach is illustrated on dengue fever incidence data in Rio de Janeiro, and Severe Acute Respiratory Illness (SARI) data in Paraná state, Brazil. 
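A crude frequentist stand-in for such a delay correction (illustrative only; the talk's model is Bayesian, hierarchical, and fitted with INLA) scales each incomplete week of a reporting triangle by the estimated proportion of cases reported so far:

```python
import numpy as np

# Toy reporting triangle: rows = onset week, columns = reporting delay
# (weeks); NaN marks counts not yet observed. All numbers are made up.
triangle = np.array([
    [40., 30., 20., 10.],
    [50., 35., 25., np.nan],
    [45., 40., np.nan, np.nan],
    [60., np.nan, np.nan, np.nan],
])

def nowcast(tri):
    """Estimate eventual totals per onset week by dividing the counts
    reported so far by the cumulative reporting proportion at that delay,
    with the delay distribution estimated from fully observed weeks."""
    complete = tri[~np.isnan(tri).any(axis=1)]
    delay_probs = complete.sum(axis=0) / complete.sum()  # delay distribution
    cum = np.cumsum(delay_probs)
    est = []
    for row in tri:
        k = int(np.sum(~np.isnan(row)))      # number of delays observed so far
        est.append(np.nansum(row) / cum[k - 1])
    return np.array(est)

totals = nowcast(triangle)  # recent weeks are corrected upwards
```

The Bayesian model in the talk does this jointly across weeks and delays, which is what yields proper uncertainty quantification for the corrected counts.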



Nov 28  Thu  POSTPONED: Marcel Ortgiese (Bath)  Statistics Seminar  
14:00  
LT E  


Dec 5  Thu  POSTPONED: Heather Battey (Imperial)  Statistics Seminar  
14:00  Aspects of high-dimensional inference  
LT 10  


Dec 12  Thu  Jeremy Colman (Sheffield)  Statistics Seminar  
14:00  Simulation-Based Calibration (SBC)  
LT E  
Abstract: SBC is a relatively new method for checking Bayesian inference algorithms. Its advocates (Talts et al., 2017) argue that it identifies inaccurate computation and inconsistencies in model implementation, and also provides graphical summaries to indicate the nature of the underlying problems. An example of such a summary is given. Although SBC has emerged from the Stan development team, it is applicable to any Bayesian model that is capable of generating posterior samples. It does not require the use of any particular modelling language. I shall explain why there might indeed be a gap that SBC could fill, demonstrate how SBC works in practice, and discuss the balance between its costs and benefits. 
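The core SBC loop is short. A minimal sketch for a conjugate normal model (my own toy example, not from the paper), where the posterior sampler is exact and the rank statistics should therefore be uniform:

```python
import numpy as np

rng = np.random.default_rng(0)

def sbc_ranks(n_sims=1000, n_post=99):
    """Simulation-Based Calibration for the model
    theta ~ N(0, 1), y | theta ~ N(theta, 1).
    For each replicate: draw theta from the prior, simulate data,
    draw posterior samples, and record the rank of the true theta
    among them. A correct sampler gives ranks uniform on {0,...,n_post}."""
    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        theta = rng.normal(0.0, 1.0)                 # prior draw
        y = rng.normal(theta, 1.0)                   # simulated data
        # exact conjugate posterior: N(y/2, 1/2)
        post = rng.normal(y / 2.0, np.sqrt(0.5), size=n_post)
        ranks[i] = np.sum(post < theta)              # rank statistic
    return ranks

ranks = sbc_ranks()
# A histogram of `ranks` is the graphical summary Talts et al. recommend:
# deviations from uniformity (e.g. a U or hump shape) diagnose a
# mis-calibrated sampler.
```

Replacing the exact posterior draw with the sampler under test (e.g. an MCMC run) is all that changes in a real application.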



Feb 13  Thu  Ines Krissaane (Sheffield)  Statistics Seminar  
14:00  Robustness of Variational Inference under Model Misspecification  
LT 6  
Abstract: In many complex scientific problems, we deal with a model that is misspecified relative to the data-generating process, in the sense that there is no parameter setting that allows the model to perfectly replicate the data. We will review the recent paper Generalized Variational Inference (https://arxiv.org/pdf/1904.02063.pdf) and present arguments for using VI under model misspecification. As an application, we will focus on the Hodgkin-Huxley model of action potentials, and infer parameters from uncertain experimental measurements using a variational autoencoder method. 



Feb 27  Thu  Mark Dunning, Tim Freeman, Sokratis Kariotis (Sheffield)  Statistics Seminar  
16:30  Statistical and Data Analysis Challenges in Bioinformatics  
K14  
Abstract: Bioinformatics is a multidisciplinary subject that combines aspects of biology, computer science and statistics. Modern experimental techniques are able to generate vast amounts of data that can profile an individual's genome and offer insights into the development of disease and potential novel therapeutics. In this talk, I will describe the challenges faced by Bioinformaticians trying to deal with such data on a daily basis and the opportunities for collaboration with other disciplines to develop new analytical methods. 



Mar 19  Thu  Susan Cox (KCL)  Statistics Seminar  
14:00  
LT 6  


Apr 30  Thu  Heather Battey (Imperial)  Statistics Seminar  
14:00  
LT 6  


May 14  Thu  Steven Julious (Sheffield)  Statistics Seminar  
16:00  Florence Nightingale: The Passionate Statistician  
https://teams.microsoft.com/l/meetupjoin/19%3ameeting_YjVlZTY1NTItNGU4Mi00N2ZjLThmYWEtM2Y1NjExNjc5MTA1%40thread.v2/0?context=%7b%22Tid%22%3a%2219c3a1c9f5834a18b6ad75cc9c14243c%22%2c%22Oid%22%3a%22da5c99d8843a4aa784c229a3732945ed%22%7d  
Abstract: The Passionate Statistician was the title given to Florence Nightingale by her first biographer, Sir Edward Cook. Florence Nightingale was a firm believer in the accurate quantification of evidence to inform decisions. It was her belief in the accurate collection and presentation of data that informed the work she undertook to improve military hospitals. She was of the view that “to understand God’s thoughts, we must study statistics for these are the measure of His purpose”, and she used her statistical abilities to inform debates that led to a decline in preventable deaths in military and civilian hospitals. This year marks 200 years since the birth of Florence Nightingale, and in this talk Steven will pay tribute to her work in statistics and its long-lasting impact. The webinar will take place on Microsoft Teams; you can join on the web or in the Teams app (if you have it), but you should not need to have an account. 



May 27  Wed  Adam Butler (BIOSS)  Statistics Seminar  
14:00  
LT 6  


May 12  Wed  Kevin Wilson and Cameron Williams (Newcastle)  Statistics Seminar  
14:00  A comparison of prior distribution aggregation methods  
Google Meet  
Abstract: When eliciting prior distributions from experts, it may be desirable to combine them into a single group prior. There are many methods of expert-elicited prior aggregation, which can roughly be categorised into two types. Mathematical aggregation methods combine prior distributions using a mathematical rule, while behavioural aggregation methods assist the group of experts to come to a consensus prior through discussion. As many commonly used aggregation methods have different requirements in the elicitation stage, there are few, if any, comparisons between them. Using a clinical trial into a novel diagnostic test for Motor Neuron Disease as a case study, we elicited a number of prior distributions from a group of experts. We then aggregated these prior distributions using a range of mathematical aggregation methods, including Equal Weights linear pooling, the Classical Method, and a Bayesian aggregation method. We also undertook an in-person behavioural aggregation with the experts, using the Sheffield Elicitation Framework, or SHELF. Using expert answers to seed questions, for which the elicitors know the true values, we compare and contrast the different aggregation methods and their performance. We also demonstrate how all considered aggregation methods outperform the individual experts. 
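The simplest of the mathematical methods mentioned, Equal Weights linear pooling, just averages the experts' densities. An illustrative sketch with made-up expert priors (the numbers are hypothetical, not from the case study):

```python
import numpy as np
from scipy import stats

# Three hypothetical experts give normal priors for a treatment effect.
experts = [stats.norm(0.2, 0.10), stats.norm(0.35, 0.15), stats.norm(0.1, 0.20)]
weights = np.full(len(experts), 1.0 / len(experts))  # equal weights

def pooled_pdf(x):
    """Linear opinion pool: f(x) = sum_j w_j f_j(x)."""
    x = np.asarray(x, dtype=float)
    return sum(w * e.pdf(x) for w, e in zip(weights, experts))

# The pool is a finite mixture, so it still integrates to one.
grid = np.linspace(-2.0, 2.0, 4001)
dx = grid[1] - grid[0]
area = float(np.sum(pooled_pdf(grid)) * dx)
```

Performance-weighted schemes such as the Classical Method replace the equal weights with weights calibrated on the seed questions described in the abstract.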



May 22  Mon  Lisa Hampson (Novartis)  Statistics Seminar  
14:00  Bayesian methods to improve quantitative decision making in drug development and the role of expert elicitation  
Hicks LT 6 / meet.google.com/onyuzabqyz  
Abstract: There are several steps to confirming the safety and efficacy of a new medicine. A sequence of trials, each with its own objectives, is usually required. Bayesian measures of risk, such as assurance or more generally probability of success (PoS), can be useful for informing decisions about whether a medicine should transition from one stage of development to the next. In this presentation, we describe a Bayesian approach for calculating PoS before pivotal (confirmatory) clinical trials are run which synthesizes internal clinical data, industry-wide success rates, and expert opinion or external data if needed. In particular, where there are differences between early phase and confirmatory trials, due to a change in outcome for example, we propose eliciting expert judgements to relate existing data to the unknown quantities of interest. We discuss two approaches for establishing a multivariate distribution for several related efficacy treatment effects within the Sheffield Elicitation Framework (SHELF) and describe how they were applied to evaluate the PoS of the registrational program of an asthma drug. We conclude by reflecting on some of the opportunities and practical challenges encountered when using elicitation to support the evaluation of PoS. 
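A minimal Monte Carlo sketch of assurance (illustrative assumptions of my own: a normal prior on the treatment effect and a one-sided z-test for the confirmatory trial; not the synthesis method of the talk):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def assurance(prior_mean, prior_sd, n_per_arm, sd=1.0, alpha=0.025,
              n_sims=20_000):
    """Assurance = expected power of a two-arm trial, averaging the
    frequentist power over a normal prior on the true effect delta.
    Success = one-sided z-test at level alpha."""
    delta = rng.normal(prior_mean, prior_sd, size=n_sims)  # prior draws
    se = sd * np.sqrt(2.0 / n_per_arm)        # SE of the effect estimate
    z_alpha = stats.norm.ppf(1 - alpha)
    power = stats.norm.cdf(delta / se - z_alpha)           # power given delta
    return float(power.mean())

# Unlike power at a fixed effect, assurance stays bounded away from 1
# even for large n, because the prior puts mass on negligible effects.
pos = assurance(prior_mean=0.3, prior_sd=0.2, n_per_arm=100)
```

The approach in the talk enriches the prior itself, combining internal data, industry-wide success rates, and elicited judgements, before taking this kind of expectation.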



Nov 9  Thu  Wei Xing (Sheffield)  Statistics Seminar  
14:00  Reliable AI for Engineering  
Hicks Seminar Room J11  
Abstract: Artificial intelligence (AI) has seismically shifted the landscape across multiple domains, including scientific computing, manufacturing, and engineering. However, the importance of Reliable AI extends beyond what general AI can offer, particularly in scenarios where the stakes are high. Reliable AI, as the name suggests, emphasizes reliability, robustness, and trustworthiness, crucial for real-world applications where uncertainties and high-stakes decisions are the norm. In this talk, I will share our development of reliable AI techniques using Bayesian models and show how these methods can be applied to problems in integrated circuit design and some other broader applications in engineering such as digital twins. 



Nov 20  Mon  Richard Wilkinson (Nottingham)  Statistics Seminar  
15:00  Adjoint-aided inference for latent force models  
Hicks Seminar Room J11  
Abstract: Linear systems occur throughout engineering and the sciences, most notably as differential equations. In many cases the forcing function for the system is unknown, and interest lies in using noisy observations of the system to infer the forcing, as well as other unknown parameters. In this talk I will show how adjoints of linear systems can be used to efficiently infer forcing functions modelled as Gaussian processes. Adjoints have recently come to prominence in machine learning, but mainly as an approach to compute derivatives of cost functions for differential equation models. Here, we use adjoints in a different way that allows us to analytically compute the least-squares estimator, or the full Bayesian posterior distribution of the unknown forcing. Instead of relying on solves of the original (forward) model, we can recast the problem as n adjoint problems, where n is the number of data points. All that is required is the ability to solve adjoint systems numerically: it does not rely upon additional tractability of the linear system such as the ability to compute Green’s functions. We'll demonstrate this approach by inferring the pollution source in an advection-diffusion-reaction equation. 



Feb 13  Tue  Emmanouil Kalligeris (Sheffield)  Statistics Seminar  
15:00  A Twisted Markov Switching Mechanism for the Modelling of Incidence Rate Data  
Hicks Seminar Room J11  
Abstract: Various time series models have been used over the years to capture the dynamic behaviour of significant variables in scientific fields such as epidemiology, seismology, meteorology and finance. In this work, a conditional mean Markov regime switching model with covariates is proposed and studied for the analysis of incidence rate data. The components of the model are selected by penalised likelihood techniques in conjunction with the Expectation-Maximisation algorithm, with the aim of achieving a high level of robustness with respect to modelling the dynamic behaviour of epidemiological data. In addition to statistical inference, change-point detection analysis is used to select the number of regimes, reducing the complexity associated with likelihood ratio tests. [Kalligeris EN, Karagrigoriou A, Parpoula C. (2023): On Stochastic Dynamic Modeling of Incidence Data. Int J Biostat, 10.1515/ijb-2021-0134] 



Feb 27  Tue  Prof. Robin Henderson (Newcastle University)  Statistics Seminar  
15:00  Event History and Topological Data Analysis  
Hicks Seminar Room J11  
Abstract: Topological data analysis has become popular in recent years, though mainly outside the statistical literature. In this talk we review some of the elements of topological data analysis and we show links to event history and survival analysis. We argue that exploiting topological data as event history can be useful in the analysis of data in the form of images. We propose a version of the well-known Nelson-Aalen cumulative hazard estimator for the comparison of topological features of random fields and for testing parametric assumptions. We suggest a Cox proportional hazards approach for the analysis of embedded metric trees. The Nelson-Aalen method is illustrated on globally distributed climate data and on neutral hydrogen distribution in the Milky Way. The Cox method is used to compare vascular patterns in fundus images of the eyes of healthy and diabetic retinopathy patients. 
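The Nelson-Aalen estimator itself is simple: the cumulative hazard H(t) is the sum over event times t_i ≤ t of d_i / n_i, with d_i events and n_i subjects at risk at t_i. A short illustrative implementation (standard estimator, using the usual events-before-censorings convention at tied times):

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard from right-censored data.
    times: observation times; events: 1 = event, 0 = censored.
    Returns (event_times, H) where H steps up by d_i / n_i."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n_at_risk = len(times)
    grid, H = [], []
    h = 0.0
    for t in np.unique(times):
        at_t = times == t
        d = events[at_t].sum()          # events at time t
        if d > 0:
            h += d / n_at_risk
            grid.append(t)
            H.append(h)
        n_at_risk -= at_t.sum()         # drop events and censorings at t
    return np.array(grid), np.array(H)

# Tiny worked example: events at t = 2, 3, 5; censorings at t = 3, 7.
t, H = nelson_aalen([2, 3, 3, 5, 7], [1, 1, 0, 1, 0])
```

In the talk this estimator is applied not to patient survival times but to the birth times of topological features of a random field, which is what makes the event-history link work.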



Apr 16  Tue  Dominic Grainger and Dr Ben Wigley (Sheffield)  Statistics Seminar  
15:00  Dominic: The Efficient Modelling of Individual Animal Movement in Continuous Time; Ben: Stressing over shape: A Procrustean investigation of dental fluctuating asymmetry.  
Hicks Seminar Room J11  


May 30  Thu  Statistics UQ Reading group  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Jun 11  Tue  Dr Zexun Chen (Edinburgh)  Statistics Seminar  
15:00  Peer-induced Fairness: A Simple Causal Approach for Algorithmic Bias Discovery in Credit Approval  
Abstract: In today's world, where AI and automation increasingly shape decision-making processes, ensuring algorithmic fairness is paramount. While much attention has been given to fairness concepts like statistical parity and equal opportunity, practical challenges in detecting and addressing bias remain. Traditional methods often involve embedding fairness metrics into algorithms, which can compromise their accuracy. In this seminar, I will introduce a fundamental shift in tackling algorithmic bias by presenting our novel "peer-induced fairness" framework. This approach leverages counterfactual fairness and advanced causal inference techniques, including the Single World Intervention Graph, to detect bias at the individual level through peer comparisons and hypothesis testing. Focusing on the context of credit approval, our framework addresses common issues such as data scarcity and imbalance, and operates independently of specific decision-making methodologies, such as classifier selection. It provides explainable feedback to individuals who receive adverse decisions, distinguishing between algorithmic bias, discrimination, and the capabilities of the subjects involved. Our framework has been validated using a dataset of SMEs, demonstrating its effectiveness in identifying unfair practices and suggesting practical interventions. The results show that 'peer-induced fairness' not only improves fairness in algorithmic decisions but also serves as a flexible, transparent, and adaptable tool for diverse applications. Finally, if time allows, I will present some of my working ideas around Gaussian process modelling, including multivariate Gaussian processes and constrained Gaussian processes. 



Jul 1  Mon  Statistics UQ Reading group  Statistics Seminar  
15:00  
Hicks Seminar Room J11  


Jul 16  Tue  Jeremy Oakley (Sheffield)  Statistics Seminar  
15:00  Reading group: Auto-Encoding Variational Bayes (Kingma and Welling, https://arxiv.org/pdf/1312.6114)  
Hicks LTD  


Jul 23  Tue  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Reading group: Auto-Encoding Variational Bayes (Kingma and Welling, https://arxiv.org/pdf/1312.6114)  Continued!  
Hicks Seminar Room J11  

